A Group-Theoretic Model for Information
Abstract
In this paper we formalize the notions of information elements and information lattices, first proposed by Shannon. Exploiting this formalization, we identify a comprehensive parallelism between information lattices and subgroup lattices. Qualitatively, we demonstrate isomorphisms between information lattices and subgroup lattices. Quantitatively, we establish a decisive approximation relation between the entropy structures of information lattices and the log-index structures of the corresponding subgroup lattices. This approximation extends the approximation for joint entropies carried out previously by Chan and Yeung. As a consequence of our approximation result, we show that any continuous law holds in general for the entropies of information elements if and only if the same law holds in general for the log-indices of subgroups. As an application, by constructing subgroup counterexamples we find surprisingly that common information, unlike joint information, obeys neither the submodularity nor the supermodularity law. We emphasize that the notion of information elements is conceptually significant—formalizing it helps to reveal the deep connection between information theory and group theory. The parallelism established in this paper admits an appealing group-action explanation and provides useful insights into the intrinsic structure among information elements from a group-theoretic perspective.
Information element, information lattice, group theory, lattice theory, subgroup lattice, information inequality, subgroup approximation, information law, submodularity, supermodularity, common information, joint information, entropy, fundamental region, isomorphism
I Introduction
Information theory was born with the celebrated entropy formula measuring the amount of information for the purpose of communication. However, a suitable mathematical model for information itself remained elusive over the last sixty years. It is reasonable to assume that information theorists have had certain intuitive conceptions of information, but in this paper we seek a mathematical model for such a conception. In particular, building on Shannon’s work [1], we formalize the notion of information elements to capture the syntactical essence of information, and identify information elements with $\sigma$-algebras and sample-space-partitions. As we shall see in the following, by building such a mathematical model for information and identifying the lattice structure among information elements, the seemingly surprising connection between information theory and group theory, established by Chan and Yeung [2], is revealed via isomorphism relations between information lattices and subgroup lattices. Consequently, a full-fledged and decisive approximation relation between the entropy structure of information lattices and the subgroup-index structure of corresponding subgroup lattices is obtained.
We first motivate our formal definition for the notion of information elements.
I-A Informationally Equivalent Random Variables
Recall the profound insight offered by Shannon [3] on the essence of communication: “the fundamental problem of communication is that of reproducing at one point exactly or approximately a message selected at another point.” Consider the following motivating example. Suppose a message, in English, is delivered from person A to person B. Then, the message is translated and delivered in German by person B to person C (perhaps because person C does not know English). Assuming the translation is faithful, person C should receive the message that person A intends to convey. Reflecting upon this example, we see that the message (information) assumes two different “representations” over the process of the entire communication—one in English and the other in German, but the message (information) itself remains the same. Similarly, coders (decoders), essential components of communication systems, perform a similar function of “translating” one representation of the same information into another. This suggests that “information” itself should be defined in a translation-invariant way. This “translation-invariant” quality is precisely how we seek to characterize information.
To introduce our formal definition for information elements to capture the essence of information itself, we note that information theory is built within the probabilistic framework, in which one-time information sources are usually modeled by random variables. Therefore, we start in the following with the concept of informational equivalence between random variables and develop the formal concept of information elements from first principles.
Recall that, given a probability space $(\Omega, \mathcal{F}, P)$ and a measurable space $(S, \mathcal{S})$, a random variable $X$ is a measurable function from $\Omega$ to $S$. The set $S$ is usually called the state space of the random variable, and $\mathcal{S}$ is a $\sigma$-algebra on $S$. The set $\Omega$ is usually called the sample space; $\mathcal{F}$ is a $\sigma$-algebra on $\Omega$, usually called the event space; and $P$ denotes a probability measure on the measurable space $(\Omega, \mathcal{F})$.
To illustrate the idea of informational equivalence, consider a random variable $X$ and another random variable $Y = f(X)$, where the function $f$ is bijective. Certainly, the two random variables $X$ and $Y$ are technically different, for they have different codomains. However, it is intuitively clear that they are “equivalent” in some sense. In particular, one can infer the exact state of $X$ by observing that of $Y$, and vice versa. For this reason, we may say that the two random variables $X$ and $Y$ carry the same piece of information. Note that the $\sigma$-algebras induced by $X$ and $Y$ coincide with each other. In fact, any two random variables such that the state of one can be inferred from that of the other induce the same $\sigma$-algebra. This leads to the following definition for informational equivalence.
Definition 1
We say that two random variables $X$ and $Y$ are informationally equivalent, denoted $X \cong Y$, if the $\sigma$-algebras induced by $X$ and $Y$ coincide.
It is easy to verify that the “being-informationally-equivalent” relation is an equivalence relation. The definition reflects our intuition, as demonstrated in the previous motivating examples, that two random variables carry the same piece of information if and only if they induce the same $\sigma$-algebra. This motivates the following definition for information elements to capture the syntactical essence of information itself.
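For finite sample spaces, informational equivalence can be checked mechanically: two random variables are equivalent exactly when they induce the same partition of the sample space into preimages. The following is a minimal Python sketch; the function names are ours and purely illustrative.

```python
def induced_partition(X, sample_space):
    """Partition of the sample space into the preimages X^{-1}(x):
    the atomic events of the sigma-algebra induced by X."""
    blocks = {}
    for omega in sample_space:
        blocks.setdefault(X(omega), set()).add(omega)
    return frozenset(frozenset(b) for b in blocks.values())

def informationally_equivalent(X, Y, sample_space):
    """X and Y carry the same information iff they induce the same
    partition (equivalently, the same sigma-algebra)."""
    return induced_partition(X, sample_space) == induced_partition(Y, sample_space)

# A bijective relabeling of states preserves the information element:
omega = range(4)
X = lambda w: w % 2                            # states {0, 1}
Y = lambda w: "even" if w % 2 == 0 else "odd"  # Y = f(X) with f bijective
assert informationally_equivalent(X, Y, omega)
```

Note that equivalence is decided entirely on the sample-space side; the state spaces of $X$ and $Y$ play no role, which is exactly the translation-invariance discussed above.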
Definition 2
An information element is an equivalence class of random variables with respect to the “being-informationally-equivalent” relation.
We call the random variables in the equivalence class of an information element $m$ the representing random variables of $m$. Or, we say that a random variable $X$ represents $m$.
We believe that our definition of information elements reflects exactly Shannon’s original intention [1]:
Thus we are led to define the actual information of a stochastic process as that which is common to all stochastic processes which may be obtained from the original by reversible encoding operations.
Intuitive (also informal) discussion on identifying “information” with $\sigma$-algebras surfaces often in probability theory, martingale theory, and mathematical finance. In probability theory, see for example [4], the concept of conditional probability is usually introduced with discussion of treating the $\sigma$-algebras conditioned on as the “partial information” available to “observers.” In martingale theory and mathematical finance, see for example [5, 6], filtrations—increasing sequences of $\sigma$-algebras—are often interpreted as records of the information available over time.
I-A1 A Few Observations
Proposition 1
If $X \cong Y$, then $H(X) = H(Y)$.
(Throughout the paper, we use $H(X)$ to denote the entropy of a random variable $X$.)
The converse to Proposition 1 fails—two random variables with the same entropy do not necessarily carry the same information. For example, consider two binary random variables $X$ and $Y$ on the sample space $\Omega = \{1, 2, 3, 4\}$, where $P$ is uniform on $\Omega$. Suppose $X(\omega) = 0$ if $\omega \in \{1, 2\}$ and 1 otherwise, and $Y(\omega) = 0$ if $\omega \in \{1, 3\}$ and 1 otherwise. Clearly, we have $H(X) = H(Y)$, but one can readily agree that $X$ and $Y$ do not carry the same piece of information. Therefore, the notion of “informationally-equivalent” is stronger than that of “identically-distributed.”
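The failure of the converse is easy to verify numerically. The following Python check (helper names are ours) uses a uniform four-point sample space with two binary variables whose partitions differ:

```python
import math

def partition_of(X, omega):
    """Partition of the sample space into preimages of X."""
    blocks = {}
    for w in omega:
        blocks.setdefault(X(w), set()).add(w)
    return frozenset(frozenset(b) for b in blocks.values())

def entropy(partition, n):
    """Entropy of a partition under the uniform measure on n points."""
    return -sum((len(b) / n) * math.log2(len(b) / n) for b in partition)

omega = [1, 2, 3, 4]
X = lambda w: 0 if w in (1, 2) else 1
Y = lambda w: 0 if w in (1, 3) else 1
pX, pY = partition_of(X, omega), partition_of(Y, omega)
assert entropy(pX, 4) == entropy(pY, 4) == 1.0  # identical entropies
assert pX != pY                                 # but different information
```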
On the other hand, we see that the notion of “informationally-equivalent” is weaker than that of “being-equal.”
Proposition 2
If $X = Y$, then $X \cong Y$.
The converse to Proposition 2 fails as well, since two informationally equivalent random variables $X$ and $Y$ may have totally different state spaces, so that it does not even make sense to write $X = Y$.
As shown in the following proposition, the notion of “informational equivalence” characterizes a kind of state-space-invariant “equalness.”
Proposition 3
Two random variables $X$ and $Y$ with state spaces $S_X$ and $S_Y$, respectively, are informationally equivalent if and only if there exists a one-to-one correspondence $f: S_X \to S_Y$ such that $Y = f(X)$.
Remark: Throughout the paper, we fix a probability space $(\Omega, \mathcal{F}, P)$ unless otherwise stated. For ease of presentation, we confine ourselves in the following to finite discrete random variables. However, most of the definitions and results can be applied to more general settings without significant difficulties.
I-B Identifying Information Elements via $\sigma$-Algebras and Sample-Space-Partitions
Since the $\sigma$-algebras induced by informationally equivalent random variables are the same, we can unambiguously identify information elements with $\sigma$-algebras. Moreover, because we deal with finite discrete random variables exclusively in this paper, we can afford to discuss $\sigma$-algebras more explicitly as follows.
Recall that a partition $\Pi$ of a set $A$ is a collection $\{\pi_i : i \in [n]\}$ of disjoint subsets of $A$ such that $\cup_{i \in [n]} \pi_i = A$. (Throughout the paper, we use the bracket notation $[n]$ to denote the generic index set $\{1, 2, \ldots, n\}$.) The elements $\pi_i$ of a partition are usually called the parts of $\Pi$. It is well known that there is a natural one-to-one correspondence between the partitions of the sample space and the $\sigma$-algebras—any given $\sigma$-algebra of a sample space can be generated uniquely, via the union operation, from the atomic events of the $\sigma$-algebra, while the collection of the atomic events forms a partition of the sample space. For example, for a random variable $X$, the atomic events of the $\sigma$-algebra induced by $X$ are the preimages $X^{-1}(x)$, $x \in S_X$. For this reason, from now on, we shall identify an information element by either its $\sigma$-algebra or its corresponding sample-space-partition.
It is well known that the number of distinct partitions of a set of size $n$ is the $n$th Bell number $B_n$ and that the Stirling number of the second kind $S(n, k)$ counts the number of ways to partition a set of $n$ elements into $k$ nonempty parts. These two numbers, crucial to the remarkable results obtained by Orlitsky et al. in [7], suggest a possibly interesting connection between the notion of information elements discussed in this paper and the “patterns” studied in [7].
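Both counts follow from the standard recurrence for Stirling numbers of the second kind; a short Python sketch (function names are ours):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n, k):
    """Stirling number of the second kind S(n, k): the number of
    partitions of an n-set into k nonempty parts."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    # the n-th element either starts a new part or joins one of k parts
    return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)

def bell(n):
    """Bell number B_n: the total number of partitions of an n-set."""
    return sum(stirling2(n, k) for k in range(n + 1))

assert [bell(n) for n in range(6)] == [1, 1, 2, 5, 15, 52]
assert stirling2(4, 2) == 7
```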
I-C Shannon’s Legacy
As we mentioned before, the notion of information elements was originally proposed by Shannon in [1]. In the same paper, Shannon also proposed a partial order for information elements and a lattice structure for collections of information elements. We follow Shannon and call such lattices information lattices in the following.
Abstracting the notion of information elements out of their representations—random variables—is a conceptual leap, analogous to the leap from the concrete calculation with matrices to the study of abstract vector spaces. To this end, we formalize both the ideas of information elements and information lattices. By identifying information elements with sample-space-partitions, we are equipped to establish a comprehensive parallelism between information lattices and subgroup lattices. Qualitatively, we demonstrate isomorphisms between information lattices and certain subgroup lattices. With such isomorphisms established, quantitatively, we establish an approximation for the entropy structure of information lattices, consisting of joint, common, and many other information elements, using the log-index structures of their counterpart subgroup lattices. Our approximation subsumes the approximation carried out only for joint information elements by Chan and Yeung [2]. Building on [2], the parallelism identified in this paper reveals an intimate connection between information theory and group theory and suggests that group theory may provide a suitable mathematical language to describe and study laws of information.
The full-fledged parallelism between information lattices and subgroup lattices established in this paper is one of our main contributions. With this intrinsic mathematical structure among multiple information elements uncovered, we anticipate more systematic attacks on certain network information problems, where a better understanding of the intricate internal structure among multiple information elements is urgently needed. Indeed, the ideas of information elements and information lattices were originally motivated by network communication problems—in [1], Shannon wrote:
The present note outlines a new approach to information theory which is aimed specifically at the analysis of certain communication problems in which there exist a number of sources simultaneously in operation.
and
Another more general problem is that of a communication system consisting of a large number of transmitting and receiving points with some type of interconnecting network between the various points. The problem here is to formulate the best system design whereby, in some sense, the best overall use of the available facilities is made.
It is not hard to see that Shannon was attempting to solve the now-well-known network coding capacity problems.
Certainly, we do not claim that all the ideas in this paper are our own. For example, as we pointed out previously, the notions of information elements and information lattices were proposed as early as the 1950s by Shannon [1]. However, this paper of Shannon’s is not well recognized, perhaps owing to the abstruseness of the ideas. Formalizing these ideas and connecting them to current research is one of the primary goals of this paper. For all other results and ideas that have been previously published, we separate them from those of our own by giving detailed references to their original sources.
I-D Organization
The paper is organized as follows. In Section II, we introduce a “being-richer-than” partial order between information elements and study the information lattices induced by this partial order. In Section III, we formally establish isomorphisms between information lattices and subgroup lattices. Section IV is devoted to the quantitative aspects of information lattices. We show that the entropy structure of information lattices can be approximated by the log-index structure of their corresponding subgroup lattices. As a consequence of this approximation result, in Section V, we show that any continuous law holds for the entropies of common and joint information if and only if the same law holds for the log-indices of subgroups. As an application of this result, we show the rather surprising fact that, unlike joint information, neither the submodularity nor the supermodularity law holds for common information in general. We conclude the paper with a discussion in Section VI.
II Information Lattices
II-A “Being-Richer-Than” Partial Order
Recall that every information element can be identified with its corresponding sample-space-partition. Consider two sample-space-partitions $\Pi_1$ and $\Pi_2$. We say that $\Pi_1$ is finer than $\Pi_2$, or $\Pi_2$ is coarser than $\Pi_1$, if each part of $\Pi_1$ is contained in some part of $\Pi_2$.
Definition 3
For two information elements $m_1$ and $m_2$, we say that $m_1$ is richer than $m_2$, or $m_2$ is poorer than $m_1$, if the sample-space-partition of $m_1$ is finer than that of $m_2$. In this case, we write $m_1 \geq m_2$.
It is easy to verify that the above defined “being-richer-than” relation is a partial order.
We have the following immediate observations:
Proposition 4
$m_1 \geq m_2$ if and only if $H(m_2 \mid m_1) = 0$.
As a corollary to the above proposition, we have
Proposition 5
If $m_1 \geq m_2$, then $H(m_1) \geq H(m_2)$.
The converse of Proposition 5 does not hold in general.
With respect to representative random variables of information elements, we have
Proposition 6
Suppose random variables $X$ and $Y$ represent information elements $m_1$ and $m_2$, respectively. Then, $m_1 \geq m_2$ if and only if $Y = f(X)$ for some function $f$.
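For finite partitions, the “being-richer-than” order reduces to a simple containment check on parts; a small Python sketch (naming is ours):

```python
def refines(P, Q):
    """P is finer than Q (the information element of P is richer)
    iff every part of P lies inside some part of Q."""
    return all(any(p <= q for q in Q) for p in P)

# The partition {1,2} | {3} | {4} refines {1,2} | {3,4}: an observer of
# the first partition can always report which part of the second holds,
# i.e. the coarser information element is a function of the finer one.
P = [{1, 2}, {3}, {4}]
Q = [{1, 2}, {3, 4}]
assert refines(P, Q) and not refines(Q, P)
```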
The “beingricherthan” relation is very important to information theory, because it characterizes the only universal informationtheoretic constraint put on all deterministic coders (decoders)—the input information element of any coder is always richer than the output information element. For example, partially via this principle, Yan et al. recently characterized the capacity region of general acyclic multisource multisink networks [9]. Harvey et al. [10] obtained an improved computable outer bound for general network coding capacity regions by applying this same principle under a different name called information dominance—the authors of the paper acknowledged: “…information dominance plays a key role in our investigation of network capacity.”
II-B Information Lattices
Recall that a lattice is a set endowed with a partial order in which any two elements have a unique supremum and a unique infimum with respect to the partial order. Conventionally, the supremum of two lattice elements $x$ and $y$ is also called the join of $x$ and $y$; the infimum is also called the meet. In our case, with respect to the “being-richer-than” partial order, the supremum of two information elements $m_1$ and $m_2$, denoted $m_1 \vee m_2$, is the poorest among all the information elements that are richer than both $m_1$ and $m_2$. Conversely, the infimum of $m_1$ and $m_2$, denoted $m_1 \wedge m_2$, is the richest among all the information elements that are poorer than both $m_1$ and $m_2$.
Definition 4
An information lattice is a set of information elements that is closed under the join and meet operations.
Recall the one-to-one correspondence between information elements and sample-space-partitions. Consequently, each information lattice corresponds to a partition lattice (with respect to the “being-finer-than” partial order on partitions), and vice versa. This formally confirms the assertion made in [1]: “they (information lattices) are at least as general as the class of finite partition lattices.”
Since the collection of information lattices could be as general as that of partition lattices, we should not expect any special lattice properties to hold generally for all information lattices, because it is well-known that any finite lattice can be embedded in a finite partition lattice [11]. Therefore, it is not surprising to learn that information lattices are in general not distributive, not even modular.
II-C Joint Information Element
The join of two information elements is straightforward. Consider two information elements $m_1$ and $m_2$ represented respectively by two random variables $X$ and $Y$. It is easy to check that the joint random variable $(X, Y)$ represents the join $m_1 \vee m_2$. For this reason, we also call $m_1 \vee m_2$ the joint information element of $m_1$ and $m_2$. It is worth pointing out that the joint random variable $(Y, X)$ represents $m_1 \vee m_2$ equally well.
II-D Common Information Element
In [1], the meet of two information elements is called common information. More than twenty years later, the same notion of common information was independently proposed and first studied in detail by Gács and Körner [12]. For the first time, it was demonstrated that common information could be far less than mutual information. (“Mutual information” is rather a misnomer, because it does not correspond naturally to any information element [12].) Unlike the case of joint information elements, characterizing common information elements via their representing random variables is much more complicated. See [12, 13] for details.
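In partition terms, the join is the common refinement and the meet is the finest common coarsening of the two sample-space-partitions. The following Python sketch (helper names are ours) illustrates both operations:

```python
def join_partition(P, Q):
    """Join (joint information): the common refinement, whose parts
    are the nonempty intersections of a part of P with a part of Q."""
    return [p & q for p in P for q in Q if p & q]

def meet_partition(P, Q):
    """Meet (common information): the finest partition coarsening both;
    merge parts of P and Q that overlap until a fixed point is reached."""
    blocks = [set(b) for b in P] + [set(b) for b in Q]
    merged = True
    while merged:
        merged = False
        for i in range(len(blocks)):
            for j in range(i + 1, len(blocks)):
                if blocks[i] & blocks[j]:
                    blocks[i] |= blocks.pop(j)
                    merged = True
                    break
            if merged:
                break
    return blocks

# Two "independent bits" on a four-point sample space:
P = [{1, 2}, {3, 4}]
Q = [{1, 3}, {2, 4}]
assert sorted(map(sorted, join_partition(P, Q))) == [[1], [2], [3], [4]]
assert sorted(map(sorted, meet_partition(P, Q))) == [[1, 2, 3, 4]]
```

On this example the join is the discrete partition, while the meet is the trivial one-part partition: the two information elements have no common information even though each is nontrivial.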
In contrast to the all-familiar joint information, common information has received far less attention. Nonetheless, it has been shown to be important to cryptography [14, 15, 16, 17], indispensable for characterizing the capacity region of multiaccess channels with correlated sources [18], useful in studying information inequalities [19, 20], and relevant to network coding problems [21].
II-E Previously Studied Lattices in Information Theory
III Isomorphisms between Information Lattices and Subgroup Lattices
In this section, we discuss the qualitative aspects of the parallelism between information lattices generated from sets of information elements and subgroup lattices generated from sets of subgroups. In particular, we establish isomorphism relations between them.
III-A Information Lattices Generated by Information Element Sets
It is easy to verify that both the binary operations “$\vee$” and “$\wedge$” are associative and commutative. Thus, we can readily extend them to cases of more than two information elements. Accordingly, for a given set $\mathbf{m} = \{m_i : i \in [n]\}$ of information elements, we denote the joint information element of the subset $\{m_i : i \in A\}$, $A \subseteq [n]$, of information elements by $\vee_{i \in A} m_i$ and the common information element by $\wedge_{i \in A} m_i$.
Definition 5
Given a set $\mathbf{m}$ of information elements, the information lattice generated by $\mathbf{m}$, denoted $\mathcal{L}_{\mathbf{m}}$, is the smallest information lattice that contains $\mathbf{m}$. We call $\mathbf{m}$ the generating set of the lattice $\mathcal{L}_{\mathbf{m}}$.
It is easy to see that each information element in $\mathcal{L}_{\mathbf{m}}$ can be obtained from the information elements in the generating set $\mathbf{m}$ via a sequence of join and meet operations. Note that the set $\{\wedge_{i \in A} m_i : A \subseteq [n]\}$ of common information elements forms a meet semilattice and the set $\{\vee_{i \in A} m_i : A \subseteq [n]\}$ of joint information elements forms a join semilattice. However, the union of these two semilattices does not necessarily form a lattice. To see this, consider the following example constructed with partitions (since partitions are in one-to-one correspondence with information elements). Let be a collection of partitions on the set where , , , and . See Figure 1 for the Hasse diagram of the lattice generated by the collection . It is easy to see , but . Similarly, we have .
III-B Subgroup Lattices
Consider the binary operations on subgroups—intersection and union. We know that the intersection $G_1 \cap G_2$ of two subgroups is again a subgroup. However, the union $G_1 \cup G_2$ does not necessarily form a subgroup. Therefore, we consider the subgroup generated from the union $G_1 \cup G_2$, denoted $G_1 \vee G_2$ (or $\langle G_1 \cup G_2 \rangle$). Similar to the case of information elements, the intersection and “$\vee$” operations on subgroups are both associative and commutative. Therefore, we readily extend the two operations to the cases with more than two subgroups and, accordingly, denote the intersection of a set $\{G_i : i \in A\}$ of subgroups by $\wedge_{i \in A} G_i$ and the subgroup generated from the union by $\vee_{i \in A} G_i$. It is easy to verify that the subgroups $\wedge_{i \in A} G_i$ and $\vee_{i \in A} G_i$ are the infimum and the supremum of the set $\{G_i : i \in A\}$ with respect to the “being-a-subgroup-of” partial order. For notational consistency, we also use “$\wedge$” to denote the intersection operation.
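The generating step can be made concrete for permutation groups: close the union under composition until nothing new appears. A Python sketch (function names are ours) showing that the union of two subgroups of $S_3$ is not a subgroup but generates all of $S_3$:

```python
def compose(p, q):
    """Compose permutations given as tuples: (p o q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(len(p)))

def generated_subgroup(generators):
    """The subgroup <S> generated by a set S of permutations: close S
    under composition until a fixed point is reached.  (A nonempty
    finite set of bijections closed under composition is a group.)"""
    elems = set(generators)
    while True:
        new = {compose(a, b) for a in elems for b in elems} - elems
        if not new:
            return elems
        elems |= new

# In S_3, the union of the two subgroups <(0 1)> and <(1 2)> has only
# three elements and is not a subgroup, yet it generates all of S_3.
swap01 = (1, 0, 2)
swap12 = (0, 2, 1)
assert len(generated_subgroup({swap01, swap12})) == 6
```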
Note that, to keep the notation simple, we “overload” the symbols “$\vee$” and “$\wedge$”: they denote both the join and the meet operations on information elements and the “union-generating” and the intersection operations on subgroups. Their actual meaning should be clear from context.
Definition 6
A subgroup lattice is a set of subgroups that is closed under the “$\wedge$” and “$\vee$” operations.
For example, the set of all the subgroups of a group forms a lattice.
Similar to the case of information lattices generated by sets of information elements, we consider in the following subgroup lattices generated by a set of subgroups.
Definition 7
Given a set $\mathbf{g}$ of subgroups, the subgroup lattice generated by $\mathbf{g}$, denoted $\mathcal{L}_{\mathbf{g}}$, is the smallest lattice that contains $\mathbf{g}$. We call $\mathbf{g}$ the generating set of $\mathcal{L}_{\mathbf{g}}$.
Note that the set $\{\wedge_{i \in A} G_i : A \subseteq [n]\}$ forms a semilattice under the meet operation and the set $\{\vee_{i \in A} G_i : A \subseteq [n]\}$ forms a semilattice under the join operation. However, as in the case of information lattices, the union of the two semilattices does not necessarily form a lattice.
In the remainder of this section, we relate information lattices generated by sets of information elements to subgroup lattices generated by sets of subgroups and demonstrate isomorphism relations between them. For ease of presentation, as a special case we first introduce an isomorphism between information lattices generated by sets of coset-partition information elements and their corresponding subgroup lattices.
III-C Special Isomorphism Theorem
We endow the sample space with a group structure—the sample space in question is taken to be a group $G$. For any subgroup $H$ of $G$, by Lagrange’s theorem [25], the collection of its cosets forms a partition of $G$. Certainly, the coset-partition, as a sample-space-partition, uniquely defines an information element. A collection of subgroups of $G$, in the same spirit, identifies a set of information elements via this subgroup–coset-partition correspondence.
Remark: throughout the paper, groups are taken to be multiplicative, and cosets are taken to be right cosets.
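The coset-partition is directly computable for small groups. A Python sketch (additive notation for convenience, names are ours) computing the right-coset partition of a subgroup, whose parts Lagrange’s theorem forces to have equal size:

```python
def right_cosets(G, H, op):
    """Partition of G into right cosets Hg of the subgroup H; by
    Lagrange's theorem every part has the same size |H|."""
    cosets, seen = [], set()
    for g in G:
        if g not in seen:
            coset = frozenset(op(h, g) for h in H)
            cosets.append(coset)
            seen |= coset
    return cosets

# Z_6 (written additively) with subgroup H = {0, 3}: three cosets of
# size 2, i.e. an equal partition of the group.
parts = right_cosets(range(6), [0, 3], lambda a, b: (a + b) % 6)
assert sorted(map(sorted, parts)) == [[0, 3], [1, 4], [2, 5]]
```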
It is clear that, by our construction, the information elements in $\mathbf{m}$ and the subgroups in $\mathbf{g}$ are in one-to-one correspondence via the subgroup–coset-partition relation. It turns out that the information elements on the entire information lattice $\mathcal{L}_{\mathbf{m}}$ and the subgroups on the subgroup lattice $\mathcal{L}_{\mathbf{g}}$ are in one-to-one correspondence as well, via the same subgroup–coset-partition relation. In other words, both the join and meet operations on information lattices are faithfully “mirrored” by the join and meet operations on subgroup lattices.
Theorem 1
(Special Isomorphism Theorem) Given a set $\mathbf{g} = \{G_i : i \in [n]\}$ of subgroups, the subgroup lattice $\mathcal{L}_{\mathbf{g}}$ is isomorphic to the information lattice $\mathcal{L}_{\mathbf{m}}$ generated by the set $\mathbf{m} = \{m_i : i \in [n]\}$ of information elements, where the $m_i$, $i \in [n]$, are accordingly identified via the coset-partitions of the subgroups $G_i$, $i \in [n]$.
The theorem is shown by demonstrating a mapping, from the subgroup lattice $\mathcal{L}_{\mathbf{g}}$ to the information lattice $\mathcal{L}_{\mathbf{m}}$, that is a lattice-morphism, i.e., it honors both the join and meet operations, and is bijective as well. Naturally, the mapping assigning to each subgroup the information element identified by the coset-partition of the subgroup is such a morphism. Since this theorem and its general version, Theorem 2, are crucial to our later results—Theorems 3 and 5—and certain aspects of the reasoning are novel, we include a detailed proof in Appendix A.
III-D General Isomorphism Theorem
The information lattices considered in Section III-C are rather limited—by Lagrange’s theorem, coset-partitions are all equal partitions. In this subsection, we consider arbitrary information lattices—we do not require the sample space to be a group. Instead, we treat a general sample-space-partition as an orbit-partition resulting from some group-action on the sample space.
III-D1 Group-Actions and Permutation Groups
Definition 8
Given a group $G$ and a set $A$, a group-action of $G$ on $A$ is a function $\alpha: G \times A \to A$, written $\alpha(g, a)$ for $g \in G$ and $a \in A$, that satisfies the following two conditions:

$\alpha(g_1 g_2, a) = \alpha(g_1, \alpha(g_2, a))$ for all $g_1, g_2 \in G$ and $a \in A$;

$\alpha(e, a) = a$ for all $a \in A$, where $e$ is the identity of $G$.

We write $G \curvearrowright A$ to denote the group-action.
Now, we turn to the notions of orbits and orbit-partitions. We shall see that every group-action induces unambiguously an equivalence relation as follows. We say that $a, b \in A$ are connected under a group-action $G \curvearrowright A$ if there exists a $g \in G$ such that $b = \alpha(g, a)$. We write $a \sim b$. It is easy to check that this “being-connected” relation is an equivalence relation on $A$. By the fundamental theorem of equivalence relations, it defines a partition on $A$.
Definition 9
Given a group-action $G \curvearrowright A$, we call the equivalence classes with respect to the equivalence relation $\sim$, or the parts of the induced partition of $A$, the orbits of the group-action. Accordingly, we call the induced partition the orbit-partition of $G \curvearrowright A$.
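Orbit-partitions are directly computable for small examples. A Python sketch (names are ours) for the cyclic group generated by a single permutation, acting on the set it permutes:

```python
def orbit_partition(group, X, act):
    """Orbit-partition of a group action: x and y lie in the same
    part iff some group element maps x to y."""
    remaining, orbits = set(X), []
    while remaining:
        x = remaining.pop()
        orbit = {act(g, x) for g in group}
        orbits.append(orbit)
        remaining -= orbit
    return orbits

# The cyclic group generated by the permutation (0 1 2)(3 4), acting
# on {0,...,4}; the permutation has order lcm(3, 2) = 6.
perm = {0: 1, 1: 2, 2: 0, 3: 4, 4: 3}
group, g = [], {x: x for x in perm}      # start from the identity map
for _ in range(6):
    group.append(g)
    g = {x: perm[g[x]] for x in perm}    # next power of the permutation
orbits = orbit_partition(group, range(5), lambda g, x: g[x])
assert sorted(map(sorted, orbits)) == [[0, 1, 2], [3, 4]]
```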
III-D2 Sample-Space-Partition as Orbit-Partition
In fact, starting with a partition $\Pi$ of a set $A$, we can go in the other direction and unambiguously define a group-action such that its orbit-partition is exactly the given partition $\Pi$. To see this, note the following salient feature of group-actions: for any given group-action $G \curvearrowright A$, associated with every element $g$ of the group is a mapping from $A$ to itself, and any such mapping must be bijective. This feature is a direct consequence of the group axioms. To see this, note that every group element $g$ has a unique inverse $g^{-1}$. According to the first defining property of group-actions, we have $\alpha(g, \alpha(g^{-1}, a)) = \alpha(g g^{-1}, a) = \alpha(e, a) = a$ for all $a \in A$. This requires the mappings associated with $g$ and $g^{-1}$ to be inverses of each other, hence bijective. Clearly, the identity $e$ of the group corresponds to the identity map from $A$ to $A$.
With the observation that under a group-action every group element corresponds to a permutation of $A$, we can treat every group as a collection of permutations that is closed under permutation composition. Specifically, for a given partition $\Pi$ of a set $A$, it is easy to check that all the permutations of $A$ that permute the elements of each part of $\Pi$ only to elements of the same part form a group. These permutations altogether form the so-called permutation representation (with respect to $\Pi$). For this reason, in the following, without loss of generality, we treat all groups as permutation groups. We denote by $G_\Pi$ the permutation group corresponding as above to a partition $\Pi$—$G_\Pi$ acts naturally on the set $A$ by permutation, and the orbit-partition of $G_\Pi \curvearrowright A$ is exactly $\Pi$.
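This construction is easy to verify on a small example: the permutations of a 5-element set preserving the partition {0, 1, 2} | {3, 4} form a group of order 3!·2! = 12. A Python sketch (names are ours):

```python
from itertools import permutations
from math import factorial

def preserves(perm, partition):
    """True iff the permutation maps every part of the partition onto itself."""
    return all({perm[x] for x in part} == part for part in partition)

# The partition-preserving permutations of {0,...,4} with respect to
# {0,1,2} | {3,4} form a group: each factor permutes one part freely.
parts = [{0, 1, 2}, {3, 4}]
G_pi = [p for p in permutations(range(5)) if preserves(p, parts)]
assert len(G_pi) == factorial(3) * factorial(2)  # = 12
```

Acting with this group on {0, …, 4} recovers exactly the two parts as orbits, as the text asserts.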
From group theory, we know that this orbit-partition–permutation-group-action relation is a one-to-one correspondence. Since every information element corresponds definitively to a sample-space-partition, we can identify every information element with a permutation group. Given a set $\mathbf{m} = \{m_i : i \in [n]\}$ of information elements, denote the set of the corresponding permutation groups by $\mathbf{g} = \{G_i : i \in [n]\}$. Note that all the permutations in the permutation groups $G_i$, $i \in [n]$, are permutations of the same set, namely the sample space $\Omega$. Hence, all the permutation groups $G_i$, $i \in [n]$, are subgroups of the symmetric group $S_\Omega$, which has order $|\Omega|!$. Therefore, it makes sense to take intersections and unions of groups from the collection $\mathbf{g}$.
III-D3 From Coset-Partition to Orbit-Partition—From Equal Partition to General Partition
In fact, the previously studied coset-partitions are a special kind of orbit-partition. They are orbit-partitions of group-actions defined by the native group multiplication. Specifically, given a subgroup $H$ of $G$, a group-action $H \curvearrowright G$ is defined such that $\alpha(h, g) = h \cdot g$ for all $h \in H$ and $g \in G$, where “$\cdot$” denotes the native binary operation of the group $G$. The orbit-partition of such a group-action is exactly the coset-partition of the subgroup $H$. Therefore, by taking a different kind of group-action—permutation rather than group multiplication—we are freed from the “equal-partition” restriction, so that we can associate arbitrary information elements, identified with arbitrary sample-space-partitions, with subgroups. It turns out that information lattices generated by sets of information elements and subgroup lattices generated by the corresponding sets of permutation groups remain isomorphic to each other. Thus, the isomorphism relation between information lattices and subgroup lattices holds in full generality.
III-D4 Isomorphism Relation Remains Between Information Lattices and Subgroup Lattices
Similar to Section III-C, we consider a set $\mathbf{m} = \{m_i : i \in [n]\}$ of information elements. Unlike in Section III-C, the information elements $m_i$, $i \in [n]$, considered here are arbitrary. As we discussed above, with each information element $m_i$ we associate a permutation group $G_i$ according to the orbit-partition–permutation-group-action correspondence. Denote the set of corresponding permutation groups by $\mathbf{g} = \{G_i : i \in [n]\}$.
Theorem 2
(General Isomorphism Theorem) The information lattice $\mathcal{L}_{\mathbf{m}}$ is isomorphic to the subgroup lattice $\mathcal{L}_{\mathbf{g}}$.
IV An Approximation Theorem
From this section on, we shift our focus to the quantitative aspects of the parallelism between information lattices and subgroup lattices. In the previous section, by generalizing from coset-partitions to orbit-partitions, we successfully established an isomorphism between general information lattices and subgroup lattices. In this section, we shall see that not only is the qualitative structure preserved, but the quantitative structure—the entropy structure of information lattices—is essentially captured by their isomorphic subgroup lattices as well.
IV-A Entropies of Coset-Partition Information Elements
We start with a simple and straightforward observation concerning the entropies of coset-partition information elements on information lattices.
Proposition 7
Let $\mathbf{g} = \{G_i : i \in [n]\}$ be a set of subgroups of a group $G$ and $\mathbf{m} = \{m_i : i \in [n]\}$ be the set of corresponding coset-partition information elements. The entropies of the joint and common information elements on the information lattice generated from $\mathbf{m}$ can be calculated from the subgroup lattice generated from $\mathbf{g}$ as follows:
(1) 
and
(2) 
Note that the right-hand sides of both Equations (1) and (2) are logarithms of subgroup indices. In the following, we shall call them, in short, log-indices.
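In the simplest case of a single subgroup, Equation (1) says that a coset-partition information element is uniform over the cosets, so its entropy equals the log-index. A minimal numeric sketch, with G = Z_12 and H = {0, 4, 8} as illustrative choices:

```python
from math import log, isclose

G = list(range(12))
H = [0, 4, 8]
# The coset-partition of H in Z_12: four cosets of three elements each.
cosets = {frozenset((h + g) % 12 for h in H) for g in G}

# Under the uniform probability measure on G, each coset has probability |H|/|G|.
p = [len(c) / len(G) for c in cosets]
entropy = -sum(q * log(q) for q in p)
log_index = log(len(G) / len(H))       # log [G : H]
assert isclose(entropy, log_index)
print(entropy, log_index)
```

By Lagrange's theorem every coset has the same size, which is exactly why this calculation is exact only for "uniform" information elements.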
Proposition 7 establishes a quantitative relation between the entropies of the information elements on coset-partition information lattices and the log-indices of the subgroups on the isomorphic subgroup lattices. This quantitative relation is exact. However, the scope of Proposition 7 is rather restrictive: it applies only to a special kind of "uniform" information element, because, by Lagrange's theorem, all coset-partitions are equal partitions.
In Section III, by generalizing from coset-partitions to orbit-partitions, we successfully removed the "uniformness" restriction imposed by the coset-partition structure. At the same time, we established a new isomorphism relation between information lattices and subgroup lattices via the orbit-partition–permutation-group-action correspondence. It turns out that this generalization maintains a "rough" version of the quantitative relation established in Proposition 7 between the entropies of information lattices and the log-indices of their isomorphic permutation-subgroup lattices. As we shall see, the entropies of the information elements on information lattices can be approximated, up to arbitrary precision, by the log-indices of the permutation groups on their isomorphic subgroup lattices.
IV-B Subgroup Approximation Theorem
To discuss the approximation formally, we introduce two definitions as follows.
Definition 10
Given the information lattice generated from a set of information elements, we call the real vector whose components are the entropies of the information elements on the information lattice, listed according to a certain prescribed order, the entropy vector of the set.
The entropy vector captures the informational structure among the information elements of the set.
Definition 11
Given the subgroup lattice generated from a set of subgroups of a group, we call the real vector whose components are the normalized log-indices of the subgroups on the subgroup lattice, listed according to a certain prescribed order, the normalized log-index vector of the set.
In the following, we assume that the entropy vector and the normalized log-index vector are aligned accordingly, i.e., their components are listed in the same prescribed order.
Theorem 3
Let M be a set of information elements. For any ε > 0, there exist an integer N and a set G of subgroups of the symmetric group S_N such that
(3) ‖h(M) − l(G)‖ < ε,
where h(M) denotes the entropy vector of M, l(G) denotes the normalized log-index vector of G, and "‖·‖" denotes the norm of real vectors.
Theorem 3 subsumes the approximation carried out by Chan and Yeung in [2], which is limited to joint entropies. The approximation procedure we carry out to prove Theorem 3 is similar to that of Chan and Yeung [2]: both use Stirling's approximation formula for factorials. But, with the group-action relation between information elements and permutation groups exposed, and the isomorphism between information lattices and subgroup lattices revealed, the approximation procedure becomes transparent, and the seemingly surprising connection between information theory and group theory becomes mathematically natural. For these reasons, we include a detailed proof in Appendix B.
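The flavor of the approximation can be illustrated numerically. Assuming, as in the construction sketched in Appendix B, that the permutation group attached to an information element with dilated block sizes N_1, …, N_r is the group of permutations preserving each block (of order N_1!⋯N_r!), the normalized log-index in S_N converges to the entropy as the amplification grows. The distribution (1/2, 1/3, 1/6) is an illustrative choice.

```python
from math import lgamma, log

probs = [1 / 2, 1 / 3, 1 / 6]
H = -sum(p * log(p) for p in probs)     # entropy of the information element

def normalized_log_index(k):
    """(1/N) log [S_N : S_{3k} x S_{2k} x S_k] for amplification factor k."""
    N = 6 * k
    blocks = [3 * k, 2 * k, k]
    # log [S_N : prod_i S_{N_i}] = log N! - sum_i log N_i!, via lgamma(n+1) = log n!
    return (lgamma(N + 1) - sum(lgamma(b + 1) for b in blocks)) / N

for k in (1, 10, 100, 1000):
    print(k, abs(normalized_log_index(k) - H))
# The gap shrinks roughly like (log N) / N.
```

This is only a convergence sketch for a single information element; Theorem 3 asserts the same simultaneously for all elements on the lattice.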
V Parallelism between Continuous Laws of Information Elements and those of Subgroups
As a consequence of Theorem 3, we shall see in the following that if a continuous law holds in general for information elements, then the same law must hold for the log-indices of subgroups, and vice versa.
In the following, for reference and comparison purposes, we first review the known laws concerning the entropies of joint and common information elements. These laws, usually expressed in the form of information inequalities, are deemed fundamental to information theory [26].
V-A Laws for Information Elements
V-A1 Non-Negativity of Entropy
Proposition 8
For any information element m, we have H(m) ≥ 0.
V-A2 Laws for Joint Information
Proposition 9
Given a set {m_1, …, m_n} of information elements, if A ⊆ B ⊆ {1, …, n}, then
H(∨_{i∈A} m_i) ≤ H(∨_{i∈B} m_i).
Proposition 10
For any two index sets A and B over a set {m_1, …, m_n} of information elements, the following inequality holds:
H(∨_{i∈A} m_i) + H(∨_{i∈B} m_i) ≥ H(∨_{i∈A∪B} m_i) + H(∨_{i∈A∩B} m_i).
This proposition is mathematically equivalent to the following one.
Proposition 11
For any three information elements m_1, m_2, and m_3, the following inequality holds:
H(m_1 ∨ m_2) + H(m_2 ∨ m_3) ≥ H(m_1 ∨ m_2 ∨ m_3) + H(m_2).
Note that this is precisely the non-negativity of the conditional mutual information: I(m_1; m_3 | m_2) = H(m_1 ∨ m_2) + H(m_2 ∨ m_3) − H(m_1 ∨ m_2 ∨ m_3) − H(m_2) ≥ 0.
Proposition 10 (or, equivalently, Proposition 11) is usually called the submodularity law for the entropy function. Propositions 8, 9, and 10 are known, collectively, as the polymatroidal axioms [27, 28]. Until very recently, these were the only known laws for the entropies of joint information elements.
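The submodularity law of Proposition 11 can be spot-checked numerically by realizing joint information elements as marginals of a joint probability mass function. A hedged sketch (the use of three binary coordinates and all names are our own choices):

```python
from math import log
from itertools import product
import random

random.seed(0)

def entropy(pmf):
    return -sum(p * log(p) for p in pmf.values() if p > 0)

def marginal(joint, coords):
    """Marginal pmf of the given coordinate subset."""
    pmf = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in coords)
        pmf[key] = pmf.get(key, 0.0) + p
    return pmf

for _ in range(100):
    weights = [random.random() for _ in range(8)]
    total = sum(weights)
    joint = {xyz: w / total for xyz, w in zip(product((0, 1), repeat=3), weights)}
    h = lambda coords: entropy(marginal(joint, coords))
    # H(m1 v m2) + H(m2 v m3) >= H(m1 v m2 v m3) + H(m2)
    assert h((0, 1)) + h((1, 2)) + 1e-12 >= h((0, 1, 2)) + h((1,))
print("submodularity held on 100 random distributions")
```

Of course, a finite random search is no proof; it merely illustrates the law that Proposition 11 asserts in general.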
In 1998, Zhang and Yeung discovered a new information inequality, involving four information elements [28].
Proposition 12
(Zhang-Yeung Inequality) For any four information elements m_1, m_2, m_3, and m_4, the following inequality holds:
(4) 2I(m_3; m_4) ≤ I(m_1; m_2) + I(m_1; m_3 ∨ m_4) + 3I(m_3; m_4 | m_1) + I(m_3; m_4 | m_2),
where the mutual and conditional mutual informations are expressed, as usual, in terms of the entropies of the corresponding joint information elements.
This newly discovered inequality, classified as a non-Shannon-type information inequality [26], shows that our understanding of the laws governing the quantitative relations between information elements is incomplete. Recently, six more four-variable information inequalities were discovered by Dougherty et al. [29].
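The Zhang-Yeung inequality can likewise be spot-checked on random four-variable distributions. The sketch below assumes the standard mutual-information form of the inequality stated in (4); the four binary coordinates and all names are our own choices.

```python
from math import log
from itertools import product
import random

random.seed(1)

def entropy(pmf):
    return -sum(p * log(p) for p in pmf.values() if p > 0)

def h(joint, coords):
    """Entropy of the marginal on the given coordinate subset."""
    pmf = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in coords)
        pmf[key] = pmf.get(key, 0.0) + p
    return entropy(pmf)

for _ in range(100):
    weights = [random.random() for _ in range(16)]
    total = sum(weights)
    joint = {x: w / total for x, w in zip(product((0, 1), repeat=4), weights)}
    H = lambda *c: h(joint, c)
    I_cd = H(2) + H(3) - H(2, 3)                      # I(m3; m4)
    I_ab = H(0) + H(1) - H(0, 1)                      # I(m1; m2)
    I_a_cd = H(0) + H(2, 3) - H(0, 2, 3)              # I(m1; m3 v m4)
    I_cd_a = H(0, 2) + H(0, 3) - H(0, 2, 3) - H(0)    # I(m3; m4 | m1)
    I_cd_b = H(1, 2) + H(1, 3) - H(1, 2, 3) - H(1)    # I(m3; m4 | m2)
    assert 2 * I_cd <= I_ab + I_a_cd + 3 * I_cd_a + I_cd_b + 1e-12
print("Zhang-Yeung inequality held on 100 random distributions")
```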
Information inequalities such as those presented above have been called "laws of information" [26, 30]. Seeking new information inequalities is currently an active research topic [28, 19, 31, 32]. In fact, they should more accurately be called "laws of joint information", since these inequalities involve joint information only. We shall see below laws involving common information.
V-A3 Common Information vs. Mutual Information
In contrast to joint information, little research has been done on laws involving common information. So far, the only known nontrivial law involving both joint information and common information is stated in the following proposition, discovered by Gács and Körner [12].
Proposition 13
For any two information elements m_1 and m_2, the following inequality holds:
H(m_1 ∧ m_2) ≤ I(m_1; m_2).
Note that I(m_1; m_2) = H(m_1) + H(m_2) − H(m_1 ∨ m_2), and that the inequality is strict in general.
V-A4 Laws for Common Information
Dual to the non-decreasing property of joint information, it is immediately clear that the entropies of common information elements are non-increasing.
Proposition 14
Given a set {m_1, …, m_n} of information elements, if A ⊆ B ⊆ {1, …, n}, then
H(∧_{i∈B} m_i) ≤ H(∧_{i∈A} m_i).
Compared with the case of joint information, one may naturally expect, as a dual counterpart of the submodularity law of joint information, a supermodularity law to hold for common information. In other words, we have the following conjecture.
Conjecture 1
For any three information elements m_1, m_2, and m_3, the following inequality holds:
(5) H(m_1 ∧ m_2) + H(m_2 ∧ m_3) ≤ H(m_1 ∧ m_2 ∧ m_3) + H(m_2).
We see this conjecture as natural because of the intrinsic duality between the join and meet operations of information lattices. Due to the combinatorial nature of common information [12], it is not obvious whether the conjecture holds. With the help of the approximation results established in Theorems 3 and 5, we find, surprisingly, that neither the conjecture nor its converse holds. In other words, common information observes neither the submodularity nor the supermodularity law.
V-B Continuous Laws for Joint and Common Information
As a consequence of Theorem 3, we shall see in the following that if a continuous law holds for information elements, then the same law must hold for the log-indices of subgroups, and vice versa. To convey this idea, we first present the simpler case involving only joint and common information elements. To state our result formally, we first introduce two definitions.
Definition 12
Given a set M of information elements, consider the collection of all joint and common information elements generated from M. We call the real vector whose components are the entropies of the information elements of this collection the entropy vector of M, denoted by h(M).
Definition 13
Given a set G of subgroups of a group, consider the collection of subgroups generated from G via intersections and generated unions. We call the real vector whose components are the normalized log-indices of the subgroups in this collection the normalized log-index vector of G, denoted by l(G).
In this context, we assume that the components of both h(M) and l(G) are listed according to a common fixed order. Moreover, we note that the two vectors have the same dimension.
Theorem 4
Let f be a continuous real-valued function on such vectors. Then f(h(M)) ≥ 0 holds for all sets M of information elements if and only if f(l(G)) ≥ 0 holds for all sets G of subgroups of any group.
Theorem 4 and its generalization, Theorem 5, extend the result obtained by Chan and Yeung in [2] in the following two ways. First, Theorems 4 and 5 apply to all continuous laws, while only linear laws were considered in [2]. Even though so far no nonlinear law for entropies has been encountered, it is highly plausible that nonlinear information laws exist, given the recent discovery that at least part of the boundary of the entropy cone involving four or more information elements is curved [33]. Second, our theorems encompass both common information and joint information, while only joint entropies were considered in [2]. For example, laws such as Propositions 13 and 14 cannot even be expressed in the setting of [2]. In fact, as we shall see later in Section V-D, the laws of common information depart from those of joint information very early: unlike joint information, which obeys the submodularity law, common information admits neither submodularity nor supermodularity. For these reasons, we believe that extending the subgroup approximation to common information is of interest in its own right.
V-C Continuous Laws for General Lattice Information Elements
In this section, we extend Theorem 4 to all the information elements in information lattices, not limited to the “pure” joint and common information elements. In the following, we introduce some necessary machinery to formally present the result in full generality.
Note that an element of the lattice generated from a set has an expression built from the generating elements of the lattice in much the same way that terms are built from literals in mathematical logic. In particular, we define lattice-terms as follows:
Definition 14
An expression t is called a lattice-term formed from a set X of literals if either t is a literal from X or t is formed from two lattice-terms with the join or the meet symbol: t = (t_1 ∘ t_2), where t_1 and t_2 are lattice-terms and ∘ is either the join symbol ∨ or the meet symbol ∧.
Definition 15
Suppose that t_1, …, t_k are lattice-terms formed from a literal set of size n: X = {x_1, …, x_n}. We call an expression of the form
f(H(t_1), …, H(t_k)),
where f represents a function from R^k to R and H represents the entropy function, an n-variable generalized information expression.
We evaluate an n-variable generalized information expression against a set {m_1, …, m_n} of information elements by substituting the literals with m_1, …, m_n, respectively, calculating the entropies of the information elements obtained by evaluating the lattice-terms according to the semantics of the join and meet operations on information elements, and then obtaining the corresponding function value. We denote this value by f(H(t_1), …, H(t_k))(m_1, …, m_n).
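The evaluation semantics can be sketched directly over sample-space-partitions, with the join realized as the common refinement and the meet obtained by transitive closure. Terms are represented as nested tuples; the term encoding and all names are our own illustrative choices.

```python
from math import log
from itertools import product

def join(p, q):
    """Common refinement: the coarsest partition finer than both."""
    return frozenset(a & b for a, b in product(p, q) if a & b)

def meet(p, q):
    """Transitive closure: the finest partition coarser than both."""
    parent = {}
    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    for block in list(p) + list(q):
        anchor = next(iter(block))
        for x in block:
            parent[find(x)] = find(anchor)
    comp = {}
    for x in list(parent):
        comp.setdefault(find(x), set()).add(x)
    return frozenset(frozenset(b) for b in comp.values())

def evaluate(term, literals):
    """term is an int (literal index) or a tuple ('join'|'meet', t1, t2)."""
    if isinstance(term, int):
        return literals[term]
    op, t1, t2 = term
    f = join if op == 'join' else meet
    return f(evaluate(t1, literals), evaluate(t2, literals))

def entropy(partition, n):
    """Entropy under the uniform measure on n sample points."""
    return -sum(len(b) / n * log(len(b) / n) for b in partition)

# Two partitions of the sample space {0, 1, 2, 3}.
P = [frozenset({frozenset({0, 1}), frozenset({2, 3})}),
     frozenset({frozenset({0, 2}), frozenset({1, 3})})]
print(entropy(evaluate(('join', 0, 1), P), 4))   # refined into singletons
print(entropy(evaluate(('meet', 0, 1), P), 4))   # everything merged into one block
```

A generalized information expression is then any continuous function of such entropy values.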
Definition 16
If an n-variable generalized information expression evaluates non-negatively for every set of information elements, i.e.,
f(H(t_1), …, H(t_k))(m_1, …, m_n) ≥ 0,
then we call
f(H(t_1), …, H(t_k)) ≥ 0
an n-variable information law.
Similar to generalized information expressions, we define generalized log-index expressions as follows.
Definition 17
We call an expression of the form
f(l(t_1), …, l(t_k)),
where f represents a function from R^k to R and l represents the normalized log-index function of subgroups, an n-variable generalized log-index expression.
We evaluate an n-variable generalized log-index expression against a set {G_1, …, G_n} of subgroups of a group by substituting the literals with G_1, …, G_n, respectively, calculating the normalized log-indices of the subgroups obtained by evaluating the lattice-terms according to the semantics of the join and meet operations on subgroups, and then obtaining the corresponding function value. We denote this value by f(l(t_1), …, l(t_k))(G_1, …, G_n).
Definition 18
If an n-variable generalized log-index expression evaluates non-negatively for every set of subgroups of any group, i.e.,
f(l(t_1), …, l(t_k))(G_1, …, G_n) ≥ 0,
then we call
f(l(t_1), …, l(t_k)) ≥ 0
an n-variable subgroup log-index law.
With the above formalism and corresponding notation, we are ready to state our equivalence result concerning generalized information laws.
Theorem 5
Suppose that f is continuous. Then an n-variable information law
f(H(t_1), …, H(t_k)) ≥ 0
holds if and only if the corresponding n-variable subgroup log-index law
f(l(t_1), …, l(t_k)) ≥ 0
holds.
To see one direction, namely that the subgroup log-index law implies the information law, suppose to the contrary that there exists a set of information elements on which the generalized information expression evaluates to some negative value. By the continuity of the function f and Theorem 3, we are guaranteed to be able to construct, from the information lattice generated from this set, a subgroup lattice such that the value of f at the normalized log-indices of the correspondingly constructed subgroups is arbitrarily close to that negative value. This contradicts the assumption that the subgroup log-index law holds for all sets of subgroups of any group.
On the other hand, the normalized log-indices of the subgroups on any subgroup lattice can be readily interpreted as the entropies of information elements, by taking the permutation representations of the subgroups on the subgroup lattice and then producing an information lattice according to the orbit-partition–permutation-group-action correspondence. Therefore, if the information law holds for all sets of information elements, then the subgroup log-index law holds for all sets of subgroups.
V-D Common Information Observes Neither the Submodularity Nor the Supermodularity Law
As discussed above, appealing to the duality between the join and the meet operations, one might conjecture, dual to the well-known submodularity of joint information, that common information would observe the supermodularity law. It turns out that common information observes neither the submodularity (6) nor the supermodularity (7) law; neither of the following two inequalities holds in general:
(6) H(m_1 ∧ m_2) + H(m_2 ∧ m_3) ≥ H(m_1 ∧ m_2 ∧ m_3) + H(m_2)
(7) H(m_1 ∧ m_2) + H(m_2 ∧ m_3) ≤ H(m_1 ∧ m_2 ∧ m_3) + H(m_2)
Because common information is combinatorial in flavor, depending on the "zero pattern" of joint probability matrices [12], it is hard to verify the validity of (6) and (7) directly. However, thanks to Theorem 5, we are able to construct subgroup counterexamples that invalidate (6) and (7) indirectly.
To show that (7) fails, it suffices, by Theorem 5, to find three subgroups G_1, G_2, and G_3 of a group N such that
(8) log [N : ⟨G_1 ∪ G_2⟩] + log [N : ⟨G_2 ∪ G_3⟩] > log [N : ⟨G_1 ∪ G_2 ∪ G_3⟩] + log [N : G_2].
Consider N = S_5, the symmetric group of degree 5, and three of its subgroups G_1, G_2, and G_3, where G_1 and G_3 are dihedral subgroups of order 10 generated by suitable 5-cycles together with a common involution, and G_2 is the subgroup of order 2 generated by that involution. Consequently, G_2 is contained in both G_1 and G_3, so ⟨G_1 ∪ G_2⟩ = G_1 and ⟨G_2 ∪ G_3⟩ = G_3, each of index 12 in N, while ⟨G_1 ∪ G_2 ∪ G_3⟩ is the alternating group A_5, hence of index 2, and G_2 has index 60. Since log 12 + log 12 > log 2 + log 60, the subgroups G_1, G_2, and G_3 satisfy (8). By Theorem 5, the supermodularity law (7) does not hold in general for common information. (Thanks to Professor Eric Moorhouse for contributing this counterexample.)
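A counterexample of this shape can be verified mechanically. The specific generators below are our own reconstruction (the permutations used in the original are not recoverable from the text): in S_5, acting on {0, …, 4}, we take two dihedral subgroups of order 10 sharing the involution (2 5)(3 4), together with the order-2 subgroup that involution generates.

```python
from math import log

def compose(a, b):                     # (a o b)(x) = a[b[x]], permutations as tuples
    return tuple(a[b[x]] for x in range(5))

def cycle(*orbit):                     # permutation from a single cycle, 0-indexed
    p = list(range(5))
    for i, x in enumerate(orbit):
        p[x] = orbit[(i + 1) % len(orbit)]
    return tuple(p)

def closure(generators):
    """The subgroup generated by the given permutations (BFS closure)."""
    identity = tuple(range(5))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

s1 = cycle(0, 1, 2, 3, 4)              # the 5-cycle (1 2 3 4 5), 1-indexed
s3 = cycle(0, 1, 3, 2, 4)              # the 5-cycle (1 2 4 3 5), 1-indexed
t = compose(cycle(1, 4), cycle(2, 3))  # the involution (2 5)(3 4), 1-indexed

G1, G2, G3 = closure([s1, t]), closure([t]), closure([s3, t])
G12, G23 = closure([s1, t]), closure([s3, t])   # <G1 u G2> = G1, <G2 u G3> = G3
G123 = closure([s1, s3, t])                     # <G1 u G2 u G3>
N = 120                                         # |S_5|

idx = lambda g: log(N / len(g))                 # log-index log [N : g]
print(len(G1), len(G2), len(G3), len(G123))     # 10 2 10 60
# Supermodularity (7) would require idx(G12) + idx(G23) <= idx(G123) + idx(G2):
print(idx(G12) + idx(G23) > idx(G123) + idx(G2))  # True, so (7) fails
```

The comparison amounts to log 12 + log 12 > log 2 + log 60, i.e., 144 > 120.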
VI Discussion
This paper builds on some of Shannon's little-recognized legacy and adopts his interesting concepts of information elements and information lattices. We formalize all these concepts and clarify the relations between random variables and information elements, between information elements and algebras, and, especially, the one-to-one correspondence between information elements and sample-space-partitions. We emphasize that such formalization is conceptually significant. As demonstrated in this paper, thanks to the formalization carried out, we are able to establish a comprehensive parallelism between information lattices and subgroup lattices. This parallelism is mathematically natural and admits intuitive group-action explanations. It reveals an intimate connection, both structural and quantitative, between information theory and group theory. This suggests that group theory might serve as a promising mathematical language for studying deep laws governing information.
Network information theory in general, and capacity problems for network coding in particular, depend crucially on our understanding of the intricate structures among multiple information elements. By building a bridge from information theory to group theory, we gain access to the set of well-developed tools from group theory. These tools can be brought to bear on certain formidable problems in areas such as network information theory and network coding. Along these lines, by constructing subgroup counterexamples we showed that neither the submodularity nor the supermodularity law holds for common information, neither of which is obvious from traditional information-theoretic perspectives.
Appendix A Proof of Theorem 1
{proof}To show that two lattices are isomorphic, we need to demonstrate a mapping from one lattice to the other that is a lattice-morphism, honoring both the join and meet operations, and that is bijective as well. Instead of proving directly that the subgroup lattice is isomorphic to the information lattice, we show that the dual of the subgroup lattice is isomorphic to the information lattice. Figuratively speaking, the dual of a lattice is the lattice obtained by flipping the lattice upside down. Formally, the dual of a lattice is the lattice defined on the same set with the partial order reversed. Accordingly, the join operation of the original lattice corresponds to the meet operation of the dual lattice, and the meet operation of the original lattice to the join operation of the dual. In other words, we demonstrate a bijective mapping φ from the subgroup lattice to the information lattice such that
(9) φ(G′ ∩ G″) = φ(G′) ∨ φ(G″)
and
(10) φ(⟨G′ ∪ G″⟩) = φ(G′) ∧ φ(G″)
hold for all subgroups G′ and G″ on the subgroup lattice.
Note that each subgroup on the subgroup lattice is obtained from the generating set of subgroups {G_1, …, G_n} via a sequence of join and meet operations, and each information element on the information lattice is obtained similarly from the generating set of information elements {m_1, …, m_n}. Therefore, to show that the two lattices are isomorphic, according to the induction principle, it is enough to demonstrate a bijective mapping φ such that
•
φ(G_i) = m_i, for all i = 1, …, n;
•
for any subgroups G′ and G″ on the subgroup lattice, if φ(G′) = m′ and φ(G″) = m″, then
(11) φ(G′ ∩ G″) = m′ ∨ m″  (12) φ(⟨G′ ∪ G″⟩) = m′ ∧ m″
Naturally, we take φ to be the mapping that assigns to each subgroup the information element identified by the coset-partition of that subgroup. Thus, the initial step of the induction holds by construction. On the other hand, it is easy to see that the mapping so defined is bijective, simply because different subgroups always produce different coset-partitions and vice versa. Therefore, we are left to show that Equations (11) and (12) hold.
We first show that φ satisfies Equation (11). In other words, we show that the coset-partition of the intersection subgroup G′ ∩ G″ is the coarsest among all the sample-space-partitions that are finer than both the coset-partitions of G′ and G″. To see this, let Π be a sample-space-partition that is finer than both coset-partitions, and let P be a part of Π. Since Π is finer than the coset-partition of G′, the part P must be contained in some coset G′x of G′. For the same reason, P must be contained in some coset G″x of G″ as well, with the same representative x chosen from P. Consequently, P ⊆ G′x ∩ G″x. Realizing that G′x ∩ G″x = (G′ ∩ G″)x is a coset of G′ ∩ G″, we conclude that the coset-partition of G′ ∩ G″ is coarser than Π. Since Π was chosen arbitrarily, this proves that the coset-partition of the intersection subgroup is the coarsest among all the sample-space-partitions that are finer than both coset-partitions. Therefore, Equation (11) holds for φ.
The proof for Equation (12) is more involved. We use an idea called "transitive closure". We need to show that the coset-partition of the subgroup ⟨G′ ∪ G″⟩ generated from the union of G′ and G″ is the finest among all the sample-space-partitions that are coarser than both the coset-partitions of G′ and G″. Let Π be a sample-space-partition that is coarser than both coset-partitions. Denote the coset-partition of the subgroup ⟨G′ ∪ G″⟩ by Π₀, and let P be a part of Π₀. It suffices to show that P is contained in some part of Π. Pick an element x from P. This element must belong to some part Q of Π. It remains to show P ⊆ Q; in other words, we need to show that y ∈ Q for any y ∈ P. Note that P is a part of the coset-partition of the subgroup ⟨G′ ∪ G″⟩; in other words, P is a coset of ⟨G′ ∪ G″⟩. The following reasoning depends on the following fact from group theory [25].
Proposition 15
Two elements x and y belong to the same (right) coset of a subgroup if and only if xy⁻¹ belongs to the subgroup.
Since x and y belong to the same coset of the subgroup ⟨G′ ∪ G″⟩, we have xy⁻¹ ∈ ⟨G′ ∪ G″⟩. Note that any element of ⟨G′ ∪ G″⟩ can be written in the form g_1 g_2 ⋯ g_k, where g_i ∈ G′ ∪ G″ for all i, since G′ and G″ are themselves closed under inversion. Suppose xy⁻¹ = g_1 g_2 ⋯ g_k. We have y = g_k⁻¹ ⋯ g_2⁻¹ g_1⁻¹ x, with each g_i⁻¹ ∈ G′ ∪ G″.
In the following, we show, by induction along the sequence g_1⁻¹x, g_2⁻¹g_1⁻¹x, …, g_k⁻¹ ⋯ g_1⁻¹x = y, that every element of this sequence, and in particular y, belongs to Q.
First, we claim that g_1⁻¹x ∈ Q. To see this, note that x(g_1⁻¹x)⁻¹ = g_1. Since g_1 ∈ G′ ∪ G″, by Proposition 15, x and g_1⁻¹x belong to the same coset of G′ or of G″. By assumption, the partition Π is coarser than both coset-partitions, so this coset must be contained in a single part of Π; since the coset already contains the element x of Q, that part is Q. Hence g_1⁻¹x ∈ Q.
For the same reason, with g_1⁻¹x ∈ Q shown, we see that g_2⁻¹g_1⁻¹x belongs to Q as well, because (g_1⁻¹x)(g_2⁻¹g_1⁻¹x)⁻¹ = g_2 implies that g_1⁻¹x and g_2⁻¹g_1⁻¹x belong to the same coset of G′ or of G″.
Continuing the above argument inductively along the sequence, we finally obtain y = g_k⁻¹ ⋯ g_1⁻¹x ∈ Q. Therefore, P ⊆ Q. This concludes the proof.
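The transitive-closure argument can be checked computationally on a small example. Below, Z_12 with the subgroups generated by 4 and by 6 are our own illustrative choices; the coset-partition of the generated subgroup coincides with the transitive-closure coarsening of the two coset-partitions.

```python
G = list(range(12))
H1 = [0, 4, 8]                          # subgroup of Z_12 generated by 4
H2 = [0, 6]                             # subgroup of Z_12 generated by 6

def cosets(H):
    return {frozenset((h + g) % 12 for h in H) for g in G}

def generated(H1, H2):
    """Subgroup of Z_12 generated by H1 u H2 (closure under addition)."""
    S = set(H1) | set(H2)
    while True:
        bigger = {(a + b) % 12 for a in S for b in S}
        if bigger <= S:
            return S
        S |= bigger

def transitive_closure(partitions):
    """Finest partition of G coarser than every partition in the list."""
    parent = {g: g for g in G}
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    for part in partitions:
        for block in part:
            anchor = next(iter(block))
            for x in block:
                parent[find(x)] = find(anchor)
    comp = {}
    for g in G:
        comp.setdefault(find(g), set()).add(g)
    return {frozenset(b) for b in comp.values()}

assert transitive_closure([cosets(H1), cosets(H2)]) == cosets(generated(H1, H2))
print(sorted(sorted(b) for b in cosets(generated(H1, H2))))
# -> [[0, 2, 4, 6, 8, 10], [1, 3, 5, 7, 9, 11]]
```

Here the generated subgroup is the even residues, so the merged partition is exactly the even/odd coset-partition.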
Appendix B Proof of Theorem 3
{proof}The approximation process decomposes into three steps. The first step is to "dilate" the sample space so that a nonuniform probability space becomes a uniform probability space; the sample-space-partitions of the information elements are "dilated" accordingly. After dilating the sample space, depending on the approximation error tolerance ε, we may need to further "amplify" the sample space. Then, we follow the same procedure as in Section III-D and construct a subgroup lattice using the orbit-partition–permutation-group-action correspondence.
We assume that the probability measure on the sample space is rational. In other words, the probabilities of the elementary events are all rational numbers, namely p_i = k_i / q_i for positive integers k_i and q_i. This assumption is reasonable, because any finite-dimensional real vector can be approximated, up to arbitrary precision, by a rational vector.
Let q be the least common multiple of the set of denominators q_i. We "split" the i-th sample point of the sample space into q p_i points; note that q p_i is integral. We accordingly "dilate" the sample-space-partitions of the information elements. Specifically, for each part P of the partition of every information element, its "dilated" part, in the dilated sample space, contains exactly all the sample points that are "split" from the sample points in P. The dilated sample space has size q. To maintain the probability structure, we assign to each sample point in the dilated sample space probability 1/q. In other words, we equip the dilated sample space with a uniform probability measure. It is easy to check that the entire (quantitative) probability structure remains the same. Thus, we can consider all the information elements as if they were defined on the dilated probability space.
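The dilation step can be sketched numerically: with rational probabilities p_i = k_i/q, the i-th sample point splits into k_i of the q uniform points, and entropies are unchanged. The distribution (1/2, 1/3, 1/6) is an illustrative choice.

```python
from math import log, lcm
from fractions import Fraction

probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
q = lcm(*(p.denominator for p in probs))        # least common multiple: 6
sizes = [int(p * q) for p in probs]             # dilated block sizes: 3, 2, 1

H_original = -sum(float(p) * log(p) for p in probs)
# Dilated space: q points of probability 1/q each; block i has size k_i = q * p_i.
H_dilated = -sum(k / q * log(k / q) for k in sizes)
assert abs(H_original - H_dilated) < 1e-12
print(H_original, H_dilated)
```

Since k_i / q recovers p_i exactly, the entropy of every (dilated) partition is preserved, which is all the construction needs.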
If necessary, depending on the approximation error tolerance ε, we may further "amplify" the dilated sample space by a factor of k by "splitting" each of its sample points into k points. At the same time, we scale the probability of each sample point in the post-amplification sample space down by a factor of k, to 1/(kq). Abusing notation, we still write Ω̄ for the post-amplification sample space. Similar to the "dilating" process, all the partitions are amplified accordingly.
Before we move to the third step, we compute the entropies of information elements in terms of the cardinalities of the parts of their dilated sample-space-partitions. Consider an information element m. Denote its pre-dilation sample-space-partition by {P_1, …, P_r} and its post-amplification sample-space-partition by {P̄_1, …, P̄_r}, and let N = |Ω̄| = kq be the size of the post-amplification sample space. It is easy to see that the entropy H(m) can be calculated as follows:
(13) H(m) = −Σ_{i=1}^{r} (|P̄_i|/N) log(|P̄_i|/N) = log N − (1/N) Σ_{i=1}^{r} |P̄_i| log |P̄_i|.
The entropies of all the other information elements, including the joint and common information elements, on the entire information lattice can be computed in exactly the same way in terms of the cardinalities of the parts of their dilated sample-space-partitions.
In the third step, we follow the same procedure as in Section III-D and construct, based on the orbit-partition–permutation-group-action correspondence, a subgroup lattice that is isomorphic to the information lattice generated by the given set of information elements. More specifically, the subgroup lattice is constructed according to their "post-amplification" sample-space-partitions.
Suppose that, on the constructed subgroup lattice, the permutation group G_m corresponds to the information element m. As above, the post-amplification sample-space-partition of m is {P̄_1, …, P̄_r}. Then the cardinality of the permutation group G_m is simply
|G_m| = Π_{i=1}^{r} |P̄_i|!.
According to the isomorphism relation established in Theorem 2, the above calculation remains valid for all the subgroups on the subgroup lattice.
Recall that all the groups on the subgroup lattice are permutation groups, and all are subgroups of the symmetric group S_N of degree N. So the log-index of G_m, corresponding to m, is
(14) log [S_N : G_m] = log N! − Σ_{i=1}^{r} log |P̄_i|!.
As we see from Equations (1) and (2) of Proposition 7, the entropies of the coset-partition information elements on information lattices equal exactly the log-indices of their subgroups on subgroup lattices. However, for an information lattice generated from general information elements, namely information elements with non-equal sample-space-partitions, as we see from Equations (13) and (14), the entropies of the information elements on the information lattice no longer equal the log-indices of their corresponding permutation groups on the subgroup lattice exactly. But, as we shall see, the entropies of the information elements are well approximated by the log-indices of their corresponding permutation groups. Recall Stirling's approximation formula for factorials:
(15) log n! = n log n − n + O(log n).
“Normalizing” the logindex in Equation (14) by a factor and then substituting the factorials with the above Stirling approximation formula, we get
Note that in the above substitution process, we combined some finite terms “into” one term.
It is clear that Σ_{i=1}^{r} |P̄_i| = N, since {P̄_1, …, P̄_r} forms a partition of Ω̄. Therefore, we get