
# Partition Reduction for Lossy Data Compression Problem

Marek Śmieja
Institute of Computer Science, Department of Mathematics and Computer Science
Jagiellonian University
Łojasiewicza 6, 30-348 Kraków, Poland
Email: marek.smieja@ii.uj.edu.pl

Jacek Tabor
Institute of Computer Science, Department of Mathematics and Computer Science
Jagiellonian University
Łojasiewicza 6, 30-348 Kraków, Poland
Email: jacek.tabor@ii.uj.edu.pl
###### Abstract

We consider the computational aspects of the lossy data compression problem, where the compression error is determined by a cover of the data space. We propose an algorithm which reduces the number of partitions that need to be considered to find the entropy with respect to the compression error. In particular, we show that, in the case of a finite cover, the entropy is attained on some partition. We give an algorithmic construction of such a partition.

## I Introduction

The basic description of lossy data compression consists of the quantization of the data space into a partition and (binary) coding for this partition. Based on the approach of A. Rényi [1, 2] and E. C. Posner et al. [3, 4, 5], we have recently presented in [6] a notion of entropy which combines these two steps. The main advantage of our description over the classical ones is that we consider general probability spaces without a metric. This gives us more freedom in defining the error of coding.

In this paper we concentrate on the calculation of the entropy defined in [6]. We propose an algorithm which drastically reduces the computational effort needed to perform the lossy data coding procedure.

To explain our results precisely, let us first introduce basic definitions and give their interpretations. In this paper, if not stated otherwise, we always assume that (X, Σ, μ) is a subprobability space (that is, (X, Σ) is a measurable space and μ is a measure with μ(X) ≤ 1). As mentioned, the procedure of lossy data coding consists of the quantization of the data space into a partition and binary coding for this partition. We say that a family 𝒫 is a partition if it is a countable family of measurable, pairwise disjoint subsets of X such that

 μ(X ∖ ⋃_{P∈𝒫} P) = 0. (1)

During encoding we map every point x ∈ X to the unique P ∈ 𝒫 such that x ∈ P. Binary coding for the partition 𝒫 can then simply be obtained by Huffman coding of the elements of 𝒫.

The statistical amount of information given by optimal lossy coding of X by elements of the partition 𝒫 is determined by the entropy of 𝒫, which is:

 h(μ; 𝒫) := ∑_{P∈𝒫} sh(μ(P)), (2)

where sh(x) := −x · log x, for x > 0, and sh(0) := 0 is the Shannon function.
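On a finite space, (2) can be evaluated directly. The following sketch uses our own names `sh` and `partition_entropy` and assumes the base-2 logarithm (natural for binary coding); it represents μ as a dict of point masses and a partition as a list of cells:

```python
import math

def sh(x):
    # Shannon function: sh(x) = -x * log2(x) for x > 0, sh(0) = 0
    return -x * math.log2(x) if x > 0 else 0.0

def partition_entropy(mu, partition):
    # h(mu; P) = sum over cells P of sh(mu(P)), as in (2)
    return sum(sh(sum(mu[x] for x in cell)) for cell in partition)

# a four-point space with the uniform measure
mu = {'a': 0.25, 'b': 0.25, 'c': 0.25, 'd': 0.25}
print(partition_entropy(mu, [{'a'}, {'b'}, {'c'}, {'d'}]))  # 2.0
print(partition_entropy(mu, [{'a', 'b'}, {'c', 'd'}]))      # 1.0
```

Coarsening the partition lowers the entropy, which is exactly the trade-off the error-control family below is designed to limit.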

The coding defined by a given partition causes a specific level of error. To control the maximal error, we fix an error-control family 𝒬, which is just a measurable cover of X. Then we consider only those partitions which are finer than 𝒬, i.e. we require that for every P ∈ 𝒫 there exists Q ∈ 𝒬 such that P ⊂ Q. If this is the case then we say that 𝒫 is 𝒬-acceptable and we write 𝒫 ≺ 𝒬.

To understand better the definition of the error-control family let us consider the following example.

###### Example I.1.

Let 𝒬 be the family of all intervals in ℝ with length δ, for a fixed δ > 0. Every 𝒬-acceptable partition consists of sets with diameter at most δ. Then, after encoding determined by such a partition, every symbol can be decoded at least with the precision δ. The above error-control family was considered by A. Rényi [1, 2] in his definition of the entropy dimension. As a natural extension he also studied the error-control families built of all balls with a given radius in general metric spaces. A similar approach was also used by E. C. Posner [3, 4, 5] in his definition of the ε-entropy (E. C. Posner in fact considered a slightly different variant of the ε-entropy than our approach).
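A quick numerical illustration of this example (our own sketch; `uniform_interval_entropy` is a hypothetical helper): under the Lebesgue measure on [0, 1), the partition into consecutive intervals of length at most δ is acceptable for Rényi's family, and its entropy behaves like log₂(1/δ):

```python
import math

def sh(x):
    # Shannon function (base-2 logarithm)
    return -x * math.log2(x) if x > 0 else 0.0

def uniform_interval_entropy(delta):
    # Partition [0, 1) into consecutive cells of length <= delta; each cell
    # fits inside an interval of length delta, so the partition is acceptable.
    n = math.ceil(1.0 / delta)
    lengths = [delta] * (n - 1) + [1.0 - delta * (n - 1)]
    return sum(sh(l) for l in lengths)

print(uniform_interval_entropy(0.25))   # 2.0  (= log2(4))
print(uniform_interval_entropy(0.125))  # 3.0  (= log2(8))
```

The growth of this quantity as δ → 0 is what Rényi's entropy dimension measures.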

In the case of general measures, it seems more natural to vary the lengths of the intervals in the error-control family. Less probable events should be coded with lower precision (longer intervals) while more probable ones with higher accuracy (shorter intervals). Our approach deals easily with such situations.

To describe the best lossy coding determined by 𝒬-acceptable partitions, we define the entropy with respect to 𝒬 as

 H(μ; 𝒬) := inf{ h(μ; 𝒫) ∈ [0, ∞] : 𝒫 is a partition and 𝒫 ≺ 𝒬 }. (3)

Let us observe what is the main difficulty in the application of this approach to the lossy data coding:

###### Example I.2.

Even for a very simple data space X and error-control family 𝒬, there can exist uncountably many 𝒬-acceptable partitions which have to be considered to find H(μ; 𝒬).

In this paper, we show how to reduce the aforementioned problem to an at most countable one. In the next section, we propose an algorithm which, for a given partition 𝒫, constructs a 𝒬-acceptable partition ℛ ⊂ Σ_𝒬 with entropy not greater than that of 𝒫, where Σ_𝒬 denotes the sigma algebra generated by 𝒬 (see Algorithm II.1).

As a consequence we obtain that the entropy can be realized only by partitions contained in Σ_𝒬 (see Corollary III.1). In the case of finite error-control families, we get an algorithmic construction of an optimal 𝒬-acceptable partition. More precisely, if 𝒬 is a finite error-control family then there exist sets Q_1, …, Q_k ∈ 𝒬 such that (see Corollary III.3):

 H(μ; 𝒬) = h(μ; { Q_i ∖ ⋃_{j=1}^{i−1} Q_j }_{i=1}^{k}). (4)

## II Algorithm for Partition Reduction

In this section we present an algorithm which, for a given 𝒬-acceptable partition 𝒫, constructs a 𝒬-acceptable partition ℛ ⊂ Σ_𝒬 with not greater entropy. We give a detailed explanation that h(μ; ℛ) ≤ h(μ; 𝒫).

We first establish the notation: for a given family 𝒬 of subsets of X and a set A ⊂ X, we denote:

 𝒬_A = { Q ∩ A : Q ∈ 𝒬 }. (5)

Let 𝒬 be an error-control family on X and let 𝒫 be a 𝒬-acceptable partition of X. We build a family ℛ = {R_k} according to the following algorithm:

###### Algorithm II.1.

initialization: k := 1, X_1 := X

while μ(X_k) ≠ 0 do

Let P_k ∈ 𝒫_{X_k} be such that μ(P_k) = max{ μ(P) : P ∈ 𝒫_{X_k} }

Let R_k ∈ 𝒬_{X_k} be an arbitrary set
which satisfies P_k ⊂ R_k

X_{k+1} := X_k ∖ R_k; k := k + 1

end while
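For a finite space the loop above can be sketched in Python (a sketch under our own representation: μ as a dict of point masses, partition cells and cover members as sets of points; `reduce_partition` is our name):

```python
import math

def sh(x):
    # Shannon function: sh(x) = -x * log2(x) for x > 0, sh(0) = 0
    return -x * math.log2(x) if x > 0 else 0.0

def entropy(mu, family):
    return sum(sh(sum(mu[x] for x in cell)) for cell in family)

def reduce_partition(mu, partition, cover):
    # Algorithm II.1 on a finite space: repeatedly pick the heaviest cell of
    # the partition restricted to the remaining space X_k, cover it by a
    # member of the error-control family, and remove that member.
    remaining = set(mu)                              # X_1 := X
    result = []                                      # R_1, R_2, ...
    while any(mu[x] > 0 for x in remaining):         # mu(X_k) != 0
        cells = [cell & remaining for cell in partition]
        heaviest = max(cells, key=lambda c: sum(mu[x] for x in c))
        # any member of the restricted cover containing the heaviest cell
        r = next(set(q) & remaining for q in cover
                 if heaviest <= set(q) & remaining)
        result.append(r)
        remaining -= r                               # X_{k+1} := X_k \ R_k
    return result
```

On the space {a, b, c} with masses (1/2, 1/4, 1/4), the singleton partition and the cover {{a, b}, {b, c}}, this returns the partition {{a, b}, {c}}, whose entropy sh(3/4) + sh(1/4) ≈ 0.81 is below the original 1.5, as Theorem II.2 below guarantees.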

We are going to show that Algorithm II.1 produces a partition ℛ with not greater entropy than 𝒫. Before that, for the convenience of the reader, we first recall two important facts, which we will refer to in further considerations.

###### Observation II.1.

Given numbers p, q, r ≥ 0 such that r ≤ q ≤ p, we have:

 sh(p)+sh(q)≥sh(p+r)+sh(q−r). (6)
###### Proof:

For the proof we refer the reader to [7, Section 6], where a similar problem is treated. ∎

Consequence of Lebesgue Theorem (see [8]). Let f : ℕ → [0, ∞) be summable, i.e. ∑_{k∈ℕ} f(k) < ∞, and let (f_n)_{n∈ℕ} be a sequence of functions f_n : ℕ → ℝ such that |f_n(k)| ≤ f(k), for k, n ∈ ℕ. If (f_n(k))_{n∈ℕ} is convergent, for every k ∈ ℕ, then the limit function k → lim_{n→∞} f_n(k) is summable and

 ∑_{k∈ℕ} lim_{n→∞} f_n(k) = lim_{n→∞} ∑_{k∈ℕ} f_n(k). (7)

Let us move to the analysis of Algorithm II.1. We first check what happens in a single iteration of the algorithm.

###### Lemma II.1.

We consider an error-control family 𝒬 and a 𝒬-acceptable partition 𝒫 of X. Let P_max ∈ 𝒫 be such that:

 μ(P_max) = max{ μ(P) : P ∈ 𝒫 }. (8)

If Q ∈ 𝒬 is chosen so that P_max ⊂ Q then

 h(μ; 𝒫) ≥ h(μ; {Q} ∪ 𝒫_{X∖Q}). (9)
###### Proof:

Clearly, if h(μ; 𝒫) = ∞ then the inequality (9) holds trivially. Thus we assume that h(μ; 𝒫) < ∞.

Let us observe that it is enough to consider only elements of 𝒫 with non-zero measure – the number of such sets is at most countable. Thus, let us assume that 𝒫 = {P_i}_{i∈ℕ} (the case when 𝒫 is finite can be treated in a similar manner).

For simplicity we put P_1 := P_max. For every k ∈ ℕ, we consider the sequence of sets defined by

 Q_k := ⋃_{i=1}^{k} (P_i ∩ Q). (10)

Clearly, for k ∈ ℕ, we have

 Q_1 = P_1, (11)
 Q_k ⊂ Q_{k+1}, (12)
 P_i ∩ Q_k = P_i ∩ Q, for i ≤ k, (13)
 P_i ∩ Q_k = ∅, for i > k, (14)
 lim_{n→∞} μ(Q_n) = μ(Q). (15)

To complete the proof it is sufficient to derive that, for every k ∈ ℕ, we have:

 h(μ; {Q_k} ∪ 𝒫_{X∖Q_k}) ≥ h(μ; {Q_{k+1}} ∪ 𝒫_{X∖Q_{k+1}}) (16)

and

 h(μ; {Q_k} ∪ 𝒫_{X∖Q_k}) ≥ h(μ; {Q} ∪ 𝒫_{X∖Q}). (17)

Let k ∈ ℕ be arbitrary. Then from (13) and (14), we get

 h(μ; {Q_k} ∪ 𝒫_{X∖Q_k}) = sh(μ(Q_k)) + ∑_{i=2}^{∞} sh(μ(P_i ∖ Q_k)) (18)
 = sh(μ(Q_k)) + ∑_{i=2}^{k} sh(μ(P_i ∖ Q)) + ∑_{i=k+1}^{∞} sh(μ(P_i)) (19)
 = h(μ; {Q_{k+1}} ∪ 𝒫_{X∖Q_{k+1}}) + sh(μ(Q_k)) − sh(μ(Q_{k+1})) (20)
 + sh(μ(P_{k+1})) − sh(μ(P_{k+1} ∖ Q)). (21)

Making use of Observation II.1, we obtain

 sh(μ(Q_k)) + sh(μ(P_{k+1})) (22)
 ≥ sh(μ(Q_{k+1})) + sh(μ(P_{k+1} ∖ Q)), (23)

which proves (16).

To derive (17), we first use inequality (16). Then

 h(μ; {Q_k} ∪ 𝒫_{X∖Q_k}) = sh(μ(Q_k)) + ∑_{i=1}^{∞} sh(μ(P_i ∖ Q_k)) (24)
 ≥ lim_{n→∞} [ sh(μ(Q_n)) + ∑_{i=1}^{∞} sh(μ(P_i ∖ Q_n)) ]. (25)

By (15),

 lim_{n→∞} sh(μ(Q_n)) = sh(μ(Q)) < ∞. (26)

To calculate lim_{n→∞} ∑_{i=1}^{∞} sh(μ(P_i ∖ Q_n)), we will use the Consequence of Lebesgue Theorem. We consider a sequence of functions

 f_n : 𝒫 ∋ P → sh(μ(P ∖ Q_n)) ∈ ℝ, for n ∈ ℕ. (27)

Let us observe that the Shannon function is increasing on [0, 1/e] and decreasing on [1/e, 1]. Thus, as μ(P_i) → 0 when i → ∞, there is a certain m ∈ ℕ such that

 sh(μ(P_i ∖ Q_n)) ≤ 1, for i ≤ m (28)

and

 sh(μ(P_i ∖ Q_n)) ≤ sh(μ(P_i)), for i > m, (29)

for every n ∈ ℕ. Since h(μ; 𝒫) < ∞, then

 ∑_{i=1}^{∞} sh(μ(P_i ∖ Q_n)) ≤ m + ∑_{i=m+1}^{∞} sh(μ(P_i)) < ∞. (30)

Moreover,

 lim_{n→∞} sh(μ(P ∖ Q_n)) = sh(μ(P ∖ Q)), (31)

for every P ∈ 𝒫.

As the sequence of functions (f_n)_{n∈ℕ} satisfies the assumptions of the Consequence of Lebesgue Theorem, we get

 lim_{n→∞} ∑_{i=1}^{∞} sh(μ(P_i ∖ Q_n)) = ∑_{i=1}^{∞} lim_{n→∞} sh(μ(P_i ∖ Q_n)) (32)
 = ∑_{i=1}^{∞} sh(μ(P_i ∖ Q)) < ∞. (33)

Consequently, we have

 h(μ; {Q_k} ∪ 𝒫_{X∖Q_k}) ≥ lim_{n→∞} [ sh(μ(Q_n)) + ∑_{i=1}^{∞} sh(μ(P_i ∖ Q_n)) ] (34)
 = sh(μ(Q)) + ∑_{i=1}^{∞} sh(μ(P_i ∖ Q)) = h(μ; {Q} ∪ 𝒫_{X∖Q}), (35)

which completes the proof. ∎

We are ready to summarize the analysis of Algorithm II.1. We present it in the following two theorems.

###### Theorem II.1.

Let 𝒬 be an error-control family on X and let 𝒫 be a 𝒬-acceptable partition of X. The family ℛ constructed by Algorithm II.1 is a partition of X.

###### Proof:

Directly from Algorithm II.1, we get that ℛ is a countable family of pairwise disjoint sets.

Let us assume that ℛ = {R_i}_{i∈ℕ}, since the case when ℛ is a finite family is straightforward. To prove that

 μ(X ∖ ⋃_{i=1}^{∞} R_i) = 0, (36)

we will use the Consequence of Lebesgue Theorem.

For every n ∈ ℕ, we define a function f_n : 𝒫 → ℝ by

 f_n(P) := μ(P ∖ ⋃_{i=1}^{n} R_i), for P ∈ 𝒫. (37)

Clearly,

 f_n(P) ≤ μ(P), for n ∈ ℕ (38)

and

 ∑_{P∈𝒫} μ(P) ≤ 1. (39)

To see that the sequence (f_n)_{n∈ℕ} is pointwise convergent to 0, we apply indirect reasoning. Suppose that there are ε > 0 and P ∈ 𝒫 such that, for every n ∈ ℕ,

 f_n(P) = μ(P ∖ ⋃_{i=1}^{n} R_i) > ε. (40)

We put n := ⌈1/ε⌉ and consider the sets R_1, …, R_n chosen by the algorithm. Since each R_i contains a set of maximal measure in 𝒫_{X_i} and μ(P ∖ ⋃_{j<i} R_j) > ε, then μ(R_i) ≥ ε, for every i ≤ n. Hence, we have

 μ(⋃_{i=1}^{n} R_i) = ∑_{i=1}^{n} μ(R_i) ≥ nε ≥ 1, (41)

as ℛ is a family of pairwise disjoint sets. Consequently,

 μ(P ∖ ⋃_{i=1}^{n} R_i) ≤ 0, (42)

which is a contradiction. Hence the sequence (f_n)_{n∈ℕ} converges pointwise to 0.

Finally, making use of the Lebesgue Theorem, we obtain

 μ(X ∖ ⋃_{i=1}^{∞} R_i) = lim_{n→∞} μ(X ∖ ⋃_{i=1}^{n} R_i) (43)
 = lim_{n→∞} ∑_{P∈𝒫} μ(P ∖ ⋃_{i=1}^{n} R_i) = ∑_{P∈𝒫} lim_{n→∞} μ(P ∖ ⋃_{i=1}^{n} R_i) (44)
 = ∑_{P∈𝒫} μ(P ∖ ⋃_{i=1}^{∞} R_i) = 0. (45)

∎

###### Theorem II.2.

Let 𝒬 be an error-control family on X and let 𝒫 be a 𝒬-acceptable partition of X. The partition ℛ built by Algorithm II.1 satisfies:

 h(μ; ℛ) ≤ h(μ; 𝒫). (46)
###### Proof:

If h(μ; 𝒫) = ∞ then the inequality (46) is straightforward. Thus let us discuss the case when h(μ; 𝒫) < ∞.

We denote 𝒫 = {P_i}_{i∈ℕ}, since at most countably many elements of the partition can have positive measure (the case when 𝒫 is finite follows similarly). We will use the notation introduced in Algorithm II.1.

Directly from Lemma II.1, we obtain

 h(μ; 𝒫_{X_k}) ≥ h(μ; 𝒫_{X_{k+1}} ∪ {R_k}), for k ∈ ℕ. (47)

Consequently, for every k ∈ ℕ, we get

 h(μ; ⋃_{i=1}^{k} {R_i} ∪ 𝒫_{X_k}) ≥ h(μ; ⋃_{i=1}^{k+1} {R_i} ∪ 𝒫_{X_{k+1}}). (48)

Our goal is to show that

 h(μ; ⋃_{i=1}^{k} {R_i} ∪ 𝒫_{X_k}) ≥ h(μ; ℛ), (49)

for every k ∈ ℕ.

Making use of (48), we have

 h(μ; ⋃_{i=1}^{k} {R_i} ∪ 𝒫_{X_k}) (50)
 = ∑_{i=1}^{k} sh(μ(R_i)) + ∑_{i=1}^{∞} sh(μ(P_i ∖ ⋃_{j=1}^{k} R_j)) (51)
 ≥ lim_{n→∞} [ ∑_{i=1}^{n} sh(μ(R_i)) + ∑_{i=1}^{∞} sh(μ(P_i ∖ ⋃_{j=1}^{n} R_j)) ], (52)

for every k ∈ ℕ.

We will calculate lim_{n→∞} ∑_{i=1}^{∞} sh(μ(P_i ∖ ⋃_{j=1}^{n} R_j)) using the Consequence of Lebesgue Theorem for a sequence of functions (f_n)_{n∈ℕ}, defined by

 f_n : 𝒫 ∋ P → sh(μ(P ∖ ⋃_{j=1}^{n} R_j)) ∈ ℝ, for n ∈ ℕ. (53)

Similarly to the proof of Lemma II.1, we may assume that there exists m ∈ ℕ such that

 sh(μ(P_i ∖ ⋃_{j=1}^{n} R_j)) < 1, for i ≤ m (54)

and

 sh(μ(P_i ∖ ⋃_{j=1}^{n} R_j)) ≤ sh(μ(P_i)), for i > m, (55)

for every n ∈ ℕ. Moreover,

 lim_{n→∞} sh(μ(P ∖ ⋃_{j=1}^{n} R_j)) = sh(μ(P ∖ ⋃_{j=1}^{∞} R_j)) = 0, (56)

for every P ∈ 𝒫, since ℛ is a partition of X.

Making use of the Consequence of Lebesgue Theorem, we get

 lim_{n→∞} ∑_{i=1}^{∞} sh(μ(P_i ∖ ⋃_{j=1}^{n} R_j)) = ∑_{i=1}^{∞} sh(μ(P_i ∖ ⋃_{j=1}^{∞} R_j)) = 0. (57)

Consequently, for every k ∈ ℕ, we have

 h(μ; ⋃_{i=1}^{k} {R_i} ∪ 𝒫_{X_k}) (58)
 ≥ lim_{n→∞} [ ∑_{i=1}^{n} sh(μ(R_i)) + ∑_{i=1}^{∞} sh(μ(P_i ∖ ⋃_{j=1}^{n} R_j)) ] (59)
 = ∑_{i=1}^{∞} sh(μ(R_i)) = h(μ; ℛ), (60)

which completes the proof. ∎

## III Concluding Remarks

We have seen that in computing the entropy with respect to the error-control family 𝒬 it is sufficient to consider only partitions constructed from the sigma algebra Σ_𝒬 generated by 𝒬. Thus, we may rewrite the definition of the entropy with respect to 𝒬:

###### Corollary III.1.

We have:

 H(μ; 𝒬) = inf{ h(μ; 𝒫) ∈ [0, ∞] : 𝒫 is a partition, 𝒫 ≺ 𝒬 and 𝒫 ⊂ Σ_𝒬 }. (61)

Let us observe that Algorithm II.1 shows how to find a 𝒬-acceptable partition with entropy arbitrarily close to H(μ; 𝒬):

###### Corollary III.2.

Let 𝒬 be an error-control family of X. For any number ε > 0, there exists a partition 𝒫 ≺ 𝒬 such that

 h(μ; 𝒫) ≤ H(μ; 𝒬) + ε. (62)
###### Proof:

For simplicity let us assume that 𝒬 = {Q_i}_{i∈ℕ} (the case when 𝒬 is finite or uncountable follows in a similar way). Then the partition 𝒫 which satisfies the assertion is of the form:

 𝒫 := ⋃_{i=1}^{∞} { Q_{σ(i)} ∖ ⋃_{k<i} Q_{σ(k)} }, (63)

for a specific permutation σ of the natural numbers. ∎

When 𝒬 is a finite family then the entropy with respect to 𝒬 is always attained on some partition. More precisely, we have:

###### Corollary III.3.

Let 𝒬 = {Q_1, …, Q_n} be an n-element error-control family, where n ∈ ℕ. Then there exist sets Q_1, …, Q_k ∈ 𝒬 (possibly after renumbering), for a specific k ≤ n, such that

 H(μ; 𝒬) = h(μ; { Q_i ∖ ⋃_{j=1}^{i−1} Q_j }_{i=1}^{k}). (64)
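Corollary III.3 makes the finite case fully algorithmic: every candidate optimum is an ordered-difference partition built from some subfamily of 𝒬, so on a finite space a brute-force search over orderings computes H(μ; 𝒬) exactly. A sketch with our own names (feasible only for small covers, since the search is factorial in the size of 𝒬):

```python
import math
from itertools import permutations

def sh(x):
    # Shannon function (base-2 logarithm)
    return -x * math.log2(x) if x > 0 else 0.0

def exact_entropy(mu, cover):
    # H(mu; Q) over a finite cover: try every ordering Q_1, ..., Q_k of every
    # subfamily and evaluate the partition {Q_i \ (Q_1 ∪ ... ∪ Q_{i-1})}.
    support = {x for x in mu if mu[x] > 0}
    best = math.inf
    sets = [frozenset(q) for q in cover]
    for k in range(1, len(sets) + 1):
        for order in permutations(sets, k):
            seen, h = set(), 0.0
            for q in order:
                h += sh(sum(mu[x] for x in q - seen))
                seen |= q
            if support <= seen:          # the cells must exhaust X (a.e.)
                best = min(best, h)
    return best
```

For μ uniform on {a, b} and 𝒬 = {{a}, {a, b}}, the single cell {a, b} already gives entropy 0; with 𝒬 = {{a}, {b}} the optimum is sh(1/2) + sh(1/2) = 1.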

To see that the entropy with respect to an arbitrary, possibly infinite, error-control family does not have to be attained on any partition, we use a trivial example from [6, Example II.1]:

###### Example III.1.

Let us consider the open segment (0, 1) with the sigma algebra generated by all Borel subsets of (0, 1), the Lebesgue measure λ, and the error-control family defined by

 𝒬 = { [a, b] : 0 < a ≤ b < 1 }. (65)

One can verify that H(λ; 𝒬) = 0, but clearly h(λ; 𝒫) > 0, for every 𝒬-acceptable partition 𝒫.

As an open problem we leave the following question:

###### Problem III.1.

Let 𝒬 be an error-control family. We assume that, for every sequence (Q_n)_{n∈ℕ} ⊂ 𝒬, if there exists Q ∈ 𝒬 such that Q_n ⊂ Q, for every n ∈ ℕ, then also ⋃_{n∈ℕ} Q_n ∈ 𝒬. We ask if the entropy with respect to such a family 𝒬 is realized by some 𝒬-acceptable partition 𝒫.

## References

1. A. Rényi, “On measures of entropy and information,” Proc. Fourth Berkeley Symp. on Math. Statist. and Prob., vol. 1, pp. 547–561, 1961.
2. ——, “On the dimension and entropy of probability distributions,” Acta Mathematica Hungarica, vol. 10, no. 1–2, pp. 193–215, 1959.
3. E. C. Posner, E. R. Rodemich, and H. Rumsey, “Epsilon entropy of stochastic processes,” The Annals of Mathematical Statistics, vol. 38, pp. 1000–1020, 1967.
4. E. C. Posner and E. R. Rodemich, “Epsilon entropy and data compression,” The Annals of Mathematical Statistics, vol. 42, pp. 2079–2125, 1971.
5. ——, “Epsilon entropy of stochastic processes with continuous paths,” The Annals of Probability, vol. 1, no. 4, pp. 674–689, 1973.
6. M. Śmieja and J. Tabor, “Entropy of the mixture of sources and entropy dimension,” to appear in IEEE Transactions on Information Theory, vol. 58, no. 5, 2012.
7. C. E. Shannon, “A mathematical theory of communication,” The Bell System Technical Journal, vol. 27, pp. 379–423, 623–656, 1948.
8. J. F. C. Kingman and S. J. Taylor, Introduction to Measure and Probability. Cambridge University Press, 1966.