
# Subblock-Constrained Codes for Real-Time Simultaneous Energy and Information Transfer

Anshoo Tandon, Mehul Motani, and Lav R. Varshney
Preprint: Submitted for publication, May 2015.
A. Tandon and M. Motani are with the Department of Electrical and Computer Engineering, National University of Singapore, Singapore 117583 (email: anshoo@nus.edu.sg, motani@nus.edu.sg). L. R. Varshney is with the Department of Electrical and Computer Engineering and the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana, IL 61801 USA (email: varshney@illinois.edu). This work was supported in part by the National Research Foundation Singapore under Grant No. NRF-CRP-8-2011-01, and by Systems on Nanoscale Information fabriCs (SONIC), one of the six SRC STARnet Centers, sponsored by MARCO and DARPA. Some results in this paper were presented in part at the IEEE SECON 2014 Workshop on Energy Harvesting Communications, June 2014 [1], and at the 2015 International Symposium on Information Theory (ISIT), June 2015 [2].
###### Abstract

Consider an energy-harvesting receiver that uses the same received signal both for decoding information and for harvesting energy, which is employed to power its circuitry. In the scenario where the receiver has limited battery size, a signal with bursty energy content may cause power outage at the receiver since the battery will drain during intervals with low signal energy. In this paper, we consider a discrete memoryless channel and characterize achievable information rates when the energy content in each codeword is regularized by ensuring that sufficient energy is carried within every subblock duration. In particular, we study constant subblock-composition codes (CSCCs) where all subblocks in every codeword have the same fixed composition, and this subblock-composition is chosen to maximize the rate of information transfer while meeting the energy requirement. Compared to constant composition codes (CCCs), we show that CSCCs incur a rate loss and that the error exponent for CSCCs is also related to the error exponent for CCCs by the same rate loss term. We show that CSCC capacity can be improved by allowing different subblocks to have different composition while still meeting the subblock energy constraint. We provide numerical examples highlighting the tradeoff between delivery of sufficient energy to the receiver and achieving high information transfer rates. It is observed that the ability to use energy in real-time imposes less of a penalty than the ability to use information in real-time.

## I Introduction

Although wireless charging of portable electronic devices [3] and implantable biomedical devices [4] has attracted the attention of researchers over the last few years, pioneering work on wireless power transfer was conducted over a century ago by Hertz and Tesla [5]. Similarly, wireless information transfer has a rich history, including works by Popov [6], Bose [7], and Marconi [8]. In fact, Marconi’s wireless telegraph device, capable of transatlantic radio communication, helped save over 700 lives during the tragic accident of the Titanic in 1912 [9]. However, the first work in an information-theoretic setting on analyzing fundamental tradeoffs between simultaneous information and energy transfer is relatively recent [10]. The study of simultaneous information and energy transfer is relevant for communication from a powered transmitter to an energy-harvesting receiver which uses the same received signal both for decoding information and for harvesting energy. The energy harvested by the receiver is employed to power its circuitry.

The tradeoff between reliable communication and delivery of energy at the receiver was characterized in [10] using a general capacity-power function, where transmitted codewords were constrained to have average received energy exceed a threshold. This tradeoff between capacity and energy delivery was extended for frequency-selective channels in [11]. Since then, there have been numerous extensions of the capacity-power function in various settings [12, 13, 14, 15]. Biomedical applications of wireless energy and information transfer have been proposed through the use of implanted brain-machine interfaces that receive data and energy through inductive coupling [16, 4, 17].

However, in practical applications such as biomedical implants, imposing only an average power constraint is not sufficient; we also need to regularize the transferred energy content. This is because a codeword satisfying the average power constraint may still cause outage at the receiver if the energy content in the codeword is bursty, since a receiver energy buffer with relatively small storage capacity may drain during intervals with low signal energy. In order to regularize the energy content in the signal, we herein adopt a subblock-constrained approach where codewords are divided into smaller subblocks, and every subblock is constrained to carry sufficient energy exceeding a given threshold. The subblock length and the energy threshold may be chosen to meet the real-time energy requirement at the receiver.

An alternative to the subblock constraint is the sliding-window constraint, which we do not consider here. Under a sliding-window constraint, each codeword provides sufficient energy within every sliding time window of a certain duration. This approach was adopted in [18, 19], where the use of runlength codes for simultaneous energy and information transfer was proposed. In [20], a sliding-window constraint was imposed on binary codewords and bounds on the capacity were presented for different binary input channels. Note that the sliding-window constraint is tighter than the subblock constraint, since the subblock constraint corresponds to the case where the windows are non-overlapping.

In this paper, we consider a discrete memoryless channel (DMC) and characterize achievable information rates when each subblock is constrained to carry sufficient energy. We assume that corresponding to transmission of each symbol in the input alphabet, the receiver harvests a certain amount of energy as a function of the transmitted symbol. Since different symbols may correspond to different energy levels, the requirement of sufficient energy content within a subblock imposes a constraint on the composition of each subblock. Towards meeting this subblock energy requirement, we introduce the constant subblock-composition codes (CSCCs) where all the subblocks in every codeword have the same fixed composition. This subblock-composition, quantifying the fraction of different symbols within each subblock, is chosen to maximize the rate of information transfer while meeting the energy requirement. Note that if $x_1^L$ denotes a given subblock of length $L$, then the composition of $x_1^L$ is the distribution $P$ on $\mathcal{X}$ defined by $P(x) = N(x)/L$, where $N(x)$ is the number of occurrences of symbol $x$ in subblock $x_1^L$.
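The composition (type) map described above is simple to state concretely. The following sketch, with illustrative function names not taken from the paper, computes the empirical distribution $P(x) = N(x)/L$ of a subblock:

```python
from collections import Counter

def composition(subblock):
    """Empirical distribution (type) of a subblock: P(x) = N(x)/L."""
    L = len(subblock)
    counts = Counter(subblock)
    return {x: counts[x] / L for x in counts}

# Example: a binary subblock of length L = 8 with five 1s and three 0s
P = composition([1, 0, 1, 1, 0, 1, 0, 1])   # {1: 0.625, 0: 0.375}
```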

### I-A Our Contribution

For meeting the real-time energy requirement at a receiver which uses the received signal to simultaneously harvest energy and decode information, we propose the use of CSCCs (Sec. III-A) and establish their capacity as a function of the required energy per symbol (Sec. III-B). We show that CSCC capacity can be computed efficiently by exploiting certain symmetry properties (Sec. III-C) and present bounds on subblock length for avoiding receiver energy outage (Sec. III-D).

Compared to constant composition codes, we quantify the rate loss incurred due to the additional constraint of restricting all subblocks within codewords to have the same composition (Sec. IV-A). For a given rate of information transfer, we derive a lower bound for the error exponent using CSCC in terms of the error exponent for constant composition codes (Sec. IV-B).

We show that information rates greater than CSCC capacity can be achieved by allowing different subblocks to have different composition, while still meeting the energy requirement per subblock (Sec. V).

For enabling real-time information transfer, we consider local subblock decoding where each subblock is decoded independently (Sec. VI), and compare achievable rates using local subblock decoding with those when all the subblocks within a codeword are jointly decoded. We also provide numerical results highlighting the tradeoff between delivery of sufficient energy to the receiver and achieving high information rates (Sec. VII).

### I-B Related Work

Codes with different constraints on the codewords have been suggested in the past, depending on the constraints at the transmitter, the properties of the communication channel, or the properties of the storage medium. For digital information storage on magnetic media [21], codewords are usually designed to meet a runlength constraint [22] or are optimized for partial response equalization with maximum-likelihood sequence detection (PRML) [23]. The study of information capacity using runlength-limited (RLL) codes on binary symmetric channels (BSC) was carried out in [24, 25, 26].

A class of binary block codes called multiply constant-weight codes (MCWC), where each codeword is partitioned into equal parts with a prescribed weight in each part, was explored in [27] owing to their potential application in the implementation of low-cost authentication methods [28]. Note that MCWC, introduced in [27] as a generalization of constant weight codes [29], are themselves a special case of CSCCs with input alphabet size equal to two. When each codeword in an MCWC is arranged as an array and additional weight constraints are imposed on all the columns, the resulting two-dimensional weight constrained codes have potential application in optical storage systems [30] and in power line communications [31].

Power line communications (PLC) requires the power output to be as constant as possible so that information transfer does not interfere with the primary function of power delivery. One way to achieve this on the PLC channel (which suffers from narrow-band interference, white Gaussian noise, and impulse noise [32]) is to employ permutation codes [33], where each codeword is a permutation of different frequencies, with each frequency viewed as an input symbol. Higher rates of information transfer may be achieved using constant composition codes [34] at the cost of local variation in power, while ensuring that the power expended is the same upon completion of each codeword. When the codeword length is a multiple of the frequency alphabet size, the composition may be chosen such that each frequency occurs an equal number of times in each codeword [35].

The codewords employed by an energy harvesting transmitter are constrained by the instantaneous energy available for transmission. The capacity of these constrained codes over an additive white Gaussian noise (AWGN) channel has been analyzed when the energy storage capability at the transmitter is zero [36], infinite [37], or some finite quantity [38, 39]. The capacity of an AWGN channel with processing cost at an energy harvesting transmitter was characterized in [40]. The DMC capacity using an energy harvesting transmitter equipped with a finite energy buffer was analyzed in [41]. A comprehensive summary of the recent contributions in the broad area of energy harvesting wireless communications was provided in [42].

## II System Model

Consider communication from a transmitter to a receiver where the receiver uses the received signal both for decoding information as well as for harvesting energy (see Fig. 1). We model the effective communication channel from the output of a digital modulator at the transmitter to the input to an information decoder at the receiver as a DMC. Note that a DMC is characterized by its input alphabet $\mathcal{X}$, output alphabet $\mathcal{Y}$, and a stochastic matrix $W$, where the matrix entry $W(y \mid x)$ is the probability that the output is $y$ when the channel input is $x$.

A DMC is a reasonable communication channel model for simultaneous energy and information transfer. Consider, for instance, the use of a digital modulator at the transmitter which produces symbols from a signal constellation $\mathcal{X}$. At the receiver, the signal is split for use by the energy harvesting module and the information processing module, respectively. The input to the information decoder at the receiver comprises one of $|\mathcal{Y}|$ quantized values, fed by a quantizer in the information processing path. For each quantized value $y \in \mathcal{Y}$ and each transmitted symbol $x \in \mathcal{X}$, the likelihood $W(y \mid x)$ can be computed based on the effective signal path from the transmit modulator to the quantizer at the receiver. The communication channel is thus a DMC with input alphabet $\mathcal{X}$, output alphabet $\mathcal{Y}$, and channel transition probabilities $W(y \mid x)$.

In practice, the effective channels seen by the information decoder and the energy harvester may be different due to their respective pre-processing stages. A simple time-sharing approach to transmitting energy and information simultaneously was suggested in [43] via interleaving of energy signal and information-bearing signal. In [44], practical architectures for simultaneous information and energy reception were defined: an “integrated” receiver architecture has shared radio frequency chains between the energy harvester and the information decoder, whereas a “separated” architecture has different chains.

In our work, we assume a generic receiver architecture where the received signal is split between the energy harvesting path and the information processing path with a static power splitting ratio. The effective communication channel seen by the decoder in the information processing path is modeled as a DMC. We let $b(x)$ denote the energy harvested by the harvester after the signal split at the receiver when $x$ is transmitted. Thus, $b$ is a map from the input alphabet $\mathcal{X}$ to the set of non-negative real numbers, and higher energy is carried by symbols having higher $b$-value. This map is assumed to be time-invariant, and reflects the scenario where the statistical nature of the effective communication channel is due to noise in the receiver circuitry, which does not affect the harvested energy. The quantification of $b$ abstracts the specific implementation of a chosen receiver architecture, which in turn helps to separate the problem of code design for simultaneous energy and information transfer from implementation details.

In order to meet the real-time energy requirement at the receiver, we partition the transmitted codeword into equal-sized subblocks (see Fig. 2) and require that transmitted symbols be chosen such that the expected harvested energy in each subblock exceeds a given threshold. This threshold is a function of the energy consumption by the receiver circuitry, including the information decoder. We will denote the subblock length by $L$ and assume that the codeword length, denoted $n$, is a multiple of $L$. If a transmitted codeword is denoted $X_1^n = (X_1, X_2, \ldots, X_n)$, then the constraint on sufficient energy within each subblock can be expressed as

$$\frac{1}{L}\sum_{i=1}^{L} b\big(X_{(j-1)L+i}\big) \ge B, \qquad j = 1, 2, \ldots, m \tag{1}$$

where $j$ is the subblock index, $B$ denotes the required energy per symbol at the receiver, and $m = n/L$ is the number of subblocks in a codeword.
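Constraint (1) is easy to check mechanically. The sketch below (function names are illustrative, not from the paper) verifies the subblock energy constraint for a codeword, and shows how a bursty codeword can satisfy the same average energy yet fail the per-subblock requirement:

```python
def meets_subblock_energy(codeword, b, L, B):
    """Check constraint (1): the average harvested energy in every
    length-L subblock is at least B.  b maps symbols to energies."""
    assert len(codeword) % L == 0, "codeword length must be a multiple of L"
    m = len(codeword) // L
    for j in range(m):
        subblock = codeword[j * L:(j + 1) * L]
        if sum(b(x) for x in subblock) / L < B:
            return False
    return True

# On-off keying: b(1) = 1, b(0) = 0; require B = 0.5 per symbol, L = 4.
b = lambda x: float(x)
meets_subblock_energy([1, 1, 0, 0, 0, 0, 1, 1], b, 4, 0.5)   # True
meets_subblock_energy([1, 1, 1, 1, 0, 0, 0, 0], b, 4, 0.5)   # False (bursty)
```

Both codewords have the same average energy per symbol, but the second concentrates it in one subblock, which is exactly the burstiness the constraint rules out.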

The choice of the subblock length $L$ depends on the energy storage capacity at the receiver; a small energy buffer generally requires a relatively small value of $L$ to prevent energy outage at the receiver.

The subblock energy constraint given by (1) becomes trivial if $b(x)$ is the same for all $x \in \mathcal{X}$ (for instance, when the transmitted symbols belong to a phase-shift-keying constellation). However, the constraint is non-trivial when the $b$-values are not constant (for instance, using on-off keying) and the threshold $B$ satisfies

$$b_{\min} < B \le b_{\max}, \tag{2}$$

where

$$b_{\min} = \min_{x \in \mathcal{X}} b(x), \qquad b_{\max} = \max_{x \in \mathcal{X}} b(x). \tag{3}$$

In the rest of the paper we assume (2) is satisfied, unless otherwise stated.

For a given subblock within a codeword, if $N(x)$ denotes the number of occurrences of symbol $x$ in the subblock, then (1) can alternately be expressed as

$$\sum_{x \in \mathcal{X}} b(x)\, \frac{N(x)}{L} \ge B. \tag{4}$$

Note that $N(x)/L$ denotes the fraction of time that symbol $x$ appears in the subblock. We now introduce constant subblock-composition codes, which are a natural way to meet the subblock energy constraint.

## III Constant Subblock-Composition Codes

### III-A Motivation and Definition

We have seen that for a given subblock, the energy constraint given by (1) can equivalently be expressed as (4), and this constraint is satisfied provided the fraction of time each symbol appears in the subblock is chosen appropriately. This observation motivates the use of codes where the composition of each subblock in all codewords is constant and is chosen such that (4) is satisfied. A constant subblock-composition code (CSCC) is one in which all codewords are partitioned into equal-sized subblocks and each subblock (in all codewords) has the same type $P$. The subblock type $P$ in a CSCC is chosen to satisfy the subblock energy constraint

$$\mathbb{E}_P[b(X)] \triangleq \sum_{x \in \mathcal{X}} b(x) P(x) = \sum_{x \in \mathcal{X}} b(x)\, \frac{N(x)}{L} \ge B. \tag{5}$$

### III-B Capacity using CSCC

Let $\mathcal{P}_L$ denote the set of all compositions for input sequences of length $L$. For a given type $P \in \mathcal{P}_L$, the set of sequences in $\mathcal{X}^L$ with composition $P$ is denoted by $T_P^L$ and is called the type class or composition class of $P$. In a CSCC with subblock-composition $P$, every subblock in a codeword may be viewed as an element of $T_P^L$.
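For small $L$, a type class can be enumerated explicitly, and its size is the multinomial coefficient $L!/\prod_x N(x)!$ (this is equation (18) later in the section). A minimal sketch, with hypothetical helper names:

```python
from itertools import permutations
from math import factorial, prod

def type_class_size(counts):
    """|T_P^L| = L! / prod_x N(x)!  (multinomial coefficient)."""
    L = sum(counts.values())
    return factorial(L) // prod(factorial(n) for n in counts.values())

def type_class(counts):
    """Enumerate T_P^L for small L by listing distinct permutations
    of a base sequence containing N(x) copies of each symbol x."""
    base = [x for x, n in counts.items() for _ in range(n)]
    return set(permutations(base))

counts = {0: 2, 1: 2}      # L = 4, P = (1/2, 1/2)
type_class_size(counts)    # 6, matching the explicit enumeration below
len(type_class(counts))    # 6
```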

In order to compute the capacity of a CSCC on a DMC, we may view $L$ uses of the original channel as a single use of the induced vector channel having input alphabet $T_P^L$ and output alphabet $\mathcal{Y}^L$. Since the underlying channel is memoryless, the transition probability for a pair of input and output vectors is the product of the corresponding transition probabilities of the underlying channel. If we let $x_1^L$ and $y_1^L$ be given input and output vectors with $x_1^L \in T_P^L$ and $y_1^L \in \mathcal{Y}^L$, respectively, then the transition probabilities for the induced vector channel are:

$$W^L(y_1^L \mid x_1^L) = \prod_{i=1}^{L} W(y_i \mid x_i). \tag{6}$$

Since each subblock in a codeword may be chosen independently, the capacity using CSCC with subblock-composition $P$, denoted $C_{\mathrm{CSCC}}^L(P)$, is equal to $1/L$ times the capacity of the induced vector channel with input alphabet $T_P^L$, output alphabet $\mathcal{Y}^L$, and transition probabilities given by (6). Thus, if we denote $X_1^L = (X_1, \ldots, X_L)$ and $Y_1^L = (Y_1, \ldots, Y_L)$, then

$$C_{\mathrm{CSCC}}^L(P) = \max_{X_1^L \in T_P^L} \frac{I(X_1^L; Y_1^L)}{L} \tag{7}$$
$$= \max_{X_1^L \in T_P^L} \left( \frac{H(Y_1^L)}{L} - \frac{H(Y_1^L \mid X_1^L)}{L} \right) \tag{8}$$
$$= \max_{X_1^L \in T_P^L} \left( \frac{H(Y_1^L)}{L} - \frac{\sum_{i=1}^{L} H(Y_i \mid X_i)}{L} \right) \tag{9}$$

where the last equality follows from the memoryless property of the channel. The maximization in (7) is over the distribution of the input vector $X_1^L$ in $T_P^L$. We will show that the maximum is achieved when the input vectors are uniformly distributed over $T_P^L$.

###### Theorem 1.

The capacity of the induced vector channel using CSCC with fixed subblock-composition $P$ is obtained via a uniform distribution of the input vectors in $T_P^L$.

###### Proof:

See Appendix A. ∎

If we define the set of distributions

$$\Gamma_B^L \triangleq \big\{P \in \mathcal{P}_L : \mathbb{E}_P[b(X)] \ge B\big\}, \tag{10}$$

then the capacity using CSCC with subblock energy constraint (1), denoted $C_{\mathrm{CSCC}}^L(B)$, is defined as

$$C_{\mathrm{CSCC}}^L(B) = \max_{P \in \Gamma_B^L} C_{\mathrm{CSCC}}^L(P). \tag{11}$$

### III-C Computing CSCC Capacity

By Theorem 1, the maximum in (9) is achieved when $X_1^L$ is uniformly distributed over $T_P^L$. The computation of the capacity expression with increasing subblock length $L$ seems challenging, since the input and output alphabet sizes for the induced vector channel grow exponentially with $L$. However, we will show that the computational complexity of evaluating the CSCC capacity expression can be reduced using the following observations.

First note that the probability distribution for the output vector in the induced vector channel is given by

$$P_{Y_1^L}(y_1^L) = \frac{1}{|T_P^L|} \sum_{x_1^L \in T_P^L} W^L(y_1^L \mid x_1^L), \tag{12}$$

since the input vectors are uniformly distributed over $T_P^L$. If $\tilde{y}_1^L$ is another output vector having the same composition as $y_1^L$, then we have $P_{Y_1^L}(\tilde{y}_1^L) = P_{Y_1^L}(y_1^L)$. This is because the columns corresponding to $y_1^L$ and $\tilde{y}_1^L$ in the vector channel transition matrix are permutations of each other (see Appendix A). Thus output vectors having the same composition have equal probability. However, even though the input vectors are uniformly distributed, the output vectors in general are not uniformly distributed. Also, since the symbols within an input vector are not independent, in general we have $P_{Y_1^L}(y_1^L) \ne \prod_{i=1}^{L} P_Y(y_i)$, where $P_Y(y)$ denotes the probability of the output scalar symbol $y$.

Let $\mathcal{Q}_L$ denote the set of all compositions for output sequences of length $L$. When $X_1^L$ is uniformly distributed over $T_P^L$, the term $H(Y_1^L)$ in (9) can be expressed as

$$H(Y_1^L) = -\sum_{y_1^L \in \mathcal{Y}^L} P_{Y_1^L}(y_1^L) \log P_{Y_1^L}(y_1^L) \tag{13}$$
$$= -\sum_{Q \in \mathcal{Q}_L} \sum_{y_1^L \in T_Q^L} P_{Y_1^L}(y_1^L) \log P_{Y_1^L}(y_1^L) \tag{14}$$
$$= \sum_{Q \in \mathcal{Q}_L} |T_Q^L|\, P_{Y_1^L}(y_1^L) \log \frac{1}{P_{Y_1^L}(y_1^L)}, \tag{15}$$

where the last equality follows because $P_{Y_1^L}(y_1^L)$ is the same for all $y_1^L \in T_Q^L$. Note that we choose only one representative vector $y_1^L$ from each type class $T_Q^L$ in the last equality.

Secondly, the following proposition shows that the term $H(Y_i \mid X_i)$ in (9) is the same for all $i$, since the corresponding joint probabilities are equal.

###### Proposition 1.

For a random input vector $X_1^L$ uniformly distributed over $T_P^L$ with corresponding output vector $Y_1^L$, the pairwise probability $P_{XY}(X_i = x, Y_i = y)$, for $1 \le i \le L$, satisfies

$$P_{XY}(X_i = x, Y_i = y) = \frac{N(x)}{L}\, W(y \mid x) = P(x)\, W(y \mid x). \tag{16}$$
###### Proof:

Since

$$P_{XY}(X_i = x, Y_i = y) = \Pr(X_i = x)\, W(y \mid x), \tag{17}$$

the claim will be proved if we show $\Pr(X_i = x) = P(x)$ for all $i$. As $X_1^L$ is uniformly distributed over $T_P^L$, the probability $\Pr(X_i = x)$ is equal to the ratio of the number of input vectors in $T_P^L$ with $x$ at index $i$ to the total number of vectors in $T_P^L$. Since

$$|T_P^L| = \frac{L!}{\prod_{x \in \mathcal{X}} N(x)!}, \tag{18}$$

and the number of sequences in $T_P^L$ with $x$ at index $i$ is

$$\frac{(L-1)!}{(N(x)-1)! \prod_{\tilde{x} \ne x} N(\tilde{x})!}, \tag{19}$$

the ratio of the quantities given by (19) and (18) is equal to $N(x)/L = P(x)$. ∎

The next proposition gives a computationally efficient expression for CSCC capacity.

###### Proposition 2.

The CSCC capacity, $C_{\mathrm{CSCC}}^L(B)$, is given by

$$C_{\mathrm{CSCC}}^L(B) = \max_{P \in \Gamma_B^L} \left[ \frac{1}{L} \sum_{Q \in \mathcal{Q}_L} |T_Q^L|\, P_{Y_1^L}(y_1^L) \log \frac{1}{P_{Y_1^L}(y_1^L)} - H(Y \mid X) \right], \tag{20}$$

where only one representative output vector $y_1^L$ is chosen from every type class $T_Q^L$, the probability $P_{Y_1^L}(y_1^L)$ is given by (12), and $H(Y \mid X)$ is evaluated using the joint pairwise probability distribution given by (16).

###### Proof:

Use (9) and (11) to express $C_{\mathrm{CSCC}}^L(B)$. From Thm. 1, a uniform distribution over $T_P^L$ achieves capacity, and hence the entropy term $H(Y_1^L)$ in (9) can be computed using (15). The claim in Prop. 2 follows by further noting that the term $H(Y_i \mid X_i)$ in (9) is the same for all $i$, and can be evaluated using the joint pairwise distribution in (16). ∎
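For a fixed composition $P$, the inner expression of (20) can be evaluated numerically for small $L$. The sketch below does this for a BSC with crossover probability $p$; for a binary output alphabet the output type classes are indexed by Hamming weight, with $|T_Q^L| = \binom{L}{k}$. Function and variable names here are illustrative, and the exhaustive enumeration is only feasible for small $L$:

```python
from itertools import permutations
from math import comb, log2, prod

def cscc_capacity_bsc(p, counts):
    """C^L_CSCC(P) for a BSC(p): H(Y_1^L)/L via eq. (15) minus H(Y|X)
    from the pairwise law (16).  counts gives N(x) for x in {0, 1}."""
    L = sum(counts.values())
    base = [x for x, n in counts.items() for _ in range(n)]
    T_P = set(permutations(base))                  # input type class T_P^L
    W = lambda y, x: 1 - p if y == x else p        # BSC transition probs

    def PY(y):                                     # eq. (12)
        return sum(prod(W(yi, xi) for yi, xi in zip(y, x))
                   for x in T_P) / len(T_P)

    # H(Y_1^L) via eq. (15): one weight-k representative per output type
    HY = 0.0
    for k in range(L + 1):
        rep = (1,) * k + (0,) * (L - k)
        q = PY(rep)
        HY += comb(L, k) * q * log2(1 / q)

    # H(Y|X) from eq. (16): joint distribution P(x) W(y|x)
    P = {x: n / L for x, n in counts.items()}
    HYX = -sum(P[x] * W(y, x) * log2(W(y, x))
               for x in P for y in (0, 1))
    return HY / L - HYX

cscc_capacity_bsc(0.1, {0: 2, 1: 2})   # L = 4, uniform P, BSC(0.1)
```

Theorem 3 below guarantees this value lies within $r(L,P)$ of the CCC capacity $1 - h(p)$ for the uniform composition.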

### III-D Choice of Subblock Length L

In this subsection, we derive bounds on the subblock length $L$ (as a function of the energy storage capacity at the receiver) which ensure that the receiver never runs out of energy when the subblock-composition $P$ is chosen to satisfy (5). It will be seen that a large energy storage capacity allows for larger values of $L$ and hence results in higher rates of information transfer.

The energy storage capacity at the receiver is denoted $E_{\max}$, and we assume that the receiver requires $B$ units of energy per received symbol for its processing. Let $E(i)$ denote the level of the energy buffer at the receiver at the completion of $i$ uses of the channel, with $E(0)$ denoting the initial level. The energy update equation, for $i \ge 1$, is given by

$$E(i) = \min\!\Big(E_{\max},\, \big|E(i-1) + b(X_i) - B\big|^{+}\Big), \tag{21}$$

where $X_i$ is the symbol transmitted in the $i$th channel use, and $|z|^{+} \triangleq \max(z, 0)$.

We say that an outage occurs during the $i$th channel use if

$$E(i-1) + b(X_i) < B, \tag{22}$$

while an overflow event occurs if

$$E(i-1) + b(X_i) - B > E_{\max}. \tag{23}$$

We partition the input alphabet as $\mathcal{X} = \mathcal{X}_{\triangleleft} \cup \mathcal{X}_{\triangleright}$, where

$$\mathcal{X}_{\triangleleft} = \{x \in \mathcal{X} \mid b(x) < B\}, \tag{24}$$
$$\mathcal{X}_{\triangleright} = \{x \in \mathcal{X} \mid b(x) \ge B\}. \tag{25}$$

For a CSCC with subblock-composition $P$, we define

$$G = \sum_{x \in \mathcal{X}_{\triangleleft}} L\, P(x)\big(B - b(x)\big), \tag{26}$$

where $G$ will be used to characterize some useful properties of the energy update process.

###### Lemma 1.

The energy update process satisfies the following properties for a CSCC with subblock-composition $P$ satisfying (5):

1. If there is no energy outage or overflow during the reception of the first subblock, then $E(L) \ge E(0)$.

2. If $E(0) \ge G$, then there is no energy outage during the reception of the first subblock.

3. If $E(0) \ge G$ and $E_{\max} \ge 2G$, then $E(L) \ge G$.

###### Proof:

If there is no energy outage or overflow, then the total energy harvested during the reception of the first subblock is $\sum_{i=1}^{L} b(X_i) = L\,\mathbb{E}_P[b(X)]$, while the total energy consumed is $LB$, and claim 1 follows since $P$ satisfies (5).

Let $X_i$ denote the transmitted symbol in the $i$th channel use, $1 \le i \le L$, and suppose $E(0) \ge G$. The level in the energy buffer decreases during the $i$th channel use if and only if $X_i \in \mathcal{X}_{\triangleleft}$, and the corresponding decrease in energy level is at most $B - b(X_i)$. Since the subblock has composition $P$, the sum of the energy decrements over the reception of the first subblock is at most $\sum_{x \in \mathcal{X}_{\triangleleft}} L\,P(x)(B - b(x)) = G$, and claim 2 follows.

For proving claim 3, we note that the condition $E(0) \ge G$ implies that there is no energy outage during the reception of the first subblock (using claim 2). Further, if there is no overflow, then $E(L) \ge E(0) \ge G$ (using claim 1). In case there is energy overflow in the $i$th channel use for any $1 \le i \le L$, we have $E(i) = E_{\max}$, and thus $E(L) \ge E_{\max} - G \ge G$. ∎

Lemma 1 is useful in proving the following theorem, which gives a necessary and sufficient condition on the subblock length $L$ in order to avoid outage.

###### Theorem 2.

A necessary and sufficient condition on $L$ for avoiding energy outage during the reception of CSCC codewords, with subblock-composition $P$ satisfying (5), is

$$L \le \frac{E_{\max}}{\sum_{x \in \mathcal{X}_{\triangleleft}} 2 P(x)\big(B - b(x)\big)}, \tag{27}$$

with initial energy level $E(0) \ge G$.

###### Proof:

See Appendix B. ∎

The initial condition on the energy level, $E(0) \ge G$, may be ensured by transmitting a preamble, consisting of symbols with high energy content, before the transmission of codewords. This preamble has bounded length and hence does not affect the channel capacity.
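The buffer dynamics (21)-(23) are straightforward to simulate, which gives a quick sanity check of the subblock-length condition (27). In the sketch below (function names are illustrative), on-off keying with $b(0)=0$, $b(1)=1$, $B=0.5$, and the uniform binary composition gives $G = L/4$ and condition (27) reads $L \le 2E_{\max}$; an adversarial ordering of symbols across subblocks causes an outage exactly when that bound is violated:

```python
def simulate_buffer(codeword, b, B, Emax, E0):
    """Run the energy buffer update (21); return True iff no
    outage (22) occurs over the whole codeword."""
    E = E0
    for x in codeword:
        if E + b(x) < B:                       # outage condition (22)
            return False
        E = min(Emax, max(E + b(x) - B, 0.0))  # update (21), capped at Emax
    return True

b = lambda x: float(x)   # on-off keying energies
# L = 8 with Emax = 2 violates (27) (bound gives L <= 4): outage occurs
# for the ordering "ones first, then a long run of zeros".
simulate_buffer([1,1,1,1,0,0,0,0] + [0,0,0,0,1,1,1,1], b, 0.5, 2.0, 2.0)  # False
# L = 4 meets the bound with equality: the analogous ordering is safe.
simulate_buffer([1,1,0,0] + [0,0,1,1], b, 0.5, 2.0, 1.0)                  # True
```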

## IV Comparing CSCC with Constant Composition Codes

### IV-A Rate Comparison

Similar to the subblock-composition, a codeword composition represents the fraction of times each input symbol occurs in a codeword, and a constant composition code (CCC) is one in which all codewords have the same composition. Note that a CSCC with subblock-composition $P$ may also be viewed as a CCC with codeword composition $P$, since all the subblocks in a CSCC have the same composition. In general, for a CCC, although all codewords have the same composition, different subblocks within a codeword may have different compositions. Hence CCCs are richer than CSCCs in terms of the choice of symbols within each subblock. CCCs were first analyzed by Fano [45] and shown to be sufficient to achieve capacity for any discrete memoryless channel.

Let $C_{\mathrm{CCC}}(P)$ denote the maximum achievable rate using a CCC with codeword composition $P$. For $P \in \Gamma_B^L$ (refer (10)), a CCC with codeword composition $P$ will ensure that the average received energy per symbol in a codeword is at least $B$. However, it may violate the constraint on providing sufficient energy to the receiver within every subblock duration. For a CCC with $X \sim P$, we have [45]

$$C_{\mathrm{CCC}}(P) = I(X; Y) = H(X) - H(X \mid Y). \tag{28}$$

We are interested in quantifying the information rate penalty incurred by using CSCC compared to CCC, given by $C_{\mathrm{CCC}}(P) - C_{\mathrm{CSCC}}^L(P)$. This information rate penalty is the price we pay for meeting the real-time energy requirement within every subblock duration, compared to the less constrained energy requirement per codeword. Although the rate penalty can be numerically computed by explicit computation of $C_{\mathrm{CCC}}(P)$ and $C_{\mathrm{CSCC}}^L(P)$, the numerical approach has the limitation that the computational complexity of $C_{\mathrm{CSCC}}^L(P)$ increases with an increase in the subblock length $L$.

In a CSCC, since a transmitted subblock $X_1^L$ is uniformly distributed over $T_P^L$, we have [46, p. 26]

$$H(X_1^L) = \log |T_P^L| = L\,H(P) - L\,r(L,P), \tag{29}$$

where $r(L,P)$ denotes a function of $L$ and $P$ given as

$$r(L,P) = \frac{s(P)-1}{2L}\log(2\pi L) + \frac{1}{2L}\sum_{a : P(a) > 0} \log P(a) + \frac{\vartheta(L,P)\, s(P)}{12 L \ln 2}, \tag{30}$$

with $s(P)$ denoting the number of elements $a \in \mathcal{X}$ with $P(a) > 0$, and $\vartheta(L,P)$ a real number between zero and one which is chosen so that (29) is satisfied.
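Since (29) defines $r(L,P)$ exactly as $H(P) - \tfrac{1}{L}\log|T_P^L|$, it can be computed directly from the multinomial coefficient (18) without the Stirling-type expansion (30). A minimal sketch (the function name is illustrative):

```python
from math import factorial, log2, prod

def r_exact(L, counts):
    """r(L,P) per eq. (29): H(P) - (1/L) log2 |T_P^L|, where |T_P^L|
    is the multinomial coefficient (18) and counts gives N(x)."""
    P = [n / L for n in counts.values()]
    HP = -sum(p * log2(p) for p in P if p > 0)
    size = factorial(L) // prod(factorial(n) for n in counts.values())
    return HP - log2(size) / L

r_exact(4, {0: 2, 1: 2})   # 1 - log2(6)/4, roughly 0.354
r_exact(8, {0: 4, 1: 4})   # 1 - log2(70)/8, roughly 0.234
```

The decreasing values illustrate the $O((\log L)/L)$ decay of $r(L,P)$ used in Corollary 1 below.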

We now present simple analytical bounds for this rate penalty. The following theorem shows that the rate penalty from using CSCC, relative to CCC, is bounded by $r(L,P)$.

###### Theorem 3.

The rate penalty is bounded as

$$0 \le C_{\mathrm{CCC}}(P) - C_{\mathrm{CSCC}}^L(P) \le r(L,P). \tag{31}$$

Further, there exist channels for which the rate penalty meets the upper or lower bound in (31) with equality.

###### Proof:

When $X_1^L$ is uniformly distributed over $T_P^L$,

$$C_{\mathrm{CSCC}}^L(P) = \frac{1}{L}\big[H(X_1^L) - H(X_1^L \mid Y_1^L)\big] \tag{32}$$
$$\stackrel{(a)}{=} H(P) - r(L,P) - \frac{1}{L}\sum_{i=1}^{L} H(X_i \mid Y_1^L, X_1^{i-1})$$
$$\stackrel{(b)}{\ge} H(P) - r(L,P) - \frac{1}{L}\sum_{i=1}^{L} H(X_i \mid Y_i)$$
$$\stackrel{(c)}{=} H(P) - r(L,P) - H(X \mid Y)$$
$$\stackrel{(d)}{=} C_{\mathrm{CCC}}(P) - r(L,P), \tag{33}$$

where $H(P)$ denotes $H(X)$ for $X \sim P$, $(a)$ follows from (29) and the chain rule for entropy, $(b)$ follows since conditioning only reduces entropy, $(c)$ follows from (16), and $(d)$ follows from (28). Now, (31) follows from (33). Explicit channels can be constructed which meet the bounds in (31):

• $C_{\mathrm{CCC}}(P) - C_{\mathrm{CSCC}}^L(P) = 0$ for a binary symmetric channel (BSC) with crossover probability equal to $0.5$, since both capacities are zero.

• For a noiseless channel, we have $C_{\mathrm{CCC}}(P) - C_{\mathrm{CSCC}}^L(P) = r(L,P)$ due to equality in $(b)$, as $H(X_i \mid Y_1^L, X_1^{i-1}) = H(X_i \mid Y_i) = 0$. ∎

###### Corollary 1.
$$\lim_{L \to \infty} C_{\mathrm{CSCC}}^L(P) = C_{\mathrm{CCC}}(P). \tag{34}$$
###### Proof:

Note that for a fixed $P$, the value of $r(L,P)$ as a function of $L$ is non-negative and falls roughly as $(\log L)/L$, and thus tends to zero as $L \to \infty$. Thus (34) follows by taking the limit $L \to \infty$ in (31). ∎

Remark: For a fixed subblock length $L$, the CSCC capacity can be achieved by making the number of subblocks $m$ in a codeword arbitrarily large and performing joint decoding over all the subblocks. However, when the number of subblocks in a codeword is kept constant and the subblock length is increased without bound, the achievable rates using CSCC tend to the CCC capacity. In particular, when there is only one subblock in a codeword, the CSCC is the same as a CCC, whose capacity can be achieved by making $L$ arbitrarily large.

The upper bound $r(L,P)$ in (31) on the rate penalty is independent of the underlying channel. In general, given a communication channel, the bounds on the rate penalty can be further improved. Consider, for example, a BSC with crossover probability $p$ where $0 < p < 0.5$. For this channel, the upper bound can be tightened using Thm. 4. We first define a binary operator $\star$ and the binary entropy function $h$, respectively, as

$$a \star b \triangleq a(1-b) + (1-a)b, \tag{35}$$
$$h(x) \triangleq -x \log x - (1-x)\log(1-x). \tag{36}$$

We employ the above definitions to state the following theorem on bounding the rate penalty for a BSC.

###### Theorem 4.

For a BSC with crossover probability $p$, $0 < p < 0.5$, and input distribution denoted by $P = (1-\gamma, \gamma)$ with $0 < \gamma \le 0.5$, we have

$$0 \le C_{\mathrm{CCC}}(P) - C_{\mathrm{CSCC}}^L(P) \le h(\gamma \star p) - h(\alpha \star p), \tag{37}$$

where $\alpha$ is chosen such that

$$h(\alpha) = h(\gamma) - r(L,P), \qquad 0 \le \alpha < 0.5. \tag{38}$$
###### Proof:

See Appendix C. ∎
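The Theorem 4 bound is easy to evaluate numerically: since $h$ is strictly increasing on $[0, 0.5]$, the condition (38) determines $\alpha$ by a one-dimensional root search. A sketch, with illustrative function names and bisection as one convenient solver:

```python
from math import log2

def h(x):                        # binary entropy, eq. (36)
    if x == 0.0 or x == 1.0:
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)

def conv(a, b):                  # binary convolution a * b, eq. (35)
    return a * (1 - b) + (1 - a) * b

def penalty_bound_bsc(p, gamma, r):
    """Upper bound h(gamma*p) - h(alpha*p) of Theorem 4, with alpha
    solved from h(alpha) = h(gamma) - r by bisection on [0, 0.5)."""
    target = h(gamma) - r
    lo, hi = 0.0, 0.5
    for _ in range(60):          # h is increasing on [0, 0.5]
        mid = (lo + hi) / 2
        if h(mid) < target:
            lo = mid
        else:
            hi = mid
    alpha = (lo + hi) / 2
    return h(conv(gamma, p)) - h(conv(alpha, p))

# For p = 0.1, uniform P (gamma = 0.5) and r(4, P) = 0.354, the bound
# is strictly smaller than the channel-independent bound r(L, P).
penalty_bound_bsc(0.1, 0.5, 0.354)
```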

The proof of Theorem 4 uses Mrs. Gerber’s Lemma (MGL) [47]. Using an extension [48] of MGL, the upper bound on the rate penalty can similarly be improved for general memoryless binary-input symmetric-output channels. In particular, we have the following theorem for the binary erasure channel (BEC).

###### Theorem 5.

For a BEC with erasure probability $\epsilon$,

$$C_{\mathrm{CCC}}(P) - C_{\mathrm{CSCC}}^L(P) \le (1-\epsilon)\, r(L,P). \tag{39}$$
###### Proof:

See Appendix D. ∎

For memoryless asymmetric binary-input, binary-output channels, an alternate upper bound on the rate penalty (other than (31)) may be obtained using the equality of the channel characteristic function and the gerbator [49]. As an example, we have the following theorem for the $Z$-channel.

###### Theorem 6.

For a $Z$-channel with $W(0 \mid 0) = 1$, $W(0 \mid 1) = p_0$, and input distribution denoted by $P = (1-\gamma, \gamma)$, we have

$$C_{\mathrm{CCC}}(P) - C_{\mathrm{CSCC}}^L(P) \le h\big(\gamma(1-p_0)\big) - h\big(\alpha(1-p_0)\big), \tag{40}$$

where $h$ is given by (36), and $\alpha$ is chosen such that

$$h(\alpha) = h(\gamma) - r(L,P), \qquad 0 \le \alpha < 0.5. \tag{41}$$
###### Proof:

See Appendix E. ∎

The rate penalty bound given by (40) may sometimes be worse than the bound in (31), depending on $p_0$ and $\gamma$. In general, the rate penalty for the $Z$-channel can be upper bounded by the minimum of the bounds in (31) and (40).

### IV-B Error Exponent Comparison

In this subsection, we discuss the error exponent using CSCC and show that it can be bounded as a function of the (computationally simpler) error exponent for CCC.

We now present some definitions and notation which will be used in this subsection. For a pair of random variables $(X, Y)$ with $X \sim P$ and conditional probability distribution $W(y \mid x)$, we will write $H(Y)$ as $H(PW)$, $H(Y \mid X)$ as $H(W \mid P)$, and $I(X; Y)$ as $I(P, W)$, where $PW$ denotes the distribution of $Y$. Thus we have

$$PW(y) \triangleq \sum_{x \in \mathcal{X}} P(x) W(y \mid x), \qquad y \in \mathcal{Y} \tag{42}$$
$$H(W \mid P) \triangleq \sum_{x \in \mathcal{X}} P(x)\, H\big(W(\cdot \mid x)\big) \tag{43}$$
$$I(P, W) \triangleq H(PW) - H(W \mid P). \tag{44}$$

The informational divergence of distributions $P$ and $Q$ is denoted as

$$D(P \,\|\, Q) \triangleq \sum_{x \in \mathcal{X}} P(x) \log \frac{P(x)}{Q(x)}. \tag{45}$$

The conditional informational divergence of stochastic matrices $V$ and $W$ with respect to distribution $P$ on $\mathcal{X}$ is denoted as

$$D(V \,\|\, W \mid P) \triangleq \sum_{x \in \mathcal{X}} P(x)\, D\big(V(\cdot \mid x) \,\|\, W(\cdot \mid x)\big). \tag{46}$$
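The quantities (42)-(46) translate directly into code, which is useful for evaluating the exponent functions below. A minimal sketch (function names are illustrative; $W$ is a row-stochastic matrix indexed as `W[x][y]`):

```python
from math import log2

def mutual_information(P, W):
    """I(P, W) = H(PW) - H(W|P), per eqs. (42)-(44)."""
    ny = len(W[0])
    PW = [sum(P[x] * W[x][y] for x in range(len(P))) for y in range(ny)]
    H_PW = -sum(q * log2(q) for q in PW if q > 0)
    H_W_P = -sum(P[x] * W[x][y] * log2(W[x][y])
                 for x in range(len(P)) for y in range(ny)
                 if W[x][y] > 0)
    return H_PW - H_W_P

def cond_divergence(V, W, P):
    """D(V || W | P), per eqs. (45)-(46)."""
    return sum(P[x] * V[x][y] * log2(V[x][y] / W[x][y])
               for x in range(len(P)) for y in range(len(V[0]))
               if V[x][y] > 0)

W = [[0.9, 0.1], [0.1, 0.9]]         # BSC(0.1)
mutual_information([0.5, 0.5], W)    # equals 1 - h(0.1)
cond_divergence(W, W, [0.5, 0.5])    # 0, since V = W
```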

For CCCs with codeword composition $P$ and information rate $R$, the sphere packing exponent function [46] of DMC $W$ is given by

$$E_{sp}(R, P, W) \triangleq \min_{V : I(P, V) \le R} D(V \,\|\, W \mid P), \tag{47}$$

with $V$ ranging over all channels $V : \mathcal{X} \to \mathcal{Y}$; it represents an upper bound on the error exponent using the best possible codes. For fixed $P$ and $W$, the function $E_{sp}(R, P, W)$ is a convex function of $R$ (which follows from the convexity of $D(V \| W | P)$ and $I(P, V)$ as functions of $V$), positive for $R < I(P, W)$ and zero otherwise.

The random coding exponent function [46] of channel $W$ for CCCs with codeword composition $P$ and information rate $R$ is denoted by $E_r(R, P, W)$ and represents a lower bound on the achievable error exponent. It is related to $E_{sp}(R, P, W)$ as

  E_r(R,P,W) = \begin{cases} E_{sp}(R,P,W), & \text{if } R \geq \hat{R} \\ E_{sp}(\hat{R},P,W) + \hat{R} - R, & \text{if } 0 \leq R < \hat{R}, \end{cases} \qquad (48)

where $\hat{R}$ is the smallest $R$ at which the convex curve $E_{sp}(R,P,W)$ meets its supporting line of slope $-1$.
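As a concrete illustration of the relation (48), the sketch below evaluates the two exponents for a BSC with uniform composition, using the well-known closed form $E_{sp}(R) = d(\delta_R \| p)$ with $h(\delta_R) = 1 - R$, and the critical parameter $\delta_c = \sqrt{p}/(\sqrt{p} + \sqrt{1-p})$ at which the slope of $E_{sp}$ equals $-1$. The numeric values are illustrative only.

```python
import math

def h(x):
    """Binary entropy function in bits."""
    return 0.0 if x in (0.0, 1.0) else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def d(a, b):
    """Binary divergence d(a||b) in bits."""
    return a * math.log2(a / b) + (1 - a) * math.log2((1 - a) / (1 - b))

def esp_bsc(R, p):
    """Sphere-packing exponent (47) for a BSC(p) with uniform composition:
    E_sp(R) = d(delta_R || p) with h(delta_R) = 1 - R, delta_R in [p, 1/2]."""
    if R >= 1 - h(p):            # rate at or above capacity: exponent is zero
        return 0.0
    lo, hi = p, 0.5              # h is increasing on [p, 1/2]
    for _ in range(80):          # bisection for h(delta) = 1 - R
        mid = (lo + hi) / 2
        if h(mid) < 1 - R:
            lo = mid
        else:
            hi = mid
    return d((lo + hi) / 2, p)

def er_bsc(R, p):
    """Random-coding exponent via (48): straight line of slope -1 below R_hat."""
    dc = math.sqrt(p) / (math.sqrt(p) + math.sqrt(1 - p))  # critical delta
    R_hat = 1 - h(dc)
    return esp_bsc(R, p) if R >= R_hat else esp_bsc(R_hat, p) + R_hat - R

p = 0.1
print(esp_bsc(0.3, p), er_bsc(0.3, p))  # equal, since 0.3 exceeds the critical rate
```

Below the critical rate $\hat{R}$, `er_bsc` returns the supporting line of slope $-1$, which lies below the convex sphere-packing curve, exactly as (48) prescribes.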

The structure of the stochastic matrix $V$ which achieves the minimum in (47) for $R < I(P,W)$ is given by the following lemma. For $R \geq I(P,W)$, the minimum in (47) is equal to zero, which is obtained by choosing $V = W$.

###### Lemma 2.

For $0 \leq R < I(P,W)$, the stochastic matrix $V$ which minimizes $D(V\|W|P)$ subject to $I(P,V) \leq R$ is given by

  V(y|x) = \frac{W(y|x)^{1-s} P_V(y)^s}{\sum_{\tilde{y} \in \mathcal{Y}} W(\tilde{y}|x)^{1-s} P_V(\tilde{y})^s}, \qquad (49)

where $P_V$ satisfies the set of simultaneous equations

  P_V(y) = \sum_{x \in \mathcal{X}} P(x) V(y|x) = \sum_{x \in \mathcal{X}} P(x) \frac{W(y|x)^{1-s} P_V(y)^s}{\sum_{\tilde{y} \in \mathcal{Y}} W(\tilde{y}|x)^{1-s} P_V(\tilde{y})^s}, \qquad (50)

and $s \geq 0$ is chosen such that $I(P,V) = R$.

###### Proof:

See Appendix F. ∎

We remark that the random coding exponent function for a DMC was stated by Fano [45] using the distributions $V(y|x)$ and $P_V(y)$ given by (49) and (50), respectively, which were referred to as tilted probability distributions. However, the explicit statement of Lemma 2 seems not to have appeared in the literature before.
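A small numerical sketch of the tilted distributions: for a fixed tilt parameter $s$, the fixed-point equation (50) can be solved by simple iteration, after which (49) yields $V$. The channel and the value of $s$ below are arbitrary illustrations; in Lemma 2, $s$ would additionally be tuned so that $I(P,V) = R$.

```python
def tilted(P, W, s, iters=500):
    """Fixed-point iteration for the tilted output distribution P_V of (50),
    then the minimizing channel V of (49), for a given tilt s >= 0 (sketch)."""
    ny = len(W[0])
    PV = [1.0 / ny] * ny
    for _ in range(iters):
        new = [0.0] * ny
        for px, Wx in zip(P, W):
            denom = sum(Wx[y] ** (1 - s) * PV[y] ** s for y in range(ny))
            for y in range(ny):
                new[y] += px * Wx[y] ** (1 - s) * PV[y] ** s / denom
        PV = new
    V = []
    for Wx in W:
        denom = sum(Wx[y] ** (1 - s) * PV[y] ** s for y in range(ny))
        V.append([Wx[y] ** (1 - s) * PV[y] ** s / denom for y in range(ny)])
    return V, PV

P = [0.5, 0.5]                 # illustrative input distribution
W = [[0.9, 0.1], [0.2, 0.8]]   # illustrative channel W(y|x)
V, PV = tilted(P, W, s=0.3)

# (50) requires P_V to equal the output distribution induced by (P, V):
induced = [sum(p * v[y] for p, v in zip(P, V)) for y in range(2)]
print(PV, induced)
```

Setting $s = 0$ recovers $V = W$ and $P_V = PW$, consistent with the minimum in (47) being zero when the rate constraint is inactive.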

The following theorem uses Shannon's random coding argument to bound the probability of error for CSCCs with subblock-composition $P$ on a DMC. It also applies Lemma 2 to compactly express the error probability bound in terms of the sphere packing exponent function.

###### Theorem 7.

There exists a CSCC with subblock length $L$, subblock-composition $P$, and codeword length $n$, transmitting information at rate $R$ on DMC $W$, for which the maximum probability of error is upper bounded as

  P_e < \begin{cases} 2\exp(-n E_{sp}(R',P,W)), & \text{if } R' \geq \hat{R} \\ \exp(-n(E_{sp}(\hat{R},P,W) + \hat{R} - R')), & \text{if } R' < \hat{R}, \end{cases} \qquad (51)

where $R' = R + r(L,P)$ and $\hat{R}$ is the smallest $R$ at which the convex curve $E_{sp}(R,P,W)$ meets its supporting line of slope $-1$.

###### Proof:

See Appendix G. ∎

The following corollary is immediate.

###### Corollary 2.

The error exponent for CSCCs with subblock length $L$, subblock-composition $P$, and information rate $R$ on DMC $W$ is lower bounded by

  E_r(R + r(L,P), P, W). \qquad (52)

Thus the bound on the error exponent for CSCCs is related to the error exponent for CCCs by the same term, $r(L,P)$, as the bound for the rate penalty (31).

## V Beyond Constant Subblock Composition Codes

In a CSCC, every subblock within any codeword has the same composition, and this composition is chosen to meet the subblock energy constraint (5). The capacity using CSCCs (given by (11)) is achieved by choosing the subblock-composition in $\Gamma^L_B$ (given by (10)) which maximizes the information rate. We will see that rates greater than $C^L_{CSCC}(B)$ can be achieved while still meeting the subblock energy constraint (1).

We first review known results when constraints are placed on the entire codeword (with no subblock constraints) [10, 46]. Let $X_1^n$ denote any codeword of length $n$. If we impose the average energy constraint on codewords,

  \frac{1}{n}\sum_{i=1}^{n} b(X_i) \geq B, \qquad (53)

then the channel capacity with this constraint is [10, 46]

  \max_{P_X : \mathbb{E}_{P_X}[b(X)] \geq B} I(X;Y). \qquad (54)

Information rates arbitrarily close to this capacity can be achieved by making the codeword length sufficiently large. Moreover, if $P_X^*$ is an input distribution which maximizes (54), then this capacity can be achieved by a sequence of CCCs with codeword composition tending to $P_X^*$ [45, 46]. Thus, if $C_{CCC}(B)$ denotes the capacity using CCCs when the average energy per symbol is constrained to be at least $B$, then

  C_{CCC}(B) = \max_{P : \mathbb{E}_{P}[b(X)] \geq B} C_{CCC}(P) \qquad (55)
            = \max_{P_X : \mathbb{E}_{P_X}[b(X)] \geq B} I(X;Y). \qquad (56)

Thus the capacity with codeword constraints can be achieved by restricting the codewords to have a fixed composition. This is possible because, for a given transmission rate, the codebook size increases exponentially with the codeword length $n$ while the number of different types of sequences only increases polynomially with $n$.
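The constrained maximization (54) is easy to approximate numerically for small alphabets. The sketch below assumes a binary input with the illustrative energy function $b(0)=0$, $b(1)=1$ (so the constraint $\mathbb{E}[b(X)] \geq B$ reads $P(X{=}1) \geq B$) and grid-searches the input distribution; the channel matrix is an arbitrary example, not one from the paper.

```python
import math

def mutual_info(p1, W):
    """I(X;Y) in bits for binary X with P(X=1) = p1 and channel rows W(y|x)."""
    P = [1 - p1, p1]
    ny = len(W[0])
    PY = [sum(P[x] * W[x][y] for x in range(2)) for y in range(ny)]
    HY = -sum(q * math.log2(q) for q in PY if q > 0)
    HYX = -sum(P[x] * W[x][y] * math.log2(W[x][y])
               for x in range(2) for y in range(ny) if W[x][y] > 0)
    return HY - HYX

def constrained_capacity(W, B, steps=10000):
    """Grid-search approximation of (54): maximize I(X;Y) over P(X=1) in [B, 1]."""
    return max(mutual_info(B + (1 - B) * k / steps, W) for k in range(steps + 1))

W = [[0.9, 0.1], [0.1, 0.9]]          # BSC(0.1), illustrative
print(constrained_capacity(W, 0.7))   # constraint binds: optimum sits at P(X=1) = 0.7
```

For $B \leq 0.5$ the constraint is inactive for this symmetric channel and the search recovers the unconstrained capacity $1 - h(0.1)$; for $B > 0.5$ the concavity of $I(X;Y)$ pushes the optimum to the constraint boundary.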

We will now show that, contrary to the case with codeword constraints, when the constraints are applied to fixed-size subblocks, information rates can in general be increased by not restricting the subblocks to have a fixed composition. Towards this, we define a subblock energy-constrained code (SECC) as a code which satisfies the subblock energy constraint given by (1). Since all subblocks in a SECC satisfy (1), the composition of each subblock belongs to the set $\Gamma^L_B$.

Let $C^L_{SECC}(B)$ denote the capacity using SECCs with subblock length $L$ and average energy per symbol at least $B$. Similar to CSCC, $L$ uses of the channel in case of SECC induce a vector channel with input alphabet

  \mathcal{A} = \bigcup_{P \in \Gamma^L_B} T^L_P, \qquad (57)

output alphabet $\mathcal{Y}^L$, and channel transition probabilities given by (6). Since each subblock may be chosen independently,

  C^L_{SECC}(B) = \max_{X_1^L \in \mathcal{A}} \frac{I(X_1^L; Y_1^L)}{L}, \qquad (58)

where the maximization is over the probability distribution of input vectors in $\mathcal{A}$. For a noiseless $q$-ary channel ($Y = X$), it is easy to check that SECC capacity is achieved by the uniform distribution of $X_1^L$ over $\mathcal{A}$. Thus for the noiseless channel, we have $C^L_{SECC}(B) = (\log |\mathcal{A}|)/L$.
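The noiseless binary case makes the SECC/CSCC gap concrete. Assuming the illustrative energy function $b(0)=0$, $b(1)=1$, the set $\Gamma^L_B$ contains the compositions with at least $\lceil LB \rceil$ ones, so $|\mathcal{A}|$ is a sum of binomial coefficients, while a CSCC is confined to the single largest admissible type class.

```python
from math import ceil, comb, log2

L, B = 8, 0.75                     # subblock length and per-symbol energy demand
w_min = ceil(L * B)                # assume b(0)=0, b(1)=1: need >= w_min ones per subblock

# SECC input alphabet A of (57): all length-L subblocks meeting the constraint
A_size = sum(comb(L, w) for w in range(w_min, L + 1))
C_secc = log2(A_size) / L

# CSCC restricts every subblock to one fixed composition: best single type class
C_cscc = max(log2(comb(L, w)) for w in range(w_min, L + 1)) / L

print(C_secc, C_cscc)              # SECC beats CSCC on the noiseless channel
```

Here $|\mathcal{A}| = \binom{8}{6} + \binom{8}{7} + \binom{8}{8} = 37$, so pooling the admissible compositions yields a strictly higher noiseless rate than the best fixed composition alone.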

For CSCCs, the induced vector channel was symmetric (irrespective of whether the underlying scalar DMC is symmetric or not), and hence the capacity was achieved with a uniform distribution over the input alphabet. In contrast, in the case of SECCs the induced vector channel need not be symmetric even when the underlying DMC is symmetric. This is formalized in the following theorem, which is proved by providing a counterexample.

###### Theorem 8.

A uniform distribution of $X_1^L$ over $\mathcal{A}$ may not achieve SECC capacity even when the underlying DMC is symmetric.

###### Proof:

See Appendix H. ∎

Finding the probability distribution which achieves the maximum in (58) is not straightforward, in general. If $P_{UA}$ denotes the uniform distribution of $X_1^L$ over $\mathcal{A}$, then the maximum information rate achievable with $P_{UA}$, denoted $C^L_{UA}(B)$, acts as a lower bound for $C^L_{SECC}(B)$. Since a CSCC can be viewed as a SECC where all input vectors have the same composition, it follows that $C^L_{CSCC}(B)$ is also a lower bound for $C^L_{SECC}(B)$. Thus we have

  C^L_{SECC}(B) \geq \max\{C^L_{CSCC}(B),\, C^L_{UA}(B)\}. \qquad (59)

The following proposition is useful in reducing the computational complexity of evaluating $C^L_{UA}(B)$.

###### Proposition 3.

For a random input vector $X_1^L$ uniformly distributed over $\mathcal{A}$ with corresponding output vector $Y_1^L$, the pairwise joint probability, for $1 \leq i \leq L$, satisfies

  P_{XY}(X_i = x, Y_i = y) = \sum_{P \in \Gamma^L_B} \frac{|T^L_P|}{|\mathcal{A}|} P(x) W(y|x). \qquad (60)
###### Proof:

When $X_1^L$ is uniformly distributed over $\mathcal{A}$,

  \Pr(X_1^L \in T^L_P) = \frac{|T^L_P|}{|\mathcal{A}|}. \qquad (61)

From Prop. 1 it follows that

  \Pr(X_i = x, Y_i = y \,|\, X_1^L \in T^L_P) = P(x) W(y|x). \qquad (62)

Finally, (60) follows from (61) and (62) since $P_{XY}(X_i = x, Y_i = y)$ is equal to

  \sum_{P \in \Gamma^L_B} \Pr(X_1^L \in T^L_P) \Pr(X_i = x, Y_i = y \,|\, X_1^L \in T^L_P). \qquad (63)

∎

Another useful observation for SECCs is that if $y_1^L$ and $\tilde{y}_1^L$ are two output vectors having the same composition, then the columns of the induced vector channel transition matrix corresponding to $y_1^L$ and $\tilde{y}_1^L$ are permutations of each other. This follows from arguments similar to those presented in Appendix A for CSCCs. Thus, if $X_1^L$ is distributed uniformly over $\mathcal{A}$, then for $y_1^L$ and $\tilde{y}_1^L$ having the same composition, we have

  P_{Y_1^L}(\tilde{y}_1^L) = P_{Y_1^L}(y_1^L) = \frac{1}{|\mathcal{A}|} \sum_{x_1^L \in \mathcal{A}} W^L(y_1^L | x_1^L). \qquad (64)

The next proposition gives a computationally efficient expression for $C^L_{UA}(B)$.

###### Proposition 4.

$C^L_{UA}(B)$ can be expressed as

  \frac{1}{L} \sum_{Q \in \mathcal{Q}^L} |T^L_Q| \, P_{Y_1^L}(y_1^L) \log \frac{1}{P_{Y_1^L}(y_1^L)} \; - \; H(Y|X), \qquad (65)

where $\mathcal{Q}^L$ is the set of all compositions for output vectors of length $L$, only one representative output vector $y_1^L$ is chosen from every type class $T^L_Q$, $P_{Y_1^L}(y_1^L)$ is given by (64), and $H(Y|X)$ is evaluated using the joint pairwise probability distribution given by (60).

###### Proof:

For a DMC, we have

  C^L_{UA}(B) = \frac{1}{L}\left( H(Y_1^L) - \sum_{i=1}^{L} H(Y_i|X_i) \right), \qquad (66)

where the probability of $Y_1^L$ is given by (64). Thus, (65) follows from (66), (60), and the observation that output vectors with the same composition have equal probability when input subblocks are uniformly distributed over $\mathcal{A}$. ∎
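To make (66) concrete, the following brute-force sketch evaluates $C^L_{UA}$ for a small BSC example, again assuming the illustrative energy function $b(0)=0$, $b(1)=1$ so that $\mathcal{A}$ consists of subblocks with enough ones. It enumerates $\mathcal{A}$ and all output vectors directly rather than exploiting the type grouping of (65), which is affordable at this size.

```python
from math import ceil, log2
from itertools import product

def prod_w(x, y, W):
    """Probability of output vector y given input vector x over a memoryless channel."""
    out = 1.0
    for xi, yi in zip(x, y):
        out *= W[xi][yi]
    return out

def c_ua(L, B, p):
    """Brute-force evaluation of (66) for a BSC(p), 0 < p < 1, with X_1^L
    uniform over the SECC input alphabet A of (57); b(0)=0, b(1)=1 assumed."""
    w_min = ceil(L * B)
    A = [x for x in product([0, 1], repeat=L) if sum(x) >= w_min]
    W = [[1 - p, p], [p, 1 - p]]
    # H(Y_1^L) from the output distribution induced by the uniform input, cf. (64)
    HY = 0.0
    for y in product([0, 1], repeat=L):
        py = sum(prod_w(x, y, W) for x in A) / len(A)
        HY -= py * log2(py)
    # sum_i H(Y_i|X_i) from the pairwise joint distribution, cf. (60)
    HYX = 0.0
    for i in range(L):
        for xv in (0, 1):
            px = sum(1 for x in A if x[i] == xv) / len(A)
            for yv in (0, 1):
                HYX -= px * W[xv][yv] * log2(W[xv][yv])
    return (HY - HYX) / L

print(c_ua(4, 0.75, 0.1))
```

As sanity checks, the rate vanishes for the useless channel $p = 0.5$, and it can never exceed the noiseless ceiling $(\log_2 |\mathcal{A}|)/L$.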

As discussed earlier, the energy requirement per subblock is stricter than the average energy requirement per codeword. Hence, the capacity using codes with subblock constraint (1) cannot exceed the capacity using codes with codeword constraint (53). Since CCCs achieve capacity with a codeword constraint [46], we have

  C^L_{SECC}(B) \leq C_{CCC}(B). \qquad (67)

From (59) and (67) it follows that $C^L_{CSCC}(B) \leq C^L_{SECC}(B) \leq C_{CCC}(B)$. Further, using (34) it follows that SECC capacity tends to CCC capacity as $L \to \infty$. We will compare these capacities for different cases in the numerical results section.

## VI Real-Time Information Transfer

So far, we have ensured real-time energy transfer to the receiver by placing constraints on the subblock-composition. For information transfer, although joint decoding of all the subblocks within a codeword is preferred for reducing the probability of error, it also delays the arrival of information.

To enable real-time information transfer, the receiver may decode each subblock independently, and thus avoid waiting for the arrival of future subblocks. Here, since the decoding of a subblock begins the instant that subblock has been completely received, the information transfer delay is only due to the subblock transfer time and the corresponding decoding delay.

When each subblock within the transmitted sequence is decoded independently of other subblocks, each subblock may itself be viewed as a codeword. We will refer to the independent decoding of subblocks as local subblock decoding (LSD). We remark that this subblock-based decoding is distinct from decoding for locally decodable codes, which allow any bit of the message to be decoded with high probability by querying only a small number of received bits [50].

### VI-A Local Subblock Decoding

In the case of local subblock decoding, each subblock may be treated as an independent codeword since every subblock is decoded independently. We are interested in estimating achievable rates with bounded error probability when local subblock decoding is employed. We now provide a short review of an existing result on achievable rates for constant composition finite blocklength codes. This result will then be used (in Sec. VII) to compare rates between local (independent) subblock decoding and joint subblock decoding.

Let $M^*(n,\epsilon)$ denote the maximum size of a length-$n$ constant composition code for a DMC with average error probability no larger than $\epsilon$. When the composition of the codewords is equal to an input probability distribution which maximizes the mutual information, and the channel satisfies some regularity conditions, then [51, 52, 53]

  \log M^*(n,\epsilon) = nC - \sqrt{nV}\, Q^{-1}(\epsilon) + \frac{1}{2}\log n + O(1), \qquad (68)

where $C$ is the channel capacity, $V$ is the information variance, and $Q^{-1}$ is the inverse of the Gaussian $Q$-function [52]. We remark that $V$ is also termed the channel dispersion in the literature [54]. Early results on finite blocklength capacity for memoryless symmetric channels are due to Weiss [55]; these were generalized to the DMC and strengthened by Strassen [56].
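The normal approximation (68) is straightforward to evaluate. The sketch below implements $Q^{-1}$ by bisection and plugs in illustrative BSC values for $C$ and $V$; the BSC dispersion formula $V = p(1-p)\log_2^2((1-p)/p)$ is used here as an assumed example, not a value quoted from the paper.

```python
import math

def Qinv(eps):
    """Inverse Gaussian Q-function via bisection on Q(x) = erfc(x / sqrt(2)) / 2."""
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if math.erfc(mid / math.sqrt(2)) / 2 > eps:   # Q is decreasing in x
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def normal_approx(n, eps, C, V):
    """Right-hand side of (68) divided by n: rate (bits/use) for blocklength n,
    error probability eps, capacity C, and information variance V."""
    return C - math.sqrt(V / n) * Qinv(eps) + math.log2(n) / (2 * n)

# Illustrative BSC(0.11) parameters (dispersion formula assumed as noted above)
p = 0.11
hp = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
C = 1 - hp
V = p * (1 - p) * (math.log2((1 - p) / p)) ** 2
print(normal_approx(500, 1e-3, C, V))
```

For small $\epsilon$ the achievable rate sits strictly below capacity at moderate blocklengths, and the gap shrinks like $\sqrt{V/n}$ as $n$ grows, which is the qualitative message of (68).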

When each codeword has an equal number of ones and zeros, the achievable rate in bits per channel use for a BSC with crossover probability $p$ using CCCs is approximated as [51]:

  \frac{\log_2 M^*(n,\epsilon)}{n} \approx C - \sqrt{\frac{p(1-p)}{n}} \log_2\!\left(\frac{1-p}{p}\right) Q^{-1}(\epsilon) + \frac{\log_2 n}{2n},