Energy, Latency, and Reliability Tradeoffs in Coding Circuits†
†Part of this work was submitted for presentation at the 2016 International Symposium on Information Theory.
Abstract
It is shown that fully-parallel encoding and decoding schemes with asymptotic block error probability that scales as $O(f(n))$ have Thompson energy that scales as $\Omega\left(n\sqrt{-\ln f(n)}\right)$. It is also shown that the number of clock cycles (denoted $T(n)$) required for any encoding or decoding scheme that reaches this bound must scale as $\Omega\left(\sqrt{-\ln f(n)}\right)$. Similar scaling results are extended to serialized computation. The Grover information-friction energy model is generalized to three dimensions and the optimal energy of encoding or decoding schemes with probability of block error $P_\mathrm{e}(n)$ is shown to be at least $\Omega\left(n\left(-\ln P_\mathrm{e}(n)\right)^{1/3}\right)$.
I Introduction
Expanding on work started in [1] and more recently advanced in [2, 3, 4], we borrow a computational complexity model introduced in [5] that allows us to model the energy and number of clock cycles of a computation. We consider fundamental tradeoffs between the asymptotic energy, number of clock cycles, and block error probability for sequences of good encoders and decoders.
Definition 1.
An $f(n)$ coding scheme is a sequence of codes of increasing block length $n$, together with a sequence of encoders and decoders, in which the block error probability associated with the code of block length $n$ is less than $f(n)$ for sufficiently large $n$.
We show, in terms of (the number of clock cycles of the encoder or decoder for the code with block length ) that an coding scheme that is fully parallel has encoding and decoding energy () that scales as . We show that the energy-optimal number of clock cycles for encoders and decoders () for an coding scheme scales as , giving a universal energy lower bound of . A special case of our result is that exponentially low probability of error coding schemes thus have encoding and decoding energy that scales at least as with energy-optimal number of clock cycles that scales as . This approach is generalized to serial implementations.
Recent work on the energy complexity of good decoding has focused largely on planar circuits. However, circuits implemented in three dimensions exist [6], and so we generalize the recent information friction (or bit-meters) model introduced by Grover in [3] to circuits implemented in three dimensions and extend the technique of Grover to show that, in terms of block length $n$, a bit-meters coding scheme in which block error probability is given by $P_\mathrm{e}(n)$ has encoding/decoding energy that scales as $\Omega\left(n\left(-\ln P_\mathrm{e}(n)\right)^{1/3}\right)$. We show how this approach can be generalized to an arbitrary number of dimensions.
In Section II we discuss prior work, and in particular we discuss existing results on complexity lower bounds for different models of computation for different notions of “good” encoders and decoders. The main technical results of this work are in Section III, where we study the Thompson energy model, and in Section IV, where we study a multidimensional generalization of the Grover bitmeters model. In these sections we present lower bounds for decoders, as the derivation for encoding lower bounds is almost exactly the same. We provide an outline of the technique for encoder lower bounds in Section V. In Section VI we discuss limitations and weaknesses in the model used. In Section VII, we discuss other energy models of computation. In Section VIII we discuss possible future work, and conjecture that similar tradeoffs may extend to circuits that perform inference.
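Notation: We use standard Bachmann-Landau notation in this paper. The statement $f(n) = O(g(n))$ means that for sufficiently large $n$, $f(n) \le c\,g(n)$ for some positive constant $c$. The statement $f(n) = \Omega(g(n))$ means that for sufficiently large $n$, $f(n) \ge c\,g(n)$, again for some positive constant $c$. The statement $f(n) = \Theta(g(n))$ means that there are two positive constants $b$ and $c$ such that $b \le c$ and, for sufficiently large $n$, $b\,g(n) \le f(n) \le c\,g(n)$.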
II Prior Related Work: Computational Complexity Lower Bounds for Good Decoders and Encoders
The earliest work on computational complexity lower bounds for good decoding comes from Savage in [7] and [8], which considered bounds on the memory requirements and number of logical operations needed to compute decoding functions. However, wiring area is a fundamental cost of good decoding, and these works do not consider it. More recently, in [1], the authors use a model similar to ours, except that the notion of “area” they use is the size of the smallest rectangle that completely encloses the circuit under consideration.
In [2], Grover et al. consider the same model that we do, and find Thompson energy lower bounds as a function of block error probability for good encoders and decoders. Our analysis of the Thompson model differs from the approach of Grover et al. in a number of ways. Firstly, central to the work of Grover et al. is a bound on block error probability when the inter-subcircuit bits communicated is low (presented in Lemma 2 in the Grover et al. paper), which is analogous to our result in (4) of the proof of Theorem 1. Our result simplifies this relationship using simple probability arguments. Secondly, the Grover et al. paper does not present the energy-optimal number of clock cycles in terms of asymptotic probability of block error, nor does it present the fundamental tradeoff between number of clock cycles, energy, and reliability within the Thompson model that we present in this paper. Moreover, the technique of [2] does not extend to serial implementations.
In [4] we considered the corner case of decoding schemes in which block error probability asymptotically was less than $1/2$ for serial and parallel decoding schemes. We did not, however, analyze schemes in terms of the rate at which block error probability approaches $0$, nor did we compute the energy-optimal number of clock cycles as we do herein.
There has also been some work on complexity scaling rules for encoding and decoding of specific types of codes. Low density parity check coding VLSI scaling rules have been studied in [9, 10] and polar coding scaling rules have been studied in [11]. The scaling rules presented in this paper are general and apply to any code.
Another computational model that has proven more tractable than the Turing Time complexity model is the constant depth circuit model (see [12] for a detailed description of this model). Superpolynomial lower bounds on the size of constant depth circuits that compute certain notions of “good encoding functions” (though not decoding) were derived in [13]. In this case, the notion of “good” considered was the ability to correct at least errors at rates asymptotically above . Similar related work exists in [14] which discovered lower bounds on the formulasize of functions that perform good error control coding; similar bounds were later discovered in [15].
III Thompson Model
III-A Circuit Model
The model we will consider derives from Thompson [5]. The specific model we consider has been studied in [4, 2, 9, 10]. The reader should refer to [4] for details of the model. The important parameters to be extracted from the model are $A$, the circuit area, and $T$, the number of clock cycles in a computation. Since in this paper we are only concerned with scaling rules, we assume that both the technology constant and the wire width considered in [2, 4] are equal to $1$. The energy of a computation is thus defined as $E = AT$.
Note that a circuit can be associated with a graph in the natural way, in which a wire corresponds to an edge of the graph and a node corresponds to a vertex. An edge connects two vertices if their associated nodes are connected by wires. A diagram of a small circuit next to its associated graph is given in Fig. 1.
III-B Definitions and Lemmas
To present the main results of this paper we shall present a sequence of definitions and lemmas similar to [4, 2].
Lemma 1.
[4] Suppose that , , and are random variables that form a Markov chain and takes on values from a finite alphabet with a uniform distribution, (i.e., for all ), takes on values from a finite set , and from a set . Suppose as well that . Then:
Remark 1.
We will interpret as the set of symbols a particular subcircuit will need to estimate, as that subcircuit’s estimate of those symbols, and as the bits injected into the subcircuit during the computation. Note that this result mirrors the result of Lemma 4 in [3]. In this lemma, the author proves that if a circuit has bits to make an estimate of a random variable that is uniformly distributed over all binary strings of length , then that circuit makes an error with probability at least . Our lemma presented here includes this lemma as a special case by setting and . In this case we can infer: , where the last inequality is implied by .
Proof:
Definition 2.
A bisection of a graph of a set of vertices is a set of edges that, once removed from the graph, results in two disconnected subgraphs with vertices and in which . That is, it is the set of edges that, once removed, divides the vertices of roughly in half. The minimum bisection width of a set of vertices is the size of a smallest bisection.
Note that since a circuit is associated with a graph, we can discuss such a circuit’s minimum bisection width, that is the minimum bisection width of the graph with which it is associated. Herein we will consider bisecting the output nodes of a circuit.
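To make the notion of minimum bisection width concrete, the following is a minimal brute-force sketch for very small graphs. It is only an illustration, not a method used in the paper: the function name `min_bisection_width` and its arguments are hypothetical, and it assumes that "roughly in half" means the two sides contain numbers of target vertices differing by at most one.

```python
from itertools import product


def min_bisection_width(vertices, edges, targets):
    """Brute-force minimum bisection width of `targets` in a small undirected graph.

    Assumption: "roughly in half" means the two sides contain numbers of
    target vertices that differ by at most one.
    """
    vertices = list(vertices)
    targets = list(targets)
    best = None
    # Try every assignment of vertices to side 0 or side 1.
    for sides in product((0, 1), repeat=len(vertices)):
        side = dict(zip(vertices, sides))
        on_side_zero = sum(1 for v in targets if side[v] == 0)
        if abs(2 * on_side_zero - len(targets)) > 1:
            continue  # the targets are not split roughly in half
        # Edges whose endpoints land on different sides must be removed.
        cut = sum(1 for u, v in edges if side[u] != side[v])
        if best is None or cut < best:
            best = cut
    return best


path_edges = [(0, 1), (1, 2), (2, 3)]
cycle_edges = path_edges + [(3, 0)]
print(min_bisection_width(range(4), path_edges, range(4)))
print(min_bisection_width(range(4), cycle_edges, range(4)))
```

Passing only the output nodes as `targets` corresponds to bisecting the output nodes of a circuit, as considered herein. The search is exponential in the number of vertices, so it is suitable only for toy examples.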
Lemma 2.
All circuits whose associated graphs have minimum bisection width have circuit area .
Proof:
See Thompson [5]. ∎
We now discuss the notion of nested minimum bisection, a concept introduced by Grover et al. in [2] and also used in [4], which we present again here so that the paper is self-contained.
Suppose that a circuit has $k$ output nodes. If the output nodes of such a circuit are minimum bisected, this results in two disconnected subcircuits each with, roughly, $k/2$ output nodes. These two subcircuits can each have their output nodes minimum bisected again, resulting in four disconnected subcircuits, now each with roughly $k/4$ output nodes.
Definition 3.
This process of nested minimum bisections on a circuit, when repeated $r$ times, is called performing $r$ stages of nested minimum bisections. In the case of this paper, the set of nodes to be minimum bisected will be the output nodes. We may also refer to this process as performing nested bisections, and a circuit under consideration in which nested bisections have been performed as a nested bisected circuit. Note that we will omit the term “minimum” in discussions of such objects, as this is implicit.
Note that associated with an $r$-stage nested bisected circuit are $2^r$ subcircuits. Note as well that once a subcircuit has only one node, it does not make sense to bisect that subcircuit again. Suppose we are nested-bisecting the output nodes of a circuit with $k$ output nodes. In this case, one cannot meaningfully nested-bisect the output nodes of the circuit $r$ times if $2^r > k$.
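The bookkeeping of nested bisections can be sketched in a few lines. This is a hedged illustration that tracks only the number of output nodes per subcircuit, assuming each bisection splits a group into floor and ceiling halves; the function names are hypothetical and no actual circuit graph is bisected.

```python
def nested_bisection_sizes(num_outputs, stages):
    """Output-node counts of the subcircuits after `stages` nested bisections.

    Assumption: each bisection splits a group of s output nodes into
    groups of floor(s / 2) and ceil(s / 2) nodes.
    """
    sizes = [num_outputs]
    for _ in range(stages):
        next_sizes = []
        for s in sizes:
            next_sizes.extend([s // 2, s - s // 2])
        sizes = next_sizes
    return sizes


def max_meaningful_stages(num_outputs):
    """Largest number of stages r with 2**r <= num_outputs."""
    return num_outputs.bit_length() - 1


print(nested_bisection_sizes(10, 2))
print(max_meaningful_stages(10))
```

With 10 output nodes, three stages still leave every subcircuit with at least one output node, while a fourth stage would produce subcircuits with no output nodes at all, which is the situation excluded above.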
Note that each of the subcircuits induced by the stage nested bisection may have some internal wires, and also wires that were deleted and connect to nodes in other subcircuits. We can index the subcircuits with the symbol .
Definition 4.
Let the number of wires attached to nodes in subcircuit that were deleted in the nested bisections be . This quantity is the fanout of subcircuit .
We shall also consider the bits communicated to a given subcircuit.
Definition 5.
Let , where we recall that is the number of clock cycles used in the running of the circuit under consideration. This quantity is called the bits communicated to the th subcircuit.
We can now define an important quantity.
Definition 6.
The quantity is the intersubcircuit bits communicated.
Note that each subcircuit induced by the nested bisections will have close to $k/2^r$ output nodes (where $k$ is the total number of output nodes and $r$ is the number of stages), a consequence of choosing to bisect the output nodes at each stage. However, each subcircuit may have a different number of input nodes.
Definition 7.
This quantity is called the number of input nodes in the th subcircuit and we denote it .
Note that for all valid choices of . That is, the sum over the number of input nodes in each subcircuit is the total number of input nodes in the original circuit.
This now allows us to present an important lemma.
Lemma 3.
All fully-parallel circuits with inter-subcircuit bits communicated have product bounded by:
(1) 
where we define .
Proof:
Lemma 4.
All fully-parallel circuits with inter-subcircuit bits communicated and number of input nodes have product bounded by:
where we define .
Proof:
Definition 8.
An decoder is a circuit that computes a decoding function . It is associated with a codebook, (and therefore, naturally, an encoding function, which computes a function ), a channel statistic, (which we will assume herein to be the statistic induced by channel uses of a binary erasure channel), and a statistic from which the source is drawn (which we will assume to be the statistic generated by independent fair binary coin flips). The quantity is the block length of the code, and the quantity is the number of bits decoded.
Definition 9.
The block error probability of a decoder, denoted , is the probability that the decoder’s estimate of the original source is incorrect. Note that this probability depends on the source distribution, the channel, and the function that the decoder computes.
Definition 10.
A decoding scheme is an infinite sequence of circuits each of which computes a decoding function, with block lengths and bits decoded . They are associated with a sequence of codebooks and a channel statistic.
We assume throughout this paper that the channel statistic associated with each decoder is the statistic induced by $n$ uses of a binary erasure channel. Our lower bound results also apply to any channel that is a degraded erasure channel, including the binary symmetric channel. Our results in terms of binary erasure probability $\epsilon$ can be applied to decoding schemes for the binary symmetric channel with crossover probability $p$ by substituting $\epsilon = 2p$.
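The degradation argument can be checked with a short exact computation. The sketch below uses hypothetical names and assumes the standard construction in which every erasure at the output of a binary erasure channel is replaced by an independent fair coin flip. The resulting channel flips the transmitted bit with probability equal to half the erasure probability, so a binary symmetric channel with crossover probability p is a degraded erasure channel with erasure probability 2p.

```python
from fractions import Fraction


def cascade_transition(erasure_prob, sent_bit):
    """Output distribution of a BEC followed by 'replace erasure with a fair coin'.

    Returns a dict mapping each received bit to its probability.
    """
    half = Fraction(1, 2)
    # The bit arrives intact, or it is erased and the coin happens to match it.
    p_same = (1 - erasure_prob) + erasure_prob * half
    # The bit is erased and the coin lands on the wrong value.
    p_flip = erasure_prob * half
    return {sent_bit: p_same, 1 - sent_bit: p_flip}


# A BEC with erasure probability 1/5 degrades to a BSC with crossover 1/10.
print(cascade_transition(Fraction(1, 5), 0))
```

Exact rational arithmetic is used here only so that the transition probabilities can be compared without floating-point error.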
Definition 11.
We let denote the block error probability for the decoder with input size . We let be the rate of the decoder with input size .
We also classify decoding schemes in terms of how their probability of error scales in the definition below.
Definition 12.
An decoding scheme is a decoding scheme in which for sufficiently large the block error probability .
Definition 13.
The asymptotic rate, or more compactly, the rate of a decoding scheme is , if this limit exists, which we denote .
Note that the rate of a decoding scheme may not be the rate of any particular codebook in the decoding scheme.
Definition 14.
An exponentially-low-error decoding scheme is an $e^{-cn}$ decoding scheme for some $c > 0$ with asymptotic rate greater than $0$.
We will also consider another class of decoding schemes, one which can be considered less reliable.
Definition 15.
A polynomially-low-error decoding scheme is an $n^{-c}$ decoding scheme for some $c > 0$ with asymptotic rate greater than $0$.
We will also need to define a sublinear function, which will be used to deal with a technicality in Theorem 1.
Definition 16.
A sublinear function $f(n)$ is a function in which $\lim_{n \to \infty} f(n)/n = 0$.
III-C Main Lower Bound Results
We can now state the main theorem of this paper.
Theorem 1.
All decoding schemes associated with a binary erasure channel with erasure probability in which monotonically decreases to and in which is a sublinear function have energy that scales as
(2) 
where and complexity that scales as:
(3) 
for another positive constant .
Proof:
Associated with each decoder is its , the inter-subcircuit bits communicated. We can choose to be any function of so long as . From here on, we will suppress the dependence of , , and on . For ease of notation, let be the number of subcircuits induced by the stages of nested bisections. Consider any specific sufficiently large circuit in our decoding scheme, and suppose that . Then there exist at least subcircuits in which (where we recall is the bits communicated to the th subcircuit from Definition 5). Suppose not, i.e., that there are subcircuits with . Then, , violating the assumption that . Call the set of at least subcircuits with bits communicated to them less than . Using a similar averaging argument, we claim that within there must be one subcircuit in which . If not, i.e., if all subcircuits in have greater than input bits injected into them, then the total number of input nodes in the entire circuit is greater than , but there are only input nodes in the entire circuit. Thus, there is at least one subcircuit in in which and .
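The averaging step used twice in the paragraph above can be written out generically. The following is a sketch only: the symbols $N$, $x_i$, and $S$ are placeholders that are not part of the paper's notation, and the factor of two is illustrative rather than the exact constant of the proof. Suppose $x_1, \dots, x_N \ge 0$ and $\sum_{i} x_i < S$. If at least $N/2$ of the $x_i$ satisfied $x_i \ge 2S/N$, then

```latex
\[
  \sum_{i=1}^{N} x_i \;\ge\; \frac{N}{2}\cdot\frac{2S}{N} \;=\; S ,
\]
```

which contradicts $\sum_{i} x_i < S$. Hence more than $N/2$ of the $x_i$ must satisfy $x_i < 2S/N$. Applying the same reasoning within the resulting subset, to a second nonnegative quantity with bounded total, yields one element that is small in both quantities.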
Suppose that all the input bits injected into this special subcircuit are erased. Then, that subcircuit makes an error with probability at least by Lemma 1, since it will have to form an estimate of bits by only having injected into it fewer than bits. Thus, if then:
where this first inequality flows from summing one term in a law of total probability expansion of the probability of block error, and the second from lower bounds on these probabilities.
Combining this observation with the fact that gives us the following observation:
(4) 
This is true for any valid choice of .
Now suppose that our decoding scheme is an decoding scheme. We choose to be
so that
(5) 
This is a valid choice of because cannot grow faster than because we assumed was monotonically decreasing (easily checked by inspection). Note as well that increases with because of the sublinearity assumption of . Then, if , by directly substituting into (4),
In other words, if then our decoding scheme is not an decoding scheme. Thus, for this choice of , .
Thus, by Lemma 4,
where we substituted the value for in the first line, used the fact that in the second, and simplified the lines that followed, proving inequality (2) of the theorem. As well, by Lemma 1, using for this choice of , following a similar substitution as in the previous paragraph:
and the inequality in (3) flows from substituting the appropriate value for as defined in Lemma 1. ∎
Corollary 1.
All exponentially low error decoding schemes have energy that scales as
for all functions that increase without bound. In other words, all exponential probability of error decoding schemes have energy at least that scales very close to . Moreover, any such scheme that has energy that grows optimally, i.e. as , must have .
Proof:
Note that an exponentially low error decoding scheme has . Thus, such a scheme is also an decoding scheme, for any increasing . The result then directly flows by substituting into (2) of Theorem 1.
For the second part of the corollary, suppose that for some constant , a decoding scheme has
(6) 
We have as well from (3) and substituting
(7) 
where we use the fact that (since by definition exponentiallylow error decoding schemes have asymptotic rate greater than ).
Suppose that
(8) 
for a that grows with , i.e., that asymptotically grows slower than . Then, to satisfy (7) we need
(9) 
for all increasing , implying
To see this precisely, suppose otherwise and then it is easy to see that, combined with (8) the inequality in (9) will be unsatisfied. If this is true, however, then the product
Since this is true for all increasing , it is true for, say, , implying that the product grows strictly faster than , contradicting the assumption of (6). ∎
We generalize Corollary 1 to decoding schemes with different asymptotic block error probabilities below:
Theorem 2.
Proof:
As well, suppose for some increasing . Then, from the bound (11) (to prove this, suppose otherwise and derive a contradiction). This implies then that , contradicting (10).
Moreover, for all growing slower than that required for optimal energy, this implies that , which implies ∎
Corollary 2.
All polynomially-low-error decoding schemes have energy that scales at least as
(12) 
If this optimum is reached, then .
III-D Serial Decoding Scheme Scaling Rules
Let the number of output nodes in a particular decoder be denoted (in a decoding scheme this will be a function of ).
Definition 17.
A serial decoding scheme is one in which is constant.
In [4] we considered the case of allowing the number of output nodes to increase with increasing block length. We required an assumption that such a scheme be output regular, which we define below.
Definition 18.
[4] An output regular circuit is one in which each output node of the circuit outputs exactly one bit of the computation at specified clock cycles. This definition excludes circuits where some output nodes output a bit during some clock cycle and other output nodes do not during this clock cycle. An output regular decoding scheme is one in which each decoder in the scheme is an output regular circuit.
Theorem 3.
All serial decoding schemes have energy that scales as .
Proof:
The lower bound flows from following the arguments of the proof of Theorem 2 in [4], by showing that any decoding scheme in which the area scales less than cannot be an decoding scheme.∎
Theorem 4.
All output regular increasing-output-node decoding schemes have energy that scales as .
Proof:
From the derivations preceding equation (13) in [4], following a similar argument as in this paper, we divide the circuit into epochs as before, and divide the subcircuits into subcircuits through nested bisections. With this choice, we can follow the same arguments used in Theorem 3 in [4], and derive that all decoding schemes must have
∎
IV Information Friction in Three-Dimensional Circuits
The “information friction” computational energy model was introduced by Grover in [3] and further studied by Vyavahare et al. in [16] and Li et al. in [17]. We generalize (and slightly modify) this model to three dimensions and use an approach similar to Grover's to obtain some nontrivial lower bounds on the energy complexity of three-dimensional bit-meters decoder circuits, in terms of block length and probability of error. We will discuss how this approach can be generalized to models in arbitrary numbers of dimensions. We present the model below and then prove our main complexity result.

A circuit is a grid of computational nodes at locations in the set $\mathbb{Z}^d$, where $\mathbb{Z}$ is the set of integers. Some nodes are input nodes, some are output nodes, and some are helper nodes. Note that Grover [3] considers this model in terms of a parameter characterizing the distance between the nodes, but since we are concerned with scaling rules, we will assume that they are placed at integer locations, allowing us to avoid unnecessary notation. The Grover paper considered scaling rules in which nodes are placed on a plane, in which the number of dimensions $d = 2$. In our results we will discuss the case of $d = 3$ and afterwards discuss how the approach can be generalized to an arbitrary number of spatial dimensions.

A circuit is to compute a function of binary inputs and binary outputs.

At the beginning of a computation, the inputs to the computation are injected into the input nodes. At the end of the computation the outputs should appear at an output node. A node can be both input and output.

A node can communicate messages along its links to any other node, and can receive bits communicated to it from any other node.

Each node has constant memory, and can compute any computable function of the inputs it has received throughout the computation that are stored in its memory, to produce a message that it can send to any other node.

We associate a computation with a directed multigraph, that is, a set of edges linking the nodes. For every computation, there is one edge per bit communicated along a link in the computation’s associated multigraph. The “cost” of an edge in such a multigraph is the Euclidean distance between the two nodes that it connects. Note that if a node communicates bits to another node in a computation, then that computation’s associated multigraph must have edges connecting the two nodes. This multigraph is called a computation’s communication multigraph.

The energy, or the bit-meters, denoted of a computation is the sum of the costs of all the edges in the computation’s associated multigraph (that is, the sum of the Euclidean distances of all the edges).
We consider a grid of three-dimensional cubes, with “inner cubes” nested within them. This object is a generalization of the “stencil” object defined by [3].
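As a small illustration of this energy measure, the sketch below computes the bit-meters of a computation from a list of communications. The representation is an assumption made for the example: each entry gives the source location, the destination location, and the number of bits sent, which stands for that many parallel edges of the communication multigraph. The function name `bit_meters` is hypothetical.

```python
import math


def bit_meters(communications):
    """Sum of Euclidean edge lengths over all bits communicated.

    `communications` is an iterable of (source, destination, bits) triples,
    where source and destination are integer coordinate tuples.
    """
    return sum(bits * math.dist(src, dst) for src, dst, bits in communications)


example = [
    ((0, 0, 0), (3, 4, 0), 2),  # two bits over distance 5
    ((0, 0, 0), (0, 0, 1), 1),  # one bit over distance 1
]
print(bit_meters(example))
```

The example totals 2 x 5 + 1 x 1 = 11 bit-meters. The call to `math.dist` requires Python 3.8 or later.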
Definition 19.
An nested cube grid is an infinite grid of cubes, with side length and inner cube side length . Note that the inner cubes are centered within the outer cubes. Fig. 2 shows a diagram of one cube in a nested cube grid, to which the reader can refer to visualize this nested cube structure. A set of nested cube grid parameters is valid if and .
Note that a nested cube grid can be placed conceptually on top of a bit meters circuit. We will consider placing a nested cube grid in parallel with the Cartesian space that defines our circuit. We can specify the position of a nested cube grid that is parallel to a set of Cartesian coordinates by calling one of the corners of an outer cube the origin, and then specify the location of its origin. A particular set of parameters for a nested cube grid and a location for its origin (called its orientation) induces a set of subcircuits, defined below.
Definition 20.
A subcircuit, associated with a particular orientation of a nested cube grid, is the part of a bit-meters circuit within a particular outer cube.
Nodes in any subcircuit can thus be considered to be either inside an inner cube or outside an inner cube. For any circuit with finite number of nodes there will thus be some cubes that contain computational nodes, and some that do not. We can label the subcircuits that contain nodes with the index . The number of input nodes in cube we denote . The number of output nodes in subcircuit we denote . Furthermore, we denote the number of input nodes within the inner cube of subcircuit as .
Definition 21.
We define , which is the number of output nodes within inner cubes, which we will often simply refer to with the symbol .
We will show in Lemma 6 that there exists a nested cube grid orientation in which is high.
Definition 22.
The internal bit meters of a subcircuit is the length of all the communication multigraph edges completely within subcircuit , plus the length of the parts of the edges within subcircuit . This quantity is denoted with the symbol . Note that (where we may have to sum over some subcircuits that do not contain any nodes).
Since a computation has associated with it its communication multigraph, for a given subcircuit we can consider the subgraph formed by all the paths that start outside of the cube and end inside the inner cube. We can group all the vertices of this graph that start outside the outer cube and call this the source, and group all vertices inside an inner cube and call it the sink. For this graph we can consider its mincut, the minimum set of edges that, once removed, disconnects the source from the sink.
Definition 23.
The number of bits communicated from outside a cube to within an inner cube, or, bits communicated, is the size of this minimum cut. For a particular subcircuit we refer to this quantity with the symbol .
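The minimum cut in this definition can be computed for small examples with a standard augmenting-path maximum-flow routine, since the maximum flow equals the minimum cut when every multigraph edge has unit capacity. The sketch below is an illustration under assumptions: the function name, the edge-list representation, and the use of breadth-first augmenting paths are choices made here, not part of the paper.

```python
from collections import defaultdict, deque


def min_cut_size(edges, sources, sinks):
    """Minimum number of directed edges separating `sources` from `sinks`.

    `edges` is a list of (u, v) pairs; repeated pairs are parallel edges.
    """
    cap = defaultdict(int)
    adj = defaultdict(set)
    super_source, super_sink = object(), object()

    def add(u, v, c):
        cap[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)  # residual direction

    for u, v in edges:
        add(u, v, 1)
    big = len(edges) + 1  # larger than any possible cut
    for s in sources:
        add(super_source, s, big)
    for t in sinks:
        add(t, super_sink, big)

    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {super_source: None}
        queue = deque([super_source])
        while queue and super_sink not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if super_sink not in parent:
            return flow
        # Find the bottleneck capacity along the path, then push flow.
        bottleneck, v = big, super_sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[(parent[v], v)])
            v = parent[v]
        v = super_sink
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= bottleneck
            cap[(v, u)] += bottleneck
            v = u
        flow += bottleneck


toy_edges = [("a", "c"), ("a", "c"), ("b", "c"), ("c", "d")]
print(min_cut_size(toy_edges, ["a", "b"], ["d"]))
print(min_cut_size(toy_edges, ["a", "b"], ["c"]))
```

In the setting above, `sources` would be the vertices outside the outer cube and `sinks` the vertices inside the inner cube.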
Remark 2.
This quantity is analogous to (but not the same as) the quantity for the Thompson circuit model from Definition 5, and thus we use the same symbol. The reader should not confuse these symbols; the Thompson model definition applies to discussions in Section III, and the bit-meters model definition applies in this section, Section IV.
If the internal bits of a subcircuit are fixed, then the subcircuit inside an inner cube will compute a function of the messages passed from outside the outer cube. Clearly, the size of the set of possible messages injected into this internal cube is (since is the min cut of the paths leading from outside to inside.)
Lemma 5.
All subcircuits with bits communicated have internal bit meters at least .
Proof:
Remark 3.
This lemma makes rigorous the idea that to communicate bits from outside a subcircuit to within its inner cube, the bit-meters this takes is proportional to the distance from outside an outer cube to within an inner cube () and the number of bits communicated.
In the lemma below we show that there exists an orientation of any nested cube grid such that is high.
Lemma 6.
For all three-dimensional bit-meters circuits with output nodes, all valid nested cube grid parameters and , there exists an orientation of an nested cube grid in which the number of output nodes within inner cubes () is bounded by:
Remark 4.
Note that the relative volume of the inner cubes is This lemma says there exists an orientation of any nested cube grid in which the fraction of output nodes within inner cubes is at least this fraction, so this result is not surprising.
Proof:
This is a natural generalization of the Grover result (See Lemma 2 of [3]), which uses the probabilistic method. We consider placing the origin of an nested cube grid uniformly randomly within a cube of side length centered at the origin in the Cartesian space. We index the output nodes by . Let be the indicator random variable that is equal to if output node is within an inner cube. Then, given the uniform measure on the position of the cube, the quantity is a random variable. We observe:
where we use the observation that, for each output node, the probability that it is in an inner cube is proportional to the relative volume of the inner cube. Thus, the expected value of is and so there must be at least one nested cube grid orientation in which is greater than or equal to that value.∎
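The averaging idea behind this proof can be checked exhaustively in a discretized setting. The sketch below is an illustration under assumptions that differ slightly from the proof: grid origins are restricted to integer offsets, the inner cube is taken as a half-open block of integer positions, and the names `inner_count` and `best_orientation` are hypothetical. Under these assumptions, the counts summed over all offsets equal the number of nodes times the inner volume, so some offset must place at least the average fraction of nodes inside inner cubes.

```python
from itertools import product


def inner_count(nodes, outer, inner, offset):
    """Number of nodes that fall inside an inner cube for a given grid offset."""
    lo = (outer - inner) // 2  # inner cube starts this far inside each outer cube
    count = 0
    for node in nodes:
        if all(lo <= (x - o) % outer < lo + inner for x, o in zip(node, offset)):
            count += 1
    return count


def best_orientation(nodes, outer, inner):
    """Integer offset maximizing the number of nodes inside inner cubes."""
    best_offset = max(
        product(range(outer), repeat=3),
        key=lambda off: inner_count(nodes, outer, inner, off),
    )
    return best_offset, inner_count(nodes, outer, inner, best_offset)


nodes = [(0, 0, 0), (1, 0, 0), (5, 5, 5), (2, 3, 1), (7, 2, 6)]
offset, count = best_orientation(nodes, outer=4, inner=2)
print(offset, count)
```

Here the average over offsets is 5 x (2/4)^3 = 0.625 nodes, so the best offset places at least one node inside an inner cube.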
Lemma 7.
For all valid nested cube parameters and , and thus for sufficiently large .
Proof:
Intuitively, there cannot be more than on the order of inner nodes in a cube of volume . The bound comes from considering the corner case of a cube whose sides exactly touch output nodes. ∎
We can now state the main results of this section.
Theorem 5.
All 3D bit-meters decoders for a binary erasure channel with erasure probability of sufficiently large block length with block error probability have bit-meters bounded by:
Proof:
We consider the number of bits communicated from outside a subcircuit to within the inner cube of subcircuit (). It must at least be to overcome the case that all the input nodes in the entire cube are erased. If this does not happen, then one of the output nodes must guess at least one bit, making an error with probability at least , formally justified by Lemma 1. This allows us to argue that:
(14)  
If then there exists a subcircuit indexed by in which . Suppose otherwise, i.e. that for all , then:
where we apply Lemma 5 after the first inequality, and for convenience suppress the subscript on the summation sign after the first instance. This contradicts our assumption that .
We choose the parameter in terms of probability of error in order to derive a contradiction if a circuit does not have high enough bitmeters. Specifically, we choose
(15) 
Consider the nested cube structure that has that must exist by Lemma 6. If then there must exist a subcircuit that has less than bits injected into it from outside the subcircuit to within its inner cube. Thus:
where (a) flows from (14), (b) from Lemma 7, and (c) from the evaluation of this expression by substituting (15). This is a contradiction. Thus, all bit meters decoders must have
The second inequality flows from the fact that we are considering the nested cube structure in which that must exist by Lemma 6. We may choose any valid to maximize this bound, and letting gives us:
∎
Remark 5.
Note that this argument naturally generalizes to $d$-dimensional space, in which all $d$-dimensional bit-meters decoders have energy that scales as $\Omega\left(n\left(-\ln P_\mathrm{e}\right)^{1/d}\right)$. The key step in the proof to be altered is in a modification of Lemma 7 and a choice of in (15) of the proof for some constant that may vary depending on the dimension. This implies, among other things, that exponentially low probability of error decoding schemes implemented in $d$ dimensions have bit-meters energy that scales as $\Omega\left(n^{1+1/d}\right)$. Obviously, the most engineering-relevant numbers of dimensions for this type of analysis are $d = 2$ and $d = 3$.
V Encoder Lower Bounds
In terms of scaling rules, all the decoder lower bounds presented herein can be extended to encoder lower bounds. The main structure of the decoder lower bounds (inspired by [2, 3]) involves dividing the circuit into a certain number of subcircuits. Then, we argue that if the number of bits communicated within the circuit is low, then there must be one subcircuit where the bits communicated to it are fewer than the bits it is responsible for decoding. If all the input bits in that subcircuit are erased, the decoder must make an error with probability at least .
In the encoder case, we also take inspiration from [2, 3]. Here, the outputs of the encoder circuit are divided into a certain number of subcircuits, and we consider the bits communicated out of each subcircuit. This quantity must be proportional to the number of output bits in each subcircuit. Otherwise, there will be at least one subcircuit where the number of bits communicated out is less than the number of output nodes in the subcircuit. Call these bits that were not fully communicated out of this subcircuit . Suppose that once the output bits of the encoder are injected into the channel, all the bits in are erased. The decoder must then use the other bits of the code to decode. But the subcircuit containing in the encoder communicated fewer than bits to the other outputs of the encoder. By directly applying Lemma 1, we see that no matter what function the decoder computes, it must make an error with probability at least . An argument of this form, following exactly the structure of Theorems 1, 2, 3, 4, and 5 for the decoders, gives us the following theorems, whose proofs are omitted.
Theorem 6.
All fully-parallel encoding schemes with number of clock cycles have energy
with optimal lower bound of when .
All serial, encoding schemes have energy that scales as
All increasing output node, output-regular encoding schemes have energy that scales as
Finally, all three-dimensional, bit-meters encoding schemes associated with block error probability have energy that scales as
VI Limitations of Results
There are a number of weaknesses in the models we have used. Firstly, our results are asymptotic. For a fixed block error probability and rate, there may be a specific circuit that attains this block error probability using a design methodology that does not generalize and therefore need not follow the scaling our theorems predict.
Note that our quantity refers to the number of clock cycles, which reflects one of the main “time costs” of a circuit computation. In real circuits, the time cost of a computation involves two parameters: the number of clock cycles required and the duration of each clock cycle. Our model does not consider the duration of a clock cycle, which in real circuits often varies with wire lengths.
A particular weakness of the Thompson model we use is that it does not consider a quantity called the switching activity factor. In circuit design, this quantity is the fraction of the circuit that “switches” during the course of the computation. Our model, however, assumes a switching activity factor of . Thus, in terms of scaling rules, the Thompson model should be considered applicable only to computational schemes in which the switching activity factor does not change with increasing input size. On the other hand, the information-friction model accounts for the possibility of schemes in which the switching activity factor changes with increasing block length, so, combined with the results of Grover [3], the asymptotic energy lower bounds we derive apply.
VII Other Energy Models of Computation
There has been some work on energy models of computation different from the Thompson energy models and Grover information friction models, and herein we provide a short review.
In [20], Bingham et al. classify the tradeoffs between the “energy” complexity and the “time” complexity of parallel algorithms for the problems of sorting, addition, and multiplication, using a model similar to, but not the same as, the model we use. In the grid model used by these authors, a circuit is composed of processing elements laid out on a grid, in which each element can perform an operation. In this model the circuit designer can choose the speed of each operation, but higher speed comes at an energy cost. Running real circuits at higher voltages can lower the delay of each processing element but raises the energy [21]. The model used by the authors in [20] captures some of this fundamental tradeoff. Note that our model assumes constant voltage. Nontrivial results showing how real energy gains can be obtained by lowering voltages in decoder circuits have been presented in [22], but we do not study this here.
Another energy model of computation was presented by Jain et al. in [23]. This model introduced an augmented Turing machine, a generalization of the traditional Turing machine [24]. The authors introduce a transition function, mapping the current instruction being read, the current state, the next state and the next instruction to the “energy” required to make this transition. This model (once the transition function is clearly defined for a specific processor architecture) would be good for the algorithm designer at the software level. However, we do not believe this model informs the specialized circuit designer. The Thompson model which we analyze, on the other hand, can include, as a special case, the energy complexity of algorithms implemented on a processor, as our model allows for a composition of logic gates to form a processor.
Landauer [25] derives that the energy required to erase one bit of information is at least $kT \ln 2$, where $k$ is Boltzmann’s constant and $T$ is the temperature. Thus, a fundamental limit of computation comes from having to erase information. Of course, it may be possible to do reversible computation, in which no information is erased, using arbitrarily small amounts of energy, but such circuits must be run arbitrarily slowly. This suggests a fundamental time-energy tradeoff different from the tradeoff discussed herein. Landauer [26], Bennett [27] and Lloyd [28] provide detailed discussions and bibliographies on this line of work. Demaine et al. [29] extract a mathematical model from this line of work and analyze the energy complexity of various algorithms within this model. Note that the Thompson model we use is one informed by how modern VLSI circuits are created, even though they operate at energies far above ultimate physical limits.
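To give a sense of scale for the Landauer limit discussed above, the following minimal sketch evaluates the standard form of the bound, kT ln 2 joules per erased bit. The function name and the choice of 300 K as a representative room temperature are illustrative assumptions, not part of the paper.

```python
import math

# Boltzmann's constant in joules per kelvin (exact value in the 2019 SI).
BOLTZMANN_J_PER_K = 1.380649e-23


def landauer_bound_joules(temperature_kelvin: float, bits: float = 1.0) -> float:
    """Minimum energy, in joules, needed to erase `bits` bits of information
    at the given absolute temperature, according to Landauer's principle."""
    if temperature_kelvin <= 0:
        raise ValueError("temperature must be positive (kelvin)")
    return bits * BOLTZMANN_J_PER_K * temperature_kelvin * math.log(2)


# Assumed room temperature of 300 K (illustrative choice).
room_temperature_bound = landauer_bound_joules(300.0)
print(f"{room_temperature_bound:.3e} J per erased bit")
```

At 300 K the bound is roughly 2.87e-21 J per bit, which illustrates how far below practical circuit energies this physical limit sits.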
VIII Future Work
Currently, our work on lower bounds has not been extended to other channels, such as the additive white Gaussian noise channel. Perhaps more interesting, however, is the question of whether there exist polynomially low probability of error decoding schemes with energy that closely matches (12) of Corollary 2, i.e., with energy that scales as . Such a scheme may have significantly lower energy than an exponentially-low error decoding scheme, and may provide sufficient error control performance. We do not know whether such a decoding scheme exists, and this remains an important open question. It may be that decoding strategies with energy that scales like this have already been invented but have simply not been analyzed in terms of their energy complexity.
The decoding problem for communication systems is a special case of the more general problem of inference. Well-known algorithms used for inference, for example the Sum-Product Algorithm [30] and variational methods [31], include Gallager’s low-density parity-check decoding algorithms as a special case [32]. Thus, we conjecture that there may be similar tradeoffs between energy, latency, and reliability in circuits that perform inference.
References
 [1] A. El Gamal, J. Greene, and K. Pang, “VLSI complexity of coding,” The MIT Conf. on Adv. Research in VLSI, 1984.
 [2] P. Grover, A. Goldsmith, and A. Sahai, “Fundamental limits on the power consumption of encoding and decoding,” in Proc. 2012 IEEE Int. Symp. Info. Theory, 2012, pp. 2716–2720.
 [3] P. Grover, “Information friction and its implications on minimum energy required for communication,” IEEE Trans. Inf. Theory, vol. 61, no. 2, pp. 895–907, Feb 2015.
 [4] C. G. Blake and F. R. Kschischang, “Energy consumption of VLSI decoders,” IEEE Trans. Inf. Theory, vol. 61, no. 6, pp. 3185–3198, June 2015.
 [5] C. D. Thompson, “Area-time complexity for VLSI,” Proc. 11th Ann. ACM Symp. Theory of Comput., pp. 81–88, 1979.
 [6] Y. Xie, J. Cong, and S. S. Sapatnekar, Three-dimensional integrated circuit design: EDA, design and microarchitectures. New York, NY, USA: Springer Verlag, 2010.
 [7] J. E. Savage, “Complexity of decoders: Iclasses of decoding rules,” IEEE Trans. Inf. Theory, vol. 15, no. 6, pp. 689–695, Nov 1969.
 [8] ——, “The complexity of decoders – part ii: Computational work and decoding time,” IEEE Trans. Inf. Theory, vol. 17, no. 1, pp. 77–85, January 1971.
 [9] C. G. Blake and F. R. Kschischang, “On the energy complexity of LDPC decoder circuits,” CoRR, vol. abs/1502.07999, Feb. 2015. [Online]. Available: http://arxiv.org/abs/1502.07999
 [10] K. Ganesan, P. Grover, J. Rabaey, and A. Goldsmith, “On the total power capacity of regular-LDPC codes with iterative message-passing decoders,” IEEE Journal on Selected Areas in Communications, vol. 34, no. 2, pp. 375–396, Feb 2016.
 [11] C. G. Blake and F. R. Kschischang, “On scaling rules for energy of VLSI polar encoders and decoders,” 2016, in preparation.
 [12] S. Arora and B. Barak, Computational Complexity: A Modern Approach. New York, NY, USA: Cambridge University Press, 2009.
 [13] S. Lovett and E. Viola, “Bounded-depth circuits cannot sample good codes,” 2012, available at author’s homepage: http://www.ccs.neu.edu/home/viola/papers/LoV.pdf.
 [14] K. L. Rychkov, “A modification of Khrapchenko’s method and its applications to bounds on the complexity of π-schemes and coding functions,” Met. Disk. Anal. Theor. Graph. Skhem., vol. 42, pp. 91–98, 1985.
 [15] A. Kojevnikov and A. S. Kulikov, “Lower bounds on formula size of error-correcting codes,” 2007, unpublished manuscript, available at author’s homepage: http://logic.pdmi.ras.ru/ arist/papers/hamming.pdf.
 [16] P. Vyavahare, M. Mahzoon, P. Grover, N. Limaye, and D. Manjunath, “Information friction limits on computation,” in Communication, Control, and Computing (Allerton), 2014 52nd Annual Allerton Conference on, Sept 2014, pp. 93–100.
 [17] T. Li, M. Bakshi, and P. Grover, “Energy-efficient decoders for compressive sensing: Fundamental limits and implementations,” CoRR, vol. abs/1411.4253, 2015. [Online]. Available: http://arxiv.org/abs/1411.4253
 [18] K. Menger, “Zur allgemeinen Kurventheorie,” Fund. Math., vol. 10, pp. 96–115, 1927.
 [19] F. Göring, “Short proof of Menger’s theorem,” Discrete Mathematics, vol. 219, pp. 295–296, 2000.
 [20] B. D. Bingham and M. R. Greenstreet, “Modeling energy-time tradeoffs in VLSI computation,” IEEE Trans. Computers, vol. 61, no. 4, April 2012.
 [21] B. Hoeneisen and C. A. Mead, “Fundamental limitations in microelectronics – I. MOS technology,” Solid-State Electronics, vol. 15, pp. 819–829, 1972.
 [22] F. Leduc-Primeau, F. R. Kschischang, and W. Gross, “Modeling and energy optimization of LDPC decoder circuits with timing violations,” CoRR, vol. abs/1503.03880, 2015. [Online]. Available: http://arxiv.org/abs/1503.03880
 [23] R. Jain, D. Molnar, and Z. Ramzan, “Towards a model of energy complexity for algorithms [mobile wireless applications],” in 2005 IEEE Wireless Communications and Networking Conference, vol. 3, March 2005, pp. 1884–1890.
 [24] A. M. Turing, “On computable numbers, with an application to the Entscheidungsproblem,” Journal of Math, vol. 58, 1936.
 [25] R. Landauer, “Irreversibility and heat generation in the computing process,” IBM Journal of Research and Development, vol. 5, no. 3, pp. 183–191, July 1961.
 [26] ——, “Dissipation and noise immunity in computation and communication,” Nature, vol. 335, no. 27, Oct. 1988.
 [27] C. H. Bennett, “The thermodynamics of computation  a review,” International Journal of Theoretical Physics, vol. 21, no. 12, 1982.
 [28] S. Lloyd, “Ultimate physical limits to computation,” Nature, vol. 406, August 2000.
 [29] E. D. Demaine, J. Lynch, G. J. Mirano, and N. Tyagi, “Energy-efficient algorithms,” in Proceedings of the 2016 ACM Conference on Innovations in Theoretical Computer Science, ser. ITCS ’16. New York, NY, USA: ACM, 2016, pp. 321–332. [Online]. Available: http://doi.acm.org/10.1145/2840728.2840756
 [30] F. Kschischang, B. Frey, and H. A. Loeliger, “Factor graphs and the sum-product algorithm,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 498–519, Feb 2001.
 [31] M. J. Wainwright and M. I. Jordan, “Graphical models, exponential families, and variational inference,” Foundations and Trends in Machine Learning, vol. 1, no. 1–2, 2008.
 [32] R. Gallager, “Low-density parity-check codes,” IRE Trans. Inf. Theory, vol. 8, no. 1, pp. 21–28, 1962.