# Classification in a Large Network

###### Abstract

We construct and analyze the communication cost of protocols (interactive and one-way) for classifying , in a network with nodes, with known only at node . The classifier takes the form , with weights . The interactive protocol (a zero-error protocol) exchanges a variable number of messages depending on the input, and its sum rate is directly proportional to its mean stopping time. An exact analysis and an approximation of the mean stopping time are presented; they show that it depends on , where and , with being the number of positive weights. In particular, the mean stopping time grows logarithmically in when , and is bounded in otherwise. Comparisons show that the sum rate of the interactive protocol is smaller than that of the one-way protocol when the error probability for the one-way protocol is small, with the reverse being true when the error probability is large. Comparisons of the interactive protocol are also made with lower bounds on the sum rate.

Index terms—Interactive communication, distributed function computation, hypothesis testing, sequential hypothesis testing, classification.

## I Introduction

We study the communication cost of implementing a 2-classifier which maps input vector to a class label in the set . It is assumed that the classifier is to be implemented in a network with sensor nodes, where is a random variable known at the th node alone. We propose a distributed algorithm for solving the classification problem and provide an analysis of the communication rate required by this algorithm. We also investigate lower bounds on the communication rate.

## II Background

Given a function , where and are arbitrary sets, its communication complexity [10], is , where is the communication cost of protocol for computing in a distributed environment. Here, the average-case rather than worst-case is considered, the average being over all inputs with respect to a known probability distribution [6].

Communication complexity for interactive function computation of Boolean functions is explored in great detail in the seminal work [6], which has also influenced the development here. Our paper addresses a problem of current interest since classification is an important part of machine learning. Interactive and one-way hypothesis testing were studied information theoretically (i.e. as the block length grows) in [9] for nodes and rounds. Here, we assume a block length of unity, the methods used are from sequential analysis, and the results derived are for infinite round protocols for nodes. We note that classification is also a central operation in nearest lattice point search [1], [8]. The bit exchange protocol described here is in a class of protocols described in [5].

## III Problem Setup

Since is known at node and is known globally, the quantity can be treated as a new , with a modified probability distribution, known to all nodes. If for some , it can be dropped and treated as a problem with a smaller . Thus it suffices to consider classifiers with filter coefficients , with magnitudes of the ’s absorbed into the probability distribution . Let and . Let , the number of positive filter weights. Then . Let

(1) |

Our classification function

(2) |

partitions into two classes and its complement .

We assume here that are independent, identically distributed (iid) random variables, uniformly distributed on the unit interval . Note that under our assumptions for the source distribution, the classification problem is non-trivial only if lies in the two-sided open interval .

## IV The Infinite Round Bit-Exchange Protocol

Nodes exchange information in a pre-arranged manner and any information transmitted by node- at time can depend only on and information that it has received from other nodes at times . We do not assume a broadcast model for counting bits, so every transmitted bit is indexed by and , the source, destination and transmission time, respectively. The cost is the sum of communication over .

The Interactive Protocol computes as follows. A node is selected as a leader node. Here, we assume that the leader node is distinct from the sensor nodes and at the outset of a session has no access to . (Footnote 1: As will be apparent later, we could have chosen one of the sensor nodes as a leader node at a slightly lower communication cost. The corrections required for this are simple and negligible relative to the communication cost.) The order of communication is known to all nodes. The time axis is broken into sessions; a session starts with a fresh observation and concludes when is computed. Here we are concerned with a single session. A session is divided up into rounds.

Node- produces a bit stream according to the following standard binary expansion rule: set and for compute

(3) |

Let

(4) |

and

with . In round , node- sends to the leader node and after the leader node receives all the bits , , it computes and sends back to each sensor node, where

(5) |

with and . Observe that and , .

Let be the stopping time, i.e. is the smallest for which . Since the leader node sends back bits at the conclusion of every round, the total number of bits communicated in a session is where is the mean stopping time, over all inputs .
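The round structure described above can be sketched as a small Python simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes weights in {+1, -1}, inputs known to `B` bits of precision, and a leader that stops as soon as the threshold falls outside the interval of sums consistent with the bits received so far. The names `run_session` and `mean_stopping_time` are hypothetical. Ties between the weighted sum and the threshold, which have probability zero for continuous inputs, are not treated specially.

```python
import random
from fractions import Fraction

def run_session(x_ints, h, t, B):
    """Simulate one session of the bit-exchange idea (illustrative sketch).

    x_ints[i] : integer in [0, 2**B); node i observes x_ints[i] / 2**B
    h[i]      : weight of node i, assumed to be +1 or -1
    t         : threshold (int or Fraction)
    Returns (rounds_used, label), label = +1 if sum_i h[i]*x[i] >= t else -1.
    """
    n = len(h)
    m = sum(1 for w in h if w > 0)        # number of positive weights
    for k in range(1, B + 1):
        delta = Fraction(1, 2 ** k)       # uncertainty left in each x_i
        # Weighted sum of the k-bit truncations known to the leader.
        s = sum(w * Fraction(xi >> (B - k), 2 ** k) for w, xi in zip(h, x_ints))
        lower = s - (n - m) * delta       # smallest sum consistent with the bits
        upper = s + m * delta             # largest sum consistent with the bits
        if t <= lower:
            return k, +1
        if t >= upper:
            return k, -1
    # Undecided after B rounds: resolve with the full-precision value.
    s = sum(w * Fraction(xi, 2 ** B) for w, xi in zip(h, x_ints))
    return B, (+1 if s >= t else -1)

def mean_stopping_time(h, t, trials, B=32, seed=0):
    """Monte Carlo estimate of the mean number of rounds per session."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        x_ints = [rng.getrandbits(B) for _ in h]
        total += run_session(x_ints, h, t, B)[0]
    return total / trials
```

For two nodes with weights (+1, -1) and threshold 0, a session under these assumptions ends at the first round in which the two transmitted bits differ, so the estimate returned by `mean_stopping_time([1, -1], 0, trials)` should be close to 2 rounds.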

###### Lemma 1.

If at the end of round , , then either or .

###### Proof.

This follows from the fact that and the definition of in (2). ∎

### IV-A A Recursive Description of the Protocol

For the purpose of analysis, it is convenient to have an alternate description, referred to as the recursive description, of the protocol. Towards this end consider the following alternate description of the feedback signal from leader to sensor. Let

(6) |

where , ,

(7) |

Observe that and , .

###### Remark 1.

From the form of and , it is clear that at iteration , the effective threshold is . Thus for the recursive protocol, the threshold shifts with each iteration, but the width of the interval remains fixed, as opposed to the non-recursive description, where decreases with .

###### Lemma 2.

###### Proof.

Observe that is determined by testing against thresholds . Similarly is obtained by testing against . Since , it suffices to show that

(8) |

with a similar statement for and . Eq. (8) is clearly true for since and and . Assume it holds for . Then

(9) | |||||

∎
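The recursive description can be illustrated with a short Python sketch. It assumes a threshold update of the form `T <- 2*T - sum_i h[i]*b[i]`, which is what one obtains by rescaling each residual input back to the unit interval after its bit is sent, together with a fixed stopping interval `(-(n - m), m)`. This is an assumption about the form of (6) and (7), and the function name `recursive_session` is hypothetical.

```python
from fractions import Fraction

def recursive_session(bit_columns, h, t):
    """Run the recursive form on a fixed list of bit columns (illustrative sketch).

    bit_columns[k-1][i] : bit sent by node i in round k
    h[i]                : weight of node i, assumed to be +1 or -1
    t                   : initial threshold
    Returns (stop_round, label, thresholds). stop_round and label are None if
    the bits run out before the threshold leaves the interval (-(n - m), m).
    """
    n = len(h)
    m = sum(1 for w in h if w > 0)        # number of positive weights
    T = Fraction(t)
    history = []
    for k, col in enumerate(bit_columns, start=1):
        # Shift the threshold; the interval width stays fixed at n.
        T = 2 * T - sum(w * b for w, b in zip(h, col))
        history.append(T)
        if T <= -(n - m):                 # residual sum always exceeds T
            return k, +1, history
        if T >= m:                        # residual sum is always below T
            return k, -1, history
    return None, None, history
```

With weights (+1, -1), threshold 0 and bit columns `[(1, 1), (0, 0), (1, 0)]`, the thresholds visited are 0, 0, -1, so the sketch stops in round 3. An integer initial threshold stays an integer throughout, which is the property the exact analysis relies on.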

For the analysis presented later we observe that

###### Lemma 3.

(10) |

###### Proof.

To see that (10) is true, let where is the number of 1’s in positions and the number of ’s in the positions . Then the number of ’s in the positions is and if and only if . ∎

In order to understand how scales with we now proceed to analyze . We first present a computational approach when is an integer, followed by an asymptotic analysis for general .

### IV-B Exact Analysis

We first present the analysis for where the only non-trivial case is . Node 1 sends a bit to node 2, which responds with a signal to continue or stop (here and for this case only, the leader node is a sensor node; see footnote 1). When the signal is to stop, both nodes know the class. Thus the average rate is , which follows from the fact that and .

A general analysis for is presented next. We work with the recursive description of the protocol. The basic idea is that if the initial threshold is an integer then so is , as can be seen from (7). Together with the fact that the distribution of does not depend on iteration index , this means that can be regarded as a state variable, and if it is shown to be restricted to a finite set, then , the mean stopping time associated with a threshold , can be obtained through the state transition probability matrix of a finite state transition graph. Let

###### Theorem 1.

The vector of mean stopping times is a solution of

(11) |

where is a matrix, whose value in the th row and th column is , ( is a commensurate identity matrix).

###### Proof.

We show that at each iteration in the recursive description of the protocol, the problem is to compute , where has the same joint probability distribution as , perhaps with a different threshold. Towards this end it suffices to show (i) that the probability distribution of does not depend on the iteration index , and that (ii) the possible value that a threshold can have at any iteration in the algorithm lies in a finite set. Eq. (11) then follows by a probability transition calculation. To see that (i) holds, note that is iid with marginal distribution that is uniform on . For the proof of (ii), note that by (7), if is an integer, then so is . The remaining steps to show (ii) are by induction. For , lies in the set . Assume it is true for . From the recursion (7), and the fact that the protocol did not stop at iteration , must satisfy , from which the assertion follows directly. Thus if the protocol continues at iteration , the probability that the threshold given that is , which does not depend on . Thus

(12) |

and (11) follows. ∎

###### Example 1.

For ,

(13) |
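The state-transition calculation behind Theorem 1 can be sketched in Python. The sketch rests on assumptions that are not spelled out in the text as extracted: the threshold update is `T' = 2*T - S` with `S = sum_i h[i]*b[i]`, the shifted variable `S + (N - m)` is Binomial(N, 1/2), the non-absorbing states are the integers strictly between `-(N - m)` and `m`, and (11) has the usual absorbing-chain form `(I - A) K = 1`. The function name `mean_stopping_times` is hypothetical.

```python
from fractions import Fraction
from math import comb

def mean_stopping_times(N, m):
    """Exact mean stopping time for each integer threshold (illustrative sketch).

    N : number of nodes, m : number of positive weights.
    Returns {T: mean number of rounds} for integers T with -(N - m) < T < m.
    """
    states = list(range(-(N - m) + 1, m))
    idx = {T: i for i, T in enumerate(states)}
    n = len(states)

    def pmf(s):
        # Assumed law of S = sum_i h_i b_i: S + (N - m) ~ Binomial(N, 1/2).
        j = s + (N - m)
        return Fraction(comb(N, j), 2 ** N) if 0 <= j <= N else Fraction(0)

    # Augmented matrix of the linear system (I - A) K = 1.
    M = [[Fraction(0)] * (n + 1) for _ in range(n)]
    for T in states:
        r = idx[T]
        M[r][r] += 1
        M[r][n] = Fraction(1)
        for T2 in states:
            M[r][idx[T2]] -= pmf(2 * T - T2)   # T2 = 2T - S, so S = 2T - T2

    # Gauss-Jordan elimination in exact rational arithmetic.
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return {T: M[idx[T]][n] for T in states}
```

For example, `mean_stopping_times(2, 1)` has the single state T = 0, from which the protocol continues with probability 1/2, giving a mean of 2 rounds under these assumptions.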

### IV-C An approximation for

The event is equivalent to the event , where are defined immediately after (5). Thus . We will assume that , a constant and that . By the central limit theorem, the cumulative distribution function of converges to that of a Gaussian with mean and variance . With , let Then by the central limit theorem , where is an error term in the central limit approximation. Thus

The first term , where

(15) | |||||

where as , and is the largest integer for which . Note that is bounded in . Eq. (15) follows by applying the bound for the upper branch, , for the lower two branches. For the middle branch, we additionally use the bound , when and .

###### Theorem 2.

.

###### Proof.

Since the random variables have finite absolute third moment, it follows from a result due to Berry and Esseen [2], that for , , where is the CDF of a Gaussian random variable with mean and variance , is the CDF of and is a constant independent of . Further, and for constants and . Thus for constants and . We now proceed to bound . Let be the largest such that for . Then . ∎

###### Remark 2.

It is interesting that when , is integer valued.

## V One-Way Protocols

In the One-Way protocol, the class label is to be known at the leader node alone. In the One-Way protocol, the leader informs each sensor node of the class label at an additional cost of bits. The encoder for the th node is a mapping , where is the codebook for node . Encoded value is sent to a leader node using bits. The leader node estimates the class label, and informs each node about this. The total rate is given by bits. Suppose the estimated class label is . The error is a function of the rate . We analyze and implement this protocol when for each ( is the largest integer ). In terms of the step size , bits. We now determine , and thus in terms of . Our main finding is that for and , the sum rate for One-Way is

(16) |

where is a target upper bound on the error probability. Thus for this case, for fixed , grows logarithmically in .

The proof of (16) is now presented. Observe that the protocol partitions the unit cube into smaller cubes, called cells, of side . An error can occur only when the source vector lies in a cell whose interior intersects the separating hyperplane , since every point of any other cell has the same class label. We now calculate the number of cells that intersect . In order to simplify the analysis a bit, let for . Note that, like , is uniformly distributed on . Now rename the to . The boundary . Thus under the transformation, the threshold is now , which we rename to , with the constraint that the new satisfies . Under our assumption of uniform quantization, the lower endpoints of the bins are in the set , , where is the number of bins. Let be the lower endpoint of the bin for , i.e., . Then a cell intersects if and only if , or equivalently

(17) |

as can be seen after some algebraic manipulation. The random variable has mean zero and variance and through an application of the central limit theorem it follows that . The bound , together with some further algebra, leads to (16).

We conjecture that the performance of a uniform quantizer cannot be improved on, under the assumptions made here. The proof of this appears to be significantly complicated.
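The one-way scheme with a uniform quantizer can be illustrated with a short Python sketch. It is a sketch under assumptions rather than the authors' implementation: each node sends the index of its bin using `bits` bits, the leader reconstructs each input at the bin midpoint (the text does not specify the reconstruction rule), and the error probability is estimated empirically. The function names are hypothetical.

```python
import random

def one_way_label(x, h, t, bits):
    """Leader's decision from uniformly quantized inputs (illustrative sketch)."""
    levels = 2 ** bits
    # Each node sends the index of the bin containing its input.
    idx = [min(int(xi * levels), levels - 1) for xi in x]
    # Assumed reconstruction rule: the midpoint of each bin.
    recon = [(i + 0.5) / levels for i in idx]
    s = sum(w * r for w, r in zip(h, recon))
    return 1 if s >= t else -1

def one_way_error_rate(h, t, bits, trials, seed=0):
    """Empirical error probability for iid uniform inputs on [0, 1)."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        x = [rng.random() for _ in h]
        truth = 1 if sum(w * xi for w, xi in zip(h, x)) >= t else -1
        errors += one_way_label(x, h, t, bits) != truth
    return errors / trials
```

Under these assumptions the uplink cost is `len(h) * bits` bits per session, and sweeping `bits` while watching `one_way_error_rate` gives an empirical view of the rate versus error trade-off analyzed above.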

## VI Lower Bounds for the Interactive Protocol

We first show, through a lower bounding argument, that for and , the bit exchange protocol achieves the optimal sum rate. For , the protocol achieves optimal scaling with respect to , when . However, when , there is a gap in the sum rate. We show this through a lower bound for .

### VI-A A Tight Lower Bound for

The Interactive Protocol results in a rectangular partition where is a subpartition of the region and of . Since the error probability is zero, we refer to as a zero-error partition. The boundary is represented by the set . For a zero-error partition it is true that each point of must be the upper left corner of some rectangle that lies entirely in or the lower right corner of some rectangle that lies entirely in , except possibly for a set of one-dimensional measure zero. Let be the probabilities of the cells of and let be the probabilities of the cells of . We note here that we use the word rectangle to include what is referred to as a combinatorial rectangle, which is the Cartesian product of unions of intervals [4].

###### Theorem 3.

If a partition minimizes the entropy it contains a rectangle with vertices and and another rectangle with vertices and , for some .

###### Proof.

(Sketch) The idea is to grow the largest rectangle in a partition cell until a vertex touches the boundary while reducing the probabilities of the other rectangles in the partition cell. The probability distribution of the readjusted rectangles majorizes all other rectangular partitions. The details of the proof are omitted due to space constraints and can be found in [7]. ∎

###### Theorem 4.

For , the sum rate is no smaller than bits.

###### Proof.

Consider an extreme partition which contains a rectangle that has points and as its upper left and lower right vertices. Such a rectangle always exists by Thm. 3. Let random variable indicate whether lies in or not, and let random variable indicate whether lies in one of the three regions, , or as shown in Fig. 1. Let denote the entropy of the partition . Then

(18) | |||||

and

(19) | ||||||

Since the regions and are similar to it follows that if this partition minimizes the entropy it must satisfy the recursion

(20) | |||||||

Solving for we obtain

(21) |

whose unique minimum value of bits occurs when . Plugging back in (18) leads to the desired result. ∎

### VI-B

###### Lemma 4.

(Lemma 9 in [6]) Assume that probabilities are in decreasing order, and that , . Then the entropy .

It turns out to be easy to determine the largest rectangle in for . We do so here for the case where and . In this case, with being the probability of the largest rectangle in it follows that as can be checked by maximizing subject to . Thus it follows that . In this specific case, there is a gap between the upper bound of our protocol and the lower bound. For the cases where , the rate grows linearly in . Since we cannot expect a slower growth with (the result must be distributed to nodes), the protocol exhibits the correct scaling behavior with .

## VII Numerical Experiments

Computer simulations were carried out for the Interactive Protocol for a node network and compared to (11). Simulations were based on repetitions per data point in the graph shown in Fig. 2(a). Fig. 2(b) shows the agreement between the growth of the stopping time obtained via simulation and with the approximation (15) for , . The agreement is good in all cases.

Fig. 2(d) compares the sum rate of the Interactive Protocol with the One-Way protocols for and . In our experiments, we used a uniform quantizer for quantizing for each . Note that in the one-way case, it is not possible to obtain zero error with finite rate. In our experiments we set the error probability to . It is seen that for a node network, the Interactive Protocol uses bits whereas the One-Way protocol uses bits, a non-trivial saving of almost % over the One-Way protocol for . On the other hand, as seen in Fig. 2(c), when , the One-Way protocol uses only 2972 bits and thus the Interactive Protocol uses 38% more bits. In both cases, the Interactive Protocol achieves this with zero error, at the cost of variable completion time.

## VIII Summary and Conclusions

We presented and analyzed interactive and one-way protocols for solving a 2-classifier problem in a sensor network with sensor nodes. The sum rate of the interactive protocol depends on the mean stopping time of the protocol. Analysis of both protocols is presented. The analysis reveals a gap between the interactive protocol and the lower bound. The relative performance of the interactive and one-way protocols is seen to depend strongly on the error probability required of the one-way protocol. The cause of the gap between the interactive protocol and its lower bound requires further investigation.

## References

- [1] M. F. Bollauf, V. A. Vaishampayan and S. I. R. Costa, "On the communication cost of determining an approximate nearest lattice point," 2017 IEEE International Symposium on Information Theory (ISIT), Aachen, 2017, pp. 1838-1842. doi: 10.1109/ISIT.2017.8006847.
- [2] P. L. Butzer, L. U. Hahn and U. Westphal, "On the rate of approximation in the central limit theorem," Journal of Approximation Theory, vol. 13, no. 3, pp. 327-340, 1975.
- [3] R. A. Horn and C. J. Johnson, Matrix Analysis. Cambridge, U.K.: Cambridge Univ. Press, 1985.
- [4] E. Kushilevitz and N. Nisan, Communication Complexity. Cambridge, U.K.: Cambridge Univ. Press, 1997.
- [5] A. Orlitsky, "Average-case interactive communication," IEEE Transactions on Information Theory, vol. 38, no. 5, pp. 1534-1547, Sep. 1992.
- [6] A. Orlitsky and A. El Gamal, "Average and randomized communication complexity," IEEE Transactions on Information Theory, vol. 36, no. 1, pp. 3-16, Jan. 1990.
- [7] V. A. Vaishampayan, "Towards a Converse for the Nearest Lattice Point Problem," CoRR, vol. abs/1711.04714, http://arxiv.org/abs/1711.04714, 2017.
- [8] V. A. Vaishampayan and M. F. Bollauf, "Communication cost of transforming a nearest plane partition to the Voronoi partition," 2017 IEEE International Symposium on Information Theory (ISIT), Aachen, 2017, pp. 1843-1847. doi: 10.1109/ISIT.2017.8006848.
- [9] Y. Xiang and Y. H. Kim, "Interactive hypothesis testing with communication constraints," in Proc. 2012 50th Annual Allerton Conference on Communication, Control, and Computing, pp. 1065-1072, Oct. 2012.
- [10] A. C. Yao, "Some Complexity Questions Related to Distributive Computing (Preliminary Report)," in Proceedings of the Eleventh Annual ACM Symposium on Theory of Computing, STOC '79, pp. 209-213, 1979.