A Graph-Theoretic Approach to Multitasking
A key feature of neural network architectures is their ability to support the simultaneous interaction among large numbers of units in the learning and processing of representations. However, how the richness of such interactions trades off against the ability of a network to simultaneously carry out multiple independent processes (a salient limitation in many domains of human cognition) remains largely unexplored. In this paper we use a graph-theoretic analysis of network architecture to address this question, where tasks are represented as edges in a bipartite graph $G$. We define a new measure of multitasking capacity of such networks, based on the assumptions that tasks that need to be multitasked rely on independent resources, i.e., form a matching, and that tasks can be multitasked without interference if they form an induced matching. Our main result is an inherent tradeoff between the multitasking capacity and the average degree of the network that holds regardless of the network architecture. These results are also extended to networks of depth greater than $2$. On the positive side, we demonstrate that networks that are random-like (e.g., locally sparse) can have desirable multitasking properties. Our results shed light on the parallel-processing limitations of neural systems and provide insights that may be useful for the analysis and design of parallel architectures.
- 1 Introduction
- 2 Preliminaries
- 3 Upper bounds on the multitasking capacity
- 4 Constructions of Good Multitaskers
- 5 Conclusions
- A Appendix: Bounds on the number of -matchings
1 Introduction
One of the primary features of neural network architectures is their ability to support parallel distributed processing. The decentralized nature of biological and artificial nets results in greater robustness and fault tolerance when compared to serial architectures such as Turing machines. On the other hand, the lack of a central coordination mechanism in neural networks can result in interference between units (neurons), and such interference effects have been demonstrated in several settings, such as the analysis of associative memories [AGS85] and multitask learning [MC89]. Understanding the source of such interference and how it can be prevented has been a major focus of recent research (see, e.g., [KPR17] and the references therein).
Recently, a graph-theoretic model has suggested that interference effects may explain the limitations of the human cognitive system in multitasking: the ability to carry out multiple independent processes at the same time. This model consists of a simple 2-layer feed-forward network represented by a bipartite graph $G=(A\cup B,E)$ wherein the vertex set is partitioned into two disjoint sets of nodes $A$ and $B$, representing the inputs and the outputs of tasks, respectively.
An edge $(a,b)\in E$ corresponds to a directed pathway from the input layer to the output layer in the network that is taken to represent a cognitive process (or task) that maps an input to an output [Nei67]. In more abstract terms, every vertex $a\in A$ is associated with a set of inputs $I_a$, every vertex $b\in B$ is associated with a set of outputs $O_b$, and the edge $(a,b)$ is associated with a function $f_{a,b}\colon I_a\to O_b$.
Given a 2-layer network, a task set is a set of edges $T\subseteq E$. A key assumption made in [FSGC14], which we adopt as well, is that all task sets that need to be multitasked in parallel form a matching, namely, no two edges in $T$ share a vertex as an endpoint. This assumption reflects a limitation on the parallelism of the network that is similar to the Exclusive Read Exclusive Write (EREW) model in parallel RAM, where the tasks cannot simultaneously read from the same input or write to the same output. Similarly, for networks of depth greater than $2$, task sets correspond to node-disjoint paths from the input layer to the output layer. From now on, for simplicity, we focus on the depth-2 case with $|A|=|B|=n$.
In [MDO16, FSGC14] it is suggested that concurrently executing two tasks associated with two (disjoint) edges $e$ and $f$ will result in interference if $e$ and $f$ are connected by a third edge $g$. The rationale for this interference assumption stems from the distributed operation of the network, which may result in the task associated with $g$ becoming activated automatically once its input and output are operating, resulting in interference with the tasks associated with $e$ and $f$. Therefore, [MDO16, FSGC14] postulate that all tasks within a task set $T$ can be performed in parallel without interference only if the edges in $T$ form an induced matching, namely, no two edges in $T$ are connected by a third edge. Interestingly, the induced matching condition also arises in the communication setting [BLM93, AMS12, CK85], where it is assumed that messages between senders and receivers can be reliably transmitted if the edge set connecting these nodes forms an induced matching. Following the aforementioned interference model, [MDO16, FSGC14] define the multitasking capability of a bipartite network $G$ as the maximum cardinality of an induced matching in $G$.
The main message of [MDO16, FSGC14] is that there is a fundamental tradeoff in neural network architectures like the human brain between the efficiency of shared representations and the independence of representations that supports concurrent multitasking (this tradeoff is termed “multitasking versus multiplexing”). In graph-theoretic terms, it is suggested that as the average degree $d$ of $G$ increases (“efficiency of representations”: a larger degree corresponds to a more economical and efficient use of shared representations), the “multitasking ability” should decay in $d$. In other words, the cardinality of the maximum induced matching should be upper bounded by $f(d)n$ with $\lim_{d\to\infty}f(d)=0$. This prediction was tested and supported on certain architectures by numerical simulations in [MDO16, FSGC14]. Establishing such a tradeoff is of interest, as it can identify limitations of artificial nets that rely on shared representations and aid in designing systems that attain an optimal tradeoff. Furthermore, such a tradeoff is also of significance for cognitive neuroscience, as it can shed some light on the source of the striking limitation of the human cognitive system in executing control-demanding tasks simultaneously.
Identifying the multitasking capacity of $G$ with the size of its maximum induced matching has two drawbacks. First, the fact that there is some, possibly large, set of tasks that can be multitasked does not preclude the existence of a (possibly small) set of critical tasks that greatly interfere with each other (e.g., consider the case in which a complete bipartite graph occurs as a subgraph of $G$; this is illustrated in Figure 1). Second, it is easy to give examples of graphs (where $|A|=|B|=n$) with arbitrarily large average degree that nonetheless contain an induced matching of size $n/2$. For example, there are $d$-regular bipartite graphs with $n$ vertices on each side that contain an induced matching of size $n/2$ even when $d=\Omega(n)$ (one can take two copies of a dense bipartite graph and connect these two copies with a perfect matching; see Figure 1 for an illustration). Hence, it is impossible to upper bound the multitasking capacity of every network with average degree $d$ by $f(d)n$ with $f$ vanishing as the average degree tends to infinity. Therefore, the generality of the suggested tradeoff between efficiency and concurrency is not clear under this definition.
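The second drawback can be checked on a small instance. The sketch below is our own illustration, not code from [MDO16, FSGC14]: it assumes the dense bipartite graph is the complete bipartite graph $K_{m,m}$ and, to keep the graph regular, joins the two copies by two perfect matchings (side $A$ of each copy to side $B$ of the other), so that $n=2m$ and $d=m+1$. The helper names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def two_dense_copies(m):
    """Two copies of K_{m,m} joined by two perfect matchings (A1-B2 and A2-B1).

    Vertices are tuples (side, copy, index); each side has n = 2m vertices and
    every vertex has degree m + 1.
    """
    edges = set()
    for c in (0, 1):
        for i in range(m):
            for j in range(m):
                edges.add((("a", c, i), ("b", c, j)))  # dense block K_{m,m}
    for i in range(m):
        edges.add((("a", 0, i), ("b", 1, i)))  # copy 1, side A -> copy 2, side B
        edges.add((("a", 1, i), ("b", 0, i)))  # copy 2, side A -> copy 1, side B
    return edges


def is_induced_matching(matching, edges):
    """True iff the edges are pairwise disjoint and no edge of G joins two of them."""
    if len({a for a, _ in matching}) < len(matching) or len({b for _, b in matching}) < len(matching):
        return False
    for (a1, b1), (a2, b2) in combinations(matching, 2):
        # In a bipartite graph the only candidate "third edges" are a1-b2 and a2-b1.
        if (a1, b2) in edges or (a2, b1) in edges:
            return False
    return True


m = 5
G = two_dense_copies(m)
degrees = Counter(v for e in G for v in e)
cross = [(("a", 0, i), ("b", 1, i)) for i in range(m)]
print(set(degrees.values()), len(cross), is_induced_matching(cross, G))  # {6} 5 True
```

Here the $m=n/2$ cross edges from the first copy to the second form an induced matching even though $d=n/2+1$.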
Our main contribution is a novel measure of multitasking capacity that is aimed at solving the first problem, namely, networks with “high” capacity that contain a task set whose edges badly interfere with one another. In particular, for a parameter $k$ we ask whether every matching $M$ of size $k$ contains a large induced matching $M'\subseteq M$. This motivates the following definition (see Figure 2 for an illustration).
Let $G=(A\cup B,E)$ be a bipartite graph with $|A|=|B|=n$, and let $\alpha\in(0,1]$ be a parameter. We say that $G$ is a $(k,\alpha)$-multitasker if for every matching $M$ in $G$ of size $k$, there exists an induced matching $M'\subseteq M$ such that $|M'|\ge\alpha|M|$.
We will say that a graph $G$ is an $\alpha$-multitasker if it is a $(k,\alpha)$-multitasker for all $k\le n$.
The parameter $\alpha$ measures the multitasking capabilities of $G$: the larger $\alpha$ is, the better a multitasker $G$ is considered. We call the parameter $\alpha$ the multitasking capacity of $G$ for matchings of size $k$.
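To make the definition concrete, the following brute-force sketch computes, for a small bipartite graph, the largest $\alpha$ for which $G$ is a $(k,\alpha)$-multitasker, i.e., the minimum over matchings $M$ of size $k$ of the largest ratio $|M'|/|M|$ over induced $M'\subseteq M$. It assumes edges are stored as pairs $(a,b)$ with $a\in A$ and $b\in B$; the function names are ours, and the running time is exponential, so it is only meant for toy examples.

```python
from fractions import Fraction
from itertools import combinations


def is_matching(es):
    """Edges are pairs (a, b) with a in A, b in B; a matching shares no endpoint."""
    return len({a for a, _ in es}) == len(es) == len({b for _, b in es})


def is_induced(es, edges):
    """No edge of G joins two distinct edges of `es`; in a bipartite graph only a1-b2 or a2-b1 could."""
    return all((a1, b2) not in edges and (a2, b1) not in edges
               for (a1, b1), (a2, b2) in combinations(es, 2))


def largest_induced_submatching(M, edges):
    """Size of the largest induced matching M' contained in the matching M."""
    for size in range(len(M), 0, -1):
        if any(is_induced(sub, edges) for sub in combinations(M, size)):
            return size
    return 0


def multitasking_capacity(edges, k):
    """min over matchings M with |M| = k of max |M'| / |M|; None if G has no k-matching."""
    edges = set(edges)
    ratios = [Fraction(largest_induced_submatching(M, edges), k)
              for M in combinations(sorted(edges), k) if is_matching(M)]
    return min(ratios) if ratios else None


P3 = [("a0", "b0"), ("a1", "b0"), ("a1", "b1")]  # the path a0 - b0 - a1 - b1
print(multitasking_capacity(P3, 1), multitasking_capacity(P3, 2))  # 1 1/2
```

For the path with three edges, the only matching of size $2$ has its two edges joined by the middle edge, so the capacity for $k=2$ is $1/2$.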
Our definition generalizes without much difficulty to networks of depth greater than $2$, where instead of matchings we consider node-disjoint paths from the first layer to the last, and instead of induced matchings we consider induced paths, i.e., a set of disjoint paths such that no two nodes belonging to different paths are adjacent.
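Observe that our measure is related to the previously mentioned measure of the cardinality of an induced matching: if $G$ is a $(k,\alpha)$-multitasker and contains a matching of size $k$, then $G$ contains an induced matching of size at least $\alpha k$. In particular, a large $\alpha$ together with a large $k$ guarantees a large induced matching.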
The main question we shall consider here is what kind of tradeoffs one should expect between $\alpha$, $d$ and $k$. In particular, are there networks with large average degree that achieve a multitasking capacity bounded away from $0$, especially if $k$ is not too large? Which network architectures give rise to good multitasking behavior? Should we expect “multitasking vs. multiplexing”, namely, $\alpha$ tending to zero with $d$ for all graphs of average degree $d$? While our definition of multitasking capacity is aimed at resolving the problem of small task sets that can be poorly multitasked, it turns out to be related also to the “multitasking vs. multiplexing” phenomenon. Furthermore, our graph-theoretic formalism also gives insights into how network depth and interference are related.
1.1 Our results
We provide some answers to the questions raised above. Our main contribution is establishing a tradeoff between the multitasking capacity of a graph and its edge density that holds for arbitrary networks.
We divide the presentation of the results into two parts. The first part discusses the case of $d$-regular graphs, and the second part discusses general graphs.
The $d$-regular case:
Let $G$ be a bipartite $d$-regular graph with $n$ vertices on each side. Considering the case of $k=n$, i.e., the largest induced matchings that are contained in a perfect matching, we prove an upper bound on $\alpha$ for every $d$-regular $(n,\alpha)$-multitasker. This bound establishes an inherent limitation on the multitasking capacity of any network: for task sets of size $n$, $\alpha$ must tend to $0$ as the degree grows. In fact, we prove that the degree of the graph constrains the multitasking capacity also for task sets of smaller sizes: for $k$ that is sufficiently large, $\alpha$ tends to $0$ as $d$ increases. We summarize these results in the following theorem.
There is a constant such that the following holds. Let , be a -regular bipartite graph with .
If , then . In particular, there exists a perfect matching in that does not contain an induced matching of size larger than .
If , then .
If then .
For a certain range of parameters our results are tight. Specifically, when considering task sets of size our result is tight up to logarithmic factors, as we provide a construction of a -regular graph where every matching of size contains an induced matching of size . See Theorem 4.3 for details.
For arbitrary values of it is not hard to see that every -regular graph achieves . We show that this naive bound can be asymptotically improved upon by constructing -multitaskers with . The construction is based on bipartite graphs which have good spectral expansion properties. See Theorem 4.4 for details.
Considering bounded values of we show that it is possible to achieve multitasking capacity bounded away above , when measured on task sets of bounded size (up to ). The best multitasking capacity one can hope for is (see Remark 4.1), and we construct -multitaskers for all . See Theorem 4.1 for details.
We also consider networks of depth larger than $2$ and extend our upper bounds to them; see Section 3.2.
The irregular case:
Next we turn to arbitrary, not necessarily regular, graphs. We show that for an arbitrary bipartite graph with $n$ vertices on each side and average degree $d$, its multitasking capacity is upper bounded by . That is, in terms of the average degree, the multitasking capacity of a graph tends to zero, provided that its average degree is larger than .
There is a constant such that the following holds. Let , be a bipartite graph of average degree with . If is an -multitasker then .
We also show that there are multitaskers of average degree , with . Hence, in contrast to the regular case, for the multitasking capacity to decay with the average degree , we must assume that grows faster than . See Theorem 4.5 and Theorem 4.6 for the exact statements. It is an interesting question whether there exists a multitasker with independent of , for average degree , which, if true, would be the largest average degree possible. This is left as an open problem.
Finally, for any we show a construction of a graph with average degree such that for every , is a -multitasker for all . Compared with the foregoing results, here we do not require that . Allowing larger values of results in weaker multitasking: we obtain that the graph is a multitasker only with respect to matchings whose size is at most . See Theorem 4.2 for details.
2 Preliminaries
A matching $M$ in a graph $G$ is a set of edges such that no two edges in $M$ share a common vertex. If $G$ has $2n$ vertices and $|M|=n$, we say that $M$ is a perfect matching. By Hall's Theorem, every $d$-regular bipartite graph with $d\ge 1$ and bipartition $(A,B)$ has a perfect matching. A matching $M$ is induced if there are no two distinct edges $e,f\in M$ such that there is an edge connecting $e$ to $f$. Given a graph $G$ and two disjoint sets $U,W\subseteq V(G)$, we let $E(U,W)$ be the set of edges with one endpoint in $U$ and the other in $W$. For a subset $U\subseteq V(G)$, $E(U)$ is the set of all edges contained in $U$. Given an edge $e=(u,v)$, we define the graph $G/e$ obtained by contracting $e$ as the graph with vertex set $(V(G)\setminus\{u,v\})\cup\{v_e\}$. The vertex $v_e$ is connected to all vertices of $G$ neighboring $u$ or $v$. All other pairs of vertices form an edge in $G/e$ if and only if they were connected in $G$. Contracting a set of edges, and in particular contracting a matching, means contracting the edges one by one in an arbitrary order.
Given a subset of vertices $S\subseteq V(G)$, the subgraph induced by $S$, denoted by $G[S]$, is the graph whose vertex set is $S$ and in which two vertices are connected if and only if they are connected in $G$. For a set of edges $M$, denote by $G[M]$ the graph induced by all vertices incident to an edge in $M$. We will use the following simple observation throughout the paper.
Let $M$ be a matching in $G$, and let $d$ be the average degree of $G[M]$. Suppose that we contract all edges of $M$ in $G[M]$. Then the resulting graph has average degree at most $2d$.
$G[M]$ contains $2|M|$ vertices and $d|M|$ edges. The result follows as the contracted graph has $|M|$ vertices and at most $d|M|$ edges. ∎
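The counting in the contraction observation can be checked numerically. The sketch below (our own, with hypothetical names) takes a random bipartite graph and a greedily chosen matching $M$, builds the subgraph induced by the endpoints of $M$, contracts every edge of $M$, and compares average degrees. Self-loops created by the contracted matching edges and parallel edges are discarded, which only lowers the edge count, so in fact the contracted average degree is at most $2d-2$.

```python
import random


def contract_matching(edges, M):
    """Return (d, d_contracted): average degree of the graph induced by the endpoints of M,
    before and after contracting every edge of M."""
    owner = {}  # endpoint -> index of the matching edge covering it
    for idx, (a, b) in enumerate(M):
        owner[a] = owner[b] = idx
    induced = [(a, b) for (a, b) in edges if a in owner and b in owner]
    d = 2 * len(induced) / (2 * len(M))  # the induced graph has 2|M| vertices
    # Each matching edge becomes one vertex; self-loops and parallel edges are dropped.
    contracted = {frozenset((owner[a], owner[b])) for (a, b) in induced if owner[a] != owner[b]}
    d_contracted = 2 * len(contracted) / len(M)  # the contracted graph has |M| vertices
    return d, d_contracted


rng = random.Random(1)
n = 12
G = {(f"a{i}", f"b{j}") for i in range(n) for j in range(n) if rng.random() < 0.3}
M, used = [], set()
for a, b in sorted(G):  # a greedy (maximal) matching
    if a not in used and b not in used:
        M.append((a, b))
        used |= {a, b}
d, d_contracted = contract_matching(G, M)
print(round(d, 3), round(d_contracted, 3))
```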
An independent set in a graph $G$ is a set of vertices that do not span an edge. We will use the following well-known fact, attributed to Turán.
Every $n$-vertex graph with average degree $d$ contains an independent set of size at least $n/(d+1)$.
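One standard constructive route to this bound (not necessarily the argument intended here) is the minimum-degree greedy algorithm: repeatedly pick a vertex of minimum degree in the remaining graph, add it to the independent set, and delete it together with its neighbours. It is known to produce at least $\sum_v 1/(\deg(v)+1)\ge n/(d+1)$ vertices, the last step by convexity. A minimal sketch, assuming graphs are given as symmetric adjacency dictionaries without self-loops (all names are hypothetical):

```python
import random
from itertools import combinations


def greedy_independent_set(adj):
    """Min-degree greedy: pick a vertex of minimum current degree, delete it and its neighbours."""
    remaining = {v: set(nb) for v, nb in adj.items()}
    chosen = []
    while remaining:
        v = min(remaining, key=lambda u: len(remaining[u]))
        chosen.append(v)
        gone = remaining[v] | {v}
        for u in gone:
            del remaining[u]
        for nb in remaining.values():
            nb -= gone  # degrees of the surviving vertices can only drop
    return chosen


def turan_bound(adj):
    """n / (d + 1), where d is the average degree."""
    n = len(adj)
    d = sum(len(nb) for nb in adj.values()) / n
    return n / (d + 1)


def random_graph(n, p, seed):
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


cycle6 = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
print(greedy_independent_set(cycle6), turan_bound(cycle6))  # [0, 2, 4] 2.0
```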
The girth of a graph $G$ is the length of the shortest cycle in $G$.
Let $G$ be a bipartite graph, $k$ an integer and $\alpha\in(0,1]$ a parameter. We define the $(k,\alpha)$-matching graph $H=H(G,k,\alpha)$ to be a bipartite graph with sides $X$ and $Y$, where $X$ is the set of all matchings of size $k$ in $G$, $Y$ is the set of all induced matchings of size $\alpha k$ in $G$, and a vertex $M\in X$ (corresponding to a matching of size $k$) is connected to a vertex $M'\in Y$ (corresponding to an induced matching of size $\alpha k$) if and only if $M'\subseteq M$. We omit $G$ from the notation of $H$ when it is clear from the context. We will repeatedly use the following simple lemma in upper bounding the multitasking capacity in graph families; we refer to it as the induced matching lemma.
Suppose the average degree of a vertex of $X$ in the graph $H$ is strictly smaller than $1$. Then $G$ is not a $(k,\alpha)$-multitasker.
By the assumption, $X$ has a vertex of degree $0$ in $H$. Hence there exists a matching of size $k$ in $G$ not containing an induced matching of size $\alpha k$, as required. ∎
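For small graphs the matching graph used in the induced matching lemma can be built explicitly. In the sketch below (hypothetical helper names), the left side consists of the matchings of size $k$ and the right side of the induced matchings of the target size; we assume that size is $\lceil\alpha k\rceil$, and $\alpha$ should be passed as an integer or a `Fraction` so that the ceiling is computed exactly. A left vertex of degree $0$ is precisely a witness that $G$ is not a $(k,\alpha)$-multitasker.

```python
from fractions import Fraction
from itertools import combinations
from math import ceil


def matchings_of_size(edges, k):
    """All k-subsets of edges (pairs (a, b), a in A, b in B) that share no endpoint."""
    for es in combinations(sorted(edges), k):
        if len({a for a, _ in es}) == k == len({b for _, b in es}):
            yield frozenset(es)


def is_induced(es, edges):
    return all((a1, b2) not in edges and (a2, b1) not in edges
               for (a1, b1), (a2, b2) in combinations(es, 2))


def matching_graph(edges, k, alpha):
    """Left: k-matchings. Right: induced matchings of size ceil(alpha*k). M ~ I iff I is a subset of M."""
    edges = set(edges)
    t = ceil(alpha * k)
    left = list(matchings_of_size(edges, k))
    right = [I for I in matchings_of_size(edges, t) if is_induced(I, edges)]
    adjacency = {M: [I for I in right if I <= M] for M in left}
    return left, right, adjacency


P3 = [("a0", "b0"), ("a1", "b0"), ("a1", "b1")]  # the path a0 - b0 - a1 - b1
left, right, adj = matching_graph(P3, k=2, alpha=Fraction(1))
avg_left_degree = sum(len(v) for v in adj.values()) / len(left)
isolated = [M for M in left if not adj[M]]
print(avg_left_degree, len(isolated))  # 0.0 1
```

Since the average left degree is below $1$, some left vertex is isolated, so the path is not a $(2,1)$-multitasker, in line with the lemma.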
Throughout the paper we will need the following concentration inequality, known as Chernoff’s bound.
Let be independent random variables where for every for all , and let . Then, for all it holds that
3 Upper bounds on the multitasking capacity
3.1 The regular case
In this section we prove Theorem 1.1 that upper bounds the multitasking capacity of arbitrary -regular multitaskers. We start the proof of Theorem 1.1 with the case . The following theorem shows that -regular -multitaskers must have .
Let , be a bipartite -regular graph where . Then contains a perfect matching such that every induced matching has size at most .
For the proof, we need the following bounds on the number of perfect matchings in -regular bipartite graphs.
Let , be a bipartite -regular graph where . Denote by the number of perfect matchings in . Then
Proof of Theorem 3.1.
Consider , where will be determined later. Clearly . By the upper bound in Lemma 3.1, every induced matching of size can be contained in at most perfect matchings. By the lower bound in Lemma 3.1, . Therefore, the average degree of the vertices in is at most
Setting yields , and it can be verified that for all such . Therefore, in this setting, the average degree of the vertices in is smaller than $1$, which concludes the proof by Lemma 2. ∎
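The proof of Theorem 3.1 rests on upper and lower bounds on the number of perfect matchings of a $d$-regular bipartite graph. The exact form of Lemma 3.1 is not reproduced in this text, so the sketch below instead checks two classical bounds of this type that are also invoked in Section 3.2: Bregman's upper bound $(d!)^{n/d}$ and Schrijver's lower bound $\bigl((d-1)^{d-1}/d^{d-2}\bigr)^n$. It counts perfect matchings exactly (the permanent of the biadjacency matrix) by dynamic programming over subsets, which is feasible only for small $n$; the circulant test graphs and all names are our own choices.

```python
import math
from functools import lru_cache


def circulant_bipartite(n, d):
    """d-regular bipartite graph: left vertex i is adjacent to right vertices i, i+1, ..., i+d-1 (mod n)."""
    return [[(i + j) % n for j in range(d)] for i in range(n)]


def count_perfect_matchings(adj):
    """Permanent of the biadjacency matrix, by DP over the set of right vertices already used."""
    n = len(adj)

    @lru_cache(maxsize=None)
    def count(i, used):
        if i == n:
            return 1
        return sum(count(i + 1, used | (1 << j)) for j in adj[i] if not (used >> j) & 1)

    return count(0, 0)


def schrijver_lower(n, d):
    return ((d - 1) ** (d - 1) / d ** (d - 2)) ** n


def bregman_upper(n, d):
    return math.factorial(d) ** (n / d)


def within_bounds(n, d, tol=1e-9):
    pm = count_perfect_matchings(circulant_bipartite(n, d))
    return schrijver_lower(n, d) * (1 - tol) <= pm <= bregman_upper(n, d) * (1 + tol)


for n, d in [(6, 2), (6, 3), (8, 3), (8, 4)]:
    pm = count_perfect_matchings(circulant_bipartite(n, d))
    print(n, d, round(schrijver_lower(n, d), 2), pm, round(bregman_upper(n, d), 2))
```

For $d=n$ the circulant graph is $K_{n,n}$, where Bregman's bound is attained with equality ($n!$ perfect matchings).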
We record the following simple observation, which is immediate from the definition.
If is a -multitasker, then for all , the graph is a -multitasker.
If is a -regular -multitasker with vertices on each side and , then .
Next, we prove that for smaller values of the multitasking capacity is upper bounded by .
Let be a -regular (bipartite) graph with , and let . Then, contains a matching of size , such that every induced matching has size for .
In particular, this rules out the existence of -regular -multitaskers for any constant multitasking capacity . To see this, take any , and put and in the above theorem. It implies that .
In the proof of Theorem 3.2 we use the following result on the number of matchings of size in -regular bipartite graphs, known as the Lower Matching Conjecture and recently proven by Csikvári [Csi14].
Let , be a bipartite -regular graph where . Denote by the number of matchings of size in . Then
In the setting of Lemma 3.1, if , then
Proof of Theorem 3.2.
For brevity, we refer to a matching of size as a -matching. Fix . Consider the graph . Clearly . For a given induced -matching, we can obviously upper-bound the number of -matchings that contain it by the total number of edge subsets of size that contain it, which is at most . By Corollary 3.1, . Therefore, the average degree of the vertices in is at most
If we choose such that this bound is smaller than , then there must be a vertex in with no neighbors in , and we are done by Lemma 2. Hence we need to satisfy
We now bound the terms on the left-hand side for an appropriate choice of . For the term is upper bounded by 2. For the term is upper bounded by . The term is upper-bounded by for any . Therefore, if we choose that satisfies both of the above inequalities, the average degree of the vertices in is at most , which is smaller than 1 for . Overall, choosing suffices. By noting that , we get as stated. ∎
Putting all the bounds together
3.2 Upper bounds for networks of depth larger than 2
A graph $G$ is a network with $r$ layers of width $n$ and degree $d$ if $V(G)$ is partitioned into $r$ independent sets $V_1,\dots,V_r$ of size $n$ each, such that $G[V_i\cup V_{i+1}]$ is a $d$-regular bipartite graph for all $1\le i<r$, and there are no additional edges in $G$.
A top-bottom path in $G$ is a path $v_1,\dots,v_r$ such that $v_i\in V_i$ for all $1\le i\le r$, and $v_i$ and $v_{i+1}$ are neighbors for all $1\le i<r$.
A set of node-disjoint top-bottom paths is called induced if for every two edges and such that , there is no edge in connecting and .
A set of node-disjoint top-bottom paths is induced if and only if for every it holds that is an induced matching in .
We say that a network as above is a -multitasker if every set of node-disjoint top-bottom paths contains an induced subset of size at least .
If is an -multitasker then .
Let be the bipartite graph in which side has a node for each set of node-disjoint top-bottom paths in , side has a node for each induced set of node-disjoint top-bottom paths in , and , are adjacent iff . Let be the maximum degree of side . We wish to upper-bound the average degree of side , which is upper-bounded by .
is clearly upper bounded by . It is a simple observation that equals , where denotes the number of perfect matchings in the bipartite graph . Since this graph is -regular, by the Falikman-Egorichev proof of the van der Waerden conjecture ([Fal81], [Ego81]), or by Schrijver’s lower bound, we have and hence . To upper bound , fix , and let be the network resulting from removing all nodes and edges in from . This removes exactly nodes from each layer ; denote by the remaining nodes in this layer in . It is a straightforward observation that equals the number of sets of node-disjoint top-bottom paths in . Each such set decomposes into such that is a perfect matching on for each . Therefore where denotes the number of perfect matchings in . The latter is a bipartite graph with nodes on each side and maximum degree , and hence by the Bregman-Minc inequality, . Consequently, .
Putting everything together, we find that the average degree of side is upper bounded by
We will show that if , then the above bound is less than $1$, which implies that side has a node of degree $0$, a contradiction. To this end, note that for this setting of we have
One can verify that,
For all constants , the function is maximized at .
3.3 The irregular case
Below we consider general graphs with average degree . This is in contrast to the previous section, where we considered only -regular graphs.
Let be a bipartite graph with nodes on each side, average degree , and maximum degree . If is an -multitasker, then .
Note that in case we get .
Proof of Theorem 3.4.
Denote . We use the following lemma to lower-bound the number of matchings of size in .
The number of matchings of size in is at least .
Consider the following greedy procedure: Initialize and . For , choose an arbitrary edge in , and let denote the set of all edges in sharing an endpoint with . Set and let be the graph resulting from removing the edges from .
Initially has edges, and since the maximum degree is each iteration removes at most edges. Hence for every , the number of edges in is at least , where the last inequality is by recalling the setting of . Hence the number of different matchings that can be realized by the algorithm above is at least . ∎
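The greedy procedure in the proof above is easy to simulate. The sketch below (hypothetical names; random choices stand in for “arbitrary”) records the number of remaining edges after each step, so one can check that a step never deletes more than $2\Delta-1\le 2\Delta$ edges, where $\Delta$ is the maximum degree: the chosen edge plus at most $\Delta-1$ further edges at each of its endpoints. This per-step loss is what the lower bound relies on.

```python
import random
from collections import Counter


def greedy_k_matching(edges, k, rng):
    """k times: pick an arbitrary remaining edge and delete all edges sharing an endpoint with it.

    Returns the matching and the number of remaining edges before/after every step.
    """
    remaining = set(edges)
    matching, sizes = [], [len(remaining)]
    for _ in range(k):
        if not remaining:
            break
        a, b = rng.choice(sorted(remaining))
        matching.append((a, b))
        remaining = {(x, y) for (x, y) in remaining if x != a and y != b}
        sizes.append(len(remaining))
    return matching, sizes


rng = random.Random(0)
n = 20
edges = {(f"a{i}", f"b{j}") for i in range(n) for j in range(n) if rng.random() < 0.15}
max_deg = max(Counter(v for e in edges for v in e).values())
matching, sizes = greedy_k_matching(edges, k=5, rng=rng)
drops = [before - after for before, after in zip(sizes, sizes[1:])]
print(len(matching), max(drops), 2 * max_deg - 1)
```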
We proceed to prove Theorem 3.4. Consider . Let be an induced matching of size in . Let be the graph resulting from removing all nodes participating in , together with their incident edges, from . Note that we remove every edge that has at least one endpoint matched in , even if its other endpoint does not participate in . The degree of in equals the number of -matchings in , which is clearly upper bounded by , since has edges and is a subgraph of . Furthermore, we clearly have , which implies that has in total at most edges. Combining this with the lower bound on given by the lemma above, we get the following upper bound on the average degree of side in :
where the final inequality holds since for every , and in particular (for ), and .
If the average degree on side is less than then there is an isolated node in , which represents a -matching in that contains no induced matching of size , which contradicts being an -multitasker. Suppose for a sufficiently large constant . Then the term is less than . Furthermore the term is less than as long as , which holds for our setting since . Hence has average degree smaller than and the proof is finished. ∎
Note that Theorem 3.4 does not provide any nontrivial bound for when exceeds . It is, however, possible to establish nearly the same upper bound provided by this theorem with no assumption on . To do so we need the following lemma, which is proved following the approach of Pyber [Pyb85].
Every (bipartite) graph with vertices and average degree at least contains a subgraph in which the average degree is at least and the maximum degree is at most .
The word bipartite appears in brackets here since any graph contains a spanning bipartite subgraph in which the average degree is at least half of that of , hence the assertion of the lemma holds for general graphs as well, up to a factor of in the bound for .
Let be a bipartite graph with average degree . As long as it contains a vertex of degree smaller than omit it. This process must terminate with a nonempty graph, as the total number of edges deleted during the process is smaller than , that is, smaller than the number of edges of . Thus contains a bipartite subgraph with minimum degree at least . Let and be its vertex classes, where . Let be a minimal nonempty subset of (with respect to containment) so that There is such a set, since and it contains at least vertices as the number of neighbors of any nonempty set is at least . By the minimality since otherwise we can delete a vertex from and get a smaller set satisfying the condition. It is also clear, by minimality, that satisfies Hall’s condition and thus there is a matching saturating and . Let be the graph obtained from by removing all vertices besides those in and by removing the perfect matching from it. Then the degree of every vertex of in is at least . Let be a minimal nonempty subset of satisfying . As before, it is clear that exists (and contains at least elements). It is also clear as before that satisfies Hall’s condition and hence there is a matching saturating and . Proceeding in this way we get a sequence of matchings in (and hence in ), where matches the vertices of with those of , and where and . Clearly and hence . Thus there is some so that . Fix such and let be the union of the matchings . Define , . Then the maximum degree of is clearly at most , as it is the union of matchings. The number of vertices of is and its number of edges is at least . Thus the average degree of is at least , completing the proof. ∎
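The first step of the proof (repeatedly deleting low-degree vertices) is easy to sketch. The deletion threshold is elided in this text; the sketch assumes it is half the average degree, $d/2$, the value for which the stated counting works: each deletion removes fewer than $d/2$ edges, so fewer than $nd/2=|E|$ edges are removed in total and the process cannot empty the graph. Names are hypothetical, and graphs are symmetric adjacency dictionaries.

```python
import random


def prune_low_degree(adj, threshold):
    """Repeatedly delete a vertex whose current degree is below `threshold`; return what remains."""
    adj = {v: set(nb) for v, nb in adj.items()}
    while True:
        low = [v for v, nb in adj.items() if len(nb) < threshold]
        if not low:
            return adj
        v = low[0]
        for u in adj.pop(v):  # detach v from its remaining neighbours
            adj[u].discard(v)


def random_bipartite(n, p, seed):
    """Random bipartite graph with sides ('a', i) and ('b', j), as a symmetric adjacency dict."""
    rng = random.Random(seed)
    vertices = [("a", i) for i in range(n)] + [("b", j) for j in range(n)]
    adj = {v: set() for v in vertices}
    for i in range(n):
        for j in range(n):
            if rng.random() < p:
                adj[("a", i)].add(("b", j))
                adj[("b", j)].add(("a", i))
    return adj


G = random_bipartite(15, 0.3, seed=2)
d = sum(len(nb) for nb in G.values()) / len(G)  # average degree over all 2n vertices
core = prune_low_degree(G, threshold=d / 2)  # assumed threshold d/2
print(len(core), min(len(nb) for nb in core.values()), d / 2)
```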
Let be a bipartite graph with vertices on each side, and average degree . If is an -multitasker, then
A similar reasoning gives the following.
Let be a bipartite graph with vertices on each side, and average degree . If is an -multitasker, then
4 Constructions of Good Multitaskers
It is easy to design arbitrarily large -regular -multitaskers by simply taking disjoint edges, and -regular -multitaskers by taking a cycle of length . More generally, one can obtain a -regular -multitasker by taking disjoint copies of the bipartite clique . In fact, it is easy to see that any -regular graph is a -multitasker using the greedy algorithm that, given a matching, takes in each step an edge of the matching, removes at most edges of the matching that are in conflict with it, and repeats as long as possible. The challenge is to design multitaskers achieving that is an absolute constant (independent of , and ), where both and are as large as possible.
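The greedy argument can be sketched as follows. In a $d$-regular bipartite graph, an edge $(a,b)$ of a matching $M$ can be in conflict only with edges of $M$ covering one of the $d-1$ other neighbours of $a$ or of $b$, i.e., with at most $2(d-1)$ edges of $M$ (this count is our own reconstruction of the elided bound), so the greedy procedure keeps at least $|M|/(2d-1)$ edges. The circulant test graph and the function names are hypothetical.

```python
def circulant_edges(n, d):
    """d-regular bipartite graph: a_i is adjacent to b_{i+j mod n} for 0 <= j < d."""
    return {(("a", i), ("b", (i + j) % n)) for i in range(n) for j in range(d)}


def in_conflict(e, f, edges):
    """Two disjoint edges conflict if a third edge of G joins them."""
    (a1, b1), (a2, b2) = e, f
    return (a1, b2) in edges or (a2, b1) in edges


def greedy_induced_submatching(M, edges):
    """Keep the next edge of M, discard the edges of M in conflict with it, repeat."""
    pool, kept = list(M), []
    while pool:
        e = pool.pop(0)
        kept.append(e)
        pool = [f for f in pool if not in_conflict(e, f, edges)]
    return kept


n, d = 10, 3
E = circulant_edges(n, d)
M = [(("a", i), ("b", i)) for i in range(n)]  # a perfect matching of E
kept = greedy_induced_submatching(M, E)
print(len(M), len(kept))  # 10 3
```

Every kept edge survived the filtering by all earlier kept edges, so the output is an induced matching; here $3\ge 10/5$.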
4.1 Several simple constructions
How can we lower bound the multitasking capability of a network? It turns out that a simple idea is to contract edges in a given matching and look for large independent sets in the resulting contracted graph. We first exemplify this idea when $G$ is a forest.
Let $G$ be a forest. Then $G$ is a $1/2$-multitasker. In other words, if $M$ is a matching in $G$, then $M$ contains an induced matching of size at least $|M|/2$.
Consider an arbitrary matching $M$ in $G$. Contract every edge $e\in M$ to a single vertex $v_e$. Since $G$ is a forest, the resulting graph induced on the contracted edges is a forest; hence it is bipartite and contains an independent set $I$ of size at least $|M|/2$. The edges corresponding to the vertices in $I$ form an induced matching contained in $M$ of size at least $|M|/2$. ∎
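The proof of Lemma 4.1 translates directly into an algorithm. The sketch below (our own, hypothetical names) builds the contracted graph on the edges of $M$ (two matching edges are adjacent when some edge of $G$ joins them), 2-colours it by breadth-first search, which is valid because a contraction of a forest is again a forest, and keeps the larger colour class. Trees are generated at random, with each edge stored as (even-depth end, odd-depth end) so that the two depth parities form the bipartition.

```python
import random
from collections import deque
from itertools import combinations


def random_tree(n, rng):
    depth, edges = {0: 0}, set()
    for v in range(1, n):
        p = rng.randrange(v)  # attach v to a random earlier vertex
        depth[v] = depth[p] + 1
        edges.add((p, v) if depth[p] % 2 == 0 else (v, p))
    return edges


def greedy_matching(edges, rng):
    order = sorted(edges)
    rng.shuffle(order)
    used, M = set(), []
    for a, b in order:
        if a not in used and b not in used:
            M.append((a, b))
            used |= {a, b}
    return M


def induced_half(M, edges):
    """Contract M, 2-colour the contracted forest, return the larger colour class as edges of M."""
    nbrs = {i: set() for i in range(len(M))}
    for i, j in combinations(range(len(M)), 2):
        (a1, b1), (a2, b2) = M[i], M[j]
        if (a1, b2) in edges or (a2, b1) in edges:
            nbrs[i].add(j)
            nbrs[j].add(i)
    colour = {}
    for s in nbrs:
        if s in colour:
            continue
        colour[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in nbrs[u]:
                if w not in colour:
                    colour[w] = 1 - colour[u]
                    queue.append(w)
    classes = ([i for i in colour if colour[i] == 0], [i for i in colour if colour[i] == 1])
    return [M[i] for i in max(classes, key=len)]


rng = random.Random(3)
T = random_tree(40, rng)
M = greedy_matching(T, rng)
I = induced_half(M, T)
print(len(M), len(I))
```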
Note that $\alpha\le 1/2$ holds for any graph which contains a path of length $3$, as such a path contains a matching of size $2$ whose largest induced matching has size $1$.
A similar argument also extends to the case where one is concerned with collections of disjoint induced -paths instead of matchings. One simply contracts paths instead of edges of the matching in the proof of Lemma 4.1. It is also not hard to generalize the result above to the weighted case, where the edges in the matching have nonnegative weights. We omit the details.
The argument above can be generalized to minor-closed graph families. For example, we have the following result:
Every planar bipartite graph is a $1/4$-multitasker.
The proof is similar to that of Lemma 4.1. For a matching $M$, the graph obtained from $G$ by contracting every edge of $M$, restricted to the contracted vertices, is planar. By the Four Color Theorem, it has an independent set of size at least $|M|/4$, concluding the proof. ∎
We note that the bound $1/4$ is tight for bipartite planar graphs. To see this, consider the hypercube $Q_3$ on $8$ vertices, which is planar and bipartite. It can be seen that $Q_3$ contains a matching of size $4$ that does not contain any induced matching of size greater than $1$, as is demonstrated in Figure 2.
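The tightness example can be verified by brute force. The matching below is one choice we found by hand (it need not be the one drawn in Figure 2): every two of its four edges are joined by an edge of $Q_3$, so its largest induced submatching has a single edge, giving ratio $1/4$. Vertices are 3-bit labels, and edges are stored as (even-weight end, odd-weight end).

```python
from itertools import combinations


def popcount(x):
    return bin(x).count("1")


# Hypercube Q_3 on vertices 0..7; edges join labels differing in exactly one bit.
# Each edge is stored as (even-weight end, odd-weight end), fixing the bipartition.
Q3 = {(u, u ^ (1 << bit)) for u in range(8) if popcount(u) % 2 == 0 for bit in range(3)}

M = [(0b000, 0b001), (0b110, 0b010), (0b011, 0b111), (0b101, 0b100)]  # a perfect matching of Q_3


def is_induced(es, edges):
    return all((a1, b2) not in edges and (a2, b1) not in edges
               for (a1, b1), (a2, b2) in combinations(es, 2))


largest = max(size for size in range(1, len(M) + 1)
              for sub in combinations(M, size) if is_induced(sub, Q3))
print(len(M), largest)  # 4 1
```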
Lemmas 4.1 and 4.2 deal with the setting in which $k$ is unrestricted, i.e., they work for matchings of any size, while posing a strict constant bound on the average degree ($d<2$ in the case of forests, and $d<4$ in the planar case). Next we see how to obtain different trade-offs between $k$ and $d$, while keeping $\alpha$ constant. We start with the optimal $\alpha=1/2$, and prove that for there exists a -multitasker.
Fix , and let be sufficiently large. There exists a graph that is -multitasker for all , with .
It is well known that there are (explicit) -vertex -regular bipartite graphs of girth . Since any edge set of size is a forest, the statement follows from Lemma 4.1. ∎
Next, we show that for small constants , we may achieve a significant increase in by showing the existence of -multitaskers for any .
Fix , let be sufficiently large, and suppose . There exists a -multitasker with vertices on each side, average degree , for all .
It is known (see, e.g., [FW16]) that for sufficiently large , there exists an -vertex graph with average degree such that every subgraph of of size has average degree at most . Define a bipartite graph such that and are two copies of , and for and we have if and only if . We get that the average degree of is , and for any two and such that , the average degree of is at most