Modulation of early visual processing alleviates capacity limits
in solving multiple tasks
Sushrut Thorat*, Giacomo Aldegheri*, Marcel A. J. van Gerven, Marius V. Peelen (*equal contribution)
firstname.lastname@example.org, email@example.com, firstname.lastname@example.org, email@example.com
Donders Institute for Brain, Cognition and Behaviour, Radboud University
In daily life, we often have to perform multiple tasks on the same visual stimulus, which requires task-relevant information to be transmitted through our visual system. When a bottleneck makes it impossible to transmit all potentially relevant information to higher layers, task-based modulation of early visual processing might be necessary. In this work, we report how the effectiveness of modulating the early processing stage of an artificial neural network depends on the information bottleneck faced by the network. The bottleneck is quantified by the number of tasks the network has to perform and the neural capacity of the later stage of the network. The effectiveness is gauged by the performance on multiple object detection tasks, where the network is trained with a recent multi-task optimization scheme. By associating neural modulations with task-based switching of the state of the network and characterizing when such switching is helpful in early processing, our results provide a functional perspective on why task-based modulation of early neural processes might be observed in the primate visual cortex.
The code to train and analyze the networks mentioned here can be found at https://github.com/novelmartis/early-vs-late-multi-task.
Keywords: neural modulation, multi-task learning, early visual cortex, attention, perception, capacity limits
Humans and other animals have to perform multiple tasks given a visual stimulus. For example, on seeing a face, we may have to say whether it is happy or sad, or recognize its identity. For each of these tasks, only a subset of the features of the face is useful. In principle, a visual system could extract all of the features necessary to solve all possible tasks, and then select the relevant information from this rich representation downstream. However, as the number of tasks increases, a network with limited capacity may not be able to extract all of the potentially relevant features. Such an information bottleneck requires the information extracted from the stimulus in the early processing stages to change according to the task.
Several studies in neuroscience have found evidence for such task-dependent modulations of sensory processing in the primate visual system, including at the early levels (?, ?, ?, ?). For example, human neuroimaging studies have shown that attending to a stimulus could lead to an increase in the accuracy with which its task-relevant features could be decoded by a classifier in early visual areas (?, ?), and neurophysiological experiments in nonhuman primates have shown that the stimulus selectivity of neurons in primary visual cortex was dependent on the task the monkeys had to perform (?, ?).
Despite the observation of such modulations of early visual processing, it is not clear whether they are causally necessary for performing better on the corresponding tasks. This question has been addressed by deploying biologically-inspired task-based modulations on computational models. ? (?) showed that task-based modulation deployed on multiple stages of a convolutional neural network improves performance on challenging object classification tasks. Other recent work (?, ?, ?) has also shown that task-based modulation of early visual processing aids in object detection and segmentation in addition to the task-based modulation of late processing. However, the conditions under which early modulation can be beneficial in performing multiple tasks have not been systematically investigated.
In the present work, we assessed the effectiveness of task-based modulation of early visual processing as a function of an information bottleneck in a neural network, quantified by the number of tasks the network had to execute and the neural capacity of the network. To do so, we trained networks that, given an image and a task cue, provided an answer to the cued task. Every task required detecting the presence of the corresponding object in the image. The networks were trained according to a recent framework proposed in the field of continual learning (?, ?), which helps them execute multiple tasks by switching their state given a task cue, in order to transmit relevant information through the network. To quantify the effectiveness of task-based modulation of early neural processing, we measured the increase in performance obtained by modulating early neural processing in addition to modulating late neural processing in the networks.
Task and system description
In a multi-task setting, object detection can be thought of as solving one of a set of possible binary classification problems (one object versus the rest). Given an image and a task cue indicating the identity of the object to be detected, a network had to output whether the object in the image matched the task cue.
We used MNIST (?, ?) digits and their permutations as objects (?, ?). The original MNIST dataset has px images of digits. Each permuted version consists of images of those digits, whose pixels undergo a given permutation, creating new objects. We varied the number of permutations used (, , and ) to modulate the number of tasks the networks had to perform (which are times the number of permutations).
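The construction of permuted objects described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the helper names (`make_permutations`, `permute_images`, `object_id`) are hypothetical, random arrays stand in for real MNIST digits, the 28 x 28 image size is the standard MNIST format, and ten digit classes per permuted set are assumed.

```python
import numpy as np

def make_permutations(n_permutations, n_pixels=28 * 28, seed=0):
    """Return one fixed pixel permutation per permuted set (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    return [rng.permutation(n_pixels) for _ in range(n_permutations)]

def permute_images(images, permutation):
    """Apply the same pixel permutation to every image in a batch.

    images: array of shape (n_images, 28, 28); the output has the same shape.
    """
    flat = images.reshape(len(images), -1)
    return flat[:, permutation].reshape(images.shape)

def object_id(digit, permutation_index, n_digits=10):
    """Index of the object (and detection task) for a digit in a permuted set."""
    return permutation_index * n_digits + digit

# Stand-in data: random arrays replace real MNIST digits here.
images = np.random.default_rng(1).random((5, 28, 28))
perms = make_permutations(n_permutations=3)
permuted = permute_images(images, perms[0])
```

Because each permutation is fixed, every digit in a permuted set becomes a consistent new object, so the number of detection tasks grows linearly with the number of permutations.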
We considered a multilayer perceptron with rectified linear units (ReLU), which had one hidden layer between the input (image) and the binary output. The number of neurons in the hidden layer was varied (, , and ) and determined the neural capacity of the late stage of the network.
Task-based modulation and its function
When biological neurons are modelled as perceptrons (?, ?), task-based modulations have been shown to affect the effective biases and gains of the neurons (?, ?, ?, ?). The nature of the modulation (which neurons to modulate, and how) is under debate (?, ?, ?). We adapted these findings by introducing task-based modulation into our networks via the biases of the perceptrons and the gains of their ReLU activation functions. The modulations were then trained end-to-end with the rest of the network.
Given a particular task, the task cue is a one-hot encoding of the relevant object. Task-based modulation is mediated through bias and gain modulation in the following manner.
The transformation between two successive layers is modulated by changing the slope of the ReLU activation function (the gain) in the pre-synaptic layer and the bias to the perceptrons in the post-synaptic layer. The pre-gain activations of the pre-synaptic perceptrons are rectified, multiplied element-wise by the task-dependent gains, passed through the task-independent transformation matrix between the two layers, and added to the task-dependent biases of the post-synaptic perceptrons. The gain and bias modulations are obtained by applying two learned maps to the task cue (the one-hot encoding of the relevant object).
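One way to write the modulation described above in symbols is sketched below. The symbol names are chosen here for illustration and are assumptions: $\mathbf{x}_{L_i}$ denotes the pre-gain activations of layer $L_i$, $W_{i \to i+1}$ the task-independent transformation matrix, $G_{L_i}$ and $B_{L_{i+1}}$ the maps from the one-hot task cue $\mathbf{t}$ to the gains and biases, and $\odot$ element-wise multiplication.

```latex
\mathbf{x}_{L_{i+1}} = W_{i \to i+1}\,\big[\,(G_{L_i}\,\mathbf{t}) \odot \mathrm{ReLU}(\mathbf{x}_{L_i})\,\big] + B_{L_{i+1}}\,\mathbf{t}
```

Since $\mathbf{t}$ is one-hot, the products $G_{L_i}\mathbf{t}$ and $B_{L_{i+1}}\mathbf{t}$ simply select the column of gains or biases associated with the cued task.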
Given a task, modulating the gains of the pre-synaptic perceptrons and the biases of the post-synaptic perceptrons changes the transformation between the two layers. This allows for the transmission of the information required to perform that task, while ignoring the information required to perform the other tasks, as formalized in ? (?). This change can also be thought of as the network switching its state to selectively transmit task-relevant information downstream (see Figure 1). The conditions under which the network can switch between a given number of tasks (the nature of these modulations and the neural capacity of the network) are preliminarily described in ? (?).
Here, for every relevant layer, the task-independent transformation matrix and the maps from the task cue to the gain and bias modulations were jointly learned for the given number of tasks.
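A forward pass through a one-hidden-layer network with this kind of modulation might look like the sketch below. It is an illustration under stated assumptions, not the authors' TensorFlow implementation: the parameter names (`W1`, `G1`, `B1`, and so on), the single sigmoid output unit, the initial values, and the fallback used when early modulation is disabled (unit gain and a task-independent bias `b1`) are all hypothetical choices.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def init_params(n_pixels, n_hidden, n_tasks, seed=0):
    """Create illustrative parameters; gains start at 1 and biases at 0."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.05, (n_hidden, n_pixels)),  # input -> hidden
        "W2": rng.normal(0.0, 0.05, (1, n_hidden)),         # hidden -> output
        "G1": np.ones((n_pixels, n_tasks)),   # cue -> gains of input units
        "B1": np.zeros((n_hidden, n_tasks)),  # cue -> biases of hidden units
        "G2": np.ones((n_hidden, n_tasks)),   # cue -> gains of hidden units
        "B2": np.zeros((1, n_tasks)),         # cue -> bias of the output unit
        "b1": np.zeros(n_hidden),             # task-independent hidden bias (assumed)
    }

def forward(image, cue, p, modulate_early=True):
    """Return the detection probability for a flattened image and a one-hot cue."""
    if modulate_early:
        # Early modulation: task-dependent input gains and hidden biases.
        gain1, bias1 = p["G1"] @ cue, p["B1"] @ cue
    else:
        # Late-only variant: the early transformation ignores the task cue.
        gain1, bias1 = 1.0, p["b1"]
    x_hidden = p["W1"] @ (gain1 * relu(image)) + bias1
    # Late modulation: task-dependent hidden gains and output bias.
    gain2, bias2 = p["G2"] @ cue, p["B2"] @ cue
    logit = p["W2"] @ (gain2 * relu(x_hidden)) + bias2
    return 1.0 / (1.0 + np.exp(-logit[0]))

params = init_params(n_pixels=784, n_hidden=16, n_tasks=30)
cue = np.zeros(30)
cue[3] = 1.0  # one-hot cue for a hypothetical object index 3
prob = forward(np.random.default_rng(1).random(784), cue, params)
```

In this sketch the same weights `W1` and `W2` are shared across tasks, and only the selected columns of the gain and bias maps differ between tasks, which is what lets the cue switch the state of the network.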
Evaluation metric and expected trends
The effectiveness of early neural modulation was quantified by the average absolute increase in detection performance across all the tasks when modulations were implemented on both the early (input-to-hidden) and the late (hidden-to-output) transformations, as opposed to when the modulations were trained on the late transformation only.
We expected the effectiveness of task-based early neural modulation to decrease with the number of neurons in the hidden layer and to increase with the number of tasks (permuted MNIST sets used), since both a smaller hidden layer and a larger number of tasks tighten the information bottleneck.
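The metric above can be computed from per-task accuracies as in the following sketch. The function name is hypothetical, and "absolute" is read here as a difference in accuracy (as opposed to a relative change), which is an assumption about the intended meaning.

```python
import numpy as np

def early_modulation_boost(acc_both, acc_late_only):
    """Average per-task accuracy gain from adding early modulation.

    acc_both: per-task accuracies with early and late modulation.
    acc_late_only: per-task accuracies with late modulation only.
    """
    acc_both = np.asarray(acc_both, dtype=float)
    acc_late_only = np.asarray(acc_late_only, dtype=float)
    return float(np.mean(acc_both - acc_late_only))

# Made-up accuracies for two tasks, for illustration only.
boost = early_modulation_boost([0.9, 0.8], [0.8, 0.6])
```

A positive value means that early modulation helped on average across the tasks.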
Neural network training details
All the networks were trained with adaptive stochastic gradient descent with backpropagation through the ADAM optimizer (?, ?) with the default settings in TensorFlow (v1.4.0) and . We used a batch size of . Half of each batch contained randomly selected images of randomly selected tasks where the cued object was present, and half where the cued object was not present. These images were taken from the MNIST training set and its corresponding permutations. The images were augmented by adding small translations and some noise. We trained each network with such batches. The relevant metrics discussed in the previous section are computed at the end of training over a batch of size created from the MNIST test set and its corresponding permutations.
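The balanced batch composition described above could be produced along the lines of the sketch below. It is an assumption-laden illustration: the function name and arguments are hypothetical, stand-in labels replace the real dataset, images are drawn first and cued with their own object for the "present" half (rather than drawing a task first), and the translation and noise augmentation is omitted.

```python
import numpy as np

def sample_balanced_batch(labels, n_objects, batch_size, rng):
    """Pick image indices and cued objects so that half the batch is 'present'.

    labels: (n_images,) object index of every training image.
    Returns image indices, cued object indices and binary targets.
    """
    half = batch_size // 2
    n_absent = batch_size - half
    # Present half: cue the object that is actually in the image.
    idx_present = rng.integers(0, len(labels), size=half)
    cue_present = labels[idx_present]
    # Absent half: shift the true label by a non-zero offset to cue another object.
    idx_absent = rng.integers(0, len(labels), size=n_absent)
    offset = rng.integers(1, n_objects, size=n_absent)
    cue_absent = (labels[idx_absent] + offset) % n_objects
    image_idx = np.concatenate([idx_present, idx_absent])
    cues = np.concatenate([cue_present, cue_absent])
    targets = np.concatenate([np.ones(half), np.zeros(n_absent)])
    return image_idx, cues, targets

rng = np.random.default_rng(0)
labels = rng.integers(0, 30, size=1000)  # stand-in object labels
image_idx, cues, targets = sample_balanced_batch(labels, 30, 64, rng)
```

Balancing the two halves keeps chance performance at one half regardless of how many objects the network has to detect.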
We first analyzed the detection performance of the network with only modulation. The network performance as a function of the number of neurons in and the number of detection tasks the network had to perform is shown in Figure 2 (red circles). The network performance increased with an increase in the number of neurons in , as the neural capacity increased. The performance decreased with an increase in the number of tasks to be performed, as the representational capacity of the network for any one task was reduced. A network with as little as neurons in its hidden layer was able to switch between as many as detection tasks, while keeping the average detection performance across all the tasks as high as , thus replicating the success of the multi-task learning framework proposed by ? (?).
To assess how the effectiveness of task-based modulation of early neural processing (the input-to-hidden transformation) depends on the bottleneck in the network, we analyzed the boost in average detection performance when task-based modulation of the early transformation was deployed in addition to task-based modulation of the late (hidden-to-output) transformation, as a function of the number of neurons in the hidden layer and the number of detection tasks the network had to perform. The resulting boosts are shown in Figure 2. The performance boost increased as the number of neurons in the hidden layer decreased, and as the number of tasks the network had to perform increased. This supports the hypothesis that task-based modulation of early neural processing is most beneficial when an information bottleneck exists in a subsequent processing stage.
The contribution of bias and gain modulation
Gain modulation of neural responses, more so than bias modulation, has been observed in experiments investigating feature-based attention in the monkey and human brain (?, ?, ?). We assessed how the two contributed to the overall modulation of the transformations in the network.
We selectively turned off the bias or gain modulation for all the variants of the network that were trained. The average detection performance decreased by when gain modulation was turned off, and by when bias modulation was turned off, suggesting that in our framework, when jointly deployed, gain modulation is more important than bias modulation in switching the state of the network to be able to perform the desired task well.
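The ablation described above can be illustrated on a single modulated transformation. How the modulation was "turned off" is not specified here, so this sketch assumes that disabling gain modulation means using unit gains and that disabling bias modulation means using zero biases; the function and argument names are hypothetical.

```python
import numpy as np

def modulated_layer(x_pre, W, G, B, cue, use_gain=True, use_bias=True):
    """Apply one modulated transformation, optionally ablating gain or bias."""
    # Assumption: an ablated gain is 1 and an ablated bias is 0.
    gain = G @ cue if use_gain else np.ones(G.shape[0])
    bias = B @ cue if use_bias else np.zeros(B.shape[0])
    return W @ (gain * np.maximum(x_pre, 0.0)) + bias

# Tiny hand-made example: 3 pre-synaptic units, 2 post-synaptic units, 2 tasks.
x_pre = np.array([1.0, -2.0, 3.0])
W = np.ones((2, 3))
G = np.array([[2.0, 1.0], [2.0, 1.0], [2.0, 1.0]])
B = np.array([[0.5, 0.0], [-0.5, 0.0]])
cue = np.array([1.0, 0.0])

full = modulated_layer(x_pre, W, G, B, cue)
no_gain = modulated_layer(x_pre, W, G, B, cue, use_gain=False)
no_bias = modulated_layer(x_pre, W, G, B, cue, use_bias=False)
```

Comparing detection performance with `full` against `no_gain` and `no_bias` outputs, for a trained network, would give the kind of drop reported above.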
We also trained a network with neurons in , on permutations of MNIST, with gain-only or bias-only modulations of both the and transformations. When the gain and bias modulations were jointly trained, the network performance was . With gain-only modulation, the performance was , and with bias-only modulation the performance was . As the performance when only bias modulation was deployed was much higher than chance (), we can conclude that bias modulation alone can also lead to efficient task-switching. When the bias and gain modulations are jointly trained, gain might take over as it multiplicatively impacts responses, and therefore has higher gradients during training, as opposed to the additive impact of bias.
Adding to the discussion about the functional role of task-based modulation of early neural processing, in this work we have shown that modulating the early layer of an artificial neural network in a task-dependent manner can boost performance, beyond just modulating the late layer, in a multi-task learning scenario in which a network contains an information bottleneck, either due to a large number of tasks to be performed or to a small number of units in the late layer.
Adapting a formalism proposed by ? (?), we showed how bias and gain modulation, two prevalent neuronal implementations of top-down modulation in the brain, could functionally lead to switching the state of a network to perform transformations effective for the task at hand. While task-dependent computations are widespread in higher-level areas of the primate brain, such as prefrontal cortex (?, ?), it is not clear to what extent sensory streams (which perform early visual processing) can also be seen as switching their state according to the current task (although see ? (?) for a proposal), and what the functional relevance of doing so would be. Here we show how, in principle, this switching could be computationally advantageous when it is not possible to send the information required for all tasks to higher layers, which might well be the case in the complex environments that humans and other animals are able to navigate.
To further investigate the relevance of our findings to biological visual systems, in follow-up work we intend to deploy our modulation scheme on architectures that bear more similarity to the primate visual hierarchy, such as deep convolutional networks (?, ?), datasets of naturalistic images such as ImageNet (?, ?), and general naturalistic tasks such as visual question answering (?, ?). This will allow us to assess whether the functional advantage provided by early modulation holds true in a more realistic scenario, and whether the resulting modulation schemes resemble those observed in the early visual areas of the primate brain.
Finally, a key aspect of our approach is the fact that the network is constantly operating in a task-dependent manner. Most previous approaches to task-dependent modulation have assumed the presence of an underlying task-free representation on which the modulation operates (for example, in the case of ? (?) this corresponds to a network pre-trained on object recognition). Providing the network with task cues during the training phase, on the other hand, has been used in the field of continual learning (?, ?, ?, ?), and according to one influential theory in neuroscience, the interplay between sparse, context-specific information encoded by the hippocampus and shared structural information in the neocortex is crucial for learning new tasks without overwriting previous ones (?, ?). To our knowledge, the question of how the task-based modulations observed in visual cortex might be learned has not been explicitly addressed in previous literature. On the one hand, it is possible that a context-free representation is learned first, possibly through unsupervised learning, and then modulated upon. On the other, learning of representations and task modulations might interact at all stages, allowing the representations to be optimized for the type of modulations they are subject to. Whether one scheme or the other constitutes a better explanation for the modulations observed in biological visual systems is an important direction for future research.
This work was funded by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (Grant Agreement No. ). This manuscript reflects only the authors’ view, and the agency is not responsible for any use that may be made of the information it contains.
References
- Agrawal, A., Lu, J., Antol, S., Mitchell, M., Zitnick, C. L., Parikh, D., & Batra, D. (2017). VQA: Visual question answering. International Journal of Computer Vision, 123(1), 4–31.
- Boynton, G. M. (2009). A framework for describing the effects of attention on visual responses. Vision Research, 49(10), 1129–1143.
- Carrasco, M. (2011). Visual attention: The past 25 years. Vision Research, 51(13), 1484–1525.
- Cheung, B., Terekhov, A., Chen, Y., Agrawal, P., & Olshausen, B. (2019). Superposition of many models into one. arXiv preprint arXiv:1902.05522.
- Gilbert, C. D., & Li, W. (2013). Top-down influences on visual processing. Nature Reviews Neuroscience, 14(5), 350–363.
- Jehee, J. F., Brady, D. K., & Tong, F. (2011). Attention improves encoding of task-relevant features in the human visual cortex. Journal of Neuroscience, 31(22), 8210–8219.
- Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
- Kirkpatrick, J., Pascanu, R., Rabinowitz, N., Veness, J., Desjardins, G., Rusu, A. A., … others (2017). Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13), 3521–3526.
- Kriegeskorte, N. (2015). Deep neural networks: A new framework for modeling biological vision and brain information processing. Annual Review of Vision Science, 1, 417–446.
- Kumaran, D., Hassabis, D., & McClelland, J. L. (2016). What learning systems do intelligent agents need? Complementary learning systems theory updated. Trends in Cognitive Sciences, 20(7), 512–534.
- LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.
- Lindsay, G. W., & Miller, K. D. (2018). How biological attention mechanisms improve task performance in a large-scale visual system model. eLife, 7, e38105.
- Ling, S., Liu, T., & Carrasco, M. (2009). How spatial and feature-based attention affect the gain and tuning of population responses. Vision Research, 49(10), 1194–1204.
- Mante, V., Sussillo, D., Shenoy, K. V., & Newsome, W. T. (2013). Context-dependent computation by recurrent dynamics in prefrontal cortex. Nature, 503(7474), 78.
- Masse, N. Y., Grant, G. D., & Freedman, D. J. (2018). Alleviating catastrophic forgetting using context-dependent gating and synaptic stabilization. Proceedings of the National Academy of Sciences, 115(44), E10467–E10475.
- Maunsell, J. H., & Treue, S. (2006). Feature-based attention in visual cortex. Trends in Neurosciences, 29(6), 317–322.
- Rosenblatt, F. (1957). The perceptron, a perceiving and recognizing automaton (Project Para). Cornell Aeronautical Laboratory.
- Rosenfeld, A., Biparva, M., & Tsotsos, J. K. (2018). Priming neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 2011–2020).
- Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., … others (2015). ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211–252.
- Thorat, S., van Gerven, M., & Peelen, M. (2018). The functional role of cue-driven feature-based feedback in object recognition. In Conference on Cognitive Computational Neuroscience, CCN 2018 (pp. 1–4).
- Yang, G. R., Joglekar, M. R., Song, H. F., Newsome, W. T., & Wang, X.-J. (2019). Task representations in neural networks trained to perform many cognitive tasks. Nature Neuroscience, 22(2), 297.