Improved Image Augmentation for Convolutional Neural Networks by Copyout and CopyPairing
Image augmentation is a widely used technique to improve the performance of convolutional neural networks (CNNs). Commonly, image shifting, cropping, flipping, shearing and rotating are used for augmentation, but there are also more advanced techniques like Cutout and SamplePairing.
In this work we present two improvements of the state-of-the-art Cutout and SamplePairing techniques. Our new method called Copyout takes a square patch of another random training image and copies it onto a random location of each image used for training. Our second technique, called CopyPairing, combines Copyout and SamplePairing for further augmentation and even better performance.
We apply different experiments with these augmentation techniques on the CIFAR-10 dataset to evaluate and compare them under different configurations. In our experiments we show that Copyout reduces the test error rate by 8.18% compared with Cutout and 4.27% compared with SamplePairing. CopyPairing reduces the test error rate by 11.97% compared with Cutout and 8.21% compared with SamplePairing.
Image augmentation is a data augmentation method that generates more training data from the existing training samples. This way the neural network is trained on a greater variety of images, which helps it generalize better to new and unknown images and leads to better performance on smaller datasets. Image augmentation is especially useful in domains where training data is limited or expensive to obtain, like in biomedical applications.
Image augmentation is a regularization technique like Dropout Hinton et al. (2012) and weight regularization Ng (2004), but it differs in one important way: it is not part of the network itself, which makes it easy and cheap to apply. Images can be augmented at runtime: while the GPU runs the neural network, the CPU can perform the augmentation for the next batch asynchronously.
Two state-of-the-art techniques for advanced image augmentation are called Cutout Devries and Taylor (2017) and SamplePairing Inoue (2018). In our work we show two improvements to these techniques. Our main contributions are:
The Copyout image augmentation technique must be applied at runtime to each image before it is used to build the mini-batch for training. The results must not be stored and reused, as this would reduce the variance of the training images and thus the performance of the CNN. The augmentation is conducted in these three steps:
Select a target training image to augment. On this target image select a square area at a random location. In this way it is possible that the randomly selected square leaves the image on one or two sides, which can effectively change the shape of the square to a rectangle. As stated by Devries and Taylor (2017) this is an important feature and improves performance. In our implementation the center of the square never leaves the target image.
Randomly select a source image from the training set. On this source image also select an area of the same extent as the one selected on the target image. This second area is also selected at a random location but must lie fully within the source image.
Copy the area of the source image to the target image.
The extent of the square is a hyperparameter that must be set before training starts and does not change during training. Obviously, the extent must be smaller than the training images. Devries and Taylor (2017) found that for Cutout the extent of the square is more important than its shape. This is the reason why we also decided to use squares for simplicity.
Examples of images augmented by Copyout are given in Figure 1.
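The three steps above can be sketched in NumPy as follows. This is a minimal sketch, not the authors' implementation; function and variable names are our own, and the 16 pixel extent mirrors the value used in the experiments. The square's center is sampled over the whole target image, so the copied patch may be clipped at the borders, while the matching source patch is always taken fully inside the source image.

```python
import numpy as np

def copyout(target, source, extent=16, rng=np.random):
    """Copy a random square patch from `source` onto a copy of `target`."""
    h, w = target.shape[:2]
    half = extent // 2

    # Step 1: random center on the target; the patch may leave the image,
    # so clip the square to the image borders (it may become a rectangle).
    cy, cx = rng.randint(0, h), rng.randint(0, w)
    y1, y2 = max(0, cy - half), min(h, cy + half)
    x1, x2 = max(0, cx - half), min(w, cx + half)
    ph, pw = y2 - y1, x2 - x1  # effective (possibly clipped) patch shape

    # Step 2: a patch of the same shape, placed fully inside the source.
    sy = rng.randint(0, h - ph + 1)
    sx = rng.randint(0, w - pw + 1)

    # Step 3: copy the source patch onto the target.
    augmented = target.copy()
    augmented[y1:y2, x1:x2] = source[sy:sy + ph, sx:sx + pw]
    return augmented
```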
CopyPairing is a mixture of Copyout and SamplePairing (Inoue (2018) and Appendix B). CopyPairing follows the same three phases as SamplePairing. The only difference is that it replaces the phases where SamplePairing is disabled by using Copyout. CopyPairing is conducted by the following three steps:
Only Copyout is enabled and SamplePairing is completely disabled for the first epochs.
SamplePairing is enabled for a few epochs and then Copyout is used for the next few epochs. This alternates for the majority of the epochs.
At the end of training only Copyout is enabled while SamplePairing is completely disabled again. This is also called fine-tuning.
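The three phases above amount to a simple per-epoch schedule, which can be sketched as follows. The function name and parameters are our own; the default 100/900/200 epoch split and the 1:1 ratio mirror the configuration evaluated in the experiments.

```python
def use_samplepairing(epoch, phase1_end=100, phase2_end=1000, ratio=(1, 1)):
    """Return True if SamplePairing should augment this (0-based) epoch,
    False if Copyout should. `ratio` is (SamplePairing epochs, Copyout
    epochs) in the alternating phase 2."""
    if epoch < phase1_end:      # phase 1: Copyout only
        return False
    if epoch >= phase2_end:     # phase 3 (fine-tuning): Copyout only
        return False
    # phase 2: alternate SamplePairing and Copyout epochs
    sp, co = ratio
    return (epoch - phase1_end) % (sp + co) < sp
```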
4.1 Experimental Setup
The CNN we use for our experiments consists of six 2D convolution layers with Glorot uniform kernel initializer Glorot and Bengio (2010), Rectified Linear Unit (ReLU) activation (Glorot et al., 2011) and Batch Normalization Ioffe and Szegedy (2015). Two dense layers are applied at the output and combined with two Dropout layers (Hinton et al., 2012). Although Ioffe and Szegedy (2015) propose to place the Batch Normalization layer in front of the activation function, we found that this CNN performs better when activation and Batch Normalization are interchanged. Appendix A Listing 1 presents the definition of the concrete CNN with Keras Chollet et al. (2015).
4.2 Basic Augmentation
For basic augmentation we randomly rotate the images by up to 15 degrees to the left and right, randomly flip them horizontally and randomly zoom them by a factor of up to 0.2. Appendix A Listing 2 presents our implementation of basic augmentation with the Keras ImageDataGenerator class.
Where and how basic augmentation is applied matters. If it is only applied to the target image and not to the source image, performance is reduced; it is important that the source image is augmented as well. This could be achieved by applying basic augmentation after the square has been copied, but we chose a different method.
Our basic augmentation is combined with Copyout and SamplePairing by using a buffer. We augment the target image, store it in a buffer and randomly select the source image from this buffer. This way both source and target images are augmented.
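The buffer mechanism can be sketched with a bounded queue. This is an illustrative sketch under our own naming; note that in this simple form the just-added target can occasionally be drawn as its own source.

```python
import random
from collections import deque

def make_source_sampler(buffer_size=64):
    """Keep a small buffer of already-augmented target images and draw
    source images from it, so that source images carry the same basic
    augmentation as target images."""
    buffer = deque(maxlen=buffer_size)  # oldest entries are evicted

    def sample_source(augmented_target):
        buffer.append(augmented_target)   # store the augmented target
        return random.choice(buffer)      # pick a random source image
    return sample_source
```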
Figure 2 displays different experiments with varying buffer sizes. It shows that the buffer size can be small without reducing performance. It also shows that augmentation of source images is important. The “Copyout - no Buf” experiment selects the source images from the whole training set without buffer and augmentation. Its median test error rate is significantly increased.
4.3 Method Comparison
We compared the different augmentation techniques while using the setup described in Section 4.1. All experiments were combined with basic augmentation (see Section 4.2). Each experiment was repeated at least 12 times to obtain statistically relevant results. Boxplots with test error rates are shown in Figure 3.
For the baseline experiment we only used basic augmentation and reached a median test error rate of 7.09%. With Cutout augmentation we reached a median test error rate of 6.6%.
For the SamplePairing experiment we used the following configuration: In phase 1 we disabled SamplePairing for the first 100 epochs. Phase 2 lasted 900 epochs, with SamplePairing turned on for 8 epochs and off for 2 epochs, alternating. Phase 3 (fine-tuning) lasted the remaining 200 epochs. This corresponds to the setup of Inoue (2018).
With SamplePairing we reached a median test error rate of 6.33%.
For Copyout augmentation we selected a 16 pixel extent of the square as hyperparameter and reached a median test error rate of 6.06%. This is an 8.18% improvement compared with Cutout and a 4.27% improvement compared with SamplePairing.
The experiment with CopyPairing was done with the following configuration: For the Copyout part we selected a 16 pixel extent of the square as hyperparameter. In phase 1 we disabled the SamplePairing part for the first 100 epochs, with only Copyout enabled. Phase 2 lasted 900 epochs, alternating between the SamplePairing part and the Copyout part for one epoch each. Phase 3 (fine-tuning) lasted the remaining 200 epochs with the SamplePairing part turned off and only Copyout enabled.
We reached a median test error rate of 5.81%. This is an 11.97% improvement compared with Cutout and an 8.21% improvement compared with SamplePairing. Even compared with the Copyout augmentation we reached an improvement of 4.13%.
4.4 Square Area Extent Comparison for Copyout
In our experiments with Copyout we selected a 16 pixel extent of the square as hyperparameter. To examine whether this was the best choice we executed three different experiments: one with a 16 pixel extent, one with 15 and one with 17 pixels. The result is that both larger and smaller extents lower the performance in our experimental setting and that a 16 pixel extent is the best choice for this setup (see Figure 4).
4.5 CopyPairing Hyperparameter Comparison
As with Copyout we also wanted to compare different hyperparameter settings for CopyPairing. In particular, phase 2 of CopyPairing provides many configuration options. Therefore we tested different ratios of epochs with SamplePairing and Copyout. The results are presented in Figure 5 and show that a ratio of one epoch with SamplePairing to one epoch with Copyout is best (see CopyPairing 1:1).
As a second experiment we investigated if the fine-tuning phase is needed. In this third phase (called fine-tuning) the SamplePairing is normally turned off and just Copyout is used for the last epochs of the training. In this experiment we just skipped this fine-tuning phase and extended the second phase until the end of training. The result is also shown in Figure 5 and indicates that the fine-tuning phase should be retained as it improves the performance.
In this work the application of Copyout and CopyPairing is described. We showed that they yield a statistically significant improvement over the state-of-the-art image augmentation techniques Cutout and SamplePairing (see Section 4.3). But what are the reasons for this improvement?
We conducted several experiments to compare the average magnitude of feature activations as done by Devries and Taylor (2017), but this gave us ambiguous results. That is the reason why we give intuitive explanations. Cutout and Copyout both randomly remove visual features of the training images and force the network to focus on a wider variety of features to distinguish between different images. This helps it generalize better to unknown images. Cutout and Copyout also help to handle occlusion. But what is the relevant distinction between Copyout and Cutout? Why does Copyout perform better?
Cutout performs the occlusion with just a gray square, which can be considered wasted space. Copyout fills this space with more visual information, as can be observed in Figure 1: one wing of the airplane in the right image is occluded by a frog. This way Copyout not only removes random visual features but also adds new random ones. These random features better correspond to the natural effect of occlusion. In real images, for example, a truck is more likely occluded by a car than by a gray box. This helps with further generalization. Moreover, the small squares with additional visual features force the CNN to distinguish between important and less important features. The intuitive explanation is that the network learns to focus more closely on details in order to distinguish important features from misleading ones.
The best presented augmentation scheme is CopyPairing. But why does it improve performance compared to SamplePairing? The reason might be that SamplePairing wastes augmentation potential when it is deactivated for a few epochs in phase 2 and completely deactivated in phase 3 for fine-tuning (also see Appendix B). CopyPairing fills this gap by using Copyout in between.
In summary, we presented two new augmentation techniques that improve the performance of image processing tasks with CNNs. This is especially useful in domains where training data is limited or expensive to obtain, as in biomedical applications.
- Hinton et al. (2012) Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan R. Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. arXiv.org, 2012, arXiv:1207.0580 [cs.NE]. URL https://arxiv.org/abs/1207.0580.
- Ng (2004) Andrew Y. Ng. Feature selection, L1 vs. L2 regularization, and rotational invariance. In Proceedings of the Twenty-first International Conference on Machine Learning, ICML ’04, pages 78–, New York, NY, USA, 2004. ACM. ISBN 1-58113-838-5. URL https://www.andrewng.org/portfolio/feature-selection-l1-vs-l2-regularization-and-rotational-invariance/.
- Devries and Taylor (2017) Terrance Devries and Graham W. Taylor. Improved regularization of convolutional neural networks with cutout. arXiv.org, 2017, arXiv:1708.04552 [cs.CV]. URL http://arxiv.org/abs/1708.04552.
- Inoue (2018) Hiroshi Inoue. Data augmentation by pairing samples for images classification. arXiv.org, 2018, arXiv:1801.02929 [cs.LG]. URL http://arxiv.org/abs/1801.02929.
- Krizhevsky (2009) Alex Krizhevsky. CIFAR-10 and CIFAR-100 datasets. http://www.cs.toronto.edu/~kriz/cifar.html, 2009.
- Glorot and Bengio (2010) Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the thirteenth international conference on artificial intelligence and statistics, pages 249–256, 2010. URL http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf.
- Glorot et al. (2011) Xavier Glorot, Antoine Bordes, and Yoshua Bengio. Deep sparse rectifier neural networks. In Proceedings of the fourteenth international conference on artificial intelligence and statistics, pages 315–323, 2011. URL http://proceedings.mlr.press/v15/glorot11a/glorot11a.pdf.
- Ioffe and Szegedy (2015) Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv.org, 2015, arXiv:1502.03167 [cs.LG]. URL https://arxiv.org/abs/1502.03167.
- Chollet et al. (2015) François Chollet et al. Keras. https://keras.io, 2015.
- Kingma and Ba (2014) Diederik P. Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimization. arXiv.org, 2014, arXiv:1412.6980 [cs.LG]. URL https://arxiv.org/abs/1412.6980.
Appendix A Keras Code
Appendix B SamplePairing Overview
Since the CopyPairing augmentation builds on SamplePairing Inoue (2018), we give a short overview of SamplePairing. It takes the image to augment and a second random image from the training set, and averages the two images pixel-wise. The augmentation is done in these three steps:
SamplePairing is completely disabled for the first epochs.
SamplePairing is enabled for a few epochs and then disabled again. This alternates for the majority of the epochs.
At the end of training SamplePairing is completely disabled again. This is called fine-tuning.
The SamplePairing paper describes several experiments. For the CIFAR-10 dataset the following configuration is applied: In phase 1 SamplePairing is disabled for the first 100 epochs. Phase 2 takes 700 epochs, with SamplePairing enabled for 8 epochs and turned off for 2 epochs, alternating. Phase 3 (fine-tuning) lasts for the remaining 200 epochs.
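The pixel-wise averaging at the core of SamplePairing can be sketched as follows. This is a minimal illustration with our own naming; in Inoue (2018) the label of the first image is kept for the mixed sample.

```python
import numpy as np

def sample_pairing(image_a, image_b):
    """Average two training images pixel-wise, as SamplePairing does.
    Computation is done in float to avoid integer overflow, then cast
    back to the input dtype."""
    mixed = (image_a.astype(np.float32) + image_b.astype(np.float32)) / 2.0
    return mixed.astype(image_a.dtype)
```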