Fixed smooth convolutional layer for avoiding checkerboard artifacts in CNNs
Abstract
In this paper, we propose a fixed convolutional layer with an order of smoothness, not only for avoiding checkerboard artifacts in convolutional neural networks (CNNs) but also for enhancing the performance of CNNs, where the smoothness of its filter kernel can be controlled by a parameter. It is well-known that a number of CNNs generate checkerboard artifacts in two processes: the forward-propagation of upsampling layers and the backward-propagation of strided convolutional layers. The proposed layer can perfectly prevent checkerboard artifacts caused by strided convolutional layers or upsampling layers, including transposed convolutional layers. In an image-classification experiment with four CNNs (a simple CNN, VGG8, ResNet18, and ResNet101), applying the fixed layers to these CNNs is shown to improve the classification performance of all the CNNs. In addition, the fixed layer is applied to generative adversarial networks (GANs) for the first time. From image-generation results, a smoother fixed convolutional layer is demonstrated to enable us to improve the quality of images generated with GANs.
Yuma Kinoshita and Hitoshi Kiya
\addressTokyo Metropolitan University, Tokyo, Japan
\ninept
{keywords}
Checkerboard artifacts, Convolutional neural networks, Deep learning,
Fixed convolutional layers, Generative adversarial networks
1 Introduction
Convolutional neural networks (CNNs) [1, 2], which are a kind of deep neural network (DNN), have attracted attention because of their outstanding performance and have widely been used in many fields: image processing, natural language processing, acoustic/speech processing, and more. In research fields of image-to-image translation problems, e.g., image super-resolution, it is well-known that the forward-propagation of upsampling layers, including transposed convolutional layers, causes images to be distorted by checkerboard artifacts [3]. In addition, checkerboard artifacts are also caused by the backward-propagation of downsampling layers, including strided convolutional layers [3]. CNN architectures, such as VGG [4], ResNet [5], and U-Net [6], usually have upsampling and/or downsampling layers for increasing and/or reducing the spatial sampling rate of feature maps, respectively [7]. For this reason, checkerboard artifacts affect most commonly-used CNNs.
To overcome checkerboard artifacts caused by upsampling layers, several studies have been conducted to reduce their effects [8, 9, 3, 10, 11, 12, 13]. In particular, Sugawara et al. [12, 13] gave us two approaches to perfectly prevent checkerboard artifacts by extending a condition for avoiding checkerboard artifacts in linear multirate systems [14, 15, 16, 17]. However, the literature [12, 13] mainly focuses on checkerboard artifacts caused by upsampling layers, and there is little discussion of artifacts caused by downsampling layers. In addition, Sugawara et al. did not consider CNNs that have both upsampling and downsampling layers, such as generative adversarial networks (GANs) [18]. Furthermore, in Sugawara's method, only a zero-order hold kernel, which has the lowest smoothness, is used for avoiding checkerboard artifacts. Note that the smoothness of filter kernels is different from what has widely been considered for GANs, i.e., Lipschitz constraints [19, 20, 21]. Hence, the effects of the smoothness on the performance of CNNs have never been discussed so far.
Because of such a situation, in this paper, we propose a novel fixed convolutional layer with an order of smoothness, not only for perfectly avoiding checkerboard artifacts caused by upsampling and downsampling layers but also for enhancing the performance of CNNs. A filter kernel of the proposed layer is given by convolving a zero-order hold kernel multiple times. When using the proposed layer, we can control the smoothness of its filter kernel by changing the number of convolutions, referred to as the order of smoothness. In addition, we apply the proposed fixed layer to GANs for the first time.
We performed an image-classification experiment and an image-generation experiment in order to evaluate the effectiveness of the proposed fixed smooth convolutional layers. In the image-classification experiment with four CNNs (a simple CNN, VGG8, ResNet18, and ResNet101), applying the proposed layer is shown to improve the classification performance of all the CNNs. From the image-generation results, a smoother fixed convolutional layer enables us not only to prevent checkerboard artifacts in GANs but also to improve the quality of images generated with GANs.
2 Preparation
In this section, we briefly summarize CNNs and checkerboard artifacts, where we only focus on 2D convolutional layers for simplicity. We use notations shown in Table 1 throughout this paper.
Symbol  Definition

$a$  A scalar
$\bm{a}$  A vector
$a_i$  Element $i$ of vector $\bm{a}$, with indexing starting at 1
$\mathbf{A}$  A matrix
$A_{i,j}$  Element $(i, j)$ of matrix $\mathbf{A}$
$\mathbf{A}_{i,:}$  Row $i$ of matrix $\mathbf{A}$
$\mathbf{A}_{:,j}$  Column $j$ of matrix $\mathbf{A}$
$\mathsf{A}$  A 3D or higher-dimensional tensor
$\mathsf{A}_{i,j,k}$  Element $(i, j, k)$ of 3D tensor $\mathsf{A}$
$\mathsf{A}_i$  2D slice $i$ of 3D tensor $\mathsf{A}$
$\mathsf{X}$  A 3D tensor with a size of channel $\times$ height $\times$ width, which denotes an input feature map for a layer.
$\mathsf{W}$  A 4D tensor with a size of output channel $\times$ input channel $\times$ height $\times$ width, which denotes a filter kernel (weights) of a convolutional layer.
$\bm{b}$  A vector with a size of output channel, which denotes a bias of a convolutional layer.
$\mathsf{Y}$  A 3D tensor with a size of output channel $\times$ height $\times$ width, which denotes an output feature map of a layer.
$\mathbf{X}_j = \mathsf{X}_j$  A single-channel image which is a 2D slice of $\mathsf{X}$.
$\mathbf{W}_{i,j} = \mathsf{W}_{i,j}$  A 2D filter which is a 2D slice of filter kernel $\mathsf{W}$.
$L$  A loss function
$\mathbb{N}_0$  The set of non-negative integers
$\mathbf{A} * \mathbf{B}$  The convolution on two matrices $\mathbf{A}$ and $\mathbf{B}$
2.1 Convolutional neural networks
A CNN is a DNN that has one or more convolutional layers. The calculation in a convolutional layer is given as

$\mathsf{Y}_i = b_i + \sum_{j=1}^{C} \mathbf{W}_{i,j} * \mathbf{X}_j,$  (1)

where $\mathsf{X}$ denotes an input feature map with $C$ channels, $\mathsf{W}$ is a filter kernel, and $\bm{b}$ is a bias. Focusing on the $i$-th channel of an output feature map, the calculation of eq. (1) can be illustrated as a block-diagram by using a 2D filter $\mathbf{W}_{i,j}$ and a single-channel image $\mathbf{X}_j = \mathsf{X}_j$, as shown in Fig. 1(\subreffig:block_conv).
CNNs usually include upsampling layers and/or downsampling layers to increase and/or reduce the spatial sampling rate (or the resolution) of feature maps, respectively. Typical downsampling layers are strided convolutional layers, average pooling layers, and max pooling layers [7]. Here, average pooling layers and max pooling layers can be replaced with strided convolutional layers because:
1. An average pooling layer is a particular kind of strided convolutional layer.
2. Replacing a max-pooling layer has little effect on the performance of CNNs [22].
For these reasons, we focus on strided convolutional layers as downsampling layers. A strided convolutional layer is defined by the following equation:

$\mathsf{Y}_i = b_i + {\downarrow_s}\!\left[ \sum_{j=1}^{C} \mathbf{W}_{i,j} * \mathbf{X}_j \right],$  (2)

where $s$ is a parameter called the stride, and ${\downarrow_s}$ denotes downsampling with a rate $s$. Focusing on the $i$-th channel of an output feature map, the calculation of a strided convolutional layer corresponds to downsampling signals with a downsampling rate $s$ after convolving an input feature map $\mathbf{X}_j$ with a filter $\mathbf{W}_{i,j}$ [see Fig. 1(\subreffig:block_strided)].
Upsampling layers include a transposed convolutional layer, a sub-pixel convolutional layer [23], and a resize convolution layer [3]. Among these layers, the most commonly-used transposed convolutional layer is given as

$\mathsf{Y}_i = b_i + \sum_{j=1}^{C} \mathbf{W}_{i,j} * \left( {\uparrow_U} \mathbf{X}_j \right),$  (3)

where ${\uparrow_U}$ denotes upsampling with a rate $U$, i.e., inserting $U - 1$ zeros between neighboring samples. In contrast to a strided convolutional layer, the calculation for obtaining the $i$-th channel of an output feature map of a transposed convolutional layer corresponds to upsampling an input feature map $\mathbf{X}_j$ with an upsampling rate $U$ and then convolving the resulting feature map with a filter $\mathbf{W}_{i,j}$ [see Fig. 1(\subreffig:block_trans)].
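To make the upsample-then-convolve structure concrete, the following numpy sketch (our own illustrative code, not the authors' implementation; the helper names are ours) applies a transposed-convolution-style interpolator with rate 2 to a constant image. The kernel used here does not contain a zero-order hold factor, so the number of kernel taps contributing to each output position varies with period 2, and a checkerboard pattern appears even for a constant input:

```python
import numpy as np

def upsample_zero(x, rate):
    """2D upsampler: insert rate-1 zeros between neighboring samples."""
    y = np.zeros((x.shape[0] * rate, x.shape[1] * rate))
    y[::rate, ::rate] = x
    return y

def conv2d_full(x, h):
    """Plain 2D 'full' convolution (no flipping needed for symmetric h)."""
    H, W = x.shape
    kh, kw = h.shape
    y = np.zeros((H + kh - 1, W + kw - 1))
    for i in range(H):
        for j in range(W):
            y[i:i + kh, j:j + kw] += x[i, j] * h
    return y

# Interpolator with rate U = 2 applied to a constant image. The 3x3
# averaging kernel has no 2x2 zero-order hold factor, so the taps that
# cover each output position vary periodically -> checkerboard.
x = np.ones((8, 8))
h = np.ones((3, 3)) / 9.0
y = conv2d_full(upsample_zero(x, 2), h)
inner = y[4:-4, 4:-4]   # ignore boundary effects
print(inner[:2, :2])    # a period-2 pattern, not a constant
```

Even though the input is flat, the interior of `inner` repeats a 2x2 block of unequal values, which is exactly the checkerboard distortion discussed above.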
2.2 Upsampling and checkerboard artifacts
Checkerboard artifacts have been studied as a distortion caused by using upsamplers in linear multirate systems [14, 15, 16, 17]. Figure 2(\subreffig:lin_interpolator) shows a simple linear multirate system, called a linear interpolator, that consists of an upsampler with an upsampling rate $U$ and a linear time-invariant filter $\mathbf{H}$. To avoid checkerboard artifacts caused by the linear interpolator, it is necessary and sufficient that its filter $\mathbf{H}$ satisfies the following condition:

$\mathbf{H} = \mathbf{H}_0 * \mathbf{H}',$  (4)

where $\mathbf{H}_0$ and $\mathbf{H}'$ are the zero-order hold kernel with a size of $U \times U$ and another linear time-invariant filter, respectively, and $*$ means the convolution on two matrices.
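The checkerboard-free condition of eq. (4) can be checked numerically. In the following sketch (our own illustrative code; the factor playing the role of the second filter is chosen arbitrarily), a kernel that contains the zero-order hold factor yields a flat interior response to a constant input, while a kernel without that factor does not:

```python
import numpy as np

def upsample_zero(x, rate):
    """2D upsampler: insert rate-1 zeros between neighboring samples."""
    y = np.zeros((x.shape[0] * rate, x.shape[1] * rate))
    y[::rate, ::rate] = x
    return y

def conv2d_full(a, b):
    """Plain 2D 'full' convolution."""
    ha, wa = a.shape
    hb, wb = b.shape
    y = np.zeros((ha + hb - 1, wa + wb - 1))
    for i in range(ha):
        for j in range(wa):
            y[i:i + hb, j:j + wb] += a[i, j] * b
    return y

U = 2
h0 = np.ones((U, U))                                   # zero-order hold kernel
h_prime = np.outer([0.25, 0.5, 0.25], [0.25, 0.5, 0.25])  # arbitrary LTI factor
h_ok = conv2d_full(h0, h_prime)   # satisfies eq. (4): H = H0 * H'
h_ng = np.ones((3, 3))            # does not contain the H0 factor

x = np.ones((16, 16))             # constant input to the interpolator
y_ok = conv2d_full(upsample_zero(x, U), h_ok)[8:-8, 8:-8]
y_ng = conv2d_full(upsample_zero(x, U), h_ng)[8:-8, 8:-8]
print(y_ok.max() - y_ok.min())    # ~0: no checkerboard
print(y_ng.max() - y_ng.min())    # > 0: checkerboard
```

Intuitively, convolving with the zero-order hold factor makes every polyphase component of the kernel have the same sum of taps, so a constant input produces a constant (steady-state) output.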
A transposed convolutional layer has non-linear interpolators having a bias $\bm{b}$, as in Fig. 3(\subreffig:nonlin_interpolator). Sugawara et al. [12, 13] realized the perfect avoidance of checkerboard artifacts in non-linear interpolators by using the following two approaches:
1. Insert the zero-order hold kernel before adding a bias $\bm{b}$.
2. Insert the zero-order hold kernel after adding a bias $\bm{b}$, without any constraints on a filter [Fig. 3(\subreffig:nonlin_wo_checkerboard_2)].
Note that Approaches 1 and 2 cannot be equivalently converted to each other.
However, the literature [13] mainly focuses on checkerboard artifacts caused by upsampling layers, i.e., artifacts in the forward-propagation, and there is little discussion of artifacts caused by downsampling layers, i.e., artifacts in the backward-propagation. In addition, Sugawara et al. did not consider CNNs that have both upsampling and downsampling layers, such as GANs. Furthermore, in Sugawara's method, a zero-order hold kernel, which has the lowest smoothness, is used for avoiding checkerboard artifacts, although we have a choice in the smoothness of filter kernels.
3 Proposed method
Figure 4 shows a usage of the fixed smooth convolutional layer that we propose. The proposed layer is a (depthwise) convolutional layer, and the smoothness of its filter kernel is controlled by a parameter called the order of smoothness. Similarly to Sugawara's method in Section 2.2, there are two approaches to applying the proposed layer to a transposed convolutional layer or a strided convolutional layer, depending on the location of a bias $\bm{b}$.
3.1 Checkerboard artifacts in CNNs
Checkerboard artifacts in CNNs can be classified into artifacts in the forward-propagation and those in the backward-propagation.
Checkerboard artifacts in the forward-propagation are caused by interpolators in transposed convolutional layers, as shown in Fig. 1(\subreffig:block_trans), where the forward-propagation means the process of calculating a new feature map from an input feature map and passing the new feature map to the next layer. To prevent checkerboard artifacts, all filter kernels in transposed convolutional layers have to be trained so that they satisfy the condition in eq. (4). However, checkerboard artifacts cannot be prevented in practice because it is difficult to perfectly satisfy this condition by using an optimization algorithm; resolving this issue requires the use of fixed layers. Conventional CNNs with upsampling layers, which are mainly used for image super-resolution, thus distort the output feature map of each layer and the output image due to the artifacts. In addition, the artifacts affect the training of CNN models because the distorted output images are used for evaluating the loss function.
Checkerboard artifacts are also caused in the backward-propagation of a strided convolutional layer. The backward-propagation is a process that calculates gradients of a loss function $L$ with respect to the parameters in a CNN, in order from the output layer to the input layer. In a strided convolutional layer, the calculation of the gradient with respect to an input feature map is done by using transposed convolution as

$\frac{\partial L}{\partial \mathbf{X}_j} = \sum_{i=1}^{C'} \widetilde{\mathbf{W}}_{i,j} * \left( {\uparrow_s} \mathsf{G}_i \right),$  (5)

where a 3D tensor of a gradient $\mathsf{G} = \partial L / \partial \mathsf{Y}$ is given by an output-side layer, $\mathsf{W}$ denotes a filter kernel that the strided convolutional layer has, $\widetilde{\mathbf{W}}_{i,j}$ is the 2D filter $\mathbf{W}_{i,j}$ rotated by 180 degrees, and ${\uparrow_s}$ denotes upsampling with a rate $s$. For this reason, checkerboard artifacts appear in the calculated gradients of CNNs with downsampling layers, which are generally used for image classification. Therefore, the filter kernels and biases in these CNNs will be affected by the checkerboard artifacts in gradients.
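The following numpy sketch (our own, not the paper's code) mimics this backward-propagation for a strided convolutional layer with stride 2 and a 3x3 filter: even when the gradient arriving from the output side is uniform, each input position receives contributions from a periodically varying number of output positions, so the computed gradient itself exhibits a checkerboard pattern:

```python
import numpy as np

# Backward-propagation of a strided convolution (stride s = 2, 3x3 kernel).
# Output position (m, n) reads the input window starting at (s*m, s*n), so
# the gradient w.r.t. the input scatters each output gradient back over
# that window -- a transposed convolution.
s = 2
w = np.ones((3, 3))          # 2D filter of the strided layer
g = np.ones((8, 8))          # uniform gradient from the output-side layer
grad_x = np.zeros((8 * s + 2, 8 * s + 2))
for m in range(g.shape[0]):
    for n in range(g.shape[1]):
        grad_x[s * m:s * m + 3, s * n:s * n + 3] += g[m, n] * w

inner = grad_x[4:-4, 4:-4]   # interior, away from boundary effects
print(inner[:2, :2])         # values repeat with period 2 (checkerboard)
```

Here, the number of 3x3 windows covering an input pixel alternates between one and two along each axis, so the interior gradient takes four different values in a repeating 2x2 block rather than being flat.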
3.2 Fixed smooth convolutional layer
A filter kernel $\mathbf{H}_n$ of the proposed layer is obtained by convolving the zero-order hold kernel $\mathbf{H}_0$ multiple times, as

$\mathbf{H}_n = \underbrace{\mathbf{H}_0 * \mathbf{H}_0 * \cdots * \mathbf{H}_0}_{n+1 \text{ kernels}},$  (6)

where a parameter $n \in \mathbb{N}_0$, referred to as the order of smoothness, controls the smoothness of $\mathbf{H}_n$, and $n = 0$ gives the zero-order hold kernel itself. When the proposed layer is applied to an upsampling layer with an upsampling rate $U$, the size of $\mathbf{H}_0$ is given as $U \times U$. In contrast, the size of $\mathbf{H}_0$ is given as $D \times D$ when the proposed layer is applied to a downsampling layer with a downsampling rate $D$. By using the filter kernel $\mathbf{H}_n$ and a trainable bias $\bm{b}$, an output feature map of the proposed layer can be written as

$\mathsf{Y}_i = b_i + \mathbf{H}_n * \mathbf{X}_i.$  (7)
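The repeated self-convolution that builds the kernel can be sketched as follows; the zero-order hold kernel is normalized here so that its taps sum to one, which is an implementation choice of ours (the original scaling may differ):

```python
import numpy as np

def conv2d_full(a, b):
    """Plain 2D 'full' convolution."""
    ha, wa = a.shape
    hb, wb = b.shape
    y = np.zeros((ha + hb - 1, wa + wb - 1))
    for i in range(ha):
        for j in range(wa):
            y[i:i + hb, j:j + wb] += a[i, j] * b
    return y

def smooth_kernel(rate, n):
    """Kernel of the fixed layer: the zero-order hold kernel convolved
    with itself n times (n = 0 gives the zero-order hold kernel itself)."""
    h0 = np.ones((rate, rate)) / (rate * rate)  # normalized zero-order hold
    h = h0
    for _ in range(n):
        h = conv2d_full(h, h0)
    return h

for n in range(3):
    print(n, smooth_kernel(2, n).shape)
# order 0: 2x2 flat kernel; order 1: 3x3 triangular (bilinear-like) kernel;
# order 2: 4x4, smoother still. The size grows as (n+1)*(rate-1)+1.
```

With rate 2, order 0 reproduces the flat zero-order hold kernel, order 1 yields the familiar triangular (bilinear) taps, and higher orders approach increasingly smooth bell-shaped kernels.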
3.3 Avoiding checkerboard artifacts by the proposed layer
Checkerboard artifacts caused by a transposed convolutional layer can perfectly be prevented by using the proposed layer as follows [see also Fig. 4(\subreffig:proposed_up)]:
1. Fix a bias in a transposed convolutional layer as 0, and insert the proposed layer having a trainable bias $\bm{b}$ after the transposed convolutional layer.
2. Insert the proposed layer having a bias fixed as 0 after a transposed convolutional layer having a trainable bias $\bm{b}$.
These approaches correspond to Approaches 1 and 2 in Section 2.2, respectively. The use of Approach 1 allows us to reduce computational costs, since the redundancy of interpolators can be removed by the polyphase decomposition, but it will negate the effect of the bias because batch normalization (BN) is generally applied after each transposed convolutional layer. In contrast, under Approach 2, the bias is not affected by BN, but the computational costs cannot be reduced.
Checkerboard artifacts caused by a strided convolutional layer can also be prevented by inserting the proposed layer before the strided convolutional layer, i.e., in the opposite way to a transposed convolutional layer, because the checkerboard artifacts of a strided convolutional layer occur in the backward-propagation [see Fig. 4(\subreffig:proposed_down)]. Similarly to the case of transposed convolutional layers, we can consider two approaches in terms of the location of a bias.
Both Approaches 1 and 2 for the proposed fixed layer can prevent checkerboard artifacts caused by both transposed convolutional layers and strided convolutional layers, under any order of smoothness. In the next section, we confirm that both Approaches 1 and 2 enhance the performance of CNNs and evaluate the quality of images generated by GANs under the use of the proposed layer having a high order of smoothness.
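A minimal numpy sketch of the proposed layer's forward pass is given below. The function names and the 'same'-style cropping are our own choices; in practice, the layer would be realized as a framework's depthwise convolution with frozen weights and, depending on the approach, a trainable or fixed bias:

```python
import numpy as np

def conv2d_full(a, b):
    """Plain 2D 'full' convolution."""
    ha, wa = a.shape
    hb, wb = b.shape
    y = np.zeros((ha + hb - 1, wa + wb - 1))
    for i in range(ha):
        for j in range(wa):
            y[i:i + hb, j:j + wb] += a[i, j] * b
    return y

def smooth_kernel(rate, n):
    """Zero-order hold kernel convolved with itself n times (normalized)."""
    h0 = np.ones((rate, rate)) / (rate * rate)
    h = h0
    for _ in range(n):
        h = conv2d_full(h, h0)
    return h

def fixed_smooth_layer(x, rate, n, bias=None):
    """Forward pass of the fixed (depthwise) layer: every channel of x is
    convolved with the same fixed kernel and a per-channel bias is added.
    'same'-style cropping keeps the spatial size unchanged."""
    h = smooth_kernel(rate, n)
    C, H, W = x.shape
    if bias is None:
        bias = np.zeros(C)
    kh, kw = h.shape
    top, left = (kh - 1) // 2, (kw - 1) // 2
    y = np.empty_like(x)
    for c in range(C):
        full = conv2d_full(x[c], h)
        y[c] = full[top:top + H, left:left + W] + bias[c]
    return y

x = np.random.default_rng(0).normal(size=(3, 8, 8))
y = fixed_smooth_layer(x, rate=2, n=1, bias=np.array([0.0, 1.0, -1.0]))
print(y.shape)  # spatial size preserved
```

Under Approach 1, `bias` would be the trainable bias of the block; under Approach 2, it would be fixed to 0 and the preceding convolutional layer would keep its own trainable bias.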
4 Simulation
To evaluate the effectiveness of avoiding checkerboard artifacts by the proposed layer, we performed two simulations: image classification and image generation.
4.1 Image classification
We evaluated the classification accuracy of four CNNs with/without the proposed layer, in order to confirm the effectiveness of avoiding checkerboard artifacts in the backward-propagation. Here, we set all filter kernels of the proposed fixed layers to the zero-order hold kernel. The following four CNNs were used in this simulation: a simple CNN illustrated in Table 2, VGG8, ResNet18, and ResNet101, where ReLU in the table denotes the rectified linear unit activation function [24].
Layer  Stride  Kernel size  ch 

Conv. + BN + ReLU  2  64  
Conv. + BN + ReLU  2  128  
Conv. + BN + ReLU  2  256  
Conv. + BN + ReLU  2  512  
Conv. + BN + ReLU  1  10 
In this simulation, we used the CIFAR-10 dataset [25] without any data augmentation for training and testing the CNNs. Each CNN was trained for 300 epochs by using the Adam optimizer [26] with an initial learning rate of 0.1, where we utilized an initial learning rate of 0.01 only for VGG8. In addition, each learning rate was multiplied by a fixed decay factor when the number of epochs reached 150 and 225.
Figure 5 shows the unit-step response of the backward-propagation of the second convolutional layer in the simple CNN with/without the proposed layers. From Fig. 5(\subreffig:step_response_conv), we can confirm that checkerboard artifacts appeared in the calculated gradients when the simple CNN without the proposed layer was used. In contrast, the artifacts were perfectly prevented by the proposed layers.
Table 3 shows the classification accuracy (train/test) of the four CNNs. From Table 3, the proposed fixed layer improved the accuracy on the test set for all four CNNs. This improvement was confirmed for both Approaches 1 and 2 for avoiding checkerboard artifacts. These results illustrate that conventional CNNs for image classification are affected by checkerboard artifacts and that preventing the artifacts is effective for improving their performance.
Network  Conv.  Prop. (Approach 1)  Prop. (Approach 2) 

Simple CNN  0.999/0.585  0.999/0.645  0.976/0.671 
VGG8  1.000/0.795  1.000/0.840  1.000/0.839 
ResNet18  1.000/0.812  1.000/0.862  1.000/0.862 
ResNet101  1.000/0.852  1.000/0.863  1.000/0.863 
4.2 Image generation
In the simulation for image generation, we used generative adversarial networks (GANs) [18, 27]. A basic GAN consists of two networks called a "Generator" and a "Discriminator," respectively. Generally, a generator includes transposed convolutional layers and a discriminator includes strided convolutional layers. For this reason, checkerboard artifacts are generated in both the forward- and backward-propagation of GANs. To avoid the artifacts, we applied the proposed layers to GANs, where the filter kernels of the proposed layers were the zero-order hold kernel and the smoother kernel obtained by convolving the zero-order hold kernel with itself.
Tables 4 and 5 show the architectures of a generator and a discriminator used in the simulation, respectively, where Leaky ReLU [28] was utilized as activation functions in these networks.
Layer  Stride  Kernel size  ch 

Trans. conv. + BN + Leaky ReLU  2  512  
Trans. conv. + BN + Leaky ReLU  2  512  
Trans. conv. + BN + Leaky ReLU  2  512  
Trans. conv. + BN + Leaky ReLU  2  256  
Trans. conv. + BN + Leaky ReLU  2  128  
Trans. conv. + BN + Leaky ReLU  2  64  
Conv. + Tanh  1  3 
Layer  Stride  Kernel size  ch 

Conv. + BN + Leaky ReLU  2  8  
Conv. + BN + Leaky ReLU  2  32  
Conv. + BN + Leaky ReLU  2  64  
Conv. + BN + Leaky ReLU  2  64  
Conv.  1  1 
We trained the GANs for 10 epochs by using 202,599 images in the CelebA dataset [29] in order to generate face images from 100-dimensional random vectors that follow the standard normal distribution, where the images were resized in advance. The Adam optimizer [26] with an initial learning rate of 0.08 was utilized for training both the generator and the discriminator.
Figure 6 illustrates examples of face images generated by using the trained GANs. From the figure, checkerboard artifacts in generated images were suppressed by using the proposed layer, although the conventional GAN without the proposed layer caused the artifacts. Furthermore, the use of the proposed layers enabled us to improve the quality of the generated images. By comparing Fig. 6(\subreffig:prop_gan_0) with Fig. 6(\subreffig:prop_gan_1), it is confirmed that sharper and clearer images were generated when the smoother filter kernel was used. From these results, a fixed convolutional layer having a smoother filter kernel provides GANs with better performance.
5 Conclusion
We proposed a novel fixed convolutional layer with an order of smoothness. The proposed layer can not only perfectly avoid checkerboard artifacts caused by transposed convolutional layers and strided convolutional layers but also enhance the performance of CNNs, including GANs. A filter kernel in the proposed layer is calculated by using a zero-order hold kernel and a parameter called the order of smoothness. An image-classification experiment confirmed that avoiding checkerboard artifacts with the proposed layer improved the classification performance of all four CNNs that we used in the experiment. In addition, an image-generation experiment demonstrated, for the first time, that a smoother fixed convolutional layer enables us to improve the quality of images generated by GANs while avoiding checkerboard artifacts. In future work, we will further evaluate the proposed layer by more detailed simulations using other network architectures, such as U-Net, and discuss its effectiveness theoretically.
Footnotes
 This work was supported by JSPS KAKENHI Grant Number JP18J20326.
References
 Y. LeCun, “Generalization and Network Design Strategies,” Department of Computer Science, University of Toronto, Tech. Rep., 1989.
 A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” in Proc. of International Conference on Neural Information Processing Systems, 2012, pp. 1097–1105.
 A. Odena, V. Dumoulin, and C. Olah, “Deconvolution and Checkerboard Artifacts,” 2016. [Online]. Available: https://distill.pub/2016/deconv-checkerboard/
 K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” arXiv preprint arXiv:1409.1556, Sep. 2014. [Online]. Available: http://arxiv.org/abs/1409.1556
 K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” in Proc. of IEEE Conference on Computer Vision and Pattern Recognition, Jun. 2016, pp. 770–778.
 O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation,” in Medical Image Computing and Computer-Assisted Intervention, ser. LNCS, vol. 9351, Nov. 2015, pp. 234–241.
 I. Goodfellow, Y. Bengio, and A. Courville, Deep learning. MIT Press, 2016.
 J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual Losses for Real-Time Style Transfer and Super-Resolution,” in Proc. of European Conference on Computer Vision, 2016, pp. 694–711.
 A. Aitken, C. Ledig, L. Theis, J. Caballero, Z. Wang, and W. Shi, “Checkerboard artifact free sub-pixel convolution: A note on sub-pixel convolution, resize convolution and convolution resize,” arXiv preprint arXiv:1707.02937, Jul. 2017. [Online]. Available: http://arxiv.org/abs/1707.02937
 H. Gao, H. Yuan, Z. Wang, and S. Ji, “Pixel Deconvolutional Networks,” arXiv preprint arXiv:1705.06820, May 2017. [Online]. Available: http://arxiv.org/abs/1705.06820
 Z. Wojna, V. Ferrari, S. Guadarrama, N. Silberman, L.-C. Chen, A. Fathi, and J. Uijlings, “The Devil is in the Decoder: Classification, Regression and GANs,” arXiv preprint arXiv:1707.05847, Jul. 2017. [Online]. Available: http://arxiv.org/abs/1707.05847
 Y. Sugawara, S. Shiota, and H. Kiya, “Super-Resolution Using Convolutional Neural Networks Without Any Checkerboard Artifacts,” in Proc. of IEEE International Conference on Image Processing, Oct. 2018, pp. 66–70.
 Y. Sugawara, S. Shiota, and H. Kiya, “Checkerboard artifacts free convolutional neural networks,” APSIPA Transactions on Signal and Information Processing, vol. 8, no. e9, Feb. 2019.
 Y. Harada, S. Muramatsu, and H. Kiya, “Multidimensional Multirate Filter without Checkerboard Effects,” in Proc. of European Signal Processing Conference, 1998, pp. 1881–1884.
 T. Tamura, M. Kato, T. Yoshida, and A. Nishihara, “Design of Checkerboard-Distortion-Free Multidimensional Multirate Filters,” IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, vol. E81-A, no. 8, pp. 1598–1606, Aug. 1998.
 Y. Harada, S. Muramatsu, and H. Kiya, “Multidimensional Multirate Filter and Filter Bank without Checkerboard Effect,” IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, vol. E81-A, no. 8, pp. 1607–1615, Aug. 1998.
 H. Iwai, M. Iwahashi, and H. Kiya, “Methods for Avoiding the Checkerboard Distortion Caused by Finite Word Length Error in Multirate System,” IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, vol. E93-A, no. 3, pp. 631–635, 2010.
 I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative Adversarial Nets,” in Advances in Neural Information Processing Systems, 2014, pp. 2672–2680.
 M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein GAN,” arXiv preprint arXiv:1701.07875, Jan. 2017. [Online]. Available: http://arxiv.org/abs/1701.07875
 I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville, “Improved Training of Wasserstein GANs,” arXiv preprint arXiv:1704.00028, Mar. 2017. [Online]. Available: http://arxiv.org/abs/1704.00028
 T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida, “Spectral Normalization for Generative Adversarial Networks,” arXiv preprint arXiv:1802.05957, Feb. 2018. [Online]. Available: http://arxiv.org/abs/1802.05957
 J. T. Springenberg, A. Dosovitskiy, T. Brox, and M. Riedmiller, “Striving for Simplicity: The All Convolutional Net,” arXiv preprint arXiv:1412.6806, Dec. 2014. [Online]. Available: http://arxiv.org/abs/1412.6806
 W. Shi, J. Caballero, F. Huszar, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang, “Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network,” in Proc. of IEEE Conference on Computer Vision and Pattern Recognition, Jun. 2016, pp. 1874–1883.
 X. Glorot, A. Bordes, and Y. Bengio, “Deep sparse rectifier neural networks,” in Proc. of the International Conference on Artificial Intelligence and Statistics, Apr. 2011, pp. 315–323.
 A. Krizhevsky and G. E. Hinton, “Learning multiple layers of features from tiny images,” Master’s thesis, University of Toronto, 2009.
 D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” arXiv preprint arXiv:1412.6980, Dec. 2014. [Online]. Available: http://arxiv.org/abs/1412.6980
 A. Radford, L. Metz, and S. Chintala, “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks,” arXiv preprint arXiv:1511.06434, Nov. 2015. [Online]. Available: http://arxiv.org/abs/1511.06434
 B. Xu, N. Wang, T. Chen, and M. Li, “Empirical Evaluation of Rectified Activations in Convolutional Network,” arXiv preprint arXiv:1505.00853, May 2015. [Online]. Available: http://arxiv.org/abs/1505.00853
 Z. Liu, P. Luo, X. Wang, and X. Tang, “Deep Learning Face Attributes in the Wild,” in Proc. of International Conference on Computer Vision, Dec. 2015.