CaseNet: Content-Adaptive Scale Interaction Networks for Scene Parsing

Xin Jin (USTC, jinxustc@mail.ustc.edu.cn), Cuiling Lan (MSRA, culan@microsoft.com), Wenjun Zeng (MSRA, wezeng@microsoft.com), Zhizheng Zhang (USTC, zhizheng@mail.ustc.edu.cn), and Zhibo Chen (USTC, chenzhibo@ustc.edu.cn)
Abstract.

Objects in an image exhibit diverse scales. Adaptive receptive fields are expected to capture a suitable range of context for accurate pixel-level semantic prediction of objects of diverse sizes. Recently, atrous convolutions with different dilation rates have been used to generate multi-scale features through several branches, and these features are fused for prediction. However, there is a lack of explicit interaction among the branches to adaptively make full use of the contexts. In this paper, we propose a Content-Adaptive Scale Interaction Network (CaseNet) to exploit multi-scale features for scene parsing. We build CaseNet on the classic Atrous Spatial Pyramid Pooling (ASPP) module, followed by the proposed Contextual Scale Interaction (CSI) module and the Scale Adaptation (SA) module. Specifically, first, for each spatial position, we enable context interaction among different scales through scale-aware non-local operations across scales, i.e., the CSI module, which facilitates the generation of flexible mixed receptive fields instead of traditional flat ones. Second, the Scale Adaptation (SA) module explicitly and softly selects the suitable scale for each spatial position and each channel. Ablation studies demonstrate the effectiveness of the proposed modules. We achieve state-of-the-art performance on three scene parsing benchmarks: Cityscapes, ADE20K, and LIP.

scene parsing, multi-scale information, contextual scale interaction, scale adaptation

1. Introduction

Scene parsing, or semantic segmentation, is a fundamental and challenging task. The purpose is to predict the semantic category of each pixel, including stuff (e.g., sky, road) and objects (e.g., person, car). This study benefits various challenging applications such as autonomous driving (Teichmann et al., 2018), robot sensing (Park et al., 2018b, a), and image editing/captioning (Li et al., 2016; Liu et al., 2018b).

With the development of the Fully Convolutional Network (FCN) (Long et al., 2015), semantic image segmentation has achieved promising results with significantly improved feature representations. However, as shown in Figure 1, objects within an image typically vary in scale. CNNs with standard convolutions cannot handle such diverse scales due to their fixed receptive fields: objects larger than the receptive field often receive inconsistent parsing predictions, while objects smaller than the receptive field are often ignored or mislabeled (Zhang et al., 2017).

Figure 1. Example of the scale variations of objects in a street scene. The same category of objects, such as cars, may vary largely in scale. Each pixel needs a suitable receptive field to catch the best context.

To deal with scale diversity, multi-scale context fusion schemes have been proposed (Zhao et al., 2017b; Chen et al., 2018; Yang et al., 2018). An early representative work is the pyramid pooling module (PPM) in PSPNet (Zhao et al., 2017b), which pools the feature map at multiple rates and multiple effective fields-of-view to capture objects at multiple scales (Chen et al., 2017). However, such pooling operations reduce the feature map resolution and lead to information loss. The atrous spatial pyramid pooling (ASPP) (Chen et al., 2018) and densely connected atrous spatial pyramid pooling (DenseASPP) (Yang et al., 2018) modules instead use atrous/dilated convolutions (Chen et al., 2015) to effectively enlarge the field of view of filters and obtain multi-scale features; atrous/dilated convolutions with different atrous rates yield different receptive field sizes. All the above approaches concatenate these features and apply convolution operations to exploit multi-scale contexts. To some extent, these convolution filters promote interaction and selection among features of different scales. However, after training, the filter weights are fixed, so the interaction and selection are neither highly content adaptive nor flexible. Besides, the object/stuff categories and scales differ across spatial positions, so the feature interaction among scales for exploiting contextual features of different scales should be content adaptive rather than position invariant.
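To make the role of dilation concrete, the following PyTorch toy sketch (our own illustration, not part of the original method) compares a standard 3×3 convolution with a dilated one: both keep the same number of parameters and output resolution, but the dilated filter covers a much larger effective receptive field.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 256, 65, 65)  # a toy backbone feature map (sizes are illustrative)

conv_standard = nn.Conv2d(256, 256, kernel_size=3, padding=1, dilation=1)
conv_dilated = nn.Conv2d(256, 256, kernel_size=3, padding=6, dilation=6)

# Same parameter count and output size, but the effective receptive field grows from
# 3x3 to 13x13: k + (k - 1) * (dilation - 1) = 3 + 2 * 5 = 13.
print(conv_standard(x).shape)  # torch.Size([1, 256, 65, 65])
print(conv_dilated(x).shape)   # torch.Size([1, 256, 65, 65])
```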

Figure 2. An overview of the Content-Adaptive Scale Interaction Network (CaseNet). We build it based on the classic Atrous Spatial Pyramid Pooling (ASPP) (Chen et al., 2017), followed by the proposed Contextual Scale Interaction (CSI) module and the Scale Adaptation (SA) module. The fused feature is fed to a convolution layer followed by SoftMax for prediction. Note that "rate" denotes the dilation rate of the atrous convolution.

To address the above problems, we propose a Content-Adaptive Scale Interaction Network (CaseNet) to adaptively (i.e., case by case for each position) exploit multi-scale features for scene parsing. Fig. 2 shows the overall flowchart of our framework. We build our framework on top of the classic Atrous Spatial Pyramid Pooling (ASPP) module, which provides feature maps, each characterized by a different scale/receptive field.

First, to enable feature interaction among scales to exploit scale contextual information, we design a contextual scale interaction (CSI) module for feature refinement, where scale-aware non-local operations are performed across the scales of features. For the feature of a scale at a spatial position, the interaction is achieved by computing the response as a weighted sum of the features of all the scales at the same spatial position. Note that this module is functionally different from the non-local block (Wang et al., 2018b) in two aspects. First, we use non-local operations to exploit scales while the non-local block (Wang et al., 2018b) uses them to exploit long-range spatial/temporal contexts. Second, we explicitly leverage prior scale information, i.e., the scale/branch index, together with the embedded features to learn the weights (edges) used for computing the weighted sum. The learned weights differ across spatial positions and are thus content adaptive.

Second, we design a spatial and channel adaptive scale adaptation (SA) module to adaptively select the appropriate receptive fields/scales for objects/stuff of different sizes. For each spatial position, we simultaneously learn scale and channel attention. Ours is different from the attention in (Chen et al., 2016; Kong and Fowlkes, 2019). First, we relax the sum-to-1 constraint on the attention weights across the scales, which provides more flexible optimization. Moreover, since the sum of the attention weights at a spatial position is not restricted to 1, the energy at different spatial positions can vary and play the role of spatial attention. Besides, to be more flexible, our attention is also channel adaptive.

Our main contributions are summarized as follows:

  • We propose a simple yet effective content-adaptive scale interaction network (CaseNet) to efficiently exploit the multi-scale features for scene parsing.

  • Two content adaptive modules, contextual scale interaction (CSI) module and scale adaptation (SA) module, are proposed to boost the interaction of multi-scale features. The CSI module enables the scale context interaction for feature refinement while the SA module facilitates the appropriate scale selection adapted to spatial positions and channels.

  • We validate the effectiveness of the proposed modules through extensive ablation studies. We achieve state-of-the-art results on three scene parsing benchmarks, including the Cityscapes (Cordts et al., 2016), ADE20K (Zhou et al., 2017), and LIP (Gong et al., 2017) datasets.

2. Related Work

Multi-Scale Feature Exploration. Scene parsing has achieved great progress with the development of Fully Convolutional Networks (FCNs) (Long et al., 2015). To alleviate the local receptive field (RF) issue of convolution operations in FCNs, several network variants have been proposed to generate and aggregate multi-scale features. Motivated by spatial pyramid matching (Lazebnik et al., 2006), PSPNet (Zhao et al., 2017b), DeepLabv2 (Chen et al., 2018), and DeepLabv3 (Chen et al., 2017) concatenate features of multiple receptive field sizes for semantic prediction. PSPNet (Zhao et al., 2017b) employs four spatial pyramid pooling (down-sampling) layers in parallel, named the pyramid pooling module (PPM), to aggregate information from multiple receptive field sizes. DeepLabv2 (Chen et al., 2018) and DeepLabv3 (Chen et al., 2017) both adopt atrous spatial pyramid pooling (ASPP) to concatenate features from multiple parallel atrous convolution layers with different dilation rates.

Besides the above popular multi-branch structures, there are other works that exploit spatial contexts with different receptive field sizes. Peng et al. (Peng et al., 2017) enlarge the kernel size with a decomposed structure for global convolution. Zhang et al. (Zhang et al., 2017) propose a scale-adaptive convolution to acquire flexible-size receptive fields by adding a scale regression layer. Lin et al. (Lin et al., 2017) and Ding et al. (Ding et al., 2018) utilize encoder-decoder structures to fuse mid-level and high-level features. DAG-RNN (Shuai et al., 2018) employs a recurrent neural network to capture contextual dependencies over local features. EncNet (Zhang et al., 2018) introduces a global encoding layer to learn whether some category of object/stuff exists and enforces this as channel attention over the score maps.

Considering the simple structure yet superior performance of ASPP (Chen et al., 2017; Chen et al., 2018), our design follows the multi-branch structure of ASPP. However, in these works (Zhao et al., 2017b; Chen et al., 2017; Chen et al., 2018), there is a lack of content adaptive interaction among the differently scaled features. In this paper, we enable this by introducing a contextual scale interaction (CSI) module.

Non-Local Mean. Non-local mean has proven effective for many tasks, such as image denoising (Buades et al., 2005; Dabov et al., 2007) and texture synthesis (Efros and Leung, 1999). It calculates the feature at one position as a weighted sum over all other positions to exploit spatial context. Wang et al. (Wang et al., 2018b) extend this idea by capturing spatial and temporal contexts for feature refinement within non-local neural networks.

Motivated by these works, we leverage the non-local mean idea to enable interaction among different feature scales to exploit scale contexts. Specifically, we leverage the scale prior, i.e., the scale/branch index, as part of the features for determining the weights that control the amount of information passed.

Attention. Attention, which aims to enhance important features and suppress irrelevant ones, is widely used in many tasks (Chorowski et al., 2015; Bahdanau et al., 2014; Xu et al., 2015; Xu and Saenko, 2016; Wang et al., 2017; Hu et al., 2018). Hu et al. (Hu et al., 2018) introduce a channel-wise attention mechanism through a squeeze-and-excitation (SE) block to modulate channel-wise feature responses for image classification. For scene parsing, attention mechanisms have been proposed that learn to weight the multi-scale features at each pixel location on the score maps (Chen et al., 2016; Kong and Fowlkes, 2019). For a spatial position at a scale, each attention weight is shared across all channels, and the weights are normalized by SoftMax/Gumbel-SoftMax across scales.

Our scale adaptation (SA) can be viewed as an attention mechanism, but it differs from the above in two aspects. First, we relax the sum-to-1 constraint on the attention weights across the scales. Second, we extend the attention to be both channel and spatial adaptive so that the network can flexibly select the appropriate scales.

3. Proposed CaseNet

We propose a Content-Adaptive Scale Interaction Network (CaseNet) to efficiently exploit multi-scale features for scene parsing. Fig. 2 illustrates the overall framework. For a given input image, the ASPP module generates multi-scale features (e.g., five scales) from the feature map extracted by a backbone FCN. For the multi-scale features, the proposed contextual scale interaction (CSI) module (see subsection 3.1) facilitates the mutual interaction among scales for refining features. Over the refined multi-scale features, the proposed scale adaptation (SA) module (see subsection 3.2) adaptively determines suitable scales for each channel and spatial position to fuse the multi-scale features. The fused feature is fed to a convolution layer followed by SoftMax for prediction. Note that the entire network is trained end-to-end.
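As a minimal sketch of this data flow (module and variable names are ours; the fusion by concatenation and a 1×1 convolution, as well as the channel and class counts, are illustrative assumptions consistent with Fig. 2), the head could be composed as follows in PyTorch:

```python
import torch
import torch.nn as nn

class CaseNetHead(nn.Module):
    """Sketch of the CaseNet head: ASPP -> CSI -> SA -> fuse -> classify."""

    def __init__(self, aspp, csi, sa, channels=512, num_classes=19, num_scales=5):
        super().__init__()
        self.aspp, self.csi, self.sa = aspp, csi, sa      # modules sketched in later sections
        self.fuse = nn.Conv2d(num_scales * channels, channels, kernel_size=1)
        self.classifier = nn.Conv2d(channels, num_classes, kernel_size=1)

    def forward(self, backbone_feat):
        scales = self.aspp(backbone_feat)   # list of S tensors, each B x C x H x W
        scales = self.csi(scales)           # scale-aware non-local refinement (Sec. 3.1)
        scales = self.sa(scales)            # per-position, per-channel scale attention (Sec. 3.2)
        fused = self.fuse(torch.cat(scales, dim=1))
        return self.classifier(fused)       # SoftMax is applied in the loss / at inference
```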

3.1. Contextual Scale Interaction (CSI) Module

Figure 3. Illustration of the generation of a mixed receptive field (with higher importance levels near the center) from three flat receptive fields by combining them with weights of, e.g., 0.5, 0.3, and 0.2.

The features from different scales can provide useful contextual information to the current scale. For example, a receptive field slightly larger than a boat may include the surrounding water. Such context can help reduce the ambiguity between car and boat. At the same time, the boat feature itself may be more important than the feature mixed with water. Thus, a mixed receptive field, as illustrated in Fig. 3, which allocates larger importance levels near the center and smaller ones farther from the center, is more desirable. Fortunately, we can achieve this by simply enabling interaction among the multi-scale features and combining, with learned weights, the feature responses that come from different receptive fields. We propose a Contextual Scale Interaction (CSI) module which performs the interactions among scales via scale-aware non-local operations.
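As a toy numeric sketch of this idea (our own illustration, using the example weights of Fig. 3), the mixed response at one position could be formed as:

```python
import torch

# Responses of one spatial position from three branches with increasing receptive fields.
r_small, r_medium, r_large = torch.randn(3, 256)

weights = [0.5, 0.3, 0.2]  # example importance levels from Fig. 3
mixed = weights[0] * r_small + weights[1] * r_medium + weights[2] * r_large
# The result emphasizes the center-focused feature while retaining some wider context,
# i.e., a "mixed" receptive field rather than a flat one.
```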

Figure 4. Contextual Scale Interaction (CSI) module through scale-aware non-local operations across scales (best viewed in color).

Fig. 4 illustrates the procedure. The five branches of ASPP produce five ($S = 5$) intermediate feature maps (tensors), each of width $W$, height $H$, and $C$ channels. For each spatial position $i$, information interaction is performed across the five scales, with each scale being a feature node. The five feature nodes are represented as $x_i^s$, where $s = 1, \dots, 5$. Non-local operations are then performed over the five features. The refined feature of scale $s$ is calculated as a weighted average over all the scales as

(1)  $\hat{x}_i^s = \sum_{t=1}^{5} w_i^{s,t}\, x_i^t$

where $w_i^{s,t}$ denotes the pairwise affinity between features $x_i^s$ (of scale $s$) and $x_i^t$ (of scale $t$), with $s, t \in \{1, \dots, 5\}$. Here $w_i^{s,t}$ is obtained by

(2)  $w_i^{s,t} = \dfrac{\exp\!\big(\theta(x_i^s, s)^{\top}\,\phi(x_i^t, t)\big)}{\sum_{t'=1}^{5}\exp\!\big(\theta(x_i^s, s)^{\top}\,\phi(x_i^{t'}, t')\big)}$

where $\theta(\cdot,\cdot)$ and $\phi(\cdot,\cdot)$ represent two individual embedding functions that take a feature together with its scale index, each implemented by a $1 \times 1$ convolutional layer followed by batch normalization (BN) and ReLU activation. For each spatial position $i$, from $w_i^{s,t}$ with $s, t \in \{1, \dots, 5\}$, we obtain a $5 \times 5$ affinity matrix for guiding the interaction. It is worth noting that we embed the prior information of the scale index to better learn the affinity among features of various scales.

Then, we obtain five refined feature maps (tensors) $\hat{X}^s \in \mathbb{R}^{C \times H \times W}$, $s = 1, \dots, 5$.
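A minimal PyTorch sketch of such a scale-aware non-local operation is given below. It assumes the scale index is injected by concatenating a normalized index channel to each feature before the θ/φ embeddings and that the affinities are softmax-normalized over the source scale; these are assumptions for illustration, and module/variable names are ours.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextualScaleInteraction(nn.Module):
    """Scale-aware non-local interaction across S scale branches (illustrative sketch)."""

    def __init__(self, channels=512, num_scales=5, embed_dim=256):
        super().__init__()
        self.num_scales = num_scales
        # Two individual 1x1-conv embeddings (theta, phi) over [feature, scale index].
        self.theta = nn.Sequential(nn.Conv2d(channels + 1, embed_dim, 1),
                                   nn.BatchNorm2d(embed_dim), nn.ReLU(inplace=True))
        self.phi = nn.Sequential(nn.Conv2d(channels + 1, embed_dim, 1),
                                 nn.BatchNorm2d(embed_dim), nn.ReLU(inplace=True))

    def forward(self, scale_feats):           # list of S tensors, each B x C x H x W
        B, C, H, W = scale_feats[0].shape
        S = self.num_scales
        queries, keys = [], []
        for s, f in enumerate(scale_feats):
            # Scale prior: append the normalized branch index as an extra channel.
            idx = f.new_full((B, 1, H, W), float(s) / max(S - 1, 1))
            queries.append(self.theta(torch.cat([f, idx], dim=1)))
            keys.append(self.phi(torch.cat([f, idx], dim=1)))
        q = torch.stack(queries, dim=1)       # B x S x E x H x W
        k = torch.stack(keys, dim=1)          # B x S x E x H x W
        x = torch.stack(scale_feats, dim=1)   # B x S x C x H x W

        # Per-position pairwise affinities: an S x S matrix for every (h, w), cf. Eq. (2).
        affinity = torch.einsum('bsehw,btehw->bsthw', q, k)
        weights = F.softmax(affinity, dim=2)  # normalize over the source scale t
        # Each refined scale feature is a weighted sum over all scales, cf. Eq. (1).
        refined = torch.einsum('bsthw,btchw->bschw', weights, x)
        return [refined[:, s] for s in range(S)]
```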

3.2. Scale Adaptation (SA) Module

We propose a spatial and channel adaptive scale adaptation (SA) module. It enables the adaptive selection of appropriate receptive fields/scales for objects/stuff of different sizes. As illustrated in Fig. 5, the five refined feature maps from the CSI module, each of width $W$, height $H$, and $C$ channels, are the input of the SA module. For each spatial position $i$, we concatenate the scale features to derive the attention over different scales and channels.

For each spatial position $i$, we represent the feature vectors from the five scale feature maps as $\hat{x}_i^s \in \mathbb{R}^{C}$, where $s = 1, \dots, 5$. We concatenate these feature vectors to obtain $z_i \in \mathbb{R}^{5C}$. Specifically, the channel adaptive scale attention vector $a_i \in \mathbb{R}^{5C}$ over the refined feature maps at position $i$ is obtained by:

(3)  $a_i = \mathrm{Sigmoid}\big(\psi_2(\psi_1(z_i))\big)$

where $\psi_1$ and $\psi_2$ are implemented by $1 \times 1$ convolutions, each followed by batch normalization. $\psi_1$ shrinks the channel dimension by a reduction rate (set experimentally), and $\psi_2$ transforms the channel dimension to $5C$. Then, the attention $a_i$ element-wisely modulates the channel dimensions of each scale.

Figure 5. Illustration of our Scale Adaptation (SA) module. To facilitate illustration, we pick the position $i$ as an example. For each spatial position $i$, we concatenate the features of all scales and learn content adaptive scale attention over each channel of each scale, i.e., $5C$ attention values (best viewed in color).

A similar idea, but with channel-shared attention, has been proposed in (Chen et al., 2016; Kong and Fowlkes, 2019) to spatially adaptively determine the suitable feature scales. However, they all constrain the sum of the attention weights across scales to be 1, e.g., by using the SoftMax activation function. We relax this sum-to-1 constraint on the attention weights by using the Sigmoid activation function, which provides a more flexible optimization space. Intuitively, since the sum of the attention weights at a spatial position is not restricted to be 1, the energy at different spatial positions can vary, which is equivalent to playing the role of spatial attention. Further, to be more flexible, our scale attention is also channel adaptive.
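A possible PyTorch realization of the SA module is sketched below, assuming ψ1 and ψ2 are 1×1 convolutions with batch normalization, a ReLU between them, and a Sigmoid producing the 5C attention values at each position; the reduction rate value and all names are ours.

```python
import torch
import torch.nn as nn

class ScaleAdaptation(nn.Module):
    """Per-position, per-channel scale attention over S refined scale features (sketch)."""

    def __init__(self, channels=512, num_scales=5, reduction=4):  # reduction rate is an assumed value
        super().__init__()
        in_dim = num_scales * channels
        hidden = in_dim // reduction
        self.psi1 = nn.Sequential(nn.Conv2d(in_dim, hidden, 1),
                                  nn.BatchNorm2d(hidden), nn.ReLU(inplace=True))
        self.psi2 = nn.Sequential(nn.Conv2d(hidden, in_dim, 1), nn.BatchNorm2d(in_dim))
        self.num_scales = num_scales

    def forward(self, scale_feats):                     # list of S tensors, each B x C x H x W
        z = torch.cat(scale_feats, dim=1)               # B x 5C x H x W, input of Eq. (3)
        attn = torch.sigmoid(self.psi2(self.psi1(z)))   # Sigmoid: no sum-to-1 constraint
        modulated = z * attn                            # element-wise over channels and positions
        return list(torch.chunk(modulated, self.num_scales, dim=1))
```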

4. Experiments

To evaluate the proposed CaseNet, we carry out comprehensive experiments on three widely used scene parsing datasets: Cityscapes (Cordts et al., 2016), ADE20K (Zhou et al., 2017), and LIP (Gong et al., 2017). In the following sections, we first introduce the three datasets and the implementation details. Then we perform extensive ablation studies on the Cityscapes dataset to demonstrate the effectiveness of our designs. Finally, we further evaluate CaseNet's performance on ADE20K (Zhou et al., 2017) and LIP (Gong et al., 2017).

4.1. Datasets

Cityscapes. The dataset comprises a large, diverse set of high-resolution (1024×2048) street-scene images recorded in 50 different cities; 5000 of these images have high-quality, finely annotated pixel-level labels of 19 classes. Following the standard Cityscapes setting, the 5000 finely annotated images are divided into 2975, 500, and 1525 images for training, validation, and testing, respectively. We do not use the coarse data (20000 coarsely annotated images) in our experiments.

ADE20K. The dataset was used in the ImageNet scene parsing challenge 2016. It includes 150 classes and diverse scenes with 1,038 image-level labels, and is divided into 20K/2K/3K images for training, validation, and testing, respectively.

LIP. The dataset was used in the Look into Person (LIP) challenge 2016 for single human parsing. It includes 50,462 images with 19 semantic human part classes and 1 background class.

4.2. Implementation Details

We employ a pre-trained FCN (ResNet) (Wang et al., 2017) with the dilated strategy (Chen et al., 2018) as the backbone to extract the feature map. To capture objects of different scales, we further organize five dilated convolutional branches in parallel, which provide five feature maps, each characterized by a different scale/receptive field. Referring to ASPP (Chen et al., 2018; Chen et al., 2017), we set the dilation rates of the five branches to 1, 6, 12, 24, and 36, respectively. We implement our method in PyTorch, and all experiments are performed on 4 Tesla P40 GPUs.
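The five parallel branches could look like the following sketch (the output channel count and the BN/ReLU placement are assumptions for illustration):

```python
import torch.nn as nn

class FiveBranchASPP(nn.Module):
    """Five parallel 3x3 dilated-conv branches with rates 1, 6, 12, 24, 36 (sketch)."""

    def __init__(self, in_channels=2048, out_channels=512, rates=(1, 6, 12, 24, 36)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_channels),
                nn.ReLU(inplace=True))
            for r in rates])

    def forward(self, x):                               # x: backbone feature, B x 2048 x H x W
        return [branch(x) for branch in self.branches]  # five maps, each B x 512 x H x W
```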

4.2.1. Cityscapes.

Referring to the public settings in previous work, we set the initial learning rate to 0.01 and the weight decay to 0.0005. The original image size is 1024×2048 and we crop it to 769×769 following PSPNet (Zhao et al., 2017b). The training batch size is set to 8, and we train our model with InPlaceABNSync (Rota Bulò et al., 2018) to synchronize the mean and standard deviation of BN across multiple GPUs in all experiments. We employ 40K/80K training iterations when training without/with the validation set. Similar to the previous works ASPP (Chen et al., 2018) and DenseASPP (Yang et al., 2018), we also employ the poly learning rate policy, where the learning rate is multiplied by $(1 - \frac{iter}{max\_iter})^{0.9}$.
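For reference, a small sketch of the poly schedule as commonly implemented (the power 0.9 is the value used in DeepLab-style schedules and is assumed here):

```python
def poly_lr(base_lr, cur_iter, max_iter, power=0.9):
    """Poly learning-rate policy: base_lr * (1 - iter / max_iter) ** power."""
    return base_lr * (1.0 - float(cur_iter) / max_iter) ** power

# Example: base_lr = 0.01 on Cityscapes with 40K iterations.
print(poly_lr(0.01, cur_iter=20000, max_iter=40000))  # ~0.0054 halfway through training
```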

For data augmentation, we apply random horizontal flipping and random scaling in the range of [0.5, 2]. For loss functions, following (Chen et al., 2018), we employ a class-balanced cross entropy loss on both the final output and the intermediate feature map output, where the weight of the final loss is 1 and that of the auxiliary loss is 0.4. Following previous works (Wu et al., 2016), we also adopt the online hard example mining (OHEM) strategy.
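The loss combination can be sketched as follows (class-balancing weights and OHEM pixel selection are omitted; function and argument names are ours):

```python
import torch.nn.functional as F

def segmentation_loss(main_logits, aux_logits, target, aux_weight=0.4, ignore_index=255):
    """Cross entropy on the final output plus a 0.4-weighted auxiliary loss (sketch)."""
    main_loss = F.cross_entropy(main_logits, target, ignore_index=ignore_index)
    aux_loss = F.cross_entropy(aux_logits, target, ignore_index=ignore_index)
    return main_loss + aux_weight * aux_loss
```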

4.2.2. ADE20K

We set the initial learning rate to 0.02 and the weight decay to 0.0001. Since the images in ADE20K are of various sizes, the input image is randomly resized to 300, 375, 450, 525, or 600. The training batch size is set to 8, and we also train our model with InPlaceABNSync (Rota Bulò et al., 2018) for 100K iterations. Following previous works (Chen et al., 2017; Zhao et al., 2017b), we employ the same poly learning rate policy and data augmentation, and apply deep supervision on the intermediate feature map.

4.2.3. LIP

Following CE2P (Liu et al., 2018a), we set the initial learning rate to 0.007 and the weight decay to 0.0005. The original images are of various sizes and we resize all of them to 473×473. The training batch size is set to 40, and we also train our model with InPlaceABNSync (Rota Bulò et al., 2018) for 110K iterations. The poly learning rate policy, data augmentation methods, and deep supervision on the intermediate feature map are consistent with the experiments on Cityscapes and ADE20K.

4.3. Ablation Study

We perform all ablation studies on the Cityscapes dataset. We split this section into three sub-sections for better illustration: (1) CaseNet versus baseline, (2) study on the Contextual Scale Interaction (CSI) module, and (3) study on the Scale Adaptation (SA) module.

Methods | Train. mIoU (%) | Val. mIoU (%)
ResNet-101 Baseline | 83.75 | 74.82
ResNet-101 + ASPP (Chen et al., 2017) | 85.91 | 78.51
ResNet-101 + ASPP + CSI | 87.74 | 80.17
ResNet-101 + ASPP + SA | 87.66 | 80.08
CaseNet | 88.33 | 81.04

Table 1. Ablation study on the Cityscapes validation set. CSI represents the contextual scale interaction (CSI) module, SA represents the scale adaptation (SA) module, and CaseNet represents the complete version with both CSI and SA.

4.3.1. CaseNet versus Baseline.

In order to verify the effectiveness of each module in CaseNet, we perform comprehensive ablation studies on the Cityscapes validation set. We use the ResNet-101 Baseline (He et al., 2016) to denote the traditional FCN with the dilated strategy (Chen et al., 2018), which corresponds to the "FCN" in our pipeline (Fig. 2). To ensure fairness, the ResNet-101 + ASPP scheme follows DeepLabv3 (Chen et al., 2017) with some modifications in our design: we remove the original image-level pooling branch and employ five 3×3 dilated convolution branches with dilation rates of 1, 6, 12, 24, and 36, respectively (as shown in Fig. 2).

Because our key contributions lie in the contextual scale interaction (CSI) module and scale adaptation (SA) module, we verify their effectiveness separately: ResNet-101 + ASPP + CSI and ResNet-101 + ASPP + SA. The complete version, i.e. ResNet-101 + ASPP + CSI + SA, is abbreviated as CaseNet. For quantitative evaluation, mean of class-wise Intersection over Union (mIoU) is used.

Experiment results are reported in Table 1, where all the results are based on single scale testing. The performance of ResNet-101 + ASPP is comparable to the numbers in the original DeepLabv3 paper (Chen et al., 2017). We make the following observations.

Our contextual scale interaction (CSI) module and scale adaptation (SA) module both significantly improve performance over the two strong baselines. ResNet-101 + ASPP + CSI outperforms the ResNet-101 Baseline and ResNet-101 + ASPP by 5.35% and 1.66% in mIoU, respectively, which verifies the effectiveness of feature interaction among scales for exploiting scale contextual information. In addition, ResNet-101 + ASPP + SA outperforms the ResNet-101 Baseline and ResNet-101 + ASPP by 5.26% and 1.57% in mIoU, respectively, which verifies the effectiveness of the spatial and channel adaptive scale attention.

We find that we can further improve performance by combining the CSI and SA modules together. For example, the complete version CaseNet achieves the best 81.04% on the validation set based on single scale testing and improves by 0.87% over ResNet-101 + ASPP + CSI and 0.96% over ResNet-101 + ASPP + SA.

Figure 6. Visualization results of Contextual Scale Interaction (CSI) module on Cityscapes validation set.
Figure 7. Visualization of scale attention (SA) map on Cityscapes validation set.

4.3.2. Study on Contextual Scale Interaction (CSI) Module.

The influence of the Contextual Scale Interaction (CSI) module can be observed in Fig. 6. We observe that some inconspicuous objects are well captured and object boundaries are clearer with the CSI module included, such as the 'fence' in the first row and the 'pole' in the second row, which demonstrates that the contextual interaction over multi-scale features enhances the discrimination of details.

Meanwhile, some salient objects exhibit better intra-class consistency, as some previously misclassified regions are correctly classified with the CSI module, such as the 'truck' in the third row and the 'bus' in the fourth row, which further verifies that the feature interaction of CSI is powerful and promotes content adaptation.

Influence of Scale Index. The scale index (the branch index, with each branch corresponding to a different dilation rate) in our CaseNet acts as an important scale prior that explicitly represents the corresponding receptive-field size. We verify its effectiveness in Table 2.

Methods | Train. mIoU (%) | Val. mIoU (%)
CSI without scale index | 86.68 | 79.51
CSI with scale index (Ours) | 87.74 | 80.17

Table 2. Influence of the scale prior (scale index) in the CSI module. For fairness of comparison, we implement the different designs on top of our baseline ResNet-101 + ASPP.

Sharing Embedding or Not? We compared using individual embedding functions versus a shared embedding function across scales for calculating the affinities within the CSI module. We find that the former achieves about a 0.45% gain over the latter on the validation set. This suggests that features of different scales should be embedded into different high-dimensional feature spaces.

4.3.3. Study on Scale Adaptation (SA) Module.

In this subsection, extensive experiments are designed to study how different designs of the Scale Adaptation (SA) module influence performance quantitatively. We perform the studies on the Cityscapes validation set. For fairness of comparison, we implement the different designs on top of our baseline ResNet-101 + ASPP.

Proposed Scale Adaptation versus Other Attentions. Table 3 compares the performance of our scale adaptation (SA) module with other attention designs. (1) Channel Attention Alone. Following the attention design of the Squeeze-and-Excitation (SE) module (Hu et al., 2018), we use spatially global average-pooled features to compute channel-wise attention for the five branches, using two fully connected (FC) layers with non-linearities. In comparison with SE, our SA achieves a 1.33% gain in mIoU, which demonstrates the importance of spatially-adaptive attention for scene parsing. (2) Spatial Attention Alone. For the spatial-attention-only (Spatial-A) design, we reduce the attention output so that each scale has only a single attention value per position, shared across channels. In comparison with Spatial-A, our SA achieves a 0.58% gain in mIoU. By using both spatial and channel attention across scales, our final scheme achieves the best performance.

Methods | Train. mIoU (%) | Val. mIoU (%)
ResNet-101 + ASPP | 85.91 | 78.51
ResNet-101 + ASPP + SE (Hu et al., 2018) | 86.13 | 78.75
ResNet-101 + ASPP + Spatial-A | 86.71 | 79.50
ResNet-101 + ASPP + SA | 87.66 | 80.08

Table 3. Performance comparison of our Scale Adaptation (SA) with other attention designs on the Cityscapes validation set. We implement the different attention designs on top of our baseline ResNet-101 + ASPP for fair comparison.

Methods | Train. mIoU (%) | Val. mIoU (%)
SA with Softmax | 86.82 | 79.56
SA with Sigmoid (Ours) | 87.66 | 80.08

Table 4. Influence of relaxing the constraint on attention values in the SA module. We implement the different designs on top of our baseline ResNet-101 + ASPP for fair comparison.
Methods | Mean IoU | road | sidewalk | building | wall | fence | pole | traffic light | traffic sign | vegetation | terrain | sky | person | rider | car | truck | bus | train | motorcycle | bicycle
FCN-8s (Long et al., 2015) | 65.3 | 97.4 | 78.4 | 89.2 | 34.9 | 44.2 | 47.4 | 60.1 | 65.0 | 91.4 | 69.3 | 93.9 | 77.1 | 51.4 | 92.6 | 35.3 | 48.6 | 46.5 | 51.6 | 66.8
DeepLab-v2 (Chen et al., 2018) | 70.4 | 97.9 | 81.3 | 90.3 | 48.8 | 47.4 | 49.6 | 57.9 | 67.3 | 91.9 | 69.4 | 94.2 | 79.8 | 59.8 | 93.7 | 56.5 | 67.5 | 57.5 | 57.7 | 68.8
FRRN (Pohlen et al., 2017) | 71.8 | 98.2 | 83.3 | 91.6 | 45.8 | 51.1 | 62.2 | 69.4 | 72.4 | 92.6 | 70.0 | 94.9 | 81.6 | 62.7 | 94.6 | 49.1 | 67.1 | 55.3 | 53.5 | 69.5
RefineNet (Lin et al., 2017) | 73.6 | 98.2 | 83.3 | 91.3 | 47.8 | 50.4 | 56.1 | 66.9 | 71.3 | 92.3 | 70.3 | 94.8 | 80.9 | 63.3 | 94.5 | 64.6 | 76.1 | 64.3 | 62.2 | 70.0
GCN (Peng et al., 2017) | 76.9 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | -
DUC (Wang et al., 2018a) | 77.6 | 98.5 | 85.5 | 92.8 | 58.6 | 55.5 | 65.0 | 73.5 | 77.9 | 93.3 | 72.0 | 95.2 | 84.8 | 68.5 | 95.4 | 70.9 | 78.8 | 68.7 | 65.9 | 73.8
ResNet-38 (Wu et al., 2019) | 78.4 | 98.5 | 85.7 | 93.1 | 55.5 | 59.1 | 67.1 | 74.8 | 78.7 | 93.7 | 72.6 | 95.5 | 86.6 | 69.2 | 95.7 | 64.5 | 78.8 | 74.1 | 69.0 | 76.7
DSSPN (Liang et al., 2018b) | 77.8 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | -
DepthSeg (Kong and Fowlkes, 2018) | 78.2 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | -
PSPNet (Zhao et al., 2017b) | 78.4 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | -
BiSeNet (Yu et al., 2018a) | 78.9 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | -
DFN (Yu et al., 2018b) | 79.3 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | -
PSANet (Zhao et al., 2018) | 80.1 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | -
DenseASPP (Yang et al., 2018) | 80.6 | 98.7 | 87.1 | 93.4 | 60.7 | 62.7 | 65.6 | 74.6 | 78.5 | 93.6 | 72.5 | 95.4 | 86.2 | 71.9 | 96.0 | 78.0 | 90.3 | 80.7 | 69.7 | 76.8
CaseNet | 81.9 | 98.7 | 87.2 | 93.7 | 62.6 | 64.7 | 69.0 | 76.4 | 80.7 | 93.7 | 73.3 | 95.6 | 86.8 | 72.3 | 96.2 | 78.1 | 90.6 | 87.9 | 71.5 | 76.9

Table 5. Category-wise comparisons with state-of-the-art approaches on the Cityscapes test set. CaseNet outperforms existing methods and achieves 81.9% in Mean IoU.
Methods | Publication | Backbone | Val. mIoU (%)
RefineNet (Lin et al., 2017) | CVPR2017 | ResNet-101 | 40.20
RefineNet (Lin et al., 2017) | CVPR2017 | ResNet-152 | 40.70
PSPNet (Zhao et al., 2017b) | CVPR2017 | ResNet-101 | 43.29
PSPNet (Zhao et al., 2017b) | CVPR2017 | ResNet-152 | 43.51
PSPNet (Zhao et al., 2017b) | CVPR2017 | ResNet-269 | 44.94
SAC (Zhang et al., 2017) | ICCV2017 | ResNet-101 | 44.30
PSANet (Zhao et al., 2018) | ECCV2018 | ResNet-101 | 43.77
UperNet (Xiao et al., 2018) | ECCV2018 | ResNet-101 | 42.66
DSSPN (Liang et al., 2018b) | CVPR2018 | ResNet-101 | 43.68
EncNet (Zhang et al., 2018) | CVPR2018 | ResNet-101 | 44.65
CaseNet | - | ResNet-101 | 45.28

Table 6. Comparisons with state-of-the-art approaches on the validation set of the ADE20K dataset.
Methods | Publication | Backbone | Val. mIoU (%)
Attention+SSL (Gong et al., 2017) | CVPR2017 | ResNet-101 | 44.73
JPPNet (Liang et al., 2018a) | TPAMI2018 | ResNet-101 | 51.37
SS-NAN (Zhao et al., 2017a) | CVPR2017 | ResNet-101 | 47.92
MMAN (Luo et al., 2018) | ECCV2018 | ResNet-101 | 46.81
MuLA (Nie et al., 2018) | ECCV2018 | ResNet-101 | 49.30
CE2P (Liu et al., 2018a) | AAAI2019 | ResNet-101 | 53.10
CaseNet | - | ResNet-101 | 54.38

Table 7. Comparisons with state-of-the-art approaches on the validation set of the LIP dataset.

Visualization. In order to further understand our Scale Adaptation (SA) module intuitively, we randomly choose some examples from the Cityscapes validation set and visualize the scale attention maps in Fig. 7. For each branch in CaseNet, the overall scale attention map has a size of $H \times W \times C$ because it is spatial and channel adaptive, so we show the scale attention map averaged along the channel dimension to see whether the branches capture objects of different scales at different spatial positions.
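The channel-averaged maps shown in Fig. 7 can be obtained with a reduction of the form below (the tensor layout here is hypothetical):

```python
import torch

# Hypothetical layout: attention values of size B x S x C x H x W collected from the SA module.
scale_attention = torch.rand(1, 5, 512, 65, 65)
avg_attention_maps = scale_attention.mean(dim=2)  # B x S x H x W: one averaged map per branch
```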

As illustrated in Fig. 7, each branch responds to objects of different scales to different degrees. For example, for the first and second branches with dilation rates of 1 and 6, the scale attention maps "Attention map #1" and "Attention map #2" mainly focus on small inconspicuous objects and object boundaries, such as the 'traffic sign' in the first and second rows. With the increase of the dilation rate, the third and fourth branches with dilation rates of 12 and 24 have larger receptive fields, so their corresponding scale attention maps "Attention map #3" and "Attention map #4" mainly focus on salient objects, such as the 'train, sidewalk' in the first row and the 'terrain' in the second row. For the fifth (last) branch with a dilation rate of 36, the receptive field is large enough to cover the largest objects/stuff in the scene, so its "Attention map #5" mainly focuses on the background region of the scene, such as the 'building, vegetation' in the first and second rows. In short, these visualizations further verify that our Scale Adaptation (SA) module helps each branch focus on objects of different scales and adaptively selects the appropriate receptive fields/scales for each spatial position.

Sigmoid or Softmax? Our scale attention is different from the attention in (Chen et al., 2016; Kong and Fowlkes, 2019), which typically employs a Softmax/Gumbel-Softmax function to map the non-normalized output to a probability distribution, enforcing the sum-to-1 constraint on the attention weights across the candidates. We relax this constraint by employing the Sigmoid activation function along the scale axis, which provides a more flexible optimization space. On the other hand, since the sum of the attention weights at a spatial position is not restricted to 1, the energy at different spatial positions can vary and play the role of spatial attention. We experimentally demonstrate the effectiveness of relaxing the sum-to-1 constraint and show the results in Table 4.
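The difference can be seen in a tiny sketch: with Softmax over the scale axis the five weights at a position always sum to 1, whereas with Sigmoid each weight lies in (0, 1) independently, so the per-position total energy can vary.

```python
import torch

logits = torch.tensor([2.0, 0.5, -1.0, 0.0, 1.0])  # raw scores of one position over 5 scales

softmax_w = torch.softmax(logits, dim=0)  # always sums to 1
sigmoid_w = torch.sigmoid(logits)         # each weight in (0, 1); the sum is unconstrained

print(softmax_w.sum())  # 1.0
print(sigmoid_w.sum())  # ~3.0 here; acts as a per-position "energy", i.e., spatial attention
```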

4.4. Comparison with State-of-the-Art

Results on Cityscapes Dataset. We compare our method with existing methods on the Cityscapes test set. Specifically, we train our CaseNet with only finely annotated data (including the validation set for training) for 80K training iterations and submit our test results (with the multi-scale [0.75×, 1.0×, 1.25×] testing strategy) to the official evaluation server. Results are shown in Table 5. We observe that CaseNet significantly outperforms existing approaches. In particular, our model outperforms DenseASPP by 1.3% in mIoU, even though DenseASPP uses a more powerful and more complex pretrained DenseNet model (Huang et al., 2017) as the backbone network.

Results on ADE20K Dataset. We further carry out experiments on the ADE20K dataset to evaluate the effectiveness of our method. Comparisons with previous state-of-the-art methods on the ADE20K validation set are shown in Table 6. Our CaseNet achieves the best performance of 45.28% mIoU, outperforming the ResNet-101-based state-of-the-art method EncNet by 0.63% and the ResNet-269-based PSPNet by 0.34%.

Results on LIP Dataset. We also conduct experiments on the LIP dataset. Comparisons with previous state-of-the-art methods are reported in Table 7. The proposed CaseNet achieves 54.38% mIoU, which outperforms previous methods by a large margin, i.e., 1.28%. This indicates that the proposed CaseNet also performs well on the human parsing task.

5. Conclusion

In this paper, we propose a simple yet effective Content-Adaptive Scale Interaction Network (CaseNet) to adaptively exploit multi-scale features through contextual interaction and adaptation. Specifically, we build the framework based on the classic Atrous Spatial Pyramid Pooling (ASPP), followed by the proposed contextual scale interaction (CSI) module, which enables feature interaction among scales, and a spatial and channel adaptive scale adaptation (SA) module, which facilitates appropriate scale selection. Our ablation studies demonstrate the effectiveness of the proposed CSI and SA modules, leading to more precise parsing results. In addition, CaseNet achieves state-of-the-art performance on the Cityscapes, ADE20K, and LIP datasets.

References

  • Bahdanau et al. (2014) Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014).
  • Buades et al. (2005) Antoni Buades, Bartomeu Coll, and J-M Morel. 2005. A non-local algorithm for image denoising. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol. 2. 60–65.
  • Chen et al. (2015) Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L Yuille. 2015. Semantic image segmentation with deep convolutional nets and fully connected crfs. International Conference on Learning Representations (2015).
  • Chen et al. (2018) Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L Yuille. 2018. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE transactions on pattern analysis and machine intelligence 40, 4 (2018), 834–848.
  • Chen et al. (2017) Liang-Chieh Chen, George Papandreou, Florian Schroff, and Hartwig Adam. 2017. Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587 (2017).
  • Chen et al. (2016) Liang-Chieh Chen, Yi Yang, Jiang Wang, Wei Xu, and Alan L Yuille. 2016. Attention to scale: Scale-aware semantic image segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition. 3640–3649.
  • Chorowski et al. (2015) Jan K Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, and Yoshua Bengio. 2015. Attention-based models for speech recognition. In NeurIPS. 577–585.
  • Cordts et al. (2016) Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. 2016. The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE conference on computer vision and pattern recognition. 3213–3223.
  • Dabov et al. (2007) Kostadin Dabov, Alessandro Foi, Vladimir Katkovnik, and Karen Egiazarian. 2007. Image Denoising by Sparse 3-D Transform-Domain Collaborative Filtering. Transactions on image processing 16, 8 (2007), 2080–2095.
  • Ding et al. (2018) Henghui Ding, Xudong Jiang, Bing Shuai, Ai Qun Liu, and Gang Wang. 2018. Context contrasted feature and gated multi-scale aggregation for scene segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2393–2402.
  • Efros and Leung (1999) Alexei A Efros and Thomas K Leung. 1999. Texture synthesis by non-parametric sampling. In IEEE International Conference on Computer Vision, Vol. 2. 1033–1038.
  • Gong et al. (2017) Ke Gong, Xiaodan Liang, Dongyu Zhang, Xiaohui Shen, and Liang Lin. 2017. Look into person: Self-supervised structure-sensitive learning and a new benchmark for human parsing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 932–940.
  • He et al. (2016) Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition. 770–778.
  • Hu et al. (2018) Jie Hu, Li Shen, and Gang Sun. 2018. Squeeze-and-excitation networks. In Proceedings of the IEEE conference on computer vision and pattern recognition. 7132–7141.
  • Huang et al. (2017) Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q Weinberger. 2017. Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition. 4700–4708.
  • Kong and Fowlkes (2019) Shu Kong and Charless Fowlkes. 2019. Pixel-wise Attentional Gating for Scene Parsing. In IEEE Winter Conference on Applications of Computer Vision. 1024–1033.
  • Kong and Fowlkes (2018) Shu Kong and Charless C Fowlkes. 2018. Recurrent scene parsing with perspective understanding in the loop. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 956–965.
  • Lazebnik et al. (2006) Svetlana Lazebnik, Cordelia Schmid, and Jean Ponce. 2006. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol. 2. 2169–2178.
  • Li et al. (2016) Xiangyang Li, Xinhang Song, Luis Herranz, Yaohui Zhu, and Shuqiang Jiang. 2016. Image captioning with both object and scene information. In Proceedings of the 24th ACM international conference on Multimedia. ACM, 1107–1110.
  • Liang et al. (2018a) Xiaodan Liang, Ke Gong, Xiaohui Shen, and Liang Lin. 2018a. Look into person: Joint body parsing & pose estimation network and a new benchmark. IEEE transactions on pattern analysis and machine intelligence (2018).
  • Liang et al. (2018b) Xiaodan Liang, Hongfei Zhou, and Eric Xing. 2018b. Dynamic-structured semantic propagation network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 752–761.
  • Lin et al. (2017) Guosheng Lin, Anton Milan, Chunhua Shen, and Ian Reid. 2017. Refinenet: Multi-path refinement networks for high-resolution semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition. 1925–1934.
  • Liu et al. (2018b) Daqing Liu, Zheng-Jun Zha, Hanwang Zhang, Yongdong Zhang, and Feng Wu. 2018b. Context-Aware Visual Policy Network for Sequence-Level Image Captioning. In 2018 ACM Multimedia Conference on Multimedia Conference. ACM, 1416–1424.
  • Liu et al. (2018a) Ting Liu, Tao Ruan, Zilong Huang, Yunchao Wei, Shikui Wei, Yao Zhao, and Thomas Huang. 2018a. Devil in the details: Towards accurate single and multiple human parsing. arXiv preprint arXiv:1809.05996 (2018).
  • Long et al. (2015) Jonathan Long, Evan Shelhamer, and Trevor Darrell. 2015. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition. 3431–3440.
  • Luo et al. (2018) Yawei Luo, Zhedong Zheng, Liang Zheng, Tao Guan, Junqing Yu, and Yi Yang. 2018. Macro-micro adversarial network for human parsing. In Proceedings of the European Conference on Computer Vision. 418–434.
  • Nie et al. (2018) Xuecheng Nie, Jiashi Feng, and Shuicheng Yan. 2018. Mutual Learning to Adapt for Joint Human Parsing and Pose Estimation. In Proceedings of the European Conference on Computer Vision. 502–517.
  • Park et al. (2018a) Yoon Jung Park, Yoonsik Yang, Hyocheol Ro, JungHyun Byun, Seougho Chae, and Tack Don Han. 2018a. Meet AR-bot: Meeting Anywhere, Anytime with Movable Spatial AR Robot. In 2018 ACM Multimedia Conference on Multimedia Conference. ACM, 1242–1243.
  • Park et al. (2018b) Yoon Jung Park, Yoonsik Yang, Hyocheol Ro, Jinwon Cha, Kyuri Kim, and Tack Don Han. 2018b. ChildAR-bot: Educational Playing Projection-based AR Robot for Children. In 2018 ACM Multimedia Conference on Multimedia Conference. ACM, 1278–1282.
  • Peng et al. (2017) Chao Peng, Xiangyu Zhang, Gang Yu, Guiming Luo, and Jian Sun. 2017. Large Kernel Matters–Improve Semantic Segmentation by Global Convolutional Network. In Proceedings of the IEEE conference on computer vision and pattern recognition. 4353–4361.
  • Pohlen et al. (2017) Tobias Pohlen, Alexander Hermans, Markus Mathias, and Bastian Leibe. 2017. Full-resolution residual networks for semantic segmentation in street scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4151–4160.
  • Rota Bulò et al. (2018) Samuel Rota Bulò, Lorenzo Porzi, and Peter Kontschieder. 2018. In-place activated batchnorm for memory-optimized training of dnns. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5639–5647.
  • Shuai et al. (2018) Bing Shuai, Zhen Zuo, Bing Wang, and Gang Wang. 2018. Scene segmentation with dag-recurrent neural networks. IEEE transactions on pattern analysis and machine intelligence 40, 6 (2018), 1480–1493.
  • Teichmann et al. (2018) Marvin Teichmann, Michael Weber, Marius Zoellner, Roberto Cipolla, and Raquel Urtasun. 2018. Multinet: Real-time joint semantic reasoning for autonomous driving. In 2018 IEEE Intelligent Vehicles Symposium (IV). IEEE, 1013–1020.
  • Wang et al. (2017) Fei Wang, Mengqing Jiang, Chen Qian, Shuo Yang, Cheng Li, Honggang Zhang, Xiaogang Wang, and Xiaoou Tang. 2017. Residual attention network for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3156–3164.
  • Wang et al. (2018a) Panqu Wang, Pengfei Chen, Ye Yuan, Ding Liu, Zehua Huang, Xiaodi Hou, and Garrison Cottrell. 2018a. Understanding convolution for semantic segmentation. In 2018 IEEE Winter Conference on Applications of Computer Vision. IEEE, 1451–1460.
  • Wang et al. (2018b) Xiaolong Wang, Ross Girshick, Abhinav Gupta, and Kaiming He. 2018b. Non-local neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 7794–7803.
  • Wu et al. (2016) Zifeng Wu, Chunhua Shen, and Anton van den Hengel. 2016. High-performance semantic segmentation using very deep fully convolutional networks. arXiv preprint arXiv:1604.04339 (2016).
  • Wu et al. (2019) Zifeng Wu, Chunhua Shen, and Anton Van Den Hengel. 2019. Wider or deeper: Revisiting the resnet model for visual recognition. Pattern Recognition 90 (2019), 119–133.
  • Xiao et al. (2018) Tete Xiao, Yingcheng Liu, Bolei Zhou, Yuning Jiang, and Jian Sun. 2018. Unified perceptual parsing for scene understanding. In Proceedings of the European Conference on Computer Vision. 418–434.
  • Xu and Saenko (2016) Huijuan Xu and Kate Saenko. 2016. Ask, attend and answer: Exploring question-guided spatial attention for visual question answering. In European Conference on Computer Vision. Springer, 451–466.
  • Xu et al. (2015) Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhudinov, Rich Zemel, and Yoshua Bengio. 2015. Show, attend and tell: Neural image caption generation with visual attention. In ICML. 2048–2057.
  • Yang et al. (2018) Maoke Yang, Kun Yu, Chi Zhang, Zhiwei Li, and Kuiyuan Yang. 2018. Denseaspp for semantic segmentation in street scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3684–3692.
  • Yu et al. (2018a) Changqian Yu, Jingbo Wang, Chao Peng, Changxin Gao, Gang Yu, and Nong Sang. 2018a. Bisenet: Bilateral segmentation network for real-time semantic segmentation. In Proceedings of the European Conference on Computer Vision. 325–341.
  • Yu et al. (2018b) Changqian Yu, Jingbo Wang, Chao Peng, Changxin Gao, Gang Yu, and Nong Sang. 2018b. Learning a discriminative feature network for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1857–1866.
  • Zhang et al. (2018) Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, and Amit Agrawal. 2018. Context encoding for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 7151–7160.
  • Zhang et al. (2017) Rui Zhang, Sheng Tang, Yongdong Zhang, Jintao Li, and Shuicheng Yan. 2017. Scale-adaptive convolutions for scene parsing. In Proceedings of the IEEE International Conference on Computer Vision. 2031–2039.
  • Zhao et al. (2017b) Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, and Jiaya Jia. 2017b. Pyramid scene parsing network. In Proceedings of the IEEE conference on computer vision and pattern recognition. 2881–2890.
  • Zhao et al. (2018) Hengshuang Zhao, Yi Zhang, Shu Liu, Jianping Shi, Chen Change Loy, Dahua Lin, and Jiaya Jia. 2018. Psanet: Point-wise spatial attention network for scene parsing. In Proceedings of the European Conference on Computer Vision. 267–283.
  • Zhao et al. (2017a) Jian Zhao, Jianshu Li, Xuecheng Nie, Fang Zhao, Yunpeng Chen, Zhecan Wang, Jiashi Feng, and Shuicheng Yan. 2017a. Self-supervised neural aggregation networks for human parsing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 7–15.
  • Zhou et al. (2017) Bolei Zhou, Hang Zhao, Xavier Puig, Sanja Fidler, Adela Barriuso, and Antonio Torralba. 2017. Scene parsing through ade20k dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 633–641.