Attention Guided Network for Retinal Image Segmentation††thanks: This work was done when S. Zhang was an intern at CVTE Research. M. Tan (firstname.lastname@example.org) and Y. Xu (email@example.com) are the corresponding authors.
Learning structural information is critical for producing ideal results in retinal image segmentation. Recently, convolutional neural networks have shown a powerful ability to extract effective representations. However, convolutional and pooling operations filter out some useful structural information. In this paper, we propose an Attention Guided Network (AG-Net) to preserve structural information and guide the expanding operation. In our AG-Net, a guided filter is exploited as a structure-sensitive expanding path to transfer structural information from previous feature maps, and an attention block is introduced to exclude noise and further reduce the negative influence of the background. Extensive experiments on two retinal image segmentation tasks (i.e., blood vessel segmentation, and optic disc and cup segmentation) demonstrate the effectiveness of our proposed method.
Retinal image segmentation plays an important role in automatic disease diagnosis. Compared to general natural images, retinal images contain more contextual structures, e.g., retinal vessels, optic disc and cup, which often provide important clinical information for diagnosis. As these structures are the main indicators for eye disease diagnosis, their segmentation accuracy is important. Recently, convolutional neural networks (CNNs) have shown a strong ability in retinal image segmentation with remarkable performance [3, 5, 4, 14]. Existing CNN-based models learn increasingly abstract representations through cascaded convolution and pooling operations. However, these operations may discard useful structural information such as edge structures, which are important for retinal image analysis. To address this issue, one possible solution is to add extra expanding paths that merge features skipped from the corresponding resolution levels. For example, FCN  sums up the upsampled feature maps and the feature maps skipped from the contractive path, while U-Net  concatenates them and adds convolutions and non-linearities. However, these works cannot effectively leverage this structural information, which may hamper segmentation performance. Therefore, it is desirable to design a better expanding path that preserves structural information.
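The two expanding-path fusion styles mentioned above can be sketched in PyTorch as follows. This is an illustrative comparison, not the paper's code; module names and the choice of bilinear upsampling are assumptions of this sketch.

```python
import torch
import torch.nn as nn


class SumFusion(nn.Module):
    """FCN-style expanding step: upsample the deep features, then add the skip."""
    def __init__(self):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)

    def forward(self, deep, skip):
        # element-wise sum of upsampled deep features and skipped features
        return self.up(deep) + skip


class ConcatFusion(nn.Module):
    """U-Net-style expanding step: upsample, concatenate, then convolve."""
    def __init__(self, channels):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.conv = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, deep, skip):
        # channel-wise concatenation followed by convolution and non-linearity
        return self.conv(torch.cat([self.up(deep), skip], dim=1))
```

Both variants restore spatial resolution, but neither explicitly models edge structure, which motivates the guided-filter expanding path introduced below.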
To address this, we introduce the guided filter  as a special expanding path to transfer structural information extracted from low-level feature maps to high-level ones. The guided filter  is an edge-preserving image filter and has been demonstrated to be effective for transferring structural information. Different from existing works that apply the guided filter at the image level, we incorporate it into CNNs to learn better features for segmentation. We further design an attention mechanism within the guided filter, called the attention guided filter, to remove the noisy components introduced from the complex background by the original guided filter. Finally, we propose the Attention Guided Network (AG-Net) to preserve structural information and guide the expanding operation. Experiments on vessel segmentation and optic disc/cup segmentation demonstrate the effectiveness of our proposed method.
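As a sketch of how a guided filter can operate on feature maps, the snippet below applies the classical guided-filter equations (local linear model with box-filter means) to a guidance/input feature pair, with an attention map multiplied into the filtering input as a stand-in for the attention block. The placement of the attention weighting and the default `r`/`eps` values are assumptions of this sketch; the exact formulation in AG-Net may differ.

```python
import torch
import torch.nn.functional as F


def box_filter(x, r):
    """Mean filter over a (2r+1)x(2r+1) window, preserving spatial size."""
    return F.avg_pool2d(x, kernel_size=2 * r + 1, stride=1, padding=r)


def attention_guided_filter(I, p, A, r=2, eps=1e-2):
    """Simplified attention-guided filtering on feature maps.

    I : guidance features (low-level, structure-rich), shape (N, C, H, W)
    p : features to be filtered (e.g., upsampled high-level), same shape
    A : attention map in [0, 1] that down-weights background responses
    """
    p = A * p                                       # suppress background
    mean_I, mean_p = box_filter(I, r), box_filter(p, r)
    cov_Ip = box_filter(I * p, r) - mean_I * mean_p
    var_I = box_filter(I * I, r) - mean_I * mean_I
    a = cov_Ip / (var_I + eps)                      # local linear coefficients
    b = mean_p - a * mean_I
    # smooth the coefficients, then reconstruct using the guidance features,
    # so edges present in I are transferred to the output
    return box_filter(a, r) * I + box_filter(b, r)
```

Because the output is a local linear transform of the guidance features, edges in the low-level maps survive the expanding step instead of being blurred by plain upsampling.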
For some reason, we temporarily withdraw this part; it will be made available later.
In this paper, we evaluate our method on two major tasks: vessel segmentation, and optic disc/cup segmentation from retinal fundus images.
3.1 Vessel Segmentation on DRIVE Dataset
We conduct vessel segmentation experiments on DRIVE to evaluate the performance of our proposed AG-Net. The DRIVE  (Digital Retinal Images for Vessel Extraction) dataset contains 40 colored fundus images, obtained from a diabetic retinopathy screening program in the Netherlands. The 40 images are divided into 20 training images and 20 testing images. All images were captured with a 3CCD camera, and each has a size of . We apply gamma correction to improve image quality, and resize the preprocessed images to as inputs. In the experiments, we train our AG-Net from scratch using Adam with a learning rate of 0.0015. The batch size is set to 2. The window radius and the regularization parameter in the attention guided filter are set to and , respectively. Following the previous work , we employ Specificity (Spe), Sensitivity (Sen), Accuracy (Acc), intersection-over-union (IOU) and Area Under ROC (AUC) as measurements.
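For reference, the five pixel-wise measurements above can be computed from confusion counts as in the following NumPy sketch. The function name, the 0.5 threshold, and the pairwise AUC computation (fine for a sketch, quadratic in the number of pixels) are illustrative assumptions, not the paper's evaluation code.

```python
import numpy as np


def vessel_metrics(prob, gt, thr=0.5):
    """Spe/Sen/Acc/IOU/AUC for a binary vessel map.

    prob : predicted vessel probabilities in [0, 1]
    gt   : binary ground-truth vessel map (same shape)
    """
    pred = (prob >= thr).astype(np.uint8)
    gt = gt.astype(np.uint8)
    tp = np.sum((pred == 1) & (gt == 1))
    tn = np.sum((pred == 0) & (gt == 0))
    fp = np.sum((pred == 1) & (gt == 0))
    fn = np.sum((pred == 0) & (gt == 1))
    # AUC via the Mann-Whitney statistic: probability that a random vessel
    # pixel is scored higher than a random background pixel
    pos, neg = prob[gt == 1], prob[gt == 0]
    auc = (np.sum(pos[:, None] > neg[None, :])
           + 0.5 * np.sum(pos[:, None] == neg[None, :])) / (pos.size * neg.size)
    return {
        "Sen": tp / (tp + fn),
        "Spe": tn / (tn + fp),
        "Acc": (tp + tn) / (tp + tn + fp + fn),
        "IOU": tp / (tp + fp + fn),
        "AUC": auc,
    }
```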
We compare our AG-Net with several state-of-the-art methods, including Li , Liskowski , Zhang , and MS-NFN . Li  recast the segmentation task as a cross-modality data transformation problem from retinal image to vessel map, and output the label map of all pixels instead of a single label for the center pixel. Liskowski  trained a deep neural network on samples preprocessed with global contrast normalization and zero-phase whitening, and augmented with geometric transformations and gamma corrections. MS-NFN  generates multi-scale feature maps with an ‘up-pool’ submodel and a ‘pool-up’ submodel. To verify the efficacy of the attention mechanism in the guided filter for transferring structural information, we also replace the attention guided filter in AG-Net with the original guided filter, and name this variant GF-Net.
Table 1 shows the performance of different methods on DRIVE. From the results, we make several interesting observations. Firstly, GF-Net performs better than the original M-Net, which demonstrates the superiority of the guided filter over the skip connection for transferring structural information. Secondly, AG-Net outperforms GF-Net by 0.0010, 0.0019, 0.0205 and 0.0126 in terms of Acc, AUC, Sen and IOU, respectively, which demonstrates the effectiveness of the attention strategy in the attention guided filter. Lastly, unlike other deep learning methods that crop images into patches, our method achieves the best performance using only the 20 original preprocessed images. We draw similar observations from the results on the CHASE_DB1 dataset, which are shown in Table 2.
Fig. 1 shows an example test image, including the ground-truth vessel map and the segmentation results obtained by M-Net, M-Net+GF and the proposed AG-Net. M-Net+GF produces clearer boundaries than M-Net, which demonstrates that the guided filter better leverages structural information. Compared with M-Net+GF, our proposed AG-Net produces more precise segmentation boundaries, which verifies that the attention mechanism is able to highlight the foreground and reduce the effect of the background.
In terms of time consumption, we compare our AG-Net with M-Net, the backbone of our method. In our experiments, both algorithms are implemented in PyTorch and tested on a single NVIDIA Titan X GPU (200 iterations on the DRIVE dataset). The running times are shown in Table 3.
3.2 Optic Disc/Cup Segmentation on ORIGA Dataset
Optic disc/cup segmentation is another important retinal segmentation task. In this experiment, we use the ORIGA dataset, which contains 650 fundus images with 168 glaucomatous eyes and 482 normal eyes. The 650 images are divided into 325 training images (including 73 glaucoma cases) and 325 testing images (including 95 glaucoma cases). We crop the OD area and resize it to as the input. The training settings of our AG-Net are the same as in the vessel segmentation task. We compare AG-MNet with several state-of-the-art methods for OD and/or OC segmentation, including ASM , Superpixel , LRR , U-Net , M-Net , and M-Net with polar transformation (M-Net+PT). ASM  employs a circular Hough transform initialization for segmentation. The Superpixel method  utilizes superpixel classification to detect the OD and OC boundaries. LRR  obtains good results, but it focuses only on OC segmentation.
Following the setting in , we first localize the disc center and then crop pixels to obtain the input images. Inspired by M-Net+PT , we provide the results of AG-MNet with polar transformation, called AG-MNet+PT. Besides, to reduce the impact of variation in OD size, we construct a further variant of AG-MNet+PT, which enlarges the bounding boxes by 50 pixels upward, downward, leftward and rightward, where the bounding boxes are obtained from a pretrained LinkNet. We employ the overlapping error (OE) as the evaluation metric, defined as $OE = 1 - \frac{Area(S \cap G)}{Area(S \cup G)}$, where $G$ and $S$ denote the ground-truth area and the segmented mask, respectively. In particular, $OE_{disc}$ and $OE_{cup}$ are the overlapping errors of the OD and OC, and $OE_{total}$ is the average of $OE_{disc}$ and $OE_{cup}$.
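The overlapping error above is one minus the intersection-over-union of the segmented mask and the ground truth, which can be computed directly on binary masks; the function names below are illustrative, not the paper's code.

```python
import numpy as np


def overlap_error(seg, gt):
    """OE = 1 - Area(S ∩ G) / Area(S ∪ G) for binary masks."""
    seg, gt = seg.astype(bool), gt.astype(bool)
    inter = np.logical_and(seg, gt).sum()
    union = np.logical_or(seg, gt).sum()
    return 1.0 - inter / union


def total_overlap_error(seg_disc, gt_disc, seg_cup, gt_cup):
    """OE_total: average of the disc and cup overlapping errors."""
    return 0.5 * (overlap_error(seg_disc, gt_disc)
                  + overlap_error(seg_cup, gt_cup))
```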
Table 4 shows the segmentation results, where the overlapping errors of the other approaches are taken directly from their published results. Our method outperforms all the state-of-the-art OD and/or OC segmentation algorithms in terms of the aforementioned evaluation criteria, which demonstrates the effectiveness of our model. Besides, our AG-MNet performs much better than the original M-Net under the same settings, which further demonstrates that our attention guided filter is beneficial to segmentation performance. More visualization results can be found in the Supplementary Material.
In this paper, we propose an attention guided filter as a structure-sensitive expanding path. Specifically, we employ M-Net as the main body and exploit our attention guided filter to replace the skip connections and upsampling, which brings better information fusion. In addition, by introducing the attention mechanism into the guided filter, the attention guided filter can highlight the foreground and reduce the effect of the background. Experiments on two tasks demonstrate the effectiveness of our method.
Acknowledgments. This work was supported by National Natural Science Foundation of China (NSFC) 61602185 and 61876208, Guangdong Introducing Innovative and Entrepreneurial Teams 2017ZT07X183, Guangdong Provincial Scientific and Technological Funds 2018B010107001, 2017B090901008 and 2018B010108002, Pearl River S&T Nova Program of Guangzhou 201806010081, and CCF-Tencent Open Research Fund RAGR20190103.
-  (2017) LinkNet: exploiting encoder representations for efficient semantic segmentation. In VCIP.
-  (2013) Superpixel classification based optic disc and optic cup segmentation for glaucoma screening. IEEE TMI.
-  (2016) DeepVessel: retinal vessel segmentation via deep learning and conditional random field. In MICCAI.
-  (2018) Joint optic disc and cup segmentation based on multi-label deep network and polar transformation. IEEE TMI.
-  (2019) CE-Net: context encoder network for 2D medical image segmentation. IEEE TMI.
-  (2013) Guided image filtering. IEEE TPAMI.
-  (2016) A cross-modality learning approach for vessel segmentation in retinal images. IEEE TMI.
-  (2016) Segmenting retinal blood vessels with deep neural networks. IEEE TMI.
-  (2015) Fully convolutional networks for semantic segmentation. In CVPR.
-  (2015) U-Net: convolutional networks for biomedical image segmentation. In MICCAI.
-  (2004) Ridge-based vessel segmentation in color images of the retina. IEEE TMI.
-  (2018) Multiscale network followed network model for retinal vessel segmentation. In MICCAI.
-  (2014) Optic cup segmentation for glaucoma detection using low-rank superpixel representation. In MICCAI.
-  (2017) A skeletal similarity metric for quality evaluation of retinal vessel segmentation. IEEE TMI.
-  (2011) Model-based optic nerve head segmentation on retinal fundus images. In EMBC.
-  (2018) Deep supervision with additional labels for retinal vessel segmentation task. In MICCAI.