Robust saliency detection via fusing foreground and background priors

Automatic salient object detection has received tremendous attention from the research community and has become an increasingly important tool in many computer vision tasks. This paper proposes a novel bottom-up salient object detection framework that considers both foreground and background cues. First, a series of background and foreground seeds are reliably extracted from an image and then used to calculate two separate saliency maps. Next, the foreground and background saliency maps are combined. Last, a refinement step based on geodesic distance is used to enhance salient regions, yielding the final saliency map. In particular, we provide a robust scheme for seed selection that contributes substantially to the accuracy of saliency detection. Extensive experimental evaluations demonstrate the effectiveness of our proposed method against other outstanding methods.

Kan Huang, Chunbiao Zhu* and Ge Li
School of Electronic and Computer Engineering, Shenzhen Graduate School, Peking University

This work was supported by the grant of National Natural Science Foundation of China (No. U1611461), the grant of Science and Technology Planning Project of Guangdong Province, China (No. 2014B090910001) and the grant of Shenzhen Peacock Plan (No. 20130408-183003656).

Index Terms—  Salient object detection, Foreground prior, Background prior, geodesic distance

1 Introduction

The task of saliency detection is to identify the most important and informative part of a scene. It can be applied to numerous computer vision applications including image retrieval [1], image compression [2], content-aware image editing [3] and object recognition [4]. Saliency detection methods in general can be categorized into bottom-up models [5, 6] and top-down models [7]. Bottom-up methods are data-driven and training-free, while top-down methods are task-driven and usually trained with annotated data. Saliency models have been developed for eye fixation prediction and salient object detection. Unlike eye fixation prediction models, which focus on identifying a few human fixation locations on natural images, salient object detection models aim to pop out salient objects with well-defined boundaries, which is useful for many high-level vision tasks. In this paper, we focus on the bottom-up salient object detection model.

Research works that exploit foreground priors and explicitly extract salient objects from images have shown their effectiveness in past years. Contrast [5][8] tends to be the most influential factor among all kinds of saliency measures for exploiting the foreground prior. Some other works utilize the measure of rarity [9] to extract salient objects from images. As suggested by the Gestalt psychological figure-ground assignment principle [10], the surroundedness cue has shown its effectiveness in saliency detection, especially for eye fixation prediction, as in [11]. However, relying solely on it makes it difficult to highlight whole salient objects. Another effective type of salient object detection model exploits background priors in an image, from which salient objects can be detected implicitly. By assuming that most of the narrow border of the image belongs to background regions, background priors can be exploited to calculate the saliency map, as done in [12, 13]. However, this also causes problems, because image elements that are distinct from the border regions do not always belong to salient objects.

Unlike these methods, which focus on exploiting either the foreground prior or the background prior for saliency detection, we endeavor to establish an efficient framework that integrates both foreground and background cues. In this paper, a novel bottom-up framework is introduced. The surroundedness cue is utilized to exploit foreground priors, which are used for foreground seed localization and subsequent saliency calculation. Meanwhile, background priors are extracted from border regions and used for saliency calculation. Two saliency maps based on background and foreground priors are generated separately and then combined. Finally, a refinement step based on geodesic distance is adopted to enhance the map and highlight the salient regions uniformly, yielding the final saliency map.

Our work has several contributions: (1) the surroundedness cue is utilized to exploit foreground priors, which proves to be effective for saliency detection when combined with the background prior; (2) a robust seed estimation scheme is established, which contributes substantially to the accuracy of saliency detection; (3) a framework that integrates both foreground and background priors is proposed.

2 Proposed approach

There are two main parallel sub-processes in our framework: background saliency and foreground saliency. They are calculated separately based on background and foreground seeds, respectively. The two saliency maps are then fused into one and enhanced by a refinement step based on geodesic distance to derive the final saliency map. The main framework of the proposed approach is depicted in Fig. 1.

Fig. 1: Overview of the main framework of our proposed approach

2.1 Foreground saliency

This section details how to find reliable foreground seeds and generate a saliency map based on the selected seeds.

2.1.1 Foreground seeds estimation

To extract foreground seeds from an image reliably, the surroundedness cue is employed. We adopt the binary-segmentation-based method of BMS [11], which thoroughly exploits the surroundedness cue in an image, to guide foreground seed localization. We denote the map generated by BMS as a surroundedness map, $S_s$, in which each pixel value indicates its degree of surroundedness. To better utilize structural information and abstract away small noise, we decompose the image into a set of superpixels with the SLIC algorithm [14]. All operations in the rest of this paper are performed at the superpixel level. The surroundedness value of each superpixel is defined as the average value of all the pixels inside it, denoted by $Sp(i)$, $i = 1, 2, \dots, N$, where $N$ is the number of superpixels.

Unlike previous works[12, 13] that treat some regions as certain seeds, we provide a more flexible scheme for seeds estimation. We define two types of seed elements: strong seeds and weak seeds. Strong seeds have high probability of belonging to foreground/background while weak seeds have relatively low probability of belonging to foreground/background. For foreground seeds, the two types of seeds are selected by:


where $Seed^{f}_{strong}$ and $Seed^{f}_{weak}$ denote the sets of strong and weak foreground seeds, $i$ represents the $i$-th superpixel, and mean(·) is the averaging function. It follows from formulas (1) and (2) that elements with a higher degree of surroundedness are more likely to be chosen as strong foreground seeds, which is consistent with intuition.
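As a rough illustration of the strong/weak split described above, the following Python sketch thresholds per-superpixel surroundedness values against their mean. The function name `select_foreground_seeds` and the multipliers `strong_factor` and `weak_factor` are assumptions introduced for illustration only; they are not the thresholds of formulas (1) and (2).

```python
from statistics import mean


def select_foreground_seeds(sp_values, strong_factor=2.0, weak_factor=1.0):
    """Split superpixels into strong and weak foreground seeds.

    sp_values[i] is the mean surroundedness of superpixel i.
    strong_factor and weak_factor are ASSUMED multipliers of the mean,
    chosen for illustration; they are not values from the paper.
    """
    avg = mean(sp_values)
    # Strong seeds: surroundedness well above the average.
    strong = [i for i, v in enumerate(sp_values) if v >= strong_factor * avg]
    # Weak seeds: above the average but below the strong threshold.
    weak = [i for i, v in enumerate(sp_values)
            if weak_factor * avg <= v < strong_factor * avg]
    return strong, weak


# Toy surroundedness values for five superpixels (mean is 0.3).
sp_demo = [0.05, 0.10, 0.90, 0.45, 0.00]
strong_fg, weak_fg = select_foreground_seeds(sp_demo)
print(strong_fg, weak_fg)
```

With these toy values only superpixel 2 exceeds twice the mean and only superpixel 3 falls between the mean and twice the mean, so a higher surroundedness value leads to a stronger seed label.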

2.1.2 Foreground saliency map

For saliency calculation based on given seeds, we use the ranking method of [15], which exploits the intrinsic manifold structure of data for graph labelling. The method ranks the relevance of every element to a given set of seeds. We construct a graph that represents the whole image as in [16], where each node is a superpixel generated by SLIC.

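The ranking procedure is as follows. Given a graph $G = (V, E)$, where the nodes are $V$ and the edges $E$ are weighted by an affinity matrix $W = [w_{ij}]_{N \times N}$, the degree matrix is defined by $D = \mathrm{diag}\{d_{11}, \dots, d_{NN}\}$, where $d_{ii} = \sum_{j} w_{ij}$. The ranking function is given by:

$$f^* = (D - \alpha W)^{-1} y \qquad (3)$$

Here $f^*$ is the resulting vector, which stores the ranking result of each element, $y$ is a vector indicating the seed queries, and $\alpha$ is the balancing parameter of the ranking method [15].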
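A minimal sketch of the ranking step follows, assuming the closed form $f^* = (D - \alpha W)^{-1} y$ used in graph-based manifold ranking [16]. The helper names `solve_linear` and `manifold_rank`, the default `alpha=0.99`, and the toy chain graph are assumptions made for illustration; the linear system is solved with a small Gaussian elimination so that the example needs no external library.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def manifold_rank(w, y, alpha=0.99):
    """Return f = (D - alpha * W)^-1 y for a symmetric affinity matrix w.

    alpha=0.99 is an ASSUMED default, not a value stated in this paper.
    """
    n = len(w)
    degree = [sum(row) for row in w]
    a = [[(degree[i] if i == j else 0.0) - alpha * w[i][j] for j in range(n)]
         for i in range(n)]
    return solve_linear(a, y)


# Toy chain graph 0 - 1 - 2 - 3 with unit affinities; node 0 is the only query.
w_demo = [[0, 1, 0, 0],
          [1, 0, 1, 0],
          [0, 1, 0, 1],
          [0, 0, 1, 0]]
y_demo = [1.0, 0.0, 0.0, 0.0]
f_demo = manifold_rank(w_demo, y_demo)
print(f_demo)
```

On this chain the ranking score decays with graph distance from the query node, which is the behaviour the saliency computation relies on.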

In this work, the weight between two nodes is defined by:

$$w_{ij} = e^{-\frac{\| c_i - c_j \|}{\sigma^2}} \qquad (4)$$

where $c_i$ and $c_j$ denote the mean colors of the superpixels corresponding to the two nodes in the CIE LAB color space, and $\sigma$ is a constant that controls the strength of the weight.
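The affinity computation can be sketched as below, assuming the exponential form $w_{ij} = \exp(-\| c_i - c_j \| / \sigma^2)$ from [16]. The function name `affinity_matrix`, the value `sigma2=0.1`, the normalized color triples, and the explicit edge list are all illustrative assumptions; how neighboring superpixels are connected is taken as given.

```python
import math


def affinity_matrix(colors, edges, sigma2=0.1):
    """Build a symmetric affinity matrix from mean superpixel colors.

    colors[i] is the mean CIE LAB color of superpixel i (here scaled to [0, 1]).
    edges lists the pairs of connected superpixels (ASSUMED to be given).
    sigma2 is an ASSUMED value for the constant sigma squared.
    """
    n = len(colors)
    w = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(colors[i], colors[j])))
        w[i][j] = w[j][i] = math.exp(-dist / sigma2)
    return w


# Superpixels 0 and 1 have similar colors; superpixel 2 is very different.
colors_demo = [(0.20, 0.50, 0.50), (0.22, 0.50, 0.50), (0.80, 0.40, 0.60)]
w_aff = affinity_matrix(colors_demo, edges=[(0, 1), (1, 2)])
print(w_aff[0][1], w_aff[1][2])
```

Similar colors give an affinity close to 1, while dissimilar colors give an affinity close to 0, so ranking scores propagate mainly within regions of homogeneous appearance.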

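Different from [16], which defines $y_i = 1$ if $i$ is a query and $y_i = 0$ otherwise, we additionally define $y_i$ as the strength of the query. That is, $y_i$ takes its largest value if $i$ is a strong query, a smaller positive value if $i$ is a weak query, and $y_i = 0$ otherwise.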

For foreground-seed-based ranking, all elements are ranked by formula (3) given the sets of seeds in (1) and (2). The process of foreground saliency is illustrated in Fig. 2 (first row).

Fig. 2: Illustration of foreground and background saliency. (a) original image; (b) superpixel segmentation; (c) top: foreground seeds, bottom: background seeds (blue: mask of strong seeds, green: mask of weak seeds); (d) top: foreground saliency map, bottom: background saliency map.

2.2 Background saliency

Complementary to foreground saliency, background saliency aims to extract regions that are different from background in feature distribution. We first select a set of background seeds and then calculate saliency of every image element according to its relevance to these seeds. This section elaborates on the process of seeds estimation and background saliency calculation.

Fig. 3: Comparison of different seed estimation schemes. (a) original image; (b) superpixel segmentation; (c) our scheme for background seeds estimation; (d) saliency map corresponding to (c); (e) common seeds estimation scheme; (f) saliency map corresponding to (e).

2.2.1 Background seeds estimation

Unlike most previous works [7] that use the elements on the image boundary as background seeds, we divide the elements on the image border into two categories (strong seeds and weak seeds), as in the foreground case. We denote the average feature of all border elements as $\bar{c}$. The Euclidean distance between each feature vector $c_i$ and the average feature vector is computed as $d_b(i) = \| c_i - \bar{c} \|$, and the average of $d_b$ is denoted by $\bar{d_b}$. The background seeds are estimated by:


where $Seed^{b}_{strong}$ denotes the strong background seeds and $Seed^{b}_{weak}$ denotes the weak background seeds.
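The border-based split can be sketched in Python as follows. This is only an illustration under stated assumptions: border elements whose feature lies within `strong_factor` times the average distance from the mean border feature are treated as strong seeds and the remaining border elements as weak seeds. The direction of the comparison, the factor, and the name `select_background_seeds` are hypothetical and not taken from the original formulas.

```python
import math
from statistics import mean


def select_background_seeds(border_features, strong_factor=1.0):
    """Split border superpixels into strong and weak background seeds.

    border_features maps a border superpixel index to its feature vector.
    ASSUMPTION: features close to the average border feature are more
    likely background, so they become strong seeds; strong_factor is an
    illustrative multiplier of the average distance.
    """
    ids = list(border_features)
    dims = range(len(border_features[ids[0]]))
    avg = [mean(border_features[i][k] for i in ids) for k in dims]
    dist = {i: math.sqrt(sum((border_features[i][k] - avg[k]) ** 2 for k in dims))
            for i in ids}
    avg_dist = mean(dist.values())
    strong = [i for i in ids if dist[i] <= strong_factor * avg_dist]
    weak = [i for i in ids if dist[i] > strong_factor * avg_dist]
    return strong, weak


# Four border superpixels look alike; superpixel 9 is an outlier,
# for example a salient object touching the image border.
border_demo = {
    0: (0.50, 0.50, 0.50),
    1: (0.52, 0.50, 0.50),
    2: (0.48, 0.50, 0.50),
    7: (0.50, 0.50, 0.50),
    9: (0.90, 0.10, 0.50),
}
strong_bg, weak_bg = select_background_seeds(border_demo)
print(strong_bg, weak_bg)
```

In this toy case the outlier on the border is demoted to a weak seed instead of being trusted as certain background, which is the motivation for grading the seeds.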

2.2.2 Background saliency map

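Similar to the foreground case, the indication vector $y$ for background seeds takes the strong-query strength if superpixel $i$ belongs to $Seed^{b}_{strong}$, the weak-query strength if $i$ belongs to $Seed^{b}_{weak}$, and 0 otherwise. The relevance of each element to the background seeds is computed by formula (3). Each element of the resulting vector indicates the relevance of a node to the background queries, and its complement is the saliency measure. The saliency map using these background seeds can be written as:

$$S_b(i) = 1 - \bar{f}^*(i), \qquad i = 1, 2, \dots, N$$

where $\bar{f}^*$ denotes the resulting vector normalized to the range $[0, 1]$.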
The process of background saliency is shown in Fig. 2 (second row), and a comparison between our seed estimation scheme and the common scheme is illustrated in Fig. 3. Our scheme is more robust and extracts more of the salient regions from an image.

Fig. 4: Visual comparison of saliency models

Fig. 5: (a) PR curve on ASD dataset; (b) precision, recall and F-measure on ASD dataset; (c) PR curve on DUT-OMRON dataset; (d) precision, recall and F-measure on DUT-OMRON dataset.

MAE 0.1105 0.2293 0.1707 0.2195 0.2125 0.2464 0.1123 0.0743 0.1556 0.0689 0.1045 0.0596
AUC 0.8990 0.5903 0.8412 0.4980 0.6754 0.5335 0.9303 0.9308 0.8772 0.9390 0.9344 0.9446
Table 1: Quantitative comparison of MAE and AUC on ASD dataset

MAE 0.1485 0.2406 0.1262 0.2363 0.2278 0.2499 0.1798 0.1418 0.1843 0.1143 0.1646 0.1068
AUC 0.6676 0.4674 0.6648 0.3255 0.5330 0.4450 0.7010 0.7259 0.6700 0.7846 0.7202 0.7619
Table 2: Quantitative comparison of MAE and AUC on DUT-OMRON dataset

2.3 Geodesic distance refinement

The foreground and background saliency maps are combined as follows. In each of the two maps, the elements whose value is larger than the average value of that map are selected as salient elements. The two selections are merged into one set, and the ranking is conducted again using these elements as seeds to obtain a combination map $S_{com}$.
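The selection step of this combination can be sketched in Python as below. The function name `combination_seeds` and the uniform query strength of 1.0 assigned to the merged seeds are assumptions for illustration; the re-ranking itself would reuse the ranking function of formula (3).

```python
from statistics import mean


def combination_seeds(fg_map, bg_map):
    """Return the union of above-average elements of the two saliency maps."""
    fg_avg, bg_avg = mean(fg_map), mean(bg_map)
    fg_selected = {i for i, v in enumerate(fg_map) if v > fg_avg}
    bg_selected = {i for i, v in enumerate(bg_map) if v > bg_avg}
    return sorted(fg_selected | bg_selected)


# Toy per-superpixel saliency values from the two maps.
fg_demo = [0.9, 0.8, 0.1, 0.2, 0.1]
bg_demo = [0.7, 0.2, 0.1, 0.9, 0.1]
seeds_demo = combination_seeds(fg_demo, bg_demo)

# Query vector for the second ranking pass; the strength 1.0 is ASSUMED.
y_com = [1.0 if i in seeds_demo else 0.0 for i in range(len(fg_demo))]
print(seeds_demo, y_com)
```

Superpixel 1 is selected only by the foreground map and superpixel 3 only by the background map, so the merged seed set covers regions that either cue alone would miss.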

The final step of our proposed approach is a refinement with geodesic distance [17]. The motivation is the observation that determining the saliency of an element as a weighted sum of the saliency of its surrounding elements, with weights based on Euclidean distance, has limited ability to highlight the salient object uniformly. We therefore seek a solution that enhances the regions of a salient object more uniformly. Recent work [18] suggests that such weights may be sensitive to geodesic distance.

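For the $j$-th superpixel, its posterior probability can be denoted $S_{com}(j)$; thus the saliency value of the $q$-th superpixel is refined by geodesic distance as follows:

$$S(q) = \sum_{j=1}^{N} \delta_{qj} \cdot S_{com}(j)$$

where $N$ is the total number of superpixels, and $\delta_{qj}$ is a weight based on the geodesic distance [17] between the $q$-th and $j$-th superpixels. Based on the graph model constructed in Section 2.1.2, the geodesic distance between two superpixels can be defined as the accumulated edge weights along their shortest path on the graph:

$$d_g(q, j) = \min_{a_1 = q,\, a_2,\, \dots,\, a_n = j} \sum_{k=1}^{n-1} d_c(a_k, a_{k+1})$$

where $d_c(a_k, a_{k+1})$ is the edge weight between the adjacent superpixels $a_k$ and $a_{k+1}$. In this way we can get the geodesic distance between any two superpixels in the image. The weight is then defined as $\delta_{qj} = \exp\left(-\frac{d_g^2(q, j)}{2\sigma_c^2}\right)$, where $\sigma_c$ is the deviation of all $d_c$ values. The salient objects are highlighted uniformly after this step, as will be seen in the experiment section.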
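The refinement can be sketched in Python with Dijkstra's algorithm for the shortest paths. This sketch assumes a Gaussian weight of the form $\exp(-d_g^2 / (2\sigma^2))$ with $\sigma$ taken as the population standard deviation of the edge costs, and it rescales the result so that the maximum is 1. The function names, the toy edge costs, and the final rescaling are assumptions for illustration.

```python
import heapq
import math
from statistics import pstdev


def geodesic_from(source, n, edge_costs):
    """Shortest-path (geodesic) distance from source to every node."""
    adjacency = [[] for _ in range(n)]
    for (i, j), cost in edge_costs.items():
        adjacency[i].append((j, cost))
        adjacency[j].append((i, cost))
    dist = [math.inf] * n
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, cost in adjacency[u]:
            if d + cost < dist[v]:
                dist[v] = d + cost
                heapq.heappush(heap, (dist[v], v))
    return dist


def refine(saliency, edge_costs):
    """Refine saliency as a geodesic-weighted sum over all superpixels."""
    n = len(saliency)
    # ASSUMPTION: sigma is the standard deviation of the edge costs.
    sigma = pstdev(list(edge_costs.values())) or 1.0
    refined = []
    for q in range(n):
        dist = geodesic_from(q, n, edge_costs)
        weights = [math.exp(-(d * d) / (2.0 * sigma * sigma)) for d in dist]
        refined.append(sum(w * s for w, s in zip(weights, saliency)))
    peak = max(refined)
    # ASSUMPTION: rescale to [0, 1] for display.
    return [r / peak for r in refined] if peak > 0 else refined


# Chain 0 - 1 - 2 - 3: nodes 0 and 1 look alike (low cost), then a strong edge.
costs_demo = {(0, 1): 0.1, (1, 2): 0.9, (2, 3): 0.2}
d0_demo = geodesic_from(0, 4, costs_demo)
refined_demo = refine([1.0, 0.4, 0.0, 0.1], costs_demo)
print(d0_demo, refined_demo)
```

In this toy chain, superpixel 1 starts with a saliency of only 0.4 but is geodesically close to the highly salient superpixel 0, so its refined value rises close to the maximum, while superpixels 2 and 3 beyond the strong edge stay low.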

3 Experiment

This section presents evaluation of our proposed method.

Datasets. We test our proposed model on the ASD dataset [5] and the DUT-OMRON dataset [16]. The ASD dataset provides 1000 images with annotated object-contour-based ground truth, while the DUT-OMRON dataset provides 5168 more challenging images with pixel-level annotation.

Evaluation metrics. For accurate evaluation, we adopt four metrics: precision-recall (PR) curve, F-measure, mean absolute error (MAE), and AUC score. Fig. 5 shows the PR curves, and the precision, recall and F-measure values for an adaptive threshold that is defined as twice the mean saliency of the image. Table 1 and Table 2 show the MAE and AUC scores on the two datasets.
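The adaptive-threshold metrics and MAE can be computed as in the following Python sketch, which operates on flattened saliency values in [0, 1] and a binary ground truth. The weighting `beta2=0.3` in the F-measure is a value commonly used in the salient object detection literature and is an assumption here, as are the function name `evaluate` and the cap of the threshold at 1.0.

```python
def evaluate(saliency, ground_truth, beta2=0.3):
    """Return (precision, recall, f_measure, mae) for one image.

    saliency: flattened saliency values in [0, 1].
    ground_truth: flattened binary mask (1 = salient).
    beta2 is an ASSUMED F-measure weight, not stated in the text.
    """
    n = len(saliency)
    mae = sum(abs(s - g) for s, g in zip(saliency, ground_truth)) / n
    # Adaptive threshold: twice the mean saliency (capped at 1.0 by assumption).
    threshold = min(1.0, 2.0 * sum(saliency) / n)
    binary = [1 if s >= threshold else 0 for s in saliency]
    true_pos = sum(1 for b, g in zip(binary, ground_truth) if b and g)
    detected, positives = sum(binary), sum(ground_truth)
    precision = true_pos / detected if detected else 0.0
    recall = true_pos / positives if positives else 0.0
    denom = beta2 * precision + recall
    f_measure = (1 + beta2) * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure, mae


sal_demo = [0.9, 0.8, 0.1, 0.0, 0.2, 0.0]
gt_demo = [1, 1, 0, 0, 1, 0]
p_demo, r_demo, f_demo_score, mae_demo = evaluate(sal_demo, gt_demo)
print(p_demo, r_demo, f_demo_score, mae_demo)
```

Here the threshold is twice the mean saliency (about 0.67), so two of the three salient pixels are detected with no false positives.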

Comparison. We compare our proposed method with 11 state-of-the-art models, including CAS [19], wCtr [17], FT [5], DRFI [7], GBVS [20], ITTI [21], MILPS [22], MR [13], PCA [9], SBD [23] and BMS [11]. Our method highlights salient regions more uniformly and achieves better results, especially in terms of the PR curve and MAE scores. In general, our method outperforms the other competitive approaches.

4 Conclusion

In this paper, we present a novel and efficient framework for salient object detection via a complementary combination of foreground and background priors. The key contributions of our method are: (1) the surroundedness cue is utilized for exploiting the foreground prior, which proves to be extremely effective when combined with the background prior; (2) a robust seed estimation scheme is provided for seed selection, in which the confidence of each seed belonging to the background or foreground is estimated. Extensive experimental results demonstrate the superiority of our proposed method against other outstanding methods. Our proposed method also has an efficient implementation, which is useful for real-time applications.


  • [1] Shi Min Hu, Tao Chen, Kun Xu, Ming Ming Cheng, and Ralph R. Martin, “Internet visual media processing: a survey with graphics and vision applications,” Visual Computer, vol. 29, no. 5, pp. 393–405, 2013.
  • [2] Chenlei Guo and Liming Zhang, “A novel multiresolution spatiotemporal saliency detection model and its applications in image and video compression,” vol. 19, pp. 185 – 198, 02 2010.
  • [3] Ming Ming Cheng, Fang Lue Zhang, Niloy J Mitra, Xiaolei Huang, and Shi Min Hu, “Repfinder: finding approximately repeated scene elements for image editing,” ACM Transactions on Graphics, vol. 29, no. 4, pp. 1–8, 2010.
  • [4] Zhixiang Ren, Shenghua Gao, Liang Tien Chia, and Wai Hung Tsang, “Region-based saliency detection and its application in object recognition,” IEEE Transactions on Circuits & Systems for Video Technology, vol. 24, no. 5, pp. 769–779, 2014.
  • [5] Radhakrishna Achanta, Sheila Hemami, Francisco Estrada, and Sabine Susstrunk, “Frequency-tuned salient region detection,” in Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, 2009, pp. 1597–1604.
  • [6] Zheshen Wang and Baoxin Li, “A two-stage approach to saliency detection in images,” in IEEE International Conference on Acoustics, Speech and Signal Processing, 2008, pp. 965–968.
  • [7] Jingdong Wang, Huaizu Jiang, Zejian Yuan, Ming Ming Cheng, Xiaowei Hu, and Nanning Zheng, “Salient object detection: A discriminative regional feature integration approach,” vol. 123, no. 2, pp. 2083–2090, 2014.
  • [8] M. M. Cheng, N. J. Mitra, X. Huang, P. H. Torr, and S. M. Hu, “Global contrast based salient region detection,” IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 37, no. 3, pp. 569–582, 2015.
  • [9] Margolin Ran, Ayellet Tal, and Lihi Zelnikmanor, “What makes a patch distinct?,” vol. 9, no. 4, pp. 1139–1146, 2013.
  • [10] V Mazza, M Turatto, and C Umiltà, “Foreground-background segmentation and attention: a change blindness study,” Psychological Research, vol. 69, no. 3, pp. 201–210, 2005.
  • [11] Jianming Zhang and Stan Sclaroff, “Exploiting surroundedness for saliency detection: A boolean map approach,” IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 38, no. 5, pp. 889, 2016.
  • [12] X. Li, H. Lu, L. Zhang, X. Ruan, and M. H. Yang, “Saliency detection via dense and sparse reconstruction,” in 2013 IEEE International Conference on Computer Vision, Dec 2013, pp. 2976–2983.
  • [13] C. Yang, L. Zhang, H. Lu, X. Ruan, and M. H. Yang, “Saliency detection via graph-based manifold ranking,” in 2013 IEEE Conference on Computer Vision and Pattern Recognition, June 2013, pp. 3166–3173.
  • [14] Radhakrishna Achanta, Appu Shaji, Kevin Smith, Aurelien Lucchi, Pascal Fua, and Sabine Susstrunk, “Slic superpixels compared to state-of-the-art superpixel methods,” IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 34, no. 11, pp. 2274, 2012.
  • [15] Zhou, Dengyong, Weston, Jason, Gretton, Arthur, Bousquet, Olivier, Schölkopf, and Bernhard, “Ranking on data manifolds,” Advances in Neural Information Processing Systems, pp. 169–176, 2003.
  • [16] Chuan Yang, Lihe Zhang, Huchuan Lu, Ruan Xiang, and Ming Hsuan Yang, “Saliency detection via graph-based manifold ranking,” in IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 3166–3173.
  • [17] Wangjiang Zhu, Shuang Liang, Yichen Wei, and Jian Sun, “Saliency optimization from robust background detection,” in Computer Vision and Pattern Recognition, 2014, pp. 2814–2821.
  • [18] Keren Fu, Chen Gong, Irene Y. H. Gu, and Jie Yang, “Geodesic saliency propagation for image salient region detection,” in IEEE International Conference on Image Processing, 2014, pp. 3278–3282.
  • [19] Stas Goferman, Lihi Zelnikmanor, and Ayellet Tal, “Context-aware saliency detection,” IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 34, no. 10, pp. 1915, 2012.
  • [20] Jonathan Harel, Christof Koch, and Pietro Perona, “Graph-based visual saliency,” Advances in Neural Information Processing Systems, vol. 19, pp. 545–552, 2007.
  • [21] L. Itti, C. Koch, and E. Niebur, “A model of saliency-based visual attention for rapid scene analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 11, pp. 1254–1259, Nov 1998.
  • [22] Fang Huang, Jinqing Qi, Huchuan Lu, Lihe Zhang, and Ruan Xiang, “Salient object detection via multiple instance learning,” IEEE Transactions on Image Processing, vol. 26, no. 4, pp. 1911–1922, 2017.
  • [23] Tong Zhao, Lin Li, Xinghao Ding, Yue Huang, and Delu Zeng, “Saliency detection with spaces of background-based distribution,” IEEE Signal Processing Letters, vol. 23, no. 5, pp. 683–687, 2016.