A Robust Visual System for Small Target Motion Detection Against Cluttered Moving Backgrounds


Hongxin Wang, Jigen Peng, Xuqiang Zheng and Shigang Yue. Manuscript received August 21, 2018; revised January 14, 2019; accepted April 5, 2019. This work was supported in part by EU HORIZON 2020 Project STEP2DYNA under Grant 691154, in part by EU HORIZON 2020 Project ULTRACEPT under Grant 778062, and in part by the National Natural Science Foundation of China under Grant 11771347. (Hongxin Wang and Jigen Peng contributed equally to this work.) (Corresponding author: Shigang Yue.) H. Wang and S. Yue are with the Machine Life and Intelligence Research Center, Guangzhou University, Guangzhou 510006, China, and also with the Computational Intelligence Lab, School of Computer Science, University of Lincoln, Lincoln LN6 7TS, U.K. (email: syue@lincoln.ac.uk). J. Peng is with the School of Mathematics and Information Science, Guangzhou University, Guangzhou 510006, China (email: jgpeng@gzhu.edu.cn). X. Zheng is with the Institute of Microelectronics of the Chinese Academy of Sciences, Beijing 100029, China. Color versions of one or more figures in this paper are available at https://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TNNLS.2019.2910418
Abstract

Monitoring small objects against cluttered moving backgrounds is a huge challenge for future robotic vision systems. As a source of inspiration, insects are quite adept at searching for mates and tracking prey, which always appear as small dim speckles in the visual field. The exquisite sensitivity of insects to small target motion, as revealed recently, comes from a class of specific neurons called small target motion detectors (STMDs). Although a few STMD-based models have been proposed, these existing models use only motion information for small target detection and cannot discriminate small targets from small-target-like background features (termed fake features). To address this problem, this paper proposes a novel visual system model (STMD+) for small target motion detection, which is composed of four subsystems: ommatidia, motion pathway, contrast pathway and mushroom body. Compared to existing STMD-based models, the additional contrast pathway extracts directional contrast from luminance signals to eliminate false positives caused by background motion. The directional contrast and the motion information extracted by the motion pathway are integrated in the mushroom body for small target discrimination. Extensive experiments showed significant and consistent improvements of the proposed visual system model over existing STMD-based models against fake features.

Index Terms—Visual system model, neural modeling, small target motion detector (STMD), cluttered natural environment, background motion.

I. Introduction

The dynamic visual world is often complex, with many motion cues at different speeds, directions, distances and orientations, exhibiting various physical characteristics such as size, colour, texture and shape [1, 2, 3, 4, 5, 6]. Being able to detect target motion early and at a distance puts an entity (a robot or an animal) in a good position to prepare for interaction or competition; consider, for example, a flying insect searching for mates in the distance. In the visual world, detecting motion early and at a distance often means dealing with targets that occupy only one or a few pixels, with few other discernible physical characteristics. Small target motion detection has a wide variety of applications in defense, surveillance, security and road safety. However, detecting small targets against cluttered moving backgrounds remains a challenge for artificial visual systems, owing to the limited physical cues of small targets, the free motion of the camera, and extremely cluttered backgrounds.

How can small target motion be detected robustly in cluttered moving backgrounds with limited resources? Research on insects' visual systems has revealed one effective solution. Insects show exquisite sensitivity to small target motion [7] and are able to pursue small flying targets with high capture rates [8]. Biological research demonstrates that a class of specific neurons, called small target motion detectors (STMDs), accounts for insects' exquisite sensitivity to small target motion [9, 10, 7]. These STMD neurons give peak responses to small targets subtending only a tiny angle of the visual field, with no response to large bars or to background movements represented by wide-field grating stimuli [11]. Building a quantitative STMD model is the first step not only toward further understanding the biological visual system but also toward providing robust and economical small-target-detection solutions for artificial vision systems.

The electrophysiological knowledge about the STMD neurons revealed in the past few decades makes it possible to propose quantitative models, such as the elementary small target motion detector (ESTMD) [12] and the directionally selective small target motion detector (DSTMD) [13]. Using motion information (luminance changes of a pixel with respect to time; mathematically, the temporal derivative at a pixel) extracted by large monopolar cells (LMCs) [14, 15], these models are able to detect small moving targets in cluttered backgrounds. However, they cannot discriminate small moving targets from small-target-like background features (as shown in Fig. 1), which means that their detection results may contain a large number of false positives. This is because (1) small-target-like background features are embedded in cluttered backgrounds such as bushes, trees and/or rocks, and (2) they move with the whole background owing to the free motion of the animal/camera. In this case, these small-target-like features (termed fake features) cannot simply be filtered out by existing STMD-based models, which rely on motion information alone. To address this problem, other visual information, such as directional contrast (luminance changes of a pixel along different spatial directions; mathematically, the directional derivatives at a pixel), should be combined with motion information to distinguish small targets from fake features.

In insects' visual systems, multiple visual cues are extracted by different specialized neural circuits [16, 17, 18], and multiple neural circuits could be coordinated to discriminate small target motion. For example, in the lamina layer, large monopolar cells (LMCs) [14, 15] have been described as temporal band-pass filters which extract motion information from luminance signals [12, 13, 19], while amacrine cells (AMCs) [20, 21, 22], linked to multiple adjacent ommatidia by thin extending fibers, may constitute a contrast pathway with their downstream neurons to extract directional contrast from luminance signals. Although the contribution of the AMCs to STMD neural circuits in insects is unknown, it is clear that with directional contrast and motion information together, an artificial vision system could robustly discriminate small moving targets from fake features.

Inspired by the above biological findings, this paper proposes a new visual system model (STMD+) to detect small target motion in cluttered moving backgrounds. The main contribution of this work is combining motion information with directional contrast to successfully discriminate small targets from fake features. The rest of this paper is organized as follows. Section II reviews related work on small target motion detection. In Section III, we introduce our proposed visual system model. Section IV provides extensive performance evaluation as well as comparisons against the existing models. Finally, we conclude this paper in Section V.

Fig. 1: A small target is moving in the cluttered natural background which contains a number of small-target-like features (or called fake features). The small target and fake features all appear as small dim speckles whose sizes vary from one pixel to a few pixels, since they are far away from the animal/camera.

II. Related Work

Small target motion detection aims to detect objects of interest that move against cluttered natural environments and appear in images as small dim speckles (their sizes vary from one pixel to a few pixels, and other physical characteristics, such as color, shape and texture, are difficult to recognize and cannot be used for motion detection). Inspired by insects' motion-sensitive neurons, several models have been developed to detect small target motion. In this section, we first review motion-sensitive neural models, then briefly discuss traditional motion detection and small target detection approaches.

II-A. Motion-Sensitive Neural Models

Small target motion detectors (STMDs) [9, 10, 11] and lobula plate tangential cells (LPTCs) [23, 24] are widely investigated motion-sensitive neurons, where the former shows exquisite sensitivity to small target motion while the latter responds strongly to wide-field motion.

Wiederman et al. [12] presented a mathematical model called ESTMD to simulate STMD neurons. It can detect the presence of small moving targets, but is unable to estimate motion direction. To address this issue, directional selectivity has been introduced into the ESTMD [25, 19, 13]. However, these models cannot discriminate small targets from fake features, as they only make use of motion information.

The first LPTC model, called the elementary motion detector (EMD) [26], was originally inferred from insect behavior. Following that, several studies have further improved the EMD [27, 28, 29]. These models can detect the motion of all objects, but they are unable to distinguish small moving objects from large ones.

II-B. Traditional Motion Detection Methods

Traditional motion detection methods, such as optical flow [30], background subtraction [31] and temporal differencing [32], have been developed to detect normal-sized objects like pedestrians and vehicles. They utilize physical characteristics, including shape, color and texture, to segment the regions corresponding to moving objects from the background. Nonetheless, these methods are powerless for objects that are as small as one or a few pixels, because objects' physical characteristics are difficult to identify at such small sizes. Additionally, the above-mentioned methods may not work for cluttered moving backgrounds, as small moving objects can be submerged in the residual error of background motion compensation [33].

II-C. Infrared Small Target Detection

Previous research on and applications of small target detection have mainly focused on infrared images [34, 35, 36]. These infrared-based methods strongly rely on a significant temperature difference between the background and the objects of interest, such as rockets, jets and missiles. However, such significant temperature differences are rare in the natural world. Moreover, the detection environments of these methods were mainly sky and/or ocean, which are much clearer and more homogeneous than cluttered natural environments. These infrared-based methods may not work in a natural environment full of bushes, trees, sunlight and shadows, let alone meet the needs of compact size and low energy consumption in real applications [37, 38, 39, 40, 41].

III. Formulation of the System

Fig. 2: (a) Schematic illustration of the proposed visual system model (STMD+). (b) Image processing of the proposed visual system model. (c) Directional contrast on two motion traces, caused by the small target and a fake feature, respectively. Directional contrast is denoted by arrows along different directions, where an arrow's length represents the strength of the directional contrast. For the small target (top), the directional contrast varies significantly with time. However, for the fake feature (bottom), the directional contrast shows little change over time. (d) Directional contrast along one direction for the small target (top) and the fake feature (bottom) with respect to time.

In this section, we first illustrate the proposed visual system model schematically, then elaborate on its components in following subsections.

The proposed visual system model is composed of four subsystems, including ommatidia, motion pathway, contrast pathway and mushroom body [42, 43], as illustrated in Fig. 2(a). The luminance signals are received and smoothed by the ommatidia, then applied to the motion and contrast pathways. These two pathways separately extract motion information and directional contrast which are finally integrated in the mushroom body to discriminate small targets from fake features.

Fig. 2(b) shows the image processing of the proposed visual system model, where the input image sequence is processed frame by frame. In each frame, both small targets and fake features are located by computing luminance changes of each pixel over time, while directional contrast is obtained by calculating luminance changes of each pixel along different directions. The detected positions and directional contrast are further processed as follows.

  1. Successively record the detected positions to infer motion traces.

  2. Extract the directional contrast on each motion trace.

  3. Compute the standard deviation of directional contrast on each motion trace and compare it with a threshold for distinguishing small targets from fake features.
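The three steps above can be sketched in code. The snippet below is a minimal illustration of step 3: classifying a recorded trace from the standard deviation of its directional contrast. The threshold value and the use of the maximum over directions are illustrative assumptions, not the paper's tuned settings.

```python
import numpy as np

def classify_trace(contrast_series, std_threshold=0.1):
    """Classify a recorded motion trace as 'target' or 'fake feature'.

    contrast_series: array of shape (T, n_dirs) holding the directional
    contrast sampled at each point of the trace (step 2 above).
    A trace whose contrast varies little over time (small standard
    deviation along every direction) is treated as a fake feature.
    """
    stds = np.std(np.asarray(contrast_series, dtype=float), axis=0)
    return "target" if np.max(stds) > std_threshold else "fake feature"
```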

Fig. 5: Wiring sketches of the motion and contrast pathways. In the subplots, each colored node denotes a neuron. For clarity, only one STMD and one T1 neuron are presented. (a) Motion pathway. (b) Contrast pathway. Note that each AMC collects signals from multiple ommatidia while each LMC receives signals from a single ommatidium.
Fig. 6: Schematic illustration of the models of the motion and contrast pathways. For clarity, only one STMD and one T1 neuron are presented; however, in the proposed visual system all of these types of neurons are arranged in matrix form.

Our motivation is based mainly on the following observations: the directional contrast of a small target varies significantly with time, since the target moves relative to the background; on the contrary, the directional contrast of a fake feature shows little change over time, as it is static relative to the background. The amount of variation in the directional contrast over time is represented by the standard deviation, which is taken as the criterion for small target discrimination. Fig. 2(c) visually displays the directional contrast on two typical motion traces, caused by the small target and a fake feature, respectively. As an example, Fig. 2(d) presents the directional contrast along one direction, which is used to calculate the standard deviation for that direction.

III-A. Ommatidia

Ommatidia act as luminance receptors to perceive visual stimuli from the natural world [44]. In the proposed visual system, they are arranged in a matrix and modelled as spatial Gaussian filters, each of which captures and smooths the luminance of each pixel in the input image. Formally, let I(x, y, t) denote the input image sequence, where (x, y) and t are the spatial and temporal positions. The output of an ommatidium, P(x, y, t), is given by,

$P(x,y,t) = \iint I(u,v,t)\, G_{\sigma_1}(x-u,\, y-v)\, \mathrm{d}u\, \mathrm{d}v$  (1)

where $G_{\sigma_1}(x,y)$ is a Gaussian function, defined as

$G_{\sigma_1}(x,y) = \frac{1}{2\pi\sigma_1^2} \exp\!\left(-\frac{x^2+y^2}{2\sigma_1^2}\right)$  (2)
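For illustration, the Gaussian smoothing of Eqs. (1)-(2) can be sketched in Python/NumPy as follows; the kernel radius of 3σ and the discrete normalisation are implementation choices, not taken from the paper.

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    # Discrete 2-D Gaussian G_sigma sampled on a (2r+1) x (2r+1) grid.
    if radius is None:
        radius = int(3 * sigma)
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx**2 + yy**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    return g / g.sum()  # normalise so smoothing preserves mean luminance

def ommatidia_output(frame, sigma=1.0):
    # Spatial Gaussian blur of one input frame, as in Eq. (1): P = I * G_sigma.
    k = gaussian_kernel(sigma)
    r = k.shape[0] // 2
    padded = np.pad(frame, r, mode="edge")
    out = np.zeros_like(frame, dtype=float)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            out += k[dy + r, dx + r] * padded[
                r + dy : r + dy + frame.shape[0],
                r + dx : r + dx + frame.shape[1]]
    return out
```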

III-B. Motion Pathway

As shown in Fig. 5(a), the motion pathway consists of large monopolar cells (LMCs) [14, 15], medulla neurons (i.e., Mi1, Tm1, Tm2 and Tm3) [45, 46], and small target motion detectors (STMDs) [9, 10, 11]. The output of the ommatidia is first fed into the LMCs, then processed by the medulla neurons and finally integrated by the STMDs. Fig. 6(a) displays the model of the motion pathway, which is elaborated as follows.

1) Large Monopolar Cells (LMCs): Objects’ motion can induce luminance changes of pixels with time. These luminance changes are extracted by the LMCs, each of which is modelled by a temporal band-pass filter that is defined as the difference of two Gamma kernels (see Fig. 6(a)). That is,

$H(t) = \Gamma_{n_1,\tau_1}(t) - \Gamma_{n_2,\tau_2}(t)$  (3)
$\Gamma_{n,\tau}(t) = \frac{(nt)^n \exp(-nt/\tau)}{(n-1)!\,\tau^{n+1}}$  (4)

where $H(t)$ denotes the impulse response of the band-pass filter, $\Gamma_{n,\tau}(t)$ stands for the Gamma kernel [47], and $n$ and $\tau$ refer to the order and time constant of the Gamma kernel. Then the output of each LMC, $L(x,y,t)$, can be calculated by convolving $H(t)$ with the output of the ommatidia $P(x,y,t)$,

$L(x,y,t) = \int P(x,y,s)\, H(t-s)\, \mathrm{d}s$  (5)

The output $L(x,y,t)$ reflects the luminance change of pixel $(x,y)$ over time $t$, where a positive $L(x,y,t)$ means a luminance increase while a negative one indicates a luminance decrease.
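A minimal NumPy sketch of the LMC band-pass filter of Eqs. (3)-(5) follows. The orders and time constants used here (n1 = 2, τ1 = 3, n2 = 6, τ2 = 9 samples) are placeholder values for illustration, not the tuned parameters of Table I.

```python
import numpy as np
from math import factorial

def gamma_kernel(n, tau, length):
    # Discretised Gamma kernel of Eq. (4), sampled at t = 0, 1, ..., length-1.
    t = np.arange(length, dtype=float)
    g = (n * t) ** n * np.exp(-n * t / tau) / (factorial(n - 1) * tau ** (n + 1))
    return g / g.sum()  # normalise the discrete kernel

def lmc_output(signal, n1=2, tau1=3.0, n2=6, tau2=9.0, length=30):
    # Band-pass impulse response H = Gamma_{n1,tau1} - Gamma_{n2,tau2}, Eq. (3),
    # applied causally along time: L(t) = sum_s P(t-s) H(s), Eq. (5).
    h = gamma_kernel(n1, tau1, length) - gamma_kernel(n2, tau2, length)
    sig = np.asarray(signal, dtype=float)
    padded = np.concatenate([np.full(length - 1, sig[0]), sig])
    return np.array([padded[i : i + length][::-1] @ h
                     for i in range(len(sig))])
```

Applied to a step change in luminance, this filter produces a positive transient at the step and then decays back to zero, matching the described LMC behaviour.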

2) Medulla Neurons: Medulla neurons, including Tm1, Tm2, Tm3 and Mi1, constitute four parallel channels to process the output of the LMCs $L(x,y,t)$. The Tm3 and Tm2 are modelled as half-wave rectifiers that separate $L(x,y,t)$ into its luminance-increase and luminance-decrease components. Let $S^{Tm3}(x,y,t)$ and $S^{Tm2}(x,y,t)$ denote the outputs of the Tm3 and Tm2, respectively; then they are given by

$S^{Tm3}(x,y,t) = [L(x,y,t)]^{+}$  (6)
$S^{Tm2}(x,y,t) = [-L(x,y,t)]^{+}$  (7)

where $[x]^{+}$ denotes $\max(x, 0)$. The Mi1 and Tm1 further temporally delay $S^{Tm3}(x,y,t)$ and $S^{Tm2}(x,y,t)$ by convolving them with a Gamma kernel. That is,

$S^{Mi1}(x,y,t) = \int S^{Tm3}(x,y,s)\, \Gamma_{n_3,\tau_3}(t-s)\, \mathrm{d}s$  (8)
$S^{Tm1}(x,y,t) = \int S^{Tm2}(x,y,s)\, \Gamma_{n_3,\tau_3}(t-s)\, \mathrm{d}s$  (9)

where $S^{Mi1}(x,y,t)$ and $S^{Tm1}(x,y,t)$ represent the outputs of the Mi1 and Tm1, respectively; $n_3$ and $\tau_3$ are the order and time constant of the Gamma kernel, which separately determine the order and time-delay length of the time delay unit (TDU) (see Fig. 6(a)).
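The rectification and delay stages of Eqs. (6)-(9) can be sketched as below. The delay kernel is passed in explicitly, so any discretised Gamma kernel can be supplied; the test-style usage with an ideal unit-impulse delay is purely illustrative.

```python
import numpy as np

def medulla_outputs(L, delay_kernel):
    # Tm3/Tm2: half-wave rectification of the LMC signal, Eqs. (6)-(7).
    tm3 = np.maximum(L, 0.0)   # luminance-increase component
    tm2 = np.maximum(-L, 0.0)  # luminance-decrease component

    # Mi1/Tm1: causal temporal delay by convolution with a (normalised)
    # 1-D kernel, Eqs. (8)-(9).
    def delay(sig):
        k = len(delay_kernel)
        padded = np.concatenate([np.zeros(k - 1), sig])
        return np.array([padded[i : i + k][::-1] @ delay_kernel
                         for i in range(len(sig))])

    return tm3, tm2, delay(tm3), delay(tm2)
```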

3) Small Target Motion Detectors (STMDs): As can be seen from Fig. 6(a), each STMD collects the outputs of medulla neurons located at two pixels, i.e., $(x, y)$ and $(x', y')$, which are defined as

$(x', y') = (x + \alpha_1 \cos\theta,\; y + \alpha_1 \sin\theta)$  (10)

where $\alpha_1$ is a constant and $\theta$ denotes the preferred direction of the STMD. When a dim object successively moves over pixels $(x,y)$ and $(x',y')$, a luminance decrease followed by a luminance increase will appear at each of these two pixels. These luminance-increase and luminance-decrease signals are first aligned in the time domain and then multiplied together so as to produce a large response [13]. That is,

$D(x,y,t;\theta) = S^{Tm3}(x,y,t)\,\big\{S^{Tm1}(x,y,t) + S^{Mi1}(x',y',t)\big\}\, S^{Tm1}(x',y',t)$  (11)

where $D(x,y,t;\theta)$ denotes the output of the STMD neuron with a preferred direction $\theta$. Here, $\theta$ belongs to $\{0, \pi/4, \pi/2, \ldots, 7\pi/4\}$, corresponding to the eight preferred directions of STMD neurons (see Fig. 9). It is worth noting that the time constants of the Gamma kernels are determined by the different delays among the luminance changes, while their orders are accordingly tuned to guarantee appropriate Gamma kernel shapes [13].

So far, the obtained $D(x,y,t;\theta)$ can detect both small and large moving objects by producing a large response. In order to suppress the responses to large moving objects, $D(x,y,t;\theta)$ is further laterally inhibited by convolving it with an inhibition kernel $W_s(x,y)$. That is,

$F(x,y,t;\theta) = \iint D(u,v,t;\theta)\, W_s(x-u,\, y-v)\, \mathrm{d}u\, \mathrm{d}v$  (12)

where $F(x,y,t;\theta)$ represents the inhibited signal; the inhibition kernel $W_s(x,y)$ is defined as

$W_s(x,y) = A\,[g(x,y)]^{+} + B\,[g(x,y)]^{-}$  (13)
$g(x,y) = G_{\sigma_2}(x,y) - e \cdot G_{\sigma_3}(x,y) - \rho$  (14)

where $[x]^{+}$ and $[x]^{-}$ respectively denote $\max(x,0)$ and $\min(x,0)$; $A$, $B$, $e$ and $\rho$ are constants.

By comparing $F(x,y,t;\theta)$ with a detection threshold $\beta$, we can find the positions of small moving objects. Specifically, if $F(x,y,t;\theta) > \beta$, then we believe that a small object moving along direction $\theta$ is located at pixel $(x,y)$ at time $t$. However, this cannot distinguish small targets from fake features, since both are recognized as small moving objects. To address this issue, we construct a contrast pathway accounting for directional contrast calculation.
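The surround inhibition kernel of Eqs. (13)-(14) can be sketched as a weighted difference of Gaussians. All parameter values below (σ2, σ3, A, B, e, ρ, radius) are illustrative placeholders rather than the paper's tuned constants; the sketch only demonstrates that the kernel passes a compact response while suppressing wide-field ones.

```python
import numpy as np

def inhibition_kernel(sigma2=1.5, sigma3=3.0, A=1.0, B=3.0, e=1.0,
                      rho=0.0, radius=6):
    # Eqs. (13)-(14): g = G_sigma2 - e * G_sigma3 - rho,
    # Ws = A * [g]+ + B * [g]-  (centre excitation, surround inhibition).
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    G = lambda s: np.exp(-(xx**2 + yy**2) / (2 * s**2)) / (2 * np.pi * s**2)
    g = G(sigma2) - e * G(sigma3) - rho
    return A * np.maximum(g, 0.0) + B * np.minimum(g, 0.0)
```

Correlating this kernel with a one-pixel response yields a positive value, while the summed response at the centre of a wide uniform object is strongly suppressed by the weighted negative surround.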

Fig. 9: (a) Illustration of neurons which are located at the same position, but have different preferred directions. The black arrows denote preferred directions. (b) Illustration of different preferred directions in the x-y plane.

III-C. Contrast Pathway

Fig. 14: Illustration of the convolution kernel $K_{\phi}(x,y)$ for the four preferred directions $\phi$.

As shown in Fig. 5(b), the contrast pathway is composed of amacrine cells (AMCs) [20, 21, 22] and T1 neurons [48, 49]. The output of the ommatidia is first fed into the AMCs, then processed by the T1 neurons. Fig. 6(b) displays the model of the contrast pathway, which is elaborated as follows.

1) Amacrine Cells (AMCs): Each AMC receives the outputs of multiple ommatidia located in a small region and serves as a weighted summation unit, as presented in Fig. 6(b). Here, we define the weight function $W_c(x,y)$ as

$W_c(x,y) = \frac{1}{2\pi\sigma_4^2} \exp\!\left(-\frac{x^2+y^2}{2\sigma_4^2}\right)$  (15)

where $\sigma_4$ is a constant. Then the output of each AMC, $A(x,y,t)$, can be given by

$A(x,y,t) = \iint P(u,v,t)\, W_c(x-u,\, y-v)\, \mathrm{d}u\, \mathrm{d}v$  (16)

where $P(x,y,t)$ is the output of the ommatidia defined in (1).

2) T1 Neurons: The T1 neuron layer is adopted to extract the directional contrast along different directions. The directional contrast at $(x,y)$ along direction $\phi$ is defined as the difference between the outputs of two AMCs located at $(x + \frac{\alpha_2}{2}\cos\phi,\; y + \frac{\alpha_2}{2}\sin\phi)$ and $(x - \frac{\alpha_2}{2}\cos\phi,\; y - \frac{\alpha_2}{2}\sin\phi)$. Here, $\alpha_2$ is a constant. Let $T(x,y,t;\phi)$ denote the output of a T1 neuron with a preferred direction $\phi$; then it can be given by

$T(x,y,t;\phi) = A(x + \tfrac{\alpha_2}{2}\cos\phi,\; y + \tfrac{\alpha_2}{2}\sin\phi,\; t) - A(x - \tfrac{\alpha_2}{2}\cos\phi,\; y - \tfrac{\alpha_2}{2}\sin\phi,\; t)$  (17)

Substituting (16) in (17), we have

$T(x,y,t;\phi) = \iint P(u,v,t)\, K_{\phi}(x-u,\, y-v)\, \mathrm{d}u\, \mathrm{d}v$  (18)

where the convolution kernel $K_{\phi}(x,y)$ represents

$K_{\phi}(x,y) = W_c(x + \tfrac{\alpha_2}{2}\cos\phi,\; y + \tfrac{\alpha_2}{2}\sin\phi) - W_c(x - \tfrac{\alpha_2}{2}\cos\phi,\; y - \tfrac{\alpha_2}{2}\sin\phi)$  (19)

Here $\phi$ belongs to $\{0, \pi/4, \pi/2, 3\pi/4\}$, corresponding to the four preferred directions of T1 neurons. It is worth noting that the convolution kernel $K_{\phi}(x,y)$ is a directional derivative operator [50, 51], which can extract anisotropic luminance variations (see Fig. 14).
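A NumPy sketch of the AMC pooling and the T1 difference operation of Eqs. (16)-(17) follows; the pooling width σ4, the separation α2, and the rounding of the displacement to integer pixels are illustrative implementation choices.

```python
import numpy as np

def directional_contrast(P, phi, alpha2=2, sigma4=1.0):
    # AMC layer, Eq. (16): Gaussian-weighted pooling of the ommatidia output P.
    r = int(3 * sigma4)
    ax = np.arange(-r, r + 1)
    xx, yy = np.meshgrid(ax, ax)
    w = np.exp(-(xx**2 + yy**2) / (2 * sigma4**2))
    w /= w.sum()
    pad = np.pad(P, r, mode="edge")
    A = sum(w[dy + r, dx + r] *
            pad[r + dy : r + dy + P.shape[0], r + dx : r + dx + P.shape[1]]
            for dy in ax for dx in ax)

    # T1 layer, Eq. (17): difference of two AMC outputs displaced by
    # +/- alpha2/2 along the preferred direction phi (rounded to pixels).
    dx = int(round((alpha2 / 2) * np.cos(phi)))
    dy = int(round((alpha2 / 2) * np.sin(phi)))
    shifted = np.pad(A, ((abs(dy),) * 2, (abs(dx),) * 2), mode="edge")
    H, W = A.shape
    fwd = shifted[abs(dy) + dy : abs(dy) + dy + H,
                  abs(dx) + dx : abs(dx) + dx + W]
    bwd = shifted[abs(dy) - dy : abs(dy) - dy + H,
                  abs(dx) - dx : abs(dx) - dx + W]
    return fwd - bwd
```

On a vertical luminance edge, the kernel with phi = 0 responds strongly while the kernel with phi = pi/2 gives no response, reflecting the anisotropy the T1 layer is designed to extract.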

III-D. Mushroom Body

In the proposed visual system, the mushroom body [42, 43] receives two types of neural outputs: the output of the STMDs, $F(x,y,t;\theta)$, and the output of the T1 neurons, $T(x,y,t;\phi)$. These neural outputs are integrated to discriminate small targets from fake features via the following three procedures.

1) Motion Trace Recording: The output of the STMDs is employed to record the motion traces of small objects. For a detection threshold $\beta$ and a starting time $t_0$, if there exists a pixel A and a motion direction $\theta$ which satisfy $F(x,y,t_0;\theta) > \beta$, then we believe that a small object (the detected object could be a small target or a fake feature, which cannot be discriminated by the STMDs alone) is detected at pixel A and that its motion direction is $\theta$. Similarly, at the next time step $t_0 + 1$, another pixel B and its motion direction can be detected. In particular, if pixel B is the nearest detected point to pixel A, and pixel B is in the small neighborhood of pixel A, then we believe that pixels A and B belong to the same motion trace, denoted by $\mathcal{T}$. Repeating the above steps, the motion trace can be recorded over a time period, as shown in Fig. 15. The trace $\mathcal{T}$ can be described as,

$\mathcal{T} = \{(x_t, y_t, \theta_t) \mid t \in [t_0, t_1]\}$  (20)

where $x_t$ and $y_t$ represent the x and y coordinates at time $t$, $\theta_t$ denotes the motion direction, and $t_0$ and $t_1$ are the starting time and current time.

Fig. 15: Motion trace recording. Each node denotes a detected pixel while each circle represents a small neighborhood. If pixel B is the nearest detected point to pixel A, and pixel B is in the neighborhood of pixel A, then we believe that pixels A and B belong to the same motion trace. Repeating this step, a motion trace could be recorded.
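The nearest-neighbour trace recording described above can be sketched as follows; the greedy frame-by-frame matching and the neighbourhood radius are illustrative simplifications of the procedure in Fig. 15.

```python
def link_traces(detections, radius=5.0):
    """Greedy nearest-neighbour trace recording.

    detections: list over frames; each frame is a list of (x, y) positions
    returned by the STMD layer.  A detection in the next frame is appended
    to an existing trace when it is the nearest detection to the trace head
    and lies within `radius` (the "small neighborhood" of Fig. 15).
    """
    traces = [[p] for p in detections[0]]
    for frame in detections[1:]:
        unused = list(frame)
        for tr in traces:
            if not unused:
                break
            hx, hy = tr[-1]
            best = min(unused, key=lambda p: (p[0] - hx) ** 2 + (p[1] - hy) ** 2)
            if (best[0] - hx) ** 2 + (best[1] - hy) ** 2 <= radius ** 2:
                tr.append(best)
                unused.remove(best)
        # unmatched detections start new traces
        traces.extend([p] for p in unused)
    return traces
```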

2) Information Integration: Once motion traces are recorded, we can obtain their directional contrast by substituting (20) into $T(x,y,t;\phi)$. That is,

$C(t;\phi) = T(x_t, y_t, t;\, \phi), \quad t \in [t_0, t_1]$  (21)

where $C(t;\phi)$ denotes the directional contrast along direction $\phi$ on the motion trace $\mathcal{T}$, and $(x_t, y_t)$ stands for the point on the motion trace at time $t$. To quantify the amount of variation in the directional contrast, we calculate the standard deviation (SD) of $C(t;\phi)$ over the time period $[t_0, t_1]$, denoted by $\mathrm{SD}_{\phi}$; the number of samples in $[t_0, t_1]$ determines the sample size for this calculation.

3) Small Target Discrimination: We determine whether a detected object is a small target or a fake feature using the standard deviations of the directional contrast on the object's motion trace, i.e., $\mathrm{SD}_{\phi}$. If $\mathrm{SD}_{\phi}$ is smaller than a certain threshold, we believe that the detected object is a fake feature; otherwise, it is a small target.

III-E. Parameter Setting

Parameters of the proposed visual system model are listed in Table I, where the parameters of the motion pathway are determined by the analysis in [13] while those of the contrast pathway are tuned empirically. These parameters are chosen for functionality and are mainly determined by the velocity and size ranges of the moving targets. They are not changed in the following experiments unless otherwise stated.

The proposed visual system model is written in Matlab (The MathWorks, Inc., Natick, MA). The computer used in the experiments is a standard laptop with a GHz Intel Core i7 CPU and GB DDR3 memory. The source code can be found at https://github.com/wanghongxin/STMD-Plus.

TABLE I: Parameters of the proposed visual system model.

IV. Results and Discussions

The proposed visual system model is evaluated on a synthetic dataset [52] and a real dataset (STNS dataset) [19]. The synthetic dataset contains a number of image sequences synthesized using real background images and a computer-generated small target (a black block). These image sequences all display the motion of the small target against cluttered moving backgrounds and differ in target size, target velocity, background velocity, background type and so on. The sampling frequency is the same for all synthetic videos. The STNS dataset is a collection of real videos featuring various moving targets and environments. Its scenarios include many kinds of challenges, such as heavy clutter, camera motion and changes in overall brightness. The STNS dataset (videos and manual ground truth annotations) is available at https://figshare.com/articles/STNS_Dataset/4496768.

To quantitatively evaluate the detection performance, two metrics are defined as follows [34],

$D_R = \frac{\text{number of true detections}}{\text{number of actual targets}}$  (22)
$F_A = \frac{\text{number of false detections}}{\text{number of frames}}$  (23)

where $D_R$ and $F_A$ denote the detection rate and false alarm rate, respectively. A detected result is considered correct if the pixel distance between the ground truth and the result is within a threshold (measured in pixels).
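Under this common reading of the two metrics (true detections over actual targets, and false detections per frame), the evaluation can be sketched as below; the distance threshold value is an illustrative placeholder.

```python
def evaluate(detections, ground_truth, dist_threshold=5.0):
    """Detection rate D_R and false alarm rate F_A, Eqs. (22)-(23).

    detections / ground_truth: lists over frames of (x, y) positions.
    A detection counts as a true positive when it lies within
    `dist_threshold` pixels of a ground-truth target.
    """
    true_pos = false_pos = n_targets = 0
    for dets, gts in zip(detections, ground_truth):
        n_targets += len(gts)
        for d in dets:
            if any((d[0] - g[0]) ** 2 + (d[1] - g[1]) ** 2
                   <= dist_threshold ** 2 for g in gts):
                true_pos += 1
            else:
                false_pos += 1
    DR = true_pos / n_targets if n_targets else 0.0
    FA = false_pos / len(detections)  # false positives per frame
    return DR, FA
```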

IV-A. Signal Processing in the Motion Pathway

Fig. 16: Input frame. The small target (the black block) and the cluttered background are moving from left to right at the same velocity, where the arrow denotes the motion direction of the background. The tree, which is regarded as a large object, is also moving due to the background motion.
Fig. 17: In each subplot, the horizontal axis denotes the x coordinate while the vertical axis represents the neural output. (a) Input luminance signal. (b) Ommatidium output. (c) LMC output.
Fig. 18: In each subplot, the horizontal axis denotes the x coordinate while the vertical axis represents the neural output. (a) The four inputs of the STMDs for a fixed preferred direction $\theta$, i.e., $S^{Tm3}(x,y,t)$, $S^{Mi1}(x',y',t)$, $S^{Tm1}(x,y,t)$ and $S^{Tm1}(x',y',t)$. (b) STMD output for the same preferred direction.

To intuitively illustrate the signal processing in the motion pathway, we observe the output of each neural layer with respect to the x coordinate, with y and t held fixed. Fig. 16 shows the input frame, where the luminance signal on its middle line is presented in Fig. 17(a). The resulting ommatidium output and LMC output are displayed in Fig. 17(b) and (c), respectively. The ommatidium output is a smoothed version of the input signal. The LMC output reveals the luminance changes of the pixels, where positive values correspond to luminance increases and negative values to luminance decreases.

Fig. 18(a) demonstrates the four inputs of the STMDs for a fixed preferred direction $\theta$. Specifically, $S^{Tm3}(x,y,t)$ is the positive part of the LMC output; $S^{Mi1}(x',y',t)$ denotes the delayed version of the positive part of the LMC output with a spatial shift of $\alpha_1$ pixels; $S^{Tm1}(x,y,t)$ stands for the delayed version of the negative part of the LMC output; and $S^{Tm1}(x',y',t)$ represents the delayed version of the negative part of the LMC output with a spatial shift of $\alpha_1$ pixels. Fig. 18(b) further shows the output of the STMDs, where a high response appears at the position of the small target while the responses at other positions are effectively suppressed. This is because the four peaks located at the position of the small target are aligned (see Fig. 18(a)), which produces a strong response after the multiplication, summation and lateral inhibition in the STMD (see Fig. 6). At other positions, the peaks on the four curves exhibit a low probability of alignment and hence produce weak responses. Note that the lateral inhibition is introduced to suppress the responses to large objects, such as the tree displayed in Fig. 16.

Note that the above analysis presets the preferred direction $\theta$; changing the preferred direction yields different STMD outputs. Fig. 19 presents the STMD outputs along the eight preferred directions at the position of the small target and at the position of the large tree. As shown in Fig. 19(a), for the small target, the STMD shows strong directional selectivity: as the preferred direction deviates from the motion direction of the small target, the STMD output decreases correspondingly. Moreover, the motion direction of the small target can be estimated by computing the summation of these output vectors [13]. For the large tree (see Fig. 19(b)), the outputs of the STMD along the eight preferred directions are all very low, suggesting that the STMD is not responsive to large moving objects.

Fig. 19: In the polar coordinate system, the angular coordinate represents the preferred direction while the radial coordinate denotes the STMD output. (a) STMD outputs at the position of the small target along the eight preferred directions. The blue arrow stands for the motion direction of the small target. (b) STMD outputs at the position of the large tree along the eight preferred directions.

IV-B. Characteristics of the STMD

Fig. 20: External rectangle and neighboring background rectangle of an object. The arrow denotes the motion direction of the object; width and height denote the object's extent parallel and orthogonal to the motion direction, respectively.
Fig. 25: STMD outputs to moving objects with different Weber contrast, velocities, widths and heights. (a) Different Weber contrast. (b) Different velocities. (c) Different widths. (d) Different heights.

To further demonstrate the characteristics of the STMD, we compare its outputs to objects with different velocities, widths, heights and Weber contrasts. As shown in Fig. 20, the width (or height) of an object is its length extended parallel (or orthogonal) to the motion direction. Weber contrast is defined by the following equation,

$C_w = \frac{\mu_t - \mu_b}{\mu_t + \mu_b}$  (24)

where $\mu_t$ is the average pixel value of the object, while $\mu_b$ is the average pixel value in the neighboring area around the object. If the size of an object is $w \times h$, the size of its background rectangle is $(w + 2d) \times (h + 2d)$, where $d$ is a constant number of pixels. The initial Weber contrast, velocity, width and height of the object are fixed, and each is then varied in turn.
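A sketch of the Weber contrast computation of Eq. (24) follows, with the object's own pixels excluded from the background average; the margin width standing in for the constant $d$ is an illustrative placeholder.

```python
import numpy as np

def weber_contrast(img, bbox, margin=5):
    """Weber contrast of an object, Eq. (24): (mu_t - mu_b) / (mu_t + mu_b).

    bbox = (x, y, w, h): the object's external rectangle.  The neighboring
    background rectangle extends the bbox by `margin` pixels on each side
    (clipped at the image border).
    """
    x, y, w, h = bbox
    obj = img[y : y + h, x : x + w].astype(float)
    y0, y1 = max(y - margin, 0), min(y + h + margin, img.shape[0])
    x0, x1 = max(x - margin, 0), min(x + w + margin, img.shape[1])
    surround = img[y0:y1, x0:x1].astype(float)
    mu_t = obj.mean()
    # exclude the object itself from the background average
    mu_b = (surround.sum() - obj.sum()) / (surround.size - obj.size)
    return (mu_t - mu_b) / (mu_t + mu_b)
```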

Fig. 25(a) shows the STMD output with respect to Weber contrast. As can be seen, the STMD output increases with Weber contrast until it reaches its maximum. This indicates that the higher the Weber contrast of an object, the more easily it can be detected. Fig. 25(b) presents the STMD output with regard to the velocity of the moving object. The STMD output clearly peaks at an optimal velocity, and the STMD also exhibits high responses to objects over a range of velocities around this optimum. Fig. 25(c) and (d) display the output of the STMD when the width and height of the object are varied, indicating that the STMD prefers moving objects whose widths and heights are no more than a few pixels.

These characteristics of the STMD, revealed in Fig. 25(a)-(d), are called Weber contrast sensitivity, velocity selectivity, width selectivity and height selectivity, respectively, and have already been observed in STMD neurons in biological research [9, 10, 7].

IV-C. Effectiveness of the Contrast Pathway

Fig. 28: (a) Representative frame of the input image sequence. A small target (the small black block), highlighted by the circle, is moving against the cluttered background. The background, which contains a number of fake features, is also moving from left to right, where the arrow denotes the background motion direction. (b) Motion trace of the small target over the recording period, i.e., the ground truth. In this subplot, color represents the motion direction of the small target.
Fig. 37: (a)-(d) Motion traces detected by the STMD+ without the contrast pathway under four different detection thresholds. (e)-(h) Motion traces detected by the STMD+ with the contrast pathway under the same four detection thresholds.
Fig. 41: Motion traces detected by the ESTMD, DSTMD, and STMD+. For a fair comparison, the three models have fixed detection rates (). (a) ESTMD. (b) DSTMD. (c) STMD+. Since the ESTMD cannot detect motion direction, its outputs are all shown in black.

In the proposed visual system model, we design a contrast pathway and incorporate it with the motion pathway to discriminate small targets from fake features. To validate its effectiveness, we first compare the performance of the STMD+ with and without the contrast pathway. We then conduct a performance comparison between the developed STMD+ and two baseline models, the ESTMD [12] and the DSTMD [13]. The testing setup is as follows: the input image sequence is presented in Fig. 28(a), which displays a small target moving against the cluttered background; the background is moving from left to right at a velocity of pixel/s; the luminance, size, and velocity of the small target are equal to , pixels, and pixel/s, respectively; the motion trace of the small target during the time period ms is illustrated in Fig. 28(b).

Fig. 37(a)-(d) displays the motion traces detected by the STMD+ without the contrast pathway under different detection thresholds . As can be seen, these detection results all contain numerous fake features. As the detection threshold increases, fewer fake features are detected, but the detected motion trace becomes more incomplete. After applying the contrast pathway, the fake features are all filtered out, even under different detection thresholds (see Fig. 37(e)-(h)). The specific detection rates () and false alarm rates () are presented in Table II.

Fig. 41 shows the motion traces detected by the ESTMD, DSTMD, and STMD+, where the detection rates () of the three models are all set to for a fair comparison. As can be seen, the detection results of the ESTMD and DSTMD are seriously contaminated by a number of fake features, whereas that of the STMD+ is noiseless.

Fig. 44: (a) Directional contrast on the motion trace caused by the small target. (b) Directional contrast on the motion trace caused by the fake feature. In each subplot, the directional contrast along four directions is presented.
Fig. 47: Standard deviations under different sample numbers. (a) Standard deviations of the small target. (b) Standard deviations of the fake feature.
Threshold Without* With#
  • * The STMD+ without the contrast pathway.
  • # The STMD+ with the contrast pathway.
TABLE II: Detection rate () and false alarm rate () of the STMD+ with and without the contrast pathway under different detection thresholds .
Standard deviation
Small target
Fake feature
TABLE III: Standard deviations of the directional contrast.

To reveal the role of the contrast pathway, we analyze the directional contrast on two motion traces chosen from Fig. 37(a): one is the small-target motion trace, and the other is a randomly selected fake-feature trace. Fig. 44 presents the directional contrast on these two motion traces. Note that each motion trace has directional contrast along four directions. As shown in Fig. 44(a), the directional contrast on the motion trace caused by the small target displays significant changes over time. In contrast, the directional contrast of the fake-feature trace remains almost unchanged over time (see Fig. 44(b)). The calculated standard deviations of the directional contrast on these two motion traces are listed in Table III, where the sample number is equal to . Obviously, the standard deviations of the small target are much larger than those of the fake feature, suggesting that small targets can be discriminated from fake features by comparing their standard deviations.
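The discrimination rule described above can be sketched as follows. This is a hypothetical simplification in Python: an object is taken to be a small target when the standard deviation of its directional contrast along any direction exceeds a threshold, and the threshold value in the usage example is an assumption, not a value from the paper.

```python
import numpy as np

def is_small_target(contrast_trace, threshold):
    """contrast_trace: array of shape (T, 4), the directional contrast
    sampled at T points along a motion trace in four directions.
    Returns True when the largest per-direction standard deviation
    exceeds the threshold (a simplified decision rule)."""
    stds = contrast_trace.std(axis=0)
    return bool(stds.max() > threshold)
```

A small target sweeping across varying background luminance yields fluctuating directional contrast (large standard deviations), while a fake feature carried along with the background keeps nearly constant contrast (small standard deviations), so a simple threshold separates the two.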

We further study the relationship between the standard deviations and the sample number (see Fig. 47). As shown, the standard deviations of the small target exhibit a sharp rise when the sample number increases from to . As the sample number continues to grow, the standard deviations tend to stabilize. Similarly, the standard deviations of the fake feature become stable as the sample number increases. The above results indicate that a certain number of samples, at least greater than , is needed to obtain stable standard deviations.

IV-D Comparison on Synthetic and Real Datasets

Parameter Initial sequence Group 1 Group 2 Group 3 Group 4 Group 5 Group 6
Target velocity (pixel/s) 250 250 250 250 250 250
Target size ()
Target luminance
Background velocity (pixel/s)
Background motion direction rightward rightward rightward rightward rightward leftward rightward
Background image Fig. 28(a) Fig. 28(a) Fig. 28(a) Fig. 28(a) Fig. 28(a) Fig. 28(a) Fig. 58(a)-(c)
TABLE IV: Details of the initial image sequence and six groups of image sequences. Compared with the initial image sequence, Groups 1 to 6 are composed of image sequences with different parameters.
Fig. 54: (a) Receiver operating characteristic (ROC) curves of the three models for the initial image sequence. (b)-(f) Detection rates of the three models for Groups 1-5. For a fair comparison, the three models have a fixed false alarm rate (). (b) Group 1, different target velocities. (c) Group 2, different target sizes. (d) Group 3, different target luminances. (e) Group 4, different background velocities (rightward motion). (f) Group 5, different background velocities (leftward motion).

In this section, six groups of synthetic image sequences are first utilized to evaluate the performance of the proposed model under different target velocities, target sizes, target luminances, background velocities, background motion directions, and background images. The details of the synthetic image sequences are listed in Table IV. The proposed model is then further tested on a real dataset (the STNS dataset [19]). A performance comparison between the proposed STMD+ and the two baseline models (the ESTMD and DSTMD) is also conducted.

Fig. 54(a) shows the receiver operating characteristic (ROC) curves of the three models for the initial synthetic image sequence. It can be seen that the STMD+ performs better than the baseline models. More precisely, the STMD+ has higher detection rates () than the baseline models while the false alarm rates () are low. Fig. 54(b)-(f) display the detection rates of the three models for Groups 1 to 5, where the false alarm rates of the three models are all set to for a fair comparison. From Fig. 54(b) and (c), we can see that the STMD+ significantly outperforms the baseline models, achieving higher detection rates for different target velocities and sizes. The detection rate of the STMD+ remains stable when the target velocity (or size) ranges from to pixel/s (or from to ). In contrast, the detection rates of the two baseline models decrease significantly after reaching their maximum points. As shown in Fig. 54(d), the STMD+ consistently performs best under different target luminances. It is worth noting that the detection rates of the three models all decrease as the target luminance increases. From Fig. 54(e) and (f), we can see that the STMD+ performs better than the baseline models under different background velocities and directions.

Fig. 58: Background images and receiver operating characteristic (ROC) curves of the three models for Group 6 (different backgrounds).
Fig. 65: Receiver operating characteristic (ROC) curves of the three models for the six real image sequences. (a) Real image sequence (STNS-). (b) Real image sequence (STNS-). (c) Real image sequence (STNS-). (d) Real image sequence (STNS-). (e) Real image sequence (STNS-). (f) Real image sequence (STNS-).

Fig. 58 presents the ROC curves of the three models for Group 6. As can be seen, the STMD+ outperforms the baseline models on the different backgrounds. Note that the three models all perform well in Fig. 58(a): their detection rates are all close to when the false alarm rates are low, and show only small differences. This is because the background is more homogeneous and contains fewer fake features. However, in more cluttered backgrounds such as those in Fig. 58(b) and (c), the STMD+ performs much better than the other two models.

We further tested the developed model on the publicly available STNS dataset [19]. Fig. 65 illustrates the ROC curves of the three models for six real image sequences, whose indices in the STNS dataset are , , , , and , respectively. As shown in the six subplots, the detection rates of the STMD+ are higher than those of the two baseline models at any given false alarm rate. That is, the STMD+ achieves the best performance on all six real sequences, which means that the STMD+ works more stably across different cluttered backgrounds and target types.

V Conclusion

In this paper, we have proposed a visual system model (STMD+) for small target motion detection in cluttered backgrounds. The visual system contains two parallel information pathways and is capable of discriminating small targets from fake features. The first pathway, called the motion pathway, is intended to locate all small moving objects by calculating luminance changes over time at each pixel. The second pathway, called the contrast pathway, is designed to capture directional contrast by computing the luminance changes of each pixel along different directions. The mushroom body is introduced to fuse the two types of information from the two pathways. Finally, small targets are distinguished from fake features by comparing the standard deviations of the directional contrast on their motion traces. Comprehensive evaluations on synthetic and real datasets, and comparisons with existing STMD models, demonstrate the effectiveness of the proposed visual system in filtering out fake features and improving detection rates. In the future, we will investigate the self-adaptability of the proposed visual system in different environments to further improve its robustness.

Appendix

To demonstrate an actual implementation, we provide pseudo-code for the STMD Plus (see Algorithm 1). We also briefly discuss the computational complexity of the proposed method. As shown in Algorithm 1, the computational time of our method mainly consists of four parts: the ommatidia, the motion pathway, the contrast pathway, and the mushroom body.

The computational complexity of the ommatidia is essentially determined by a 2-D spatial convolution of the input image with a Gaussian kernel (see Equation (1)), which can be implemented in time for an input image and a kernel.
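In practice this cost can be reduced further: a Gaussian kernel is separable, so the 2-D convolution splits into two 1-D passes, lowering the per-image cost from quadratic to linear in the kernel width. A minimal NumPy sketch (zero-padded borders, a simplification of the actual boundary handling):

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable 2-D Gaussian smoothing (the ommatidium stage).
    A full 2-D kernel of width k costs O(M*N*k^2) per image;
    two 1-D passes cost O(M*N*k)."""
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    g = np.exp(-x**2 / (2 * sigma**2))
    g /= g.sum()  # unit-gain kernel
    img = img.astype(float)
    # horizontal pass, then vertical pass
    tmp = np.apply_along_axis(lambda row: np.convolve(row, g, 'same'), 1, img)
    return np.apply_along_axis(lambda col: np.convolve(col, g, 'same'), 0, tmp)
```

A constant interior region passes through unchanged (the kernel has unit gain), while the zero padding attenuates the borders; a production implementation would choose a reflective boundary instead.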

In the motion pathway, the LMC output can be regarded as the difference of two Gamma convolutions (see Equations (3)-(5)). Since a temporal Gamma convolution costs , where is the length of the Gamma kernel, the computational complexity of the LMC scales with . Similarly, the total cost of the four medulla neurons is about . According to (11), the computational complexity of the STMD is for each preferred direction, so its entire cost grows as , where denotes the number of preferred directions. Finally, the lateral inhibition mechanism, which is implemented by a 2-D convolution (see Equation (12)), costs . Thus the entire computational complexity of the motion pathway is .
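The difference-of-Gamma-convolutions structure of the LMC can be sketched as below. This is a hedged illustration in Python: the kernel order, time constants, and kernel length are placeholder values, not the paper's parameters, and the discretization is a simple sampling of the continuous Gamma kernel.

```python
import numpy as np

def gamma_kernel(n, tau, length):
    """Discretized Gamma kernel of order n and time constant tau,
    normalized to unit gain (a simple sampling of the continuous form)."""
    t = np.arange(length, dtype=float)
    k = (n * t / tau) ** n * np.exp(-n * t / tau)
    return k / k.sum()

def lmc_output(signal, tau1=3.0, tau2=9.0, length=30):
    """LMC modeled as the difference of two Gamma convolutions.
    The convolution costs O(T * length) for a signal of length T."""
    k = gamma_kernel(2, tau1, length) - gamma_kernel(2, tau2, length)
    return np.convolve(signal, k)[:len(signal)]
```

Because both kernels are normalized to unit gain, their difference sums to zero: a constant input produces no sustained response, while a luminance change produces a transient, which is the band-pass behavior the motion pathway relies on.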

In the contrast pathway, the directional contrast of each pixel along different spatial directions is calculated by convolving the ommatidium output with directional derivative operators (see Equation (17)). Since the 2-D spatial convolution costs for each spatial direction, the entire computational complexity of the contrast pathway is .
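A central-difference sketch of this step in plain NumPy is given below. Note that the paper uses anisotropic Gaussian directional derivative operators, so the simple finite-difference kernels here are illustrative substitutes, not the actual filters.

```python
import numpy as np

def directional_contrast(img):
    """Directional contrast along four directions (0, 45, 90, 135 degrees),
    built from central-difference gradients: the directional derivative
    along angle theta is cos(theta)*gx + sin(theta)*gy."""
    img = img.astype(float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = (img[:, 2:] - img[:, :-2]) / 2.0  # horizontal gradient
    gy[1:-1, :] = (img[2:, :] - img[:-2, :]) / 2.0  # vertical gradient
    thetas = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]
    return [np.cos(t) * gx + np.sin(t) * gy for t in thetas]
```

On a horizontal luminance ramp, the 0-degree component recovers the slope while the 90-degree component vanishes, which is the directional selectivity the pathway needs.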

In the mushroom body, the nearest neighbor of each detected object is computed to record motion traces, which can be implemented in time [53], where is the number of detected objects. In addition, the cost of the standard deviation calculation is around , where represents the sample number. So the entire computational complexity of the mushroom body is around .
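The trace-recording step can be sketched as a greedy nearest-neighbour association. This is a hypothetical Python sketch: `max_dist` is an assumed gating radius, and a brute-force search stands in for the more efficient neighbourhood structure cited in [53].

```python
import math

def link_traces(traces, detections, max_dist):
    """Append each detection to the trace whose last recorded position is
    nearest, or start a new trace when no trace is within max_dist.
    traces: list of lists of (x, y); detections: list of (x, y).
    Brute-force search is O(m^2) per frame for m objects; a k-d tree
    lowers the nearest-neighbour query cost, as assumed above."""
    for det in detections:
        best, best_d = None, max_dist
        for tr in traces:
            d = math.hypot(det[0] - tr[-1][0], det[1] - tr[-1][1])
            if d < best_d:
                best, best_d = tr, d
        if best is not None:
            best.append(det)  # extend the nearest existing trace
        else:
            traces.append([det])  # start a new trace
    return traces
```

Each extended trace then supplies the sequence of positions at which the directional contrast is sampled for the standard-deviation test.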

Based on the above analysis, the entire computational complexity of the proposed STMD Plus is around , where stands for the number of input images.

0:  Image sequence , where .
0:  Positions of small moving targets in each input image.
1:  for each input image do
2:     // Ommatidia
3:     Calculate the output of the ommatidium via (1);
4:     // Motion Pathway
5:     Calculate the output of the LMC via (5);
6:     Calculate the outputs of the medulla neurons via (6)-(9);
7:     Calculate the output of the STMD via (11);
8:     Calculate the laterally inhibited output via (12);
9:     // Contrast Pathway
10:     Calculate the output of the AMC via (16);
11:     Calculate the output of the T1 neuron via (17);
12:     // Mushroom Body
13:     Calculate motion traces of the detected objects via (20);
14:     for each motion trace do
15:        Calculate the directional contrast of the motion trace via (21);
16:        Calculate the standard deviations () of the directional contrast on the motion trace;
17:        if  then
18:           the detected object is a small target;
19:        else
20:           the detected object is a fake feature.
21:        end if
22:     end for
23:  end for
Algorithm 1 Detection Process of the STMD Plus

References

  • [1] S. Yue, F. C. Rind, M. S. Keil, J. Cuadri, and R. Stafford, “A bio-inspired visual collision detection mechanism for cars: Optimisation of a model of a locust neuron to a novel environment,” Neurocomputing, vol. 69, no. 13, pp. 1591–1598, Feb. 2006.
  • [2] S. Yue and F. C. Rind, “Collision detection in complex dynamic scenes using an lgmd-based visual neural network with feature enhancement,” IEEE Trans. Neural Netw., vol. 17, no. 3, pp. 705–716, May 2006.
  • [3] M. Woźniak and D. Połap, “Adaptive neuro-heuristic hybrid model for fruit peel defects detection,” Neural Netw., vol. 98, pp. 16–33, Feb. 2018.
  • [4] C. Yan, H. Xie, J. Chen, Z. Zha, X. Hao, Y. Zhang, and Q. Dai, “A fast uyghur text detector for complex background images,” IEEE Trans. Multimed., vol. 20, no. 12, pp. 3389–3398, Dec. 2018.
  • [5] C. Yan, H. Xie, S. Liu, J. Yin, Y. Zhang, and Q. Dai, “Effective uyghur language text detection in complex background images for traffic prompt identification,” IEEE Trans. Intell. Transp. Syst., vol. 19, no. 1, pp. 220–229, Jan. 2018.
  • [6] C. Yan, H. Xie, D. Yang, J. Yin, Y. Zhang, and Q. Dai, “Supervised hash coding with deep neural network for environment perception of intelligent vehicles,” IEEE Trans. Intell. Transp. Syst., vol. 19, no. 1, pp. 284–295, Jan. 2018.
  • [7] P. D. Barnett, K. Nordström, and D. C. O’Carroll, “Retinotopic organization of small-field-target-detecting neurons in the insect visual system,” Curr. Biol., vol. 17, no. 7, pp. 569–578, Apr. 2007.
  • [8] M. Mischiati, H.-T. Lin, P. Herold, E. Imler, R. Olberg, and A. Leonardo, “Internal models direct dragonfly interception steering,” Nature, vol. 517, no. 7534, pp. 333–338, Jan. 2015.
  • [9] M. F. Keleş and M. A. Frye, “Object-detecting neurons in drosophila,” Curr. Biol., vol. 27, no. 5, pp. 680–687, Mar. 2017.
  • [10] K. Nordström, P. D. Barnett, and D. C. O’Carroll, “Insect detection of small targets moving in visual clutter,” PLoS Biol., vol. 4, no. 3, p. e54, Feb. 2006.
  • [11] K. Nordström, “Neural specializations for small target detection in insects,” Curr. Opin. Neurobiol., vol. 22, no. 2, pp. 272–278, Apr. 2012.
  • [12] S. D. Wiederman, P. A. Shoemaker, and D. C. O’Carroll, “A model for the detection of moving targets in visual clutter inspired by insect physiology,” PLoS One, vol. 3, no. 7, pp. 1–11, Jul. 2008.
  • [13] H. Wang, J. Peng, and S. Yue, “A directionally selective small target motion detecting visual neural network in cluttered backgrounds,” IEEE Trans. Cybern., to be published, doi: 10.1109/TCYB.2018.2869384.
  • [14] L. Freifeld, D. A. Clark, M. J. Schnitzer, M. A. Horowitz, and T. R. Clandinin, “Gabaergic lateral interactions tune the early stages of visual processing in drosophila,” Neuron, vol. 78, no. 6, pp. 1075–1089, Jun. 2013.
  • [15] R. Behnia, D. A. Clark, A. G. Carter, T. R. Clandinin, and C. Desplan, “Processing properties of on and off pathways for drosophila motion detection,” Nature, vol. 512, no. 7515, p. 427, Aug. 2014.
  • [16] Q. Fu, C. Hu, J. Peng, and S. Yue, “Shaping the collision selectivity in a looming sensitive neuron model with parallel on and off pathways and spike frequency adaptation,” Neural Netw., vol. 106, pp. 127–143, Oct. 2018.
  • [17] B. Hu, S. Yue, and Z. Zhang, “A rotational motion perception neural network based on asymmetric spatiotemporal visual information processing,” IEEE Trans. Neural Netw. Learn. Syst., vol. 28, no. 11, pp. 2803–2821, Nov 2016.
  • [18] S. Yue and F. C. Rind, “Redundant neural vision systems-competing for collision recognition roles,” IEEE Trans. Auton. Mental Develop., vol. 5, no. 2, pp. 173–186, Apr. 2013.
  • [19] Z. M. Bagheri, S. D. Wiederman, B. S. Cazzolato, S. Grainger, and D. C. O’Carroll, “Performance of an insect-inspired target tracker in natural conditions,” Bioinspir. & Biomim., vol. 12, no. 2, p. 025006, Feb. 2017.
  • [20] A. L. Stöckl, W. A. Ribi, and E. J. Warrant, “Adaptations for nocturnal and diurnal vision in the hawkmoth lamina,” J. Comp. Neurol., vol. 524, no. 1, p. 160, Jul. 2016.
  • [21] Z. Riveraalvidrez, I. Lin, and C. M. Higgins, “A neuronally based model of contrast gain adaptation in fly motion vision,” Visual Neurosci., vol. 28, no. 5, p. 419, Aug. 2011.
  • [22] N. Lessios, R. L. Rutowski, J. H. Cohen, M. E. Sayre, and N. J. Strausfeld, “Multiple spectral channels in branchiopods. i. vision in dim light and neural correlates,” J. Exp. Biol., pp. jeb–165 860, Apr. 2018.
  • [23] Y.-J. Lee, H. O. Jönsson, and K. Nordström, “Spatio-temporal dynamics of impulse responses to figure motion in optic flow neurons,” PLoS One, vol. 10, no. 5, pp. 1–16, May 2015.
  • [24] J. Li, J. Lindemann, and M. Egelhaaf, “Local motion adaptation enhances the representation of spatial structure at emd arrays,” PLOS Comput. Biol., vol. 13, no. 12, pp. 1–23, Dec. 2017.
  • [25] S. D. Wiederman and D. C. O’Carroll, “Biologically inspired feature detection using cascaded correlations of off and on channels,” J. Artif. Intell. Soft Comput. Res., vol. 3, no. 1, pp. 5–14, Dec. 2013.
  • [26] B. Hassenstein and W. Reichardt, “Systemtheoretische analyse der zeit-, reihenfolgen-und vorzeichenauswertung bei der bewegungsperzeption des rüsselkäfers chlorophanus,” Zeitschrift für Naturforschung B, vol. 11, no. 9-10, pp. 513–524, Oct. 1956.
  • [27] H. Eichner, M. Joesch, B. Schnell, D. F. Reiff, and A. Borst, “Internal structure of the fly elementary motion detector,” Neuron, vol. 70, no. 6, pp. 1155–1164, Jun. 2011.
  • [28] H. Wang, J. Peng, and S. Yue, “An improved lptc neural model for background motion direction estimation,” in Proc. IEEE Conf. ICDL-EpiRob, 2017, pp. 47–52.
  • [29] D. A. Clark, L. Bursztyn, M. A. Horowitz, M. J. Schnitzer, and T. R. Clandinin, “Defining the computational structure of the motion detector in drosophila,” Neuron, vol. 70, no. 6, pp. 1165–1177, Jun. 2011.
  • [30] D. Fortun, P. Bouthemy, and C. Kervrann, “Optical flow modeling and computation: a survey,” Comput. Vis. Image Underst., vol. 134, pp. 1–21, May 2015.
  • [31] H. Yong, D. Meng, W. Zuo, and L. Zhang, “Robust online matrix factorization for dynamic background subtraction,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 7, pp. 1726–1740, Jul. 2017.
  • [32] Z. Li, G. Zhao, S. Li, H. Sun, R. Tao, X. Huang, and Y. J. Guo, “Rotation feature extraction for moving targets based on temporal differencing and image edge detection.” IEEE Geosci. Remote Sens. Lett., vol. 13, no. 10, pp. 1512–1516, Oct. 2016.
  • [33] Y. Ren, C.-S. Chua, and Y.-K. Ho, “Motion detection with nonstationary background,” Mach. Vis. Appl., vol. 13, no. 5-6, pp. 332–343, Mar. 2003.
  • [34] C. Gao, D. Meng, Y. Yang, Y. Wang, X. Zhou, and A. G. Hauptmann, “Infrared patch-image model for small target detection in a single image,” IEEE Trans. Image Process., vol. 22, no. 12, pp. 4996–5009, Sep. 2013.
  • [35] Y. Wei, X. You, and H. Li, “Multiscale patch-based contrast measure for small infrared target detection,” Pattern Recognit., vol. 58, pp. 216–226, Oct. 2016.
  • [36] X. Bai and Y. Bi, “Derivative entropy-based contrast measure for infrared small-target detection,” IEEE Trans. Geosci. Remote Sens., vol. 56, no. 4, pp. 2452–2466, Apr. 2018.
  • [37] C. Hu, F. Arvin, C. Xiong, and S. Yue, “Bio-inspired embedded vision system for autonomous micro-robots: the lgmd case,” IEEE Trans. Cogn. Develop. Syst., vol. 9, no. 3, pp. 241–254, Sep. 2016.
  • [38] G. Indiveri and R. Douglas, “Neuromorphic vision sensors,” Science, vol. 288, no. 5469, pp. 1189–1190, May 2000.
  • [39] F. C. Rind, R. D. Santer, J. M. Blanchard, and P. F. Verschure, “Locust’s looming detectors for robot sensors,” in Sensors and sensing in biology and engineering.   Springer, 2003, pp. 237–250.
  • [40] W. He and Y. Dong, “Adaptive fuzzy neural network control for a constrained robot using impedance learning,” IEEE Trans. Neural Netw. Learn. Syst., vol. 29, no. 4, pp. 1174–1186, Apr. 2018.
  • [41] D. Połap, M. Woźniak, W. Wei, and R. Damaševičius, “Multi-threaded learning control mechanism for neural networks,” Future Gener. Comput. Syst., vol. 87, pp. 16–34, Oct. 2018.
  • [42] P. Ardin, F. Peng, M. Mangan, K. Lagogiannis, and B. Webb, “Using an insect mushroom body circuit to encode route memory in complex natural environments,” PLoS Comput. Biol., vol. 12, no. 2, pp. 1–12, Feb. 2016.
  • [43] B. Webb and A. Wystrach, “Neural mechanisms of insect navigation,” Curr. Opin. Insect Sci., vol. 15, pp. 27–39, Jun. 2016.
  • [44] E. J. Warrant, “Matched filtering and the ecology of vision in insects,” in The Ecology of Animal Senses.   Springer, Dec. 2016, pp. 143–167.
  • [45] S.-y. Takemura, A. Bharioke, Z. Lu, A. Nern, S. Vitaladevuni, P. K. Rivlin, W. T. Katz, D. J. Olbris, S. M. Plaza, P. Winston et al., “A visual motion detection circuit suggested by drosophila connectomics,” Nature, vol. 500, no. 7461, p. 175, Aug. 2013.
  • [46] R. Behnia and C. Desplan, “Visual circuits in flies: beginning to see the whole picture,” Curr. Opin. Neurobiol., vol. 34, pp. 125–132, Oct. 2015.
  • [47] B. De Vries and J. C. Príncipe, “A theory for neural networks with time delays,” in Proc. NIPS, 1990, pp. 162–168.
  • [48] S. Yamaguchi and M. Heisenberg, “Photoreceptors and neural circuitry underlying phototaxis in insects,” Fly, vol. 5, no. 4, pp. 333–336, Dec. 2011.
  • [49] S. M. Rogers and S. R. Ott, “Differential activation of serotonergic neurons during short-and long-term gregarization of desert locusts,” Proc. Royal Soc. B, vol. 282, no. 1800, p. 20142062, Feb. 2015.
  • [50] W. T. Freeman, E. H. Adelson et al., “The design and use of steerable filters,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 13, no. 9, pp. 891–906, Sep. 1991.
  • [51] W.-C. Zhang and P.-L. Shui, “Contour-based corner detection via angle difference of principal directions of anisotropic gaussian directional derivatives,” Pattern Recognit., vol. 48, no. 9, pp. 2785–2797, Sep. 2015.
  • [52] A. D. Straw, “Vision egg: an open-source library for realtime visual stimulus generation.” Front. Neuroinf., vol. 2, no. 4, Nov. 2008.
  • [53] A. Mariello and R. Battiti, “Feature selection based on the neighborhood entropy,” IEEE Trans. Neural Netw. Learn. Syst., vol. 29, no. 12, pp. 6313–6322, Dec. 2018.