Detection of Salient Regions in Crowded Scenes
The number of surveillance cameras keeps growing, while only a handful of human operators monitor the video inputs from hundreds of cameras, leaving the system ill-equipped to detect anomalies. Thus, there is a pressing need to automatically detect regions that require immediate attention, for more effective and proactive surveillance. We propose a framework that utilises the temporal variations in the flow field of a crowd scene to automatically detect salient regions, while eliminating the need for prior knowledge of the scene or training. We treat the flow field as a dynamical system and adopt the stability theory of dynamical systems to determine the motion dynamics within a given area. In the context of this work, salient regions refer to areas with high motion dynamics, where points in a particular region are unstable. Experimental results on public crowd scenes show the effectiveness of the proposed method in detecting salient regions which correspond to unstable flow, occlusions, bottlenecks, entries and exits.
Conventional CCTV monitoring by human operators becomes increasingly demanding as the number of cameras deployed grows. Research findings have shown that, besides fatigue and boredom, human attention tends to decline after 20 minutes. Therefore, a high percentage of questionable activities is often overlooked. This is even more challenging when monitoring crowded scenes such as the pilgrimage footage shown in Fig. 1. Anomalous activity or behaviour in a crowded scene can be very subtle and imperceptible to a human operator [3]. Thus, automated detection of suspicious regions is critical to direct the attention of security personnel to areas that require further investigation. Automated saliency detection is also useful in numerous applications, such as identifying bottlenecks, which may help in avoiding congestion or in evacuation planning.
Most work in saliency detection is focused on detecting salient regions in an image, where saliency originates from visual uniqueness and is often deciphered from image attributes such as colour, gradient and edges [10]. Saliency in an image differs from saliency in a video sequence, and image attributes alone are not sufficient to infer the motion dynamics of crowded scenes. Boiman et al. [2] proposed a graphical inference algorithm to detect irregularities in videos based on the spatio-temporal features of image patches at different scales. While their method works well in detecting irregularities in both images and videos, it does not exploit the motion information available in videos and does not cope with large-scale crowd scenes.
Research into the motion dynamics of dense crowds [1, 11, 8] is limited to learning the coherent motion patterns or dominant crowd flows, where regions with similar motion information are grouped into the same cluster. In contrast, our method ignores dominant flows and instead focuses on regions that are unstable or have high motion dynamics to infer salient regions. The closest work to ours, thus far, is by Loy et al. [7], where dominant flows are suppressed while focusing on motion flows that deviate from the norm. However, their method, which is based on spectral analysis of motion flows, is only reliable when detecting obvious saliency such as crowd instability and counter flow. It does not deal with more subtle scenarios of saliency such as bottlenecks and occlusions. In [9], a set of rules is applied to the eigenvalue map to discover different motion behaviours such as bottleneck and arch. While that method is able to discriminate between the different types of saliency, it is restricted by pre-defined conditions and requires characteristic flows. Our method, on the other hand, is not restricted by a set of rules, and assumes an anomaly when a particular region exhibits high motion dynamics.
This work extends the definition of salient regions to include subtle anomalies which correspond to bottlenecks and occlusions. In addition, we introduce the simple, yet effective idea of amplifying regions with unstable motion instead of disregarding them as noise. This alludes to the social behaviour of humans in crowds. In a densely crowded scene, the motions of individuals tend to follow the regular or dominant flow of a particular region due to the physical constraints of the environment (e.g. path, junction) and the social conventions of crowd dynamics. We can therefore consider the possibility of irregularities or anomalies occurring in the scene when the motion dynamics of individuals differ from those of their close neighbours. In our work, we first magnify and then examine the unstable regions by performing a two-stage segmentation process to infer salient regions. Our method does not rely on tracking each object or on prior learning, and can thus adapt to the environment over time.
2 Magnification of Unstable Flow
We first estimate the velocity field at each point by employing the dense optical flow algorithm in [6]. The velocity components of each point are accumulated, and an average velocity is calculated over a time interval comprising a fixed number of frames.
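The averaging step can be illustrated with a minimal sketch. It assumes the optical flow has already been computed and is stored as a list of frames, each a 2D grid of (u, v) velocity pairs; the names `FlowFrame` and `mean_velocity_field` and the toy data are hypothetical and are not taken from the original work.

```python
from typing import List, Tuple

Vector = Tuple[float, float]
FlowFrame = List[List[Vector]]  # flow[y][x] = (u, v) for one frame


def mean_velocity_field(frames: List[FlowFrame]) -> FlowFrame:
    """Average the per-point (u, v) velocities over a window of flow frames."""
    if not frames:
        raise ValueError("need at least one flow frame")
    n = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    mean: FlowFrame = []
    for y in range(height):
        row = []
        for x in range(width):
            # Accumulate each velocity component, then divide by the frame count.
            u = sum(f[y][x][0] for f in frames) / n
            v = sum(f[y][x][1] for f in frames) / n
            row.append((u, v))
        mean.append(row)
    return mean


# Two hypothetical 1x2 flow frames (assumed data, for illustration only).
frames = [
    [[(1.0, 0.0), (0.0, 2.0)]],
    [[(3.0, 0.0), (0.0, 4.0)]],
]
mean_flow = mean_velocity_field(frames)
```

In practice an array library would replace the nested loops; plain lists are used here only to keep the sketch dependency-free.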
While the mean velocity field may be a good indicator of the global flow of individuals in a crowd, it is unstructured and may change over time. A particle advection process is implemented to keep track of the velocity changes for each point along its velocity field.
The suffix denotes the motion of a particular particle or point. Seeding each particle at its initial position in the mean velocity field, we treat the dynamical system as an initial value problem. Thus, the pathlines, which trace each point from its position at the start of the time interval to its position at the end of the interval, can be solved using the fourth-order Runge-Kutta scheme (RK4) as in [5].
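The advection step can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the velocity field is treated as steady over the interval and is supplied as a function of position, whereas a real pipeline would interpolate the discrete mean flow field. The names `rk4_step`, `advect` and `rotation` are hypothetical.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]
VelocityField = Callable[[float, float], Point]  # (x, y) -> (u, v)


def rk4_step(field: VelocityField, p: Point, h: float) -> Point:
    """Advance point p by one classical fourth-order Runge-Kutta step of size h."""
    x, y = p
    k1 = field(x, y)
    k2 = field(x + 0.5 * h * k1[0], y + 0.5 * h * k1[1])
    k3 = field(x + 0.5 * h * k2[0], y + 0.5 * h * k2[1])
    k4 = field(x + h * k3[0], y + h * k3[1])
    return (
        x + h / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
        y + h / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]),
    )


def advect(field: VelocityField, seed: Point, duration: float, steps: int) -> List[Point]:
    """Return the pathline of a particle seeded at `seed`, including the seed itself."""
    h = duration / steps
    path = [seed]
    for _ in range(steps):
        path.append(rk4_step(field, path[-1], h))
    return path


def rotation(x: float, y: float) -> Point:
    """Hypothetical test field: rigid rotation about the origin."""
    return (-y, x)


# A quarter turn should carry the seed (1, 0) close to (0, 1).
path = advect(rotation, (1.0, 0.0), math.pi / 2, 100)
end_x, end_y = path[-1]
```

The rotation field is chosen only because its exact pathlines are known, which makes the integrator easy to check.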
We adopt the Jacobian method as in [4] to measure the separation between the pathlines of particles that are seeded spatially close to a given point over a time interval. If a particle's seeding position is shifted by a small offset at the initial time, the Jacobian of the flow map multiplied by that offset gives the resulting coordinate offset at the final time. This relies on the assumption that the displacement is small. The Jacobian of the flow map is computed from the partial derivatives of the advected particle coordinates with respect to the seeding coordinates.
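As a reading aid, the linearisation described here can be written out explicitly. The notation is an assumption introduced for illustration, since the original symbols are not available: phi = (phi_x, phi_y) denotes the flow map that carries a particle seeded at (x, y) at time t_0 to its position at time t_0 + T, delta is the small seeding offset, and J is the Jacobian of phi.

```latex
\[
\delta(t_0 + T) \approx J\,\delta(t_0),
\qquad
J =
\begin{pmatrix}
\dfrac{\partial \phi_x}{\partial x} & \dfrac{\partial \phi_x}{\partial y} \\[6pt]
\dfrac{\partial \phi_y}{\partial x} & \dfrac{\partial \phi_y}{\partial y}
\end{pmatrix}
\]
```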
According to the theory of linear stability analysis, the square root of the largest eigenvalue derived from the Jacobian indicates the maximum offset or displacement that results when the particle's seeding location is shifted by one unit. In the context of this study, a large eigenvalue indicates that the query point is unstable, and a small eigenvalue that it is stable. Since we are interested in regions that have high motion dynamics, we compute the stability of a point relative to its spatially close neighbouring points from the maximum eigenvalue.
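A minimal sketch of how such a stability value could be computed at a single point, under assumptions that go beyond the text: the Jacobian is estimated by central differences of a hypothetical flow map `phi` (any function that maps a seed position to its advected position), and the eigenvalue is taken from the symmetric matrix J^T J, whose largest eigenvalue has a square root equal to the maximum stretch of a unit offset. The names `flow_map_jacobian`, `max_stretch`, `stretch_map` and the step `eps` are illustrative.

```python
import math
from typing import Callable, Tuple

Point = Tuple[float, float]
FlowMap = Callable[[float, float], Point]  # seed (x, y) -> advected (x', y')
Matrix2 = Tuple[Tuple[float, float], Tuple[float, float]]


def flow_map_jacobian(phi: FlowMap, x: float, y: float, eps: float = 1e-3) -> Matrix2:
    """Estimate the 2x2 Jacobian of phi at (x, y) by central differences."""
    xr, yr = phi(x + eps, y)
    xl, yl = phi(x - eps, y)
    xu, yu = phi(x, y + eps)
    xd, yd = phi(x, y - eps)
    return (
        ((xr - xl) / (2 * eps), (xu - xd) / (2 * eps)),
        ((yr - yl) / (2 * eps), (yu - yd) / (2 * eps)),
    )


def max_stretch(jac: Matrix2) -> float:
    """Square root of the largest eigenvalue of J^T J (assumed stability measure)."""
    (a, b), (c, d) = jac
    # Entries of the symmetric matrix J^T J = [[p, q], [q, r]].
    p = a * a + c * c
    q = a * b + c * d
    r = b * b + d * d
    # Closed-form largest eigenvalue of a symmetric 2x2 matrix.
    lam_max = 0.5 * (p + r) + math.sqrt(0.25 * (p - r) ** 2 + q * q)
    return math.sqrt(lam_max)


def stretch_map(x: float, y: float) -> Point:
    """Hypothetical flow map: stretches x by 2 and compresses y by 0.5."""
    return (2.0 * x, 0.5 * y)


stretch = max_stretch(flow_map_jacobian(stretch_map, 1.0, 1.0))
```

For this toy map the Jacobian is diagonal with entries 2 and 0.5, so the maximum stretch is approximately 2, meaning that nearby seeds separate by up to a factor of two.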
We propose a two-stage segmentation that combines the outputs of the fine and coarse segmentation obtained from the local and global flow segmentation steps. This is followed by a flow magnification of regions with high motion instability to synthesise the output signal, governed by a magnification factor and a segmentation threshold.
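The role of the two parameters can be illustrated with a simple threshold-and-amplify sketch. This is not the authors' formula, which is not available here; it assumes that stability values above the segmentation threshold are scaled by the magnification factor and that all other values are suppressed to zero. The function name `magnify_unstable` and the sample values are hypothetical, and the two-stage segmentation itself is not modelled.

```python
from typing import List


def magnify_unstable(
    stability: List[List[float]], threshold: float, alpha: float
) -> List[List[float]]:
    """Scale values above `threshold` by `alpha`; set the rest to zero (assumed rule)."""
    return [
        [alpha * s if s > threshold else 0.0 for s in row]
        for row in stability
    ]


# Hypothetical 2x2 stability map (illustrative values only).
stability_map = [[0.2, 1.5], [3.0, 0.9]]
magnified = magnify_unstable(stability_map, threshold=1.0, alpha=10.0)
```

Suppressing the sub-threshold values mirrors the idea of ignoring dominant, stable flow, but whether the original method zeroes or merely leaves them unchanged is an open assumption.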
3 Experiment: Instability Detection
A set of four test sequences comprising large-scale crowd scenes was used for evaluation. The first sequence is obtained from the National Geographic documentary ‘Inside Mecca’, while the second depicts a marathon scene. Synthetic noise was injected into both scenes to simulate instability in the motion of the crowd. We compare our work with Loy et al. [7] and Ali et al. [1]. All three methods detect the synthetic instability successfully, as indicated by the red bounding boxes in Fig. 2. However, our method identified additional regions as salient. After a thorough investigation of the original sequences by three operators, we noticed that these regions correspond to areas where there are strong interactions and motion dynamics within the crowd. It is worth noting that manual annotation of ground-truth salient regions due to bottlenecks or turbulence is an open issue, because these types of salient regions are considered subjective. In the pilgrimage sequence, the additional salient regions detected by our method do in fact correspond to regions with strong interactions and motion dynamics. Owing to the structure of the scene, in particular the physical constraint of the Kaaba situated at its centre, the crowd tends to slow down while turning. In addition, the salient region detected near the synthetic instability is caused by the high motion dynamics near the entry and exit point. Thus, we argue that it is unfair to deem these detections false positives. Instead, we suggest that the detected regions can aid in investigating and understanding the non-obvious motion dynamics of a scene.
4 Bottleneck Detection
We further validated the capability of our method using the original sequences, in which no synthetic instability is introduced, as shown in Fig. 3(b). The detection of bottlenecks has considerable potential as an indication of impending danger, such as a stampede triggered by stop-and-go waves in the crowd motion.
5 Occlusion and Turbulence Detection
We further tested the robustness of the proposed method on other large-scale crowd scenarios: a school of fish sequence and a marathon sequence in which a lamp post obstructs the flow. The results are shown in Fig. 4.
We have proposed a framework that detects salient regions by observing the flow activities in a given scene with minimal observations. In addition, the proposed method eliminates the need to track each object individually or to learn the scene beforehand, which is critical for real-time operation. Experimental results show that the proposed method detects not only salient regions that correspond to clear instability, but also bottlenecks and occlusions, which are often difficult to notice with the naked eye. These promising results merit further investigation, since the method is able to detect regions that would otherwise go unnoticed by a human operator. The capability of the proposed method to spot subtle patterns of crowd activity can play an important role in triggering real-time alarms that warn of potential dangers such as stampedes, failed evacuations and crushes, in support of operational decision making.
This work is supported by the University of Malaya Program Rakan Penyelidikan UM (PRPUM) under Grant CG065-2013.
Mei Kuan Lim and Chee Seng Chan (Center of Image & Signal Proc., Fac. of Comp. Sci. & Info. Tech., Uni. of Malaya, MALAYSIA) Dorothy Monekosso and Paolo Remagnino (Comp. and Info. Sys., Kingston University, UNITED KINGDOM)
- [1] S. Ali and M. Shah, “A Lagrangian particle dynamics approach for crowd flow segmentation and stability analysis,” in CVPR, 2007, pp. 1–6.
- [2] O. Boiman and M. Irani, “Detecting irregularities in images and in video,” IJCV, vol. 74, no. 1, pp. 17–31, 2007.
- [3] R. Challenger, C. W. Clegg, and M. A. Robinson, “Understanding crowd behaviours: Supporting evidence,” in Understanding crowd behaviours, M. Leigh, Ed. Crown, 2009, pp. 1–326.
- [4] G. Haller, “Finding finite-time invariant manifolds in two-dimensional velocity fields,” Chaos, vol. 10, pp. 99–108, 2000.
- [5] C. A. Kennedy and M. H. Carpenter, “Additive Runge-Kutta schemes for convection-diffusion-reaction equations,” ANM, vol. 44, pp. 139–181, 2003.
- [6] C. Liu, W. T. Freeman, E. H. Adelson, and Y. Weiss, “Human-assisted motion annotation,” in CVPR, 2008, pp. 1–8.
- [7] C. Loy, T. Xiang, and S. Gong, “Salient motion detection in crowded scenes,” in ISCCSP, 2012, pp. 1–4.
- [8] M. Rodriguez, J. Sivic, I. Laptev, and J.-Y. Audibert, “Data-driven crowd analysis in videos,” in ICCV, 2011, pp. 1235–1242.
- [9] B. Solmaz, B. E. Moore, and M. Shah, “Identifying behaviors in crowd scenes using stability analysis for dynamical systems,” PAMI, vol. 34, no. 10, pp. 2064–2070, 2012.
- [10] D. Xu and J. An, “Attribute based salient image extrema detection algorithm,” Electronics Letters, vol. 41, no. 1, pp. 13–14, 2005.
- [11] B. Zhou, X. Tang, and X. Wang, “Coherent filtering: detecting coherent motions from crowd clutters,” in ECCV, vol. 7573, 2012, pp. 857–871.