Local and global gestalt laws: A neurally based spectral approach
A mathematical model of figure-ground articulation is presented, which takes into account both local and global gestalt laws and is compatible with the functional architecture of the primary visual cortex (V1).
The local gestalt law of good continuation is described by means of suitable connectivity kernels, which are derived from Lie group theory and quantitatively compared with long range connectivity in V1.
Global gestalt constraints are then introduced in terms of spectral analysis of the connectivity matrix derived from these kernels. This analysis performs grouping of local features and individuates the perceptual units with the highest saliency. Numerical simulations are performed, and results are obtained by applying the technique to a number of stimuli.
keywords: Mathematical modelling, Quantitative gestalt, Figure-Ground Segmentation, Perceptual Grouping, Neural models, Cortical architecture.
The research leading to these results has received funding from the People Programme (Marie Curie Actions) of the European Union’s Seventh Framework Programme FP7/2007-2013/ under REA grant agreement n°607643.
Gestalt laws have been proposed to explain several phenomena of visual perception, such as grouping and figure-ground segmentation (Wertheimer, 1938; Kohler, 1929; Koffka, 1935; for a recent review see Wagemans et al., 2012). In particular, in order to individuate perceptual units, gestalt theory has introduced local and global laws. Among the local laws we recall the principles of proximity, similarity and good continuation. The local law of good continuation in particular plays a central role in perceptual grouping (see Figure 1, left).
Regarding global laws, the feature of saliency is crucial in the construction of percepts, and at the same time it eludes easy quantitative modelling. In the Berlin school of Gestalt theory the concept of saliency denotes the relevance of a form with respect to a contextual frame, the power of an object to be present in the visual field. The role of saliency is also pivotal in figure-ground articulation. Due to the perceptual grouping process, scenes are perceived as constituted by a finite number of figures, and saliency assigns a discrete value to each of them. In particular the most salient configuration pops up from the ground and becomes a figure (Merleau-Ponty, 1945). Note that under a continuous deformation of the visual stimulus, the salient figures can change abruptly from one percept to a different one (Merleau-Ponty, 1945). This happens for example in Figure 1, where a regular deformation is applied to the Kanizsa square: we perceive a progressively more curved square, until it suddenly disappears and the four inducers are perceived as stand-alone elements (see for example (Lee & Nguyen, 2001; Pillow & Nava, 2002; Petitot, 2008)).
A number of results have been provided in order to refine the principles of the psychology of form and to assess neural correlates of the good continuation law. In particular, Grossberg and Mingolla (Grossberg & Mingolla, 1985) introduced a “cooperation field” to model illusory contour formation. Similar fields of association and perceptual grouping have been produced by Parent and Zucker (Parent & Zucker, 1989). In this context, in the 1990s Kellman and Shipley provided a theory of object perception that specifically addressed the perception of partially occluded objects and illusory contours (Kellman & Shipley, 1991; Shipley & Kellman, 1992, 1994). Heitger and von der Heydt (Von Der Heydt et al., 1993) provided a theory of figural completion which can be applied to both illusory contour figures (such as the Kanizsa triangle) and real images. In the same years Field, Hayes and Hess (Field et al., 1993) introduced, through psychophysical experiments, the notion of association fields, describing the Gestalt principle of good continuation. They studied how the perceptual unit visualized in Figure 2 (b) pops up from a stimulus of Gabor patches (see Figure 2 (a)). Through a series of similar experiments, they constructed an association field that defines the pattern of position-orientation elements of stimuli that can be associated with the same perceptual unit (see Figure 2 (c)).
Starting from the classical results of Hubel and Wiesel (Hubel & Wiesel, 1977), it has been possible to justify these perceptual phenomena on neurophysiological bases. The results of (Bosking et al., 1997) and (Fregnac & Shulz, 1999) confirmed that neurons sensitive to similar orientations are preferentially connected. This suggests that the rules of proximity and good continuation are implemented in the horizontal connectivity of low level visual cortices. A stochastic model which takes into account the structure of the cortex, with position and orientation features, was proposed by Mumford (Mumford, 1994), and further exploited by Williams and Jacobs (Williams & Jacobs, 1997) and by August and Zucker (August & Zucker, 2000). They modelled the analogue of the association fields with Fokker-Planck equations, taking into account different geometric features, such as orientation or curvature. Petitot and Tondut (Petitot & Tondut, 1999) introduced a model of the functional architecture of V1 compatible with the association field. Citti and Sarti (Citti & Sarti, 2006) proposed a model of the functional architecture as a Lie group, showing the relation between geometric integral curves, association fields, and cortical properties. This method has been implemented in (Sanguinetti et al., 2008) and (Boscain et al., 2012). An exact solution of the Fokker-Planck equation has been provided by Duits and van Almsick (Duits & van Almsick, 2008), and their results have been applied by Duits and Franken (Duits & Franken, 2009) to image processing.
The local laws described above are insufficient to explain the constitution of a percept, since a perceived form is characterized by a global consistency. Different authors qualitatively defined this consistency as pregnancy or global saliency (Merleau-Ponty, 1945), but only a few quantitative models have been proposed (Koch & Ullman, 1985). In particular, spectral approaches for image processing were proposed by (Perona & Freeman, 1998; Shi & Malik, 2000; Weiss, 1999; Coifman & Lafon, 2006). In (Sarti & Citti, 2015) it is shown how this spectral mechanism is implemented in the neural morphodynamics, in terms of symmetry breaking of mean field neural equations. In that sense, (Sarti & Citti, 2015) can be considered as an extension of (Bressloff et al., 2002).
In this paper we further develop the approach introduced in (Sarti & Citti, 2015) and describe an algorithm for the individuation of perceptual units, using both local and global constraints: local constraints are modelled by suitable connectivity kernels, which represent neural connections, and the global percepts are computed by means of spectral analysis. The model is described in the geometric setting of a Lie group equipped with a Sub-Riemannian metric introduced in (Petitot & Tondut, 1999; Citti & Sarti, 2006; Sarti et al., 2008). Despite the apparent mathematical difficulty, it helps to clarify in a rigorous way the gestalt law of good continuation.
Here we introduce several substantial differences from the techniques in the literature. While studying the local properties of the model, we focus on the properties of the connectivity kernels. The Fokker-Planck and the Laplacian kernels in the motion group are already widely used for the description of the connectivity, since they qualitatively fit the experimental data (Sarti & Citti, 2015). Here we perform a quantitative fitting between the computed kernels and the experimental ones, in order to validate the model. Moreover, we propose to also use the Subelliptic Laplacian kernel, in order to account for the variability of connectivity patterns. Secondly, we accomplish grouping with a spectral analysis inspired by the work of (Sarti & Citti, 2015), who proved the neurophysiological plausibility of this process. In the experiments we manipulate the stimuli to demonstrate the relation between the pop up of the figure and the eigenvalue analysis. In particular, we analyze the swap between one solution and the other while smoothly changing the stimulus in many grouping experiments. Finally, we enrich the model by exploiting the role of the polarity feature, which allows us to work with two competing kernels.
The plan of the paper is the following. Section 2 is divided in two parts: in the first we describe the local constraints and in the second the global ones. We first recall the neurogeometry of the visual cortex and see how the cortical connectivity is represented by the fundamental solutions of the Fokker-Planck, Sub-Riemannian Laplacian and isotropic Laplacian equations. We then propose a method for the individuation of perceptual units, first recalling the notions of spectral analysis of connectivity matrices obtained from the connectivity kernels. We will see how the eigenvectors of these matrices represent perceptual units in the image. In Section 3 we present numerical approximations of the kernels and compare them with neurophysiological data of horizontal connectivity (Angelucci et al., 2002; Bosking et al., 1997). We also perform a quantitative validation of the kernel considering the experiment of (Gilbert et al., 1996), showing the link between the connectivity kernel and the cell’s response. Finally, in Section 4 we present the results of simulations using the implemented connectivity kernels. We identify perceptual units in different Kanizsa figures, highlighting the role of polarity, and discuss and compare the behavior of the different kernels.
2 The mathematical model
In this section we identify a possible neural basis of local Gestalt laws in the functional architecture of the primary visual cortex, which is the first cortical structure that underlies the processing of the visual stimulus. We do not claim here that the process of grouping has to be attributed exclusively to V1, since several cortical areas are involved in the segmentation of a figure. However, neural evidence ensures that it already takes place in V1 (see (Lee & Nguyen, 2001; Pillow & Nava, 2002)). Hence we focus on this area, where the first elaboration is made and which is particularly important for all the geometrical aspects of the process.
2.1 Local constraints - The neurogeometry of V1
In the 70s Hubel and Wiesel discovered that this cortical area is organized in the so called hypercolumnar structure (see (Hubel & Wiesel, 1962, 1977)). This means that for each retinal point there is an entire set of cells each one sensitive to a specific orientation of the stimulus.
The first geometrical models of this structure are due to Hoffman (Hoffman, 1989), Koenderink (Koenderink & van Doorn, 1987), Williams and Jacobs (Williams & Jacobs, 1997) and Zucker (Zucker, 2006). They described the cortical space as a fiber bundle, where the retinal plane $(x, y)$ is the basis, while the fiber coincides with the hypercolumnar variable $\theta$. More recently Petitot and Tondut (Petitot & Tondut, 1999), Citti and Sarti (Citti & Sarti, 2006) and Sarti, Citti and Petitot (Sarti et al., 2008) proposed to describe this structure as a Lie group with a Sub-Riemannian metric (see also the results of (Duits & Franken, 2009)). This expresses the fact that each filter can be recovered from a fixed one by translation of the point $(x, y)$ and rotation of an angle $\theta$. In particular the visual cortex can be described as the set of points $(x, y, \theta)$ of $\mathbb{R}^2 \times S^1$. Every simple cell is characterized by its receptive field, classically defined as the domain of the retina to which the neuron is sensitive. The shape of the response of the cell in the presence of a visual input is called the receptive profile (RP) and can be reconstructed by electrophysiological recordings (Ringach, 2002). In particular simple cells of V1 are sensitive to orientation and are strongly oriented. Hence their RPs are interpreted as Gabor patches (Daugman, 1985; Jones & Palmer, 1987). Precisely, they are constituted by two coupled families of cells: an even-symmetric and an odd-symmetric one.
Via the retinotopy, the retinal plane can be identified with the 2-dimensional plane $\mathbb{R}^2$. A visual stimulus at the retinal point $(x, y)$ activates the whole hypercolumnar structure over that point. All cells fire, but the cell with the same orientation as the stimulus is maximally activated, giving rise to orientation selectivity.
Formally, curves and edges are lifted to new cortical curves, identified by the variables $(x, y, \theta)$, where $\theta$ is the direction of the boundary at the point $(x, y)$. These curves have been modelled by (Citti & Sarti, 2006) as integral curves of suitable vector fields in the cortical structure, and it has been shown there that they are always tangent to the planes generated by these vector fields. Precisely, the vector fields they considered are:

$$X_1 = \cos\theta\,\partial_x + \sin\theta\,\partial_y, \qquad X_2 = \partial_\theta. \qquad (2.1)$$
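All lifted curves are integral curves of these two vector fields, so that a curve $\gamma(t) = (x(t), y(t), \theta(t))$ in the cortical space satisfies:

$$\gamma'(t) = (x'(t), y'(t), \theta'(t)) = X_1(\gamma(t)) + k(t)\,X_2(\gamma(t)), \qquad (2.2)$$

where the coefficient $k(t)$ is the curvature of the projection of $\gamma$ on the $(x, y)$ plane.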
It has been noted in (Citti & Sarti, 2006) that these curves, projected on the 2D cortical plane, are a good model of the association fields.
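As an illustration of the integral curves discussed above, the following minimal sketch integrates a curve whose tangent is $X_1 + k X_2$ with a constant coefficient $k$, starting from the origin with orientation 0. The function names, the Euler scheme, and the chosen values of $k$ are assumptions made for this example and are not taken from the paper. For constant $k$ the projected curve is an arc of a circle of radius $1/k$, which gives a closed form to compare against; a family of such arcs forms a fan reminiscent of the association field.

```python
import numpy as np

def lifted_curve(k, t_end=1.0, n_steps=10_000):
    """Euler integration of gamma' = X1 + k * X2 from (x, y, theta) = (0, 0, 0).

    X1 = (cos(theta), sin(theta), 0) and X2 = (0, 0, 1); k is held constant
    (an assumption of this sketch).
    """
    dt = t_end / n_steps
    x, y, theta = 0.0, 0.0, 0.0
    for _ in range(n_steps):
        x += dt * np.cos(theta)
        y += dt * np.sin(theta)
        theta += dt * k
    return x, y, theta

def closed_form(k, t):
    """Exact end point: an arc of a circle of radius 1/k (a straight line if k == 0)."""
    if k == 0:
        return t, 0.0, 0.0
    return np.sin(k * t) / k, (1.0 - np.cos(k * t)) / k, k * t

# A fan of curves with different constant curvatures, all leaving the
# origin with the same orientation.
fan = {k: lifted_curve(k) for k in (-2.0, -1.0, 0.0, 1.0, 2.0)}
```

Plotting the $(x, y)$ components of these curves for a range of $k$ values would show the co-circular fan shape; only the end points are computed here to keep the sketch short.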
2.2 A model of cortical connectivity
From the neurophysiological point of view, there is experimental evidence of the existence of connectivity between simple cells belonging to different hypercolumns. It is the so-called long range horizontal connectivity. Combining optical imaging of intrinsic signals with small injections of biocytin in the cortex, Bosking et al. (Bosking et al., 1997) clarified the properties of horizontal connections in V1 of the tree shrew. The propagation of the tracer is strongly directional and the direction of propagation coincides with the preferential direction of the activated cells. Hence connectivity can be summarized as preferentially linking neurons with co-circularly aligned receptive fields.
The propagation along the connectivity can be modelled as the stochastic counterpart of the deterministic curves defined in Eq. (2.2) for the description of the output of simple cells. If we assume a deterministic component in direction $X_1$ (which describes the long range connectivity) and a stochastic component along $X_2$ (the direction of intracolumnar connectivity), the equation can be written as follows:

$$(x', y', \theta') = (\cos\theta, \sin\theta, N(0, \sigma^2)) = X_1 + N(0, \sigma^2)\,X_2, \qquad (2.3)$$
where $N(0, \sigma^2)$ is a normally distributed variable with zero mean and variance equal to $\sigma^2$. The probability density of this process, denoted by $v_1$, was first used by Williams and Jacobs (Williams & Jacobs, 1997) to compute the stochastic completion field, by August and Zucker (August & Zucker, 2000, 2003) to define the curve indicator random field, and more recently by Duits and co-workers (Duits & van Almsick, 2008; Duits & Franken, 2009) to perform contour completion, de-noising and contour enhancement. The kernel obtained by integrating the density $v_1$ in time,

$$\Gamma_1(x, y, \theta) = \int_0^{+\infty} v_1(x, y, \theta, t)\,dt,$$
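is the fundamental solution of the Fokker-Planck operator

$$FP = X_1 + \sigma^2 X_{22}, \qquad (2.4)$$

where $X_{22}$ denotes the second derivative in the direction $X_2$.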
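The kernel $\Gamma_1$ is strongly biased in direction $X_1$ and not symmetric; a new symmetric kernel can be obtained as follows:

$$\omega_1\big((x, y, \theta), (x', y', \theta')\big) = \frac{1}{2}\Big(\Gamma_1\big((x, y, \theta), (x', y', \theta')\big) + \Gamma_1\big((x', y', \theta'), (x, y, \theta)\big)\Big). \qquad (2.5)$$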
In Figure 3 (a) an isosurface of the symmetrized kernel $\omega_1$ is visualized, showing its typical twisted butterfly shape. The kernel has been proposed in (Sanguinetti et al., 2008) as a model of the statistical distribution of edge co-occurrence in natural images. The similarity between the two is shown there at both a qualitative and a quantitative level (see also Figure 3 (a) and (b)).
If we assume that intracolumnar and long range connections have comparable strength, the stochastic equation Eq. (2.3) reduces to:

$$(x', y', \theta') = N(0, \sigma_1^2)\,X_1 + N(0, \sigma_2^2)\,X_2, \qquad (2.6)$$
where $N(0, \sigma_i^2)$ are normally distributed variables with zero mean and variance equal to $\sigma_i^2$. In this case the speed of propagation in directions $X_1$ and $X_2$ is comparable. The associated probability density $v_2$ is the fundamental solution of the Sub-Riemannian heat equation (Jerison & Sanchez-Calle, 1986). The integral in time of this probability density,

$$\Gamma_2(x, y, \theta) = \int_0^{+\infty} v_2(x, y, \theta, t)\,dt,$$
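is the fundamental solution of the Sub-Riemannian Laplacian (SRL):

$$SRL = \sigma_1^2 X_{11} + \sigma_2^2 X_{22}, \qquad (2.7)$$

where $X_{11}$ and $X_{22}$ denote the second derivatives in the directions $X_1$ and $X_2$.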
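It is a symmetric kernel, so that we do not need to symmetrize it and we use it directly as a model of the connectivity kernel:

$$\omega_2\big((x, y, \theta), (x', y', \theta')\big) = \Gamma_2\big((x, y, \theta), (x', y', \theta')\big).$$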
In Figure 3 (c) an isosurface of the connectivity kernel $\omega_2$ is shown.
We will see in Section 3.2 that a combination of the Fokker-Planck and Sub-Riemannian Laplacian kernels fits the connectivity map measured by Bosking (Bosking et al., 1997), where the Fokker-Planck fundamental solution represents well the long distances of the trajectory, while the Sub-Riemannian Laplacian represents the short ones. A combination of different Fokker-Planck fundamental solutions can also be used to model the functional architecture of primates experimentally measured by Angelucci (Angelucci et al., 2002).
While validating the model, we will see that a standard Riemannian kernel does not provide equally accurate results. In order to show this we introduce an isotropic version of the previous model, which is a standard Riemannian kernel. To construct it, we complete the family of vector fields in Eq. (2.1) with an orthonormal one:

$$X_3 = -\sin\theta\,\partial_x + \cos\theta\,\partial_y,$$
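choosing stochastic propagation in any direction, in such a way that Eq. (2.3) becomes:

$$(x', y', \theta') = N(0, \sigma_1^2)\,X_1 + N(0, \sigma_2^2)\,X_2 + N(0, \sigma_3^2)\,X_3, \qquad (2.10)$$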
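where $N(0, \sigma_i^2)$ are normally distributed variables with zero mean and variance equal to $\sigma_i^2$. Its probability density will be denoted $v_3$, and the associated time independent kernel

$$\omega_3(x, y, \theta) = \int_0^{+\infty} v_3(x, y, \theta, t)\,dt$$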
will be the fundamental solution of the standard Laplacian operator:

$$L = \sigma_1^2 X_{11} + \sigma_2^2 X_{22} + \sigma_3^2 X_{33}. \qquad (2.11)$$

One of its level sets is represented in Figure 3 (d).
In Section 3.1 we will describe a numerical technique to construct the three kernels described above.
2.3 Global integration
Since the beginning of the last century, perception has been considered by Gestaltists as a global process. Moreover, following Koch-Ullman and Merleau-Ponty, visual perception is a process of the visual field that individuates figure and background at the same time (Koch & Ullman, 1985; Merleau-Ponty, 1945). It then continues with the segmentation of the structures by successive differentiations.
A cortical mechanism responsible for this analysis has been outlined by (Sarti & Citti, 2015), starting from the classical mean field equation of Ermentrout and Cowan (Ermentrout & Cowan, 1980) and Bressloff and Cowan (Bressloff et al., 2002; Bressloff & Cowan, 2003). This equation describes the evolution of the cortical activity and depends on the connectivity kernels. The discrete output of the simple cells selects the set of active cells in the cortical space, and the cortical connectivity, restricted to this set, defines a neural affinity matrix. The eigenvectors of this matrix describe the stationary states of the mean field equation, hence the emergent perceptual units. The system will tend to the eigenvector associated with the highest eigenvalue, which corresponds to the most important object in the scene. Mathematically, the approach is strongly linked to spectral analysis techniques for locality-preserving embeddings of large data sets (Coifman & Lafon, 2006; Belkin & Niyogi, 2003; Roweis & Saul, 2000), for data segregation and partitioning (Perona & Freeman, 1998; Meila & Shi, 2001; Shi & Malik, 2000), and for grouping processes in real images (Weiss, 1999).
2.4 The cortical activity equation
We have seen that in the presence of a visual stimulus the cells aligned to its boundary give the maximal response. We will assume that a discrete number of cells are maximally activated and we will denote them $(x_i, y_i, \theta_i)$ for $i = 1, \dots, N$. In Figure 9 (b) we show as an example the cells responding to a Kanizsa figure, represented with their Gabor-like receptive profiles. Following (Sarti & Citti, 2015), the cortical connectivity $\omega$ is restricted to this discrete set and reduces to a matrix $A$:

$$A_{ij} = \omega\big((x_i, y_i, \theta_i), (x_j, y_j, \theta_j)\big). \qquad (2.12)$$
In this discrete setting the mean field equation for the cortical activity $u$ reduces to:

$$\frac{du_i}{dt} = -\lambda\,u_i + s\Big(\sum_{j=1}^{N} A_{ij}\,u_j\Big),$$
where $s$ is a sigmoidal function and $\lambda$ is a physiological parameter. The solution tends to its stationary states, which are the eigenvectors of the associated linearized equation:

$$\sum_{j=1}^{N} A_{ij}\,u_j = \frac{\lambda}{s'(0)}\,u_i.$$
Hence these are the emergent states of the cortical activity, which individuate the coherent perceptual units in the scene and allow us to segment it. This is why we assign to the eigenvalues of the affinity matrix the meaning of a saliency index of the objects. Since we have defined three different kernels, three different affinity matrices will be defined. However, all kernels are real and symmetric, so that each affinity matrix is a real symmetric matrix, $A_{ij} = A_{ji}$. Its eigenvalues are real and the highest eigenvalue is well defined. The associated principal eigenvectors emerge as symmetry breaking of the stationary solutions of the mean field equations, and they pop up abruptly as emergent solutions. The first eigenvalue will correspond to the most salient object in the image.
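The link between the activity equation and the principal eigenvector can be checked numerically on a toy example. The sketch below assumes a mean field equation of the form $du/dt = -\lambda u + s(Au)$ with $s = \tanh$ (so that $s(0) = 0$ and $s'(0) = 1$); the block-structured matrix, the value of $\lambda$, the time step and the function name are all illustrative assumptions, not values from the paper. With $\lambda$ chosen between the two largest eigenvalues, only the mode associated with the highest eigenvalue is unstable around zero activity, and the stationary state aligns with the principal eigenvector.

```python
import numpy as np

def simulate_mean_field(A, lam, dt=0.05, n_steps=4000, u0_scale=0.01):
    """Euler integration of du/dt = -lam * u + tanh(A @ u) (assumed form)."""
    u = np.full(A.shape[0], u0_scale)  # small, uniform initial activity
    for _ in range(n_steps):
        u = u + dt * (-lam * u + np.tanh(A @ u))
    return u

# Toy affinity matrix: a strongly connected block of 4 cells and a weakly
# connected block of 3 cells, with no connections between the blocks.
A = np.zeros((7, 7))
A[:4, :4] = 1.0
A[4:, 4:] = 0.3

eigenvalues, eigenvectors = np.linalg.eigh(A)  # eigenvalues in ascending order
principal = eigenvectors[:, -1]                # eigenvector of the largest one

# lam = 2 lies between the two largest eigenvalues (4 and 0.9).
u_final = simulate_mean_field(A, lam=2.0)
cosine = abs(u_final @ principal) / (
    np.linalg.norm(u_final) * np.linalg.norm(principal)
)
```

In this toy setting the activity of the weak block decays to zero while the strong block saturates at a nonzero level, so `cosine` is close to 1.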
2.5 Individuation of perceptual units
Since the three kernels assign different roles to the different directions of connectivity, the affinity matrices and their spectra will reflect these different behaviors. Consequently the resulting data set partitioning will be stronger in the straight direction when using the Fokker-Planck kernel, and will allow rotation when using the kernel $\omega_2$ (see also (Cocci et al., 2015) for a deeper analysis). Using the kernel $\omega_3$ we expect an equal grouping capability in the collinear direction and in the ladder direction.
In Figure 4 we visualize the affinity matrix of the image presented in Figure 9 (a). It is a square matrix with dimensions $N \times N$, where $N$ is exactly the number of active patches. It represents the affinity of each patch with respect to all the others. The affinity matrix has a block structure, and the principal blocks correspond to coherent objects. On the right we visualize the complete set of eigenvalues in a graph (eigenvalue number, eigenvalue). Let us explicitly note that the first eigenvector has the meaning of the emergent perceptual unit. The other eigenvectors do not describe an ordered sequence of figures with different rank. However, their presence is important, above all when two eigenvalues have similar values. In this case, a small deformation of the stimulus can induce a change in the order of the eigenvalues and produce a sudden emergence of the corresponding eigenvector, with an abrupt change in the perceived image.
This is in good agreement with the temporal and spatial discontinuity that characterizes the perception of salient figures, since they pop up abruptly from the background, while the background is perceived as undifferentiated (Merleau-Ponty, 1945). Spectral approaches account for the discontinuous character of figure-ground articulation better than continuous models, which instead introduce a graduality in the perception of figure and background (Lorenceau & Alais, 2001).
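The abrupt swap of the principal eigenvector under a smooth change of the stimulus can be illustrated with a deliberately simple toy matrix. In the sketch below, two disconnected groups of elements have internal affinities `strength_a` and `strength_b`; these names, the block structure and the numerical values are assumptions for illustration only. As `strength_b` increases smoothly past `strength_a`, the order of the two leading eigenvalues changes and the principal eigenvector jumps from the first group to the second.

```python
import numpy as np

def dominant_block(strength_b, strength_a=1.0, size=3):
    """Return 0 or 1: the block on which the principal eigenvector is supported."""
    A = np.zeros((2 * size, 2 * size))
    A[:size, :size] = strength_a   # internal affinity of the first group
    A[size:, size:] = strength_b   # internal affinity of the second group
    eigenvalues, eigenvectors = np.linalg.eigh(A)  # ascending order
    principal = eigenvectors[:, -1]
    mass_a = np.sum(principal[:size] ** 2)         # share of the first group
    return 0 if mass_a > 0.5 else 1

# Smooth sweep of the second group's affinity; the exact crossing value
# (strength_b == strength_a) is skipped because the eigenvector is then
# not uniquely defined.
sweep = [(b, dominant_block(b)) for b in (0.8, 0.9, 0.95, 1.05, 1.1, 1.2)]
```

The selected block in `sweep` is 0 for every value below the crossing and 1 for every value above it, even though the matrix entries change continuously.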
To find the remaining objects in the image, the process is then repeated on the vector space orthogonal to $u_1$: the second and the following eigenvectors can be found, until the associated eigenvalue is sufficiently small. In this way only $n$ eigenvectors are selected, with $n < N$, and this procedure reduces the dimensionality of the description. This procedure neurally reinterprets the process introduced by Perona and Freeman in (Perona & Freeman, 1998).
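A minimal sketch of this selection step is given below. Because the affinity matrix is real and symmetric, its eigenvectors are mutually orthogonal, so repeating the search on the orthogonal complement of the eigenvectors already found amounts to taking the eigenvectors in order of decreasing eigenvalue. The stopping rule used here, a threshold relative to the largest eigenvalue, and the function name are assumptions of this example; the text only requires that the eigenvalue be "sufficiently small".

```python
import numpy as np

def perceptual_units(A, rel_threshold=0.6):
    """Eigenpairs of a symmetric matrix A, sorted by decreasing eigenvalue,
    kept while the eigenvalue is at least rel_threshold times the largest."""
    eigenvalues, eigenvectors = np.linalg.eigh(A)  # ascending order
    order = np.argsort(eigenvalues)[::-1]          # switch to descending
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    keep = eigenvalues >= rel_threshold * eigenvalues[0]
    return eigenvalues[keep], eigenvectors[:, keep]

# Toy matrix with three disconnected groups of 4, 3 and 2 elements.
A = np.zeros((9, 9))
A[:4, :4] = 1.0
A[4:7, 4:7] = 1.0
A[7:, 7:] = 1.0

values, units = perceptual_units(A)
```

Here the groups have eigenvalues 4, 3 and 2, so with the assumed threshold only the first two units are retained.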
3 Quantitative kernel validations
3.1 Numerical approximations of the kernels
In this section we numerically approximate the connectivity kernels $\omega_i$, $i = 1, 2, 3$, defined in Section 2.
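We obtain the discrete fundamental solution of Eq. (2.4) by developing random paths from the numerical solution of the system (2.3), which can be approximated by:

$$\begin{cases} x_{s+1} - x_s = \Delta s\,\cos\theta_s \\ y_{s+1} - y_s = \Delta s\,\sin\theta_s \\ \theta_{s+1} - \theta_s = \Delta s\,N(0, \sigma^2) \end{cases} \qquad s \in \{0, \dots, H\}, \qquad (3.1)$$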
where $H$ is the number of steps of the random path and $N(0, \sigma^2)$ is a generator of numbers taken from a normal distribution with mean 0 and variance $\sigma^2$. In this way, the kernel is numerically estimated with Markov Chain Monte Carlo (MCMC) methods (Robert & Casella, 2013). Various realizations of the stochastic path are obtained by solving this finite difference equation once for each path; the estimated kernel is obtained by averaging their passages over discrete volume elements, as described in detail in (Higham, 2001; Sarti & Citti, 2015). Proceeding with the same methodology, the fundamental solution of the hypoelliptic Laplacian (Eq. (2.7)) is evaluated numerically by discretizing the system (2.6):

$$\begin{cases} x_{s+1} - x_s = \Delta s\,N(0, \sigma_1^2)\,\cos\theta_s \\ y_{s+1} - y_s = \Delta s\,N(0, \sigma_1^2)\,\sin\theta_s \\ \theta_{s+1} - \theta_s = \Delta s\,N(0, \sigma_2^2) \end{cases} \qquad (3.2)$$
where $s \in \{0, \dots, H\}$ and $\sigma_i^2$ is the variance in the $X_i$ direction. The kernel represented in Figure 3 (c) is obtained by numerically integrating that system and averaging the resulting paths as before.
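Finally, the system (2.10), which is a model for the isotropic diffusion equation (Eq. (2.11)), is approximated by:

$$\begin{cases} x_{s+1} - x_s = \Delta s\,\big(N(0, \sigma_1^2)\cos\theta_s - N(0, \sigma_3^2)\sin\theta_s\big) \\ y_{s+1} - y_s = \Delta s\,\big(N(0, \sigma_1^2)\sin\theta_s + N(0, \sigma_3^2)\cos\theta_s\big) \\ \theta_{s+1} - \theta_s = \Delta s\,N(0, \sigma_2^2) \end{cases} \qquad (3.3)$$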
where $\sigma_1^2$, $\sigma_2^2$, $\sigma_3^2$ are the variances in the $X_1$, $X_2$, $X_3$ directions. In order to obtain the approximation of the kernel $\omega_3$, visualized in Figure 3 (d), the system is integrated with the same technique used before.
These kernels will be used to construct the affinity matrices in Eq.(2.12).
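The random-path construction for the Fokker-Planck case can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the number of paths, the step size, the value of `sigma`, the grid extent and the bin counts are arbitrary choices, and the noise increment is scaled by the step `ds` as in a plain finite-difference reading of the system (an Euler-Maruyama scheme would scale it by the square root of `ds` instead). All paths start at the origin with orientation 0, and the kernel is estimated by counting their passages through the cells of a regular grid in $(x, y, \theta)$.

```python
import numpy as np

def fokker_planck_kernel(n_paths=2000, n_steps=50, ds=0.1, sigma=0.5,
                         bins=(24, 24, 16), extent=6.0, seed=0):
    """Monte Carlo estimate of the time-integrated density of random paths
    with drift along X1 and noise along X2 (parameter values are assumed)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n_paths)
    y = np.zeros(n_paths)
    theta = np.zeros(n_paths)
    samples = []
    for _ in range(n_steps):
        x = x + ds * np.cos(theta)
        y = y + ds * np.sin(theta)
        theta = theta + ds * sigma * rng.standard_normal(n_paths)
        wrapped = (theta + np.pi) % (2 * np.pi) - np.pi  # keep theta in [-pi, pi)
        samples.append(np.column_stack([x, y, wrapped]))
    samples = np.concatenate(samples)
    ranges = [(-extent, extent), (-extent, extent), (-np.pi, np.pi)]
    hist, edges = np.histogramdd(samples, bins=bins, range=ranges)
    return hist / hist.sum(), edges  # normalized passage counts

kernel, edges = fokker_planck_kernel()

# Mean position of the estimated kernel, from its marginals on x and y.
x_centers = 0.5 * (edges[0][:-1] + edges[0][1:])
y_centers = 0.5 * (edges[1][:-1] + edges[1][1:])
mean_x = float(kernel.sum(axis=(1, 2)) @ x_centers)
mean_y = float(kernel.sum(axis=(0, 2)) @ y_centers)
```

The estimated kernel is elongated along the starting orientation (positive `mean_x`, `mean_y` near zero), which reflects the directional bias of the non-symmetrized Fokker-Planck kernel.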
3.2 Stochastic paths and cortical connectivity
We will now study to what extent the kernels $\omega_1$, $\omega_2$ are models of connectivity. The kernel $\omega_3$ is used for comparison, to show that a uniform Euclidean kernel does not capture the anisotropic structure of the cortex. The random paths that we compute through MCMC are implemented in the functional architecture in terms of the horizontal connectivity of a single cell. On the other hand, the connectivity of an entire population of cells corresponds to the set of all single-cell connectivities, hence to the Fokker-Planck fundamental solution.
A first qualitative comparison between the kernels $\omega_1$, $\omega_2$ and the connectivity pattern has been provided in (Sarti & Citti, 2015). Here we follow the same framework, but we propose a more accurate, quantitative comparison.
It is well known that the 3D cortical structure is implemented in the 2D cortical layer as a pinwheel structure, which codes for position and orientation (see Figure 5 (b)). The pinwheel structure has a large variability from one subject to another, but within each species common statistical properties have been obtained. Cortico-cortical connectivity has been measured by Bosking (Bosking et al., 1997) by injecting a tracer in a simple cell and recording the trajectory of the tracer. In Figure 5 (a) the propagation through the lateral connections is represented by black points. Bosking found a large variability among injections, which however share common stochastic properties such as the direction of propagation, a patchy structure with small blobs at approximately fixed distance, and the decay of the density of tracer along the injection site.
We model each injection with stochastic paths that are solutions of Eq. (2.3). Then we evaluate the stochastic paths on the pinwheel structure.
Due to the stochastic nature of the problem, we do not compare the image of the tracer and the stochastic paths pointwise, but we average them on the pinwheels. We partition both the image of the tracer and that of the stochastic paths in regions corresponding to the pinwheels:

$$R_i, \qquad i = 1, \dots, K,$$
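and for every $i$ we compute the density of tracer $DT_i$ and the density of the stochastic paths $DP_i$. The two vectors $DT$ and $DP$ are then rescaled so as to have unit norm, and the mean square error over the $K$ regions is computed:

$$E = \frac{1}{K} \sum_{i=1}^{K} (DT_i - DP_i)^2. \qquad (3.4)$$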
The free parameters of the model are the value of the standard deviation, the number of paths, the number of steps, appearing in Eq.(2.3) and in the system (3.1). The best fit between the experimental and simulated distributions has been accomplished by minimizing the mean square error by varying these parameters.
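The comparison between the two density vectors can be sketched as below. The function name is hypothetical, and the choice of the Euclidean norm for the rescaling is an assumption of this example; the error is the mean of the squared differences over the regions.

```python
import numpy as np

def normalized_mse(tracer_density, path_density):
    """Mean square error between two per-region density vectors after
    rescaling each to unit Euclidean norm (the norm is an assumption)."""
    t = np.asarray(tracer_density, dtype=float)
    p = np.asarray(path_density, dtype=float)
    t = t / np.linalg.norm(t)
    p = p / np.linalg.norm(p)
    return float(np.mean((t - p) ** 2))

# Two made-up density vectors over four regions, for illustration only.
example_error = normalized_mse([4.0, 2.0, 1.0, 0.5], [3.5, 2.5, 1.0, 0.4])
```

Because both vectors are normalized first, the error is insensitive to the overall amount of tracer or the total number of simulated paths, and it vanishes when the two distributions are proportional. In a parameter search, this value would be evaluated for each candidate set of parameters and the minimum retained.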
Due to the different roles of the directions $X_1$ and $X_2$ in the definition of these kernels, the Sub-Riemannian Laplacian paths and the Fokker-Planck paths have different structures.
The Sub-Riemannian Laplacian allows diffusion in direction , favors changes of the angle, and can be used to describe short range connectivity as described in Section 4.4. Hence it is responsible for the central blob, in a neighborhood of the injection point (see Figure 5 (c)). The Fokker-Planck kernel produces an elongated, patchy structure and seems responsible for the long range connections (see Figure 5 (d)). We apply our quantitative fit only to the long range connectivity, hence discarding the tracer in a neighborhood of the injection. For this reason the Sub-Riemannian Laplacian is not involved in the validation of the model.
The method is first applied to fit the image of the tracer taken by Bosking (Bosking et al., 1997) (see Figure 5 (a)). All the kernels are evaluated on the pinwheels provided in the same paper (see Figure 5 (b)), to obtain a patchy structure. In order to apply formula (3.4), we cover both the image of the tracer and the Fokker-Planck kernel with a regular distribution of rectangles, with edges equal to the mean distance between pinwheels (see Figure 5 (c), (d)); clearly we do not cover the central zone, where we cannot fit the Fokker-Planck kernel. The resulting error value is , showing that the model accurately represents the experimental distribution.
A similar procedure has been applied to the image of the tracer provided in (Angelucci et al., 2002) (see Figure 5 (e)). The result of Angelucci is obtained with various injections in a neighborhood of a pinwheel, so that all orientations are present and the tracer propagates in all directions. In this case we do not have natural pinwheels, hence we use artificial pinwheels, obtained with the algorithm presented in (Barbieri, 2012) (see Figure 5 (f)), with the constraint that the mean distance between the artificial pinwheels is equal to the mean distance between the blobs produced by the tracer. Here we consider Fokker-Planck paths with all directions, to obtain the apparent isotropic diffusion. Also in this case we cover the images with rectangles and perform a best fit, and the minimum error value is , (see Figure 5 (g), (h)).
In his paper (Bosking et al., 1997) Bosking showed a famous image, with the tracer superimposed on the pinwheel structure (see Figure 5 (i)). In this case we have the tracer and the pinwheels of the same animal. This allows us to go below the scale of the pinwheel, and we correctly recover the orientation within the pinwheel (see Figure 5 (j)). The estimated kernel is again a combination of Fokker-Planck kernels. As before, we focus on orientations, hence we only model the long range part of the image, discarding the central blob. The evaluation of the error is made with square regions at a scale smaller than the pinwheel and the error goes below .
3.3 Perceptual facilitation and density kernels
In order to obtain a stable and deterministic estimate of this stochastic model, we used the density kernel, which is a regular deterministic function coding the main properties of the process. We perform here a quantitative validation of this regular kernel by comparing it with an experiment of (Gilbert et al., 1996).
This work studies the capability of cells to integrate information from outside their own receptive field. This integration process is due to the long-range horizontal connections, hence it can be used to validate our model of long range connectivity. As we have recognized in the previous section, it is the Fokker-Planck kernel which can be considered as a model for long range connectivity, hence we use this kernel here.
Figure 6 (left) shows the results of (Gilbert et al., 1996), where the cell’s response to randomly placed and oriented lines is visualized in a black histogram. A vertical line is present in the receptive field of a cell selective to this orientation, and the intensity of its response is represented in the first column of the histogram. If the stimulus is surrounded by random elements aligned with the first one, the cell’s response increases (respectively in the second, third and the last column of the histogram). When the other random elements are not aligned with the fixed one (as in the fifth, sixth and seventh columns), the cell’s response decreases, reflecting an inhibitory effect.
On the right, in the blue histogram, we evaluate the probability density modelled by the kernel in Eq. (2.5) in the presence of the same configuration of elements. The same trend is obtained considering the probability density distribution, as visualized in Figure 6 (right). In order to account for the inhibitory effect we evaluate the kernel with 0 mean. The differences between the two have been quantified by the mean square error between the two normalized histograms. The error of underlines how well this connectivity kernel represents neural connections.
4 Emergence of percepts
In the following experiments, numerical simulations are performed in order to test the reliability of the method for grouping and detection of perceptual units in an image. The kernel considered here depends only on orientation. Hence it can be applied to detect the saliency of geometrical figures that are well described by this feature. The purpose is to select the perceptual units in these images, using the following algorithm:
1. Define the affinity matrix $A$ from the connectivity kernel.
2. Solve the eigenvalue problem $A u_i = \lambda_i u_i$, where the order of $i$ is such that $\lambda_i$ is decreasing.
3. Find and project on the segments the eigenvector $u_1$ associated to the largest eigenvalue.
The parameters used are: 1000000 random paths with in the system (3.1), , in the system (3.2), , in the system (3.3). The value of is defined as follows: , where is the maximum distance between the inducers of the stimulus. Similar parameters have been used for all the experiments.
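The three steps of the algorithm can be sketched as follows. This is only an illustration under stated assumptions: the affinity used here is a hypothetical surrogate (a Gaussian of the distance multiplied by a collinearity term), not the Fokker Planck or Sub-Riemannian Laplacian kernel of the model, and the names `toy_affinity`, `first_eigenvector` and the value of `sigma` are introduced only for this example.

```python
import numpy as np

def toy_affinity(elements, sigma=1.5):
    """Step 1: symmetric surrogate affinity between oriented elements (x, y, theta).

    Assumption: Gaussian decay with distance times a collinearity term.
    This stands in for the connectivity kernels of the model.
    """
    x, y, theta = elements[:, 0], elements[:, 1], elements[:, 2]
    dx = x[None, :] - x[:, None]
    dy = y[None, :] - y[:, None]
    dist2 = dx**2 + dy**2
    phi = np.arctan2(dy, dx)  # direction of the segment joining element i to j
    # Both elements should be aligned with the joining direction (period pi).
    align = np.cos(theta[:, None] - phi) ** 2 * np.cos(theta[None, :] - phi) ** 2
    A = np.exp(-dist2 / (2.0 * sigma**2)) * align
    np.fill_diagonal(A, 0.0)
    return 0.5 * (A + A.T)  # remove rounding asymmetries

def first_eigenvector(A):
    """Steps 2 and 3: eigenvalues in decreasing order and the first eigenvector."""
    eigenvalues, eigenvectors = np.linalg.eigh(A)  # returned in ascending order
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    return eigenvalues, np.abs(eigenvectors[:, 0])

# Toy stimulus: six collinear horizontal elements plus six scattered distractors.
line = np.array([[i, 0.0, 0.0] for i in range(6)], dtype=float)
distractors = np.array([
    [1.0, 3.5, 1.2], [4.0, 3.0, 2.5], [0.5, -3.5, 0.4],
    [3.5, -3.0, 1.9], [6.5, 3.0, 0.9], [-2.5, -2.5, 2.2],
])
elements = np.vstack([line, distractors])

A = toy_affinity(elements)
eigenvalues, u1 = first_eigenvector(A)
# Elements carrying a large component of the first eigenvector form the unit.
salient = np.flatnonzero(u1 > 0.3 * u1.max())
print(salient)
```

In this toy stimulus the elements selected by the first eigenvector are the collinear ones, since the distractors have only weak affinities with each other and with the line. The relative threshold of 0.3 is an arbitrary choice made for the example.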
4.1 The Field, Hayes and Hess experiment
In this section we consider some experiments similar to the ones of Field, Hayes and Hess (Field et al., 1993), where a subset of elements organized in a coherent way is embedded in a background formed by a random distribution of elements. A first stimulus of this type is represented in Figure 7 (first row). The connectivity among these elements is defined as in equations (2.4) and (2.7).
After computing the affinity matrix and its eigenvalues, the eigenvector corresponding to the highest eigenvalue is visualized in red. The results show that the stimulus is well segmented with the fundamental solutions of both the Fokker Planck and the Sub-Riemannian Laplacian equations (Figure 7 (b)).
Figure 7 (caption). Second row: in red, the first eigenvector of the affinity matrix for images containing paths in which the orientation of successive elements differs by 15 (c), 30 (d), 45 (e), 60 (f) and 90 (g) degrees. Third and fourth rows: examples (h) with two units in the scene, where a change in the angle leads to a change in the order of the eigenvalues (i), (j), (k).
Now we consider a similar experiment proposed in (Field et al., 1993), where the orientation of successive elements differs by 15, 30, 45, 60 and 90 degrees and the ability of the observer to detect the path was measured experimentally. It was shown that the path can be identified when successive elements differ by 60 degrees or less. With our method we obtain similar results: if the angle between successive elements is less than 60 degrees (Figure 7 (c), (d), (e)), the unit is correctly identified. With an angle equal to 60 degrees (Figure 7 (f)) only a part of the curve is correctly detected: this can be interpreted as the observer’s increasing difficulty in detecting the path. Finally, with higher angles (Figure 7 (g)) the first eigenvector of the affinity matrix corresponds to random inducers, confirming the results.
Finally we present an example where there are two units in the scene with roughly equal salience, and hence roughly equal eigenvalues. In the third and fourth rows of Figure 7 the stimuli are composed of a curve and a line in a background of random elements. In the stimulus (h) represented in the third row, the elements composing the curve are perfectly aligned and very close together, so that the curve has the highest saliency and corresponds to the eigenvector associated to the first eigenvalue (as shown in red in Figure 7 (i)). The second eigenvalue in this case is slightly smaller. After the computation of the first eigenvector, the stimulus is updated (Figure 7 (j)), the first eigenvector of the new affinity matrix is computed and it corresponds to the inducers of the line (Figure 7 (k)).
In the fourth row we slightly modify the stimulus, in particular the alignment of the elements forming the curve (e.g. by an angle of $\pi/18$). As a consequence, the line becomes the most salient perceptual unit and corresponds to the first eigenvector (Figure 7 (i), fourth row). The stimulus is updated (Figure 7 (j), fourth row) and the first eigenvector of the new affinity matrix corresponds to the inducers of the curve (Figure 7 (k), fourth row).
It is notable that in this case a small change of the eigenvalues corresponds to a small change of the eigenvectors, but the first eigenvalue swaps with the second one and consequently we obtain an abrupt change in the perceived object.
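The swap of the first two eigenvalues can be illustrated with a deliberately simplified sketch. It assumes, purely for illustration, that each perceptual unit is a block of the affinity matrix with uniform affinity between its elements and no affinity between the two units; the strengths used below are hypothetical and do not come from the stimuli of Figure 7.

```python
import numpy as np

def two_unit_affinity(curve_strength, line_strength, n=5):
    """Block-diagonal affinity: a 'curve' unit and a 'line' unit of n elements each.

    Assumption: uniform affinity inside each unit, zero affinity across units.
    """
    def block(a):
        B = np.full((n, n), a)
        np.fill_diagonal(B, 0.0)
        return B
    A = np.zeros((2 * n, 2 * n))
    A[:n, :n] = block(curve_strength)
    A[n:, n:] = block(line_strength)
    return A

def leading_unit(A, n=5):
    """Return which unit supports the first eigenvector and the two top eigenvalues."""
    w, v = np.linalg.eigh(A)  # ascending eigenvalues
    u1 = np.abs(v[:, -1])
    unit = "curve" if u1[:n].sum() > u1[n:].sum() else "line"
    return unit, w[-1], w[-2]

# Gradually weaken the affinity inside the curve, as a misalignment would do.
for curve_strength in (0.90, 0.85, 0.81, 0.79, 0.75):
    unit, lam1, lam2 = leading_unit(two_unit_affinity(curve_strength, 0.80))
    print(f"{curve_strength:.2f}  {unit:5s}  lambda1={lam1:.2f}  lambda2={lam2:.2f}")
```

In this toy setting the two largest eigenvalues vary continuously with the strength of the curve, while the unit selected by the first eigenvector changes from the curve to the line as soon as the two eigenvalues cross.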
4.2 The role of polarity
Introducing polarity inserts the feature of contrast into the model: contours with the same orientation but opposite contrast are assigned opposite angles. For this reason we assume that the orientation $\theta$ takes values in $[0, 2\pi)$ when we consider the odd filters and in $[0, \pi)$ while studying the even ones.
Figure 8 (caption). Second row: a cartoon image (a), the first eigenvector of the affinity matrix without polarity (b), its representation with polarity dependent Gabor patches (c) and the corresponding first eigenvector (d).
The response of the odd filters in presence of a cartoon image is schematically represented in Figure 8. At every boundary point the maximally activated cell is the one with the same direction of the boundary. Then the maximally firing cells are aligned with the boundary (Figure 8, top right).
In order to clarify the role of polarity we consider the image in Figure 8 (a), which has been studied by (Kanizsa, 1980) in the context of a study of convexity in perception. In this case, if we consider only the orientation of the boundaries without polarity, we completely lose any contrast information and we obtain the grouping in Figure 8 (b). Here the upper edge of the square is grouped as a unique perceptual unit. On the other hand, when polarity is inserted, the Gabor patches on the upper edge boundary of the black or white region have opposite contrast and detect values of $\theta$ which differ by $\pi$ (see Figure 8 (c)). In this way there is no affinity between these patches, and the first eigenvector of the affinity matrix, represented in red, correctly detects the unit present in the image and corresponds to the inducers of the semicircle (see Figure 8 (d)). This underlines the important role of polarity in perceptual individuation and segmentation. We also note that the first perceptual unit detected is the convex one, as predicted by the gestalt law (see (Kanizsa, 1980)).
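The effect of polarity on the affinity between two patches lying on the same edge can be sketched as follows. The Gaussian dependence on the angular difference and the value of `sigma_theta` are assumptions made only for this illustration; the model itself uses the connectivity kernels described in the previous sections.

```python
import numpy as np

def angular_difference(theta_i, theta_j, period):
    """Smallest absolute difference between two angles on a circle of given period."""
    d = np.abs(theta_i - theta_j) % period
    return np.minimum(d, period - d)

def orientation_affinity(theta_i, theta_j, period, sigma_theta=0.3):
    """Hypothetical Gaussian affinity in the angular variable only."""
    d = angular_difference(theta_i, theta_j, period)
    return np.exp(-d**2 / (2.0 * sigma_theta**2))

# Two patches on the same edge, with opposite contrast: angles differ by pi.
theta_dark_side = 0.0
theta_bright_side = np.pi

# Without polarity, angles are identified modulo pi: the patches look identical.
without_polarity = orientation_affinity(theta_dark_side, theta_bright_side, period=np.pi)
# With polarity, angles live in [0, 2*pi): the patches are maximally different.
with_polarity = orientation_affinity(theta_dark_side, theta_bright_side, period=2 * np.pi)

print(without_polarity, with_polarity)
```

Without polarity the two patches have full affinity and are grouped together, while with polarity their affinity is negligible, which is the mechanism that separates the two halves of the upper edge in Figure 8.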
4.3 The Kanizsa illusory figures
We consider here stimuli formed by Kanizsa figures, represented by oriented segments that simulate the output of simple cells. In (Lee & Nguyen, 2001) it is reported that completion of Kanizsa figures takes place in V1.
The results of simulations with the fundamental solutions of the Fokker Planck and Sub-Riemannian Laplacian equations are shown in Figure 9. The first eigenvector is visualized in red and it corresponds to the inducers of the Kanizsa triangle (Figure 9 (c)). In this example, after the computation of the first eigenvector of the affinity matrix, the matrix is updated by removing the identified perceptual unit and then the first eigenvector of the new matrix is computed (Figure 9 (d)): these simulations show that the circles are associated to the less salient eigenvectors. In this way the first eigenvalue can be considered a quantitative measure of saliency, because it allows us to segment the most important object in the scene, and the results of simulations confirm the visual grouping.
When the affinity matrix has several eigenvectors with almost the same eigenvalue, as in Figure 9 (d), it is not possible to recognize a most salient object, because they all have the same influence.
We choose here to show just one inducer in red. The other two have the same eigenvalue. That also happens, for example, when the inducers are not co-circularly aligned or they are rotated.
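The update of the affinity matrix and the detection of units with equal saliency can be sketched as follows. The block structure, the sizes and the strengths below are hypothetical stand-ins for one salient unit and three equally weak ones; they are not computed from the stimulus of Figure 9, and the names `block_affinity` and `most_salient_unit` are introduced only for this example.

```python
import numpy as np

def block_affinity(sizes, strengths):
    """Affinity with uniform strength inside each unit and none across units."""
    n = sum(sizes)
    A = np.zeros((n, n))
    start = 0
    for size, a in zip(sizes, strengths):
        A[start:start + size, start:start + size] = a
        start += size
    np.fill_diagonal(A, 0.0)
    return A

def most_salient_unit(A, rel_threshold=0.5, tol=1e-6):
    """Support of the first eigenvector, first eigenvalue and its multiplicity."""
    w, v = np.linalg.eigh(A)  # ascending eigenvalues
    u1 = np.abs(v[:, -1])
    members = np.flatnonzero(u1 > rel_threshold * u1.max())
    multiplicity = int(np.sum(w > w[-1] - tol))
    return members, w[-1], multiplicity

# One strongly connected unit (6 elements) and three equal weaker units (4 each).
A = block_affinity([6, 4, 4, 4], [0.9, 0.5, 0.5, 0.5])
members, lam1, mult = most_salient_unit(A)

# Update: remove the identified unit and analyse the remaining elements.
keep = np.setdiff1d(np.arange(A.shape[0]), members)
A_updated = A[np.ix_(keep, keep)]
_, lam1_updated, mult_updated = most_salient_unit(A_updated)

print(members, lam1, mult)
print(lam1_updated, mult_updated)
```

When the multiplicity of the first eigenvalue is larger than one, the first eigenvector returned by the solver is an arbitrary vector of the corresponding eigenspace, so no single unit can be declared the most salient; this is the situation described above for the three remaining inducers.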
Now we consider as stimulus the Kanizsa square and then we change the angle between the inducers, so that the subjective contours become curved (Figure 10 (a), (b), (c), (d), first row). Whether illusory figures are perceived depends on a limit curvature. Indeed we perceive a square in the first three cases, but not in the last one. The results of simulations with the fundamental solutions of the Fokker Planck and Sub-Riemannian Laplacian equations confirm the visual grouping (Figure 10 (a), (b), (c), (d), second row): when the angle between the inducers is not too high (cases (a), (b), (c)) the first eigenvector corresponds to the inducers that form the square, otherwise (case (d)) the pacmen become the most salient objects in the image. In this case, we obtain 4 eigenvectors with almost the same eigenvalue.
Now we consider a Kanizsa bar (Figure 10 (e), first row), that is perceived only if the inducers are aligned. Also in that case, the result of simulation confirms the visual perception if we use the fundamental solutions of the Fokker-Planck and the Sub-Riemannian Laplacian equations. When the inducers are not aligned, all the kernels confirm the visual perception, showing two different perceptual units (Figure 10 (f)).
For a stimulus composed of rotated or non-aligned inducers, as in Figure 10 (g), (h), it is not possible to perceive an illusory figure, and the results of simulations, using all the connectivity kernels described, confirm the visual grouping. In that case the affinity matrix has 3 eigenvectors with almost the same eigenvalue, which represent the 3 perceptual units in the scene.
4.4 Sub-Riemannian Fokker Planck versus Sub-Riemannian Laplacian
We have outlined in Sections 2.2 and 3.2 that the Fokker Planck kernel accounts for long range connectivity, while the Sub-Riemannian Laplacian accounts for short range connectivity. In the previous examples we obtained good results with both kernels, but the difference emerges when we suitably change the parameters. In Figure 11 we compare the action of these two kernels.
In the first row we see some segments, which form a unique perceptual unit. If they are not too far apart, the grouping is correctly performed both by the Fokker Planck and the Sub-Riemannian Laplacian (Figure 11 (a), (b)). When we separate the inducers, the perceptual unit is correctly detected by the Fokker Planck kernel (Figure 11 (c)), while the Sub-Riemannian Laplacian is not able to perform the grouping (Figure 11 (d)). This confirms that the Fokker Planck kernel is responsible for long range connectivity. In the second row we consider an angle. When the angle is sufficiently large, the Fokker Planck kernel becomes unable to perform the grouping (Figure 11 (e)), while the Sub-Riemannian Laplacian correctly performs the grouping of the perceptual unit (Figure 11 (f)). This confirms that the Sub-Riemannian Laplacian can be used as a model of short range connectivity.
4.5 Sub-Riemannian versus Riemannian kernels
In order to further validate the Sub-Riemannian model we show that the model applied with the isotropic Laplacian kernel does not perform correctly. As shown in Figure 12 (first row) the visual perception is not correctly modeled: the first eigenvectors coincide with one of the inducers and the squares are not recognized. That also happens for the stimulus of Figure 9 (a) and when the inducers are not co-circularly aligned or they are rotated.
In this work we have presented a neurally based model for figure-ground segmentation using spectral methods, where segmentation has been performed by computing eigenvectors of affinity matrices.
Different connectivity kernels that are compatible with the functional architecture of the primary visual cortex have been used. We have modelled them as fundamental solutions of the Fokker-Planck, Sub-Riemannian Laplacian and isotropic Laplacian equations and compared their properties.
With this model we have identified perceptual units of different Kanizsa figures, showing that this can be considered a good quantitative model for the constitution of perceptual units equipped with their saliency. We have also shown that the fundamental solutions of the Fokker-Planck and Sub-Riemannian Laplacian equations are good models for the good continuation law, while the isotropic Laplacian equation is less representative of this gestalt law. However, it retrieves information about ladder parallelism, a feature that can be analysed in the future. All three kernels are able to accomplish boundary completion, with a preference for the Fokker Planck and Sub-Riemannian Laplacian operators.
The proposed mathematical model is therefore able to integrate local and global gestalt laws as a process implemented in the functional architecture of the visual cortex. The kernel considered here depends only on orientation. Hence it can be applied to detect the saliency of geometrical figures that are well described by this feature. The same method can be applied to natural images if their main features are related to orientations, as in retinal images (see (Favali, 2015)). The ideas presented here could be extended to more general kernels able to detect geometrical features different from orientation, and we are confident that there is a relation between the eigenvector associated to the highest eigenvalue and the most salient object. However, for general images we cannot rely on this simple geometric method, since different cortical areas can be involved in the definition of the saliency, with a modulatory effect on the connectivity of V1.
- Angelucci et al. (2002) Angelucci, A., Levitt, J. B., Walton, E. J. S., Hupe, J. M., Bullier, J., & Lund, J. S. (2002). Circuits for Local and Global Signal Integration in Primary Visual Cortex. The Journal of Neuroscience, 22(19):8633–8646.
- August & Zucker (2000) August, J., & Zucker S. W. (2000). The curve indicator random field: Curve organization via edge correlation. In Perceptual organization for artificial vision systems, 265–288. Springer.
- August & Zucker (2003) August, J. & Zucker, S. W. (2003). Sketches with curvature: the curve indicator random field and Markov processes. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 25(4): 387–400.
- Barbieri (2012) Barbieri, D., Citti, G., Sanguinetti, G., & Sarti, A. (2012). An uncertainty principle underlying the functional architecture of V1. Journal of Physiology 106(5-6):183-193.
- Belkin & Niyogi (2003) Belkin, M., & Niyogi, P. (2003). Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373–1396.
- Boscain et al. (2012) Boscain, U., Duplaix, J., Gauthier, J. P., & Rossi, F. (2012). Anthropomorphic image reconstruction via hypoelliptic diffusion. SIAM Journal on Control and Optimization, 50(3):1309–1336.
- Bosking et al. (1997) Bosking, W., Zhang, Y., Schofield, B. & Fitzpatrick D. (1997). Orientation selectivity and the arrangement of horizontal connections in tree shrew striate cortex. The Journal of neuroscience, 17(6):2112–2127.
- Bressloff et al. (2002) Bressloff, P. C., Cowan, J. D., Golubitsky, M., Thomas, P. J., & Wiener M. C. (2002). What Geometric Visual Hallucinations Tell Us about the Visual Cortex. Neural Computation, 14(3): 473–491.
- Bressloff & Cowan (2003) Bressloff, P.C., & Cowan, J. D. (2003). The functional geometry of local and horizontal connections in a model of V1. Journal of Physiology-Paris, 221–236.
- Citti & Sarti (2006) Citti, G., & Sarti, A. (2006). A cortical based model of perceptual completion in the roto-translation space. Journal of Mathematical Imaging and Vision, 24(3):307–326.
- Cocci et al. (2015) Cocci, G., Barbieri, D., Citti, G., & Sarti, A. (2015). Cortical spatio-temporal dimensionality reduction for visual grouping. Neural computation.
- Coifman & Lafon (2006) Coifman, R. R., & Lafon, S. (2006). Diffusion maps. Applied and computational harmonic analysis, 21(1):5–30.
- Daugman (1985) Daugman, J. (1985). Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. JOSA A, 2(7): 1160–1169.
- Duits & Franken (2009) Duits, R., & Franken, E. M. (2009). Line Enhancement and Completion via Linear Left Invariant Scale Spaces on SE(2). In Scale Space and Variational Methods in Computer Vision, 795–807, Springer.
- Duits & van Almsick (2008) Duits, R., & van Almsick, M. (2008). The explicit solutions of linear left-invariant second order stochastic evolution equations on the 2D-Euclidean motion group. Quarterly of Applied Mathematics, 66(1):27–67.
- Ermentrout & Cowan (1980) Ermentrout, G. B., & Cowan, J. D. (1980). Large scale spatially organized activity in neural nets. SIAM Journal on Applied Mathematics, 38(1):1–21.
- Favali (2015) Favali, M., Abbasi-Sureshjani, S., ter Haar Romeny, B., & Sarti, A. (2015). Analysis of Vessel Connectivities in Retinal Images by Cortically Inspired Spectral Clustering. Journal of Mathematical Imaging and Vision, in press.
- Field et al. (1993) Field, D., Hayes, A., & Hess, R. F. (1993). Contour integration by the human visual system: evidence for a local association field. Vision Research, 33(2):173 – 193.
- Fregnac et al. (2010) Fregnac, Y., Carelli, P., Pananceau, M., & Monier, C. (2010). Stimulus-driven Coordination of Cortical Cell Assemblies and Propagation of Gestalt Belief in V1. Dynamic Coordination in the Brain: From Neurons to Mind, 169–192.
- Fregnac & Shulz (1999) Fregnac, Y., & Shulz, D. E. (1999). Activity-dependent regulation of receptive field properties of cat area 17 by supervised hebbian learning. Journal of neurobiology,41(1): 69–82.
- Gilbert et al. (1996) Gilbert, C. D., Das, A., Ito, M., Kapadia, M., & Westheimer, G. (1996). Spatial integration and cortical dynamics. Proceedings of the National Academy of Sciences, 93(2): 615–622.
- Grossberg & Mingolla (1985) Grossberg, S. & Mingolla, E. (1985). Neural dynamics of form perception: boundary completion, illusory figures, and neon color spreading. Psychological review, 92(2):173.
- Higham (2001) Higham, D. J. (2001). An algorithmic introduction to numerical simulation of stochastic differential equations. SIAM review, 43(3):525–546.
- Hoffman (1989) Hoffman, W. C. (1989). The visual cortex is a contact bundle. Applied Mathematics and Computation, 32(2):137–167.
- Hubel & Wiesel (1962) Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. The journal of physiology, 160(1):106.
- Hubel & Wiesel (1977) Hubel, D. H., & Wiesel, T. N. (1977). Ferrier lecture: Functional architecture of macaque monkey visual cortex. Proceedings of the Royal Society of London B: Biological Sciences, 198(1130):1–59.
- Jerison & Sanchez-Calle (1986) Jerison, D. S., & Sanchez-Calle, A. (1986). Estimates for the heat kernel for a sum of squares of vector fields. Indiana University mathematics journal, 35(4):835–854.
- Jones & Palmer (1987) Jones, J. P., & Palmer, L. A. (1987). An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex. Journal of neurophysiology, 58(6):1233–1258.
- Kanizsa (1980) Kanizsa G. (1980). Grammatica del vedere Il Mulino, Bologna.
- Kellman & Shipley (1991) Kellman, P. J., & Shipley, T. F. (1991). A theory of visual interpolation in object perception. Cognitive psychology, 23(2):141–221.
- Koch & Ullman (1985) Koch, C., & Ullman, S. (1985). Shifts in selective visual attention: towards the underlying neural circuitry. Human Neurobiology, 4:219–227.
- Koenderink & van Doorn (1987) Koenderink, J. J., & van Doorn, A. J. (1987). Representation of local geometry in the visual system. Biological cybernetics, 55(6):367–375.
- Koflka (1935) Koffka, K. (1935). Principles of Gestalt Psychology. New York: Har.
- Kohler (1929) Kohler, W. (1929). Gestalt Psychology. New York: Liveright.
- Lee & Nguyen (2001) Lee, T. S., & Nguyen, M. (2001). Dynamics of subjective contour formation in the early visual cortex. Proceedings of the National Academy of Sciences.
- Lorenceau & Alais (2001) Lorenceau, J., & Alais, D. (2001). Form constraints in motion binding. Nature neuroscience, 4(7):745–751.
- Meila & Shi (2001) Meila, M., & Shi, J. (2001). A random walks view of spectral segmentation. 8th International Workshop on Artificial Intelligence and Statistics.
- Merleau-Ponty (1945) Merleau-Ponty, M. (1945). Translated in Phenomenology of perception (1996). Motilal Banarsidass Publishe.
- Mumford (1994) Mumford, D. (1994). Elastica and computer vision. Algebraic geometry and its applications, 491–506, Springer.
- Parent & Zucker (1989) Parent, P., & Zucker, S. W. (1989). Trace inference, curvature consistency, and curve detection. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 11(8):823–839.
- Perona & Freeman (1998) Perona, P., & Freeman, W. (1998). A factorization approach to grouping. Computer Vision—ECCV’98, 655–670, Springer.
- Petitot (2003) Petitot, J. (2003). The neurogeometry of pinwheels as a sub-Riemannian contact structure. Journal of Physiology-Paris, 97(2):265–309.
- Petitot (2008) Petitot, J. (2008). Neurogéométrie de la vision. Modeles mathématiques et physiques des architectures fonctionelles. Paris: Éd. École Polytech.
- Petitot & Tondut (1999) Petitot, J., & Tondut, Y. (1999). Vers une neurogeometrie. Fibrations corticales, structures de contact et contours subjectifs modaux. Mathematiques informatique et sciences humaines, (145):5–102.
- Pillow & Nava (2002) Pillow, J., & Nava, R. (2002). Perceptual completion across the vertical meridian and the role of early visual cortex. Neuron, 33(5):805–813.
- Ringach (2002) Ringach, D. (2002). Spatial structure and symmetry of simple-cell receptive fields in macaque primary visual cortex. Journal of neurophysiology, 88(1):455–463.
- Robert & Casella (2013) Robert, C., & Casella, G. (2013). Monte Carlo statistical methods. Springer Science & Business Media.
- Roweis & Saul (2000) Roweis, S. T., & Saul, L. K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326.
- Sanguinetti et al. (2008) Sanguinetti, G., Citti, G., & Sarti, A. (2008). Image completion using a diffusion driven mean curvature flow in a Sub-Riemannian Space. Int. Conf. on Computer Vision Theory and Applications (VISAPP 2008), 46–53.
- Sarti & Citti (2011) Sarti, A., & Citti, G. (2011). On the origin and nature of neurogeometry. La Nuova Critica.
- Sarti & Citti (2015) Sarti, A., & Citti, G. (2015). The constitution of visual perceptual units in the functional architecture of V1. Journal of computational neuroscience, 38(2):285–300.
- Sarti et al. (2008) Sarti, A., Citti, G., & Petitot, J. (2008). The symplectic structure of the primary visual cortex. Biological Cybernetics, 98(1):33–48.
- Sarti & Piotrowski (2015) Sarti, A., & Piotrowski, D. (2015). Individuation and semiogenesis: An interplay between geometric harmonics and structural morphodynamics. Morphogenesis and Individuation, 49–73, Springer.
- Shi & Malik (2000) Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 22(8):888–905.
- Shipley & Kellman (1992) Shipley, T. F., & Kellman, P. J. (1992). Perception of partly occluded objects and illusory figures: Evidence for an identity hypothesis. Journal of Experimental Psychology: Human Perception and Performance, 18(1):106.
- Shipley & Kellman (1994) Shipley, T. F., & Kellman, P. J. (1994). Spatiotemporal boundary formation: Boundary, form, and motion perception from transformations of surface elements. Journal of Experimental Psychology: General, 123(1):3.
- Von Der Heydt et al. (1993) Von Der Heydt, R., Heitger, F., & Peterhans, E. (1993). Perception of occluding contours: Neural mechanisms and a computational model. Biomedical research, 14:1–6.
- Wagemans et al. (2012) Wagemans, J., Elder, J. H., Kubovy, M., Palmer, S. E., Peterson, M. A., Singh, M., & von der Heydt, R. (2012). Century of Gestalt Psychology in Visual Perception: I. Perceptual grouping and Figure-Ground Organization. Psychological bulletin, 138(6):1172.
- Weiss (1999) Weiss, Y. (1999). Segmentation using eigenvectors: a unifying view. Computer vision, 1999. The proceedings of the seventh IEEE international conference on, (2):975–982.
- Wertheimer (1938) Wertheimer, M. (1938). Laws of organization in perceptual forms. London: Harcourt (Brace and Jovanovich).
- Williams & Jacobs (1997) Williams, L. R., & Jacobs, D. W. (1997). Stochastic completion fields. Neural computation, 9(4):837–858.
- Zucker (2006) Zucker, S. (2006) Differential geometry from the Frenet point of view: boundary detection, stereo, texture and color. Handbook of Mathematical Models in Computer Vision, 357–373, Springer.