Normative theory of visual receptive fields
Abstract
This article gives an overview of a normative computational theory of visual receptive fields. It is described how idealized functional models of early spatial, spatiochromatic and spatiotemporal receptive fields can be derived in an axiomatic way based on structural properties of the environment in combination with assumptions about the internal structure of a vision system to guarantee consistent handling of image representations over multiple spatial and temporal scales. Interestingly, this theory leads to predictions about visual receptive field shapes with qualitatively very good similarity to biological receptive fields measured in the retina, the LGN and the primary visual cortex (V1) of mammals.
Keywords—Receptive field, Functional model, Gaussian derivative, Scale covariance, Affine covariance, Galilean covariance, Temporal causality, Illumination invariance, Retina, LGN, Primary visual cortex, Simple cell, Double-opponent cell, Vision.
Tony Lindeberg, Computational Brain Science Lab, Department of Computational Science and Technology, KTH Royal Institute of Technology, SE-100 44 Stockholm, Sweden. Email: tony@kth.se
I. Introduction
When light reaches a visual sensor such as the retina, the information necessary to infer properties about the surrounding world is not contained in the measurement of image intensity at a single point, but in the relations between intensity values at different points. A main reason for this is that the incoming light constitutes an indirect source of information, depending on the interaction between geometric and material properties of objects in the surrounding world and on external illumination sources. Another fundamental reason why cues to the surrounding world need to be collected over regions in the visual field, as opposed to at single image points, is that the measurement process by itself requires the accumulation of energy over non-infinitesimal support regions over space and time. Such a region in the visual field, for which a neuron responds to visual stimuli, is traditionally referred to as a receptive field (Hubel and Wiesel [1, 2, 3]) (see Figure 1). In this work, we focus on a functional description of receptive fields, regarding how a neuron with a purely spatial receptive field responds to visual stimuli over image space, and regarding how a neuron with a spatiotemporal receptive field responds to visual stimuli over space and time (DeAngelis et al. [4, 5]).
If one considers the theoretical and computational problem of designing a vision system that is going to make use of incoming reflected light to infer properties of the surrounding world, one may ask what types of image operations should be performed on the image data. Would any type of image operation be reasonable? Specifically regarding the notion of receptive fields one may ask what types of receptive field profiles would be reasonable. Is it possible to derive a theoretical model of how receptive fields “ought to” respond to visual data?
Initially, such a problem might be regarded as intractable, unless the question can be further specified. It is, however, possible to address this problem systematically using approaches that have been developed in the area of computer vision known as scale-space theory (Iijima [6]; Witkin [7]; Koenderink [8]; Koenderink and van Doorn [9, 10]; Lindeberg [11, 12, 13, 14]; Florack [15]; Sporring et al. [16]; Weickert et al. [17]; ter Haar Romeny [18]). A paradigm that has been developed in this field is to impose structural constraints on the first stages of visual processing that reflect symmetry properties of the environment. Interestingly, it turns out to be possible to substantially reduce the class of permissible image operations from such arguments.
The subject of this article is to describe how structural requirements on the first stages of visual processing, as formulated in scale-space theory, can be used for deriving idealized functional models of visual receptive fields, and to describe implications of how these theoretical results can be used when modelling biological vision. A main theoretical argument is that idealized functional models for linear receptive fields can be derived by necessity, given a small set of symmetry requirements that reflect properties of the world to which one may naturally require an idealized vision system to be adapted. In this respect, the treatment bears similarities to approaches in theoretical physics, where symmetry properties are often used as main arguments in the formulation of physical theories of the world. The treatment that will follow is general in the sense that spatial, spatiochromatic and spatiotemporal receptive fields are encompassed by the same unified theory.
This paper gives a condensed summary of a more general theoretical framework for receptive fields derived and presented in [13, 19, 20, 21], in turn developed to enable a consistent handling of receptive field responses in terms of provable covariance or invariance properties under natural image transformations (see Figure 2). In relation to the early publications on this topic [13, 19, 20], this paper presents an improved version of that theory, leading to an improved model for the temporal smoothing operation for the specific case of a time-causal image domain [21], where the future cannot be accessed and the receptive fields have to be based solely on information from the present moment and a compact buffer of the past. Specifically, this paper presents the improved axiomatic structure in a compact form, easier to access than the original publications, and also encompassing the better time-causal model.
It will be shown that the presented framework leads to predictions of receptive field profiles in good agreement with receptive field measurements reported in the literature (Hubel and Wiesel [1, 2, 3]; DeAngelis et al. [4, 5]; Conway and Livingstone [22]; Johnson et al. [23]). Specifically, explicit phenomenological models will be given of LGN neurons and simple cells in V1, and these will be compared to related models in terms of Gabor functions (Marčelja [24]; Jones and Palmer [25, 26]; Ringach [27, 28]), differences of Gaussians (Rodieck [29]) and Gaussian derivatives (Koenderink and van Doorn [9]; Young [30]; Young et al. [31, 32]). Notably, the evolution properties over scale of the receptive field profiles in this model can be described by diffusion equations, and are therefore suitable for implementation on a biological architecture, since the computations can be expressed in terms of communications between neighbouring computational units, where either a single computational unit or a group of computational units may be interpreted as corresponding to a neuron or a group of neurons. Specifically, computational models involving diffusion equations arise in mean-field theory for approximating the computations that are performed by populations of neurons (Omurtag et al. [33]; Mattia and Del Giudice [34]; Faugeras et al. [35]).
I-A. Structure of this article
This paper is organized as follows: Section II gives an overview of, and motivation for, the assumptions that the theory is based on. A set of structural requirements is formulated to capture the effect of natural image transformations on the illumination field that reaches the retina, and to guarantee internal consistency between image representations that are computed from receptive field responses over multiple spatial and temporal scales.
Section III describes the linear receptive field families that arise as consequences of these assumptions, for the cases of either a purely spatial domain or a joint spatiotemporal domain. The issue of how to perform relative normalization between receptive field responses over multiple spatial and temporal scales is treated, so as to enable comparisons between receptive field responses at different spatial and temporal scales. We also show how the influence of illumination transformations and exposure control mechanisms on the receptive field responses can be handled, by describing invariance properties obtained by applying the derived linear receptive fields over a logarithmically transformed intensity domain.
Section IV shows examples of how spatial, spatiochromatic and spatiotemporal receptive fields in the retina, the LGN and the primary visual cortex can be well modelled by the derived receptive field families.
Section V gives relations to previous work, including conceptual and theoretical comparisons to previous use of Gabor models of receptive fields, approaches for learning receptive fields from image data and previous applications of a logarithmic transformation of the image intensities. Finally, Section VI summarizes some of the main results.
II. Assumptions underlying the theory: Structural requirements
In the following, we shall describe a set of structural requirements that can be stated concerning: (i) spatial geometry, (ii) spatiotemporal geometry, (iii) the image measurement process with its close relationship to the notion of scale, (iv) internal representations of image data that are to be computed by a general purpose vision system and (v) the parameterization of image intensity with regard to the influence of illumination variations.
For modelling the image formation process, we will at any point on the retina approximate the spherical retina by a perspective projection onto the tangent plane of the retinal surface at that image point, below represented as the image plane. Additionally, we will approximate the possibly non-linear geometric transformations regarding spatial and spatiotemporal geometry by local linearizations at every image point, corresponding to the derivative of the possibly non-linear transformation. In these ways, the theoretical analysis can be substantially simplified, while still enabling accurate modelling of essential functional properties of receptive fields in relation to the effects of natural image transformations as arising from interactions with the environment.
II-A. Static image data over a spatial domain
In the following, we will describe a theoretical model for the computational function of applying visual receptive fields to local image patterns.
For time-independent data $f$ over a two-dimensional spatial image domain, we would like to define a family of image representations $L(\cdot;\, s)$ over a possibly multi-dimensional scale parameter $s$, where the internal image representations are computed by applying some parameterized family of image operators $T_s$ to the image data $f$:

(1)  $L(\cdot;\, s) = T_s\, f$
Specifically, we will assume that the family of image operators $T_s$ should satisfy:
IIA0a Linearity
For the earliest processing stages to make as few irreversible decisions as possible, we assume that they should be linear:

(2)  $T_s\, (a_1 f_1 + a_2 f_2) = a_1\, T_s\, f_1 + a_2\, T_s\, f_2$
Specifically, linearity implies that any particular scale-space properties (to be detailed below) that we derive for the zero-order image representation $L$ will transfer to any spatial derivative $\partial_{x^\alpha} L$ of $L$, so that

(3)  $\partial_{x^\alpha} L(\cdot;\, s) = \partial_{x^\alpha} (T_s\, f) = T_s\, (\partial_{x^\alpha} f)$
In this sense, the assumption of linearity reflects a requirement of a lack of bias towards particular types of image structures, with the underlying aim that the processing performed in the first processing stages should be generic, to be used as input for a large variety of visual tasks. By the assumption of linearity, local image structures that are captured by e.g. first- or second-order derivatives will be treated in a structurally similar manner, which would not necessarily be the case if the first local neighbourhood processing stage of the first layer of receptive fields were instead genuinely non-linear.¹

¹ Note that the assumption about linearity of some first layers of receptive fields does not exclude the possibility of defining later-stage non-linear receptive fields that operate on the output from the linear receptive fields, such as the computations performed by complex cells in the primary visual cortex. Neither does this assumption of linearity exclude the possibility of transforming the raw image intensities by a pointwise non-linear mapping function prior to the application of linear receptive fields based on processing over local neighbourhoods. In Section III-D it will be specifically shown that a pointwise logarithmic transformation of the image intensities prior to the application of linear receptive fields has theoretical advantages, in terms of invariance properties of derivative-based receptive field responses under local multiplicative illumination transformations.
This genericity property is closely related to the basic property of the mammalian vision system, that the computations performed in the retina, the LGN and the primary visual cortex provide general purpose output that is used as input to higherlevel visual areas.
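The commutativity between spatial differentiation and a linear shift-invariant smoothing stage expressed by (3) can be checked numerically. The following Python sketch uses a sampled Gaussian merely as an example of a smoothing kernel (anticipating Section III); the grid size, scale value and test signal are illustrative choices:

```python
import numpy as np

def gauss(x, s):
    """Sampled one-dimensional Gaussian kernel with variance s."""
    return np.exp(-x**2 / (2.0 * s)) / np.sqrt(2.0 * np.pi * s)

# A smooth test signal on a unit-spaced grid, and a smoothing kernel.
x = np.arange(200, dtype=float)
f = np.sin(0.2 * x) + 0.5 * np.cos(0.05 * x)
g = gauss(np.arange(-20, 21, dtype=float), 4.0)

L = np.convolve(f, g, mode='same')          # L = T_s f

# Differentiation commutes with the linear smoothing operation:
# d/dx (T_s f) = T_s (d/dx f), here using central differences.
dL_of_f = np.gradient(L)
L_of_df = np.convolve(np.gradient(f), g, mode='same')

# Agreement away from the domain boundary (boundary handling differs):
err = np.max(np.abs(dL_of_f[40:-40] - L_of_df[40:-40]))
```

In the interior of the domain the two computations agree to machine precision, since both discrete operators are linear and shift-invariant.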
IIA0b Shift invariance
To ensure that the visual interpretation of an object should be the same irrespective of its position in the image plane, we assume that the first processing stages should be shift invariant, so that if an object is moved a distance $\Delta x$ in the image plane, the receptive field response should remain of a similar form while shifted by the same distance. Formally, this requirement can be stated as the condition that the family of image operators $T_s$ should commute with the shift operator $S_{\Delta x}$, defined by $(S_{\Delta x} f)(x) = f(x - \Delta x)$:

(4)  $T_s\, S_{\Delta x} = S_{\Delta x}\, T_s$
In other words, if we shift the input $f$ by a translation $\Delta x$ and then apply the receptive field operator $T_s$, the result should be the same as first applying the receptive field operator to the original input and then shifting the result.
IIA0c Convolution structure
Together, the assumptions of linearity and shift invariance imply that $T_s$ will correspond to a convolution operator [36]. This implies that the representation $L$ can be computed from the image data $f$ by convolution with some parameterized family of convolution kernels $T(\cdot;\, s)$:

(5)  $L(\cdot;\, s) = T(\cdot;\, s) * f$
IIA0d Semigroup structure over spatial scales
To ensure that the transformation from any finer scale $s_1$ to any coarser scale $s_2 \geq s_1$ should be of the same form for any $s_1$ and $s_2$ (a requirement of algebraic closedness), we assume that the result of convolving two kernels $T(\cdot;\, s_1)$ and $T(\cdot;\, s_2)$ from the family with each other should be a kernel within the same family of kernels, with added parameter values $s_1 + s_2$:

(6)  $T(\cdot;\, s_1) * T(\cdot;\, s_2) = T(\cdot;\, s_1 + s_2)$
This assumption specifically implies that the representation at a coarse scale $s_2$ can be computed from the representation at a finer scale $s_1$ by a convolution operation of the same form (5) as the transformation from the original image data, while using the difference in scale levels $s_2 - s_1$ as the parameter:

(7)  $L(\cdot;\, s_2) = T(\cdot;\, s_2 - s_1) * L(\cdot;\, s_1)$
This property does in turn imply that if we are able to derive specific properties of the family of transformations (to be detailed below), then these properties will not only hold for the transformation from the original image data to the representations at coarser scales, but also between any pair of scale levels $s_1 \leq s_2$, with the aim that image representations at coarser scales should be possible to regard as simplifications of corresponding image representations at finer scales.
In terms of mathematical concepts, this form of algebraic structure is referred to as a semigroup structure over spatial scales:

(8)  $T_{s_1}\, T_{s_2} = T_{s_1 + s_2}$
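The semigroup property (6) can be illustrated numerically for sampled Gaussian kernels, which anticipate the kernel family derived in Section III. In this Python sketch the grid length and the scale values are arbitrary illustrative choices; the discrete convolution on a unit-spaced grid closely approximates the continuous one:

```python
import numpy as np

def gauss(x, s):
    """Sampled one-dimensional Gaussian kernel with variance s."""
    return np.exp(-x**2 / (2.0 * s)) / np.sqrt(2.0 * np.pi * s)

# Grid long enough that truncation effects are negligible.
x = np.arange(-50, 51, dtype=float)
g1 = gauss(x, 4.0)    # kernel at scale s1 = 4
g2 = gauss(x, 9.0)    # kernel at scale s2 = 9

# Convolving the two kernels should give the kernel at scale s1 + s2.
composed = np.convolve(g1, g2, mode='same')
g12 = gauss(x, 13.0)  # kernel at scale s1 + s2 = 13

max_err = np.max(np.abs(composed - g12))
```

Up to discretization effects (which are tiny at these scales), the convolution of the two kernels coincides with the kernel having the added scale parameter.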
IIA0e Scale covariance under spatial scaling transformations
If a visual observer looks at the same object from different distances, we would like the internal representations derived from the receptive field responses to be sufficiently similar, so that the object can be recognized as the same object while appearing with a different size on the retina. Specifically, it is thereby natural to require that the receptive field responses should be of a similar form while resized in the image plane.
This corresponds to a requirement of spatial scale covariance under uniform scaling transformations $x' = S_x\, x$ of the spatial domain:

(9)  $L'(x';\, s') = L(x;\, s)$

to hold for some transformation $s'$ of the scale parameter $s$ (for a scalar scale parameter, $s' = S_x^2\, s$).
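The scale covariance requirement (9) can be verified numerically for Gaussian smoothing (the kernel family derived in Section III). In this Python sketch, a one-dimensional test pattern is rescaled by an integer factor and smoothed at the correspondingly matched scale; the grid, pattern and scale values are illustrative choices:

```python
import numpy as np

def gauss(x, s):
    return np.exp(-x**2 / (2.0 * s)) / np.sqrt(2.0 * np.pi * s)

def smooth(f, s):
    """Discrete approximation of Gaussian smoothing on a unit-spaced grid."""
    x = np.arange(-100, 101, dtype=float)
    return np.convolve(f, gauss(x, s), mode='same')

x = np.arange(-100, 101, dtype=float)
S = 2.0                          # spatial scaling factor

f  = gauss(x, 6.0)               # a smooth test pattern f(x)
fp = gauss(x / S, 6.0)           # rescaled pattern f'(x) = f(x / S)

L  = smooth(f,  3.0)             # L(.; s) at scale s = 3
Lp = smooth(fp, S**2 * 3.0)      # L'(.; s') at the matched scale s' = S^2 s

# Covariance: L'(S x; S^2 s) = L(x; s).  Compare on the subgrid where
# both S x and x fall on integer grid positions.
xs = np.arange(-50, 51)
err = np.max(np.abs(Lp[(S * xs + 100).astype(int)] - L[xs + 100]))
```

With the scale parameter matched as $s' = S^2 s$, the smoothed responses of the original and the rescaled patterns agree pointwise under the rescaling of the spatial coordinates.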
IIA0f Affine covariance under affine transformations
If a visual observer looks at the same local surface patch from two different viewing directions, then the local surface patch may be deformed in different ways in the different views, with different amounts of perspective foreshortening from the different viewing directions. If we approximate the local deformations caused by the perspective mapping by local affine transformations, then the transformation between the two differently deformed views of the local surface patch can in turn be described by a composed local affine transformation $x' = A\, x$. If we are to use receptive field responses as a basis for higher-level visual operations, it is natural to require that the receptive field response of an affinely deformed image patch should remain of a similar form, while being reshaped by a corresponding affine transformation.
This corresponds to a requirement of affine covariance under general affine transformations $x' = A\, x$:

(10)  $L'(x';\, s') = L(x;\, s)$

to hold for some transformation $s'$ of the scale parameter (for the affine Gaussian scale space considered below, the covariance matrix transforms as $\Sigma' = A\, \Sigma\, A^{\mathrm T}$).
IIA0g Non-creation of new structure with increasing scale
If we apply the family of transformations for computing representations at coarser scales from representations at finer scales according to (5) and (7), there could be a potential risk that the family of transformations could amplify spurious structures in the input, producing macroscopic amplifications in the representations at coarser scales that do not directly correspond to simplifications of corresponding structures in the original image data. To prevent such undesirable phenomena from occurring, we require that local spurious structures must not be amplified, and express this condition in terms of the evolution properties over scales at local maxima and minima of the image intensities as smoothed by the family of convolution kernels $T(\cdot;\, s)$: If a point $x_0$ for some scale $s_0$ is a local maximum point in the image plane, then the value at this maximum point must not increase towards coarser scales $s > s_0$. Similarly, if a point is a local minimum point in the image plane, then the value at this minimum point must not decrease towards coarser scales.
Formally, this requirement that new structures should not be created from finer to coarser scales can be formalized as the requirement of non-enhancement of local extrema, which implies that if at some scale $s_0$ a point $x_0$ is a local maximum (minimum) for the mapping from $x$ to $L(x;\, s_0)$, then (see Figure 3):

$\partial_s L \leq 0$ at any spatial maximum,

$\partial_s L \geq 0$ at any spatial minimum.
This condition implies a strong restriction on the class of possible smoothing kernels $T(\cdot;\, s)$.
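A direct numerical consequence of non-enhancement of local extrema is that the global maximum of a smoothed signal must be non-increasing, and the global minimum non-decreasing, with increasing scale. The following Python sketch checks this for sampled Gaussian smoothing of a noisy signal (the seed, grid and scale values are arbitrary illustrative choices; the kernels are renormalized to unit mass to compensate for discretization):

```python
import numpy as np

def gauss(x, s):
    return np.exp(-x**2 / (2.0 * s)) / np.sqrt(2.0 * np.pi * s)

x = np.arange(-100, 101, dtype=float)
rng = np.random.default_rng(0)
f = rng.standard_normal(201)           # arbitrary noisy 1-D input

scales = [1.0, 2.0, 4.0, 8.0, 16.0]
maxima, minima = [], []
for s in scales:
    g = gauss(x, s)
    g /= g.sum()                       # unit mass on the discrete grid
    L = np.convolve(f, g, mode='same')
    maxima.append(L.max())
    minima.append(L.min())

# The global maximum must be non-increasing and the global minimum
# non-decreasing as the scale parameter increases.
max_ok = all(a >= b for a, b in zip(maxima, maxima[1:]))
min_ok = all(a <= b for a, b in zip(minima, minima[1:]))
```

Since each coarser-scale representation is (approximately) a convex combination of finer-scale values, the extreme values contract monotonically with scale.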
II-B. Time-dependent image data over space-time
To model the computational function of spatiotemporal receptive fields on time-dependent image patterns, we first inherit, for a time-dependent spatiotemporal domain, the structural requirements regarding a spatial domain, and complement the spatial scale parameter $s$ with a temporal scale parameter $\tau$. In addition, we assume:
IIB0a Scale covariance under temporal scaling transformations
If a similar type of spatiotemporal event occurs at different speeds, faster or slower, it is natural to require that the receptive field responses should be of a similar form, while occurring correspondingly faster or slower.
This corresponds to a requirement of temporal scale covariance under a temporal scaling transformation $t' = S_t\, t$ of the temporal domain:

(11)  $L'(x, t';\, s, \tau') = L(x, t;\, s, \tau)$

to hold for some transformation of the spatiotemporal scale parameters $(s, \tau)$ (here $\tau' = S_t^2\, \tau$).
IIB0b Galilean covariance under Galilean transformations
If an observer looks at the same object in the world under different relative motions between the object and the observer, it is natural to require that the internal representations of the object should be sufficiently similar so as to enable a coherent perception of the object under these different relative motions. Specifically, we may require that the receptive field responses under relative motions should remain of the same form, while being transformed in a corresponding way as the relative motion pattern.
If we at any point in space-time locally linearize the possibly non-linear motion pattern by a local Galilean transformation over space-time

(12)  $x' = x + u\, t, \qquad t' = t$
then the requirement of guaranteeing a consistent visual interpretation under different relative motions between the object and the observer can be stated as a requirement of Galilean covariance:

(13)  $L'(x', t';\, s, \tau;\, v') = L(x, t;\, s, \tau;\, v)$

to hold for some transformation of the spatiotemporal scale parameters $(s, \tau)$, with the velocity parameter transformed as $v' = v + u$.
IIB0c Semigroup structure over temporal scales in the case of a non-causal temporal domain
To ensure that the representations between different spatiotemporal scale levels $(s_1, \tau_1)$ and $(s_2, \tau_2)$ should be sufficiently well-behaved internally, we will make use of different types of assumptions, depending on whether the temporal domain is regarded as time-causal or non-causal. Over a time-causal temporal domain, the future cannot be accessed, which is the basic condition for real-time visual perception by a biological organism. Over a non-causal temporal domain, the temporal kernels may extend into the relative future in relation to any pre-recorded time moment, which is sometimes used as a conceptual simplification when analysing pre-recorded time-dependent data, although not at all realistic in a real-world setting.
For the case of a non-causal temporal domain, we make use of a similar type of semigroup property as (6) over a purely spatial domain, while extending the semigroup property over both the spatial scale parameter $s$ and the temporal scale parameter $\tau$:

(14)  $T(\cdot, \cdot;\, s_1, \tau_1) * T(\cdot, \cdot;\, s_2, \tau_2) = T(\cdot, \cdot;\, s_1 + s_2, \tau_1 + \tau_2)$
In analogy with the case of a purely spatial domain, this requirement guarantees that the transformation from any finer spatiotemporal scale level $(s_1, \tau_1)$ to any coarser spatiotemporal scale level $(s_2, \tau_2)$ will always be of the same form (algebraic closedness):

(15)  $L(\cdot, \cdot;\, s_2, \tau_2) = T(\cdot, \cdot;\, s_2 - s_1, \tau_2 - \tau_1) * L(\cdot, \cdot;\, s_1, \tau_1)$
Specifically, this assumption implies that if we are able to establish desirable properties of the family of transformations (to be detailed below), then these relations hold between any pair of spatiotemporal scale levels $(s_1, \tau_1)$ and $(s_2, \tau_2)$ with $s_1 \leq s_2$ and $\tau_1 \leq \tau_2$.
IIB0d Cascade structure over temporal scales in the case of a time-causal temporal domain
Since it can be shown that the assumption of a semigroup structure over temporal scales leads to undesirable temporal dynamics, in terms of e.g. longer temporal delays, for a time-causal temporal domain [37, Appendix A], we do for a time-causal temporal domain instead assume a weaker cascade smoothing property over temporal scales for the temporal smoothing kernel $h(\cdot;\, \tau)$:

(16)  $h(\cdot;\, \tau_2) = h(\cdot;\, \tau_1 \mapsto \tau_2) * h(\cdot;\, \tau_1)$
where the temporal kernels should, for any triplet of temporal scale values $\tau_1 \leq \tau_2 \leq \tau_3$, obey the transitive property

(17)  $h(\cdot;\, \tau_1 \mapsto \tau_2) * h(\cdot;\, \tau_2 \mapsto \tau_3) = h(\cdot;\, \tau_1 \mapsto \tau_3)$
This weaker assumption of a cascade smoothing property (16) still ensures that a representation at a coarser temporal scale $\tau_2$ should, together with an accompanying simplifying condition on the family of kernels (to be detailed below), constitute a simplification of the representation at a finer temporal scale $\tau_1$, while not implying as hard constraints as a semigroup structure.
IIB0e Non-enhancement of local space-time extrema in the case of a non-causal temporal domain
In the case of a non-causal temporal domain, we again build on the notion of non-enhancement of local extrema to guarantee that the representations at coarser spatiotemporal scales should constitute true simplifications of corresponding representations at finer scales. Over a spatiotemporal domain, we do, however, state the requirement in terms of local extrema over joint space-time instead of local extrema over image space. If a point $(x_0, t_0)$ for some scale $(s_0, \tau_0)$ is a local maximum point over space-time, then the value at this maximum point must not increase towards coarser scales. Similarly, if a point is a local minimum point over space-time, then the value at this minimum point must not decrease towards coarser scales.
Formally, this requirement of non-creation of new structure from finer to coarser spatiotemporal scales can be stated as follows: If at some scale $(s_0, \tau_0)$ a point $(x_0, t_0)$ is a local maximum (minimum) for the mapping from $(x, t)$ to $L(x, t;\, s_0, \tau_0)$, then

$(\alpha\, \partial_s + \beta\, \partial_\tau)\, L \leq 0$ at any spatiotemporal maximum,

$(\alpha\, \partial_s + \beta\, \partial_\tau)\, L \geq 0$ at any spatiotemporal minimum,

should hold in any positive spatiotemporal direction, defined from any non-negative linear combination of $\partial_s$ and $\partial_\tau$ (i.e. for all $\alpha \geq 0$ and $\beta \geq 0$). This condition implies a strong restriction on the class of possible smoothing kernels $T(\cdot, \cdot;\, s, \tau)$.
IIB0f Non-creation of new local extrema or zero-crossings for a purely temporal signal in the case of a time-causal temporal domain
In the case of a time-causal temporal domain, we instead state a requirement for purely temporal signals, based on the cascade smoothing property (16). We require that, for a purely temporal signal $f(t)$, the transformation from a finer temporal scale $\tau_1$ to a coarser temporal scale $\tau_2$ must not increase the number of local extrema or the number of zero-crossings in the signal.
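This variation-diminishing property can be illustrated numerically with a recursive first-order integrator, which anticipates the truncated exponential kernels derived in Section III; the time constants, signal length and seed in this Python sketch are arbitrary illustrative choices:

```python
import numpy as np

def num_extrema(f):
    """Count the local extrema of a discrete signal via sign changes
    of its first differences (ignoring exact ties)."""
    d = np.sign(np.diff(f))
    d = d[d != 0]
    return int(np.sum(d[1:] != d[:-1]))

def first_order_integrator(f, mu):
    """Time-recursive smoothing corresponding to a truncated exponential
    kernel with time constant mu (unit time step)."""
    out = np.zeros_like(f)
    a = 1.0 / (1.0 + mu)
    acc = 0.0
    for i, v in enumerate(f):
        acc += a * (v - acc)
        out[i] = acc
    return out

rng = np.random.default_rng(1)
f = rng.standard_normal(400)          # arbitrary noisy temporal signal

n0 = num_extrema(f)
n1 = num_extrema(first_order_integrator(f, mu=4.0))
n2 = num_extrema(first_order_integrator(first_order_integrator(f, mu=4.0),
                                        mu=8.0))
```

Each smoothing stage is a geometric moving average, whose kernel is variation diminishing, so the number of local extrema cannot increase from one temporal scale level to the next.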
III. Idealized receptive field families
III-A. Spatial image domain
Based on the above assumptions in Section II-A, it can be shown [13] that, when complemented with certain regularity assumptions in terms of Sobolev norms, they imply that a spatial scale-space representation as determined by these must satisfy a diffusion equation of the form

(18)  $\partial_s L = \tfrac{1}{2}\, \nabla^{\mathrm T} (\Sigma_0\, \nabla L) - \delta_0^{\mathrm T}\, \nabla L$

for some positive semi-definite covariance matrix $\Sigma_0$ and some translation vector $\delta_0$. In terms of convolution kernels, this corresponds to Gaussian kernels of the form

(19)  $g(x;\, s, \Sigma_0, \delta_0) = \dfrac{1}{2 \pi s \sqrt{\det \Sigma_0}}\; e^{-(x - s \delta_0)^{\mathrm T} \Sigma_0^{-1} (x - s \delta_0)/2s}$
which for a given $\Sigma_0$ and a given $\delta_0$ satisfy (18). If we additionally require these kernels to be mirror symmetric through the origin, then we obtain affine Gaussian kernels

(20)  $g(x;\, s, \Sigma) = \dfrac{1}{2 \pi s \sqrt{\det \Sigma}}\; e^{-x^{\mathrm T} \Sigma^{-1} x/2s}$
Their spatial derivatives constitute a canonical family for expressing receptive fields over a spatial domain, which can be summarized on the form

(21)  $T_{x^\alpha}(x;\, s, \Sigma) = \partial_{x^\alpha}\, g(x;\, s, \Sigma)$

Incorporating the fact that spatial derivatives of these kernels are also compatible with the assumptions underlying this theory, this leads, specifically for the case of a two-dimensional spatial image domain, to spatial receptive fields that can be compactly summarized on the form

(22)  $T_{\varphi_1^{m_1} \varphi_2^{m_2}}(x_1, x_2;\, s, \Sigma) = \partial_{\varphi_1}^{m_1} \partial_{\varphi_2}^{m_2} \left( g(x_1, x_2;\, s, \Sigma) \right)$
where

- $(x_1, x_2)$ denote the spatial coordinates,
- $s$ denotes the spatial scale,
- $\Sigma$ denotes a spatial covariance matrix determining the shape of a spatial affine Gaussian kernel,
- $m_1$ and $m_2$ denote orders of spatial differentiation,
- $\partial_{\varphi_1}$ and $\partial_{\varphi_2}$ denote spatial directional derivative operators in two orthogonal directions $\varphi_1$ and $\varphi_2$ aligned with the eigenvectors of the covariance matrix $\Sigma$,
- $g(x_1, x_2;\, s, \Sigma)$ is an affine Gaussian kernel with its size determined by the spatial scale parameter $s$ and its shape by the spatial covariance matrix $\Sigma$.
Figure 4 and Figure 5 show examples of spatial receptive fields from this family, up to the second order of spatial differentiation. Figure 4 shows partial derivatives of the Gaussian kernel for the specific case when the covariance matrix $\Sigma$ is restricted to a unit matrix, and the Gaussian kernel thereby becomes rotationally symmetric. The resulting family of receptive fields is closed under scaling transformations over the spatial domain, implying that if an object is seen from different distances to the observer, then it will always be possible to find a transformation of the scale parameter between the two image domains so that the receptive field responses computed from the two image domains can be matched. Figure 5 shows examples of affine Gaussian receptive fields for covariance matrices that do not correspond to rescaled copies of the unit matrix. The resulting full family of affine Gaussian derivative kernels is closed under general affine transformations, implying that for two different perspective views of a local smooth surface patch, it will always be possible to find a transformation of the covariance matrices between the two domains so that the receptive field responses can be matched, provided that the transformation between the two image domains is approximated by a local affine transformation.
In the most idealized version of the theory, one should think of receptive fields for all combinations of filter parameters as being present at every image point, as illustrated in Figure 6 concerning affine Gaussian receptive fields over different orientations in image space and different eccentricities.
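An affine Gaussian derivative receptive field of the form (22) can be constructed numerically as follows. In this Python sketch, the orientation, eccentricity, scale value and grid size are arbitrary illustrative choices, and the sampled kernel is renormalized to unit mass as a discretization convenience:

```python
import numpy as np

def affine_gaussian(s, Sigma, half=16):
    """Sampled affine Gaussian kernel g(x; s, Sigma) on a 2-D grid."""
    r = np.arange(-half, half + 1, dtype=float)
    X1, X2 = np.meshgrid(r, r, indexing='ij')
    Sinv = np.linalg.inv(s * Sigma)
    q = Sinv[0, 0]*X1**2 + 2*Sinv[0, 1]*X1*X2 + Sinv[1, 1]*X2**2
    g = np.exp(-0.5 * q)
    return g / g.sum()               # unit mass on the discrete grid

# Covariance matrix for an elongated kernel oriented along phi = 30 degrees,
# with eigenvalue ratio 4:1 determining the eccentricity.
phi = np.deg2rad(30.0)
R = np.array([[np.cos(phi), -np.sin(phi)],
              [np.sin(phi),  np.cos(phi)]])
Sigma = R @ np.diag([1.0, 0.25]) @ R.T

g = affine_gaussian(s=8.0, Sigma=Sigma)

# First-order directional derivative along the major eigendirection phi,
# approximated by central differences of the sampled kernel.
g1, g2 = np.gradient(g)
dg_phi = np.cos(phi) * g1 + np.sin(phi) * g2

# A first-order derivative kernel has zero mean and odd point symmetry.
zero_mean = abs(dg_phi.sum())
```

Varying the orientation and eccentricity of $\Sigma$ generates the family of oriented receptive fields illustrated in Figure 6.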
III-B. Spatiotemporal image domain
Over a non-causal spatiotemporal domain, corresponding arguments as in Section III-A lead to a similar form of diffusion equation as in Equation (18), while expressed over the joint space-time domain and with $\delta_0$ interpreted as a local drift velocity. After splitting the composed affine Gaussian spatiotemporal smoothing kernel corresponding to (18), while expressed over the joint space-time domain, into separate smoothing operations over space and time, this leads to zero-order spatiotemporal receptive fields of the form [13, 19]:

(23)  $T(x_1, x_2, t;\, s, \tau;\, v, \Sigma) = g(x_1 - v_1 t, x_2 - v_2 t;\, s, \Sigma)\, h(t;\, \tau)$
After combining that result with the results from the corresponding theoretical analysis for a time-causal spatiotemporal domain in [13, 21], the resulting spatiotemporal derivative kernels, constituting the spatiotemporal extension of the spatial receptive field model (22), can be reparameterized and summarized on the following form (see [13, 19, 20, 21]):

(24)  $T_{\varphi_1^{m_1} \varphi_2^{m_2} \bar t^{\,n}}(x_1, x_2, t;\, s, \tau;\, v, \Sigma) = \partial_{\varphi_1}^{m_1} \partial_{\varphi_2}^{m_2} \partial_{\bar t}^{\,n} \left( g(x_1 - v_1 t, x_2 - v_2 t;\, s, \Sigma)\, h(t;\, \tau) \right)$
where

- $(x_1, x_2)$ denote the spatial coordinates,
- $t$ denotes time,
- $s$ denotes the spatial scale,
- $\tau$ denotes the temporal scale,
- $v = (v_1, v_2)^{\mathrm T}$ denotes a local image velocity,
- $\Sigma$ denotes a spatial covariance matrix determining the shape of a spatial affine Gaussian kernel,
- $m_1$ and $m_2$ denote orders of spatial differentiation,
- $n$ denotes the order of temporal differentiation,
- $\partial_{\varphi_1}$ and $\partial_{\varphi_2}$ denote spatial directional derivative operators in two orthogonal directions $\varphi_1$ and $\varphi_2$ aligned with the eigenvectors of the covariance matrix $\Sigma$,
- $\partial_{\bar t} = \partial_t + v_1\, \partial_{x_1} + v_2\, \partial_{x_2}$ is a velocity-adapted temporal derivative operator aligned to the direction of the local image velocity $v$,
- $g(x_1, x_2;\, s, \Sigma)$ is an affine Gaussian kernel with its size determined by the spatial scale parameter $s$ and its shape determined by the spatial covariance matrix $\Sigma$,
- $g(x_1 - v_1 t, x_2 - v_2 t;\, s, \Sigma)$ denotes a spatial affine Gaussian kernel that moves with image velocity $v$ in space-time and
- $h(t;\, \tau)$ is a temporal smoothing kernel over time, corresponding to a Gaussian kernel in the case of non-causal time, or a cascade of first-order integrators or, equivalently, truncated exponential kernels coupled in cascade according to (26) over a time-causal temporal domain.
This family of spatiotemporal scalespace kernels can be seen as a canonical family of linear receptive fields over a spatiotemporal domain.
For the case of a time-causal temporal domain, the result states that truncated exponential kernels of the form

(25)  $h_{\exp}(t;\, \mu_k) = \begin{cases} \frac{1}{\mu_k}\, e^{-t/\mu_k} & t \geq 0 \\ 0 & t < 0 \end{cases}$

coupled in cascade constitute the natural temporal smoothing kernels. These do in turn lead to a composed temporal convolution kernel of the form

(26)  $h_{\mathrm{composed}}(t;\, \mu) = *_{k=1}^{K}\, h_{\exp}(t;\, \mu_k)$

corresponding to a set of first-order integrators coupled in cascade (see Figure 7).
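Under convolution of the truncated exponential kernels (25), the temporal means and variances add, so the cascade (26) has temporal delay $\sum_k \mu_k$ and temporal variance $\sum_k \mu_k^2$. A minimal Python sketch, with arbitrary illustrative time constants and a fine temporal grid, checks these properties numerically:

```python
import numpy as np

dt = 0.05
t = np.arange(0.0, 200.0, dt)          # fine temporal grid

def trunc_exp(mu):
    """Sampled truncated exponential kernel exp(-t/mu)/mu for t >= 0."""
    return np.exp(-t / mu) / mu

mus = [2.0, 4.0, 8.0]                  # time constants mu_k of the cascade
h = trunc_exp(mus[0])
for mu in mus[1:]:
    # Cascade coupling: discrete approximation of continuous convolution.
    h = np.convolve(h, trunc_exp(mu))[:len(t)] * dt

mass = h.sum() * dt                        # should be close to 1
mean = (t * h).sum() / h.sum()             # temporal delay: sum of mu_k = 14
var  = ((t - mean)**2 * h).sum() / h.sum() # temporal variance: sum mu_k^2 = 84
```

The small deviations from the continuous values stem only from the discretization of the kernels on the temporal grid.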
Two natural ways of distributing the discrete time constants $\mu_k$ over temporal scales are studied in detail in [21, 37], corresponding to either a uniform or a logarithmic distribution in terms of the composed variance

(27)  $\tau = \sum_{k=1}^{K} \mu_k^2$
Specifically, it is shown in [21] that in the case of a logarithmic distribution of the discrete temporal scale levels, it is possible to consider an infinite number of temporal scale levels that cluster infinitely densely near zero temporal scale

(28)  $\tau_k = c^{2(k-K)}\, \tau_{\max} \qquad (c > 1)$
so that a scale-invariant time-causal limit kernel can be defined, obeying self-similarity and scale covariance over temporal scales, with a Fourier transform of the form

(29)  $\hat h_{-\infty}(\omega;\, \tau, c) = \prod_{k=1}^{\infty} \dfrac{1}{1 + i\, c^{-k} \sqrt{c^2 - 1}\, \sqrt{\tau}\, \omega}$
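A finite-order approximation of such a logarithmic distribution of temporal scale levels can be set up as follows. In this Python sketch the distribution parameter, the coarsest scale and the number of levels are arbitrary illustrative choices; the time constants of the individual first-order integrators follow from the additivity of variances (27) under convolution:

```python
import numpy as np

c = 2.0            # distribution parameter c > 1
tau_max = 64.0     # the coarsest temporal scale
K = 8              # number of discrete temporal scale levels used

# Logarithmically distributed temporal scale levels tau_k = c^{2(k-K)} tau_max.
tau = np.array([c**(2 * (k - K)) * tau_max for k in range(1, K + 1)])

# Time constants of the individual first-order integrators, obtained from
# the differences between successive temporal scale levels (variances add
# under convolution): mu_k = sqrt(tau_k - tau_{k-1}).
mu = np.sqrt(np.diff(np.concatenate(([0.0], tau))))

total = np.sum(mu**2)          # composed variance of the cascade
ratios = tau[1:] / tau[:-1]    # successive scale ratios: constant c^2
```

The composed variance of the cascade equals the coarsest temporal scale, and the ratio between successive temporal scale levels is the constant factor $c^2$, which is what makes the limit kernel self-similar over temporal scales.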
Figure 8 and Figure 9 show spatiotemporal kernels over a 1+1-dimensional spatiotemporal domain, using approximations of the time-causal limit kernel for smoothing over the temporal domain and the Gaussian kernel for smoothing over the spatial domain. Figure 8 shows space-time separable receptive fields corresponding to image velocity $v = 0$, whereas Figure 9 shows non-separable velocity-adapted receptive fields corresponding to a non-zero image velocity $v \neq 0$.
The family of space-time separable receptive fields for zero image velocities is closed under spatial scaling transformations for arbitrary spatial scaling factors, as well as under temporal scaling transformations with temporal scaling factors that are integer powers of the distribution parameter $c$ of the time-causal limit kernel. The full family of velocity-adapted receptive fields for general non-zero image velocities is additionally closed under Galilean transformations, corresponding to variations in the relative motion between the objects in the world and the observer. Given that the full families of receptive fields are explicitly represented in the vision system, this means that it will be possible to perfectly match receptive field responses computed under the following types of natural image transformations: (i) objects of different size in the image domain, as arising from e.g. viewing the same object from different distances, (ii) spatiotemporal events that occur with different speed, faster or slower, and (iii) objects and spatiotemporal events that are viewed under different relative motions between the objects/events and the visual observer.
If, additionally, the spatial smoothing is performed over the full family of spatial covariance matrices Σ, then receptive field responses can also be matched (iv) between different views of the same smooth local surface patch.
III-C Scale normalisation of spatial and spatiotemporal receptive fields
When computing receptive field responses over multiple spatial and temporal scales, there is an issue about how the receptive field responses should be normalized so as to enable appropriate comparisons between receptive field responses at different scales. Issues of scale normalisation of the derivative-based receptive fields defined from scale-space operations are treated in [40, 41, 42] regarding spatial receptive fields and in [21, 37, 43] regarding spatiotemporal receptive fields.
III-C0a Scale-normalized spatial receptive fields
Let \lambda_1 and \lambda_2 denote the eigenvalues of the composed affine covariance matrix s\Sigma in the spatial receptive field model (III-A), and let \partial_{\varphi_1} and \partial_{\varphi_2} denote directional derivative operators along the corresponding eigendirections. Then, the scale-normalized spatial derivative kernel corresponding to the receptive field model (III-A) is given by

T_{\varphi_1^{m_1} \varphi_2^{m_2}, \mathrm{norm}}(x;\; s) = \lambda_1^{m_1 \gamma_s / 2}\, \lambda_2^{m_2 \gamma_s / 2}\, \partial_{\varphi_1}^{m_1} \partial_{\varphi_2}^{m_2}\, g(x;\; s\Sigma)   (30)

where \gamma_s denotes the spatial scale normalization parameter of \gamma-normalized derivatives; specifically, the choice \gamma_s = 1 leads to maximum scale invariance in the sense that the magnitude response of the spatial receptive field will be covariant under uniform spatial scaling transformations x' = S x, provided that the spatial scale levels are appropriately matched, s' = S^2 s.
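The scale-invariance property of gamma-normalized derivatives can be verified in closed form. The following sketch (my illustration, not from the original text) uses a 1-D Gaussian test blob f(x) = g(x; t0): smoothing it with scale s yields a Gaussian of variance t0 + s, the scale-normalized second-derivative magnitude (gamma = 1) peaks at s = 2 t0, and rescaling the blob while matching s' = S^2 s leaves the maximum magnitude unchanged:

```python
import numpy as np

def gxx0(v):
    """Second derivative at the origin of a 1-D Gaussian with variance v."""
    return -1.0 / (np.sqrt(2 * np.pi) * v**1.5)

t0 = 1.0                       # variance of the Gaussian test blob
s = np.linspace(0.1, 10.0, 2000)
# Scale-normalized (gamma = 1) second-derivative response at the blob center:
L = s * gxx0(t0 + s)
s_hat = s[np.argmax(np.abs(L))]
print(s_hat)                   # maximum magnitude at s = 2 * t0

# Scale covariance: the rescaled blob f(x / S) equals S * g(x; S^2 * t0),
# so at the matched scales s' = S^2 * s the normalized response magnitude
# is exactly the same.
S = 3.0
L_scaled = (S**2 * s) * S * gxx0(S**2 * (t0 + s))
print(np.max(np.abs(L_scaled)) / np.max(np.abs(L)))   # -> 1.0
```

The argmax at s = 2 t0 is the standard scale-selection property of the normalized second derivative; the ratio of 1.0 is exact because all quantities are evaluated analytically.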
III-C0b Scale-normalized spatiotemporal receptive fields in the case of a non-causal spatiotemporal domain
For the case of a non-causal spatiotemporal domain, where the temporal smoothing operation in the spatiotemporal receptive field model is performed by a non-causal Gaussian temporal kernel g(t;\; \tau), the scale-normalized spatiotemporal derivative kernel corresponding to the spatiotemporal receptive field model (III-B) is, with corresponding notation for the spatial domain as in (30), given by

T_{\varphi_1^{m_1} \varphi_2^{m_2} t^n, \mathrm{norm}}(x, t;\; s, \tau) = \lambda_1^{m_1 \gamma_s / 2}\, \lambda_2^{m_2 \gamma_s / 2}\, \tau^{n \gamma_\tau / 2}\, \partial_{\varphi_1}^{m_1} \partial_{\varphi_2}^{m_2} \partial_t^n \left( g(x - u t;\; s\Sigma)\, g(t;\; \tau) \right)   (31)

where \gamma_s and \gamma_\tau denote the spatial and temporal scale normalization parameters of \gamma-normalized derivatives; specifically, the choice \gamma_s = 1 and \gamma_\tau = 1 leads to maximum scale invariance in the sense that the magnitude response of the spatiotemporal receptive field will be invariant under independent scaling transformations of the spatial and the temporal domains, (x', t') = (S_x x, S_t t), provided that both the spatial and the temporal scale levels are appropriately matched, s' = S_x^2 s and \tau' = S_t^2 \tau.
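The invariance under independent spatial and temporal rescalings can again be checked in closed form. This sketch (my addition, under the assumption of a separable space-time Gaussian test blob) evaluates the scale-normalized mixed second-order response at the blob center before and after independent rescalings with matched scale levels:

```python
import numpy as np

def gxx0(v):
    """Second derivative at the origin of a 1-D Gaussian with variance v."""
    return -1.0 / (np.sqrt(2 * np.pi) * v**1.5)

# Separable space-time Gaussian blob f(x, t) = g(x; t0x) * g(t; t0t);
# smoothing with spatial scale s and temporal scale tau gives variances
# (t0x + s, t0t + tau), so the scale-normalized response L_{xx tt, norm}
# at the blob center is (with gamma_s = gamma_tau = 1):
t0x, t0t, s, tau = 1.0, 0.5, 0.7, 0.3
L = s * tau * gxx0(t0x + s) * gxx0(t0t + tau)

# Independently rescaled blob f(x / Sx, t / St) = Sx * St *
# g(x; Sx^2 t0x) * g(t; St^2 t0t); evaluate at the matched scales
# s' = Sx^2 s and tau' = St^2 tau:
Sx, St = 2.0, 5.0
L2 = (Sx**2 * s) * (St**2 * tau) * Sx * St \
     * gxx0(Sx**2 * (t0x + s)) * gxx0(St**2 * (t0t + tau))
print(L2 / L)   # -> 1.0
```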
III-C0c Scale-normalized spatiotemporal receptive fields in the case of a time-causal spatiotemporal domain
For the case of a time-causal spatiotemporal domain, where the temporal smoothing operation in the spatiotemporal receptive field model is performed by truncated exponential kernels coupled in cascade (26), the corresponding scale-normalized spatiotemporal derivative kernel for the spatiotemporal receptive field model (III-B) is given by

T_{\varphi_1^{m_1} \varphi_2^{m_2} t^n, \mathrm{norm}}(x, t;\; s, \tau) = \lambda_1^{m_1 \gamma_s / 2}\, \lambda_2^{m_2 \gamma_s / 2}\, \alpha_{n, \gamma_\tau}(\tau)\, \partial_{\varphi_1}^{m_1} \partial_{\varphi_2}^{m_2} \partial_t^n \left( g(x - u t;\; s\Sigma)\, h(t;\; \tau) \right)   (32)
where \gamma_s and \gamma_\tau denote the spatial and temporal scale normalization parameters of \gamma-normalized derivatives, and \alpha_{n, \gamma_\tau}(\tau) is the temporal scale normalization factor, which for the case of variance-based normalization is given by

\alpha_{n, \gamma_\tau}(\tau) = \tau^{n \gamma_\tau / 2}   (33)
in agreement with (III-C0b), while for the case of L_p-normalization it is given by [21, Equation (76)]

\alpha_{n, \gamma_\tau}(\tau) = \frac{G_{n, \gamma_\tau}}{\left\| \partial_{t^n} h(\cdot;\; \tau) \right\|_p}   (34)
with G_{n, \gamma_\tau} denoting the L_p-norm of the n-th order scale-normalized derivative of a non-causal Gaussian temporal kernel with scale normalization parameter \gamma_\tau. In the specific case when the temporal smoothing is performed using the scale-invariant limit kernel (29), the magnitude response will, for the maximally scale-invariant choice of scale normalization parameters \gamma_s = 1 and \gamma_\tau = 1, be invariant under independent scaling transformations of the spatial and the temporal domains for general spatial scaling factors and for temporal scaling factors that are integer powers of the distribution parameter c of the scale-invariant limit kernel, provided that both the spatial and the temporal scale levels are appropriately matched.
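The reference constant G_{n, gamma_tau} can be computed numerically. The following sketch (my illustration, for n = 1, gamma_tau = 1, p = 1) verifies that the L1-norm of the scale-normalized first-order Gaussian temporal derivative is the same at every temporal scale tau, which is what makes it usable as a scale-independent normalization constant:

```python
import numpy as np

def gauss_deriv1(t, tau):
    """First-order derivative of a non-causal Gaussian temporal kernel
    with variance tau."""
    g = np.exp(-t**2 / (2 * tau)) / np.sqrt(2 * np.pi * tau)
    return -t / tau * g

# Scale-normalized (gamma_tau = 1) L1-norms, approximated by a Riemann sum:
t = np.linspace(-40.0, 40.0, 400001)
dt = t[1] - t[0]
norms = [tau**0.5 * np.sum(np.abs(gauss_deriv1(t, tau))) * dt
         for tau in (0.25, 1.0, 9.0)]
print(norms)   # each value close to sqrt(2/pi) ≈ 0.798, independent of tau
```

The analytic value follows from the L1-norm of the Gaussian derivative being twice the peak value of the Gaussian, 2 / sqrt(2 pi tau), so multiplying by tau^{1/2} cancels the scale dependence.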
III-D Invariance to local multiplicative illumination variations or variations in exposure parameters
The treatment so far has been concerned with modelling receptive fields under natural geometric image transformations, modelled as local scaling transformations, local affine transformations and local Galilean transformations, which represent the essential dimensions in the variability of a local linearization of the perspective mapping from a local surface patch in the world to the tangent plane of the retina. A complementary issue concerns how to model receptive field responses under variations in the external illumination and under variations in the internal exposure mechanisms of the eye, which adapt the diameter of the pupil and the sensitivity of the photoreceptors to the external illumination. In this section, we present a solution to this problem for the subset of intensity transformations that can be modelled as local multiplicative intensity transformations.
To obtain a theoretically well-founded handling of image data under illumination variations, it is natural to represent the image data on a logarithmic luminosity scale

f(x, y, t) \sim \log I(x, y, t).   (35)
Specifically, receptive field responses that are computed from such a logarithmic parameterization of the image luminosities can be interpreted physically as a superposition of relative variations of surface structure and illumination variations. Let us assume: (i) a perspective camera model extended with (ii) a thin circular lens for gathering incoming light from different directions and (iii) a Lambertian illumination model extended with (iv) a spatially varying albedo factor for modelling the light that is reflected from surface patterns in the world. Then, it can be shown [19, Section 2.3] that a spatiotemporal receptive field response

L(x, y, t;\; s, \tau) = (T_{s, \tau}\, f)(x, y, t)   (36)

of the image data f(x, y, t), where T_{s, \tau} represents the spatiotemporal smoothing operator (here corresponding to a spatiotemporal smoothing kernel of the form (III-B)), can be expressed as

L(x, y, t;\; s, \tau) = T_{s, \tau} \left( \log \rho(x, y, t) + \log i(x, y, t) + \log C_{\mathrm{cam}}(\tilde{f}(t)) + V(x, y) \right)   (37)
where

- \rho(x, y, t) is a spatially dependent albedo factor that reflects properties of surfaces of objects in the environment, with the implicit understanding that this entity may in general refer to points on different surfaces in the world depending on the viewing direction and thus the (possibly time-dependent) image position,

- i(x, y, t) denotes a spatially dependent illumination field, with the implicit understanding that the amount of incoming light on different surfaces may be different for different points in the world as mapped to corresponding image coordinates over time,

- C_{\mathrm{cam}}(\tilde{f}(t)) represents the possibly time-dependent internal camera parameters, with the ratio \tilde{f}/d referred to as the effective f-number, where d denotes the diameter of the lens and \tilde{f} the focal distance, and

- V(x, y) represents a geometric natural vignetting effect, corresponding to the factor \cos^4 \phi(x, y) for a planar image plane, with \phi(x, y) denoting the angle between the viewing direction and the surface normal of the image plane. This vignetting term disappears for a spherical camera model.
From the structure of Equation (37) we can note that, for any non-zero order of spatial differentiation with at least either m_1 > 0 or m_2 > 0, the influence of the internal camera parameters in C_{\mathrm{cam}}(\tilde{f}(t)) will disappear because of the spatial differentiation with respect to x or y, and so will the effects of any other multiplicative exposure control mechanism. Furthermore, for any multiplicative illumination variation i' = C\, i, where C is a scalar constant, the logarithmic luminosity will be transformed as \log i' = \log C + \log i, which implies that the dependency on C will disappear after spatial or temporal differentiation.
Thus, given that the image measurements are performed on a logarithmic brightness scale, the spatiotemporal receptive field responses will be automatically invariant under local multiplicative illumination variations as well as under local multiplicative variations in the exposure parameters of the retina and the eye.
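This invariance property is easy to demonstrate numerically. The following sketch (my addition; the finite difference is a stand-in for a Gaussian-derivative receptive field, and the synthetic image and constant C are arbitrary choices) shows that derivative responses computed on a logarithmic brightness scale are unaffected by a multiplicative intensity change:

```python
import numpy as np

rng = np.random.default_rng(0)
I = rng.uniform(0.1, 1.0, size=(64, 64))   # synthetic positive image
C = 7.3                                    # multiplicative illumination change

f1 = np.log(I)
f2 = np.log(C * I)                         # = log C + log I

# Any derivative-based receptive field response (here a plain finite
# difference standing in for a Gaussian derivative operator) cancels the
# additive log C term:
dx1 = np.diff(f1, axis=1)
dx2 = np.diff(f2, axis=1)
max_err = np.max(np.abs(dx1 - dx2))
print(max_err)   # ~0, up to floating-point rounding
```

The same cancellation holds for temporal derivatives when C varies only multiplicatively, which is the content of the invariance claim above.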
IV Computational modelling of biological receptive fields
In two comprehensive reviews, DeAngelis et al. [4, 5] present overviews of spatial and temporal response properties of (classical) receptive fields in the central visual pathways. Specifically, the authors point out the limitations of defining receptive fields in the spatial domain only and emphasize the need to characterize receptive fields in the joint space-time domain, to describe how a neuron processes the visual image. Conway and Livingstone [22] and Johnson et al. [23] show results of corresponding investigations concerning spatiochromatic receptive fields.
In the following, we will describe how the above derived idealized functional models of linear receptive fields can be used for modelling the spatial, spatiochromatic and spatiotemporal response properties of biological receptive fields. Indeed, it will be shown that the derived idealized functional models lead to predictions of receptive field profiles that are qualitatively very similar to all the receptive field types presented in (DeAngelis et al. [4, 5]) and schematic simplifications of most of the receptive fields shown in (Conway and Livingstone [22]) and (Johnson et al. [23]).
IV-A Spatial and spatiotemporal receptive fields in the LGN
Regarding visual receptive fields in the lateral geniculate nucleus (LGN), DeAngelis et al. [4, 5] report that most neurons (i) have approximately circular center-surround organization in the spatial domain and that (ii) most of the receptive fields are separable in space-time. There are two main classes of temporal responses for such cells: (i) a “non-lagged cell” is defined as a cell for which the first temporal lobe is the largest one (Figure 11(left)), whereas (ii) a “lagged cell” is defined as a cell for which the second lobe dominates (Figure 11(right)).
When using a time-causal temporal smoothing kernel, the first peak of a first-order temporal derivative will be strongest, whereas the second peak of a second-order temporal derivative will be strongest (see [21, Figure 2]). Thus, according to this theory, non-lagged LGN cells can be seen as corresponding to first-order time-causal temporal derivatives, whereas lagged LGN cells can be seen as corresponding to second-order time-causal temporal derivatives.
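The first-order (non-lagged) case can be checked numerically. In this sketch (my illustration; the choice of four equal time constants is an assumption made for simplicity, giving a gamma-density kernel rather than the limit kernel itself), the first temporal lobe of the first-order derivative clearly dominates:

```python
import numpy as np

# Time-causal smoothing kernel: a cascade of four truncated exponential
# filters with equal time constants mu = 1 yields the gamma density
# h(t) = t^3 exp(-t) / 6.  Its first-order temporal derivative models a
# non-lagged cell, where the first (positive) lobe should dominate.
t = np.linspace(0.0, 30.0, 30001)
h = t**3 * np.exp(-t) / 6.0
dh = np.gradient(h, t)

first_lobe = dh[t < 3.0].max()     # h rises until its peak at t = 3
second_lobe = -dh[t > 3.0].min()   # magnitude of the later negative lobe
print(first_lobe, second_lobe)     # first lobe is the stronger one
```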
The spatial response, on the other hand, shows a high similarity to a Laplacian of a Gaussian, leading to an idealized receptive field model of the form [19, Equation (108)]

h_{\mathrm{LGN}}(x, y, t;\; s, \tau) = \pm \left( \partial_{xx} + \partial_{yy} \right) g(x, y;\; s)\, \partial_{t^n}\, h(t;\; \tau).   (38)
Figure 11 shows a comparison of the spatial component of a receptive field in the LGN with a Laplacian of the Gaussian. This model can also be used for modelling spatial on-center/off-surround and off-center/on-surround receptive fields in the retina. Figure 11 shows results of modelling space-time separable receptive fields in the LGN in this way, using a cascade of truncated exponential kernels of the form (26) for temporal smoothing over the temporal domain.
Regarding the spatial domain, the model in terms of spatial Laplacians of Gaussians is closely related to differences of Gaussians, which have previously been shown to constitute a good approximation of the spatial variation of receptive fields in the retina and the LGN (Rodieck [29]). This property follows from the fact that the rotationally symmetric Gaussian satisfies the isotropic diffusion equation

\partial_s L = \tfrac{1}{2} \nabla^2 L = \tfrac{1}{2} \left( \partial_{xx} + \partial_{yy} \right) L,   (39)

which implies that differences of Gaussians can be interpreted as approximations of derivatives over scale and hence of Laplacian responses. Conceptually, this implies very good agreement with the spatial component of the LGN model (38) in terms of Laplacians of Gaussians. More recently, Bonin et al. [44] have found that LGN responses in cats are well described by difference-of-Gaussians models combined with temporal smoothing, complemented by a nonlinear contrast gain control mechanism (not modelled here).
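The difference-of-Gaussians/Laplacian relation can be verified directly from the diffusion equation. This sketch (my addition, using the 1-D analogue d g / d s = (1/2) g_xx for brevity; the 2-D case is identical term by term) confirms that a small-increment difference of Gaussians approximates half the second spatial derivative:

```python
import numpy as np

def g(x, s):
    """1-D Gaussian with variance s."""
    return np.exp(-x**2 / (2 * s)) / np.sqrt(2 * np.pi * s)

def gxx(x, s):
    """Second spatial derivative of the 1-D Gaussian."""
    return (x**2 / s**2 - 1.0 / s) * g(x, s)

# By the 1-D diffusion equation d g / d s = (1/2) g_xx, a small-increment
# difference of Gaussians approximates half the second spatial derivative:
x = np.linspace(-10.0, 10.0, 2001)
s, ds = 2.0, 0.01
dog = (g(x, s + ds) - g(x, s)) / ds
err = np.max(np.abs(dog - 0.5 * gxx(x, s)))
print(err)   # small: DoG ≈ (1/2) g_xx
```

The residual error is of order ds, so shrinking the scale increment tightens the match, which is why difference-of-Gaussians models and Laplacian-of-Gaussian models of retinal and LGN receptive fields behave nearly identically.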