Geometric-Algebra Adaptive Filters

Wilder B. Lopes (R&D Dept. at UCit (ucit.fr) and OpenGA.org, wil@openga.org) and Cassio G. Lopes (Dept. of Electronic Systems Engineering, University of Sao Paulo, Brazil, cassio@lps.usp.br).
Abstract

This paper presents a new class of adaptive filters, namely Geometric-Algebra Adaptive Filters (GAAFs). They are generated by formulating the underlying minimization problem (a deterministic cost function) from the perspective of Geometric Algebra (GA), a comprehensive mathematical language well-suited for the description of geometric transformations. Also, differently from standard adaptive-filtering theory, Geometric Calculus (the extension of GA to differential calculus) allows for applying the same derivation techniques regardless of the type (subalgebra) of the data, i.e., real numbers, complex numbers, quaternions, etc. Relying on those characteristics (among others), a deterministic quadratic cost function is posed, from which the GAAFs are devised, providing a generalization of regular adaptive filters to subalgebras of GA. From the obtained update rule, it is shown how to recover the following least-mean-squares (LMS) adaptive filter variants: real-entries LMS, complex LMS, and quaternion LMS. Mean-square analysis and simulations in a system identification scenario are provided, showing very good agreement for different levels of measurement noise.

Index Terms: Adaptive filtering, geometric algebra, quaternions.

I Introduction

Adaptive filters (AFs), usually described via linear algebra (LA) and standard vector calculus [1], can be interpreted as instances of geometric estimation, since the weight vector to be estimated represents a directed line in an underlying vector space, i.e., an n-dimensional vector encodes the length and direction of an edge in an n-dimensional polytope (see Fig. 1). However, to estimate the polytope as a whole (edges, faces, and so on), a regular AF designed in light of LA might not be very helpful.

Indeed, LA has limitations regarding the representation of geometric structures [2, p. 20]. Take for instance the matrix product between two vectors: it always results in either a scalar or a matrix (2-dimensional array of numbers). Thus, one may wonder if it is possible to construct a new kind of product that takes two vectors (directed lines or edges) and returns a hypersurface (the face of an n-dimensional polytope); or takes a vector and a hypersurface and returns another polytope. Similar ideas have been present since the advent of algebra, in an attempt to establish a deep connection with geometry [3, 4].

The geometric product, which is the product operation of GA [5, 6], captures the aforementioned idea. It allows one to map a set of vectors not only onto scalars, but also onto hypersurfaces and n-dimensional polytopes. The use of GA increases the portfolio of geometric shapes and transformations one can represent. Also, its extension to calculus, namely geometric calculus (GC), allows for a clear and compact way to perform calculus with hypercomplex quantities, i.e., elements that generalize complex numbers for higher dimensions [3, 4, 5, 2, 7, 6, 8, 9, 10].

GA-based AFs were first introduced in [11, 12], where they were successfully employed to estimate the geometric transformation (rotation and translation) that aligns a pair of 3D point clouds (a recurrent problem in computer vision), while keeping a low computational footprint. This work takes a step further: it uses GA and GC to generate a new class of AFs able to naturally estimate hypersurfaces, hypervolumes, and elements of greater dimensions (multivectors), generalizing regular AFs from the literature. Filters like the regular least-mean squares (LMS) – real entries, the Complex LMS (CLMS) – complex entries [13], and the Quaternion LMS (QLMS) – quaternion entries [14, 15, 16], are recovered as special cases of the more comprehensive GA-LMS introduced herein.

Fig. 1: A polyhedron (3-dimensional polytope) can be completely described by the geometric multiplication of its edges (oriented lines, vectors), which generate the faces and hypersurfaces (in the case of a general n-dimensional polytope).

The text is organized as follows. Section II covers the transition from linear to geometric algebra, presenting the fundamentals of GA and GC. In Section III, the standard quadratic cost function, usually adopted in adaptive filtering, is recovered as a particular case of a more comprehensive quadratic cost function that can only be written using the geometric product. In Section IV the gradient of that cost function is calculated via GC techniques and the GA-LMS is formulated. Section V provides a mean-square analysis (steady state) with the support of the energy conservation relations [1]. Experiments are shown in Section VI to assess the performance of the GAAFs against the theoretical analysis in a system-identification scenario. Finally, concluding remarks are presented in Section VII.

Remarks about notation

Table I summarizes the notation convention. The quantities listed are deterministic; their boldface versions represent random quantities. The name array of multivectors was chosen to avoid confusion with vectors in $\mathbb{R}^n$, which in this text have the usual meaning of a collection of real numbers (row or column). In this sense, an array of multivectors can be interpreted as a “vector” that allows for hypercomplex entries (numbers constructed by adding “imaginary units” to real numbers [17]).

Symbol    Definition
a         Arrays of multivectors, vectors, and scalars.
A         General multivector or matrix.
r         Rotor (type of multivector).
          Time-varying scalar, vector (or array of multivectors), and general multivector (or matrix), respectively.

TABLE I: Summary of notation.

II From Linear to Geometric Algebra

To start the transition from LA to GA, one needs to recall the definition of an algebra over the reals [18, 6, 9]: a vector space $\mathcal{A}$ over the field $\mathbb{R}$, equipped with a bilinear map (the product operation of the algebra), is said to be an algebra over $\mathbb{R}$ if the following relations hold for all $x, y, z \in \mathcal{A}$ and all $\alpha, \beta \in \mathbb{R}$,

$(x + y)z = xz + yz, \qquad x(y + z) = xy + xz, \qquad (\alpha x)(\beta y) = (\alpha\beta)(xy). \qquad (1)$

Linear (matrix) algebra, utilized to describe adaptive-filtering theory, is constructed from the definition above. The elements of this algebra are matrices and vectors, which, multiplied among themselves via the matrix product, generate new matrices and vectors. Additionally, to express the notion of vector length and angle between vectors, LA adopts a bilinear form, i.e., the inner product, which returns a real number as a result of the multiplication between two vectors in $\mathbb{R}^n$ (one says that $\mathbb{R}^n$ is a normed vector space) [18, p. 180].

Geometric (Clifford) algebra derives from (1), however with a different product operation. Such a product, called the geometric product, is what allows GA to be a mathematical language that unifies different algebraic systems aiming to express geometric relations/transformations, e.g., rotation and translation. The following systems are examples of algebras integrated into GA theory: vector/matrix algebra, complex numbers, and quaternions [5, 2, 6, 9]. Such a unifying quality is put into use in this work to expand the capabilities of AFs.

The fundamentals of GA are provided in the sequel. For an in-depth discussion, the reader is referred to [5, 2, 7, 6, 8, 9, 10, 19, 20, 21, 22, 23, 3, 4].

II-A Fundamentals of Geometric Algebra

Consider vectors in $\mathbb{R}^n$, i.e., arrays with $n$ real entries. The inner product $a \cdot b$ is the standard bilinear form that describes vector length and the angle between vectors in linear (matrix) algebra. This way, $a \cdot b$ results in a scalar,

$a \cdot b = |a||b|\cos\theta, \qquad (2)$

in which $\theta$ is the angle between $a$ and $b$, and $|\cdot|$ denotes the vector magnitude (norm). Additionally, the inner product is commutative, i.e., $a \cdot b = b \cdot a$.

The outer product $a \wedge b$ is the usual product of the exterior algebra introduced by Grassmann's Ausdehnungslehre (theory of extension) [3, 24, 2, 9]. It captures the geometric fact that two nonparallel directed segments determine a parallelogram, a notion which cannot be described by the inner product. The multiplication $a \wedge b$ results in an oriented surface, or bivector (see Fig. 2a). Such a surface can be interpreted as the parallelogram (hyperplane) generated when vector $a$ is swept along the direction determined by vector $b$. Alternatively, the outer product can be defined as a function of the angle $\theta$ between $a$ and $b$,

$a \wedge b = |a||b|\sin(\theta)\,B, \qquad (3)$

where $B$ is the unit bivector (the result of the outer product between two unit vectors, i.e., vectors with unit norm) that defines the orientation of the hyperplane [2, p. 66]. Note that in the particular case of 3D Euclidean space, (3) is reduced to the cross product $a \times b = |a||b|\sin(\theta)\,n$, where $n$ is the unit vector normal to the plane containing $a$ and $b$. From Fig. 2a it can be concluded that the outer product is noncommutative, i.e., $a \wedge b = -(b \wedge a)$: the orientation of the surface generated by sweeping $a$ along $b$ is opposite to the orientation of the surface generated by sweeping $b$ along $a$.
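As a quick numerical illustration (not part of the original development), the sketch below represents $a \wedge b$ in $\mathbb{R}^3$ by its three coefficients on the bivector basis $\{e_{12}, e_{23}, e_{31}\}$ and checks the antisymmetry and the magnitude relation with the cross product; the storage layout is an illustrative assumption.

```python
import numpy as np

def outer(a, b):
    # coefficients of a ^ b on the bivector basis (e12, e23, e31)
    return np.array([a[0]*b[1] - a[1]*b[0],
                     a[1]*b[2] - a[2]*b[1],
                     a[2]*b[0] - a[0]*b[2]])

a, b = np.array([1.0, 2.0, 0.5]), np.array([-0.3, 0.7, 1.1])
assert np.allclose(outer(a, b), -outer(b, a))          # a ^ b = -(b ^ a)
assert np.isclose(np.linalg.norm(outer(a, b)),         # |a ^ b| = |a||b| sin(theta),
                  np.linalg.norm(np.cross(a, b)))      # same magnitude as the cross product
```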

Finally, the geometric product (in this text, from now on, all products are geometric products, unless otherwise noted) of vectors $a$ and $b$ is denoted by their juxtaposition, $ab$, and is defined as

$ab = a \cdot b + a \wedge b, \qquad (4)$

in terms of the inner ($\cdot$) and outer ($\wedge$) products [6]. Note that, due to the noncommutativity of the outer product, the geometric product is noncommutative as well, resulting in $ab \neq ba$ in general, and it is associative, $(ab)c = a(bc)$.
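The geometric product of (4) can be prototyped compactly. The following is a minimal sketch for $\mathcal{G}(\mathbb{R}^3)$, assuming an orthonormal basis and Euclidean metric, with blades indexed by a bitmask (an illustrative layout, not the paper's); it verifies that, for vectors, the scalar part of $ab$ equals the inner product and that $ab \neq ba$ in general.

```python
import numpy as np

# Blade indexing by bitmask (bit k set means e_{k+1} is present), so a multivector
# is a length-8 coefficient array:
# index 0 -> 1, 1 -> e1, 2 -> e2, 3 -> e12, 4 -> e3, 5 -> e13, 6 -> e23, 7 -> e123.
def reorder_sign(a, b):
    """Sign picked up when reordering the basis vectors of blades a and b (Euclidean metric)."""
    s, a = 1, a >> 1
    while a:
        s *= (-1) ** bin(a & b).count("1")
        a >>= 1
    return s

def gp(x, y):
    """Geometric product of two multivectors given as length-8 coefficient arrays."""
    out = np.zeros(8)
    for i in range(8):
        for j in range(8):
            if x[i] and y[j]:
                out[i ^ j] += reorder_sign(i, j) * x[i] * y[j]
    return out

a = np.array([0, 1.0, 2.0, 0, 0.5, 0, 0, 0])    # a = 1 e1 + 2 e2 + 0.5 e3
b = np.array([0, -0.3, 0.7, 0, 1.1, 0, 0, 0])   # b = -0.3 e1 + 0.7 e2 + 1.1 e3
ab, ba = gp(a, b), gp(b, a)
assert np.isclose(ab[0], np.dot(a[[1, 2, 4]], b[[1, 2, 4]]))   # scalar part = inner product
assert not np.allclose(ab, ba)                                  # noncommutative in general
assert np.allclose(ab + ba, 2 * ab[0] * np.eye(8)[0])           # ab + ba = 2 a.b for vectors
```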

In linear algebra the fundamental elements are matrices/vectors. In a similar way, in GA the basic elements are the so-called multivectors (Clifford numbers). The structure of a multivector can be seen as a generalization of complex numbers and quaternions to higher dimensions. For instance, a complex number $a + ib$ has a scalar part $a$ combined with an imaginary part $ib$; a quaternion expands that by adding two extra imaginary-valued parts. Multivectors generalize this structure: they contain one scalar part and several other parts organized by grade. Thus, a general multivector has the form

$A = \langle A\rangle_{0} + \langle A\rangle_{1} + \langle A\rangle_{2} + \cdots = \sum_{g}\langle A\rangle_{g}, \qquad (5)$

which is comprised of its $g$-grades (or $g$-vectors) $\langle A\rangle_{g}$, e.g., $\langle A\rangle_{0}$ (scalars), $\langle A\rangle_{1}$ (vectors), $\langle A\rangle_{2}$ (bivectors, generated via the geometric multiplication of two vectors), $\langle A\rangle_{3}$ (trivectors, generated via the geometric multiplication of three vectors), and so on. The $0$-grade (scalar) part is also denoted $\langle A\rangle$. Additionally, in $\mathcal{G}(\mathbb{R}^{3})$, $\langle A\rangle_{g} = 0$ for $g > 3$ [2]. The ability to group together scalars, vectors, and hypercomplex quantities in a single element (multivector) is the foundation on top of which GA theory is built – it allows for “summing apples and oranges” in a well-defined fashion. Section II-B will show how to recover complex numbers and quaternions as special cases of (5).

The multivectors that form the basis of a geometric algebra over the vector space $\mathbb{R}^n$, denoted $\mathcal{G}(\mathbb{R}^n)$, are obtained by multiplying the vectors that compose the orthonormal basis of $\mathbb{R}^n$ via the geometric product (4). This action generates $2^n$ multivectors, which implies that $\dim \mathcal{G}(\mathbb{R}^n) = 2^n$ [5, p. 19]. Those multivectors are called the blades of $\mathcal{G}(\mathbb{R}^n)$. For the special case $n = 3$, with orthonormal basis $\{e_1, e_2, e_3\}$, the procedure above yields the following blades

$\{1,\ e_{1}, e_{2}, e_{3},\ e_{12}, e_{23}, e_{31},\ e_{123}\}, \qquad e_{ij} \triangleq e_{i}e_{j}, \qquad I \triangleq e_{123} \triangleq e_{1}e_{2}e_{3}, \qquad (6)$

which together form a basis for $\mathcal{G}(\mathbb{R}^3)$, with dimension $2^3 = 8$. Note that (6) has one scalar, three orthonormal vectors (a basis for $\mathbb{R}^3$), three bivectors (oriented surfaces) $e_{12}, e_{23}, e_{31}$, and one trivector (pseudoscalar) $I$ (Fig. 2b). In general, the pseudoscalar is defined as the multivector of highest grade in an algebra $\mathcal{G}(\mathbb{R}^n)$. It commutes with any multivector in $\mathcal{G}(\mathbb{R}^3)$, hence the name pseudoscalar [5, p. 17].
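Grade projection is straightforward in such a representation: the grade of a blade equals the number of basis vectors it contains. A short sketch, under the same illustrative blade-bitmask layout introduced earlier:

```python
import numpy as np

def grade_project(x, g):
    """Return <x>_g for a multivector x stored as 8 blade coefficients (bitmask-indexed)."""
    mask = np.array([bin(i).count("1") == g for i in range(8)])
    return np.where(mask, x, 0.0)

A = np.arange(1.0, 9.0)                                              # a generic multivector
assert np.allclose(sum(grade_project(A, g) for g in range(4)), A)    # A = sum of its grades
assert np.count_nonzero(grade_project(A, 2)) == 3                    # three bivector blades
```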

To illustrate the geometric multiplication between elements of $\mathcal{G}(\mathbb{R}^3)$, take two multivectors: for instance, by (4), the product of two vectors is a scalar plus a bivector. Another example is provided to highlight the reverse of a multivector $A$, which is the GA counterpart of complex conjugation in linear algebra, defined as

$\widetilde{A} = \sum_{g}(-1)^{\frac{g(g-1)}{2}}\langle A\rangle_{g}. \qquad (7)$

Thus, given a multivector in $\mathcal{G}(\mathbb{R}^3)$, its reverse is obtained by flipping the sign of its bivector and trivector parts, while the scalar and vector parts remain unchanged; moreover, the reverse of a product satisfies $\widetilde{AB} = \widetilde{B}\widetilde{A}$. Note that since the $0$-grade of a multivector is not affected by reversion, mutually reverse multivectors, say $A$ and $\widetilde{A}$, have the same $0$-grade, $\langle A\rangle = \langle\widetilde{A}\rangle$.
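The reversion rule of (7) amounts to a per-grade sign. A minimal sketch, again assuming the blade-bitmask layout of the earlier sketches:

```python
import numpy as np

def reverse(x):
    """Reverse of a multivector stored as 8 blade coefficients: grade g picks up (-1)^(g(g-1)/2)."""
    signs = np.array([(-1) ** (g * (g - 1) // 2)
                      for g in (bin(i).count("1") for i in range(8))])
    return signs * x

A = np.array([1.0, 0, 2.0, -1.0, 0, 0, 3.0, 0.5])          # scalar, e2, e12, e23, e123 terms
R = reverse(A)
assert np.allclose(R[[0, 1, 2, 4]], A[[0, 1, 2, 4]])       # grades 0 and 1 unchanged
assert np.allclose(R[[3, 5, 6, 7]], -A[[3, 5, 6, 7]])      # grades 2 and 3 flip sign
```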

Fig. 2: (a) Visualization of the outer product $a \wedge b$ in $\mathbb{R}^3$. The orientation of the circle defines the orientation of the surface (bivector). (b) The elements of the basis (6) of $\mathcal{G}(\mathbb{R}^3)$ (besides the scalar $1$): vectors, bivectors (oriented surfaces), and the trivector (pseudoscalar/oriented volume) [2, p. 42].

II-B Subalgebras and Isomorphisms

The GAAFs are designed to compute with any multivector-valued quantity, regardless of whether it is real-, complex-, or quaternion-valued, etc. Indeed, real ($\mathbb{R}$), complex ($\mathbb{C}$), and quaternion ($\mathbb{H}$) algebras, commonly used in the adaptive filtering and optimization literature [25, 14, 15, 26, 16, 27, 28], are subsets (subalgebras) of the GA created by general multivectors like (5). Thus, to support the generalization of standard AFs achieved by GAAFs (Section IV), this section shows how those subalgebras of interest can be retrieved from $\mathcal{G}(\mathbb{R}^3)$, the complete GA of $\mathbb{R}^3$, via isomorphisms. (For simulation purposes, this paper focuses on the case $n = 3$, i.e., the subalgebras of $\mathcal{G}(\mathbb{R}^3)$; however, the GAAFs can work with any subalgebra of $\mathcal{G}(\mathbb{R}^n)$ (Section IV).)

The complete geometric algebra of $\mathbb{R}^3$ is obtained by multiplying the elements of (6) via the geometric product. As depicted in Fig. 3, this results in several terms which, linearly combined, represent all the possible multivectors in $\mathcal{G}(\mathbb{R}^3)$. Also, note that since $\mathcal{G}(\mathbb{R}^2) \subset \mathcal{G}(\mathbb{R}^3)$, the multiplication table for $\mathcal{G}(\mathbb{R}^2)$ can be recovered from Fig. 3.
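Since Fig. 3 conveys the full basis multiplication table, a small script can regenerate its content. The sketch below assumes the Euclidean metric and the illustrative bitmask blade ordering used earlier (so the row/column ordering differs from the figure, which lists $e_{31}$ instead of $e_{13}$).

```python
NAMES = {0: "1", 1: "e1", 2: "e2", 3: "e12", 4: "e3", 5: "e13", 6: "e23", 7: "e123"}

def reorder_sign(a, b):
    s, a = 1, a >> 1
    while a:
        s *= (-1) ** bin(a & b).count("1")
        a >>= 1
    return s

def blade_product(i, j):
    # product of two basis blades: a signed basis blade
    return ("-" if reorder_sign(i, j) < 0 else "") + NAMES[i ^ j]

print("      " + "  ".join(f"{NAMES[j]:>5}" for j in range(8)))
for i in range(8):
    print(f"{NAMES[i]:>5} " + "  ".join(f"{blade_product(i, j):>5}" for j in range(8)))
# e.g. e12 * e12 = -1, e123 * e123 = -1, and the pseudoscalar e123 commutes with
# every basis blade, as stated in the text.
```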

The even subalgebras [6, 2, 9] of $\mathcal{G}(\mathbb{R}^2)$ and $\mathcal{G}(\mathbb{R}^3)$, i.e., those whose elements have only even grades ($g$ even in (5)), denoted $\mathcal{G}^{+}(\mathbb{R}^2)$ and $\mathcal{G}^{+}(\mathbb{R}^3)$, are of special interest. One can show that $\mathcal{G}^{+}(\mathbb{R}^2)$, with basis $\{1, e_{12}\}$, is isomorphic to the complex numbers [6]. This is established by identifying the imaginary unit $i$ with the bivector $e_{12}$. From Fig. 3 it is known that $(e_{12})^2 = -1$. Then, a general element $a + b\,e_{12}$ of $\mathcal{G}^{+}(\mathbb{R}^2)$ multiplies exactly like the complex number $a + bi$, demonstrating the isomorphism. Similarly, $\mathcal{G}^{+}(\mathbb{R}^3)$, with basis $\{1, e_{12}, e_{23}, e_{31}\}$, is shown to be isomorphic to quaternion algebra by identifying each of the three imaginary units of a quaternion with the negative of one of the bivectors $e_{23}$, $e_{31}$, $e_{12}$. The minus signs are necessary to make the product of two bivectors equal to the third one and not minus the third, e.g., $(-e_{23})(-e_{31}) = -e_{12}$, just like quaternions, i.e., $ij = k$, $jk = i$, and $ki = j$ [6, 29, 30]. Additionally, note that $\dim \mathcal{G}^{+}(\mathbb{R}^3) = 4$, matching the dimension of the quaternion algebra.
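These isomorphisms can be checked numerically. The sketch below uses the same illustrative blade-coefficient representation as before; the quaternion correspondence $i \leftrightarrow -e_{23}$, $j \leftrightarrow -e_{31}$, $k \leftrightarrow -e_{12}$ is one sign convention consistent with the remark above about the minus signs, not necessarily the exact pairing adopted in the paper.

```python
import numpy as np

def reorder_sign(a, b):
    s, a = 1, a >> 1
    while a:
        s *= (-1) ** bin(a & b).count("1")
        a >>= 1
    return s

def gp(x, y):
    out = np.zeros(8)
    for i in range(8):
        for j in range(8):
            if x[i] and y[j]:
                out[i ^ j] += reorder_sign(i, j) * x[i] * y[j]
    return out

def blade(idx, coeff=1.0):
    m = np.zeros(8); m[idx] = coeff; return m

one = blade(0)
# Complex numbers: the bivector e12 plays the role of the imaginary unit.
assert np.allclose(gp(blade(3), blade(3)), -one)             # e12^2 = -1

# Quaternions: i <-> -e23, j <-> -e31 = e13, k <-> -e12 (bitmask indices 6, 5, 3).
i, j, k = blade(6, -1.0), blade(5, +1.0), blade(3, -1.0)
for q in (i, j, k):
    assert np.allclose(gp(q, q), -one)                       # i^2 = j^2 = k^2 = -1
assert np.allclose(gp(i, j), k) and np.allclose(gp(j, k), i) and np.allclose(gp(k, i), j)
assert np.allclose(gp(gp(i, j), k), -one)                    # ijk = -1
```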

It follows that the dimension of the complete GA of $\mathbb{R}^3$ can be obtained from its subalgebras, i.e., $\dim \mathcal{G}(\mathbb{R}^3) = 2\,\dim \mathcal{G}^{+}(\mathbb{R}^3) = 8$.

Fig. 3: Multiplication table of $\mathcal{G}(\mathbb{R}^3)$ via the geometric product. The elements highlighted in magenta constitute the multiplication table of $\mathcal{G}^{+}(\mathbb{R}^2)$ (even subalgebra isomorphic to the complex numbers). Similarly, the elements enclosed in the green rectangles form the multiplication table of $\mathcal{G}^{+}(\mathbb{R}^3)$ (even subalgebra isomorphic to the quaternions).

II-C Performing Rotations with Multivectors (Rotors)

The even subalgebra $\mathcal{G}^{+}(\mathbb{R}^3)$ is also known as the algebra of rotors, i.e., its elements are a special type of multivector (called rotors) able to apply rotations to vectors in $\mathbb{R}^3$ [5, 2]. Given a vector $x$, it can be rotated by applying the rotation operator

$y = r\,x\,\widetilde{r}, \qquad (8)$

where $r$ is a rotor, $\widetilde{r}$ is its reverse, and $r\widetilde{r} = 1$, i.e., $r$ is a unit rotor. Note that the unit-norm constraint is necessary to prevent the rotation operator from scaling the vector $x$, i.e., from changing its norm.

A rotor can be generated from the geometric multiplication of two unit vectors in $\mathbb{R}^3$. Given unit vectors $a$ and $b$, with an angle $\theta$ between them, and using (2)-(4), the exponential form of a rotor is [5, p. 107]

$r = ab = a \cdot b + a \wedge b = \cos\theta + B\sin\theta = e^{B\theta}. \qquad (9)$

As shown in Fig. 6, $x$ is rotated by an angle of $2\theta$ about the normal of the oriented surface $B$ (the rotation axis) [2]. Note that both quantities ($\theta$ and $B$) define the rotor in (9). The geometric transformation enabled by rotors was applied in [11, 12] to devise AFs that estimate the relative rotation between 3D Point Clouds (PCDs).

Fig. 6: (a) A rotor can be generated from the geometric multiplication of two unit vectors in $\mathbb{R}^3$. (b) Applying the rotation operator (8): the vector $x$ is rotated by an angle of $2\theta$ about the normal of the oriented surface $B$ (a hyperplane). In two dimensions (2D), $r$ is a complex number, $\widetilde{r}$ its conjugate, and they rotate $x$ about the normal of the complex plane.
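Because the even subalgebra of $\mathcal{G}(\mathbb{R}^3)$ is isomorphic to the quaternions (Section II-B), the rotor sandwich can be illustrated with plain quaternion arithmetic. The sketch below builds the rotor from two unit vectors as the quaternion with scalar part $a \cdot b$ and vector part $a \times b$, and checks that the sandwich preserves norms and rotates by $2\theta$; it illustrates the mechanism only and is not the paper's implementation.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of quaternions stored as length-4 arrays (scalar first)."""
    ps, pv, qs, qv = p[0], p[1:], q[0], q[1:]
    return np.concatenate(([ps * qs - pv @ qv], ps * qv + qs * pv + np.cross(pv, qv)))

def rotate(r, x):
    """Sandwich r x ~r, with the reverse realized as quaternion conjugation."""
    r_rev = np.concatenate(([r[0]], -r[1:]))
    return qmul(qmul(r, np.concatenate(([0.0], x))), r_rev)[1:]

rng = np.random.default_rng(0)
a = rng.standard_normal(3); a /= np.linalg.norm(a)
b = rng.standard_normal(3); b /= np.linalg.norm(b)
theta = np.arccos(np.clip(a @ b, -1.0, 1.0))

r = np.concatenate(([a @ b], np.cross(a, b)))     # unit rotor built from a and b
x = rotate(r, a)                                  # rotate a itself
assert np.isclose(np.linalg.norm(x), 1.0)         # the norm is preserved
assert np.isclose(x @ a, np.cos(2 * theta))       # a is rotated by 2*theta
```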

III Linear Estimation in GA

This section introduces an instantaneous quadratic cost function with multivector entries. This is key to expand the estimation capabilities of AFs, generating the GAAFs further ahead in Section IV. Moreover, it is shown that the standard LA counterpart can be recovered as a special case.

III-A Definitions

The scalar product between two multivectors $A$ and $B$ is $\langle AB\rangle$, i.e., it is the scalar part ($0$-grade) of the geometric multiplication between $A$ and $B$ (for the special case of vectors, it reduces to the inner product (2)). Its commutativity originates the cyclic reordering property [5], e.g., $\langle ABC\rangle = \langle CAB\rangle$.

An array of multivectors is a collection (row or column) of general multivectors. Given $M$ multivectors $u_1, \ldots, u_M$ in $\mathcal{G}(\mathbb{R}^n)$, the array $u$ collects them as follows

$u = \operatorname{col}\{u_{1}, u_{2}, \ldots, u_{M}\}. \qquad (10)$

The array is denoted using lower case letters, the same as scalars and vectors (1-vectors). However, the meaning of the symbol will be evident from the context. Additionally, this work adopts the notion of matrix of multivectors as well: it is a matrix whose elements are multivector-valued.

Given two arrays of multivectors, $u$ and $w$, the array product between them is defined as

$uw \triangleq u_{1}w_{1} + u_{2}w_{2} + \cdots + u_{M}w_{M}, \qquad (11)$

which results in a general multivector. The underlying product in each of the terms $u_{k}w_{k}$, $k = 1, \ldots, M$, is the geometric product. Observe that, due to the noncommutativity of the geometric product, $uw \neq wu$ in general.

Similarly, multiplications involving matrices of multivectors follow the general rules of matrix algebra, however using the geometric product as the underlying product, just like in (11).

The reverse-transpose array is the extension of the reverse operation of multivectors to arrays of multivectors. Given the array $u$ in (10), its reversed version, denoted by $u^{*}$, is

$u^{*} \triangleq \begin{bmatrix} \widetilde{u}_{1} & \widetilde{u}_{2} & \cdots & \widetilde{u}_{M} \end{bmatrix}. \qquad (12)$

Note the similarity with the Hermitian conjugate, its counterpart in LA. From (11) and (12) it follows that

$u^{*}w = \widetilde{u}_{1}w_{1} + \widetilde{u}_{2}w_{2} + \cdots + \widetilde{u}_{M}w_{M}, \qquad (13)$

which results in a general multivector.

The array product between $u^{*}$ and $u$ is represented by the notation $u^{*}u$. Note that a similar notation is employed in LA to represent the squared norm of a vector in $\mathbb{R}^n$. However, from (13) it is known that $u^{*}u$ is a general multivector, i.e., it is not the pure scalar value which in LA provides a measure of distance. In GA, the distance metric is given by the magnitude of a multivector, defined in terms of the scalar product, e.g., $|A|^{2} \triangleq \langle \widetilde{A}A\rangle$, which is indeed a scalar value. Thus, for an array $u$ and a multivector $A$,

$\|u\|^{2} \triangleq \langle u^{*}u\rangle, \qquad |A|^{2} \triangleq \langle \widetilde{A}A\rangle. \qquad (14)$

The product between a multivector $A$ and an array $u$, namely $Au$, is defined as the geometric multiplication of $A$ by each entry of $u$ (a procedure similar to multiplying a scalar by a vector in LA). Due to the noncommutativity of the geometric product, $Au \neq uA$ in general.
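A computational sketch of these definitions, under the illustrative $\mathcal{G}(\mathbb{R}^3)$ coefficient layout used earlier: an array of multivectors as an $(M, 8)$ real array, the array product as a sum of entrywise geometric products, and the squared magnitude as the scalar part of the reverse-transposed array times the array. Function names and storage are assumptions, not the paper's code.

```python
import numpy as np

def reorder_sign(a, b):
    s, a = 1, a >> 1
    while a:
        s *= (-1) ** bin(a & b).count("1")
        a >>= 1
    return s

def gp(x, y):
    out = np.zeros(8)
    for i in range(8):
        for j in range(8):
            if x[i] and y[j]:
                out[i ^ j] += reorder_sign(i, j) * x[i] * y[j]
    return out

# Per-blade reversion signs, (-1)^(g(g-1)/2) with g the blade grade.
REV = np.array([(-1) ** (bin(i).count("1") * (bin(i).count("1") - 1) // 2) for i in range(8)])

def array_product(u, w):
    """Sum of entrywise geometric products: a single general multivector."""
    return sum(gp(uk, wk) for uk, wk in zip(u, w))

def reverse_array(u):
    """Reverse each multivector entry (GA counterpart of Hermitian conjugation)."""
    return u * REV

def sq_magnitude(u):
    """Scalar part of (reverse-transposed u) times u: a nonnegative real number."""
    return array_product(reverse_array(u), u)[0]

rng = np.random.default_rng(1)
u = rng.standard_normal((4, 8))      # array of M = 4 general multivectors
w = rng.standard_normal((4, 8))
assert not np.allclose(array_product(u, w), array_product(w, u))   # noncommutative
assert np.isclose(sq_magnitude(u), np.sum(u * u))    # reduces to the sum of squared coefficients
```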

III-B General Cost Function in GA

Following the guidelines in [5, p.64 and p.121], one can formulate a minimization problem in GA by defining the following cost function

(15)

where all the quantities involved are general multivectors. The sum of products in (15) represents the canonical form of a multilinear transformation applied to the multivector to be estimated [5, p.64 and p.121]. The goal is to optimize that multivector in order to minimize (15).

Special cases of (15) are obtained depending on the values chosen for its terms. For instance, taking the data to be 1-vectors, the quantity to be estimated to be a rotor, and a single product term yields the instantaneous version of the cost function minimized in [11, 12] (subject to the unit-norm rotor constraint $r\widetilde{r} = 1$) to estimate the relative rotation between 3D PCDs.

In this paper, the case in which the desired signal $d$, the entries of the weight array $w$, and the entries of the regressor array $u$ are general multivectors is studied, so that

$J(w) \triangleq \|e\|^{2} = \langle \widetilde{e}\,e\rangle, \qquad e \triangleq d - u^{*}w, \qquad (16)$

where now $M$ also represents the system order (the number of taps in the filter), $e$ is the estimation error, and the definition of array product (13) was employed to form the estimate $u^{*}w$ of $d$. Notice that (16) is the GA counterpart of the standard LA cost function [1, p. 477]. And, similarly to its LA counterpart, the multivector $d$ is estimated as a linear combination of the multivector entries of $u$.

IV Geometric-Algebra Adaptive Filters

In this section, the GAAFs are motivated by deriving the GA-LMS to minimize the cost function (16) in an adaptive manner. This is done by writing (16) at instant $i$, yielding an instantaneous cost function [31, 32], as shown in the sequel.

At each iteration $i$, two multivector-valued signals are observed: the desired signal $d_i$ and the input data, whose samples are collected into the regressor array $u_i$. The linear combination (array product) $u_i^{*}w_{i-1}$ is used as an estimate of $d_i$, generating the a priori output estimation error

$e_{i} = d_{i} - u_{i}^{*}w_{i-1}. \qquad (17)$

This way, from (16) and (17)

(18)

Given $d_i$ and $u_i$, the GAAFs update the estimate $w_i$ for the array of multivectors via a recursive rule of the form

(19)

where $\mu$ (a scalar) is the AF step size (learning rate), and the correction term is an array of multivectors related to the estimation error $e_i$. This work adopts an instantaneous steepest-descent rule [1, 31, 32], in which the AF is designed to follow the direction opposite to the instantaneous gradient of (18). This choice yields the general form of a GA-based adaptive rule

(20)

in which the gradient is scaled by a matrix with multivector entries; choosing this matrix appropriately defines the type of adaptive algorithm [1].

IV-A A Note on Multivector Derivative

Equation (20) requires the calculation of a multivector derivative. In GA, the differential operator has the algebraic properties of a multivector in $\mathcal{G}(\mathbb{R}^n)$ [33]. Put differently, the gradient can be interpreted as the geometric multiplication between two multivector-valued quantities: the differential operator and the cost function.

This way, it follows from (5) that the differential operator can be decomposed into its basis blades. In fact, it is known that any multivector can be decomposed into blades via

(21)

in which the blade coefficients are scalar-valued, and the two sets of blades involved, $\{e_k\}$ and $\{e^k\}$, are the so-called reciprocal (dual) bases of $\mathcal{G}(\mathbb{R}^n)$. (Two symbols are used to refer to a GA basis, each with a different purpose: the double-index symbol, e.g., $e_{12}$, is adopted when dealing with specific algebras such as $\mathcal{G}(\mathbb{R}^3)$; for general algebras and analytical derivations, the single-index symbol $e_k$ is more appropriate.) The concept of reciprocal (dual) bases is a useful analytical tool in GA to convert from nonorthogonal to orthogonal vectors and vice versa – it simplifies the analytical procedure ahead since mutually orthogonal elements cancel out. Details are provided in [5, Section 1-3] and [6]. For the purpose of this paper, it suffices to know that dual bases satisfy $\langle e_{j}e^{k}\rangle = \delta_{j}^{k}$, where $\delta_{j}^{k} = 0$ for $j \neq k$ (Kronecker delta). It is easy to show that the basis (6) of $\mathcal{G}(\mathbb{R}^3)$ and its reversed version comply with the relation above (i.e., they are dual bases). Therefore, they are utilized from now on to decompose multivectors into blades. In particular, applying (21) to the differential operator results in

(22)

where each term in the sum is the usual derivative from standard calculus, which affects only the corresponding blade coefficient. Form (22) provides some analytical advantages (see [34]) and is employed next to calculate the gradient of the instantaneous cost function.

IV-B Calculating the Multivector-valued Gradient

Noticing that (18) can be written in terms of the scalar product

(23)

one can decompose it in terms of its blades via (21),

(24)

The gradient is obtained by multiplying (22) and (24)

(25)

From (17), $e_i$ (and similarly its reverse $\widetilde{e}_i$) can be written explicitly in terms of the weight array $w_{i-1}$. Thus

(26)

where the derivative of $d_i$ vanishes since $d_i$ does not depend on the weight array $w_{i-1}$.

Plugging (26) into (25) results in

(27)

The required term is obtained by decomposing it into its blades

(28)

which requires performing the decomposition of $u_i$ and $w_{i-1}$ (arrays of multivectors). Indeed, arrays of multivectors can be decomposed into blades as well. For instance,

(29)

Thus, employing (21) once again, $u_i$ and $w_{i-1}$ can be written in terms of their blades

(30)

where the resulting blade-coefficient arrays have real entries. Plugging (30) back into (28) – from now on, the iteration subscripts are omitted from $u_i$ and $w_{i-1}$ for clarity purposes –

(31)
(32)

which is the expression of the error as a function of the blades of $w$.

From (32), the derivative term in (27) becomes

(33)

It is important to notice that each partial derivative in (33) is different from zero only when the blade with respect to which one differentiates matches the blade being differentiated. This is the case since the derivative is taken with respect to one blade coefficient at a time; if the blades differ, the partial derivative yields zero (the coefficient of one blade does not depend on the coefficient of another). Thus, adopting the Kronecker delta function [34], (33) becomes

(34)

Finally, substituting (34) into (27), the stochastic gradient is obtained

(35)

In the AF literature, setting the scaling matrix in (20) equal to the identity results in the stochastic-gradient update rule [1]. This is adopted here as well in order to devise the GA version of the LMS filter – however, GA allows for selecting a scaling matrix with multivector entries, opening up the possibility of generating other types of GA-based adaptive filters. Substituting (35) into (20) and setting the matrix equal to the identity yields the GA-LMS update rule

(36)

where the constant factor in (35) was absorbed by the scalar step size $\mu$.

Note that the GA-LMS (36) has the same format as standard LMS AFs [31], namely the real-valued LMS ($u$ and $w$ have real-valued entries) and the complex-valued LMS ($u$ and $w$ have complex-valued entries). However, this work puts no constraints on the entries of the arrays $u$ and $w$ – they can be any kind of multivector. This way, the update rule (36) is valid for any $u$ and $w$ whose entries are general multivectors in $\mathcal{G}(\mathbb{R}^n)$. In other words, (36) generalizes the standard LMS AF for several types of entries: general multivectors, rotors, quaternions, complex numbers, real numbers – any subalgebra of $\mathcal{G}(\mathbb{R}^n)$.
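To make the discussion concrete, the sketch below runs an LMS-type GA adaptive filter with general multivector entries in $\mathcal{G}(\mathbb{R}^3)$ on a synthetic system-identification task. The error convention $e_i = d_i - u_i^{*}w_{i-1}$ and the update $w_i = w_{i-1} + \mu\, u_i e_i$ are one arrangement of the noncommutative products that reduces to the familiar real/complex LMS recursions; the exact form of the GA-LMS is fixed by (36) and may place the products differently.

```python
import numpy as np

def reorder_sign(a, b):
    s, a = 1, a >> 1
    while a:
        s *= (-1) ** bin(a & b).count("1")
        a >>= 1
    return s

def gp(x, y):
    out = np.zeros(8)
    for i in range(8):
        for j in range(8):
            if x[i] and y[j]:
                out[i ^ j] += reorder_sign(i, j) * x[i] * y[j]
    return out

REV = np.array([(-1) ** (bin(i).count("1") * (bin(i).count("1") - 1) // 2) for i in range(8)])

def array_product(u, w):
    return sum(gp(uk, wk) for uk, wk in zip(u, w))

rng = np.random.default_rng(2)
M, mu, sigma_v, n_iter = 4, 0.005, 1e-3, 2000
w_opt = rng.standard_normal((M, 8))      # unknown array of multivectors to identify
w = np.zeros((M, 8))                     # adaptive weight array, initialized at zero

mse = []
for _ in range(n_iter):
    u = rng.standard_normal((M, 8))                    # random regressor array
    v = sigma_v * rng.standard_normal(8)               # measurement-noise multivector
    d = array_product(u * REV, w_opt) + v              # d_i = (u_i^*) w_opt + v_i  (assumed model)
    e = d - array_product(u * REV, w)                  # a priori estimation error
    w = w + mu * np.array([gp(uk, e) for uk in u])     # LMS-style update (one possible arrangement)
    mse.append(np.sum(e * e))                          # squared magnitude of the error

print(f"MSE, first 100 iterations: {np.mean(mse[:100]):.3e}")
print(f"MSE, last  100 iterations: {np.mean(mse[-100:]):.3e}")
# The steady-state value approaches the noise floor (8 * sigma_v**2 here) plus a small excess error.
```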

This is a very interesting result, accomplished due to the comprehensive analytic tools provided by Geometric Calculus. Recall that, in adaptive filtering theory, the transition from real-valued AFs to complex-valued AFs requires one to abide by the rules of differentiation with respect to complex-valued variables, represented by the Cauchy-Riemann conditions (see [1, p.25]). Similarly, quaternion-valued AFs require further differentiation rules that are captured by the Hamilton-real (HR) calculus [14, 15, 16] and its generalized version (GHR) [28]. Although those approaches are successful, each time the underlying algebra is changed, the analytic tools need an update as well. This is not the case if one resorts to GA and GC to address the minimization problem: the calculations are always performed the same way.

V Mean-Square Analysis (Steady State)

The goal of the analysis is to derive an expression for the steady-state mean-square error (MSE) of GAAFs via the energy conservation relations (ECR) [1]. To achieve that, some quantities and metrics are first recast into GA.

V-A Preliminaries

A random multivector is one whose blade coefficients are random variables. For instance, a random multivector in $\mathcal{G}(\mathbb{R}^3)$ is

$\mathbf{A} = \mathbf{a}_{0} + \mathbf{a}_{1}e_{1} + \mathbf{a}_{2}e_{2} + \mathbf{a}_{3}e_{3} + \mathbf{a}_{4}e_{12} + \mathbf{a}_{5}e_{23} + \mathbf{a}_{6}e_{31} + \mathbf{a}_{7}e_{123}, \qquad (37)$

where the coefficients $\mathbf{a}_{0}, \ldots, \mathbf{a}_{7}$ are independent and identically distributed (i.i.d.) real-valued random variables. By extension, random arrays are arrays of random multivectors.
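A minimal sketch of how such random quantities can be drawn in practice; the Gaussian choice and the $(M, 8)$ coefficient layout are illustrative assumptions, not requirements of the analysis.

```python
import numpy as np

rng = np.random.default_rng(3)

def random_multivector(sigma=1.0):
    # eight i.i.d. blade coefficients for a random multivector in G(R^3)
    return sigma * rng.standard_normal(8)

def random_array(M, sigma=1.0):
    # an array of M random multivectors
    return sigma * rng.standard_normal((M, 8))

U = random_array(M=10_000)
print(np.mean(U), np.var(U))   # approximately 0 and sigma^2 per coefficient
```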

The ECR technique is an energy balance in terms of the following (random) error quantities

(38)

together with the AF’s recursion.

The stationary data model is captured by the following set of assumptions

(39)

As in linear algebra, the steady-state MSE in GA must be scalar-valued. To this end, the MSE is defined as

$\mathrm{MSE} \triangleq \lim_{i \to \infty} \mathbb{E}\,\|\mathbf{e}(i)\|^{2}, \qquad (40)$

where $\|\cdot\|^{2}$, defined in (14), is applied to compactly denote the scalar part of the geometric product $\widetilde{\mathbf{e}}(i)\,\mathbf{e}(i)$.

From the stationary linear data model (39),

$\mathbf{e}(i) = \mathbf{e}_{a}(i) + \mathbf{v}(i). \qquad (41)$

The term $\mathbf{e}_{a}(i)$ is the a priori error, from which the steady-state excess mean-square error (EMSE) is defined,

$\mathrm{EMSE} \triangleq \lim_{i \to \infty} \mathbb{E}\,\|\mathbf{e}_{a}(i)\|^{2}. \qquad (42)$

From (41), the cross term between $\mathbf{e}_{a}(i)$ and $\mathbf{v}(i)$ has zero mean, since $\mathbf{v}(i)$ is independent of any other random quantity and its samples are assumed to be drawn from a zero-mean white Gaussian process. Therefore, applying (40) and (42), the MSE equals the EMSE plus the noise power, analogous to the LA case.
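In symbols (a restatement under the notation adopted above, writing $\mathbf{e}_{a}(i)$ for the a priori error and $\sigma_{v}^{2} \triangleq \mathbb{E}\,\|\mathbf{v}(i)\|^{2}$ for the noise power), the decomposition reads:

```latex
\mathrm{MSE}
  = \lim_{i\to\infty} \mathbb{E}\,\|\mathbf{e}(i)\|^{2}
  = \lim_{i\to\infty} \mathbb{E}\,\|\mathbf{e}_{a}(i) + \mathbf{v}(i)\|^{2}
  = \underbrace{\lim_{i\to\infty} \mathbb{E}\,\|\mathbf{e}_{a}(i)\|^{2}}_{\text{EMSE}}
    \;+\; \sigma_{v}^{2},
```

since the cross term $\mathbb{E}\,\langle \widetilde{\mathbf{e}}_{a}(i)\,\mathbf{v}(i)\rangle$ vanishes under the independence and zero-mean assumptions on $\mathbf{v}(i)$.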

V-B Steady-State Analysis

The ECR technique performs an interplay between the energies of the weight-error array and of the estimation error at two successive time instants, namely the a priori and the a posteriori errors. As a result, an expression for the variance relation is obtained, which is then particularized for each AF of interest. For details on the ECR procedure, please refer to [1, p. 228].

Consider a GAAF whose update rule has the following general shape

(43)

where the correction term is a multivector-valued function of the estimation error $e_i$. Depending on the type of GAAF (LMS, NLMS, etc.), this function assumes a specific form.

Subtracting (43) from the optimal weight array yields

(44)

in which the weight-error array is the difference between the optimal and the estimated weight arrays. Multiplying (44) from the left by the reverse-transposed regressor array (array product) results in

(45)

where the a posteriori and the a priori errors are as defined in (38), and the squared-magnitude notation of (14) was used in the last equality.

The multivector involved is assumed to be factorable into a product of invertible vectors [5, p. 14] (this assumes it has only one type of grade: vector, or bivector, or trivector, and so on; in practice, it can be a general multivector composed of different grades, but such an assumption allows for a clearer analysis procedure and ultimately does not compromise the accuracy of the EMSE expression, as shown in the simulations), which guarantees the existence of a multiplicative inverse. This allows for solving (45) for

(46)

which substituted into (44) results in

(47)

Taking the squared magnitude of both sides,

(48)

The left-hand side (LHS) is expanded as

(49)

in which $\langle \cdot \rangle$ denotes the GA scalar product and $\widetilde{(\,\cdot\,)}$ the reverse operation. Further expansion gives

(50)

in which one term simplifies due to the magnitude relation introduced after (14). Applying the definition of the GA scalar product, and observing that the third and fourth terms of (50) are each other's reverse (i.e., their $0$-grades are the same), their sum can be written as twice their common scalar part,

where the cyclic reordering property was used. Note that the resulting term is precisely the definition of the a posteriori error (see (45)). This way, (49) assumes the form