# Multi-step Online Unsupervised Domain Adaptation

## Abstract

In this paper, we address the Online Unsupervised Domain Adaptation (OUDA) problem, where the target data are unlabelled and arrive sequentially. Traditional methods for the OUDA problem mainly focus on transforming each arriving batch of target data to the source domain, and they do not sufficiently consider the temporal coherency and accumulative statistics among the arriving target data. We propose a multi-step framework for the OUDA problem, which introduces a novel method to compute the mean-target subspace inspired by the geometrical interpretation of the mean in Euclidean space. This mean-target subspace contains accumulative temporal information about the target data that have arrived so far. Moreover, the transformation matrix computed from the mean-target subspace is applied to the next target data as a preprocessing step, aligning the target data closer to the source domain. Experiments on four datasets demonstrated the contribution of each step in our proposed multi-step OUDA framework and its improved performance over previous approaches.

J. H. Moon, Debasmit Das and C. S. George Lee, Purdue University, West Lafayette, IN, USA


Index Terms: Unsupervised domain adaptation, online domain adaptation, mean subspace, Grassmann manifold.

## 1 Introduction

Domain Adaptation (DA) [16] aims to reduce the discrepancy between the distributions of the source and the target domains. In particular, the Unsupervised Domain Adaptation (UDA) problem addresses the setting in which the target data are completely unlabelled, which is a more plausible assumption for real-world recognition tasks.

There have been many studies on the UDA problem. In one branch of studies, Gong et al. [7] and Fernando et al. [5] assumed that the source and the target domains share a common low-dimensional subspace. In another branch, Long et al. [13] and Sun et al. [18] directly minimized the discrepancy between the source and the target domains. Furthermore, Zhang et al. [25] and Wang et al. [21] combined the techniques of both branches. Vascon et al. [20] and Wulfmeier et al. [23] proposed new techniques for the UDA problem based on Nash equilibrium [15] and Generative Adversarial Networks (GANs) [8], respectively.

We notice that only a few works have addressed the Online Unsupervised Domain Adaptation (OUDA) problem, which assumes that the target data arrive sequentially in small batches. Mancini et al. [14] adopted a batch normalization technique [10] for online domain adaptation, but their method was restricted to the kitting task. Wulfmeier et al. [23] extended their previous work on GANs to the online case. Bitarafan et al. proposed the Incremental Evolving Domain Adaptation (IEDA) [2] algorithm, which computes the target-data transformation using the Geodesic Flow Kernel (GFK) [7] and then updates the source subspace using Incremental Partial Least Squares (IPLS) [24]. This approach is vulnerable when target data are predicted incorrectly, because the mislabelled target data are merged with the source-domain data, leading to worse predictions for future target data. Hoffman et al. [9] proposed an OUDA method using Continuous Manifold-based Adaptation (CMA), which formulates the OUDA problem as a non-convex optimization problem. However, this method only considers the coherency between adjacent target-data batches.

To overcome the drawbacks of the above methods, namely contamination of the source domain and lack of temporal coherency, we propose a multi-step framework for the OUDA problem, which introduces a novel method of computing the mean-target subspace inspired by the geometrical interpretation of the mean in Euclidean space. Previous subspace-based methods for the OUDA problem merely compute the transformation matrix between the source subspace and each target subspace. Our method instead computes the transformation matrix between the source subspace and the mean-target subspace, which is incrementally obtained on the Grassmann manifold. Since a subspace is represented as a single point on the Grassmann manifold, the mean-target subspace is regarded as the mean of the points that represent the target subspaces of the target-data batches. Although the Karcher mean [3] is a well-known method for computing the mean point on the Grassmann manifold, it is not suitable for the OUDA problem because it is computed with an iterative process. Instead of the Karcher mean, we propose to compute the mean-target subspace by a geometrical process that resembles the incremental computation of the mean of multiple points in Euclidean space. The transformation matrix computed with our proposed method is robust to abrupt changes in the arriving target batches, leading to a stable domain transfer. We also feed the transformation matrix back to the next target batch, which moves it closer to the source domain. This preprocessing of the next target batch leads to a more precise computation of the mean-target subspace. Experiments on four datasets demonstrated that our proposed method outperforms the traditional methods in terms of performance and computation speed.

## 2 Proposed Approach

### 2.1 Problem Description

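We assume that the data in the source domain are static and labelled as $\mathcal{S} = \{\mathbf{X}_{\mathcal{S}}, \mathbf{y}_{\mathcal{S}}\}$, where $\mathbf{X}_{\mathcal{S}} \in \mathbb{R}^{N_{\mathcal{S}} \times d}$, $\mathbf{y}_{\mathcal{S}} \in \{1, \dots, c\}^{N_{\mathcal{S}}}$, and $N_{\mathcal{S}}$, $d$ and $c$ indicate the number of source data, the dimension of the data and the number of class categories, respectively. Data in the target domain are unlabelled and arrive as one batch per timestep as $\mathcal{T} = \{\mathbf{X}_{\mathcal{T},1}, \mathbf{X}_{\mathcal{T},2}, \dots, \mathbf{X}_{\mathcal{T},N}\}$; the batches are assumed to be sequential and temporally correlated. We use the term mini-batch for a target-data batch $\mathbf{X}_{\mathcal{T},n} \in \mathbb{R}^{N_{\mathcal{T}} \times d}$, where $N$ indicates the number of mini-batches and $N_{\mathcal{T}}$ indicates the number of data in each mini-batch. $N_{\mathcal{T}}$ is assumed to be constant for $n = 1, \dots, N$ and very small compared to $N_{\mathcal{S}}$. In our notation, the subscripts $\mathcal{S}$ and $\mathcal{T}$ indicate the source and the target domains, respectively. Furthermore, the subscript $n$ represents the $n$th mini-batch in the target domain.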

Our goal is to align each target-data batch $\mathbf{X}_{\mathcal{T},n}$ to the source domain at timestep $n$ in an online manner so that the transformed target data can be correctly recognized by the classifier pre-trained in the source domain. Using the notation of [7], we denote a subspace by its basis $\mathbf{P} \in \mathbb{R}^{d \times k}$, where $d$ is the dimension of the original data and $k$ is the dimension of the subspace. For instance, $\{\mathbf{P}_{\mathcal{T},1}, \dots, \mathbf{P}_{\mathcal{T},N}\}$ is the set of target subspaces of all mini-batches, whereas $\mathbf{P}_{\mathcal{T},n}$ is the target subspace of the $n$th mini-batch. For Principal Component Analysis (PCA) [22], this basis is the projection matrix from the original space to the subspace.

### 2.2 Proposed OUDA Method

As shown in Fig. 1, our proposed OUDA framework consists of four steps for each incoming mini-batch: 1) Subspace representation, 2) Averaging mean-target subspace, 3) Domain adaptation, and 4) Recursive feedback. Step one computes the low-dimensional subspace $\mathbf{P}_{\mathcal{T},n}$ of the target domain using PCA. Step two computes the mean of the target subspaces embedded in the Grassmann manifold using our novel technique, Incremental Computation of Mean Subspace (ICMS). Step three performs domain adaptation: it computes the transformation matrix $\mathbf{G}_n$ from the target domain to the source domain following the approach of Bitarafan et al. [2], which adopts the GFK method [7], a manifold alignment technique. Step four provides recursive feedback by feeding $\mathbf{G}_n$ back to the next mini-batch $\mathbf{X}_{\mathcal{T},n+1}$. Each step is described in detail next.

#### Subspace Representation

The ultimate goal of our proposed OUDA method is to find, for each target mini-batch, a transformation matrix $\mathbf{G}_n \in \mathbb{R}^{d \times d}$ from the target domain to the source domain, so that the transformed target data are well aligned to the source domain. However, we prefer not to use methods that compute $\mathbf{G}_n$ directly in the original high-dimensional data space. For example, raw input image features have a high dimension $d$, and a technique that directly computes the transformation matrix, such as Correlation Alignment (CORAL) [18], requires computing a $d \times d$ matrix. Since our technique is intended to operate online, we embed the source and the target data into low-dimensional subspaces $\mathbf{P}_{\mathcal{S}}$ and $\mathbf{P}_{\mathcal{T},n}$, respectively, which preserve the meaningful information of the original data space. We adopt PCA to obtain $\mathbf{P}_{\mathcal{S}}$ and $\mathbf{P}_{\mathcal{T},n}$ because the PCA algorithm is simple and fast for online DA and, unlike other dimension-reduction techniques such as Linear Discriminant Analysis (LDA) [1], it is applicable to both labelled and unlabelled data.
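The subspace-representation step can be sketched in a few lines of NumPy. This is a minimal sketch, assuming standard PCA on mean-centred data computed through an SVD; the function name `pca_subspace` and the synthetic data are hypothetical and not the authors' code.

```python
import numpy as np

def pca_subspace(X, k):
    """Orthonormal d x k basis of the top-k PCA subspace of X (rows = samples)."""
    Xc = X - X.mean(axis=0)                      # centre each feature
    # The right singular vectors of the centred data are the principal
    # directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T                              # d x k, orthonormal columns

rng = np.random.default_rng(0)
# Hypothetical source data: 500 samples in d = 50 dimensions whose variance
# is concentrated in the first three coordinates.
scales = np.r_[10.0, 8.0, 6.0, np.full(47, 0.1)]
X_S = rng.normal(size=(500, 50)) * scales
P_S = pca_subspace(X_S, k=3)
print(P_S.shape)                                 # (50, 3)
```

Because only a $d \times k$ basis is kept per mini-batch, later steps work with $k \times k$ and $d \times k$ products instead of full $d \times d$ statistics.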

#### Averaging Mean-target Subspace

Throughout this paper, we use the Grassmann manifold $\mathrm{Gr}(k,d)$ [4], a space that parameterizes all $k$-dimensional linear subspaces of the $d$-dimensional vector space $\mathbb{R}^d$. Since a subspace is represented as a single point on the Grassmann manifold, $\mathbf{P}_{\mathcal{S}}$ and $\mathbf{P}_{\mathcal{T},n}$ are represented as points on $\mathrm{Gr}(k,d)$.
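To make the "subspace as a single point" view concrete, the hypothetical illustration below (random data, NumPy) checks that two different orthonormal bases of the same subspace share the orthogonal projector $\mathbf{P}\mathbf{P}^\top$, which is what identifies the point on the Grassmann manifold.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 8, 3
# Orthonormal basis P of a random k-dimensional subspace of R^d.
P, _ = np.linalg.qr(rng.normal(size=(d, k)))
# Any k x k orthogonal matrix Q yields another basis P Q of the same subspace.
Q, _ = np.linalg.qr(rng.normal(size=(k, k)))
P_rot = P @ Q

# The bases differ, but the orthogonal projector P P^T does not, so both
# matrices represent the same point on the Grassmann manifold.
same_point = np.allclose(P @ P.T, P_rot @ P_rot.T)
print(same_point)                                # True
```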

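For solving the offline UDA problem (i.e., $N = 1$), Gong et al. [7] used the geodesic flow from $\mathbf{P}_{\mathcal{S}}$ to $\mathbf{P}_{\mathcal{T}}$ on the Grassmann manifold. Previous methods for the OUDA problem directly compute the transformation matrix from the source subspace and the target subspace of each mini-batch. We propose a novel technique, called Incremental Computation of Mean Subspace (ICMS), which computes the mean subspace in the target domain, inspired by the geometrical interpretation of the mean in Euclidean space. We then compute the geodesic flow from $\mathbf{P}_{\mathcal{S}}$ to the mean-target subspace $\bar{\mathbf{P}}_{\mathcal{T},n}$. Formally, when the $n$th mini-batch $\mathbf{X}_{\mathcal{T},n}$ arrives and is represented as the subspace $\mathbf{P}_{\mathcal{T},n}$, we incrementally compute the mean-target subspace $\bar{\mathbf{P}}_{\mathcal{T},n}$ using $\mathbf{P}_{\mathcal{T},n}$ and $\bar{\mathbf{P}}_{\mathcal{T},n-1}$, where $\bar{\mathbf{P}}_{\mathcal{T},n-1}$ is the mean subspace of the $(n-1)$ target subspaces $\mathbf{P}_{\mathcal{T},1}, \dots, \mathbf{P}_{\mathcal{T},n-1}$.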

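As shown in Fig. 2(a), the mean of points in Euclidean space can be computed incrementally. Given the mean $\bar{\mathbf{x}}_{n-1}$ of $n-1$ points and the $n$th point $\mathbf{x}_n$, the updated mean is $\bar{\mathbf{x}}_n = \{(n-1)\bar{\mathbf{x}}_{n-1} + \mathbf{x}_n\}/n$. From a geometrical perspective, $\bar{\mathbf{x}}_n$ is the internal point of the segment between $\bar{\mathbf{x}}_{n-1}$ and $\mathbf{x}_n$ whose distances to $\bar{\mathbf{x}}_{n-1}$ and to $\mathbf{x}_n$ have the ratio $1:(n-1)$: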

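$$\|\bar{\mathbf{x}}_n - \bar{\mathbf{x}}_{n-1}\| : \|\mathbf{x}_n - \bar{\mathbf{x}}_n\| = 1 : (n-1) \tag{1}$$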
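The incremental mean and the ratio in Eq. (1) can be checked numerically. The sketch below uses random points as hypothetical data, updates the mean one point at a time and asserts the $1:(n-1)$ split at every step. Note that the update is equivalent to moving a fraction $1/n$ of the way from $\bar{\mathbf{x}}_{n-1}$ towards $\mathbf{x}_n$.

```python
import numpy as np

rng = np.random.default_rng(2)
points = rng.normal(size=(6, 2))          # hypothetical points in the plane

mean = points[0].copy()                   # mean of the first point alone
for n in range(2, len(points) + 1):
    x_n = points[n - 1]
    new_mean = ((n - 1) * mean + x_n) / n
    # Eq. (1): the new mean divides [old mean, x_n] in the ratio 1 : (n - 1).
    dist_to_old = np.linalg.norm(new_mean - mean)
    dist_to_new = np.linalg.norm(x_n - new_mean)
    assert np.isclose((n - 1) * dist_to_old, dist_to_new)
    mean = new_mean

print(np.allclose(mean, points.mean(axis=0)))   # True
```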

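We carry this ratio concept over to the Grassmann manifold from a geometrical perspective. As shown in Fig. 2(b), we update the mean-target subspace $\bar{\mathbf{P}}_{\mathcal{T},n}$ of $n$ target subspaces when the previous mean subspace $\bar{\mathbf{P}}_{\mathcal{T},n-1}$ of $(n-1)$ target subspaces and the $n$th subspace $\mathbf{P}_{\mathcal{T},n}$ are given. Using the geodesic parameterization [6] with a single parameter $t \in [0,1]$, the geodesic flow from $\bar{\mathbf{P}}_{\mathcal{T},n-1}$ to $\mathbf{P}_{\mathcal{T},n}$ is parameterized as $\Phi_n(t)$:

$$\Phi_n(t) = \bar{\mathbf{P}}_{\mathcal{T},n-1}\mathbf{U}_1\boldsymbol{\Gamma}(t) - \bar{\mathbf{R}}_{\mathcal{T},n-1}\mathbf{U}_2\boldsymbol{\Sigma}(t) \tag{2}$$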

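under the constraints $\Phi_n(0) = \bar{\mathbf{P}}_{\mathcal{T},n-1}$ and $\Phi_n(1) = \mathbf{P}_{\mathcal{T},n}$. It is valid to apply this ratio concept from Euclidean space to the geodesic flow on the Grassmann manifold since $\Phi_n(t)$ is parameterized proportionally to the arc length of $\Phi_n$ [17]. $\bar{\mathbf{R}}_{\mathcal{T},n-1} \in \mathbb{R}^{d \times (d-k)}$ denotes the orthogonal complement to $\bar{\mathbf{P}}_{\mathcal{T},n-1}$; that is, $\bar{\mathbf{R}}_{\mathcal{T},n-1}^\top \bar{\mathbf{P}}_{\mathcal{T},n-1} = \mathbf{0}$. Two orthonormal matrices $\mathbf{U}_1 \in \mathbb{R}^{k \times k}$ and $\mathbf{U}_2 \in \mathbb{R}^{(d-k) \times (d-k)}$ are given by the following pair of singular-value decompositions (SVDs),

$$\bar{\mathbf{P}}_{\mathcal{T},n-1}^\top \mathbf{P}_{\mathcal{T},n} = \mathbf{U}_1 \boldsymbol{\Gamma} \mathbf{V}^\top \tag{3}$$

$$\bar{\mathbf{R}}_{\mathcal{T},n-1}^\top \mathbf{P}_{\mathcal{T},n} = -\mathbf{U}_2 \boldsymbol{\Sigma} \mathbf{V}^\top \tag{4}$$

where $\boldsymbol{\Gamma} \in \mathbb{R}^{k \times k}$ and $\boldsymbol{\Sigma} \in \mathbb{R}^{(d-k) \times k}$ are diagonal and block diagonal matrices, respectively, and $\mathbf{V} \in \mathbb{R}^{k \times k}$ is orthonormal. Since the dimension of $\bar{\mathbf{R}}_{\mathcal{T},n-1}$ must be positive, $d-k$ must be greater than 0. We assume that the dimension of the subspace is much smaller than the dimension of the original space ($k \ll d$), so that $d - k \geq k$. The diagonal elements of $\boldsymbol{\Gamma}$ and $\boldsymbol{\Sigma}$ are $\cos\theta_i$ and $\sin\theta_i$ for $i = 1, \dots, k$, where the $\theta_i$'s are the principal angles [12] between $\bar{\mathbf{P}}_{\mathcal{T},n-1}$ and $\mathbf{P}_{\mathcal{T},n}$. $\boldsymbol{\Gamma}(t)$ and $\boldsymbol{\Sigma}(t)$ are diagonal and block diagonal matrices whose elements are $\cos(t\theta_i)$ and $\sin(t\theta_i)$, respectively.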
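The geodesic of Eqs. (2) to (4) can be sketched in NumPy, assuming orthonormal $d \times k$ bases. Rather than forming the $d \times (d-k)$ orthogonal complement explicitly, this sketch computes $(\mathbf{I} - \mathbf{P}_0\mathbf{P}_0^\top)\mathbf{P}_1\mathbf{V}$, whose columns have norms $\sin\theta_i$; that is our implementation choice, equivalent because $\mathbf{R}\mathbf{R}^\top = \mathbf{I} - \mathbf{P}_0\mathbf{P}_0^\top$. The function name `grassmann_geodesic` is hypothetical.

```python
import numpy as np

def grassmann_geodesic(P0, P1):
    """Geodesic t -> Phi(t) from span(P0) to span(P1), plus the principal angles.

    P0, P1: d x k matrices with orthonormal columns.
    """
    U1, cos_th, Vt = np.linalg.svd(P0.T @ P1)    # P0^T P1 = U1 Gamma V^T
    # (I - P0 P0^T) P1 V: the part of P1 orthogonal to span(P0).
    B = P1 @ Vt.T - (P0 @ U1) * cos_th
    sin_th = np.linalg.norm(B, axis=0)           # column norms are sin(theta_i)
    theta = np.arctan2(sin_th, cos_th)           # principal angles, ascending
    # Unit directions orthogonal to span(P0); they play the role of -R U2.
    Q = np.divide(B, sin_th, out=np.zeros_like(B), where=sin_th > 1e-12)

    def phi(t):
        # Each column rotates from P0 U1 towards Q by the angle t * theta_i.
        return (P0 @ U1) * np.cos(t * theta) + Q * np.sin(t * theta)
    return phi, theta

rng = np.random.default_rng(4)
d, k = 30, 5
P0, _ = np.linalg.qr(rng.normal(size=(d, k)))
P1, _ = np.linalg.qr(rng.normal(size=(d, k)))
phi, theta = grassmann_geodesic(P0, P1)
```

A useful sanity check is that the principal angles between $\mathbf{P}_0$ and $\Phi(t)$ equal $t\theta_i$, which is exactly the arc-length proportionality the ratio argument relies on.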

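Finally, we apply the ratio concept of Eq. (1) to $\Phi_n(t)$. Because $\Phi_n(t)$ is parameterized proportionally to arc length, the point that divides the geodesic from $\bar{\mathbf{P}}_{\mathcal{T},n-1}$ to $\mathbf{P}_{\mathcal{T},n}$ in the ratio $1:(n-1)$ is reached at $t = 1/n$. Hence, we can incrementally compute the mean-target subspace as follows:

$$\bar{\mathbf{P}}_{\mathcal{T},n} = \Phi_n\!\left(\tfrac{1}{n}\right) = \bar{\mathbf{P}}_{\mathcal{T},n-1}\mathbf{U}_1\boldsymbol{\Gamma}\!\left(\tfrac{1}{n}\right) - \bar{\mathbf{R}}_{\mathcal{T},n-1}\mathbf{U}_2\boldsymbol{\Sigma}\!\left(\tfrac{1}{n}\right) \tag{5}$$

Note that $\mathbf{P}_{\mathcal{T},n}$ refers to the subspace of the $n$th mini-batch in the target domain. Since $\bar{\mathbf{P}}_{\mathcal{T},1} = \mathbf{P}_{\mathcal{T},1}$, the recursion in Eq. (5) is well defined for every $n \geq 2$.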
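The ICMS recursion of Eq. (5) amounts to repeatedly taking the point at $t = 1/n$ on the geodesic from the current mean to the new subspace. The sketch below is self-contained, so it repeats a compact geodesic helper; the class name `IncrementalMeanSubspace` is hypothetical, and initialising the mean with the first mini-batch's subspace is our assumption. With two subspaces, the result should be the geodesic midpoint, equidistant (in principal angles) from both.

```python
import numpy as np

def geodesic_point(P0, P1, t):
    """Point Phi(t) on the Grassmann geodesic from span(P0) to span(P1)."""
    U1, cos_th, Vt = np.linalg.svd(P0.T @ P1)
    B = P1 @ Vt.T - (P0 @ U1) * cos_th           # (I - P0 P0^T) P1 V
    sin_th = np.linalg.norm(B, axis=0)
    theta = np.arctan2(sin_th, cos_th)           # principal angles
    Q = np.divide(B, sin_th, out=np.zeros_like(B), where=sin_th > 1e-12)
    return (P0 @ U1) * np.cos(t * theta) + Q * np.sin(t * theta)

def principal_angles(P0, P1):
    """Principal angles (ascending) between span(P0) and span(P1)."""
    s = np.linalg.svd(P0.T @ P1, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))

class IncrementalMeanSubspace:
    """Hypothetical helper for the ICMS recursion of Eq. (5)."""

    def __init__(self):
        self.n = 0          # number of target subspaces seen so far
        self.mean = None    # current mean-target subspace (d x k basis)

    def update(self, P_new):
        self.n += 1
        if self.mean is None:
            self.mean = P_new.copy()             # first mini-batch: mean = P_T,1
        else:
            # Move a fraction 1/n of the way from the old mean towards P_new.
            self.mean = geodesic_point(self.mean, P_new, 1.0 / self.n)
        return self.mean

rng = np.random.default_rng(3)
d, k = 20, 4
P1, _ = np.linalg.qr(rng.normal(size=(d, k)))
P2, _ = np.linalg.qr(rng.normal(size=(d, k)))
icms = IncrementalMeanSubspace()
icms.update(P1)
M = icms.update(P2)       # with n = 2 this is the geodesic midpoint
```

Each update costs one $k \times k$ SVD plus a few $d \times k$ products, with no iteration, which is the practical advantage over an iterative Karcher-mean solver in a streaming setting.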

#### Domain Adaptation

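After computing the mean-target subspace $\bar{\mathbf{P}}_{\mathcal{T},n}$, we parameterize another geodesic flow from $\mathbf{P}_{\mathcal{S}}$ to $\bar{\mathbf{P}}_{\mathcal{T},n}$ as $\Psi_n(t)$, $t \in [0,1]$:

$$\Psi_n(t) = \mathbf{P}_{\mathcal{S}}\mathbf{U}_3\tilde{\boldsymbol{\Gamma}}(t) - \mathbf{R}_{\mathcal{S}}\mathbf{U}_4\tilde{\boldsymbol{\Sigma}}(t) \tag{6}$$

under the constraints $\Psi_n(0) = \mathbf{P}_{\mathcal{S}}$ and $\Psi_n(1) = \bar{\mathbf{P}}_{\mathcal{T},n}$. $\mathbf{R}_{\mathcal{S}} \in \mathbb{R}^{d \times (d-k)}$ denotes the orthogonal complement to $\mathbf{P}_{\mathcal{S}}$; that is, $\mathbf{R}_{\mathcal{S}}^\top \mathbf{P}_{\mathcal{S}} = \mathbf{0}$. Two orthonormal matrices $\mathbf{U}_3 \in \mathbb{R}^{k \times k}$ and $\mathbf{U}_4 \in \mathbb{R}^{(d-k) \times (d-k)}$ are given by the following pair of SVDs,

$$\mathbf{P}_{\mathcal{S}}^\top \bar{\mathbf{P}}_{\mathcal{T},n} = \mathbf{U}_3 \tilde{\boldsymbol{\Gamma}} \mathbf{W}^\top \tag{7}$$

$$\mathbf{R}_{\mathcal{S}}^\top \bar{\mathbf{P}}_{\mathcal{T},n} = -\mathbf{U}_4 \tilde{\boldsymbol{\Sigma}} \mathbf{W}^\top \tag{8}$$

where $\mathbf{W} \in \mathbb{R}^{k \times k}$ is orthonormal, $\tilde{\boldsymbol{\Gamma}}$ and $\tilde{\boldsymbol{\Sigma}}$ have diagonal elements $\cos\tilde{\theta}_i$ and $\sin\tilde{\theta}_i$, $\tilde{\theta}_i$ ($i = 1, \dots, k$) are the principal angles between $\mathbf{P}_{\mathcal{S}}$ and $\bar{\mathbf{P}}_{\mathcal{T},n}$, and $\tilde{\boldsymbol{\Gamma}}(t)$ and $\tilde{\boldsymbol{\Sigma}}(t)$ have diagonal elements $\cos(t\tilde{\theta}_i)$ and $\sin(t\tilde{\theta}_i)$.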

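Based on the GFK, the transformation matrix $\mathbf{G}_n$ from the target domain to the source domain is found by projecting onto and integrating over the infinite set of all intermediate subspaces between $\mathbf{P}_{\mathcal{S}}$ and $\bar{\mathbf{P}}_{\mathcal{T},n}$:

$$\mathbf{G}_n = \int_0^1 \Psi_n(t)\,\Psi_n(t)^\top\,dt \tag{9}$$

From the above equation, we can derive the closed form of $\mathbf{G}_n$ as:

$$\mathbf{G}_n = \begin{bmatrix}\mathbf{P}_{\mathcal{S}}\mathbf{U}_3 & \mathbf{R}_{\mathcal{S}}\mathbf{U}_4\end{bmatrix}\begin{bmatrix}\boldsymbol{\Lambda}_1 & \boldsymbol{\Lambda}_2\\ \boldsymbol{\Lambda}_2 & \boldsymbol{\Lambda}_3\end{bmatrix}\begin{bmatrix}\mathbf{U}_3^\top\mathbf{P}_{\mathcal{S}}^\top\\ \mathbf{U}_4^\top\mathbf{R}_{\mathcal{S}}^\top\end{bmatrix} \tag{10}$$

where $\mathbf{U}_4$ is restricted to its first $k$ columns, $\tilde{\theta}_i$ are the principal angles between $\mathbf{P}_{\mathcal{S}}$ and $\bar{\mathbf{P}}_{\mathcal{T},n}$, and $\boldsymbol{\Lambda}_1$, $\boldsymbol{\Lambda}_2$ and $\boldsymbol{\Lambda}_3$ are $k \times k$ diagonal matrices with diagonal elements

$$\lambda_{1,i} = \frac{1}{2}\left(1 + \frac{\sin 2\tilde{\theta}_i}{2\tilde{\theta}_i}\right),\quad \lambda_{2,i} = \frac{1}{2}\cdot\frac{\cos 2\tilde{\theta}_i - 1}{2\tilde{\theta}_i},\quad \lambda_{3,i} = \frac{1}{2}\left(1 - \frac{\sin 2\tilde{\theta}_i}{2\tilde{\theta}_i}\right).$$

The common factor $\tfrac{1}{2}$ only rescales $\mathbf{G}_n$.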
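The closed form in Eq. (10) can be checked against a direct midpoint-rule evaluation of the integral in Eq. (9). This NumPy sketch assumes orthonormal bases and uses the normalised component of $\bar{\mathbf{P}}_{\mathcal{T},n}$ orthogonal to $\mathbf{P}_{\mathcal{S}}$ in place of $\mathbf{R}_{\mathcal{S}}\mathbf{U}_4$ (the two differ only by sign); `np.sinc` keeps the $\tilde{\theta}_i \to 0$ limits finite. The function name `gfk_kernel` and the random subspaces are hypothetical.

```python
import numpy as np

def gfk_kernel(P_S, P_T):
    """Closed-form G = int_0^1 Psi(t) Psi(t)^T dt for the geodesic span(P_S) -> span(P_T).

    Also returns the geodesic pieces A, Q, theta so the result can be checked.
    """
    U3, cos_th, Vt = np.linalg.svd(P_S.T @ P_T)
    A = P_S @ U3                                  # P_S U3
    B = P_T @ Vt.T - A * cos_th                   # (I - P_S P_S^T) P_T W
    sin_th = np.linalg.norm(B, axis=0)
    theta = np.arctan2(sin_th, cos_th)            # principal angles
    Q = np.divide(B, sin_th, out=np.zeros_like(B), where=sin_th > 1e-12)  # = -R_S U4
    # Diagonals of Lambda_1..3; np.sinc(x) = sin(pi x) / (pi x).
    lam1 = 0.5 * (1.0 + np.sinc(2.0 * theta / np.pi))
    lam2 = -0.5 * np.sin(theta) * np.sinc(theta / np.pi)  # (cos 2theta - 1) / (4 theta)
    lam3 = 0.5 * (1.0 - np.sinc(2.0 * theta / np.pi))
    # Because R_S U4 = -Q, the off-diagonal blocks change sign.
    G = (A * lam1) @ A.T - (A * lam2) @ Q.T - (Q * lam2) @ A.T + (Q * lam3) @ Q.T
    return G, A, Q, theta

rng = np.random.default_rng(5)
d, k = 25, 4
P_S, _ = np.linalg.qr(rng.normal(size=(d, k)))
P_T, _ = np.linalg.qr(rng.normal(size=(d, k)))
G, A, Q, theta = gfk_kernel(P_S, P_T)

# Direct midpoint-rule evaluation of Eq. (9) for comparison.
m = 2000
G_num = np.zeros((d, d))
for t in (np.arange(m) + 0.5) / m:
    psi = A * np.cos(t * theta) + Q * np.sin(t * theta)   # Psi_n(t), d x k
    G_num += psi @ psi.T / m
```

Since $\mathbf{G}_n$ is an integral of outer products, it is symmetric positive semi-definite, which the closed form preserves.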

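We apply this $\mathbf{G}_n$ as the transformation matrix to the preprocessed target data, i.e., $\mathbf{X}'_{\mathcal{T},n}\mathbf{G}_n$, which better aligns the target data to the source domain. Here, $\mathbf{X}'_{\mathcal{T},n}$ is the target mini-batch preprocessed with the transformation matrix fed back from the previous timestep, as described in the next subsection.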

#### Recursive Feedback

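Previous work on the OUDA problem does not explicitly consider the temporal dependency between the subspaces of adjacent target mini-batches. Unlike traditional methods, our proposed OUDA method feeds $\mathbf{G}_n$ back to the next target mini-batch as $\mathbf{X}'_{\mathcal{T},n+1} = \mathbf{X}_{\mathcal{T},n+1}\mathbf{G}_n$ at the next timestep ($n+1$), which imposes a temporal dependency between $\mathbf{P}_{\mathcal{T},n}$ and $\mathbf{P}_{\mathcal{T},n+1}$ by moving $\mathbf{P}_{\mathcal{T},n+1}$ closer to $\mathbf{P}_{\mathcal{S}}$ on the Grassmann manifold. PCA is then applied to $\mathbf{X}'_{\mathcal{T},n+1}$ to obtain the target subspace $\mathbf{P}_{\mathcal{T},n+1}$.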
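Putting the four steps together, the hypothetical driver below sketches one pass over a stream of mini-batches, including the feedback $\mathbf{X}'_{\mathcal{T},n+1} = \mathbf{X}_{\mathcal{T},n+1}\mathbf{G}_n$. It is a minimal sketch under our own assumptions (PCA via SVD, the first mini-batch initialises the mean, no feedback before the first $\mathbf{G}_n$ exists, synthetic random data), and it stops at the transformed mini-batches rather than including a classifier. All function names are hypothetical.

```python
import numpy as np

def pca_subspace(X, k):
    """Orthonormal d x k basis of the top-k PCA subspace of X (rows = samples)."""
    _, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    return Vt[:k].T

def geodesic_parts(P0, P1):
    """Return A, Q, theta such that Phi(t) = A cos(t theta) + Q sin(t theta)."""
    U, cos_th, Vt = np.linalg.svd(P0.T @ P1)
    A = P0 @ U
    B = P1 @ Vt.T - A * cos_th                     # (I - P0 P0^T) P1 V
    sin_th = np.linalg.norm(B, axis=0)
    Q = np.divide(B, sin_th, out=np.zeros_like(B), where=sin_th > 1e-12)
    return A, Q, np.arctan2(sin_th, cos_th)

def geodesic_point(P0, P1, t):
    A, Q, theta = geodesic_parts(P0, P1)
    return A * np.cos(t * theta) + Q * np.sin(t * theta)

def gfk_kernel(P_S, P_T):
    """Closed-form integral of Psi(t) Psi(t)^T over t in [0, 1]."""
    A, Q, theta = geodesic_parts(P_S, P_T)
    lam1 = 0.5 * (1.0 + np.sinc(2.0 * theta / np.pi))
    lam2 = -0.5 * np.sin(theta) * np.sinc(theta / np.pi)
    lam3 = 0.5 * (1.0 - np.sinc(2.0 * theta / np.pi))
    return (A * lam1) @ A.T - (A * lam2) @ Q.T - (Q * lam2) @ A.T + (Q * lam3) @ Q.T

def ouda_stream(X_S, batches, k):
    """Hypothetical driver: four steps per mini-batch with recursive feedback."""
    P_S = pca_subspace(X_S, k)
    mean, G_prev, aligned = None, None, []
    for n, X in enumerate(batches, start=1):
        # Feedback from the previous timestep: X'_n = X_n G_{n-1}.
        X_pre = X if G_prev is None else X @ G_prev
        P_T = pca_subspace(X_pre, k)                                # Step 1
        mean = P_T if mean is None else geodesic_point(mean, P_T, 1.0 / n)  # Step 2
        G = gfk_kernel(P_S, mean)                                   # Step 3
        aligned.append(X_pre @ G)                                   # aligned mini-batch
        G_prev = G                                                  # Step 4: feed back
    return aligned, G_prev

rng = np.random.default_rng(6)
d, k = 40, 5
X_S = rng.normal(size=(300, d))
# Synthetic stream of five mini-batches with 30 samples each.
batches = [rng.normal(size=(30, d)) for _ in range(5)]
aligned, G_last = ouda_stream(X_S, batches, k)
print(len(aligned), aligned[0].shape)                               # 5 (30, 40)
```

The aligned mini-batches would then be passed to the classifier trained on the source domain (KNN or SVM in Section 3.3).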

## 3 Experimental Results

### 3.1 Datasets

To evaluate our proposed OUDA method on data classification, we performed experiments on four datasets [2]: the Traffic dataset, the Car dataset, the Waveform21 dataset and the Waveform40 dataset. These datasets provide a large variety of time-variant images and signals. The Traffic dataset includes images captured by a fixed traffic camera observing a road over a 2-week period. It consists of 5412 instances of image features with two classes, heavy traffic and light traffic. Figure 3 depicts image samples from the Traffic dataset. The Car dataset contains images of automobiles manufactured between 1950 and 1999, acquired from an online database. It includes 1770 instances of high-dimensional image features with two classes, sedans and trucks. The Waveform21 dataset is composed of 5000 wave instances of 21-dimensional features with three classes. The Waveform40 dataset is the second version of Waveform21 with additional features; it consists of 40-dimensional features.

### 3.2 Comparison with Previous Methods

We used the Evolving Domain Adaptation (EDA) [2] method as the reference model for comparing classification accuracy with our proposed OUDA method and its variants. Following [2], the classification accuracy is measured as $A(n) = \frac{1}{n}\sum_{\tau=1}^{n} a(\tau)$, where $A(n)$ is the accuracy over the $n$ mini-batches that have arrived and $a(\tau)$ is the accuracy for the $\tau$th mini-batch.
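The running accuracy can be computed from per-mini-batch accuracies in one line; with equal-sized mini-batches it coincides with the accuracy over all data received so far. A minimal sketch, assuming $A(n)$ is the plain average of the per-batch accuracies $a(\tau)$; `cumulative_accuracy` and the sample values are hypothetical.

```python
import numpy as np

def cumulative_accuracy(batch_acc):
    """A(n) = (1/n) * sum_{tau <= n} a(tau) for every n, given per-batch accuracies a."""
    a = np.asarray(batch_acc, dtype=float)
    return np.cumsum(a) / np.arange(1, len(a) + 1)

acc_curve = cumulative_accuracy([0.8, 0.6, 0.7])   # approximately [0.8, 0.7, 0.7]
```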

Figure 4 depicts the classification accuracy as the mini-batches arrive. It shows that our proposed OUDA method and the majority of its variants outperformed the EDA method. For the Traffic dataset, a sudden drift occurred in one of the mini-batches, which resulted in an abrupt decrease in accuracy, but the performance recovered as more mini-batches arrived. For the Car dataset, the average accuracy decreased slightly because the target data evolved over a long period (i.e., from 1950 to 1999), which increased the discrepancy between the source and the target domains.

### 3.3 Ablation Study

To understand which step of our proposed method contributes to the accuracy improvement, we also measured the accuracy of different variants of our proposed OUDA method and compared their performance, adding each step to the OUDA process incrementally. Except for the EDA method, which adopted the Incremental Semi-Supervised Learning (ISSL) technique to classify the unlabelled target data, all other approaches adopted basic K-Nearest-Neighbors (KNN) [11] or Support-Vector-Machine (SVM) [19] classifiers for target-label prediction.

Table 1 shows that the steps of averaging the mean-target subspace (Gmean) and recursive feedback (FB) improved the performance the most. With the SVM classifier, PCA+GFK+Gmean and PCA+GFK+Gmean+FB improved the accuracy relative to EDA by 4.27% and 4.08%, respectively, averaged over the four datasets. These results indicate that computing the mean-target subspace leads to a stable computation of the transformation matrix $\mathbf{G}_n$. Furthermore, feeding $\mathbf{G}_n$ back to the next target mini-batch shifted it closer to the source domain.
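The two percentages can be reproduced from Table 1, assuming they are the mean, over the four datasets, of the relative accuracy gain of each SVM variant over EDA. This check is ours rather than the authors'; the function name `mean_relative_gain` is hypothetical.

```python
import numpy as np

# Accuracies from Table 1 (Traffic, Car, Waveform21, Waveform40).
eda = np.array([69.00, 82.59, 74.65, 79.66])        # EDA with ISSL
gmean = np.array([69.94, 85.52, 82.69, 80.79])      # PCA+GFK+Gmean, SVM
gmean_fb = np.array([69.77, 85.00, 82.51, 81.07])   # PCA+GFK+Gmean+FB, SVM

def mean_relative_gain(acc, ref):
    """Average over datasets of the relative accuracy gain, in percent."""
    return 100.0 * np.mean(acc / ref - 1.0)

print(round(mean_relative_gain(gmean, eda), 2))     # 4.27
print(round(mean_relative_gain(gmean_fb, eda), 2))  # 4.08
```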

Table 1: Classification accuracy (%) of the compared methods and of the variants of the proposed method.

| Method | Classifier | Traffic | Car | Waveform21 | Waveform40 |
|---|---|---|---|---|---|
| CMA+GFK | KNN | 63.22 | 82.50 | 72.48 | 66.85 |
| | SVM | 68.87 | 82.73 | 69.15 | 68.77 |
| CMA+SA | KNN | 41.33 | 56.45 | 33.19 | 33.09 |
| | SVM | 41.33 | 56.45 | 33.84 | 33.05 |
| EDA | ISSL | 69.00 | 82.59 | 74.65 | 79.66 |
| PCA | KNN | 63.05 | 82.50 | 71.07 | 66.08 |
| | SVM | 68.85 | 83.31 | 82.55 | 77.74 |
| PCA+GFK | KNN | 64.02 | 82.44 | 70.55 | 65.76 |
| | SVM | 68.71 | 83.08 | 82.10 | 77.23 |
| PCA+GFK+FB | KNN | 61.77 | 81.28 | 72.65 | 66.85 |
| | SVM | 66.67 | 84.88 | 82.18 | 79.86 |
| PCA+GFK+Gmean | KNN | 56.42 | 82.73 | 72.22 | 67.11 |
| | SVM | 69.94 | 85.52 | 82.69 | 80.79 |
| PCA+GFK+Gmean+FB | KNN | 57.03 | 82.44 | 72.38 | 67.90 |
| | SVM | 69.77 | 85.00 | 82.51 | 81.07 |

Table 2: Computation time of the EDA method and the proposed method.

| Method | Traffic | Car | Waveform21 | Waveform40 |
|---|---|---|---|---|
| EDA | 105.7 | 2545 | 22.32 | 23.42 |
| Proposed method | 57.45 | 5503 | 3.188 | 4.410 |

### 3.4 Computation Time

We evaluated the computation time of our proposed OUDA method compared with the previous EDA method on the same datasets. As shown in Table 2, our proposed OUDA method was significantly faster (1.84 to 7.00 times) on all datasets except the Car dataset, which indicates that our proposed method is more suitable for online DA. Since the Car dataset consists of high-dimensional features, our method consumed more time on it to compute the mean-target subspace as well as the geodesic curve from the source subspace to the mean-target subspace.
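The quoted speed-up range follows directly from Table 2. A quick check in plain Python (our own arithmetic: EDA time divided by the proposed method's time, so values below 1 indicate a slowdown):

```python
# Computation times from Table 2 (units as reported there).
eda = {"Traffic": 105.7, "Car": 2545, "Waveform21": 22.32, "Waveform40": 23.42}
proposed = {"Traffic": 57.45, "Car": 5503, "Waveform21": 3.188, "Waveform40": 4.410}

# Speed-up factor = EDA time / proposed time.
speedup = {name: eda[name] / proposed[name] for name in eda}
for name, s in speedup.items():
    print(f"{name}: {s:.2f}x")
# Traffic: 1.84x, Car: 0.46x, Waveform21: 7.00x, Waveform40: 5.31x
```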

## 4 Conclusions

We have described a multi-step framework for tackling the OUDA problem in classification when target data arrive in mini-batches. Inspired by the geometrical interpretation of computing the mean point in Euclidean space, we proposed computing the mean-target subspace incrementally on the Grassmann manifold for mini-batches of target data. We further adopted a feedback step that applies the current transformation to the target data at the next timestep. The transformation matrix computed from the source subspace and the mean-target subspace aligned the target data closer to the source domain. Recursive feedback of domain adaptation increases the robustness of the recognition system to abrupt changes in the target data. The fast computation enabled by working in a low-dimensional space makes our proposed method applicable to real-time OUDA.

### Footnotes

- This work was supported in part by the National Science Foundation under Grant IIS-1813935. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
- We also gratefully acknowledge the support of NVIDIA Corporation with the donation of a Titan XP GPU used for this research.

### References

- [1] (1998) Linear discriminant analysis: a brief tutorial. Institute for Signal and Information Processing 18, pp. 1–8.
- [2] (2016) Incremental evolving domain adaptation. IEEE Transactions on Knowledge and Data Engineering 28 (8), pp. 2128–2141.
- [3] (2012) Statistics on special manifolds. Vol. 174, Springer Science & Business Media.
- [4] (1998) The geometry of algorithms with orthogonality constraints. SIAM Journal on Matrix Analysis and Applications 20 (2), pp. 303–353.
- [5] (2013) Unsupervised visual domain adaptation using subspace alignment. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2960–2967.
- [6] (2003) Efficient algorithms for inferences on Grassmann manifolds. In IEEE Workshop on Statistical Signal Processing, 2003, pp. 315–318.
- [7] (2012) Geodesic flow kernel for unsupervised domain adaptation. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 2066–2073.
- [8] (2014) Generative adversarial nets. In Advances in Neural Information Processing Systems, pp. 2672–2680.
- [9] (2014) Continuous manifold based adaptation for evolving visual domains. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 867–874.
- [10] (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167.
- [11] (1985) A fuzzy k-nearest neighbor algorithm. IEEE Transactions on Systems, Man, and Cybernetics 15 (4), pp. 580–585.
- [12] (2012) Principal angles between subspaces and their tangents. arXiv preprint arXiv:1209.0523.
- [13] (2015) Learning transferable features with deep adaptation networks. arXiv preprint arXiv:1502.02791.
- [14] (2018) Kitting in the wild through online domain adaptation. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1103–1109.
- [15] (1950) Equilibrium points in n-person games. Proceedings of the National Academy of Sciences 36 (1), pp. 48–49.
- [16] (2015) Visual domain adaptation: a survey of recent advances. IEEE Signal Processing Magazine 32 (3), pp. 53–69.
- [17] (2011) Introduction to differential geometry. ETH, Lecture Notes, preliminary version.
- [18] (2016) Return of frustratingly easy domain adaptation. In Thirtieth AAAI Conference on Artificial Intelligence.
- [19] (1999) Least squares support vector machine classifiers. Neural Processing Letters 9 (3), pp. 293–300.
- [20] (2019) Unsupervised domain adaptation using graph transduction games. arXiv preprint arXiv:1905.02036.
- [21] (2018) Visual domain adaptation with manifold embedded distribution alignment. In 2018 ACM Multimedia Conference on Multimedia Conference, pp. 402–410.
- [22] (1987) Principal component analysis. Chemometrics and Intelligent Laboratory Systems 2 (1-3), pp. 37–52.
- [23] (2018) Incremental adversarial domain adaptation for continually changing environments. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1–9.
- [24] (2014) Incremental partial least squares analysis of big streaming data. Pattern Recognition 47 (11), pp. 3726–3735.
- [25] (2017) Joint geometrical and statistical alignment for visual domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1859–1867.