Fairest of Them All: Establishing a Strong Baseline for Cross-Domain Person ReID

Devinder Kumar*, Parthipan Siva†, Paul Marchwica†, Alexander Wong*
*University of Waterloo, Waterloo, Ontario, Canada
†SPORTLOGiQ Inc., Kitchener, Ontario, Canada
{d22kumar,a28wong}@uwaterloo.ca     {parthipan,paul}@sportlogiq.com
Abstract

Person re-identification (ReID) remains a very difficult challenge in computer vision, and is critical for large-scale video surveillance scenarios where an individual can appear in different camera views at different times. There has been recent interest in tackling this challenge using cross-domain approaches, which leverage data from source domains that are different from the target domain. Such approaches are more practical for widespread real-world deployment, given that they do not require on-site training (as with unsupervised or domain adaptation approaches) or on-site manual annotation and training (as with supervised approaches). In this study, we take a systematic approach to establishing a large baseline source domain and target domain for cross-domain person ReID. We accomplish this by conducting a comprehensive analysis of the similarities between source domains proposed in the literature, and by studying the effects of incrementally increasing the size of the source domain. This allows us to establish a balanced source domain and target domain split that promotes variety in both the source and target domains. Furthermore, using lessons learned from state-of-the-art supervised person re-identification methods, we establish a strong baseline method for cross-domain person ReID. Experiments show that a source domain composed of two of the largest person ReID datasets (SYSU and MSMT17) performs well across six commonly used target domains. We also show that, surprisingly, two recent commonly used domains (PRID and GRID) have too few query images to provide meaningful insights. Based on these findings, we propose the following balanced baseline for cross-domain person ReID: i) a fixed multi-source domain consisting of SYSU, MSMT17, Airport and 3DPeS, and ii) a multi-target domain consisting of Market-1501, DukeMTMC-reID, CUHK03, PRID, GRID and VIPeR.

Figure 1: Common learning approaches used in person ReID. (a) Supervised approaches train the embedding space on labeled training data from the test environment (target domain). (b) Cross-domain approaches train the embedding space on labeled training data from source domains that are independent of the target domain. (c) Domain adaptation approaches adapt the cross-domain embedding space to the target domain using unlabeled target-domain data. (d) Unsupervised approaches learn the embedding space directly from unlabeled target-domain data without any external information.

1 Introduction

One of the fundamental computer vision problems in large-scale computerized video surveillance is identifying an individual across a multitude of cameras in a given environment. This requires matching an individual from one camera view to another, and is commonly referred to as the person re-identification (ReID) problem. One of the most promising approaches to person ReID, which has achieved state-of-the-art results in recent years, leverages deep convolutional neural networks (CNNs) to learn an embedding space [zheng2015scalable, Zeng2018hierarchical, liu2017Stepwise, wang2018transferable]. Current research on person ReID using deep CNNs can be categorized as follows (Fig. 1):

Supervised: Supervised approaches assume that manually annotated data, in the form of individuals matched across cameras, is available for the camera network where person ReID is needed (the target domain). The manual annotation, used for training, must be sufficiently large and span all cameras in the target domain. Furthermore, for best performance, an implicit assumption is made that the annotated data spans all expected environmental conditions in the target domain. To this end, existing datasets consist of data collected from an environment over a short time period and randomly split into training and testing sets.

Cross-Domain: Cross-domain approaches assume the availability of manually annotated data from one or more source domains that are different from the target domain. The embedding space is trained on the source domains, then applied to the target domain. Here, an implicit assumption is made that the source domain is similar to the target domain, or that the source domain has enough environmental variation to capture the differences in the target domain. Techniques for handling differences between source and target domains are studied in [song2019generalizable, marchwica2018evaluation, yu2017cross].

Domain Adaptation: Domain adaptation approaches start from the cross-domain setting: an embedding space learned from a different source domain is then adapted to the target domain using unlabeled target-domain data. There are two main approaches here: i) use the unlabeled data to transform target-domain images to look more like source-domain images (or vice versa) [Wei2018GAN, deng2018image], or ii) automatically annotate the unlabeled target-domain data and re-train the model [yu2017cross]. The latter can be considered a form of self-learning. Such approaches are a good compromise between supervised and unsupervised approaches.

Unsupervised: Unsupervised approaches do not use manually annotated data from the target domain or any source domains. They only utilize unlabeled data from the target domain to train the deep model’s embedding space. This is a pure self-learning problem that typically utilizes the inherent spatio-temporal nature of a camera network to initialize the self-learning process [li2018unsupervised].
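Regardless of how the embedding space is learned, inference in all four paradigms reduces to the same retrieval step: embed the query and gallery crops with the trained network, then rank gallery images by distance in the embedding space. A minimal sketch of that ranking step (the embeddings here are random stand-ins for CNN features; the function name is illustrative, not from the paper):

```python
import numpy as np

def rank_gallery(query_emb, gallery_embs):
    """Rank gallery entries by cosine similarity to the query embedding."""
    q = query_emb / np.linalg.norm(query_emb)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    sims = g @ q                # cosine similarity, shape (n_gallery,)
    return np.argsort(-sims)    # best match first

# Toy example: gallery[2] is a slightly perturbed copy of the query,
# so it should be ranked first.
rng = np.random.default_rng(0)
gallery = rng.normal(size=(4, 128))
query = gallery[2] + 0.01 * rng.normal(size=128)
order = rank_gallery(query, gallery)
assert order[0] == 2
```

In practice the ranked list is scored with CMC and mAP metrics; the distance computation itself is unchanged across supervised, cross-domain, domain adaptation, and unsupervised methods.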

Supervised approaches are the most extensively studied in person ReID [zhang2019densely, tay2019aanet, zheng2019pyramidal, zheng2019joint] and, as expected, achieve the best results so far. However, from a practical point of view this approach is infeasible for large-scale deployment of person ReID because of the need for manually annotated data from every target domain. At the opposite end of the annotation spectrum are unsupervised approaches, where no manual annotation is needed at all. This is a promising avenue of research, with some impressive results already reported [song2018unsupervised, li2018unsupervised, qin2015unsupervised, fan2018unsupervised]. However, this approach assumes the availability of training hardware at the target site, or the ability to transfer unlabeled data from the target site to a training facility; again, this complicates large-scale deployment. Furthermore, the system requires a learning period during which person ReID functionality is not available at all. Domain adaptation is a compromise between supervised and unsupervised approaches, but it still requires on-site training for the adaptation step.

Cross-domain approaches can be regarded as the ideal approach for practical deployment, because a model pre-trained on annotated source domain(s) can simply be deployed to any target domain without on-site training. Currently, cross-domain research in person ReID is split into two directions: some works report cross-domain results as part of domain adaptation research [li2018adaptation], while others look directly at the cross-domain problem [jia2019frustratingly, song2019generalizable]. As a result, there is a lack of consistency in the choice of source and target domains, making comparison of different approaches difficult.

Many papers [li2014deepreid, lisanti2015person, li2018unsupervised] consider single-source, single-target scenarios. This makes comparison between approaches easier, but does not take advantage of the many existing person ReID datasets. To address this shortcoming, recent works [jia2019frustratingly, marchwica2018evaluation, song2019generalizable] have combined multiple datasets to form a large source domain and tested on several small target domains. However, these target domains share no commonality with those used by previous single-source, single-target approaches. Furthermore, the target domains used in these multi-source papers [jia2019frustratingly, song2019generalizable] are very small datasets, so no strong conclusions can be drawn about performance on them.
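Combining several datasets into one multi-source training set is mechanically simple, but the person IDs must be re-indexed so that identities from different datasets never collide (ID 1 in SYSU is not the same person as ID 1 in MSMT17). A hedged sketch of that merge step; the (image_path, person_id) tuple format and file names are illustrative assumptions, not the paper's actual data layout:

```python
def merge_sources(datasets):
    """Merge per-dataset (path, person_id) lists into one training set,
    offsetting IDs so identities from different datasets stay distinct."""
    merged, offset = [], 0
    for samples in datasets:
        ids = sorted({pid for _, pid in samples})
        remap = {pid: offset + i for i, pid in enumerate(ids)}
        merged += [(path, remap[pid]) for path, pid in samples]
        offset += len(ids)
    return merged

# Toy example: both datasets use person ID 1 for different people.
sysu = [("sysu/0001_c1.jpg", 1), ("sysu/0001_c2.jpg", 1)]
msmt = [("msmt/0001_c1.jpg", 1), ("msmt/0002_c3.jpg", 2)]
train = merge_sources([sysu, msmt])
# SYSU's identity maps to 0; MSMT's two identities map to 1 and 2.
assert {pid for _, pid in train} == {0, 1, 2}
```

The same re-indexing also keeps the merged label space contiguous, which is convenient when the training loss (e.g. a classification head) expects labels in [0, num_ids).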

In this study, we take a systematic approach to establishing a large baseline source domain and target domain for cross-domain person ReID. We accomplish this by conducting a comprehensive analysis of the similarities between source domains proposed in the literature, and by studying the effects of incrementally increasing the size of the source domain. This allows us to establish a balanced source domain and target domain split that promotes variety in both source and target domains. Furthermore, using lessons learned from state-of-the-art supervised person ReID methods, we establish a strong baseline method for cross-domain person ReID.

In summary, the key contributions of this study are as follows:

  • conduct a comprehensive analysis of the similarities between source domains proposed in the literature;

  • study the effects of incrementally increasing the size of the source domain;

  • establish a large baseline source domain and target domain for cross-domain person ReID based on the findings of the above analysis; and

  • establish a strong baseline method for cross-domain person ReID using lessons learned from state-of-the-art supervised person ReID methods.

Figure 2: Existing ReID datasets [Airport [karanam2018airport], DukeMTMC-reID [zheng2017unlabeled], Market-1501 [zheng2015scalable], CUHK03 [li2014deepreid], SYSU [DBLP:journals/corr/XiaoLWLW16], MSMT17 [Wei2018GAN], CUHK02 [li2013locally], PRID2011 [PRID2011Dataset], GRID [GRIDDataset], VIPeR [ViperDataset], 3DPeS [baltieri2011_308]]. All images have been resized to 256×128 for easier comparison.
Dataset                                 # IDs   # Images   # Cams   Common Test-Set
Airport [karanam2018airport]             1381       8660        6
DukeMTMC-reID [zheng2017unlabeled]       1404      32948        8
Market-1501 [zheng2015scalable]          1501      32668        6
CUHK03 [li2014deepreid]                  1467      14097       10
SYSU [DBLP:journals/corr/XiaoLWLW16]    11934      34574      N/A
MSMT17 [Wei2018GAN]                      3060     126142       15
CUHK02 [li2013locally]                   1816       7264       10
PRID2011 [PRID2011Dataset]                385       1134        2
GRID [GRIDDataset]                        250        500        2
VIPeR [ViperDataset]                      632       1264        2
3DPeS [baltieri2011_308]                  164        951        8
Table 1: List of different source and target domain datasets.
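From the counts in Table 1, the scale of the proposed multi-source domain (SYSU, MSMT17, Airport and 3DPeS) can be tallied directly; the snippet below just sums the table's figures:

```python
# (dataset, num_ids, num_images) taken from Table 1
source = [
    ("SYSU",    11934,  34574),
    ("MSMT17",   3060, 126142),
    ("Airport",  1381,   8660),
    ("3DPeS",     164,    951),
]
total_ids = sum(n_ids for _, n_ids, _ in source)
total_imgs = sum(n_imgs for _, _, n_imgs in source)
print(total_ids, total_imgs)  # 16539 170327
```

This puts the combined source domain at 16,539 identities and 170,327 images, substantially larger than any single dataset in Table 1.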