Revisiting Distributional Correspondence Indexing: A Python Reimplementation and New Experiments
This paper introduces PyDCI, a new implementation of Distributional Correspondence Indexing (DCI) written in Python. DCI is a transfer learning method for cross-domain and cross-lingual text classification, for which we had previously provided an implementation (here called JaDCI) built on top of JaTeCS, a Java framework for text classification. PyDCI is a stand-alone version of DCI that exploits scikit-learn and the SciPy stack. We report on new experiments that we have carried out in order to test PyDCI, in which we use as baselines new high-performing methods that have appeared after DCI was originally proposed. These experiments show that, thanks to a few subtle ways in which we have improved DCI, PyDCI outperforms both JaDCI and the above-mentioned high-performing methods, and delivers the best known results on the two popular benchmarks on which we had tested DCI, i.e., MultiDomainSentiment (a.k.a. MDS, for cross-domain adaptation) and Webis-CLS-10 (for cross-lingual adaptation). PyDCI, together with the code needed to replicate our experiments, is available at https://github.com/AlexMoreo/pydci .
Keywords: Transfer Learning, Domain Adaptation, Text Classification, Sentiment Classification, Cross-Domain Classification, Cross-Lingual Classification, Python
Distributional Correspondence Indexing (DCI) is a
pivot-based feature-transfer domain adaptation method for cross-domain
and cross-lingual text classification. DCI was first described in
(Esuli and Moreo, 2015), and later improved and extended in
(Moreo et al., 2016a); it was formerly implemented in Java as part of
the JaTeCS (Java Text
Categorization System) framework
(Esuli et al., 2017), and this implementation (henceforth called
JaDCI) was made publicly available.
In this paper we present PyDCI, a new implementation of the DCI method written in Python and built on top of the SciPy stack and the scikit-learn toolkit. Python has become one of the most widely used programming languages among computer scientists. In machine learning and data mining its use has been further promoted by Python-based environments such as SciPy and scikit-learn, whose power and ease of use have attracted the interest of practitioners. Our reimplementation is thus in line with these trends.
With respect to JaDCI, PyDCI introduces a few modifications in the way DCI is implemented that, although subtle, bring about a significant improvement in the effectiveness of the method.
The rest of this paper is structured as follows. In Section 2 we describe the main modifications to DCI that our new implementation introduces. In Section 3 we report on new experiments that we have run using PyDCI, and show that, thanks to the modifications above, PyDCI delivers new state-of-the-art results on two popular benchmark datasets, i.e., MultiDomainSentiment (hereafter MDS, for cross-domain adaptation) and Webis-CLS-10 (for cross-lingual adaptation). These results represent a clear improvement over the ones originally obtained with JaDCI and presented in (Moreo et al., 2016a), and also over the ones obtained by recent high-performing methods that have appeared after DCI was originally proposed (Ganin et al., 2016; Li et al., 2017; Xu and Yang, 2017; Zhou et al., 2016), or that we have since become aware of (Yang et al., 2015). Section 4 concludes, hinting at future developments.
We make PyDCI publicly available at https://github.com/AlexMoreo/pydci .
2 Implementation Changes
For reasons of brevity we do not re-explain DCI from scratch; we refer the interested reader to (Moreo et al., 2018) for a concise description, or to (Moreo et al., 2016a) for the full-blown presentation.
The main modifications that PyDCI introduces with respect to JaDCI are the following:
1. Document Standardization: In DCI, feature vectors and document vectors (i.e., the vectors that represent the features and the vectors that represent the documents, respectively) are post-processed via L2-normalization. In (Moreo et al., 2016a) we had witnessed improvements when applying standardization to the feature vectors (i.e., translating and scaling each dimension so that it is approximately distributed as a standard normal, with zero mean and unit variance; see (Moreo et al., 2016a, p. 144)). In PyDCI we give the user the option to apply standardization also to each dimension of the document vectors before training the classifier. All experiments we report in this paper are run with this option activated.
2. Classifier Optimization: In PyDCI we use scikit-learn’s implementation of linear SVMs (LinearSVC, which is in turn based on the liblinear package), instead of Joachims’ SVM package as we had done in JaDCI. This allows us to leverage scikit-learn’s GridSearchCV utility to tune, via grid search, the SVM parameter C (which determines the trade-off between training error and the margin). In the new experiments using PyDCI we let C range over a grid of candidate values, while in the JaDCI experiments we had simply relied on the default value that Joachims’ package assigns to C.
3. Increase in the Number of Pivots: We increase the number of pivots from 100 (the value we had used in (Moreo et al., 2016a)) to 1,000 in the cross-domain experiments and to 450 in the cross-lingual experiments. This brings about a significant improvement in performance, which does not come at a significant cost in execution time (as had instead happened with the previous implementation). We limit the number of pivots to 450 in the cross-lingual case (instead of 1,000) since in this case each pivot requires a translation to the target language, which is assumed to have a cost; we thus set the number of pivots to 450, as was done in previous research (e.g., in (Prettenhofer and Stein, 2010)). We discuss below in more detail the impact that varying the number of pivots has on performance.
We should also mention that PyDCI relies on scikit-learn (while JaDCI relied on JaTeCS) for many preprocessing-related aspects (e.g., term weighting), which may also cause some hard-to-track differences in performance with respect to JaDCI.
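The first two modifications (standardizing each dimension of the document vectors, and tuning the SVM parameter C via grid search) can be sketched with standard scikit-learn components. The following is a minimal illustration and not PyDCI's actual code: the document vectors are synthetic stand-ins, the grid of C values is a hypothetical choice rather than the one used in our experiments, and fitting the scaler inside the pipeline is a design assumption of this sketch.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in for document vectors: 200 documents, 50 dimensions.
# These are NOT real DCI representations; they only make the sketch runnable.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Standardize every dimension of the document vectors (zero mean, unit
# variance), then train a linear SVM on the standardized vectors.
pipeline = make_pipeline(StandardScaler(), LinearSVC(max_iter=10000))

# Hypothetical grid for C (an assumption of this sketch, not the paper's grid).
param_grid = {"linearsvc__C": [10.0 ** k for k in range(-3, 4)]}

# 5-fold cross-validated grid search over C, selecting by accuracy.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

best_C = search.best_params_["linearsvc__C"]
print(f"best C = {best_C}, cross-validated accuracy = {search.best_score_:.3f}")
```

Placing the scaler inside the pipeline means that its statistics are re-estimated on the training folds of each cross-validation split, so no information from the held-out fold leaks into the standardization step.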
3.1 Effectiveness on Cross-Domain Classification and Cross-Lingual Classification
In this section we report the results we have obtained in
rerunning with PyDCI the same experiments we had run with JaDCI,
and whose results had been reported in (Moreo et al., 2016a). The
datasets we use are arguably the most popular benchmarks in the domain
adaptation literature, i.e., MDS (Blitzer et al., 2007) for cross-domain adaptation and Webis-CLS-10 for cross-lingual adaptation.
Tables 1 and 2 show the values of classification accuracy (i.e., the fraction of correctly classified documents) we obtain for cross-domain and cross-lingual classification experiments, respectively. We focus on Linear and Cosine (Columns 9-10), two parameter-free probabilistic and kernel-based distributional correspondence functions (DCFs) investigated in (Moreo et al., 2016a). For each such DCF we show a direct comparison against the values we had obtained with JaDCI (Columns 7-8). We also report two baselines:
Lower (Column 3), a classifier that directly trains on the “source” training examples and tests on the “target” unlabeled examples without performing any sort of adaptation at all. Such a classifier should thus act as a lower bound for any reasonable adaptation endeavour.
Upper (Column 4), a classifier that trains on the “target” training examples and tests on the “target” unlabeled examples without performing any sort of adaptation at all. Such a classifier should thus act as an upper bound for any reasonable adaptation endeavour.
The baselines use exactly the same learner we use for PyDCI (LinearSVC with the parameter C optimized via grid search). For each (problem, dataset) pair we also report the accuracy obtained by what, to the best of our knowledge, is today the best-performing known method on this (problem, dataset) pair: Column 5 (labelled “SOTA”, which stands for “State Of The Art”) reports the name of the method, and Column 6 reports its accuracy score, taken from the original paper. Boldface indicates the best score for each (problem, dataset) pair; shaded cells indicate the PyDCI scores that outperform the best-known results.
Note that, aside from SDA (Glorot et al., 2011), none of the methods in the “SOTA” column had been used as baselines in our original work on DCI; the reason is that these methods were published after DCI appeared in print (Ganin et al., 2016; Li et al., 2017; Xu and Yang, 2017; Zhou et al., 2016), or that we were unaware of them (Yang et al., 2015).
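The Lower and Upper baselines described above can be illustrated with a small sketch. The two “domains” below are synthetic and hypothetical (the `make_domain` helper and its `shift` parameter exist only for this example), and the grid search over C is omitted for brevity; the 5-fold cross-validation used for Upper mirrors what we do on MDS, where only one labelled set per domain is available.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

def make_domain(n_docs, shift):
    """Hypothetical domain: the label depends on two features, and `shift`
    changes the decision rule so that source and target domains differ."""
    X = rng.normal(size=(n_docs, 20))
    y = (X[:, 0] + shift * X[:, 1] > 0).astype(int)
    return X, y

Xs, ys = make_domain(400, shift=0.0)  # labelled "source" documents
Xt, yt = make_domain(400, shift=1.5)  # "target" documents

# Lower: train on the source domain, test on the target domain,
# without any adaptation.
lower = LinearSVC(max_iter=10000).fit(Xs, ys).score(Xt, yt)

# Upper: train and test within the target domain, here estimated
# via 5-fold cross-validation on the target documents.
upper = cross_val_score(LinearSVC(max_iter=10000), Xt, yt, cv=5).mean()

print(f"Lower = {lower:.3f}, Upper = {upper:.3f}")
```

Any adaptation method worth its cost should land between these two values on the same data.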
Table 1: Classification accuracy on MDS (cross-domain adaptation).
|Source||Target||Lower||Upper||SOTA method||SOTA acc.||JaDCI Linear||JaDCI Cosine||PyDCI Linear||PyDCI Cosine|
|Books||DVD||0.807||0.850||SDA (Glorot et al., 2011)||0.844||0.808||0.817||0.803||0.823|
|Books||Electronics||0.734||0.871||AMN (Li et al., 2017)||0.808||0.810||0.822||0.837||0.837|
|Books||Kitchen||0.774||0.907||DANN (Ganin et al., 2016)||0.843||0.834||0.835||0.851||0.843|
|DVD||Books||0.790||0.839||DANN (Ganin et al., 2016)||0.825||0.825||0.824||0.832||0.835|
|DVD||Electronics||0.757||0.871||CDFL (Yang et al., 2015)||0.809||0.822||0.824||0.839||0.855|
|DVD||Kitchen||0.778||0.907||DANN (Ganin et al., 2016)||0.849||0.858||0.864||0.853||0.856|
|Electronics||Books||0.716||0.839||AMN (Li et al., 2017)||0.780||0.766||0.764||0.796||0.800|
|Electronics||DVD||0.745||0.850||DANN (Ganin et al., 2016)||0.781||0.768||0.774||0.787||0.801|
|Electronics||Kitchen||0.859||0.907||SDA (Glorot et al., 2011)||0.902||0.864||0.868||0.871||0.878|
|Kitchen||Books||0.737||0.839||AMN (Li et al., 2017)||0.793||0.783||0.790||0.779||0.807|
|Kitchen||DVD||0.746||0.850||CDFL (Yang et al., 2015)||0.876||0.788||0.799||0.795||0.806|
|Kitchen||Electronics||0.840||0.871||SDA (Glorot et al., 2011)||0.872||0.855||0.858||0.853||0.860|
|Average||-||0.773||0.867||AMN (Li et al., 2017)||0.814||0.815||0.820||0.825||0.833|
Table 2: Classification accuracy on Webis-CLS-10 (cross-lingual adaptation).
|Language||Domain||Lower||Upper||SOTA method||SOTA acc.||JaDCI Linear||JaDCI Cosine||PyDCI Linear||PyDCI Cosine|
|German||Books||0.523||0.863||BiDRL (Zhou et al., 2016)||0.841||0.798||0.827||0.846||0.850|
|German||DVD||0.562||0.837||BiDRL (Zhou et al., 2016)||0.841||0.826||0.822||0.841||0.837|
|German||Music||0.558||0.849||BiDRL (Zhou et al., 2016)||0.847||0.844||0.856||0.865||0.852|
|French||Books||0.558||0.844||BiDRL (Zhou et al., 2016)||0.844||0.746||0.842||0.834||0.816|
|French||DVD||0.537||0.843||BiDRL (Zhou et al., 2016)||0.836||0.823||0.827||0.835||0.851|
|French||Music||0.566||0.876||CLDFA (Xu and Yang, 2017)||0.833||0.816||0.844||0.824||0.842|
|Japanese||Books||0.498||0.802||CLDFA (Xu and Yang, 2017)||0.774||0.779||0.758||0.796||0.790|
|Japanese||DVD||0.500||0.814||CLDFA (Xu and Yang, 2017)||0.805||0.822||0.801||0.830||0.802|
|Japanese||Music||0.509||0.834||BiDRL (Zhou et al., 2016)||0.788||0.826||0.839||0.811||0.838|
|Average||-||0.534||0.840||BiDRL (Zhou et al., 2016)||0.813||0.809||0.824||0.831||0.831|
PyDCI outperforms JaDCI in most cases and, with very few exceptions, also outperforms the best-performing method in the literature (which is not the same for every (problem, dataset) pair). PyDCI obtains 7 out of 13 best results on MDS (including the best average accuracy) when equipped with the Cosine DCF, and 5 out of 10 best results on Webis-CLS-10 (including the best average accuracy) when using the Linear DCF. In agreement with (Moreo et al., 2016a), Cosine proved to be the best-performing DCF, yielding the best results overall and surpassing the best accuracy obtained by any other method in 17 cases out of 23 (across the two datasets, and also including the average results). With respect to the previously best-performing system, PyDCI(Cosine) brings about a relative reduction in error of 10.2% on MDS and 9.6% on Webis-CLS-10.
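The error-reduction figures can be recomputed from the “Average” rows of Tables 1 and 2. The sketch below assumes that error is defined as one minus accuracy and that the reduction is taken relative to the error of the previously best-performing system; the function name is ours and purely illustrative. Under this assumption the computation reproduces the reported figures.

```python
def relative_error_reduction(acc_baseline, acc_new):
    """Relative reduction in error (error = 1 - accuracy) when moving
    from a baseline accuracy to a new accuracy."""
    err_baseline = 1.0 - acc_baseline
    err_new = 1.0 - acc_new
    return (err_baseline - err_new) / err_baseline

# Average accuracies: SOTA (Column 6) vs. PyDCI with the Cosine DCF (Column 10).
mds = relative_error_reduction(0.814, 0.833)
webis = relative_error_reduction(0.813, 0.831)

print(f"MDS: {100 * mds:.1f}%")             # MDS: 10.2%
print(f"Webis-CLS-10: {100 * webis:.1f}%")  # Webis-CLS-10: 9.6%
```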
On the very same (problem,dataset) pairs we have also run experiments
in order to evaluate the impact of modifications
1 (Document Standardization) and
2 (Classifier Optimization) mentioned in Section
2. Concerning document standardization, we have rerun
all the PyDCI experiments described in Tables 1 and
2 without applying document standardization. The results
are reported in the first two rows of Table 3,
and indicate, on average, a relative improvement in accuracy of +0.2%
on MDS and +8.6% on Webis-CLS-10; document standardization thus
appears to be clearly beneficial. Concerning classifier optimization,
we have rerun all the PyDCI experiments described in Tables
1 and 2 without applying classifier
optimization. The results are reported in the last two rows of Table
3, and indicate, on average, a relative
improvement in accuracy of +9.7% on MDS and +11.8% on Webis-CLS-10;
classifier optimization is thus also (unsurprisingly) clearly beneficial.
3.2 Effectiveness on Cross-Domain Cross-Lingual Classification
Table 4 reports classification accuracy
values obtained in the domain adaptation setting proposed in
(Moreo et al., 2016a), in which both domain and language differ
between the source and target (i.e., when the classification task is
simultaneously cross-domain and cross-lingual). In Table
4 we include the results we had obtained in
(Moreo et al., 2016a) for the Cross-Lingual Structural Correspondence
Learning (SCL) method (Prettenhofer and Stein, 2010) (which we use here
as a baseline), using its authors’
The results in Table 4 confirm the superiority of PyDCI over JaDCI. In this case, though, the difference in performance between the “Cosine” counterparts is less pronounced. Between the PyDCI variants, Linear performs slightly better than Cosine.
3.3 Statistical Significance
We have subjected our experiments to thorough statistical significance testing, by running a two-tailed t-test on paired examples across all runs (cross-domain and/or cross-lingual). The test reveals that the PyDCI versions of Linear and Cosine outperform, in a statistically significant sense, the corresponding JaDCI versions.
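A two-tailed paired t-test of this kind can be run with SciPy's `ttest_rel`. The sketch below is only an illustration and does not reproduce the test described above: it assumes that observations are paired by task, and it uses only the twelve MDS accuracy values of the Cosine DCF from Table 1 (Column 8 for JaDCI, Column 10 for PyDCI).

```python
from scipy.stats import ttest_rel

# Per-task accuracies on MDS with the Cosine DCF, in the row order of Table 1.
jadci_cosine = [0.817, 0.822, 0.835, 0.824, 0.824, 0.864,
                0.764, 0.774, 0.868, 0.790, 0.799, 0.858]
pydci_cosine = [0.823, 0.837, 0.843, 0.835, 0.855, 0.856,
                0.800, 0.801, 0.878, 0.807, 0.806, 0.860]

# Paired t-test on the per-task differences; two-sided by default.
result = ttest_rel(pydci_cosine, jadci_cosine)
t_stat, p_value = result.statistic, result.pvalue

print(f"t = {t_stat:.3f}, two-tailed p = {p_value:.4f}")
```

A positive t statistic indicates that the first sample (PyDCI) has the higher mean accuracy; the p-value is then compared against the chosen significance threshold.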
One important aspect of DCI in general, and of PyDCI in
particular, is its efficiency. Figure 1 reports the
computation times we have recorded
3.5 Effectiveness vs. Efficiency Trade-off
In this section we analyse the trade-off between effectiveness (in terms of classification accuracy) and efficiency (in terms of computation time, measured in seconds). In this experiment we vary the number of pivots. For Webis-CLS-10 we bound this range from above since, for some tasks, it was impossible to extract a larger number of pivots. Figure 2 shows the average accuracy (left) and the computation times for MDS and Webis-CLS-10.
As the number of pivots increases, PyDCI surpasses the best average accuracy reported for any other method in both datasets. In particular, and in accordance with (Moreo et al., 2016a), PyDCI equipped with the Cosine DCF does so with only 100 pivots. In this case, and in contrast with JaDCI, classification accuracy increases noticeably when more pivots are taken into account; this might be a side effect of the modifications discussed in Section 2. In any case, the method seems to reach a plateau for higher numbers of pivots, allowing the Cosine variant to reach new peaks of classification accuracy of 0.839 in MDS and 0.840 in Webis-CLS-10. Regarding the efficiency of the method, PyDCI exhibits a quasi-linear trend in execution time: e.g., when the number of pivots is doubled, the execution time roughly doubles too.
We have presented PyDCI, a (Python-based) revision of our previous (Java-based) implementation of DCI. This new implementation incorporates changes that, although subtle, allow the method to deliver improved results that outperform the currently known best-performing methods. The efficiency tests we have carried out show that PyDCI is also fast, requiring roughly half a minute to complete any of the domain adaptation tasks in our experiments.
In a preliminary study DCI was also tested in transductive scenarios (Moreo et al., 2016b). PyDCI does not support transductive classification; this is something we plan to address in the near future.
- A word translator oracle is used to automate the translation work in the experiments, following the indications of (Prettenhofer and Stein, 2010) and the bilingual dictionaries they released.
- In MDS there is only one labelled set available for each domain (see (Blitzer et al., 2007)). In this case we report the accuracy of a 5-fold cross-validation on the test set.
- Note that the results obtained by PyDCI without document standardization and classifier optimization are different from the ones obtained by JaDCI, the main reason being that SVM and LinearSVC choose different default parameters for their SVM learner.
- The experiments were run on a machine equipped with an 8-core AMD FX-8350 processor at 4GHz and 32 GB of RAM, running Ubuntu 16.04 (LTS).
- John Blitzer, Mark Dredze, and Fernando Pereira. Biographies, Bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL 2007), pages 440–447, Prague, CZ, 2007.
- Andrea Esuli and Alejandro Moreo. Distributional correspondence indexing for cross-language text categorization. In Proceedings of the 37th European Conference on Information Retrieval (ECIR 2015), pages 104–109, Wien, AT, 2015.
- Andrea Esuli, Tiziano Fagni, and Alejandro Moreo. JaTeCS: An open-source Java Text Categorization System. arXiv preprint arXiv:1706.06802, 2017.
- Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, and Victor Lempitsky. Domain-adversarial training of neural networks. The Journal of Machine Learning Research, 17(1):2096–2030, 2016.
- Xavier Glorot, Antoine Bordes, and Yoshua Bengio. Domain adaptation for large-scale sentiment classification: A deep learning approach. In Proceedings of the 28th International Conference on Machine Learning (ICML 2011), pages 513–520, Bellevue, US, 2011.
- Zheng Li, Yu Zhang, Ying Wei, Yuxiang Wu, and Qiang Yang. End-to-end adversarial memory network for cross-domain sentiment classification. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI 2017), pages 2237–2243, Melbourne, AU, 2017.
- Alejandro Moreo, Andrea Esuli, and Fabrizio Sebastiani. Distributional correspondence indexing for cross-lingual and cross-domain sentiment classification. Journal of Artificial Intelligence Research, 55:131–163, 2016a.
- Alejandro Moreo, Andrea Esuli, and Fabrizio Sebastiani. Transductive distributional correspondence indexing for cross-domain topic classification. In Proceedings of the 7th Italian Information Retrieval Workshop (IIR 2016), Venezia, IT, 2016b.
- Alejandro Moreo, Andrea Esuli, and Fabrizio Sebastiani. Distributional correspondence indexing for cross-lingual and cross-domain sentiment classification (Extended Abstract). In Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI 2018), pages 5647–5651, Stockholm, SE, 2018. doi: 10.24963/ijcai.2018/802.
- Peter Prettenhofer and Benno Stein. Cross-language text classification using structural correspondence learning. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL 2010), pages 1118–1127, Uppsala, SE, 2010.
- Peter Prettenhofer and Benno Stein. Cross-lingual adaptation using structural correspondence learning. ACM Transactions on Intelligent Systems and Technology (TIST), 3(1):13, 2011.
- Ruochen Xu and Yiming Yang. Cross-lingual distillation for text classification. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), volume 1, pages 1415–1425, 2017.
- Xiaoshan Yang, Tianzhu Zhang, and Changsheng Xu. Cross-domain feature learning in multimedia. IEEE Transactions on Multimedia, 17(1):64–78, 2015.
- Xinjie Zhou, Xiaojun Wan, and Jianguo Xiao. Cross-lingual sentiment classification with bilingual document representation learning. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), volume 1, pages 1403–1412, 2016.