Dynamic Mode Decomposition based feature for Image Classification
Irrespective of the fact that machine learning has produced groundbreaking results, it demands an enormous amount of data to do so. Even though data production is at an all-time high, almost all of that data is unlabelled, making it unsuitable for training such algorithms. This paper proposes a novel method of extracting features using Dynamic Mode Decomposition (DMD). (All codes and datasets used in this paper are available at https://github.com/rahulvigneswaran/Dynamic-Mode-Decomposition-based-feature-for-Image-Classification.) The experiments are performed using data samples from ImageNet. The learning is done using SVM-linear, SVM-RBF, and the Random Kitchen Sinks (RKS) approach. The results show that DMD features with RKS give competitive results.
The human race is generating more data than at any point in history. Still, most of it is unsuitable for training an algorithm, because almost all of the generated data is either biased or unlabelled. For these reasons, data scientists are restricted to using only the minuscule portion of the generated data that has been preprocessed and cleaned. That makes almost 99% of the generated data unusable, while today's state-of-the-art deep learning algorithms are designed to be data-hungry in their training stage. In recent years, scientists have started designing architectures that, unlike their counterparts, can learn a distribution with less data. Largely, these new waves of algorithms achieve this in the following ways.
[1, 3] come under the category of generative models of data. As the name suggests, one line of work builds a classifier through a generative model using iterative Expectation-Maximization (EM) techniques together with a variant of Deterministic Annealing, while another uses unlabelled data to make synthetically generated labelled data less synthetic through Generative Adversarial Nets (GANs). [5, 6] use a method called co-training (when a dataset naturally divides into parts and an algorithm exploits this trait, it is categorized as co-training), which finds a weak indicator in the labelled data and then finds the corresponding unlabelled data to strengthen it. Beyond the methods discussed so far, there are several other techniques for learning with limited labelled data. Table I gives a detailed summary of methods from each category mentioned previously.
Section I gives a brief introduction to the existing methods for learning with limited labelled data. Section II provides an elaborate explanation of the concepts used in the proposed approach, namely Dynamic Mode Decomposition (DMD) and the Random Kitchen Sinks (RKS) algorithm. Section III details the proposed approach, and Section IV elaborates on the obtained results and draws out the interesting underlying commonalities. Finally, Section V distils the proposed approach's findings and concludes with the future scope of this research.
Table I: Summary of methods for learning with limited labelled data.

| Category | Sub-category | Method | Data Requirement | References |
|---|---|---|---|---|
| Semi-Supervised | Generative Models of Data | - | Limited Labelled + Unlabelled | |
| Semi-Supervised | Generative Models of Data | - | Synthetic-Labelled + Real-Unlabelled | |
| Semi-Supervised | Co-Training | - | Limited Labelled + Unlabelled | [5, 6] |
| Semi-Supervised | Low-Density Separation | Transductive Learning | Labelling the Unlabelled using Labelled | |
| Semi-Supervised | Graph-Based | Label Propagation | Labelling the Unlabelled using Labelled | [9, 10] |
| Weak Supervision | Noisy Labels | Relation Extraction | Heuristic labelling of Completely Unlabelled data | |
| Weak Supervision | Generative Models of Labels | Relation Extraction | Removing Wrong labels from the Heuristic labelling of Completely Unlabelled data | |
| Weak Supervision | - | - | Limited Labelled + Large Weakly Labelled | |
| Weak Supervision | - | - | Labelling of Unlabelled data | |
| Weak Supervision | - | - | Error reduction of Labelled data | |
| Weak Supervision | Biased Labels | PU-Learning | Positive and Unlabelled data | |
| Weak Supervision | Feature Annotation | NA | Use Labelled features | |
| Active Learning | - | - | Human labels the required unlabelled data | |
| Transfer Learning | - | Inductive Learning | Transferring the model | |
| Multi-Task Learning | - | Inductive Learning | Limited Labelled data | [26, 27] |
| Few-shot Learning | - | - | Limited Labelled data | [28, 29] |
| Data Augmentation | - | - | Increase the Labelled data count | [31, 32] |
| Reinforcement Learning | - | Apprenticeship Learning | Learning directly from the Expert without the need for any dataset | |
| Reinforcement Learning | - | Policy Shaping | Modifying the policy in realtime by getting advice from a human | |
II Materials and Methods
II-A Dataset

The dataset used for benchmarking is the Tiny ImageNet dataset, a miniature version of the ImageNet dataset. It contains 200 classes with 500 images each, and each image is 64x64 pixels in size.
II-B Dynamic Mode Decomposition (DMD)
DMD is a way of extracting the underlying dynamics of data that evolves in time. It is a very powerful tool for analysing the dynamics of non-linear systems and was developed by Schmid. It has also been used for forecasting, natural language processing, salient region detection in images, etc. It was inspired by, and is closely related to, Koopman-operator analysis. The popularity DMD has gained in the fluids community is largely due to its ability to provide information about the dynamics of a flow even when those dynamics are inherently non-linear. In short, DMD is a data-driven, equation-free method capable of providing a precise decomposition of a highly complex system into coherent spatio-temporal structures, which can be used to predict a few timesteps into the future. A typical DMD algorithm involves the following steps.
In the following, $X_1$ and $X_2$ denote the time-shifted snapshot matrices of the data, related by $X_2 = A X_1$ for an unknown linear operator $A$.

1. Compute the Singular Value Decomposition (SVD) of $X_1$ as
$$X_1 \approx U \Sigma V^*,$$
where $U$, $\Sigma$, and $V$ are truncated to rank $r$, the reduced SVD approximation of $X_1$.

2. Compute the matrix $A$ from $X_2 = A X_1$, which gives
$$A = X_2 V \Sigma^{-1} U^*.$$

3. Compute the similar matrix of $A$, which is $\tilde{A}$, by
$$\tilde{A} = U^* A U = U^* X_2 V \Sigma^{-1}.$$

4. Compute the eigen decomposition of $\tilde{A}$ by
$$\tilde{A} W = W \Lambda,$$
where the columns of $W$ are eigenvectors and $\Lambda$ is the diagonal matrix of eigenvalues.

5. Pre-multiply by $X_2 V \Sigma^{-1}$ on both sides; this shows that the columns of $X_2 V \Sigma^{-1} W$ are eigenvectors of $A$ with the same eigenvalues $\Lambda$.

6. Compute the Dynamic Modes matrix by
$$\Phi = X_2 V \Sigma^{-1} W.$$
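The steps above can be sketched in a few lines of NumPy. The variable names (`X1`, `X2`, `r`) and the random test matrix are illustrative assumptions, not taken from the paper's code:

```python
import numpy as np

def dmd(X, r=5):
    """Return DMD modes Phi and eigenvalues for a snapshot matrix X (n x m)."""
    X1, X2 = X[:, :-1], X[:, 1:]                    # time-shifted snapshot pairs
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    r = min(r, len(s))                              # truncation rank
    U, s, Vh = U[:, :r], s[:r], Vh[:r, :]
    # Reduced operator A_tilde = U* X2 V Sigma^-1, similar to the full A
    A_tilde = U.conj().T @ X2 @ Vh.conj().T @ np.diag(1.0 / s)
    eigvals, W = np.linalg.eig(A_tilde)
    Phi = X2 @ Vh.conj().T @ np.diag(1.0 / s) @ W   # DMD modes
    return Phi, eigvals

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 10))                   # stand-in "flow" data
Phi, lam = dmd(X, r=5)
print(Phi.shape, lam.shape)                         # (64, 5) (5,)
```

Each column of `Phi` is a spatial mode, and the corresponding entry of `lam` encodes how that mode grows, decays, or oscillates over the snapshots.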
II-C Random Kitchen Sinks algorithm
The aim of the Random Kitchen Sinks (RKS) algorithm and similar methods is not to perform inference directly, but to overcome the limitations of other kernel-based algorithms.

Kernel-based algorithms perform well in almost all settings but depend heavily on matrix manipulation. If the kernel matrix is $n \times n$, naive manipulation costs $O(n^3)$, which restricts these methods to applications with limited samples. One general way to overcome this limitation is to use low-rank methods (though other approaches such as Bayesian committee machines and Kronecker-based methods exist).

Random Fourier features aim at sampling a subset of the kernel's Fourier components to generate low-rank approximations of shift-invariant kernels. Because the sampled components live in Fourier space, the shift-invariance property is preserved. The union of these Fourier components spans a finite-dimensional space that approximates the reproducing kernel Hilbert space (RKHS); as a result, the once infinite-dimensional RKHS is approximated by a degenerate, finite-rank kernel.
The epitome of supervised machine learning approaches is to learn an approximate function that maps input variables to output variables, i.e. $f: X \rightarrow Y$. The idea of finding such a function is that when new data arrives, the function can predict the corresponding output. In real-world applications, the input data can be images, 1-D signals, text data, etc., and the output is the corresponding label. Learning the mapping function often involves finding the best parameters for the function to achieve maximum performance. Kernel methods are classic examples of supervised approaches used extensively for several machine learning problems. They require computing an $n \times n$ kernel matrix ($n$ signifies the count of the input vectors). However, this computation scales badly when the dataset is large. There have been efforts to reduce the dimension of the kernel matrix using smart sample selection, eigen decomposition via Nystrom, and low-rank approximations. In [42, 43], the authors proposed an alternative approach via randomization, known as the Random Kitchen Sinks (RKS) algorithm, to compute the kernel matrix even when the dataset becomes large. The idea is to approximate the kernel function via an explicit mapping,
$$k(x, y) = \langle \phi(x), \phi(y) \rangle \approx z(x)^T z(y).$$
Here, $\phi(\cdot)$ denotes the implicit mapping function (used to compute the kernel matrix) and $z(\cdot)$ denotes the explicit mapping function. The RKS method approximates the kernel trick [44, 45]. For a shift-invariant kernel, this explicit mapping function can be written as $z(x) = \sqrt{2/D}\, \cos(Wx + b)$ [46, 47, 48], where the rows of $W$ are sampled from the Fourier transform of the kernel and $b$ is sampled uniformly from $[0, 2\pi]$.
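The explicit random-feature map can be sketched as follows for an RBF kernel; the dimension `D`, the bandwidth `gamma`, and all variable names are illustrative assumptions:

```python
import numpy as np

def rff_map(X, D=500, gamma=1.0, seed=0):
    """Explicit map z(x) with z(x)^T z(y) ~ exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    # Frequencies W sampled from the Fourier transform of the RBF kernel,
    # phases b sampled uniformly from [0, 2*pi]
    W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(n_feat, D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

rng = np.random.default_rng(1)
X = rng.standard_normal((3, 8))
Z = rff_map(X, D=2000, gamma=0.5)
# Inner products of the explicit features approximate the exact RBF kernel
exact = np.exp(-0.5 * np.sum((X[0] - X[1]) ** 2))
approx = Z[0] @ Z[1]
print(abs(exact - approx))  # small, and shrinks as D grows
```

After this mapping, a plain linear model on `Z` behaves like a kernel machine, but training never touches an $n \times n$ matrix.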
III Proposed Approach
As mentioned earlier in Section II-B, DMD operates on data that flows in time. Contrary to that, the images we use are static in nature. Therefore, a flow is induced in each image, as shown in Figure 1, by converting it into the Lab colour space to extract the different colour bands, and then permuting the luminance and colour bands into a single matrix. After applying DMD, the sparse and low-rank components are extracted and normalized for use as features.

These extracted features capture the underlying dynamics of the image. They are then given as input to the Random Kitchen Sinks algorithm (Section II-C) for classification.
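The pipeline just described can be sketched end to end for a single image. The random bands below stand in for the permuted Lab channels (in practice something like `skimage.color.rgb2lab` would supply them), and the feature definition, using normalised DMD-mode magnitudes, is an illustrative assumption:

```python
import numpy as np

def image_to_snapshots(bands):
    """Flatten each band into a column so the columns act as time snapshots."""
    return np.column_stack([b.ravel() for b in bands])

def dmd_features(X, r=5):
    """Normalised magnitudes of rank-r DMD modes, used as a feature vector."""
    X1, X2 = X[:, :-1], X[:, 1:]
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    r = min(r, len(s))
    Ur, sr, Vhr = U[:, :r], s[:r], Vh[:r]
    A_tilde = Ur.conj().T @ X2 @ Vhr.conj().T @ np.diag(1.0 / sr)
    _, W = np.linalg.eig(A_tilde)
    Phi = X2 @ Vhr.conj().T @ np.diag(1.0 / sr) @ W   # DMD modes
    feat = np.abs(Phi).ravel()
    return feat / (np.linalg.norm(feat) + 1e-12)      # normalise

rng = np.random.default_rng(0)
bands = [rng.random((64, 64)) for _ in range(6)]      # stand-in for L/a/b bands
X = image_to_snapshots(bands)                         # shape (4096, 6)
features = dmd_features(X, r=5)
print(features.shape)                                 # (20480,)
```

The resulting vector is what would be handed to the classifier in the next stage.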
IV Results and Discussion
After obtaining the features, they are given as input to the Random Kitchen Sinks algorithm and to Support Vector Machines (SVMs) with various configurations. Tables II, III, and IV contain the accuracies of the proposed approach under various configurations when classified by the Random Kitchen Sinks algorithm, SVM-rbf, and SVM-linear respectively. Column 1 represents the percentage of the total dataset used for testing, column 2 represents the count of eigenvalues taken into consideration for reconstruction of the features, column 3 denotes the type of class (Distinctive: classes that are easily differentiable from one another; Overlapped: classes that have overlapping features), and column 4 represents the accuracy of the corresponding configuration. All the accuracies and their configurations are plotted in Figure 4. It is evident from it that the Random Kitchen Sinks algorithm (RKS) consistently comes out on top compared with the other two algorithms.
[Tables II, III, IV: accuracies under various configurations for RKS, SVM-rbf, and SVM-linear; columns: Test Data (in %), Number of Eigen Values, Type of Data, Kernel (for SVM), Accuracy.]

Test Data (in %): 60; Number of Eigen Values: 5; Type of Data: Distinctive.
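The three-way comparison behind these tables can be sketched with scikit-learn; the synthetic data, the 60% test split, and the hyperparameters are illustrative, and `RBFSampler` plays the role of the random Fourier map in RKS:

```python
import numpy as np
from sklearn.svm import SVC, LinearSVC
from sklearn.kernel_approximation import RBFSampler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 40))
y = (X[:, 0] + X[:, 1] ** 2 > 1).astype(int)      # toy 2-class labels
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.6, random_state=0)

models = {
    "svm-linear": SVC(kernel="linear"),
    "svm-rbf": SVC(kernel="rbf"),
    # RKS-style pipeline: explicit random feature map + linear classifier
    "rks": make_pipeline(RBFSampler(n_components=500, random_state=0),
                         LinearSVC()),
}
scores = {name: m.fit(Xtr, ytr).score(Xte, yte) for name, m in models.items()}
print(scores)
```

The same loop, run over DMD features instead of synthetic data, would reproduce the structure of Tables II to IV.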
The reason the accuracy peaks when the number of eigenvalues is 5 can be explained through Figure 3: beyond an eigenvalue index of 5, the eigenvalues cease to change and do not contribute much to the underlying dynamics of the image. After the features are extracted, they are given as input to RKS, which maps them from 640 dimensions to 500.
Figure 5 shows the image reconstructed from the low-rank and sparse matrices with 5 eigenvalues, which clearly captures the skeleton of the image. This extraction of the skeletal structure is the dynamics captured by applying DMD for feature extraction. The t-SNE plot in Figure 2 provides a better picture of how the extracted DMD features of the images arrange themselves flawlessly into distinct groups. Figures 6 and 7 are the t-SNE plots of 3 classes before and after applying the proposed technique. It is evident from them that the proposed approach is promising and effective.
The present approach is novel and needs more application-oriented experimental evaluation. In certain cases, such as Figure 8, where the image has a complex background and it is difficult to differentiate the object of interest from that background, the proposed approach fails to perform. Apart from such cases, the proposed approach proves effective for learning with limited labelled data.
As the world's data generation explodes and manual labelling remains highly expensive, it is necessary to develop and explore machine learning architectures that can classify with limited labelled data. The approach proposed in this paper provides a novel direction in which a Dynamic Mode Decomposition based feature can be used in conjunction with a classifier to achieve competitive results. As future work, the shortcomings of the current architecture can be addressed and the approach extended to fast-paced applications such as Intrusion Detection Systems [51, 52] and data-driven solvers, where data is limited and the classifier must be retrained frequently on the go.
-  Nigam, K., McCallum, A., & Mitchell, T. (2006). Semi-supervised text classification using EM. Semi-Supervised Learning, 33-56.
-  Cohen, I., & Cozman, F. G. (2006). Risks of semi-supervised learning: how unlabeled data can degrade performance of generative classifiers.
-  Shrivastava, A., Pfister, T., Tuzel, O., Susskind, J., Wang, W., & Webb, R. (2017). Learning from simulated and unsupervised images through adversarial training. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2107-2116).
-  Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., … & Bengio, Y. (2014). Generative adversarial nets. In Advances in neural information processing systems (pp. 2672-2680).
-  Blum, A., & Mitchell, T. (1998, July). Combining labeled and unlabeled data with co-training. In Proceedings of the eleventh annual conference on Computational learning theory (pp. 92-100). ACM.
-  Seeger, M. (2000). Input-dependent regularization of conditional density models (No. REP_WORK).
-  Nigam, K., & Ghani, R. (2000, November). Analyzing the effectiveness and applicability of co-training. In Cikm (Vol. 5, p. 3).
-  Joachims, T. (1999, June). Transductive inference for text classification using support vector machines. In Icml (Vol. 99, pp. 200-209).
-  Zhu, X., & Ghahramani, Z. (2002). Learning from labeled and unlabeled data with label propagation (p. 1). Technical Report CMU-CALD-02-107, Carnegie Mellon University.
-  Kipf, T. N., & Welling, M. (2016). Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.
-  Saul, L. K., Weinberger, K. Q., Ham, J. H., Sha, F., & Lee, D. D. (2006). Spectral methods for dimensionality reduction. Semisupervised learning, 293-308.
-  Erhan, D., Bengio, Y., Courville, A., Manzagol, P. A., Vincent, P., & Bengio, S. (2010). Why does unsupervised pre-training help deep learning?. Journal of Machine Learning Research, 11(Feb), 625-660.
-  Pennington, J., Socher, R., & Manning, C. (2014). Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP) (pp. 1532-1543).
-  Mintz, M., Bills, S., Snow, R., & Jurafsky, D. (2009, August). Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2-Volume 2 (pp. 1003-1011). Association for Computational Linguistics.
-  Takamatsu, S., Sato, I., & Nakagawa, H. (2012, July). Reducing wrong labels in distant supervision for relation extraction. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers-Volume 1 (pp. 721-729). Association for Computational Linguistics.
-  Urner, R., David, S. B., & Shamir, O. (2012, March). Learning from weak teachers. In Artificial intelligence and statistics (pp. 1252-1260).
-  Ratner, A., Bach, S. H., Ehrenberg, H., Fries, J., Wu, S., & Ré, C. (2017). Snorkel: Rapid training data creation with weak supervision. Proceedings of the VLDB Endowment, 11(3), 269-282.
-  Dawid, A. P., & Skene, A. M. (1979). Maximum likelihood estimation of observer error‐rates using the EM algorithm. Journal of the Royal Statistical Society: Series C (Applied Statistics), 28(1), 20-28.
-  Liu, B., Dai, Y., Li, X., Lee, W. S., & Philip, S. Y. (2003, November). Building Text Classifiers Using Positive and Unlabeled Examples. In ICDM (Vol. 3, pp. 179-188).
-  Druck, G., Mann, G., & McCallum, A. (2008, July). Learning from labeled features using generalized expectation criteria. In Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval (pp. 595-602). ACM.
-  Cohn, D. A., Ghahramani, Z., & Jordan, M. I. (1996). Active learning with statistical models. Journal of artificial intelligence research, 4, 129-145.
-  Lowell, D., Lipton, Z. C., & Wallace, B. C. (2018). How transferable are the datasets collected by active learners?. arXiv preprint arXiv:1807.04801.
-  Siddhant, A., & Lipton, Z. C. (2018). Deep Bayesian active learning for natural language processing: Results of a large-scale empirical study. arXiv preprint arXiv:1808.05697.
-  Pan, S. J., & Yang, Q. (2010). A survey on transfer learning. IEEE Transactions on knowledge and data engineering, 22(10), 1345-1359.
-  Howard, J., & Ruder, S. (2018). Universal language model fine-tuning for text classification. arXiv preprint arXiv:1801.06146.
-  Augenstein, I., & Søgaard, A. (2017). Multi-task learning of keyphrase boundary classification. arXiv preprint arXiv:1704.00514.
-  Caruana, R. (1997). Multitask learning. Machine learning, 28(1), 41-75.
-  Fei-Fei, L., Fergus, R., & Perona, P. (2006). One-shot learning of object categories. IEEE transactions on pattern analysis and machine intelligence, 28(4), 594-611.
-  Socher, R., Ganjoo, M., Manning, C. D., & Ng, A. (2013). Zero-shot learning through cross-modal transfer. In Advances in neural information processing systems (pp. 935-943).
-  Xian, Y., Lampert, C. H., Schiele, B., & Akata, Z. (2018). Zero-shot learning-a comprehensive evaluation of the good, the bad and the ugly. IEEE transactions on pattern analysis and machine intelligence.
-  Cubuk, E. D., Zoph, B., Mane, D., Vasudevan, V., & Le, Q. V. (2018). Autoaugment: Learning augmentation policies from data. arXiv preprint arXiv:1805.09501.
-  Ratner, A. J., Ehrenberg, H., Hussain, Z., Dunnmon, J., & Ré, C. (2017). Learning to compose domain-specific transformations for data augmentation. In Advances in neural information processing systems (pp. 3236-3246).
-  Abbeel, P., & Ng, A. Y. (2004, July). Apprenticeship learning via inverse reinforcement learning. In Proceedings of the twenty-first international conference on Machine learning (p. 1). ACM.
-  Griffith, S., Subramanian, K., Scholz, J., Isbell, C. L., & Thomaz, A. L. (2013). Policy shaping: Integrating human feedback with reinforcement learning. In Advances in neural information processing systems (pp. 2625-2633).
-  Schmid, P. J. (2010). Dynamic mode decomposition of numerical and experimental data. Journal of fluid mechanics, 656, 5-28.
-  Rowley, C. W., Mezić, I., Bagheri, S., Schlatter, P., & Henningson, D. S. (2009). Spectral analysis of nonlinear flows. Journal of fluid mechanics, 641, 115-127.
-  Rahimi, A., & Recht, B. (2008). Random features for large-scale kernel machines. In Advances in neural information processing systems (pp. 1177-1184).
-  Sikha, O. K., Kumar, S. S., & Soman, K. P. (2018). Salient region detection and object segmentation in color images using dynamic mode decomposition. Journal of Computational Science, 25, 351-366.
-  Bordes, A., Ertekin, S., Weston, J., & Bottou, L. (2005). Fast kernel classifiers with online and active learning. Journal of Machine Learning Research, 6(Sep), 1579-1619.
-  Kumar, S., Mohri, M., & Talwalkar, A. (2012). Sampling methods for the Nyström method. Journal of Machine Learning Research, 13(Apr), 981-1006.
-  Fine, S., & Scheinberg, K. (2001). Efficient SVM training using low-rank kernel representations. Journal of Machine Learning Research, 2(Dec), 243-264.
-  Rahimi, A., & Recht, B. (2008, September). Uniform approximation of functions with random bases. In 2008 46th Annual Allerton Conference on Communication, Control, and Computing (pp. 555-561). IEEE.
-  Pavy, A., & Rigling, B. (2018). SV-Means: A fast SVM-based level set estimator for phase-modulated radar waveform classification. IEEE Journal of Selected Topics in Signal Processing, 12(1), 191-201.
-  Schölkopf, B., Smola, A. J., & Bach, F. (2002). Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT press.
-  Hofmann, T., Schölkopf, B., & Smola, A. J. (2008). Kernel methods in machine learning. The annals of statistics, 1171-1220.
-  Kumar, S. S., Premjith, B., Kumar, M. A., & Soman, K. P. (2015, December). AMRITA_CEN-NLP@ SAIL2015: sentiment analysis in Indian Language using regularized least square approach with randomized feature learning. In International Conference on Mining Intelligence and Knowledge Exploration (pp. 671-683). Springer, Cham.
-  Athira, S., Harikumar, K., Sowmya, V., & Soman, K. P. (2015). Parameter analysis of random kitchen sink algorithm. IJAER, 10(20), 19351-19355.
-  Thara, S., & Krishna, A. (2018, September). Aspect Sentiment Identification using random Fourier features. In International Journal of Intelligent Systems and Applications (IJISA).
-  Mohan, N., Soman, K. P., & Kumar, S. S. (2018). A data-driven strategy for short-term electric load forecasting using dynamic mode decomposition model. Applied energy, 232, 229-244.
-  Kumar, S. S., Kumar, M. A., Soman, K. P., & Poornachandran, P. (2020). Dynamic Mode-Based Feature with Random Mapping for Sentiment Analysis. In Intelligent Systems, Technologies and Applications (pp. 1-15). Springer, Singapore.
-  Rahul, V. K., Vinayakumar, R., Soman, K. P., & Poornachandran, P. (2018, July). Evaluating Shallow and Deep Neural Networks for Network Intrusion Detection Systems in Cyber Security. In 2018 9th International Conference on Computing, Communication and Networking Technologies (ICCCNT) (pp. 1-6). IEEE.
-  Rahul-Vigneswaran, K., Poornachandran, P., & Soman, K.P. (2019). A Compendium on Network and Host based Intrusion Detection Systems. CoRR, abs/1904.03491.
-  Rahul-Vigneswaran, K., Mohan, N., & Soman, K.P. (2019). Data-driven Computing in Elasticity via Chebyshev Approximation. CoRR, abs/1904.10434.