FSL-BM: Fuzzy Supervised Learning with Binary Meta-Feature for Classification
This paper introduces a novel real-time Fuzzy Supervised Learning with Binary Meta-Feature (FSL-BM) algorithm for big data classification tasks. The study of real-time algorithms addresses several major concerns, namely accuracy, memory consumption, the ability to relax assumptions, and time complexity. Attaining a fast computational model that provides fuzzy logic together with supervised learning is one of the main challenges in machine learning. In this paper, we present the FSL-BM algorithm as an efficient solution for supervised learning with fuzzy logic processing, using a binary meta-feature representation together with the Hamming distance and a hash function to relax assumptions. While many studies of the last decade focused on reducing time complexity and increasing accuracy, the novel contribution of the proposed solution is the integration of the Hamming distance, a hash function, binary meta-features, and binary classification into a real-time supervised method. The hash table (HT) component gives fast access to existing indices and therefore allows the generation of new indices in constant time, which lets the method match or exceed existing fuzzy supervised algorithms. In summary, the main contribution of this technique for real-time fuzzy supervised learning is to represent hypotheses through binary input as a meta-feature space and to create a fuzzy supervised hash table to train and validate the model.
I Introduction and Related Works
Big Data analytics has become feasible thanks to recent developments in powerful hardware, software, and algorithms; however, these algorithms still need to be fast and reliable. Real-time processing, relaxing assumptions, and accuracy still remain key challenges. Big data fuzzy supervised learning has been the main focus of recent research efforts. Many algorithms have been developed in the supervised learning domain, such as Support Vector Machines (SVM) and neural networks. Deep learning techniques such as Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and Deep Neural Networks (DNN) are inefficient for fuzzy classification tasks in a binary feature space [3, 4], although deep learning can be very efficient for multi-class classification tasks. In fuzzy deep neural networks, the last layer of the network (the output layer) is activated by a Boolean-style output such as the sigmoid function; their limitation is their inability to produce reliable results for all possible outcomes. Time complexity, memory consumption, the accuracy of the learning algorithm, and feature selection remain four critical challenges for classifier algorithms.
The key contribution of this study is providing a solution that addresses all four critical factors in a single robust and reliable algorithm while retaining linear processing time.
The history of computer science in the field of machine learning has shown significant development, particularly in the area of Supervised Learning (SL) applications. Many supervised and semi-supervised learning applications were developed with Boolean logic rather than fuzzy logic; these existing methods therefore cannot cover all possible variations of the results. Our approach offers an effective Fuzzy Supervised Learning (FSL) algorithm with linear time complexity. Some researchers have attempted to contribute to fuzzy clustering while utilizing more supervised than unsupervised methods. Work done in 2006 and in 2017 [7, 8] provided new algorithms that implement fuzzy logic in Support Vector Machines (SVM), introducing a new fuzzy membership function for nonlinear classification. Over the last two decades, many research groups have focused on neural networks using fuzzy logic or neuro-fuzzy systems with several hidden layers. In 1992, Lin and his group worked on the Fuzzy Neural Network (FNN); their contribution is based on the back-propagation algorithm and a real-time learning structure. Our work focuses on a mathematical model of binary learning with the Hamming distance applied to supervised learning.
Between 1979 and 1981, NASA (the National Aeronautics and Space Administration) developed the Binary Golay Code (BGC) as an error-correction technique using the Hamming distance [12, 13]. The goal of these research projects, dating back to 1969, was error correction using the Golay code for communication between spacecraft and Earth. Computer scientists and electrical engineers used fuzzy logic techniques for Gilbert burst-error correction over radio communication [14, 15]. BGC utilizes 24 bits; however, a perfected version of the Golay code algorithm works in linear time using 23 bits. The algorithm implemented in this research study was inspired by the Golay code clustering hash table [18, 19, 17, 20]. This research offers two main differences and improvements: i) it works with more than 23 features, whereas the Golay code is limited to 23 bits; ii) our method uses supervised learning, while the Golay code is an unsupervised algorithm, essentially a fuzzy clustering method. The Golay code generates a hash table with six indices for labeling binary features (BF) with fuzzy labels, whereas FSL-BM is a supervised method that encodes and decodes into two labels, or sometimes into fuzzy classifiers, using probability or similarity. Between 2014 and 2015, several studies addressed using the Golay Code Transformation Hash Table (GCTHT) to construct a 23-bit meta-knowledge template for big data discovery, which allows meta-feature extraction for clustering structured and unstructured data (text-based and multimedia) [21, 19]. In 2015, the FuzzyFind Dictionary (FFD) was generated using GCTHT, improving accuracy from 86.47% (GCTHT) to 98.2%. In this research, our meta-features and feature selection are similar to our previous work done with Golay code clustering, but we now introduce a new algorithm for more than 23 features.
Furthermore, existing supervised learning algorithms are challenged to provide proper and accurate labeling for unstructured data.
Nowadays, most large data sets available to researchers and developers contain data points belonging to more than a single label or target value. Due to their time complexity and memory consumption, existing fuzzy clustering algorithms such as genetic fuzzy learning and fuzzy c-means are not very applicable to big data. Therefore, a new fuzzy supervised learning method is needed to process, cluster, and assign labels to unlabeled data with faster time complexity, lower memory consumption, and higher accuracy on unstructured data sets. In short, the new contributions and unique features of the algorithm proposed in this paper are an efficient technique for fuzzy learning, linear time complexity, and powerful prediction owing to its robustness. The baselines for this paper are the Fuzzy Support Vector Machine (FSVM) and the original Support Vector Machine (SVM).
This paper is organized as follows: Section II, fuzzy logic for machine learning; Section III, pre-processing, including Section III-A, meta-knowledge, and Section III-B, meta-feature selection; Section IV, supervised learning, including Section IV-A, the pipeline of supervised learning by Hamming distance and how we train our model; Section V, evaluation of the model; and finally, Section VI, experimental results.
II Fuzzy Logic for Machine Learning
Fuzzy logic methods in machine learning are increasingly popular among researchers [26, 27] in comparison to Boolean and traditional methods. The main difference introduced by fuzziness in clustering and classification, for both supervised and unsupervised learning, is that each data point can belong to more than one cluster. Fuzzy logic, in our case, is extended to handle the concept of partial truth, where the truth value may range between completely true (1) and completely false (0). We make the claim that such an approach is well suited to the proposed binary-stream meta-knowledge representation [28, 29], which leads to meta-features. Therefore, we apply fuzzy logic as a comparative notion of truth (or of finding the truth) without the need to fully represent the syntax, semantics, axiomatization, and truth-preserving deduction, while still reaching a degree of completeness. We extend many-valued logic [31, 32, 33, 34] based on the paradigm of inference under vagueness, where the truth value may range between completely true (correct outcome, correct label assignment) and completely false (false outcome, opposite label assignment), and at the same time the proposed method handles partial truth, where a data point can be assigned to either or both labels.
Through an optimization process of discovering meta-knowledge and determining meta-features, we offer a binary representation as input to a supervised machine learning process that is capable of scaling. Each unique data point is assigned a binary meta-feature representation, which is consequently converted into a hash key that uniquely represents the meta-features present in the record. In the next step, the applied hash function looks up the supervised hash table to assign an outcome, represented by the correct label. The fuzziness is introduced through the hash function's selection of multiple (fuzzy) hash representations [32, 33, 34].
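This lookup path can be sketched as follows. This is a minimal Python illustration (the paper's released implementation is in C++/C#); the function names and the toy table contents are hypothetical, not the paper's:

```python
# Sketch: a binary meta-feature vector is packed into an integer hash key, and
# the supervised hash table maps each key to the (possibly multiple) labels
# observed for it -- multiple labels are where the fuzziness enters.

def features_to_key(bits):
    """Pack an ordered list of 0/1 meta-features into one integer key."""
    key = 0
    for b in bits:
        key = (key << 1) | b
    return key

def lookup_labels(sht, bits):
    """Return the set of labels stored for this meta-feature vector."""
    return sht.get(features_to_key(bits), set())

# A toy supervised hash table: one crisp entry and one fuzzy entry.
sht = {
    features_to_key([1, 0, 1]): {"A"},       # crisp: one label
    features_to_key([1, 1, 1]): {"A", "B"},  # fuzzy: two labels
}

print(sorted(lookup_labels(sht, [1, 1, 1])))  # ['A', 'B']
```

Because the key is an integer, the table lookup is a constant-time operation, which is the property the paper relies on for real-time processing.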
The necessary fuzziness compensates for inaccuracy in determining a meta-feature and its representation in the binary data stream. As we represent each meta-feature as a binary choice of 0 or 1, we provide a binary classification outcome through the designation of labels [32, 33, 34]. There must be some number of meta-features n such that a record with n meta-features counts as "result m" whilst a record with n - 1 (or n + 1) meta-features does not. Therefore, there must be some point where the defined and predicted output (outcome) ceases. Let us assert that some number n satisfies this condition. We can then represent the sequence of reasoning as follows,
where n can be arbitrarily large. If we paraphrase the above expressions using the Hamming distance (HD), there must be a number of meta-features n such that a record with n meta-features counts toward the result while a record with n - 1 (or n + 1) does not. Whether the argument is taken to proceed by addition or subtraction [35, 36] depends entirely on how one views the series of meta-features. This is the key foundation of our approach: it provides the background to apply and evaluate many-valued truth logics against the standard two-valued (meta-)logic, where true and false, i.e., yes and no, are represented within the channeled stream of data. The system optimizes, through supervised training on the selection of meta-features, the assertion of fuzzy logic into logical certainty; thus we combine statistical learning (meta-features) and logical (fuzzy) learning to provide an efficient machine learning mechanism [31, 32, 33, 34].
Figure 1 indicates how fuzzy logic works in supervised learning for two classes. The figure shows that the red circles are assigned to one label only and the blue stars belong to the other label, but the diamond shapes have no specific color, meaning their color lies between blue and red. With a larger number of classes or categories, a data point can likewise belong to several different categories.
where the quantities denote, respectively, the number of categories, the labels of the data points, and the percentage of membership of each data point in each class.
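For illustration, membership percentages of this kind could be computed as normalized label frequencies. This is an assumed formula for the sketch, not necessarily the paper's exact definition:

```python
# Hypothetical illustration of fuzzy class membership: each data point gets a
# degree of membership in every class rather than a single Boolean label.
# Here the degree is simply the normalized frequency of each label among the
# labels recorded for the point (an illustrative assumption).

def membership(label_counts):
    """Normalize per-label counts into membership degrees that sum to 1."""
    total = sum(label_counts.values())
    return {label: count / total for label, count in label_counts.items()}

# A "diamond" point between the red and blue classes of Figure 1:
m = membership({"red": 3, "blue": 1})
print(m)  # {'red': 0.75, 'blue': 0.25}
```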
III Pre-Processing

Regarding the hash function, the order of the features is critical for this learning technique, as it defines the feature space [18, 20, 38]. Therefore, we use a feature-selection process that consists of meta-feature collection, meta-feature learning, and meta-feature selection. The features that build the meta-knowledge template offer unique added value in that they provide clusters of interrelated data objects in fast, linear time. The meta-knowledge template is a pre-processing technique in which each feature can be assigned either a yes or a no, as binary logic. In other words, given a template, each single feature represents one bit along the n-bit string. It is worth noting that developing meta-knowledge is associated with the quality methodology of ontology engineering: an ontology aggregates a common language for a specific domain while specifying definitions and relationships among terms. It is also important to note that the development of the meta-knowledge template is by no means done randomly. In the following sections, we explain the process of building the meta-knowledge from the specific feature selections that define its questions.
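A meta-knowledge template of this kind can be pictured as an ordered list of yes/no questions, each contributing one bit to the n-bit string. The questions below are invented placeholders; the paper builds its template from domain-specific feature selection:

```python
# Sketch of a meta-knowledge template: an ordered list of yes/no predicates.
# The ordering matters, since each predicate owns a fixed bit position.

template = [
    lambda record: record["length"] > 100,   # bit 0: is the record long?
    lambda record: record["has_digits"],     # bit 1: does it contain digits?
    lambda record: record["lang"] == "en",   # bit 2: is it English text?
]

def apply_template(record):
    """Encode a record as an ordered bit list using the template."""
    return [1 if question(record) else 0 for question in template]

rec = {"length": 250, "has_digits": False, "lang": "en"}
print(apply_template(rec))  # [1, 0, 1]
```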
Meta-knowledge is defined as knowledge extracted from the feature representation; it can also be described as perfect feature extraction, pre-selected knowledge from unstructured data [39, 40, 41]. Meta-knowledge, or perfect feature extraction, allows a deeper study of the features for the purpose of more precise knowledge, and can be utilized in any application or program to obtain more insightful results based on advanced analysis of the data points.
In the early 1960s, researchers were challenged to find solutions for large, domain-specific bodies of knowledge. Collecting and utilizing knowledge from these large data repositories has been a major challenge, and meta-knowledge systems were developed to overcome it; how best to represent this knowledge remained an open research question. Our meta-knowledge template with binary features can therefore significantly ease and speed up the processing of large data sets.
Meta-learning is a very effective technique for supporting data mining. Meta-features (MF) and meta-learning algorithms have been used in regression and classification problems across the data mining and machine learning domains. It is important to note that the results obtained from data mining and machine learning are directly linked to the success of a well-developed meta-learning model. In this research, we define meta-learning as the process that helps the selected features use the right machine learning algorithms to build the meta-knowledge. The combination of machine learning algorithms and pattern recognition allows us to study the correlations among meta-features and to select the features that best indicate the result or goal.
IV Supervised Learning
There are three popular learning paradigms in the machine learning community: supervised learning, unsupervised learning, and semi-supervised learning. Unsupervised learning, or data clustering, creates labels for unlabeled data points; examples include the Golay code, k-means, and weighted unsupervised methods [44, 45, 46].
In supervised learning, typically more than 80 percent of the data points are used for training and the rest for testing or evaluating the algorithm; examples include Support Vector Machines (SVM) and neural networks. Semi-supervised learning uses labels generated by supervised learning on part of the data to label the remaining data points [47, 48, 49]; it is a combination of supervised and unsupervised learning. Overall, the contribution of this paper is shown in Fig. 2: meta-feature learning in the pre-processing step prepares the input features for the learning algorithm described below.
IV-A Pipeline of Supervised Learning using Hamming Distance
In the pipeline of this algorithm (Fig. 3), all possible combinations of the input binary features are created; the algorithm improves the training matrix using the Hamming distance and ultimately improves the results through meta-feature selection and meta-knowledge discovery. As shown in Figure 1, the algorithm is divided into two main parts: i) the training algorithm, which entails feature selection, Hamming-distance detection, and updating of the training matrix; and ii) testing, where the hash function, built on the meta-feature category and the critical feature order, converts the testing input into indices, each index carrying at least one label.
Explicitly enumerating all available data points is not feasible. The Supervised Hash Table (SHT) is a hash table with 2^n elements, where n is the number of binary features used as indices. The SHT elements are created from the training data sets by expanding each point up to a given Hamming distance. In Equation 5, the value of the Hamming distance can be one or more, depending on the number of training data points, and n is the number of features. The segmentation of the stream data sets can be 20 to 32 bits, and the final quantity is the number of training data points.
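The expansion step can be pictured as enumerating the Hamming ball around each training key, i.e., every index within a chosen Hamming radius. This is a hedged sketch under that interpretation; the function name and radius are illustrative:

```python
# Enumerate every n-bit index within a given Hamming distance of a key by
# flipping all subsets of up to `radius` bit positions.
from itertools import combinations

def hamming_ball(key, n_bits, radius):
    """All n-bit integers within the given Hamming distance of key."""
    out = [key]
    for r in range(1, radius + 1):
        for positions in combinations(range(n_bits), r):
            flipped = key
            for p in positions:
                flipped ^= 1 << p
            out.append(flipped)
    return out

# With n = 4 bits and radius 1, a key covers itself plus its 4 neighbors.
print(len(hamming_ball(0b1010, 4, 1)))  # 5
```

The ball size grows as the sum of binomial coefficients C(n, 0) + ... + C(n, r), which is why small radii are used when the training set is large.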
IV-B Data Structure
The data structure is an implementation-critical choice for learning algorithms. According to [20, 18], the Golay code, the Golay Code Transformation Matrix, the Golay code clustering hash table, the FuzzyFind Dictionary, and our Supervised Hash Table all use a hash table, which is the most efficient structure for direct access to data in constant time. A hash table is an efficient technique for knowledge discovery, giving constant-time, direct access to indices. In addition, a hash function can convert any structured or unstructured input into binary features. This data structure is used to reduce the computational time complexity of the supervised learning algorithm.
IV-C Hamming Distance
The Hamming distance (HD) is used to measure the similarity of two binary variables. The Hamming distance between two code words is equal to the number of bits in which they differ; for example, the Hamming distance between 10101 and 11100 is 2. In the proposed algorithm, we use small HD values, chosen according to the number of training data points. The algorithm can handle larger volumes of data using fuzzy logic, depending on the hardware it runs on. The number of bits is represented as binary meta-features; n feature bits span a space of 2^n points (e.g., for 32 bits, about 4 billion unique data points). In this paper, we test our algorithm with 24 binary inputs, which means 2^24, or nearly 16 million, unique records.
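As a quick illustration of this definition, the Hamming distance of two code words is the population count of their bitwise XOR; a small self-contained sketch:

```python
# Hamming distance of two equal-length binary code words, represented here
# as Python integers: XOR leaves a 1 exactly where the bits differ.

def hamming_distance(a, b):
    """Number of bit positions in which integers a and b differ."""
    return bin(a ^ b).count("1")

print(hamming_distance(0b10101, 0b11100))  # 2
```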
IV-D Generating the Supervised Hash Table
Generating the supervised learning hash table is the main part of this technique. The main loop runs from 0 over all training data points to create all possible results. The Hamming distance used can be 2, 3, or even more for a large number of features and a small portion of training data points. After calculating the Hamming distance, the Supervised Hash Table is updated with the labels. Per Algorithm 2, the main loop over all training data points is as follows:
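A hedged sketch of this training loop, under the assumption that each training key is expanded to every index within the chosen Hamming distance and its label recorded at each index. Names are illustrative and this is not the paper's exact Algorithm 2:

```python
# Populate a Supervised Hash Table (SHT): an index reachable from training
# points with different labels ends up with a fuzzy (multi-element) label set.
from itertools import combinations

def train_sht(samples, n_bits, radius):
    """samples: iterable of (integer key, label) training pairs."""
    sht = {}
    for key, label in samples:
        # expand the key to every index within the Hamming-distance ball
        neighbors = [key]
        for r in range(1, radius + 1):
            for positions in combinations(range(n_bits), r):
                k = key
                for p in positions:
                    k ^= 1 << p
                neighbors.append(k)
        for k in neighbors:
            sht.setdefault(k, set()).add(label)
    return sht

trained = train_sht([(0b0000, "A"), (0b0011, "B")], n_bits=4, radius=1)
# 0b0001 is within distance 1 of both training points, so it is fuzzy:
print(sorted(trained[0b0001]))  # ['A', 'B']
```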
Equation 7 gives the Hamming distance of the features and indicates the value of HD. If a record is assigned two or more labels, the corresponding vector of the supervised learning hash table keeps all of the labels in the hash function, meaning the record uses fuzzy labeling.
V Evaluating Model
In supervised learning using Hamming-distance techniques, the supervised hash table is built during the training phase. This hash table contains all possible feature inputs if enough data is used for training. To evaluate the trained model, an unlabeled data set is fed to the FSL-BM algorithm as binary input, and all unlabeled data points are encoded in the same space as the trained model, as discussed in Section III. After applying the hashing function, the correct index is assigned to each data point, and finally each unlabeled point is assigned the correct label(s). In the feature representation, the binary input features are converted into hash keys by the meta-feature-selected hash function and looked up in the supervised hash table. Some data points have more than one label, hence the fuzzy logic: each data point can belong to more than one label. As presented in Algorithm 3, the main loop runs from 0 to the number of test data points, and the maximum fuzziness is the maximum number of labels each point can be assigned.
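The evaluation loop above can be sketched as follows. The handling of unseen indices (an empty label set) and the fuzziness cap are assumptions for illustration, not the paper's exact Algorithm 3:

```python
# Sketch of evaluation: each test point, already encoded as an integer hash
# key in the training space, is looked up in the trained Supervised Hash
# Table; multiple returned labels constitute a fuzzy assignment.

def predict(sht, test_keys, max_labels=2):
    """Return at most max_labels labels (the fuzziness cap) per test key."""
    predictions = []
    for key in test_keys:
        labels = sht.get(key, set())          # unseen key -> empty set
        predictions.append(set(sorted(labels)[:max_labels]))
    return predictions

trained = {5: {"A"}, 7: {"A", "B"}}
print([sorted(p) for p in predict(trained, [5, 7, 9])])  # [['A'], ['A', 'B'], []]
```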
VI Experimental Results
Although time complexity is one of the most important criteria for evaluating time consumption, the hardware used for implementing and testing the algorithms is also essential. The times listed in Table I and Figure 4 were measured with a single-threaded implementation; a multi-threaded implementation would reduce them. In addition, another significant factor is memory complexity, which is linear in this case.
All of the empirical and experimental results of this study shown in Table I were obtained on a single processor. The source code, implemented in C++ and C#, will be released on GitHub and on our lab website. C++ and C# were used to test the proposed algorithms on a system CPU with 12 GB of memory.
TABLE I: Accuracy and fuzzy measure of the proposed method on Dataset 1 and Dataset 2 (values discussed in the results below).
VI-A Data Set
We test our algorithm in two ways: first on empirical data and second on randomly generated data. The empirical data sets used in this algorithm have 24 binary features. As shown in Table I, we test our algorithm on two such data sets, each split into training data points and a validation test set. We also test accuracy and time complexity with randomly generated data sets, as shown in Fig. 4.
In summary, we evaluate our algorithm on a real data set (the IMDB data set) and on randomly generated data sets.
VI-B1 Results on the IMDB dataset
Testing a new algorithm with different kinds of data sets is critical. We compare our algorithm against traditional supervised learning methods such as the Support Vector Machine (SVM). The proposed algorithm is validated on two different data sets with 23 binary features. Regarding Table I, the total accuracy on data set 1 is 93.41%, with correct accuracy 93.1%, fuzziness accuracy 92.87%, fuzziness 0.23%, Boolean error 6.8%, and fuzzy error 7.1%; on the second data set, the total accuracy is 95.59%, with correct accuracy 94.4%, fuzziness accuracy 96.87%, fuzziness 0.86%, and error 4.4%. These results show that binary supervised learning using the Hamming distance is more accurate on the same data sets: on the first data set we achieve 93.41% accuracy with FSL-BM versus 89.62% with SVM, and on the second data set, with 100 training data points, 95.59% versus 90.42%.
VI-B2 Results on the random online dataset
VII Conclusion and Future Works
The proposed algorithm (FSL-BM) is well suited to big data streams in which data points can be converted to binary features. It is comparable with similar algorithms such as the Fuzzy Support Vector Machine (FSVM). In this paper, we presented a novel supervised learning technique that uses the Hamming distance to find the nearest vector, together with meta-features, meta-knowledge discovery, and meta-learning to improve accuracy. A hash table and hash function are used to improve computational time, and the results indicate that our method offers better accuracy, memory consumption, and time complexity. Fuzziness is another feature of this algorithm that is useful for fuzzy, unstructured data sets, where real data can be classified as fuzzy data, reiterating that each training data point can have more than one label. As future work, we plan to automate the feature-selection process dynamically and to create a meta-feature-selection library for public use. The algorithm can be particularly useful for analyzing many kinds of binary big data streams, and binary features in fuzzy supervised learning form a robust approach for big data mining, machine learning, and related fields. The authors plan to implement and release Python, R, and MATLAB source code for this study, and to optimize the algorithm with different techniques enabling its use in other fields such as image, video, and text processing.
-  P. Brazdil, C. G. Carrier, C. Soares, and R. Vilalta, Metalearning: Applications to data mining. Springer Science & Business Media, 2008.
-  M. Fatehi and H. H. Asadi, “Application of semi-supervised fuzzy c-means method in clustering multivariate geochemical data, a case study from the dalli cu-au porphyry deposit in central iran,” Ore Geology Reviews, vol. 81, pp. 245–255, 2017.
-  X. Qiu, Y. Ren, P. N. Suganthan, and G. A. Amaratunga, “Empirical mode decomposition based ensemble deep learning for load demand time series forecasting,” Applied Soft Computing, vol. 54, pp. 246–255, 2017.
-  G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” science, vol. 313, no. 5786, pp. 504–507, 2006.
-  K. Kowsari, D. E. Brown, M. Heidarysafa, K. Jafari Meimandi, M. S. Gerber, and L. E. Barnes, "Hdltex: Hierarchical deep learning for text classification," in IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE, 2017.
-  R. A. R. Ashfaq, X.-Z. Wang, J. Z. Huang, H. Abbas, and Y.-L. He, “Fuzziness based semi-supervised learning approach for intrusion detection system,” Information Sciences, vol. 378, pp. 484–497, 2017.
-  X. Jiang, Z. Yi, and J. C. Lv, “Fuzzy svm with a new fuzzy membership function,” Neural Computing & Applications, vol. 15, no. 3-4, pp. 268–276, 2006.
-  S.-G. Chen and X.-J. Wu, “A new fuzzy twin support vector machine for pattern classification,” International Journal of Machine Learning and Cybernetics, pp. 1–12, 2017.
-  C. P. Chen, Y.-J. Liu, and G.-X. Wen, “Fuzzy neural network-based adaptive control for a class of uncertain nonlinear stochastic systems,” IEEE Transactions on Cybernetics, vol. 44, no. 5, pp. 583–593, 2014.
-  P. S. Sajja, “Computer aided development of fuzzy, neural and neuro-fuzzy systems,” Empirical Research Press Ltd., 2017.
-  C. Lin and C. G. Lee, “Real-time supervised structure/parameter learning for fuzzy neural network,” in Fuzzy Systems, 1992., IEEE International Conference on. IEEE, 1992, pp. 1283–1291.
-  T. M. Thompson, From error-correcting codes through sphere packings to simple groups. Cambridge University Press, 1983, no. 21.
-  J. West, “Commercializing open science: deep space communications as the lead market for shannon theory, 1960–73,” Journal of Management Studies, vol. 45, no. 8, pp. 1506–1532, 2008.
-  L. Bahl and R. Chien, “On gilbert burst-error-correcting codes (corresp.),” IEEE Transactions on Information Theory, vol. 15, no. 3, pp. 431–433, 1969.
-  H. Yu, T. Jing, D. Chen, and S. Y. Berkovich, “Golay code clustering for mobility behavior similarity classification in pocket switched networks,” J. of Communication and Computer, USA, no. 4, 2012.
-  U. Rangare and R. Thakur, “A review on design and simulation of extended golay decoder,” International Journal of Engineering Science, vol. 2058, 2016.
-  E. Berkovich, "Method of and system for searching a data dictionary with fault tolerant indexing," US Patent 7,168,025, Jan. 23, 2007.
-  K. Kowsari, M. Yammahi, N. Bari, R. Vichr, F. Alsaby, and S. Y. Berkovich, “Construction of fuzzyfind dictionary using golay coding transformation for searching applications,” International Journal of Advanced Computer Science & Applications, vol. 1, no. 6, pp. 81–87.
-  N. Bari, R. Vichr, K. Kowsari, and S. Y. Berkovich, “Novel metaknowledge-based processing technique for multimediata big data clustering challenges,” in Multimedia Big Data (BigMM), 2015 IEEE International Conference on. IEEE, 2015, pp. 204–207.
-  K. Kowsari, “Investigation of fuzzyfind searching with golay code transformations,” Master’s thesis, The George Washington University, Department of Computer Science, 2014.
-  N. Bari, R. Vichr, K. Kowsari, and S. Berkovich, “23-bit metaknowledge template towards big data knowledge discovery and management,” in Data Science and Advanced Analytics (DSAA), 2014 International Conference on. IEEE, 2014, pp. 519–526.
-  T. Kamishima and J. Fujiki, “Clustering orders,” in International Conference on Discovery Science. Springer, 2003, pp. 194–207.
-  M. Russo, “Genetic fuzzy learning,” IEEE transactions on evolutionary computation, vol. 4, no. 3, pp. 259–273, 2000.
-  J. C. Bezdek, R. Ehrlich, and W. Full, “Fcm: The fuzzy c-means clustering algorithm,” Computers & Geosciences, vol. 10, no. 2-3, pp. 191–203, 1984.
-  G. Qin, X. Huang, and Y. Chen, “Nested one-to-one symmetric classification method on a fuzzy svm for moving vehicles,” Symmetry, vol. 9, no. 4, p. 48, 2017.
-  R. Wieland and W. Mirschel, “Combining expert knowledge with machine learning on the basis of fuzzy training,” Ecological Informatics, vol. 38, pp. 26–30, 2017.
-  M. J. Prabu, P. Poongodi, and K. Premkumar, “Fuzzy supervised online coactive neuro-fuzzy inference system-based rotor position control of brushless dc motor,” IET Power Electronics, vol. 9, no. 11, pp. 2229–2239, 2016.
-  J. Gama, Knowledge discovery from data streams. CRC Press, 2010.
-  Learning from data streams. Springer, 2007.
-  U. Höhle and E. P. Klement, Non-classical logics and their applications to fuzzy subsets: a handbook of the mathematical foundations of fuzzy set theory. Springer Science & Business Media, 2012, vol. 32.
-  E. N. Zalta et al., “Stanford encyclopedia of philosophy,” 2003.
-  P. Forrest, “The identity of indiscernibles,” 1996.
-  F. Logic, “Stanford encyclopedia of philosophy,” 2006.
-  F. Pinto, C. Soares, and J. Mendes-Moreira, “A framework to decompose and develop metafeatures,” in Proceedings of the 2014 International Conference on Meta-learning and Algorithm Selection-Volume 1201. CEUR-WS. org, 2014, pp. 32–36.
-  J. Cargile, “The sorites paradox,” The British Journal for the Philosophy of Science, vol. 20, no. 3, pp. 193–202, 1969.
-  G. Malinowski, “Many-valued logic and its philosophy,” in The Many Valued and Nonmonotonic Turn in Logic, ser. Handbook of the History of Logic, D. M. Gabbay and J. Woods, Eds. North-Holland, 2007, vol. 8, pp. 13 – 94. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1874585707800045
-  B. Dinis, “Old and new approaches to the sorites paradox,” arXiv preprint arXiv:1704.00450, 2017.
-  M. Yammahi, K. Kowsari, C. Shen, and S. Berkovich, “An efficient technique for searching very large files with fuzzy criteria using the pigeonhole principle,” in Computing for Geospatial Research and Application (COM. Geo), 2014 Fifth International Conference on. IEEE, 2014, pp. 82–86.
-  J. A. Evans and J. G. Foster, “Metaknowledge,” Science, vol. 331, no. 6018, pp. 721–725, 2011.
-  M. Handzic, Knowledge management: Through the technology glass. World scientific, 2004, vol. 2.
-  K. Qazanfari, A. Youssef, K. Keane, and J. Nelson, “A novel recommendation system to match college events and groups to students,” 2017, arXiv:1709.08226v1.
-  R. Davis and B. G. Buchanan, “Meta-level knowledge,” Rulebased expert systems, The MYCIN Experiments of the Stanford Heuristic Programming Project, BG Buchanan and E. Shortliffe (Editors), Addison-Wesley, Reading, MA, pp. 507–530, 1984.
-  R. Vilalta, C. G. Giraud-Carrier, P. Brazdil, and C. Soares, “Using meta-learning to support data mining.” IJCSA, vol. 1, no. 1, pp. 31–45, 2004.
-  M. H. Alassaf, K. Kowsari, and J. K. Hahn, “Automatic, real time, unsupervised spatio-temporal 3d object detection using rgb-d cameras,” in Information Visualisation (iV), 2015 19th International Conference on. IEEE, 2015, pp. 444–449.
-  K. Kowsari and M. H. Alassaf, “Weighted unsupervised learning for 3d object detection,” International Journal of Advanced Computer Science and Applications, vol. 7, no. 1, pp. 584–593, 2016.
-  K. Qazanfari, R. Aslanzadeh, and M. Rahmati, “An efficient evolutionary based method for image segmentation,” arXiv preprint arXiv:1709.04393, 2017.
-  O. Chapelle, B. Scholkopf, and A. Zien, “Semi-supervised learning (chapelle, o. et al., eds.; 2006)[book reviews],” IEEE Transactions on Neural Networks, vol. 20, no. 3, pp. 542–542, 2009.
-  O. Chapelle, M. Chi, and A. Zien, “A continuation method for semi-supervised svms,” in Proceedings of the 23rd international conference on Machine learning. ACM, 2006, pp. 185–192.
-  O. Chapelle, V. Sindhwani, and S. S. Keerthi, “Branch and bound for semi-supervised support vector machines,” in NIPS, 2006, pp. 217–224.
-  S.-S. Choi, S.-H. Cha, and C. C. Tappert, “A survey of binary similarity and distance measures,” Journal of Systemics, Cybernetics and Informatics, vol. 8, no. 1, pp. 43–48, 2010.