Mislabel Detection of Finnish Publication Ranks
The paper proposes to analyze a data set of Finnish ranks of academic publication channels with the Extreme Learning Machine (ELM). The purpose is to introduce and test a recently proposed ELM-based mislabel detection approach with a rich set of features characterizing a publication channel. We compare the architecture, accuracy, and, especially, the set of detected mislabels of the ELM-based approach to the corresponding reference results in [10].
Finland, in the spirit of Norway and Denmark, introduced a ranking system for academic publication channels (scientific journals, conference series, book publishers, etc.) called Jufo ("Julkaisufoorumi" in Finnish, "Publication Forum" in English) in 2010, together with the renewed university legislation. The rank of a publication channel, ranging from 0 (non-peer-reviewed) to 3 (most distinguished academic publication forums), is decided by a specially nominated panel for a particular scientific discipline. These panels decide the ranks in regular meetings based on their academic expertise. Because the ranks are directly linked to the funding allocated to the universities, the fairness and objectivity of the ranks have been, and continue to be, widely discussed.
A versatile analysis of the 2015 Jufo rankings was carried out in [10]. There, using association rule mining, decision trees, and confusion matrices with respect to the Norwegian and Danish ranks, it was shown that most of the expert-based rankings could be predicted and explained with machine learning methods. Moreover, it was found that those publication channels for which the Finnish expert-based rank is higher than the estimated one are characterized by higher publication activity or a recent upgrade of the rank. Hence, the outcomes of the system, the publication ranks, need to be assessed and evaluated regularly and rigorously.
Extreme Learning Machine (ELM), as proposed by Huang et al. [6, 5], provides one of the key randomized neural network frameworks [4]. A probabilistic convergence analysis of the technique was provided in [8, 7], which concluded that repeated sampling of the feedforward kernel is necessary and that weight decay (ridge regression) is advantageous. Here, to identify possibly mislabeled publication channel ranks, we apply the MD-ELM algorithm described and successfully tested in [1].
The rest of the paper is organized as follows. Section 2 introduces the original dataset of Jufo rankings. Section 3 presents the methodology: it describes the feature extraction process and summarizes the MD-ELM method. Section 4 explains the experimental setup, reports the general prediction performance, and compares the results with the previous ones in [10]. Section 5 summarizes the findings and describes future research directions.
The data for this study comes from two publicly available databases containing the Finnish publication source information and the actual national publication activity information.
JuFoDB: the database of the Finnish Publication Forum, JuFo (footnote 1), which contains all nationally evaluated publication channels. Data was retrieved from this database in February 2015, so it describes the ranking situation after the complete reevaluation round that was finished by the end of 2014.
JuuliDB: the publicly accessible database of Juuli (footnote 2), which contains all publications of Finnish researchers. Each publication channel in JuFoDB has a unique Juuli ID, through which all Finnish publications in that particular channel can be found. Data was retrieved from this database in September 2015, because only then had all work published by the end of 2014 been checked and included in the repository.
In total, 29,443 different publication channels with 33 attributes were retrieved from JuFoDB and 107,289 publications from JuuliDB. The Finnish expert-based rank of each publication channel, as well as the Norwegian and Danish expert-based ranks, can be obtained directly from JuFoDB, which also includes three bibliometric indicators from Scopus: the SJR, the SNIP, and the IPP. Moreover, through the link to JuuliDB, one can directly access information on all researchers in Finland who have published in a particular channel.
The panel variable determines the list of experts (footnote 3).
In addition to some more general data, such as the title, subtitle, website, country of publication, language, unique identifier (ID), ISSN, Sherpa/Romeo code, starting year, and publisher, the JuFoDB also provides information such as abbreviation, title details, ISBN, DOAJ, end year, continued under the name and continued JuFo-rank. The evaluation history provides information about the previous ranks in the system.
As in [10], the continuous variables are used directly as features, and each categorical variable is transformed into a separate binary feature per category. All of the 29,443 publication channels have missing values for at least some of the 33 variables. Hence, using all of the available data in the analysis entails a significant sparsity problem [9]. Since missing information was found to be an important predictor of the Finnish expert-based rank in [10], we use all the described variables as features and add, for each variable, binary information on whether a value is available. Thus, our final model had 942 features (542 original + 400 added non-linear feature combinations).
3.1 Feature extraction
The original variables described in the previous section were transformed into numerical features, either real-valued or binary. Each original variable has its own specific transformation into numerical format. As in [10], the absence of a value is encoded with a separate binary variable for most features, as it provides valuable information (e.g., a poor-quality conference may lack a website).
The original variables used in the analysis and their corresponding transformations are described in Table 1. The Jufo ranks of previous years are notably missing from the features; they are omitted on purpose so that the rank prediction task is not biased by previous decisions.
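The encoding with explicit missing-value indicators can be sketched as follows. This is a minimal illustration rather than the authors' code: the helper names `encode_start_year` and `one_hot_with_unknown`, the reference year 2015, the `log(1 + age)` form, and the category list are all assumptions made for the example.

```python
import math

REFERENCE_YEAR = 2015  # assumption: year of the data snapshot


def encode_start_year(start_year):
    """Return [log-age, missing-flag] for a start year that may be None."""
    if start_year is None:
        return [0.0, 1.0]  # value absent: neutral age, indicator set
    age = max(REFERENCE_YEAR - start_year, 0)
    return [math.log1p(age), 0.0]  # log(1 + age) keeps age 0 finite


def one_hot_with_unknown(value, categories):
    """One-hot encode `value`; the last slot flags a missing or unseen value."""
    vec = [0.0] * (len(categories) + 1)
    idx = categories.index(value) if value in categories else len(categories)
    vec[idx] = 1.0
    return vec


languages = ["en", "fi", "sv"]  # hypothetical category list
row = encode_start_year(1990) + one_hot_with_unknown(None, languages)
```

Here `row` has six entries: two for the start year and four for the language, the last of which marks the missing value.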
|#|Variable|Description|Transformation|
|---|---|---|---|
|1|Level|Current Jufo ranking|An output variable with integer values in the range 0 to 3|
|2|Title, Subtitle|Title and subtitle (if available) of the publication|Encoded in a Bag-of-Words representation, dimensionality reduced from 3700 to 30 by a Sparse Random Projection|
|3|Website|Website of the publication|Country code of the host represented in one-hot encoding with 117 binary variables (including unknown)|
|4|Type|Publication type (journal, conference, book series)|Represented in one-hot encoding with 3 binary variables; this feature has no missing values|
|5|ISSN|ISSN numbers of printed and online versions|A binary variable representing whether the publication has an ISSN, two variables in total|
|6|StartYear|Start year of the publication|A logarithm of the age of the publication, plus a binary variable representing a missing value|
|7|Publication Country|Country of publication|One-hot encoding of the publication origin with 114 binary variables, including the unknown origin|
|8|Publisher|Publisher of the series|One-hot encoding of the 100 most popular publishers, plus other publisher|
|9|Language|Language of the publication|One-hot encoding of the publication language with 49 binary variables, including undetermined|
|10|ERIH-class|ERIH ranking of publications|One-hot encoding of the four available ranks, plus a missing rank|
|11|SJR, SNIP, IPP|Impact factors in three different systems|Three real-valued variables for the impact factors, plus three binary variables indicating the absence of an impact factor|
|12|DOAJ, Sherpa/Romeo|Open access types|Eight binary variables: two for DOAJ levels, and six for the Sherpa/Romeo levels|
|13|Field|The field of study in Finnish classification|Ten binary variables for the ten fields; a publication may belong to multiple fields|
|14|MinEdu Field|The field of study according to the Ministry of Education classification|70 binary variables for the Ministry of Education fields; a publication may belong to several of them|
|15|Panel|The scientific panel that assigned a corresponding score|One-hot encoding of the panel number with 25 binary variables, including a not available panel|
|16|ISBN|ISBN numbers used by the publication|One variable representing the number of different ISBNs; can be zero|
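The title transformation in Table 1 (Bag-of-Words followed by a Sparse Random Projection) can be illustrated with a small standard-library sketch. The paper does not state which projection variant was used; the construction below, with entries of ±sqrt(s / k) drawn with probability 1 / (2s) each and s = sqrt(number of words), is one common choice and is an assumption here, as are all function names and the toy titles.

```python
import math
import random
from collections import Counter


def bag_of_words(titles):
    """Build a word index and per-title word-count dictionaries."""
    counts = [Counter(t.lower().split()) for t in titles]
    vocab = sorted(set(w for c in counts for w in c))
    index = {w: j for j, w in enumerate(vocab)}
    return index, counts


def sparse_projection_matrix(n_features, n_components, seed=0):
    """Sparse random matrix: each entry is +sqrt(s / n_components) or
    -sqrt(s / n_components) with probability 1 / (2s) each, and 0 otherwise,
    where s = sqrt(n_features) (assumed variant)."""
    rng = random.Random(seed)
    s = math.sqrt(n_features)
    scale = math.sqrt(s / n_components)
    matrix = []
    for _ in range(n_features):
        row = {}
        for k in range(n_components):
            u = rng.random()
            if u < 1.0 / (2.0 * s):
                row[k] = scale
            elif u < 1.0 / s:
                row[k] = -scale
        matrix.append(row)  # only non-zero entries are stored
    return matrix


def project(word_counts, index, matrix, n_components):
    """Project one sparse bag-of-words vector to n_components dimensions."""
    out = [0.0] * n_components
    for word, count in word_counts.items():
        for k, weight in matrix[index[word]].items():
            out[k] += count * weight
    return out


titles = ["Journal of Informetrics", "Neurocomputing",
          "Journal of Educational Data Mining"]  # toy examples
index, counts = bag_of_words(titles)
R = sparse_projection_matrix(len(index), n_components=4)
features = [project(c, index, R, 4) for c in counts]
```

In the setting of Table 1 the vocabulary would have about 3700 words and `n_components` would be 30; the toy run uses 7 words and 4 components only to stay readable.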
3.2 Mislabel detection using MD-ELM
The mislabel detection is based on the MD-ELM algorithm from [1]. The key idea is to include artificial mislabels in the data set, which can then be used as a baseline in the statistical detection of unknown mislabels using Welch's t-test and the directly computable Leave-One-Out (LOO) cross-validation error (PRESS statistic). In this way, the MD-ELM algorithm detects samples whose original labels are likely incorrect.
More precisely, MD-ELM analyses the changes in the LOO error of the model in response to randomly changing the labels of a few training samples. If the new labels reduce the global LOO error, the mislabel score of those samples is increased. A small share of the samples, whose labels are randomly changed on purpose, forms a control group called artificial mislabels. The scores of the artificially mislabeled samples help to determine whether the MD-ELM method succeeds, and they define the stopping criterion.
The mislabel detection method uses the Extreme Learning Machine as a powerful nonlinear prediction model with a fast LOO error computation. A practical implementation employs several ELM models with different sets of artificial mislabels, which eliminates their possible impact on the results. The samples predicted to be originally mislabeled are those whose mislabel score is higher than a given quantile of a normal distribution fitted to all the scores.
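For reference, the two statistical ingredients named above can be written out explicitly. The notation is an assumption introduced here and is not taken from the MD-ELM reference: H is the N x L hidden-layer output matrix, y the target vector, λ ≥ 0 a ridge parameter (λ = 0 gives plain least squares when the inverse exists), ŷ = P y the fitted values, and the subscripts A and B denote the artificially mislabeled samples and the remaining samples, with mean score, score variance, and group size for each.

```latex
% Output weights and hat matrix of a linear output layer
\hat{\boldsymbol\beta} = (\mathbf{H}^\top \mathbf{H} + \lambda \mathbf{I})^{-1} \mathbf{H}^\top \mathbf{y},
\qquad
\mathbf{P} = \mathbf{H} (\mathbf{H}^\top \mathbf{H} + \lambda \mathbf{I})^{-1} \mathbf{H}^\top

% PRESS: leave-one-out residuals obtained from a single fit
e_i^{\mathrm{LOO}} = \frac{y_i - \hat{y}_i}{1 - P_{ii}},
\qquad
\mathrm{MSE}^{\mathrm{PRESS}} = \frac{1}{N} \sum_{i=1}^{N} \left( e_i^{\mathrm{LOO}} \right)^2

% Welch's t statistic for two groups with unequal variances
t = \frac{\bar{s}_A - \bar{s}_B}{\sqrt{\sigma_A^2 / n_A + \sigma_B^2 / n_B}}
```

Because P does not depend on the labels, the LOO error for a modified label vector only requires recomputing P y, which is what makes repeated label changes cheap.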
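The scoring loop described above can be sketched on a toy problem. This is a simplified reading under stated assumptions, not the reference implementation: labels are binary (+1/-1) instead of four ranks, a single ridge-regularized ELM is used, one label is changed per trial (`n_flip=1`), each trial change is reverted afterwards, and all names and constants are hypothetical. The deliberately flipped samples play the role of the artificial mislabels.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def hidden_layer(X, W, bias):
    """ELM features: a constant, the raw inputs, and random tanh neurons."""
    return [[1.0] + x + [math.tanh(sum(w * v for w, v in zip(wk, x)) + bk)
                         for wk, bk in zip(W, bias)] for x in X]


def hat_matrix(H, lam):
    """P = H (H^T H + lam I)^-1 H^T; it does not depend on the labels."""
    L = len(H[0])
    A = [[sum(h[a] * h[b] for h in H) + (lam if a == b else 0.0)
          for b in range(L)] for a in range(L)]
    Z = [solve(A, h) for h in H]  # Z[i] = A^-1 h_i
    return [[sum(u * v for u, v in zip(hj, zi)) for zi in Z] for hj in H]


def loo_mse(P, y):
    """PRESS: mean squared leave-one-out residual without refitting."""
    total = 0.0
    for i, row in enumerate(P):
        fitted = sum(p * t for p, t in zip(row, y))
        total += ((y[i] - fitted) / (1.0 - row[i])) ** 2
    return total / len(y)


def mislabel_scores(P, y, n_iter, n_flip, rng):
    """Count, per sample, the trial label changes that lowered the LOO error."""
    scores = [0] * len(y)
    base = loo_mse(P, y)
    for _ in range(n_iter):
        picked = rng.sample(range(len(y)), n_flip)
        trial = y[:]
        for i in picked:
            trial[i] = -trial[i]  # binary labels: a change is a sign flip
        if loo_mse(P, trial) < base:
            for i in picked:
                scores[i] += 1
    return scores


rng = random.Random(0)
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(40)]
X.sort(key=lambda x: x[0])
y = [1.0 if x[0] > 0 else -1.0 for x in X]  # true label: sign of first input
artificial = [4, 10, 29, 35]  # control group with deliberately flipped labels
for i in artificial:
    y[i] = -y[i]
W = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(6)]
bias = [rng.gauss(0, 1) for _ in range(6)]
H = hidden_layer(X, W, bias)
P = hat_matrix(H, lam=1e-2)
scores = mislabel_scores(P, y, n_iter=800, n_flip=1, rng=rng)
```

In this toy setting the flipped samples are expected to collect clearly higher scores than the rest, which mirrors the role of the artificial mislabels as a check that the procedure works before its scores for the remaining samples are trusted.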
4 Experimental results
4.1 Prediction performance
A successful application of the MD-ELM method requires an accurate prediction model. The prediction task uses features 2-16 from Table 1 as inputs and feature 1 as the target output.
The dataset exhibits a strong class imbalance (see Table 2). Unless class balancing measures are taken, the imbalance causes rank 3 to be completely neglected in the predictions.
The benchmark performance level is obtained with a Random Forest classifier. It achieves 89.3% test accuracy, but the predictions are biased due to the strong class imbalance, as shown in Figure 1. The smallest class, rank 3, has only 18.6% correct predictions, while the largest class, rank 1, is predicted correctly 98.4% of the time.
Unfortunately, the Random Forest model cannot be used in the mislabel detection framework, so an Extreme Learning Machine was trained instead. The input features consisted of the 542 numerical features derived from the data, 200 standard non-linear ELM neurons, and another 200 Radial Basis Function neurons.
Training the output layer proved difficult due to both the class imbalance and a high number of irrelevant linear features. The only successful model was an ElasticNet linear classifier, which combines L1 and L2 regularization, trained with Stochastic Gradient Descent. The regularization strength parameter was found by 5-fold stratified cross-validation, which keeps the proportion of samples from different classes equal across the folds. Additionally, the method performed class balancing by computing the corresponding sample weights.
The resulting ELM achieved 85% total accuracy, distributed much more equally among the classes, as shown in Figure 2. The model selected only 289 input features out of the total of 942, reducing the data size for the MD-ELM method.
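The composition of the 942-dimensional representation (542 + 200 + 200) can be sketched as follows. The text does not specify the activation function, the weight distributions, or how the RBF centers are chosen; the tanh activation, Gaussian weights, centers drawn from the data, and the kernel width `gamma` below are assumptions, and the data are random placeholders.

```python
import math
import random


def elm_features(x, W, b, centers, gamma):
    """Concatenate raw inputs, random tanh neurons, and Gaussian RBF neurons."""
    tanh_part = [math.tanh(sum(w * v for w, v in zip(wk, x)) + bk)
                 for wk, bk in zip(W, b)]
    rbf_part = [math.exp(-gamma * sum((v - c) ** 2 for v, c in zip(x, ck)))
                for ck in centers]
    return list(x) + tanh_part + rbf_part


rng = random.Random(0)
d, n_tanh, n_rbf = 542, 200, 200  # sizes quoted in the text
X = [[rng.random() for _ in range(d)] for _ in range(5)]  # placeholder data
W = [[rng.gauss(0, 1 / math.sqrt(d)) for _ in range(d)] for _ in range(n_tanh)]
b = [rng.gauss(0, 1) for _ in range(n_tanh)]
centers = [rng.choice(X) for _ in range(n_rbf)]  # assumption: centers from data
phi = elm_features(X[0], W, b, centers, gamma=1.0 / d)
```

The resulting vector `phi` has 942 entries, and only the linear output layer on top of it is trained.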
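Class balancing through sample weights and stratified fold assignment can be sketched in a few lines. The paper does not give the weighting formula; the weight n / (k * n_c) for a class with n_c of the n samples and k classes is a common heuristic and is an assumption here, as are the function names and the toy label counts.

```python
import random
from collections import Counter, defaultdict


def balanced_sample_weights(labels):
    """Weight each sample by n / (n_classes * size of its class)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return [n / (k * counts[c]) for c in labels]


def stratified_folds(labels, n_folds=5, seed=0):
    """Assign a fold index to every sample, spreading each class evenly."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, c in enumerate(labels):
        by_class[c].append(i)
    fold_of = [0] * len(labels)
    for members in by_class.values():
        rng.shuffle(members)
        for pos, i in enumerate(members):
            fold_of[i] = pos % n_folds
    return fold_of


labels = [0] * 10 + [1] * 60 + [2] * 20 + [3] * 10  # toy imbalanced ranks
weights = balanced_sample_weights(labels)
folds = stratified_folds(labels)
```

With these weights every class contributes the same total weight to the training loss, so a small class such as rank 3 is no longer dominated by the largest class.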
4.2 MD-ELM performance
The MD-ELM method uses the 289 best features selected in the prediction experiment. The method does not implement class balancing, so the scope of the experiment is limited to detecting incorrectly labeled samples of rank 3 using a dataset of 900 random samples from ranks 0, 1, and 2 plus all the 668 samples of rank 3. Such a reduced dataset has a smaller class imbalance, which does not negatively affect the results.
The final predictions are averaged over 10 different MD-ELM models. Each model uses its own dataset with different random samples of ranks 0, 1, and 2, a random subset of 100 input features out of the available 289, and a different random subset of 3% artificially mislabeled samples. At each iteration of the method, two samples have their labels changed, one of which is always an original rank 3 sample.
The method continues until the artificially mislabeled samples reach an average score of 100, which takes 400,000 iterations. By that time, the samples that were not artificially mislabeled have an average mislabel score of only 19, with a standard deviation of 28. The difference between the scores shows that the MD-ELM method succeeds in separating the artificially mislabeled samples from the rest, which suggests that it should also succeed in detecting the originally mislabeled samples.
The mislabel scores of all the samples with original rank 3 are shown in Figure 3. A few outliers are clearly visible, together with other candidates for originally mislabeled samples. The analysis of these samples is presented below.
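The per-model randomization described above can be sketched as follows. The function name `draw_model_config`, its parameters, and the placeholder label vector are hypothetical; only the sizes (900 lower-rank samples, 668 rank-3 samples, 100 of 289 features, 3% artificial mislabels, 10 models) come from the text.

```python
import random


def draw_model_config(labels, n_low=900, n_features_total=289,
                      n_features_used=100, artificial_share=0.03, seed=0):
    """Draw the sample subset, feature subset, and artificial mislabels
    for one ensemble member."""
    rng = random.Random(seed)
    rank3 = [i for i, r in enumerate(labels) if r == 3]
    lower = [i for i, r in enumerate(labels) if r != 3]
    samples = rank3 + rng.sample(lower, n_low)  # all rank 3 plus random others
    features = sorted(rng.sample(range(n_features_total), n_features_used))
    n_art = round(artificial_share * len(samples))
    artificial = set(rng.sample(samples, n_art))
    return samples, features, artificial


# Placeholder labels: 668 rank-3 channels plus 5000 made-up lower-rank channels.
rng = random.Random(1)
labels = [3] * 668 + [rng.choice([0, 1, 2]) for _ in range(5000)]
configs = [draw_model_config(labels, seed=s) for s in range(10)]
```

Each configuration contains 1568 samples, and the mislabel scores of the ten resulting models would then be averaged per rank-3 sample.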
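The separation between the two groups of scores can be quantified with Welch's t statistic, as in the following sketch. The score lists are made up for illustration (their means, 100 and 19, merely echo the averages quoted above), and the function name `welch_t` is hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    dof = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, dof


artificial_scores = [92, 105, 110, 98, 95]           # made-up control group
other_scores = [3, 0, 41, 12, 25, 7, 60, 18, 5, 19]  # made-up remaining samples
t, dof = welch_t(artificial_scores, other_scores)
```

A large positive `t` indicates that the artificially mislabeled samples score clearly higher than the rest; unlike Student's t-test, Welch's version does not assume equal variances in the two groups.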
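Selecting candidates from such scores with the quantile rule of Section 3.2 can be sketched as follows. The scores are made up, and the function name `flag_mislabels` is hypothetical; the 0.99 default follows the 99% quantile used in the analysis.

```python
from statistics import NormalDist, mean, stdev


def flag_mislabels(scores, quantile=0.99):
    """Indices whose score exceeds the given quantile of a fitted normal."""
    fitted = NormalDist(mean(scores), stdev(scores))
    threshold = fitted.inv_cdf(quantile)
    return [i for i, s in enumerate(scores) if s > threshold], threshold


scores = [5, 12, 0, 19, 30, 8, 22, 15, 3, 140, 11, 27]  # made-up averaged scores
flagged, threshold = flag_mislabels(scores)
```

Only the sample with score 140 exceeds the fitted threshold in this toy example. Note that extreme scores also inflate the fitted standard deviation, so the rule is conservative when outliers are strong.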
4.3 Characterization of misclassified publication channels and comparison to earlier results
As explained above, we concentrate only on misclassifications for the highest JuFo rank, that is, publication channels that were evaluated by the Finnish discipline experts as 3 but for which the automatic model suggested a lower rank. We restrict our misclassification analysis to this set because it also represents the largest difference to the Danish and Norwegian systems, which include only ranks 0, 1, and 2.
Using a mislabel score above the 99% quantile of the average scores as the criterion, 34 publication channels were identified for which the Finnish expert-based rank was 3 but the model suggested a different rank. However, 30 of these misclassifications could immediately be explained by the ranks in the Danish and Norwegian systems, which evaluated these publication channels as 2, the highest rank in their systems.
The four remaining publication channels, for which both the automated model and the Danish and Norwegian systems suggested a lower rank, were LIGHT: SCIENCE & APPLICATIONS, Etudes classiques, New German critique (for all three of these journals, the rank has recently been updated to a higher one), and the British medical journal. The last one has considerably higher publication activity: the average number of Finnish publications in JuFo rank 3 channels is 10.78, but the British medical journal has a total of 26 publications. All of these journals were also detected as mislabeled in [10], but the misclassification could actually be explained. The three Scopus indicators had incorrectly been left out of JuFoDB for LIGHT: SCIENCE & APPLICATIONS and the British medical journal. These indicators could be found manually in Scopus, and in both cases they were so high that rank 3 actually seemed justified.
Although the methods used here were very different from the ones used in [10], the main results obtained and the misclassifications detected here are to a large extent the same as those in [10]. Thus, we conclude that methodological triangulation [2, 3] has strengthened our analysis results.
This paper provided an extended version of the analysis of Finnish publication channel ranks. Compared to the reference models in [10], we used a much more versatile set of features with a fully nonlinear ELM-based rank prediction model. The mislabel detection was based on the MD-ELM algorithm proposed in [1] and briefly recapitulated in Section 3.2.
In summary, the experimental results reported in Section 4.3 are very similar to the analysis results in [10]. In our future work, we intend to repeat the mislabel detection for the other ranks as well, especially rank 2, for which the most suspicious publication channel quality misclassifications were identified in [10] and which, as explained above, actually contains the most misclassifications. The MD-ELM method will also be extended with a class balancing mechanism, allowing it to handle the whole original dataset.
1. Available at http://www.tsv.fi/julkaisufoorumi/haku.php.
2. Available at http://www.juuli.fi/?&lng=en.
3. See http://www.julkaisufoorumi.fi/en/publication-forum/panels.
[1] (2015) MD-ELM: originally mislabeled samples detection using OP-ELM model. Neurocomputing 159, pp. 242–250.
[2] (2004) Triangulation. In The SAGE Encyclopedia of Social Science Research Methods, pp. 1143–1144.
[3] (1970) Strategies of Multiple Triangulation. The Research Act: A Theoretical Introduction to Sociological Methods, pp. 297–313.
[4] (26-28 April 2017) Randomized machine learning approaches: Recent developments and challenges. In ESANN 2017 Proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, pp. 77–86.
[5] (2012) Extreme learning machine for regression and multiclass classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 42 (2), pp. 513–529.
[6] (2006) Extreme learning machine: theory and applications. Neurocomputing 70 (1), pp. 489–501.
[7] (2015) Is extreme learning machine feasible? A theoretical assessment (part II). IEEE Transactions on Neural Networks and Learning Systems 26 (1), pp. 21–34.
[8] (2015) Is extreme learning machine feasible? A theoretical assessment (part I). IEEE Transactions on Neural Networks and Learning Systems 26 (1), pp. 7–20.
[9] (2015) Analysing Student Performance using Sparse Data of Core Bachelor Courses. JEDM-Journal of Educational Data Mining 7 (1), pp. 3–32.
[10] (2016) Expert-based versus citation-based ranking of scholarly and scientific publication channels. Journal of Informetrics 10 (3), pp. 693–718.