Feature versus Raw Sequence: A Deep Learning Comparative Study on Predicting Pre-miRNA
Abstract
\parttitleBackground Should we input known genome sequence features or the sequence itself into a deep learning framework? As deep learning becomes more popular in various applications, researchers often face the question of whether to generate features or to use raw sequences as input. To answer this question, we study the prediction accuracy of precursor miRNA prediction for a feature-based deep belief network and a sequence-based convolution neural network. \parttitleResults Tested on variants of a six-layer convolution neural network and a three-layer deep belief network, we find that the raw-sequence-based convolution neural network model performs similar to, or slightly better than, the feature-based deep belief network, with best accuracy values of 0.995 and 0.990, respectively. Both models outperform existing benchmark models. The results show that, given enough data, a well-devised raw-sequence-based deep learning model can replace a feature-based deep learning model. However, constructing a well-behaved deep learning model can be very challenging. In cases where features can be easily extracted, feature-based deep learning models may be a better alternative.
Research
addressref=aff1, noteref=n1, email=jaya.thomas@sunykorea.ac.kr ]\initsJT\fnmJaya \snmThomas addressref=aff1, noteref=n1, email=sonia.thomas@sunykorea.ac.kr ]\initsST\fnmSonia \snmThomas addressref=aff1,aff2, corref=aff1,aff2, email=sael@cs.stonybrook.edu ]\initsLS\fnmLee \snmSael
[id=n1]Equal contributor
precursor miRNA \kwdConvolution neural network \kwdDeep belief network \kwdDeep learning comparison
Introduction
Deep learning methods have been popularized in biosequence analysis. More specifically, convolution neural networks (CNNs) have been widely applied to characterize and classify raw sequence data. Traditionally, to classify sequence data, sequence features were generated, fed into classification algorithms, and predictions were made. CNN simplified this process by removing the need for feature generation, which can be challenging in some cases. However, the question arises of whether to use raw sequences when a good set of features already exists. We answer this question in the context of precursor micro RNA prediction, where both raw sequence data and a set of working sequence features shown to give high accuracy are available.
Precursor miRNA
MicroRNAs (miRNAs) are single-stranded small non-coding RNAs that are typically 22 nucleotides long. A miRNA regulates gene expression at the post-transcription level by base pairing with a complementary messenger RNA (mRNA), thereby hindering the translation of the mRNA to proteins. The regulatory role of miRNAs is important in development, cell proliferation, and cell death, and their malfunction has been connected with neurodegenerative disease, cancer, and metabolic disorders [1]. Furthermore, informatics analysis predicts that 30% of human genes are regulated by miRNA [2].
MiRNAs can be experimentally determined by directional cloning of endogenous small RNAs [3]. However, this is a time-consuming process that requires expensive laboratory reagents. These drawbacks motivate computational approaches for predicting miRNAs. The goal of miRNA prediction is to correctly distinguish pre-miRNAs from other pseudo hairpins. Through miRNA biogenesis, a pre-miRNA becomes a mature miRNA, whereas other hairpins do not. The miRNA biogenesis involves a number of steps. First, primary transcripts of miRNA (pri-miRNA), several kilobases long, are transcribed from introns of protein-coding genes. The pri-miRNAs are then cleaved by the RNase III enzyme Drosha into 70 base pair (bp) long hairpin-looped precursor miRNAs (pre-miRNAs). Exportin-5 proteins then transport the pre-miRNA hairpins into the cytoplasm through the nuclear pore. In the cytoplasm, pre-miRNAs are further cleaved by the RNase III enzyme Dicer to produce a 20 bp double-stranded intermediate called miRNA:miRNA*. Finally, the strand of the duplex with the lower thermodynamic energy becomes a mature miRNA.
Precursor miRNA prediction methods
Several machine learning based methods have been proposed to predict miRNAs with high accuracy, that is, to distinguish true pre-miRNAs from pseudo hairpins: RNA sequences that have stem-loop features similar to pre-miRNAs but do not contain mature miRNAs. Most methods rely on features generated from the sequence, folding measures, stem-loop features, and statistical measures, together with careful feature selection.
Many tools have been developed based on different classification techniques, such as the naive Bayes classifier (NBC), artificial neural networks (ANN), support vector machines (SVM), and random forests (RF). Among these approaches, SVM has been the most extensively applied. Notable SVM-based methods include triplet-SVM [4], MiRFinder [5], miPred [6], microPred [7], yasMiR [8], MiRenSVM [9], MiRPara [10], YamiPred [11], and GDE [12]. Among them, triplet-SVM [4] is a classifier that considers local structure-sequence features reflecting characteristics of miRNAs; it reports an accuracy of 90% on pre-miRNAs from 11 other species, including plants and viruses, without using any comparative genomics information. The miPred [6] SVM approach uses a Gaussian radial basis function (RBF) kernel as a similarity measure over global and intrinsic hairpin-folding attributes, achieving an accuracy of around 93%. MicroPred [7] introduces additional features for miRNA evaluation with an SVM classifier; the authors report a high specificity of 97.28% and a sensitivity of 90.02%. The miR-SF classifier [13] predicts the identified human pre-miRNAs in miRBase based on an optimized subset of 13 features selected by SVM and a genetic algorithm. Finally, YamiPred [11] is a genetic algorithm and SVM based embedded classifier that performs feature selection and parameter optimization to improve performance. Other notable methods are based on random forests (RF) [14] and artificial neural networks (ANN) [15, 16]. The MiRANN [15] predictor uses a neural network for pre-miRNA prediction, expanding the network with more neurons and hidden layers, and reports high accuracy on a human dataset. The network is designed to be impartial to any feature by using a weight-initializing scheme in which neighboring neurons differ slightly in their weights.
MiRANN utilizes carefully selected features on a neural network structure. However, to the best of our knowledge, raw sequence data have not been used to distinguish pre-miRNAs from other hairpin sequences.
Deep learning approaches
Two types of neural network models, the deep belief network and the convolution neural network, are used to compare the prediction accuracy of feature-based learning and raw-sequence-based learning. The convolution neural network (CNN) has been used in several instances to directly process raw data as input, and has gained momentum by improving previously recorded state-of-the-art performance measures in a wide range of domains, including genome sequence analysis. The deep belief network (DBN), on the other hand, has been popular where there are large numbers of features. Whether the input is raw data or a high-dimensional feature vector, deep networks use a multi-layer architecture to infer from data. The deep architecture automatically extracts the high-level features necessary for classification or clustering. That is, the multiple layers in deep learning help exploit the inherent complexities of the data.
Deep belief network
A deep belief network (DBN) is an architecture obtained by stacking multiple restricted Boltzmann machines (RBMs), such that the hidden layer of one RBM serves as the visible (input) layer of the next. Let $x$ be the observed vector and $h^k$ the $k$-th hidden layer; with $N$ hidden layers, the joint distribution is as follows [17]:
$P(x, h^1, \ldots, h^N) = \left( \prod_{k=0}^{N-2} P(h^k \mid h^{k+1}) \right) P(h^{N-1}, h^N), \quad h^0 = x$ (1)
where $P(h^k \mid h^{k+1})$ is the conditional distribution of the visible (input) units given the hidden units of the RBM at level $k$, and $P(h^{N-1}, h^N)$ is the joint distribution of the top-level visible-hidden pair. The first step in training the DBN is to train the first (visible) layer of the model such that it models the raw input $x$. In the second step, the distribution of the transformed input is obtained using the training results of the first layer and is used as the input for the second layer. In the third step, the second RBM layer is trained by sampling from the conditional probability learned in the previous layer. Steps two and three are repeated to generate multiple layers. In the final step, the hyperparameters are fine-tuned with gradient descent based back propagation. The first hidden layer learns the structure of the data through the input layer, and the process continues by adding the second layer: the first hidden layer acts as the input, which is multiplied by the weights at the nodes of the second hidden layer, giving the activation probabilities of the second hidden layer. This process results in sequential sets of activations, grouping features of features into a feature hierarchy through which the network learns more complex and abstract representations of the data, and it can be repeated several times to create a multi-layer network. A standard feed-forward neural network is added after the last hidden layer to predict the label, the input to this network being the activation probabilities. The resulting DBN is then trained as a whole to adjust the weights with stochastic gradient descent back propagation [18].
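The layer-wise procedure above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the MATLAB implementation used in the study: the toy data, learning rate, and class names are hypothetical, and each RBM is trained with one step of contrastive divergence (CD-1) using mean-field reconstructions. The dimensions (58 input features, 100-70-35 hidden layers) follow the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """A single restricted Boltzmann machine trained with CD-1."""
    def __init__(self, n_visible, n_hidden):
        self.W = rng.normal(0, 0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible bias
        self.b_h = np.zeros(n_hidden)    # hidden bias

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def cd1_step(self, v0, lr=0.1):
        h0 = self.hidden_probs(v0)                    # positive phase
        v1 = sigmoid(h0 @ self.W.T + self.b_v)        # reconstruction
        h1 = self.hidden_probs(v1)                    # negative phase
        # contrastive-divergence updates
        self.W += lr * (v0.T @ h0 - v1.T @ h1) / len(v0)
        self.b_v += lr * (v0 - v1).mean(axis=0)
        self.b_h += lr * (h0 - h1).mean(axis=0)

def train_dbn(data, layer_sizes, epochs=5):
    """Greedy layer-wise pre-training: each RBM's hidden
    activations become the input of the next RBM."""
    rbms, x = [], data
    for n_hid in layer_sizes:
        rbm = RBM(x.shape[1], n_hid)
        for _ in range(epochs):
            rbm.cd1_step(x)
        rbms.append(rbm)
        x = rbm.hidden_probs(x)   # transformed input for the next layer
    return rbms, x

X = rng.integers(0, 2, size=(32, 58)).astype(float)  # toy 58-feature batch
rbms, top = train_dbn(X, [100, 70, 35])
```

A feed-forward classifier would then be attached to the 35-unit top representation `top` and the whole stack fine-tuned with back propagation, as described above.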
Convolution neural network
Typical CNN models consist of multiple alternations of convolution and pooling layers, finalized by a fully connected layer. The convolution layer performs the convolution operation between the input values and learned filters, which are matrices of weights. Let $k$ be the filter size and $W$ the small matrix of weights; the convolution layer then performs a modified convolution of $W$ with the input $X$ by calculating the dot product $W \cdot X_{i:i+k-1} + b$, where $X_{i:i+k-1}$ is a window of the input and $b$ is the bias. Typically, the filters are shared by using the same filter across different positions of the input. The step size by which the filter slides across the input is called the stride, and the filter area is called the receptive field. Weight sharing is an important characteristic of a convolution network: it reduces the complexity of the problem by reducing the number of weights learned. It also allows location-invariant learning, i.e., if an important pattern exists, an appropriate CNN model will learn it no matter where it appears in the sequence. The convolution layer is often followed by a pooling layer that summarizes the values learned in the convolution layer. Pooling also introduces invariance into the learning and reduces model complexity. Popular pooling methods are average pooling and max pooling. The final layer is the fully connected layer, which is connected to the output or classification layer.
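The shared-filter dot product described above can be sketched as a minimal 1-D "valid" convolution in NumPy. The toy input and filter values are hypothetical and chosen only to illustrate the sliding window, stride, and weight sharing.

```python
import numpy as np

def conv1d(x, w, b, stride=1):
    """Valid 1-D convolution with a single shared filter w:
    the same weights slide across every position of the input."""
    k = len(w)                                  # filter size (receptive field)
    out_len = (len(x) - k) // stride + 1        # number of windows
    return np.array([x[i * stride : i * stride + k] @ w + b
                     for i in range(out_len)])

x = np.array([0., 1., 0., 0., 1., 1., 0., 1.])  # toy input sequence
w = np.array([1., -1., 1.])                     # shared filter, k = 3
y = conv1d(x, w, b=0.0, stride=1)
# output length = (8 - 3)//1 + 1 = 6
```

Because the same `w` is applied at every window, a pattern matched by the filter produces the same response wherever it occurs, which is the location-invariance property noted above.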
Contributions
The main contribution of the paper are summarized as follows:

Compares the performance of feature-based and raw-sequence-based deep learning: a deep belief network model is proposed for integrating a large number of heterogeneous features, and a convolution neural network model is proposed for raw input sequence data.

Provides a solution for the class imbalance problem, allowing for unbiased performance measures.

Compares the performance of the proposed models against existing machine learning classifiers on eleven different species, which extends the previous work on a human dataset only [16].
Methods
Dataset
miRNA data selection
We use experimentally validated pre-miRNAs as positive examples and pseudo hairpins as negative examples to train and test the proposed methods. The human pre-miRNA sequences were retrieved from the miRBase 18.0 release [19]. Following the miPred [6] approach, sequences with multiple loops were discarded, yielding a positive dataset of 1600 pre-miRNAs. The positive sequences have an average length of 84 nt, with maximum and minimum lengths of 154 nt and 43 nt, respectively. The negative dataset consists of 8494 pseudo hairpins extracted from human protein-coding regions as suggested by microPred [7]; these sequences have an average length of 85 nt, with a minimum of 63 nt and a maximum of 120 nt. Several filtering criteria, including a non-overlapping sliding window, no multiple loops, a minimum base-pair count of 18, and a minimum free energy below −15 kcal/mol, were applied to these sequences so that they resemble real pre-miRNA properties.
Class imbalance solution
Another problem that we address here is the class imbalance problem in miRNA prediction. Class imbalance is a machine learning problem where the number of data samples belonging to one class (positive) is far smaller than the number belonging to the other class (negative). Class imbalance is often handled by either under- or over-sampling. In under-sampling, data samples are removed from the majority class, whereas in over-sampling, balance is created either by adding new samples or by duplicating existing minority-class samples. The class imbalance problem often arises in miRNA classification because pseudo hairpin structures are far more abundant than true pre-miRNA folds. Existing classifiers such as triplet-SVM [4] and miPred [6] handled the imbalance problem manually.
We address the class imbalance problem during the training phase by adopting a modified under-sampling approach [20]. In the modified approach, the entire set of negative samples is divided into subsets using the k-means algorithm with k=5, which yields clusters with higher within-group similarity, and the cluster with the highest similarity index among the groups is selected. Using 8-fold cross validation, the selected negative samples are then divided into a training set of 1400 and a test set of 200 negative samples. Similarly, the positive samples are divided using 8-fold cross validation into a training set of 1400 and a test set of 200 positive samples. Hence, the training set has 2800 samples and the test set has 400 samples.
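The cluster-based under-sampling step above can be sketched as follows. The paper does not define the "similarity index" precisely, so this hedged sketch instead draws a representative subset proportionally from each k-means cluster; the toy feature matrix, helper names, and the proportional-allocation rule are assumptions, not the authors' exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k=5, iters=20):
    """Plain k-means (random init, Lloyd iterations); returns cluster labels."""
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def undersample_majority(X_neg, n_keep, k=5):
    """Cluster the majority class, then keep a subset whose cluster
    proportions mirror the full set, so the kept negatives stay
    representative rather than randomly thinned."""
    labels = kmeans(X_neg, k)
    keep = []
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        n_j = max(1, round(n_keep * len(idx) / len(X_neg)))
        keep.extend(rng.choice(idx, min(n_j, len(idx)), replace=False))
    return np.array(keep[:n_keep])

# toy stand-in for the 8494 pseudo hairpins with 58 features each
X_neg = rng.normal(size=(8494, 58))
keep = undersample_majority(X_neg, n_keep=1600)
```

With the majority class reduced to roughly the size of the 1600 positives, the 8-fold split described above (1400 train / 200 test per class) follows directly.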
Modeling deep belief network
miRNA feature encoding
Feature-based learning requires features as inputs. This work adopts 58 characteristic features shown to be useful for predicting miRNA in existing studies [11], as well as a subset of 20 selected features, as candidate inputs to the DBN model for differentiating pre-miRNAs from pseudo hairpins. The features include sequence composition properties, folding measures, stem-loop features, and energy and statistical features, extracted based on knowledge-based analysis of existing methods for miRNA analysis. The common characteristics of pre-miRNAs used for evaluation consist of sequence composition properties, secondary structures, folding measures, and energy. The sequence characteristics include features related to the frequencies of two and three adjacent nucleotides and the aggregate dinucleotide frequency of the sequence. The secondary structure features reflect, from the perspective of miRNA biogenesis, the distinct thermodynamic stability profiles of pre-miRNAs: these structures have lower free energy and often contain stem and loop regions. They include diversity, frequency, and entropy- and enthalpy-related properties of the structure. Other features are the hairpin length, loop length, number of consecutive base pairs, and ratio of loop length to hairpin length of the pre-miRNA secondary structure. The energy characteristics associated with the secondary structure include the minimal free energy of the structure, the overall normalized ensemble free energy (NEFE), combined energy features, and the energy required for dissolving the secondary structure. All extracted features are normalized to standardize the inputs, in order to improve training and avoid getting stuck in local optima. The features used are summarized in Table LABEL:tab:DBN_feat and detailed in Table 7.
Deep belief network architecture
The proposed DBN-based miRNA prediction method, which we call miRNA-FDL, has three hidden layers; the model is denoted as X-100-70-35-1, where X is the size of the input layer, 1 denotes the number of neurons in the output layer, and the remaining values denote the number of neurons in each hidden layer. Figure 1 illustrates the model architecture and the layer-by-layer learning procedure. Different model architectures were trained using the same learning procedure while varying the number of hidden layers and nodes; among the candidate network models, the best one was selected based on classification accuracy.
The weights of miRNA-FDL were trained with the stochastic gradient descent based back propagation algorithm [18], where the update rule is the following:
$w_{t+1} = w_t - \eta \frac{\partial C}{\partial w_t}$ (2)
where $w_t$ is the weight computed at step $t$, $\eta$ denotes the learning rate, and $C$ is the cost function. For the given model, softmax is used as the activation function and the cost is computed using cross entropy. The softmax function is defined as
$p_j = \frac{e^{x_j}}{\sum_k e^{x_k}}$ (3)
where $p_j$ is the output of unit $j$, and $x_j$ and $x_k$ denote the total inputs to units $j$ and $k$, respectively, of the same level. The cross entropy is given by
$C = -\sum_j t_j \log p_j$ (4)
where $t_j$ is the target probability for output unit $j$ and $p_j$ is the probability output after applying the activation function.
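The softmax activation and cross-entropy cost of Eqs. (3) and (4) can be sketched directly in NumPy; the toy input and one-hot target below are illustrative only.

```python
import numpy as np

def softmax(x):
    """Softmax over the units of one layer (Eq. 3)."""
    e = np.exp(x - x.max())          # shift by max for numerical stability
    return e / e.sum()

def cross_entropy(t, p):
    """Cross-entropy cost between targets t and outputs p (Eq. 4)."""
    return -np.sum(t * np.log(p))

x = np.array([2.0, 1.0, 0.1])        # toy total inputs to the output units
p = softmax(x)                       # probabilities, sum to 1
t = np.array([1.0, 0.0, 0.0])        # one-hot target
loss = cross_entropy(t, p)           # equals -log p[0] for a one-hot target
```

For a one-hot target, the cost reduces to the negative log-probability assigned to the correct class, so minimizing it with the update rule of Eq. (2) pushes the correct unit's probability toward 1.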
Modeling convolution neural network
CNN input processing
Each pre-miRNA is an RNA sequence composed of the letters (A, C, G, U). Each nucleotide is encoded using the one-hot-code method: A is encoded as (1,0,0,0), C as (0,1,0,0), G as (0,0,1,0), and U as (0,0,0,1).
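The one-hot encoding above, padded to the fixed 4 × 160 input size used later for the CNN, can be sketched as follows; the zero-padding on the right and the helper name are assumptions for illustration.

```python
import numpy as np

# one-hot codes as described: A=(1,0,0,0), C=(0,1,0,0), G=(0,0,1,0), U=(0,0,0,1)
CODE = {'A': 0, 'C': 1, 'G': 2, 'U': 3}

def one_hot(seq, max_len=160):
    """Encode an RNA sequence as a 4 x max_len matrix, zero-padded on the
    right so every network input has the same width."""
    m = np.zeros((4, max_len))
    for i, ch in enumerate(seq[:max_len]):
        m[CODE[ch], i] = 1.0
    return m

x = one_hot("ACGUAC")   # toy 6-nt sequence
```

Each column of the matrix is one nucleotide; columns past the end of the sequence stay all-zero.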
Convolution neural network architecture
Various CNN architectures can be generated depending on the number of layers and on how convolution layers are combined with pooling layers. Table 3 shows the variants of the CNN architecture considered in this study. In model type 1, the architecture has a single convolution layer followed by a pooling layer, which in turn is connected to the fully connected output layer; the output layer is connected to the classification layer, which assigns the predicted labels. Model type 2 is a variation of model type 1 in which the pooling layer is replaced by a fully connected layer, so model type 2 has two fully connected layers. A further variation of model type 1 gives model type 3, which has two convolution layers; all other layers are as in the base model. Model type 4 has three convolution layers. In all models, global pooling is preferred over local pooling, as we observed that features are better learned with global pooling.
The architecture of the CNN model depends heavily on the various hyperparameters. We set the number of nodes in the input layer to 4 × 160: 4 for the one-hot encoding and 160 for the sequence length, as the maximum sequence length in the database is 154 bp. We set the number of output nodes of the fully connected layer to three and add a classification layer that identifies whether the input sequence contains a pre-miRNA based on the three-node result. The other hyperparameters tested are listed in Table 4. The various combinations of the hyperparameters in Table 4 with the models in Table 3 are considered.
Results
To determine the efficacy of raw-sequence-based learning versus feature-based learning, we compared the accuracies of two DBN models, one working on the unselected pre-miRNA feature set of size 58 and one on the selected feature set of size 20, against the accuracy of one CNN model working on raw pre-miRNA sequences. We also compared the proposed methods against other machine learning methods. The proposed miRNA prediction models were implemented on the MATLAB 2016(b) platform with a 2.30 GHz Intel Xeon CPU E5-2630 and 32 GB RAM. The most crucial aspect of the deep learning was the selection of appropriate hyperparameters; we describe the final selected models in the following. The performance of the proposed and compared methods is summarized in Table 6.
DBN based precursor miRNA prediction model
The candidate models for the DBN-based precursor miRNA prediction model are obtained by varying the number of hidden nodes in each hidden layer as well as the number of hidden layers. The best prediction accuracy is obtained for a DBN architecture [Fig. 3] of three hidden layers, with the first, second, and third layers having 100, 70, and 35 hidden neurons, respectively. Considering the stochastic nature of the algorithm, the output values are averaged over twenty executions. For the 58 features as input, the DBN model (Input-100-70-35-output) gives an accuracy of 0.968 with an F1-score of 0.957. Furthermore, from the literature survey [11], the most relevant features associated with miRNA are melting temperature, entropy, enthalpy, and minimum free energy. The 20 relevant features are: aggregate nucleotide frequency A+U; dinucleotide frequencies AG, AU, CU, GA, and UU; Minimum Free Energy Index 4 (MFEI4); positional entropy; normalized ensemble free energy; frequency of the MFE structure (Freq); enthalpy normalized by sequence length (dH/L); melting temperature (Tm); melting temperature normalized by length (Tm/L); base-pair count normalized by length, |G−C|/L; average base pairs normalized by the number of stem loops, (A−U)/stems and (G−U)/stems; sequence length (Len); centroid energy normalized by length (CE/L); and the statistical Z-scores zG and zSP. The DBN model with these 20 features gives an accuracy of 0.992 with an F1-score of 0.989, slightly higher than using all 58 features.
CNN based precursor miRNA prediction model
The candidate models obtained from the combinations of Table 3 and Table 4 are tested, and the two models that give the highest accuracies on the validation data set are selected. Deeper architectures were also tested; however, additional convolution layers did not increase the prediction performance, likely due to the limited amount of available data. The two models are described below and summarized in Table 5.
The architecture of model type 2 is as follows: the input layer (raw sequence data) is convolved with a filter (window) of size 18, and the window is shifted by 4 (a stride of 4); the total number of filters is 20. The output of the convolution is fed into a fully connected layer with 90 neurons, whose output is in turn fed into another fully connected layer with 2 neurons, also called the output layer. The output layer is connected to a classification layer that assigns the label. Thus, in model type 2, two fully connected layers follow the convolution layer before the final classification layer; the fully connected layers help in better learning the features extracted by the convolution layer.
In model type 3, depicted in Figure 2, the architecture is as follows: the input layer (raw sequence data) is convolved with a filter (window) of size 12, with the window shifted by 1 (stride = 1). The output of this convolution layer is convolved again with another filter of size 6 and stride 1; for both convolutions, the number of filters is 12. After the second convolution, a max pooling layer with window size 6 and stride 4 is connected. The output of the pooling layer is connected to a fully connected layer with 2 neurons (the output layer), which is connected to a classification layer that assigns the label. For both models, the accuracy is best at a dropout ratio of 0.3 at the output layer.
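The layer sizes of model type 3 can be walked through with the standard output-length formula for windowed layers, assuming "valid" (no-padding) convolutions over the padded 160-column input; the paper does not state the padding mode explicitly, so this is an illustrative sketch of how the shapes compose.

```python
# output length of a "valid" convolution or pooling layer:
# out = (L - k) // s + 1, for input length L, window k, stride s
def out_len(L, k, s=1):
    return (L - k) // s + 1

L = 160                       # padded input width (one-hot columns)
c1 = out_len(L, 12, 1)        # conv layer 1: window 12, stride 1
c2 = out_len(c1, 6, 1)        # conv layer 2: window 6, stride 1
p  = out_len(c2, 6, 4)        # max pooling: window 6, stride 4
```

Under these assumptions the per-filter activation lengths shrink from 160 to `c1`, `c2`, and finally `p` positions (each with 12 filters) before the 2-neuron fully connected output layer.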
Comparison with the existing computational methods
The proposed CNN- and DBN-based prediction models are compared to existing benchmark models in Table 6. The prediction models based on deep learning clearly outperform the compared methods. The two CNN models and the DBN model with 20 selected features have the highest accuracy values, above 0.99. The DBN model working on all 58 features also has a high accuracy of 0.968, which shows that DBN performs well on a large set of unselected features. Both proposed models in this study validate that, provided enough data, the deeper the network model, the higher the prediction efficacy of the classifier.
Discussions and Conclusions
In this study, prediction models for precursor miRNAs that contain miRNA sequences are proposed using deep learning techniques: convolution neural networks on raw sequence input and deep belief networks on feature sets. A well-modeled convolution neural network was able to automatically learn the features relevant to predicting correct pre-miRNAs from the raw RNA sequence, yielding a highly accurate classifier. The deep learning frameworks outperform all the compared popular learning algorithms, including naive Bayes, random forest, k-nearest neighbors, and SVM.
References
 [1] Witkos, T.M., Koscianska, E., Krzyzosiak, W.J.: Practical aspects of microrna target prediction. Curr Mol Med 11(2), 99–109 (2011). doi:10.2174/156652411794859250
 [2] Ross, J.S., Carlson, J.A., Brock, G.: mirna: the new gene silencer. Am J Clin Pathol. 128(5), 830–836 (2007). doi:10.1309/2JK279BU2G743MWJ
 [3] Chen, P.Y., Manninga, H., Slanchev, K., Chien, M., Russo, J.J., Ju, J., Sheridan, R., John, B., Marks, D.S., Gaidatzis, D., Sander, C., Zavolan, M., Tuschl, T.: The developmental mirna profiles of zebrafish as determined by small rna cloning. Genes and Development 19(11), 1288–1293 (2005). doi:10.1101/gad.1310605
 [4] Xue, C., Li, F., He, T., Liu, G.P., Li, Y., Zhang, X.: Classification of real and pseudo microrna precursors using local structure sequence features and support vector machine. BMC Bioinformatics 6, 310 (2005). doi:10.1186/147121056310
 [5] Huang, T.H., Fan, B., Rothschild, M.F., Hu, Z.L., Li, K., Zhao, S.H.: Mirfinder: an improved approach and software implementation for genomewide fast microrna precursor scans. BMC Bioinformatics 8, 341 (2007). doi:10.1186/147121058341
 [6] Ng, K.L.S., Mishra, S.K.: De novo svm classification of precursor micrornas from genomic pseudo hairpins using global and intrinsic folding measures. BMC Bioinformatics 23(11), 1321–1330 (2007). doi:10.1186/147121058341
 [7] Batuwita, R., Palade, V.: micropred: effective classification of premirnas for human mirna gene prediction. BMC Bioinformatics 25(8), 989–995 (2009). doi:10.1093/bioinformatics/btp107
 [8] Pasaila, D., Sucial, A., Mohorianu, I., Pantiru, S.T., Ciortuz, L.: Mirna recognition with the yasmir system: The quest for further improvements. Adv Exp Med Biol. 696, 17–25 (2011). doi:10.1007/9781441970466 2
 [9] Ding, J., Zhou, S., Guan, J.: Mirensvm: towards better prediction of microrna precursors using an ensemble svm classifier with multi loop features. BMC Bioinformatics 14(11), 11–11 (2010). doi:10.1186/1471210511S11S11
 [10] Wu, Y., Wei, B., Liu, H., Li, T., Rayner, S.: Mirpara: a svmbased software tool for prediction of most probable microrna coding regions in genome scale sequences. BMC Bioinformatics 12, 107 (2011). doi:10.1186/1471210512107
 [11] Kleftogiannis, D., Theofilatos, K., Likothanassis, S., Mavroudi, S.: Yamipred: A novel evolutionary method for predicting premirnas and selecting relevant features. IEEE ACM Transactions on Computational Biology and Bioinformatics 12(5), 1183–1192 (2015). doi:10.1109/TCBB.2014.2388227
 [12] Hsieh, C.H., Chang, D.T.H., Hsueh, C.H., Wu, C.Y., Oyang, Y.J.: Predicting microrna precursors with a generalized gaussian components based density estimation algorithm. BMC Bioinformatics 11, 1–52 (2010). doi:10.1186/1471210511S1S52
 [13] Wang, Y., Chen, X., Jiang, W., Li, L., Li, W., Yang, L., Liao, M., Lian, B., Lv, Y., Wang, S., Wang, S., Li, X.: Predicting human microrna precursors based on an optimized feature subset generated by gasvm. Genomics 98(2), 73–78 (2011)
 [14] Xiao, J., Tang, X., Li, Y., Fang, Z., Ma, D., He, Y., Li, M.: Identification of microrna precursors based on random forest with networklevel representation method of stemloop structure. BMC Bioinformatics 12:165 (2011). doi:10.1186/1471210512165
 [15] Rahman, M.E., Islam, R., Islam, S., Mondal, S.I., Amin, M.R.l.: Mirann: A reliable approach for improved classification of precursor microrna using artificial neural network model. Genomics 99, 189–194 (2012)
 [16] Thomas, J., Thomas, S., Sael, L.: DPmiRNA: an improved prediction of precursor microRNA using deep learning mode. In: IEEE International Conference on Big Data and Smart Computing (IEEE BigComp 2017), pp. 96–99 (2017). http://conf2017.bigcomputing.org/
 [17] Hinton, G.E.: Training Products of Experts by Minimizing Contrastive Divergence. Neural Computation 14(8), 1771–1800 (2002)
 [18] Hinton, G., Deng, L., Yu, D., Dahl, G., Mohamed, A.R., Jaitly, N., Vanhoucke, V., Nguyen, P., Sainath, T., Kingsbury, B.: Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 82–97 (2012). doi:10.1.1.248.3619
 [19] Zhong, Y., Xuan, P., Han, K., Zhang, W., Li, J.: Improved premirna classification by reducing the effect of class imbalance. BioMed Research International 2015, 1–12 (2015). doi:10.1155/2015/960108
 [20] Yen, S.J., Lee, Y.S.: Clusterbased undersampling approaches for imbalanced data distributions. Expert Systems with Applications 36(3), 5718–5727 (2009). doi:10.1016/j.eswa.2008.06.108
Figures
Tables
Category  Features 

Sequence composition properties  features related to the frequencies of two and three adjacent nucleotides and the aggregate dinucleotide frequency of the sequence, such as dinucleotide pair frequency, trinucleotide frequency, and aggregate dinucleotide frequency 
Secondary structures  thermodynamic stability profiles of premiRNAs 
Stem and loop  diversity, frequency, entropyrelated properties, enthalpyrelated properties of the structure, hairpin length, loop length, consecutive basepair, ratio of loop length to hairpin length of premiRNA secondary structure 
Energy characteristics  minimal free energy of the secondary structure, overall free energy NEFE, combined energy features, the energy required for dissolving the secondary structure 
Statistical measures  Zscore of the folding measures zG, zQ, zSP, zP, zD 
Model Name  Number of Layers  Description of architecture 

Type 1  5  Layer 1: Input layer, Layer 2: Convolution with no stride, Layer 3 Pooling layer with stride, Layer 4 Fully connected to output layer, Layer 5 classification layer 
Type 2  5  Layer 1: Input layer, Layer 2: Convolution with stride, Layer 3 Fully connected Layer , Layer 4 Fully connected to output layer, Layer 5 classification layer 
Type 3  6  Layer 1: Input layer, Layer 2: Convolution with no stride, Layer 3: Convolution with no stride, Layer 4 Pooling layer with stride, Layer 5 Fully connected to output layer, Layer 6 classification layer 
Type 4  7  Layer 1: Input layer, Layer 2: Convolution with no stride, Layer 3: Convolution with no stride, Layer 4: Convolution with no stride, Layer 5 Pooling layer with stride, Layer 6 Fully connected to output layer, Layer 7 classification layer 
Hyperparameter  Range 

Filter size  5 to 24 
Number of filters  5 to 20 
Stride  0 to 24 
Pooling  Max pooling 0 to 9 
Dropout  0 to 0.4 
Number of convolution layers  1 to 3 
Model Type  Description of model  Performance Measure  
Type 2  Layer 1  Input Sequence  SE=1 
Layer 2  Convolution Layer, Window size= 18, Stride = 4,  Number of filters =20.  
Layer 3  Fully connected Layer (90 neurons)  Precision= 0.985  
Layer 4  Fully connected Layer (2 neurons)  Acc=0.993  
Layer 5  Classification Layer  
Type 3  Layer 1  Input Sequence  SE=1 
Layer 2  Convolution Layer (window size=12, stride=1, Num. of filters=12)  SP=0.990  
Layer 3  Convolution Layer (window size=6, stride=1, Num. of filters=12)  Precision=0.990  
Layer 4  Pooling Layer (max pooling, stride=4)  Acc=0.995  
Layer 5  Fully connected Layer (2 neurons)  
Layer 6  Classification Layer 
Method  Sensitivity  Specificity  Accuracy 

Naive Bayes  0.943  0.796  0.914 
K nearest neighbors  0.970  0.657  0.908 
Random Forest  0.979  0.765  0.937 
MiRANN  0.963  0.705  0.917 
YamiPred  0.937  0.912  0.932 
Deep RBM model [58 features]  0.973  0.942  0.968 
Deep RBM model [20 features]  0.995  0.982  0.990 
CNN model 1 (Type 2)  1.00  0.985  0.993 
CNN model 2 (Type 3)  1.00  0.990  0.995 
Additional Files
Full list of pre-miRNA features
The full list of pre-miRNA features used as inputs to the deep belief network is listed.
Feature  Number  Description 

XY, where X,Y ∈ {A,C,G,U}  16  Dinucleotide pair frequency 
XYZ, where X,Y,Z ∈ {A,C,G,U}  64  Trinucleotide frequency 
A+U  1  Aggregate dinucleotide frequency (bases which are either A or U) 
G+C  1  Aggregate dinucleotide frequency (bases which are either G or C) 
L  1  Structure length 
Freq  1  Structural frequency property 
dP  1  Adjusted base pairing propensity, given as total base pairs / L 
dG  1  Adjusted minimum free energy of folding, given as dG = MFE/L 
dD  1  Adjusted base pair distance 
dQ  1  Adjusted Shannon entropy 
dF  1  Compactness of the treegraph representation of the sequence 
MFEI1  1  MFEI1 = dG/(C+G) 
MFEI2  1  MFEI2 = dG/(number of stems) 
MFEI3  1  MFEI3 = dG/(number of loops) 
MFEI4  1  MFEI4 = dG/(total bases) 
MFEI5  1  MFEI5 = dG/(A+U) 
dS  1  Structure entropy 
dS/L  1  Normalized structure entropy 
dH  1  Structure enthalpy 
dH/L  1  Normalized structure enthalpy 
Tm  1  Melting temperature 
Tm/L  1  Normalized melting temperature 
BP(X−Y), where X−Y ∈ {G−C, G−U, A−U}  3  Ratio of total bases to the respective base pairs 
GC  1  Number of G,C bases 
AvgBPStem  1  Average number of base pairs in the stem region 
(A−U)/L, (G−C)/L, (G−U)/L  3  Number of (X−Y) base pairs in the secondary structure, normalized by length 
(A−U)/n_stems, (G−C)/n_stems, (G−U)/n_stems  3  Average number of base pairs in the stem region, normalized by the number of stems 
zP, zG, zD, zQ, zSP  5  Statistical Z-scores of the folding measures 
dPs  1  Positional Entropy which estimates the structural volatility of the secondary structure 
EAFE  1  Normalized Ensemble Free Energy 
CE/L  1  Centroid energy normalized by length 
Diff  1  Diff = |MFE − EFE|/L, where EFE is the ensemble free energy 
IH  1  Hairpin length dangling ends 
IL  1  Loop length 
IC  1  Maximum consecutive basepairs 
L  1  Ratio of loop length to hairpin length 