In this work, we propose a deep learning approach to improve docking-based virtual screening. The introduced deep neural network, DeepVS, uses the output of a docking program and learns how to extract relevant features from basic data such as atom and residue types obtained from protein-ligand complexes. Our approach introduces the use of atom and amino acid embeddings and implements an effective way of creating distributed vector representations of protein-ligand complexes by modeling the compound as a set of atom contexts that is further processed by a convolutional layer. One of the main advantages of the proposed method is that it does not require feature engineering. We evaluate DeepVS on the Directory of Useful Decoys (DUD), using the output of two docking programs: AutodockVina1.1.2 and Dock6.6. Using a strict evaluation with leave-one-out cross-validation, DeepVS outperforms the docking programs in both AUC ROC and enrichment factor. Moreover, using the output of AutodockVina1.1.2, DeepVS achieves an AUC ROC of 0.81, which, to the best of our knowledge, is the best AUC reported so far for virtual screening using the 40 receptors from DUD.
Boosting Docking-based Virtual Screening with Deep Learning
Fiocruz, Oswaldo Cruz Foundation, 4365 Avenida Brasil, Rio de Janeiro, RJ 21040 900, Brazil
IBM Watson, 1101 Kitchawan Rd, Yorktown Heights, NY 10598, USA
The drug discovery process is time-consuming and expensive. The development and even the repositioning of already known compounds is a difficult task 1. The scenario gets worse if we consider the thousands or millions of molecules that could be synthesized in each development stage 2, 3.
In the past, experimental methods such as high-throughput screening (HTS) could help make this decision by screening large chemical libraries against a biological target. However, the high cost of the whole process, combined with a low success rate, makes this method inaccessible to academia 2, 3, 4.
To overcome these difficulties, the use of low-cost computational alternatives is extensively encouraged, and they have been adopted routinely as a way to aid in the development of new drugs 3, 4, 5.
Computational virtual screening works basically as a filter (or prefilter): molecules are selected in silico, according to a predefined criterion, as potentially active compounds against a given pharmacological target 2, 4, 6.
Two variants of this method can be adopted: ligand-based virtual screening and structure-based virtual screening. The first deals with the similarity and the physicochemical analysis of active ligands to predict the activity of other compounds with similar characteristics. The second is used when the three-dimensional structure of the target receptor has already been determined (experimentally or by computational modeling). This approach is used to explore molecular interactions between possible active ligands and residues of the binding site. Structure-based methods perform better than methods based solely on the structure of the ligand when the goal is to identify new compounds with therapeutic potential 7, 8, 1, 9.
One of the computational methodologies extensively used to investigate these interactions is molecular docking 5, 8, 10. The selection of more potent ligands using Docking-based Virtual Screening (DBVS) is made by placing each compound from a compound library into a particular region of a target receptor with a known 3D structure. In the first stage of this process, a heuristic search is carried out in which thousands of candidate placements are considered. In the second, the quality of each placement is described by mathematical functions (scoring functions) that indicate the energy complementarity between compound and target 10, 11. The second stage remains a challenge for computational scientists, since it is easier to recover the proper binding mode of a compound within an active site than to assign a low energy score to a given pose. This hurdle constitutes a central problem of the docking methodology 12.
Systems based on machine learning (ML) have been successfully used to improve the outcome of docking-based virtual screening, both by increasing the performance of scoring functions and by constructing binding affinity classifiers 3, 1. The main ML strategies used in virtual screening are neural networks (NN) 13, support vector machines (SVM) 12 and random forests (RF) 14. One of the main advantages of employing ML is its capacity to model the non-linear dependence of the molecular interactions between ligand and receptor 3.
Traditional machine learning strategies depend on how the data is presented. For example, in virtual screening approaches, scientists normally analyze the docking output to generate or extract human-engineered features. Although this process can be effective to some degree, the manual identification of features is laborious and complex and cannot be applied at large scale, resulting in the loss of relevant information and, consequently, in a set of features incapable of explaining the actual complexity of the problem 15, 16, 3, 1.
On the other hand, recent work on Deep Learning (DL), a family of ML approaches that minimizes feature engineering, has demonstrated enormous success in different tasks from multiple fields 15, 16, 17. DL approaches normally learn features (representations) directly from the raw data with minimal or no human intervention, which makes the resulting system easier to adapt to new datasets.
In the last few years, DL has been attracting the attention of the academic community and large pharmaceutical companies as a viable alternative to aid in the discovery of new drugs. One of the first works in which DL was successfully applied addressed problems related to QSAR (Quantitative Structure-Activity Relationships), by Merck in 2012. Some years later, Dahl et al. 18 developed a multi-task deep neural network to predict biological and chemical properties of a compound directly from its molecular structure. More recently, multi-task deep neural networks were employed to predict the active-site directed pharmacophore and toxicity 19, 20. Also in 2015, Ramsundar et al. 21 predicted drug activity using massively multitask neural networks associated with fingerprints. The relevance of DL is also highlighted by recent applications to the discovery of new drugs and the determination of their characteristics, such as aqueous solubility prediction 22 and fingerprints 23.
In this work, we propose an approach based on Deep Convolutional Neural Networks to improve docking-based virtual screening. The method uses docking simulation results as input to a deep neural network, henceforth DeepVS, which automatically learns to extract relevant features from basic data such as compound atom types, atomic partial charges, and the distance between atoms. DeepVS learns abstract features that are suitable to discriminate between active ligands and decoys in a protein-compound complex. To the best of our knowledge, this work is the first to use deep learning to improve docking-based virtual screening. Recent works on improving docking-based virtual screening have only used traditional shallow neural networks with human-defined features 24, 13, 7.
We evaluated DeepVS on the Directory of Useful Decoys, which contains 40 different receptors. In our experiments we used the output of two docking programs: AutodockVina1.1.2 and Dock6.6. DeepVS outperformed the docking programs in both AUC ROC and enrichment factor. Moreover, when compared to the results of other systems previously reported in the literature, DeepVS achieved the state-of-the-art AUC of 0.81.
The main contributions of this work are: (1) Proposition of a new deep learning-based approach that achieves state-of-the-art performance on docking-based virtual screening; (2) Introduction of the concept of atom and amino acid embeddings, which can also be used in other deep learning applications for computational biology; (3) Proposition of an effective method to create distributed vector representations of protein-compound complexes that models the compound as a set of atom contexts that is further processed by a convolutional layer.
2 Materials and Methods
DeepVS takes as input the data describing the structure of a protein-compound complex (input layer) and produces a score capable of differentiating ligands from decoys. Figure 1 details the DeepVS architecture. First, given an input protein-compound complex $x$, information is extracted from the local context of each compound atom (first hidden layer). The context of an atom comprises basic structural data (basic features) involving distances, neighbor atom types, atomic partial charges and associated residues. Next, each basic feature from each atom context is converted to a feature vector that is learned by the network. Then, a convolutional layer is employed to summarize information from all contexts from all atoms and generate a distributed vector representation of the protein-compound complex (second hidden layer). Finally, in the last layer (output layer), the representation of the complex is given as input to a softmax classifier, which is responsible for producing the score. In Algorithm 1, we present high-level pseudo-code with the steps of the feedforward process executed by DeepVS.
2.1.1 Atom context
First, it is necessary to perform some basic processing on the protein-compound complex to extract the input data for DeepVS. The input layer uses information from the context of each atom in the compound. The context of an atom “a” is defined by a set of basic features extracted from its neighborhood. This neighborhood comprises the $k_c$ atoms in the compound closest to “a” (including itself) and the $k_p$ atoms in the protein that are closest to “a”, where $k_c$ and $k_p$ are hyperparameters that must be defined by the user. The idea of using information from the closest neighbor atoms of both compound and protein has been successfully explored in previous work on structure-based drug design 25.
The basic features extracted from the context of an atom include the atom types, atomic partial charges, amino acid types and the distances from neighbors to the reference atom. For instance (Figure 2), for the nitrogen atom (N3) from the THM compound, the vicinity with $k_c = 3$ and $k_p = 2$ is formed by N3, H and C from the compound, and the two atoms OE and CD from the residue Gln125 in the protein thymidine kinase (PDB ID: 1kim). In this particular case, the context of the atom N3 contains the following values for each basic feature:
Atom Type = [N; H; C; OE; CD]
Charge = [-0.24; 0.16; 0.31; -0.61; 0.69]
Distance = [0.00; 1.00; 1.34; 3.06; 3.90]
Amino Acid Type = [Gln; Gln]
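The context extraction described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the `Atom` class, the function `atom_context`, the names `k_c` and `k_p` for the two neighborhood sizes, and the toy coordinates (chosen so that the distances match the N3 example) are all assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Atom:
    atom_type: str
    charge: float
    xyz: tuple
    residue: str = ""  # amino acid type; left empty for compound atoms


def atom_context(ref, compound, protein, k_c, k_p):
    """Collect the basic features of the k_c compound atoms and k_p protein
    atoms closest to `ref` (compound atoms first, each group nearest first)."""
    def nearest(atoms, k):
        return sorted(atoms, key=lambda a: math.dist(ref.xyz, a.xyz))[:k]

    comp = nearest(compound, k_c)  # contains `ref` itself at distance 0
    prot = nearest(protein, k_p)
    neighbors = comp + prot
    return {
        "atom_type": [a.atom_type for a in neighbors],
        "charge": [a.charge for a in neighbors],
        "distance": [round(math.dist(ref.xyz, a.xyz), 2) for a in neighbors],
        "amino_acid": [a.residue for a in prot],
    }


# Toy coordinates (hypothetical), not taken from the 1kim structure.
n3 = Atom("N", -0.24, (0.0, 0.0, 0.0))
compound = [n3, Atom("H", 0.16, (1.0, 0.0, 0.0)),
            Atom("C", 0.31, (0.0, 1.34, 0.0)), Atom("O", -0.50, (0.0, 0.0, -2.5))]
protein = [Atom("OE", -0.61, (3.06, 0.0, 0.0), "Gln"),
           Atom("CD", 0.69, (0.0, 0.0, 3.9), "Gln"),
           Atom("CA", 0.10, (6.0, 0.0, 0.0), "Ala")]
context = atom_context(n3, compound, protein, k_c=3, k_p=2)
```

With these toy inputs, `context` holds the same four feature lists as the N3 example, with compound neighbors listed before protein neighbors.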
2.1.2 Representation of the atom context
The first hidden layer in DeepVS transforms each basic feature value of atom contexts into real-valued vectors (also known as embeddings) by a lookup table operation. These embeddings contain features that are automatically learned by the network. For each type of basic feature, there is a corresponding embedding matrix that stores a column vector for each possible value of that basic feature. The matrices $W^{atm}$, $W^{dist}$, $W^{chrg}$ and $W^{amino}$ contain the embeddings of the basic features atom type, distance, atomic partial charge, and amino acid type, respectively. These matrices constitute the weight matrices of the first hidden layer and are initialized with random numbers before training the network.
Each column in $W^{atm} \in \mathbb{R}^{d^{atm} \times |A|}$ corresponds to a feature vector of a particular type of atom, where $A$ is the set of atom types and $d^{atm}$ is the dimensionality of the embedding, a hyperparameter defined by the user. Given the context of an atom “a”, the network transforms each value of the basic feature atom type into its respective feature vector and then concatenates these vectors to generate the atom type representation $z_{atm}$. As illustrated in Figure 3, retrieving the embedding of an atom type from $W^{atm}$ consists of a simple lookup table operation. Therefore, the order of the atom type embeddings (columns) in $W^{atm}$ is arbitrary and has no influence on the result. However, the order in which the embeddings of the atom types are concatenated to form $z_{atm}$ matters. We always concatenate first the embeddings of the atom types of the ligand, from the closest to the farthest, and then the embeddings of the atom types of the protein, from the closest to the farthest.
Likewise, the vectors $z_{dist}$, $z_{chrg}$ and $z_{amino}$ are created from the values of distance, charge and amino acid type in the context of the target atom. Values of the basic features charge and distance need to be discretized before being used as input for the network. We define minimum and maximum values, $c_{min}$ and $c_{max}$ respectively, to perform the discretization of charge values. Bins equally spaced by 0.05 between the minimum and maximum are built. For instance, with $c_{min} = -1$ and $c_{max} = 1$, there will be 40 bins. Similarly, to discretize distance values, bins equally spaced by 0.3 Å are defined in the range between $dt_{min}$ and $dt_{max}$. For example, with $dt_{min} = 0$ and $dt_{max} = 5.1$ Å, there will be 18 bins.
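As a rough illustration of the discretization step, the sketch below maps a continuous value to a bin index. The function name `discretize`, the floor-based binning and the clipping of out-of-range values into the first or last bin are assumptions for this example; the text does not specify how bin edges or out-of-range values are handled.

```python
import math


def discretize(value, minimum, width, n_bins):
    """Return the index of the equal-width bin that contains `value`.

    Values below `minimum` go to bin 0 and values past the last bin edge go
    to bin n_bins - 1 (an assumed convention, not stated in the text).
    """
    index = math.floor((value - minimum) / width)
    return min(max(index, 0), n_bins - 1)


# Bin widths and counts quoted in the text: 0.05 with 40 bins for charges,
# 0.3 Å with 18 bins for distances.
charge_bin = discretize(-0.24, minimum=-1.0, width=0.05, n_bins=40)
distance_bin = discretize(3.06, minimum=0.0, width=0.3, n_bins=18)
far_bin = discretize(9.0, minimum=0.0, width=0.3, n_bins=18)
```

The resulting integer indices, rather than the raw charge and distance values, are what the embedding lookup receives.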
Finally, the representation of the context of the atom “a” is defined as $z_a = \{z_{atm}; z_{dist}; z_{chrg}; z_{amino}\}$, comprising the concatenation of the vectors previously described. Our hypothesis is that, from the basic contextual features, the network can learn more abstract features (the embeddings) that are informative for discriminating between active compounds and decoys. This type of strategy, where basic features (words) are transformed into more abstract features (word embeddings), has obtained enormous success in the Natural Language Processing (NLP) field 26, 27, 28, 29, 30.
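The lookup-and-concatenate operation can be sketched as follows. This is a minimal illustration rather than the original Theano code: the embedding tables are plain dictionaries, the embedding size of 4 is a toy value, and the bin indices in `context` are illustrative placeholders.

```python
import random


def init_table(values, dim, rng):
    """One randomly initialized embedding (list of floats) per feature value."""
    return {v: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for v in values}


def embed(table, values):
    """Look up each value and concatenate the embeddings in the given order."""
    z = []
    for v in values:
        z.extend(table[v])
    return z


rng = random.Random(0)
dim = 4  # toy embedding size, for illustration only
W_atm = init_table(["N", "H", "C", "OE", "CD"], dim, rng)
W_dist = init_table(range(18), dim, rng)
W_chrg = init_table(range(40), dim, rng)
W_amino = init_table(["Gln"], dim, rng)

# Context of one atom; distances and charges are already discretized.
context = {
    "atom_type": ["N", "H", "C", "OE", "CD"],
    "distance_bin": [0, 3, 4, 10, 13],
    "charge_bin": [15, 23, 26, 7, 33],
    "amino_acid": ["Gln", "Gln"],
}

# Concatenate the four per-feature representations into the context vector.
z_a = (embed(W_atm, context["atom_type"])
       + embed(W_dist, context["distance_bin"])
       + embed(W_chrg, context["charge_bin"])
       + embed(W_amino, context["amino_acid"]))
```

Because every atom context has the same number of neighbors, every context vector has the same length, here (5 + 5 + 5 + 2) × 4 = 68.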
2.1.3 Representation of the protein-compound complex
The second hidden layer in DeepVS is a convolutional layer, responsible for (1) extracting more abstract features from the representations of all atom contexts in the compound, and (2) summarizing this information in a fixed-length vector $r$. We call the vector $r$ the representation of the compound-protein complex; it is the output of the convolutional layer.
The underlying motivation for using a convolutional layer is its ability to deal with inputs of variable size 31. In the case of virtual screening, different compounds can have a different number of atoms. Therefore, the number of representations of atom contexts can differ between complexes. In DeepVS, the convolutional layer allows the processing of complexes of different sizes.
Given a complex $x$, whose compound is composed of $m$ atoms, the input to the convolutional layer is a list of vectors $\{z_1, z_2, \dots, z_m\}$, where $z_i$ is the representation of the context of the $i$-th atom in the compound. In the first stage of the convolutional layer, more abstract features are generated from each vector $z_i$ according to:
$$u_i = f(W^1 z_i + b^1) \tag{1}$$
where $W^1 \in \mathbb{R}^{cf \times |z_i|}$ is the weight matrix corresponding to the convolutional layer, $b^1$ is a bias term, $f$ is the hyperbolic tangent function and $u_i \in \mathbb{R}^{cf}$ corresponds to the resulting feature vector. The number of units (also called filters) in the convolutional layer, $cf$, is a hyperparameter defined by the user.
The second stage in the convolutional layer, also known as the pooling layer, summarizes the features from the various atom contexts. The input consists of the set of vectors $\{u_1, u_2, \dots, u_m\}$. For DeepVS, we use a max-pooling layer, which produces a vector $r \in \mathbb{R}^{cf}$, where the value of the $j$-th element is defined as the maximum of the $j$-th elements of the set of input vectors, i.e.:
$$[r]_j = \max_{1 \le i \le m} [u_i]_j \tag{2}$$
The resulting vector $r$ from this stage is the representation of the compound-protein complex (Eq. 2). In this way, the network can learn to generate a vector representation that summarizes the information from the complex that is relevant to discriminate between ligands and decoys.
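The two stages of the convolutional layer, a tanh-activated affine map applied to every atom context followed by an element-wise maximum over atoms, can be sketched in plain Python. The function name `conv_max_pool` and the tiny weights are assumptions for illustration; the example shows that compounds with different numbers of atoms yield vectors of the same length.

```python
import math


def conv_max_pool(Z, W1, b1):
    """Apply tanh(W1 z + b1) to every context vector z in Z, then take the
    maximum of each output unit over all atoms (max pooling)."""
    U = [
        [math.tanh(sum(w * x for w, x in zip(row, z)) + b) for row, b in zip(W1, b1)]
        for z in Z
    ]
    return [max(column) for column in zip(*U)]


# Toy layer with 2 filters acting on 2-dimensional context vectors.
W1 = [[1.0, 0.0], [0.0, 1.0]]
b1 = [0.0, 0.0]

r_small = conv_max_pool([[1.0, 0.0], [0.0, 2.0]], W1, b1)               # 2 atoms
r_large = conv_max_pool([[1.0, 0.0], [0.0, 2.0], [-1.0, 0.5]], W1, b1)  # 3 atoms
```

Both calls return a vector with one entry per filter, regardless of how many atom contexts were supplied.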
2.1.4 Scoring of Compound-Protein Complex
The vector $r$ is processed by two standard neural network layers: a third hidden layer that extracts one more level of representation, and an output layer, which computes a score for each of the two possible classifications of the complex: (0) inactive compound and (1) active compound. Formally, given the representation $r$ generated for the complex $x$, the third hidden layer and the output layer execute the following operation:
$$s(x) = W^3 (W^2 r + b^2) + b^3 \tag{3}$$
where $W^2 \in \mathbb{R}^{h \times cf}$ is the weight matrix of the third hidden layer, $W^3 \in \mathbb{R}^{2 \times h}$ is the weight matrix of the output layer, and $b^2 \in \mathbb{R}^{h}$ and $b^3 \in \mathbb{R}^{2}$ are bias terms. The number of units in the hidden layer, $h$, is a hyperparameter defined by the user. $s(x) \in \mathbb{R}^{2}$ is a vector containing the score for each of the two classes.
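Let $s(x)_0$ and $s(x)_1$ be the scores for the classes 0 and 1, respectively. We transform these scores into a probability distribution using the softmax function, as follows:
$$p(0|x) = \frac{e^{s(x)_0}}{e^{s(x)_0} + e^{s(x)_1}} \tag{4}$$
$$p(1|x) = \frac{e^{s(x)_1}}{e^{s(x)_0} + e^{s(x)_1}} \tag{5}$$
where we interpret $p(0|x)$ and $p(1|x)$ as the conditional probabilities of the compound being a decoy or a ligand, respectively, given the compound-protein complex data acquired from docking.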
The likelihood of class 1 (active ligand) is the score used to rank ligands during virtual screening assays. The larger the score, the greater the chance that the compound is an active ligand.
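The scoring step can be illustrated with the short sketch below. It assumes that the third hidden layer and the output layer are plain affine maps, so that the class scores are W3 (W2 r + b2) + b3, and it uses a numerically stable softmax that subtracts the maximum score before exponentiating. All weights and the helper names `affine`, `score` and `softmax` are illustrative assumptions.

```python
import math


def affine(W, b, v):
    """Return W v + b for a matrix W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def score(r, W2, b2, W3, b3):
    """Class scores from the complex representation r (hidden + output layer)."""
    return affine(W3, b3, affine(W2, b2, r))


def softmax(s):
    """Turn scores into probabilities; subtracting max(s) avoids overflow."""
    m = max(s)
    e = [math.exp(v - m) for v in s]
    total = sum(e)
    return [v / total for v in e]


# Toy example: 3 convolutional filters, 2 hidden units, 2 classes.
r = [0.5, -0.2, 0.1]
W2 = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4]]
b2 = [0.0, 0.1]
W3 = [[1.0, 0.0], [0.0, 1.0]]
b3 = [0.0, 0.0]

p = softmax(score(r, W2, b2, W3, b3))
ranking_score = p[1]  # probability of class 1, used to rank compounds
```

Sorting compounds by `ranking_score` in decreasing order gives the ranked list used in the evaluation.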
2.1.5 Training DeepVS
The common approach for training neural networks is the stochastic gradient descent (SGD) algorithm 32. In our case, SGD is used to minimize a loss function over a training set $D$ that contains complexes of both ligands and decoys. At each iteration, a new complex $(x, y) \in D$ is randomly chosen, where $y = 1$ if the complex contains an active ligand and $y = 0$ otherwise. Next, the DeepVS network with parameter set $\theta = \{W^{atm}, W^{chrg}, W^{dist}, W^{amino}, W^1, b^1, W^2, b^2, W^3, b^3\}$ is used to estimate the probability $p(y|x, \theta)$. Finally, the prediction error is computed as the negative log-likelihood, $-\log p(y|x, \theta)$, and the network parameters are updated by applying the backpropagation algorithm 32. In other words, the set of parameters $\theta$ of the network is learned by using SGD to select a set of values that minimize the loss function with respect to $\theta$:
$$\theta \mapsto \sum_{(x, y) \in D} -\log p(y|x, \theta) \tag{6}$$
In our experiments, we applied SGD with minibatches, which means that instead of considering only one compound-protein complex at each iteration we consider a small set of randomly selected complexes and used the average prediction loss to perform backpropagation. We set in our cases. Also, we used Theano 33 to implement DeepVS and perform all the experiments reported in this work.
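To make the minibatch update concrete, the sketch below trains a two-class softmax regression with minibatch SGD on the average negative log-likelihood. This is a deliberately simplified stand-in for the full DeepVS network, not the original Theano implementation: the toy one-dimensional data, the learning rate of 0.1 and the minibatch size of 20 are all illustrative assumptions.

```python
import math
import random


def softmax2(s):
    m = max(s)
    e = [math.exp(v - m) for v in s]
    total = sum(e)
    return [v / total for v in e]


def scores(W, b, x):
    return [sum(w * xi for w, xi in zip(W[c], x)) + b[c] for c in range(2)]


def mean_nll(W, b, batch):
    """Average negative log-likelihood of the true labels in `batch`."""
    return sum(-math.log(softmax2(scores(W, b, x))[y]) for x, y in batch) / len(batch)


def sgd_step(W, b, batch, lr):
    """One SGD update using the gradient of the average loss over the batch."""
    gW = [[0.0] * len(W[0]) for _ in range(2)]
    gb = [0.0, 0.0]
    for x, y in batch:
        p = softmax2(scores(W, b, x))
        for c in range(2):
            d = (p[c] - (1.0 if c == y else 0.0)) / len(batch)  # dLoss/dscore_c
            gb[c] += d
            for j, xj in enumerate(x):
                gW[c][j] += d * xj
    for c in range(2):
        b[c] -= lr * gb[c]
        for j in range(len(W[c])):
            W[c][j] -= lr * gW[c][j]


rng = random.Random(0)
# Toy data: label 1 ("active") when the single feature is positive.
data = [([x], 1 if x > 0 else 0) for x in (rng.uniform(-1, 1) for _ in range(200))]
W, b = [[0.0], [0.0]], [0.0, 0.0]

loss_before = mean_nll(W, b, data)
for _ in range(300):
    sgd_step(W, b, rng.sample(data, 20), lr=0.1)  # minibatch of 20 random examples
loss_after = mean_nll(W, b, data)
```

In the full network the same loss gradient would be propagated further back through the hidden, convolutional and embedding layers by backpropagation.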
2.2 Experimental Setup
We used the Directory of Useful Decoys (DUD) 34 as a benchmark to evaluate our deep-learning-based virtual screening approach. One of the main reasons to use this dataset was the possibility of comparing our results with the ones from systems previously proposed for revalidation of scoring functions and reclassification in virtual screening 24, 11, 7. There is a problem in the partial charges of the original version of DUD that makes it trivial to discriminate between ligands and decoys. Therefore, we used the version of DUD produced by Armstrong et al. 35, which contains corrected atomic partial charges.
The DUD dataset was developed specifically to validate VS methods in a rigorous way. The dataset is composed of 40 receptors distributed in six different biological groups: nuclear hormone receptors, kinases, serine proteases, metalloenzymes, flavoenzymes and other classes of enzymes 34. It also contains 2,950 annotated ligands and 95,316 decoys, a ratio of 36 decoys for each annotated ligand. Each of the 36 decoys was retrieved from the ZINC databank to mimic some physical properties of the associated ligand, such as molecular weight, cLogP, and the number of hydrogen-bonding groups, while differing in its topology 7, 34.
2.2.2 Docking Programs
In this work, we used two different computer programs to perform molecular docking: Dock 6.6 36 and AutodockVina1.1.2 37. Both are open access and widely used in academia to perform VS. Dock 6.6 offers physics-based energy scoring functions based on force fields (GRID score and AMBER score) 36. AutodockVina1.1.2 applies a hybrid scoring function combining characteristics of knowledge-based and empirical scoring functions (Vina score) 37.
2.2.3 Dock 6.6 setup
Protein and compound structures were prepared using computational tools from Chimera 38. Receptors were prepared using the DockPrep module from Chimera. Ligands, non-structural ions, solvent molecules, and cofactors were removed. Absent atoms were added, and Gasteiger atomic partial charges were applied. Input files were .mol2 formatted, except those receptors that were used to calculate molecular surface, in which Hydrogen atoms were removed, and were finally saved in pdb format 38. Spheres were created with radius in the range 13.5-15.5 Å, varying according to the size of the active site of the receptor. Box and grid parameters were taken directly from DUD. The docking procedure was performed according to an available script provided by Dock6.6 program 36.
2.2.4 AutodockVina1.1.2 Setup
Receptors and compounds were prepared following default protocols, and Gasteiger atomic partial charges were applied 37, 39. A cubic grid with an edge of 27 Å was defined, whose center coincided with the center of mass of the ligand. Docking runs were performed following the default settings defined in AutoDockTools 39. The only hyperparameter we changed was the global search exhaustiveness, which we set to 16 as in Arciniega & Lange (2014) 7. It is worth noting that, although AutodockVina1.1.2 can output more than one pose, in our experiments we consider only the pose that AutodockVina1.1.2 outputs as the best one.
2.2.5 Evaluation Approach
The performance of the proposed method is assessed using leave-one-out cross-validation with the 40 proteins from the DUD dataset. Figure 4 illustrates the process we follow to perform our leave-one-out cross-validation experiments. First, we applied either Dock6.6 or AutodockVina1.1.2 to each protein and its respective set of ligands and decoys to generate the docked protein-compound complexes. Next, in each run, one receptor was left out as the test set while the others were employed as the training set.
To avoid distortions in the performance results of DeepVS, it was essential to remove from the training set all receptors similar to the one used as the test receptor in a specific cross-validation iteration. Following Arciniega & Lange (2014) 7, we regarded as similar receptors those sharing the same biological class or those with reported positive cross enrichment 34. Once the network was trained, it was applied to the test receptor, producing a score for each of the potential ligands. This score was used to rank the set of compounds. The ranking was validated using metrics that indicate the algorithm’s performance.
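The construction of the training set for each leave-one-out iteration can be sketched as follows. The receptor names, class labels and the cross-enrichment pair below are hypothetical placeholders, not DUD annotations, and the function name `leave_one_out_splits` is an assumption.

```python
def leave_one_out_splits(bio_class, cross_enriched):
    """Yield (test_receptor, training_receptors) pairs.

    bio_class: dict mapping receptor name -> biological class label.
    cross_enriched: set of frozenset({a, b}) pairs with reported positive
    cross enrichment.
    """
    for test in bio_class:
        train = [
            r for r in bio_class
            if r != test
            and bio_class[r] != bio_class[test]              # same class: excluded
            and frozenset((r, test)) not in cross_enriched   # cross enrichment: excluded
        ]
        yield test, train


# Hypothetical receptors and annotations, for illustration only.
bio_class = {"R1": "kinase", "R2": "kinase", "R3": "protease", "R4": "flavoenzyme"}
cross_enriched = {frozenset(("R3", "R4"))}

splits = dict(leave_one_out_splits(bio_class, cross_enriched))
```

Each test receptor is thus evaluated by a network that has never seen a receptor from its own class or one with known cross enrichment.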
2.2.6 DeepVS Hyperparameters
The main advantage of using leave-one-out cross-validation is the possibility of tuning the neural network hyperparameters without being much concerned with overfitting. In fact, leave-one-out cross-validation is a suitable method for tuning hyperparameters of machine learning algorithms when the dataset is small 40. In our experiments, we used the same set of hyperparameters for the 40 leave-one-out cross-validation iterations. This is equivalent to performing 40 different experiments with different training/test sets using the same configuration for DeepVS. The hyperparameter values that provided our best results and were used in our experiments with both AutodockVina1.1.2 and Dock6.6 are specified in Table 1. Note that our evaluation approach was stricter than the one used by Arciniega & Lange (2014) 7, because they tuned the hyperparameters using a hold-out set at each leave-one-out iteration.
In the next section, we present some experimental results that detail the difference in performance when we vary some of the main hyper-parameters of DeepVS.
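Table 1: DeepVS hyperparameter values.
| Hyperparameter | Symbol | Value |
|---|---|---|
| Atom type embedding size | $d^{atm}$ | 200 |
| Amino acid embedding size | $d^{amino}$ | 200 |
| Charge embedding size | $d^{chrg}$ | 200 |
| Distance embedding size | $d^{dist}$ | 200 |
| Number of convolutional filters | $cf$ | 400 |
| Number of hidden units | $h$ | 50 |
| Number of neighbor atoms from compound | $k_c$ | 6 |
| Number of neighbor atoms from protein | $k_p$ | 2 |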
2.3 Evaluation Metrics
To validate DeepVS performance and compare it with other methods previously published in the literature, we used two well-established VS performance metrics: the enrichment factor (EF) and the area under the Receiver Operating Characteristic (ROC) curve 41, 42.
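ROC curves represent the relationship between the sensitivity (Se) and the specificity (Sp) along a range of continuous values (Equations 7 and 8). The curve plots the true positive rate as a function of the false positive rate.
$$Se = \frac{\text{true positives}}{\text{total actives}} \tag{7}$$
$$Sp = \frac{\text{true negatives}}{\text{total decoys}} \tag{8}$$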
The area under the ROC curve (AUC) quantifies the curve and facilitates the comparison of results. The AUC is calculated as given in Eq. 9, where $n_{actives}$ is the number of actives, $n_{decoys}$ is the number of decoys, and $n^{i}_{decoys\_seen}$ is the number of decoys that are ranked higher than the $i$-th active structure 41. An AUC of 0.50 indicates a random selection, whereas an AUC of 1.0 indicates the perfect identification of active compounds.
$$AUC = 1 - \frac{1}{n_{actives}} \sum_{i=1}^{n_{actives}} \frac{n^{i}_{decoys\_seen}}{n_{decoys}} \tag{9}$$
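As a small worked example, the rank-based AUC can be computed directly from a ranked list of labels: for each active, count the decoys ranked above it, normalize by the number of decoys, average over the actives and subtract from 1. The function name `auc_from_ranking` and the 0/1 label encoding are assumptions for this sketch, and ties in the scores are not handled.

```python
def auc_from_ranking(labels_sorted):
    """AUC for a list of labels (1 = active, 0 = decoy) ordered from the
    best-scored compound to the worst-scored one."""
    n_actives = sum(labels_sorted)
    n_decoys = len(labels_sorted) - n_actives
    decoys_seen = 0
    penalty = 0.0
    for label in labels_sorted:
        if label == 1:
            penalty += decoys_seen / n_decoys  # decoys ranked above this active
        else:
            decoys_seen += 1
    return 1.0 - penalty / n_actives


perfect = auc_from_ranking([1, 1, 0, 0])
worst = auc_from_ranking([0, 0, 1, 1])
mixed = auc_from_ranking([1, 0, 1, 0])
```

In the `mixed` case, three of the four active/decoy pairs are ordered correctly, which gives an AUC of 0.75.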
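Given the set of compounds ranked by score, the enrichment factor at x% (Eq. 10) indicates how good the set formed by the top x% ranked compounds is compared to a set of equal size selected at random from the entire set of compounds 41, 42. The EF is computed as
$$EF_{x\%} = \frac{\text{actives at } x\%}{\text{compounds at } x\%} \times \frac{\text{total compounds}}{\text{total actives}} \tag{10}$$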
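The enrichment factor can be illustrated with the sketch below, which compares the fraction of actives in the top x% of the ranking with the fraction of actives in the whole set. The function name `enrichment_factor` and the way the size of the top set is rounded are assumptions made for this example.

```python
def enrichment_factor(labels_sorted, fraction):
    """EF at the given fraction (e.g. 0.02 for 2%) for a list of labels
    (1 = active, 0 = decoy) ordered from best to worst score."""
    n_total = len(labels_sorted)
    n_top = max(1, round(n_total * fraction))  # assumed rounding convention
    actives_top = sum(labels_sorted[:n_top])
    total_actives = sum(labels_sorted)
    return (actives_top / n_top) * (n_total / total_actives)


# Toy ranking: 10 compounds, 2 actives, both ranked at the top.
ranking = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
ef_20 = enrichment_factor(ranking, 0.20)
ef_100 = enrichment_factor(ranking, 1.00)
```

Here the top 20% contains only actives while actives make up 20% of the whole set, so the EF at 20% is 5. Taking the whole list always gives an EF of 1.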
3 Results and Discussion
3.1 DeepVS vs. Docking Programs
In Table 2 we report, for each of the 40 receptors in DUD, the virtual screening performance for Dock6.6, AutodockVina1.1.2 (henceforth ADV), DeepVS using Dock6.6 output (DeepVS-Dock) and DeepVS using ADV output (DeepVS-ADV). For each system, we report the AUC ROC and the enrichment factor ($EF$) at 2% and 20%. We also report the maximum enrichment factor ($EF_{max}$), which is the maximum $EF$ value that can be achieved for a given ranked list of compounds. Among the four systems, DeepVS-ADV achieved the best average result for AUC, $EF_{2\%}$ and $EF_{20\%}$. DeepVS-ADV had the best AUC for 20 out of the 40 DUD receptors.
Overall, the quality of the docking output impacts the performance of DeepVS, which is expected. On average, DeepVS-ADV, which had a better docking input from ADV (average AUC of 0.62), produced better AUC, $EF_{2\%}$ and $EF_{20\%}$ than DeepVS-Dock, whose input is based on Dock6.6 (average AUC of 0.48). On the other hand, there were some cases where the AUC of the docking program was very poor, but DeepVS was able to boost the AUC significantly. For instance, although Dock6.6 produced an AUC < 0.40 for the receptors AR, COX1, HSP90, InhA, PDE5, PDGFrb and PR, DeepVS-Dock resulted in an AUC > 0.70 for these receptors.
In Figure 5 we compare the AUC of DeepVS-ADV and ADV for the 40 receptors. DeepVS-ADV achieved AUC > 0.70 for 33 receptors, while this number was only 13 for ADV. The number of receptors with AUC < 0.50 was 2 for DeepVS-ADV and 9 for ADV. The AUC of DeepVS-ADV was higher than the one for ADV for 31 receptors. On average, the AUC of DeepVS-ADV (0.81) was 31% better than the one for ADV (0.62). Additionally, when we selected the top 20% of the ranked compounds, on average DeepVS-ADV’s $EF_{20\%}$ (3.1) was 55% larger than the $EF_{20\%}$ of ADV (2.0).
A comparison of the AUC from DeepVS-Dock and Dock6.6 for the 40 receptors is presented in Figure 6. On average, the AUC of DeepVS-Dock (0.74) was 54% better than the one for Dock6.6 (0.48). While Dock6.6 achieved AUC > 0.70 for only 10% (4) of the receptors, DeepVS-Dock reached AUC > 0.70 for 68% (27) of the receptors. The number of receptors with AUC < 0.50 was 5 for DeepVS-Dock and 23 for Dock6.6. The AUC of DeepVS-Dock was higher than the one for Dock6.6 for 36 receptors. Finally, when we selected the top 20% of the ranked compounds, on average DeepVS-Dock’s $EF_{20\%}$ (3.0) was more than two times larger than the $EF_{20\%}$ of Dock6.6 (1.3).
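The relative improvements quoted in the two paragraphs above follow from the reported averages. The short check below recomputes them; the helper name `relative_gain` is an assumption, and the inputs are the rounded averages given in the text.

```python
def relative_gain(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


auc_gain_adv = relative_gain(0.81, 0.62)   # DeepVS-ADV vs ADV, average AUC
ef_gain_adv = relative_gain(3.1, 2.0)      # DeepVS-ADV vs ADV, average EF at 20%
auc_gain_dock = relative_gain(0.74, 0.48)  # DeepVS-Dock vs Dock6.6, average AUC
ef_ratio_dock = 3.0 / 1.3                  # DeepVS-Dock vs Dock6.6, average EF at 20%
```

Rounded to whole percentages, the gains are 31%, 55% and 54%, and the EF ratio for Dock6.6 is about 2.3, in agreement with the figures reported in the text.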
The experimental results presented in this section, which include outputs from two different docking programs, are strong evidence that DeepVS can be an effective approach for improving docking-based virtual screening.
3.2 DeepVS Sensitivity to Hyperparameters
In this section, we present experimental results regarding an investigation on the sensitivity of DeepVS concerning the main hyper-parameters. In our experiments, we used the ADV output as input to DeepVS. Therefore, all results reported in this section were generated using DeepVS-ADV. However, we noticed that DeepVS behaved in a similar manner when the input from Dock6.6 was used. In the experiments, we varied one of the hyper-parameters and fixed all the others to the following values: , , , and .
In Table 3, we present the experimental results of varying the basic feature embedding sizes. We can see that embedding sizes larger than 50 improved mainly the . Embeddings larger than 200 did not improve the results.
The experimental results of varying the number of filters () in the convolutional layer are presented in Table 4. Notice that the AUC improves by increasing the number of convolutional filters up to 400. On the other hand, using results in the best .
In Table 5, we present the experimental results of DeepVS trained with different learning rates. We can see that larger learning rates work better for the DUD dataset. A learning rate of 0.1 resulted in the best outcomes in terms of AUC and .
We also investigated the impact of using a different number of neighbor atoms from the compound ($k_c$) and the receptor ($k_p$). In Table 6, we present some experimental results where we vary both $k_c$ and $k_p$. For instance, $k_c = 0$ and $k_p = 5$ means that no information from the compound is used, while information from the 5 closest atoms from the receptor is used. In the first half of the table, we keep $k_p$ fixed to 5 and vary $k_c$. In the second half of the table, we keep $k_c$ fixed to 6 and vary $k_p$. As we can see in the first half of Table 6, by increasing the number of neighbor atoms we use from the compound ($k_c$), we significantly increase both $EF$ and AUC. In the second half of the table, we can see that using $k_p > 2$ degrades DeepVS-ADV performance. As we conjecture in the next section, this behavior seems to be related to the quality of the docking program output.
In order to assess the robustness of DeepVS with regard to the initialization of the weight matrices (network parameters), we performed 10 different runs of DeepVS-ADV using a different random seed in each run. In Table 7, we present the experimental results of these 10 runs. We can see in the Table that the standard deviation is very small for both and AUC, which demonstrates the robustness of DeepVS to different random seeds.
3.3 Docking Quality vs. DeepVS Performance
In the experimental results reported in Table 6, we can see that, when creating the atom contexts, using $k_p > 2$ (number of neighbor atoms coming from the protein) does not lead to improved AUC or $EF$. In fact, if we use only information from the compound ($k_p = 0$), which is equivalent to performing ligand-based virtual screening, the AUC is already very good (0.803). We hypothesize that this behavior is related to the quality of the docking output, which varies a lot across the 40 DUD proteins. In an attempt to test this hypothesis, we separately analyze the AUC of DeepVS for the DUD proteins for which the AUC of ADV is good (Table 8) or poor (Table 9).
In Table 8, we present the AUC of DeepVS-ADV for proteins whose AUC of ADV is larger than 0.75. We present results for three different values of $k_p$, namely, 0, 2 and 5. In the three experiments we use $k_c = 6$. We can see that for proteins that have a good docking quality (and whose protein-compound complexes likely have good structural information) the average AUC of DeepVS-ADV increases as we increase $k_p$. This result suggests that if the structural information is good, the neural network can benefit from it.
Table 8: DeepVS-ADV results for DUD proteins with ADV AUC > 0.75.
In Table 9, we present the AUC of DeepVS-ADV for proteins whose AUC of ADV is smaller than 0.5. For these proteins, which have a poor docking quality (and whose protein-compound complexes likely have poor structural information), the average AUC of DeepVS-ADV decreases as we increase $k_p$. This result suggests that if the structural information is poor, the neural network works better without using it.
Table 9: AUC of DeepVS-ADV for proteins whose ADV AUC < 0.50.
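The stratified analysis behind Tables 8 and 9 can be sketched as follows. The thresholds 0.75 and 0.5 come from the text; everything else is an assumption made for illustration: the protein identifiers, all AUC values, and the helper names `split_by_docking_quality` and `mean_auc` are hypothetical and do not reproduce the reported tables.

```python
from statistics import mean

def split_by_docking_quality(docking_auc, good_above=0.75, poor_below=0.50):
    """Split proteins into those with good and poor docking AUC."""
    good = sorted(p for p, auc in docking_auc.items() if auc > good_above)
    poor = sorted(p for p, auc in docking_auc.items() if auc < poor_below)
    return good, poor

def mean_auc(model_auc, proteins):
    """Average the model AUC over a subset of proteins."""
    return mean(model_auc[p] for p in proteins)

# Placeholder docking (ADV) AUC per protein; illustrative values only.
adv_auc = {"prot_a": 0.82, "prot_b": 0.78, "prot_c": 0.45,
           "prot_d": 0.40, "prot_e": 0.60}

# Placeholder model AUC per protein, keyed by the number of protein
# neighbor atoms used in the atom context (0, 2 and 5, as in the text).
model_auc_by_setting = {
    0: {"prot_a": 0.84, "prot_b": 0.80, "prot_c": 0.76, "prot_d": 0.72, "prot_e": 0.80},
    2: {"prot_a": 0.86, "prot_b": 0.82, "prot_c": 0.74, "prot_d": 0.70, "prot_e": 0.80},
    5: {"prot_a": 0.88, "prot_b": 0.84, "prot_c": 0.72, "prot_d": 0.68, "prot_e": 0.80},
}

good, poor = split_by_docking_quality(adv_auc)
for n_neighbors, per_protein in model_auc_by_setting.items():
    print(n_neighbors,
          round(mean_auc(per_protein, good), 3),
          round(mean_auc(per_protein, poor), 3))
```

Proteins with docking AUC between the two thresholds (here `prot_e`) fall into neither group, which mirrors the fact that the analysis only contrasts the clearly good and clearly poor cases.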
3.4 Comparison with State-of-the-art Systems
In this section, we compare DeepVS results with those reported in previous work that also employed DUD. First, we perform a detailed comparison between DeepVS and the Docking Data Feature Analysis (DDFA) system 7, which also applied neural networks to the output of docking programs to perform virtual screening. Next, we compare the average AUC of DeepVS with that of other systems that also report results for the 40 DUD receptors.
DDFA uses a set of human-defined features that are derived from docking output data. Examples of the features employed in DDFA are: compound efficiency, the best docking score of the compound poses, and the weighted average of the best docking scores of the five most similar compounds in the docked library. The features are given as input to a shallow neural network that classifies the input protein-compound complex as an active or inactive ligand. DDFA uses data from the six best poses output by the docking program, while in DeepVS we use data from the best pose only. In Figure 7, we compare the AUC of DeepVS-ADV vs. DDFA-ADV, which is the version of DDFA that uses the output of AutodockVina. In this figure, each circle represents one of the 40 DUD receptors. DeepVS-ADV produces a higher AUC than DDFA-ADV for 27 receptors, which represents 67.5% of the dataset.
DDFA-ALL is a more robust version of DDFA 7 that simultaneously uses the output of three different docking programs: Autodock4.2 (AD4), AutodockVina1.1.2 (ADV) and RosettaLigand3.4 (RL). Therefore, DDFA-ALL uses three times as many input features as DDFA-ADV. In Figure 8, we compare the AUC of DeepVS-ADV vs. DDFA-ALL. Despite using data from one docking program only, DeepVS-ADV produces a higher AUC than DDFA-ALL for 25 receptors, which represents 62.5% of the dataset. This is a strong indication that the DeepVS-ADV results on the DUD dataset are robust.
In Table 10, we compare the average AUC of DeepVS and the docking programs with those of other systems reported in the literature. DeepVS-ADV produced the best AUC among all systems, outperforming the commercial docking software ICM and Glide SP. NNScore1-ADV and NNScore2-ADV are also based on shallow neural networks that use human-defined features and the output of AutodockVina. It is worth noting that the NNScore1-ADV and NNScore2-ADV 11 results are based on a different set of decoys that are simpler than the ones available in DUD. Therefore, these results are not directly comparable with the other results in the table. To the best of our knowledge, the AUC of DeepVS-ADV is the best reported so far for virtual screening using the 40 receptors from DUD.
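The per-receptor comparison in Figures 7 and 8 reduces to counting receptors for which one system has the higher AUC. The sketch below illustrates that count on toy data and re-derives the two percentages quoted in the text (27 of 40 and 25 of 40). The receptor names, the toy AUC values and the helper names `count_wins` and `win_percentage` are hypothetical.

```python
def count_wins(auc_a, auc_b):
    """Number of receptors where system A has a strictly higher AUC than system B."""
    return sum(1 for receptor in auc_a if auc_a[receptor] > auc_b[receptor])

def win_percentage(wins, total):
    """Share of receptors won, as a percentage."""
    return 100.0 * wins / total

# Toy per-receptor AUC values (illustrative only, not the paper's data).
system_a = {"r1": 0.90, "r2": 0.70, "r3": 0.80, "r4": 0.60}
system_b = {"r1": 0.80, "r2": 0.75, "r3": 0.70, "r4": 0.60}

toy_wins = count_wins(system_a, system_b)
print(toy_wins, win_percentage(toy_wins, len(system_a)))

# Percentages quoted in the text for the 40 DUD receptors.
print(win_percentage(27, 40))  # DeepVS-ADV vs. DDFA-ADV
print(win_percentage(25, 40))  # DeepVS-ADV vs. DDFA-ALL
```

Ties are not counted as wins in this sketch; how ties were handled in the original comparison is not stated.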
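For readers less familiar with the two metrics used throughout this comparison, the following sketch computes the AUC ROC and an enrichment factor for one ranked list of compounds. It uses the standard rank-based definition of AUC and a common definition of the enrichment factor at a fixed fraction of the ranked list; the exact enrichment factor variants used in the experiments may differ, and the scores, labels and function names are illustrative assumptions.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random active outranks a random decoy.

    Ties between an active and a decoy count as half a win.
    """
    actives = [s for s, y in zip(scores, labels) if y == 1]
    decoys = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((a > d) + 0.5 * (a == d) for a in actives for d in decoys)
    return wins / (len(actives) * len(decoys))

def enrichment_factor(scores, labels, fraction):
    """Active rate in the top `fraction` of the ranking over the overall active rate."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(fraction * len(ranked)))
    actives_in_top = sum(y for _, y in ranked[:n_top])
    return (actives_in_top / n_top) / (sum(labels) / len(labels))

# Toy screening result for a single receptor: 3 actives (1) and 7 decoys (0).
scores = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

auc = roc_auc(scores, labels)
ef_20 = enrichment_factor(scores, labels, 0.20)
print(round(auc, 3), round(ef_20, 3))
```

An average AUC such as the one in Table 10 would then be the mean of the per-receptor values obtained this way.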
In this work, we introduce DeepVS, a deep learning approach to improve the performance of docking-based virtual screening. Using DeepVS on top of the docking output of AutodockVina, we were able to produce the best AUC reported so far for virtual screening on the DUD dataset. This result, together with the facts that (1) DeepVS does not require human-defined features and (2) it achieves good results using the output of a single docking program, makes DeepVS an attractive approach for virtual screening. Moreover, unlike other methods that use shallow neural networks with few parameters, DeepVS has a larger potential for performance improvement if more data is added to the training set. Deep learning systems are usually trained with large amounts of data. Although the number of protein-compound complexes in DUD is relatively large (more than 100k), the number of different proteins is still very small (only 40).
Additionally, this work contributes new ideas on how to model the raw data of protein-compound complexes for use in a deep neural network. We introduce the idea of atom and amino acid embeddings, which can also be used in other deep learning applications for bioinformatics. Moreover, our idea of modeling the compound as a set of atom contexts that is further processed by a convolutional layer proved to be an effective approach to learning representations of protein-compound complexes.
J.C.P. is funded through a Ph.D. scholarship from the Oswaldo Cruz Foundation. E.R.C.'s research is supported by the following grants: Faperj/E-26/111.401/2013 and CNPq. Papes VI./407741/2012-7.
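To make the representation idea concrete, the sketch below builds a fixed-size vector for a complex from a set of atom contexts, using embedding tables followed by a shared convolutional layer. It is a heavily simplified illustration and not the DeepVS implementation: the vocabularies, the dimensions, the restriction to atom-type and amino-acid features, the tanh activation and the max pooling over atoms are all assumptions made for this example.

```python
import math
import random

random.seed(0)

EMB_DIM = 4      # hypothetical embedding size
CONV_UNITS = 5   # hypothetical number of convolutional filters

def make_table(symbols, dim):
    """One randomly initialized embedding vector per symbol."""
    return {s: [random.uniform(-0.1, 0.1) for _ in range(dim)] for s in symbols}

atom_emb = make_table(["C", "N", "O", "S"], EMB_DIM)
amino_emb = make_table(["ALA", "GLY", "SER"], EMB_DIM)

def embed_context(context):
    """Concatenate the embeddings of the atom types and amino acids in one context."""
    vector = []
    for atom in context["atoms"]:
        vector.extend(atom_emb[atom])
    for amino in context["aminos"]:
        vector.extend(amino_emb[amino])
    return vector

def conv_and_pool(contexts, weights, bias):
    """Apply the same filters to every atom context, then max-pool over atoms."""
    outputs = []
    for context in contexts:
        z = embed_context(context)
        outputs.append([math.tanh(sum(w * x for w, x in zip(row, z)) + b)
                        for row, b in zip(weights, bias)])
    return [max(column) for column in zip(*outputs)]

# Toy complex: 3 compound atoms; each context holds 2 atom types and 1 amino acid.
contexts = [
    {"atoms": ["C", "N"], "aminos": ["ALA"]},
    {"atoms": ["N", "O"], "aminos": ["GLY"]},
    {"atoms": ["O", "C"], "aminos": ["SER"]},
]
input_dim = 3 * EMB_DIM
weights = [[random.uniform(-0.1, 0.1) for _ in range(input_dim)]
           for _ in range(CONV_UNITS)]
bias = [0.0] * CONV_UNITS

complex_vector = conv_and_pool(contexts, weights, bias)
print(len(complex_vector))
```

Because the pooling step takes a maximum over atoms, the resulting vector has the same length for compounds of any size and does not depend on the order in which the atoms are listed, which is what allows a single classifier to be placed on top of it.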
- Hecht and Fogel 2009 Hecht, D.; Fogel, G. B. Computational Intelligence Methods for Docking Scores. Current Computer-Aided Drug Design 2009, 5, 56–68.
- Walters et al. 1998 Walters, W.; Stahl, M. T.; Murcko, M. A. Virtual Screening - An Overview. Drug Discovery Today 1998, 3, 160–178.
- Cheng et al. 2012 Cheng, T.; Li, Q.; Zhou, Z.; Wang, Y.; Bryant, S. Structure-Based Virtual Screening for Drug Discovery: a Problem-Centric Review. The AAPS Journal 2012, 14, 133–141.
- Shoichet 2004 Shoichet, B. K. Virtual Screening of Chemical Libraries. Nature 2004, 432, 862–865.
- Ghosh et al. 2006 Ghosh, S.; Nie, A.; An, J.; Huang, Z. Structure-Based Virtual Screening of Chemical Libraries for Drug Discovery. Curr Opin Chem Biol 2006, 10, 194–202.
- Bissantz et al. 2000 Bissantz, C.; Folkers, G.; Rognan, D. Protein-Based Virtual Screening of Chemical Databases. 1. Evaluation of Different Docking/Scoring Combinations. Journal of Medicinal Chemistry 2000, 43, 4759–4767, PMID: 11123984.
- Arciniega and Lange 2014 Arciniega, M.; Lange, O. F. Improvement of Virtual Screening Results by Docking Data Feature Analysis. Journal of Chemical Information and Modeling 2014, 54, 1401–1411.
- Drwal and Griffith 2013 Drwal, M. N.; Griffith, R. Combination of Ligand- and Structure-Based Methods in Virtual Screening. Drug Discovery Today: Technologies 2013, 10, e395–e401.
- Schneider 2010 Schneider, G. Virtual Screening: An Endless Staircase? Nature Reviews Drug Discovery 2010, 9, 273–276.
- Kitchen et al. 2004 Kitchen, D. B.; Decornez, H.; Furr, J. R.; Bajorath, J. Docking and Scoring in Virtual Screening for Drug Discovery: Methods and Applications. Nature Reviews Drug Discovery 2004, 3, 935–949.
- Durrant et al. 2013 Durrant, J. D.; Friedman, A. J.; Rogers, K. E.; McCammon, J. A. Comparing Neural-Network Scoring Functions and the State of the Art: Applications to Common Library Screening. Journal of Chemical Information and Modeling 2013, 53, 1726–1735.
- Kinnings et al. 2011 Kinnings, S. L.; Liu, N.; Tonge, P. J.; Jackson, R. M.; Xie, L.; Bourne, P. E. A Machine Learning-Based Method to Improve Docking Scoring Functions and Its Application to Drug Repurposing. Journal of Chemical Information and Modeling 2011, 51, 408–419.
- Durrant and McCammon 2010 Durrant, J. D.; McCammon, J. A. NNScore: A Neural-Network-Based Scoring Function for the Characterization of Protein-Ligand Complexes. Journal of Chemical Information and Modeling 2010, 50, 1865–1871, PMID: 20845954.
- Ballester and Mitchell 2010 Ballester, P. J.; Mitchell, J. B. A Machine Learning Approach to Predicting Protein–Ligand Binding Affinity with Applications to Molecular Docking. Bioinformatics 2010, 26, 1169–1175.
- Bengio et al. 2013 Bengio, Y.; Courville, A.; Vincent, P. Representation Learning: A Review and New Perspectives. Pattern Analysis and Machine Intelligence, IEEE Transactions on 2013, 35, 1798–1828.
- LeCun et al. 2015 LeCun, Y.; Bengio, Y.; Hinton, G. Deep Learning. Nature 2015, 521, 436–444.
- Bengio 2009 Bengio, Y. Learning Deep Architectures for AI. Foundations and Trends® in Machine Learning 2009, 2, 1–127.
- Dahl et al. 2014 Dahl, G. E.; Jaitly, N.; Salakhutdinov, R. Multi-Task Neural Networks for QSAR Predictions. arXiv preprint arXiv:1406.1231 2014.
- Unterthiner et al. 2014 Unterthiner, T.; Mayr, A.; Klambauer, G.; Steijaert, M.; Wegner, J. K.; Ceulemans, H.; Hochreiter, S. Deep Learning as an Opportunity in Virtual Screening. Proceedings of the Deep Learning Workshop at NIPS. 2014.
- Unterthiner et al. 2015 Unterthiner, T.; Mayr, A.; Klambauer, G.; Hochreiter, S. Toxicity Prediction Using Deep Learning. arXiv preprint arXiv:1503.01445 2015.
- Ramsundar et al. 2015 Ramsundar, B.; Kearnes, S.; Riley, P.; Webster, D.; Konerding, D.; Pande, V. Massively Multitask Networks for Drug Discovery. arXiv preprint arXiv:1502.02072 2015.
- Lusci et al. 2013 Lusci, A.; Pollastri, G.; Baldi, P. Deep Architectures and Deep Learning in Chemoinformatics: the Prediction of Aqueous Solubility for Drug-like Molecules. Journal of Chemical Information and Modeling 2013, 53, 1563–1575.
- Duvenaud et al. 2015 Duvenaud, D. K.; Maclaurin, D.; Iparraguirre, J.; Bombarell, R.; Hirzel, T.; Aspuru-Guzik, A.; Adams, R. P. Convolutional Networks on Graphs for Learning Molecular Fingerprints. Advances in Neural Information Processing Systems. 2015; pp 2215–2223.
- Durrant and McCammon 2011 Durrant, J. D.; McCammon, J. A. NNScore 2.0: A Neural-Network Receptor-Ligand Scoring Function. Journal of Chemical Information and Modeling 2011, 51, 2897–2903.
- Weber et al. 2013 Weber, J.; Achenbach, J.; Moser, D.; Proschak, E. VAMMPIRE: a Matched Molecular Pairs Database for Structure-Based Drug Design and Optimization. Journal of Medicinal Chemistry 2013, 56, 5203–5207.
- Collobert et al. 2011 Collobert, R.; Weston, J.; Bottou, L.; Karlen, M.; Kavukcuoglu, K.; Kuksa, P. Natural Language Processing (Almost) from Scratch. The Journal of Machine Learning Research 2011, 12, 2493–2537.
- Socher et al. 2012 Socher, R.; Huval, B.; Manning, C. D.; Ng, A. Y. Semantic Compositionality Through Recursive Matrix-Vector Spaces. Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. 2012; pp 1201–1211.
- Mikolov et al. 2013 Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G. S.; Dean, J. Distributed Representations of Words and Phrases and their Compositionality. Advances in Neural Information Processing Systems (NIPS) 2013.
- dos Santos and Gatti 2014 dos Santos, C. N.; Gatti, M. Deep Convolutional Neural Networks for Sentiment Analysis of Short Texts. COLING. 2014; pp 69–78.
- dos Santos and Zadrozny 2014 dos Santos, C. N.; Zadrozny, B. Learning Character-Level Representations for Part-of-Speech Tagging. Proceedings of the 31st International Conference on Machine Learning (ICML-14). 2014; pp 1818–1826.
- Waibel et al. 1989 Waibel, A.; Hanazawa, T.; Hinton, G.; Shikano, K.; Lang, K. J. Phoneme Recognition Using Time-Delay Neural Networks. Acoustics, Speech and Signal Processing, IEEE Transactions on 1989, 37, 328–339.
- Rumelhart et al. 1988 Rumelhart, D. E.; Hinton, G. E.; Williams, R. J. Learning Representations by Back-Propagating Errors. Cognitive Modeling 1988, 5, 1.
- Bergstra et al. 2010 Bergstra, J.; Breuleux, O.; Bastien, F.; Lamblin, P.; Pascanu, R.; Desjardins, G.; Turian, J.; Warde-Farley, D.; Bengio, Y. Theano: a CPU and GPU Math Expression Compiler. Proceedings of the Python for Scientific Computing Conference (SciPy). 2010; p 3.
- Huang et al. 2006 Huang, N.; Shoichet, B. K.; Irwin, J. J. Benchmarking Sets for Molecular Docking. Journal of Medicinal Chemistry 2006, 49, 6789–6801.
- Armstrong et al. 2010 Armstrong, M. S.; Morris, G. M.; Finn, P. W.; Sharma, R.; Moretti, L.; Cooper, R. I.; Richards, W. G. ElectroShape: Fast Molecular Similarity Calculations Incorporating Shape, Chirality and Electrostatics. Journal of Computer-aided Molecular Design 2010, 24, 789–801.
- Lang et al. 2009 Lang, P. T.; Brozell, S. R.; Mukherjee, S.; Pettersen, E. F.; Meng, E. C.; Thomas, V.; Rizzo, R. C.; Case, D. A.; James, T. L.; Kuntz, I. D. DOCK 6: Combining Techniques to Model RNA–Small Molecule Complexes. RNA 2009, 15, 1219–1230.
- Trott and Olson 2010 Trott, O.; Olson, A. J. AutoDock Vina: Improving the Speed and Accuracy of Docking with a New Scoring Function, Efficient Optimization, and Multithreading. Journal of Computational Chemistry 2010, 31, 455–461.
- Pettersen et al. 2004 Pettersen, E. F.; Goddard, T. D.; Huang, C. C.; Couch, G. S.; Greenblatt, D. M.; Meng, E. C.; Ferrin, T. E. UCSF Chimera - A Visualization System for Exploratory Research and Analysis. Journal of Computational Chemistry 2004, 25, 1605–1612.
- Morris et al. 2009 Morris, G. M.; Huey, R.; Lindstrom, W.; Sanner, M. F.; Belew, R. K.; Goodsell, D. S.; Olson, A. J. AutoDock4 and AutoDockTools4: Automated Docking with Selective Receptor Flexibility. Journal of Computational Chemistry 2009, 30, 2785–2791.
- Arlot and Celisse 2010 Arlot, S.; Celisse, A. A Survey of Cross-Validation Procedures for Model Selection. Statistics Surveys 2010, 4, 40–79.
- Jahn et al. 2011 Jahn, A.; Rosenbaum, L.; Hinselmann, G.; Zell, A. 4D Flexible Atom-Pairs: An Efficient Probabilistic Conformational Space Comparison for Ligand-Based Virtual Screening. J. Cheminformatics 2011, 3, 23.
- Nicholls 2008 Nicholls, A. What do We Know and when do We Know It? Journal of Computer-Aided Molecular Design 2008, 22, 239–255.
- Neves et al. 2012 Neves, M. A.; Totrov, M.; Abagyan, R. Docking and Scoring with ICM: The Benchmarking Results and Strategies for Improvement. Journal of Computer-Aided Molecular Design 2012, 26, 675–686.
- Cross et al. 2009 Cross, J. B.; Thompson, D. C.; Rai, B. K.; Baber, J. C.; Fan, K. Y.; Hu, Y.; Humblet, C. Comparison of Several Molecular Docking Programs: Pose Prediction and Virtual Screening Accuracy. Journal of Chemical Information and Modeling 2009, 49, 1455–1474.