Applications of Graph Integration to Function Comparison and Malware Classification
Abstract
We classify .NET files as either benign or malicious by examining directed graphs derived from the set of functions comprising the given file. Each graph is viewed probabilistically as a Markov chain where each node represents a code block of the corresponding function, and by computing the PageRank vector (Perron vector with transport), a probability measure can be defined over the nodes of the given graph. Each graph is vectorized by computing Lebesgue antiderivatives of hand-engineered functions defined on the vertex set of the given graph against the PageRank measure. Files are subsequently vectorized by aggregating the set of vectors corresponding to the set of graphs resulting from decompiling the given file. The result is a fast, intuitive, and easy-to-compute glass-box vectorization scheme, which can be leveraged for training a standalone classifier or to augment an existing feature space. We refer to this vectorization technique as PageRank Measure Integration Vectorization (PMIV). We demonstrate the efficacy of PMIV by training a vanilla random forest on 2.5 million samples of decompiled .NET, evenly split between benign and malicious, from our in-house corpus, and compare this model to a baseline model which leverages a text-only feature space. The median time needed for decompilation and scoring was 24ms. (Code available at https://github.com/gtownrocks/grafuple)
I Introduction
We classify .NET files as either malicious or benign by understanding the structural and textual differences between various types of labeled directed graphs resulting from decompilation. The graphs under consideration are the function call graph and the set of short-sighted data flow graphs (SDFGs) derived from traversing the abstract syntax trees, one for each function in the given file.
Each SDFG is viewed as a Markov chain and is vectorized by considering both topological features of the unlabeled graphs and the textual features of the nodes. Under this paradigm, a heuristic notion of average file behavior can be defined by computing expected values of specially-chosen functions defined on the vertex sets of the given graphs against the PageRank measure.
For each graph $G$, we construct a filtration of subsets of the vertex set $V(G)$ defined by specifying a sequence of upper bounds $t_1 < \cdots < t_m$ on the set of PageRank values. The resulting sequence of expected values corresponds to a Lebesgue antiderivative $F$ of the function $f$. As there are typically many SDFGs per file, we vectorize by computing, for each pair $(f, t_j)$, percentiles of the values $F_i(t_j)$, where $i$ indexes the set of SDFGs present in the file.
Model interpretability is a consequence of our approach by construction, because each hand-designed function $f$, and therefore its antiderivative $F$, is interpretable.
This vectorization technique and its application to malware classification are the main contributions of this paper.
I-A Motivation
Static analysis classifiers trained on high-dimensional data can suffer from susceptibility to adversarial examples (see [1] or [2]) due to a large proportion of the feature space consisting of execution- and semantics-agnostic file features. These include embedded unreferenced strings, certain header information, file size, etc. See [3], [4], and [5] for in-depth discussions.
Grosse, K., et al. [6] show that even Deep Neural Networks trained to distinguish malicious files from benign files are vulnerable to adversarial attacks. See [7] for a more recent example.
Ironically, many of these features have high area under the curve due to the copy-paste nature of most malware, yet a model trained on such features can easily be tricked by perturbing them. This is possible because altering these features has no effect on the runtime behavior of the file.
Graph-based feature engineering approaches address this shortcoming by considering features extracted from the semantic structure of the file.
I-B Related Work
The signature-based approach to malware detection historically has been characterized by hand-picking features for the sake of either a rule-based approach or a regression approach as in [8]. Both signatures (hand-written static rules) and regression models fit into this category. This approach is effective on known samples, but is prone to overfitting. This issue was the main motivator for moving towards modeling approaches which leverage semantic structure.
The leveraging of control-flow-based vectorization of executable files for the sake of both supervised and unsupervised learning is well established in the literature, and has proven to be a technique robust to overfitting and robust to adversarial examples. See [9] for one of the first such contributions. Other early approaches involved differentiating files based on sequences of API calls. In [10] the author builds a model based on n-grams of API calls. See [11] or [12] for similar approaches.
In addition to the sequential structure of function calls, one can also take into account the combinatorial graph structure of the calling relationships. Anderson, B., et al. [13] construct graph similarity kernels by viewing control flow graphs as Markov chains. They construct a malicious/benign classifier with these kernels, which showed significant improvement over a model built only on function call n-grams.
Chae et al. [14] successfully leveraged the information present in the combinatorial structure of the control flow graph to compute the sequences and frequencies of APIs by considering random walk kernels similar to those described above. See [15] for a similar approach.
We restrict our attention in this work to decompiled .NET, but the graph-based approach has been leveraged successfully in the similar realm of disassembly. Indeed, [16] discusses the use of graph similarity to compare disassembled files, which results in a kind of file-level isomorphism useful for finding trojans. Similar kernel methods applied to graphs arising via disassembly have been shown to be effective at detecting self-mutating malware by measuring the similarity between observed control flow graphs and known control flow graphs associated with malware. See [17] for details.
Deep learning has also been used to extend similarity detection by constructing neural networks built on top of features derived from graph embeddings in order to measure cross-platform binary code similarity. In [18] the neural network learns a graph embedding for the sake of measuring control flow graph similarity. See [19] for a similar approach using graph convolutional networks. Graph embedding for the sake of measuring control flow similarity has also been applied to bug search and plagiarism detection. See [20], [21], or [22] for further details and [23] for a mathematical exposition of graph embedding.
Reinforcement learning has also been used in the security space to train models robust to adversarial examples created via gradientbased attacks on differentiable models, or genetic algorithmbased attacks on nondifferentiable models. See [24] for further discussion and the authors’ gametheoretic reinforcement approach to adversarial training.
Pure character-level sequence approaches (LSTM/GRU), which do not necessarily leverage the combinatorial structure of function call or control flow graphs, have also been explored, as in [25]. The authors first train a language model in order to learn a feature representation of the file and then train a classifier on this latent representation. See [26] for a more basic RNN approach.
Our approach combines a graphbased feature representation with the interpretability of a logistic regression, while avoiding the training and architectural complexity common to stateoftheart graph convolutional neural networks.
II Data
The dataset used in this work was curated from our internal corpus and consisted of 25 million samples of .NET, with 2.5 million remaining post-deduplication, evenly split between benign and malicious.
The deduplication process involved decompiling each file, hashing each resulting function, sorting and concatenating these hashes, and then hashing the result.
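This deduplication key can be sketched as follows. This is a minimal illustration, not the paper's exact pipeline: we assume SHA-256 as the digest and take string function bodies as input, whereas the real pipeline hashes the decompiler's per-function output; the function name is ours.

```python
import hashlib

def file_dedup_key(function_bodies):
    # Hash each function body, sort the digests so the key is
    # independent of function order, concatenate, and hash again.
    function_hashes = sorted(
        hashlib.sha256(body.encode("utf-8")).hexdigest()
        for body in function_bodies
    )
    return hashlib.sha256("".join(function_hashes).encode("utf-8")).hexdigest()
```

Two files whose decompiled functions agree as a set therefore produce the same key regardless of ordering.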
Labels were assigned via the rule: label(file) = malicious iff any$_v$ {label$_v$(file) = malicious}, where $v$ indexes the set of vendors participating on VirusTotal [27] at the time of labeling.
III Decompilation of .NET
Decompilation is a program transformation by which compiled code is transformed into a high-level, human-readable form, and is used in this work to study the control flow of the files in our .NET corpus. Program control flow is understood by studying the structure of two types of control flow graphs resulting from decompilation. The function call graph describes the calling structure of the functions (subroutines) constituting the overall program. The control flow of each constituent function is understood by constructing a graph from the set of possible traversals of the associated abstract syntax tree.
III-A Abstract Syntax Trees
An abstract syntax tree is a binary tree representation of the syntactic structure of the given routine in terms of operators and operands.
For example, consider the expression

5 * 3 + (4 + 2 % 2 * 8)

consisting of mathematical operators and numeric operands. We may express the syntactic structure of this expression with the binary tree whose root is +, whose left subtree is (* 5 3), and whose right subtree is (+ 4 (* (% 2 2) 8)). The root and the subsequent internal nodes represent operators and the leaves represent operands. The distilled semantic structure in this case is the familiar order of operations for arithmetic expressions.
More generally, each node of an AST represents some construct occurring in the source code, and a directed edge connects two nodes if the code representing the target node conditionally executes immediately after the code represented by the source node. These trees facilitate the distillation of the semantics of the program.
III-B Abstract Syntax Trees for the CLR
Each node of a given AST is labeled by an operation performed on the Common Language Runtime (CLR) virtual machine. A subset of these operations is listed as follows (see Appendix B for the complete list and the details thereof):

AddressOf
Assignment
BinaryOp
break
Call
ClassRef
CLRArray
continue
CtorCall
Dereference
Entrypoint
FieldReference
FnPtrObj
LocalVar
Example III.1.
An AST snippet from a benign .NET sample.
{ ...
"30": {
  "type": "LocalVar",
  "name": "variable7"
},
"28": {
  "type": "LocalVar",
  "name": "locals[0]"
},
"29": {
  "type": "CLRVariableWithInitializer",
  "varType": "System.Web.UI",
  "name": "variable8",
  "value": "28"
},
"64": {
  "fnName": "AddParsedSubObject",
  "type": "Call",
  "target": "62",
  "arguments": [
    "63"
  ]
}
... }
As shown in the example, the metadata available at each node is a function of the CLR operation being performed at that node.
III-C Traversals of Abstract Syntax Trees
We consider all possible execution paths through a given abstract syntax tree and merge these paths together to form a short-sighted data flow graph (SDFG). Consider the following code snippet:
Example III.2.
Small code block resulting in a nonlinear SDFG.
if foo() {
bar();
}
else {
baz();
}
bla();
The two possible execution paths through this code snippet are given by foo → bar → bla and foo → baz → bla. See Figure 1 for the resulting SDFG.
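The path-merging step can be sketched in a few lines. `merge_paths` is a hypothetical helper (the name is ours) that folds node-name sequences into an adjacency structure:

```python
def merge_paths(paths):
    # Merge a list of execution paths (sequences of node names) into a
    # single directed graph, represented as {node: set of successors}.
    graph = {}
    for path in paths:
        for src, dst in zip(path, path[1:]):
            graph.setdefault(src, set()).add(dst)
            graph.setdefault(dst, set())
    return graph

# The two paths through the if/else example above:
sdfg = merge_paths([["foo", "bar", "bla"], ["foo", "baz", "bla"]])
# "foo" ends up with two successors, reflecting the branch.
```

Merging shares the common prefix and suffix of the two paths, so the branch appears as a node with out-degree two.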
III-D Function Call Graphs
The function call graph represents the calling relationships between the subroutines of the file. The function call graphs in our corpus tended to be less linear than the SDFGs, and contained features which improved accuracy. Notably, these features were purely graph-based and were not derived via the imposition of a Markov structure, PageRank computation, or subsequent Lebesgue integration.
IV The PageRank Vector
The PageRank [28] vector describes the long-run diffusion of random walks through a strongly connected directed graph. Indeed, the probability measure over the nodes obtained via repeated multiplication of an initial distribution vector by the associated probability transition matrix converges to the PageRank vector; in practice, this iteration is a very efficient method for computing the PageRank vector to a close approximation.
Intuitively, the PageRank vector is obtained by considering many random walks through the given graph and for each node computing the number of times we observed the walker at the given node as a proportion of all observations. See [29] for more details.
Viewing the graph in question as a Markov chain, we order the vertices of the graph and define the probability transition matrix $P$ by

$P_{uv} = 1/|E_u|$ if $(u, v) \in E$, and $0$ otherwise,    (1)

where $E_u$ is the set of edges emanating from vertex $u$.
In order to apply the Perron-Frobenius theorem, the probability transition matrix $P$ constructed via row-normalizing the adjacency matrix $A$, where $A_{uv} = 1$ if there is an edge from node $u$ to node $v$ and 0 otherwise, must be irreducible. To this end, we add a smoothing term to obtain the matrix

$\widetilde{P} = (1 - \epsilon) P + \frac{\epsilon}{n} J,$    (2)

where $J$ is the $n \times n$ matrix of all ones and $n$ is the number of vertices.

The addition of the term $\frac{\epsilon}{n} J$ ensures the irreducibility of $\widetilde{P}$ as required by the Perron-Frobenius theorem, where $\epsilon$ is the probability of the Markov chain moving between any two vertices without traversing an edge and governs the extent to which the topology of the original graph is ignored. See Figure 2.
The resulting Markov chain is defined by the smoothed transition matrix $\widetilde{P}$, where in this work we set $\epsilon$ heuristically. The sensitivity of the results to $\epsilon$ is left to a future paper.
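The power iteration described above can be implemented without materializing the dense smoothed matrix. The sketch below is illustrative: the default $\epsilon = 0.15$ and the treatment of sink nodes (as linking uniformly to all nodes) are our assumptions, not choices stated in this paper.

```python
def pagerank(adjacency, epsilon=0.15, iterations=100):
    # Power iteration on P~ = (1 - eps) * P + (eps / n) * J, where J is
    # applied implicitly as a uniform teleport term. `adjacency` maps
    # each node to a list of its successors.
    nodes = list(adjacency)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        nxt = {v: epsilon / n for v in nodes}  # teleport mass
        for u in nodes:
            out = adjacency[u]
            if out:
                share = (1.0 - epsilon) * rank[u] / len(out)
                for v in out:
                    nxt[v] += share
            else:
                # Sink node: spread its mass uniformly (an assumption).
                share = (1.0 - epsilon) * rank[u] / n
                for v in nodes:
                    nxt[v] += share
        rank = nxt
    return rank
```

On a directed 3-cycle the stationary distribution is uniform, which is a quick sanity check for any implementation.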
Note IV.1.
One can view this concept in the context of a running program as the repeated calling of a particular function as represented by the SDFG, where a particular execution path is viewed as a random walk through the graph. See [29] for more details.
Theorem IV.2.
(Perron-Frobenius.) If $\widetilde{P}$ is an irreducible row-stochastic matrix, then $\widetilde{P}$ has a unique left eigenvector $\nu$ with eigenvalue 1, normalized so that its entries sum to 1.
The eigenvector $\nu$ is such that $\nu_i \ge 0$ and $\sum_i \nu_i = 1$, so $\nu$ defines a probability measure over the vertices of $G$, which we will write as $\nu_G$, or just $\nu$ if the reference graph is either clear from the context or irrelevant.
V Integration of Functions on Graphs
Given two labeled graphs $G_1, G_2$ and a mapping $f : V(G_1) \sqcup V(G_2) \to \mathbb{R}$, where $f$ assigns a real number to each element of the disjoint union based on the label of the node, the connectivity at the node, or some other scheme, a pointwise comparison of $f|_{V(G_1)}$ and $f|_{V(G_2)}$ may not be possible. Consider for example the simple case of $|V(G_1)| \neq |V(G_2)|$.
We address this difficulty by defining a probability measure $\nu_G$ for each $G \in \mathcal{G}$, where $\mathcal{G}$ is a set of labeled directed graphs. Then, for suitably chosen subsets of the respective vertex sets, we can directly compare the Lebesgue integrals $\int f \, d\nu_{G_1}$ and $\int f \, d\nu_{G_2}$.
Let $\nu$ be the PageRank vector, given by the unique left eigenvector with eigenvalue 1 of the probability transition matrix of the directed graph $G$, viewed as a Markov chain. Each file under consideration contains multiple graphs, and we wish to find a way not only to compare these graphs, but to understand the ensemble of graphs in the given file.
Let $0 = t_0 < t_1 < \cdots < t_m = 1$ be a partition of $[0,1]$ and let $G$ be a directed graph. Let $\nu$ be the probability measure on $V(G)$ given by the PageRank vector. Consider a function $f : V(G) \to \mathbb{R}$. Define the function

$F(t_j) = \int_{V_{t_j}} f \, d\nu = \sum_{v \in V_{t_j}} f(v)\,\nu(v),$    (3)

where $V_{t_j} = \{ v \in V(G) : \nu(v) \le t_j \}$. Mathematically, $F$ is the Lebesgue antiderivative of $f$ over $V(G)$ with measure given by $\nu$.
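Because the measure is discrete, the antiderivative reduces to a finite sum per threshold: for each $t_j$, sum $f(v)\,\nu(v)$ over the nodes whose PageRank value is at most $t_j$. A minimal sketch (the names are ours), with `f_values` and `pagerank` given as dicts over the same node set:

```python
def lebesgue_antiderivative(f_values, pagerank, thresholds):
    # For each threshold t, integrate f over the sublevel set
    # {v : pagerank(v) <= t} against the PageRank measure.
    return [
        sum(f_values[v] * p for v, p in pagerank.items() if p <= t)
        for t in thresholds
    ]
```

The returned list is non-decreasing in the thresholds whenever `f_values` is nonnegative, since each larger threshold only adds terms.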
The above process of building a function on $[0,1]$ from a graph $G$ and a rule $f$ which can be applied consistently to any element of $\mathcal{G}$ can be formulated as a mapping

$\mathcal{G} \times \mathcal{F} \to \mathrm{Fun}([0,1], \mathbb{R}),$    (4)

where $\mathrm{Fun}([0,1], \mathbb{R})$ is the set of functions from $[0,1]$ to $\mathbb{R}$.
VI Similarity Measure on Graph Space
Let $\mathcal{G}_\Sigma$ be the set of directed graphs with vertices labeled from the alphabet $\Sigma$ and let $\mathcal{F} = \{f_1, \ldots, f_n\}$ be a set of functions defined on $\mathcal{G}_\Sigma$. Define the vectorization map

$\Phi(G) = (\mathbb{E}_{\nu_G}[f_1], \ldots, \mathbb{E}_{\nu_G}[f_n]),$    (5)

where each expected value is taken with respect to the PageRank measure $\nu_G$ as defined in the previous section.

We construct a similarity function via (5):

$d(G_1, G_2) = \| \Phi(G_1) - \Phi(G_2) \|$

for $G_1, G_2 \in \mathcal{G}_\Sigma$ and a norm $\| \cdot \|$ on $\mathbb{R}^n$.
Definition VI.1.
A metric on a set $X$ is a function $d : X \times X \to [0, \infty)$ satisfying, for all $x, y, z \in X$: (i) $d(x, y) = 0$ if and only if $x = y$; (ii) $d(x, y) = d(y, x)$; (iii) $d(x, z) \le d(x, y) + d(y, z)$.

Condition (ii) is satisfied since $\|a - b\| = \|b - a\|$ for all $a, b \in \mathbb{R}^n$, and (iii) is satisfied since the norm $\| \cdot \|$ obeys the triangle inequality by construction.
However, it is possible that $d(G_1, G_2) = 0$ for $G_1 \neq G_2$, meaning that while $d$ is effective as a measure of similarity of labeled directed graphs, it is not a metric on $\mathcal{G}_\Sigma$.
Indeed, one can exhibit distinct graphs $G_1 \neq G_2$ with $\Phi(G_1) = \Phi(G_2)$, which implies $d(G_1, G_2) = 0$. Such graphs may even share the same topology and the same combinatorial structure, with the set of functions $\mathcal{F}$ simply insufficient to distinguish $G_1$ from $G_2$.
Additional conditions must be imposed on $\mathcal{G}_\Sigma$ and the functions defined thereon in order to guarantee the injectivity of $\Phi$, a necessary condition for $d$ to define a metric. We leave this analysis to a future paper.
VII Application of Lebesgue Integration on Graphs to SDFGs
The machinery developed in the previous sections lends itself to two immediate applications.
The first is the use of the vectorization map

$\Phi : \{\text{.NET files}\} \longrightarrow \mathbb{R}^N$    (6)
$\text{file} \longmapsto \Phi(\text{file}),$    (7)

applied to .NET files, constructed via decompilation followed by integration of selected functions on SDFGs as described in Equation (4), to i) construct a classifier on a given corpus of labeled .NET files, and ii) cluster these files in $\mathbb{R}^N$ using any of the classic metrics defined on Euclidean space.
The second application is classification and clustering of .NET files within the space $(\mathcal{G}_\Sigma, d)$ described in Section VI. The remainder of this paper concerns the applications of the vectorization map $\Phi$.
VII-A Feature Hashing
Feature hashing allows for the vectorization of data which is categorical in nature and for which the full set of categories is unknown at the time of vectorization. We construct a hash map on strings by wrapping the hash function from the Python standard library as follows:
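A minimal sketch consistent with the description (the function name is ours, not the paper's):

```python
import math

def hash_feature(s):
    # Wrap the built-in hash and take a log to shrink the magnitude.
    # Note: Python 3 salts str hashes per process by default, so a
    # stable digest (e.g. hashlib) would be needed for reproducibility
    # across runs.
    return math.log(abs(hash(s)) + 1)
```

The `+ 1` guards against `hash(s) == 0`, keeping the argument of the log strictly positive.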
We take a log in order to bring the integer resulting from the hash function down to a more manageable magnitude. This has no effect on the model, as random forests are invariant under monotone transformations of individual features.
VII-B Functions on SDFGs
See Appendix A for the complete list of functions defined on SDFGs.
VII-C Lebesgue Integration of Functions on SDFGs
Because the node set of any SDFG is finite and the PageRank measure defined thereon is discrete, the Lebesgue antiderivatives of the functions defined in the previous section take the form of sequences of dot products.
We illustrate the nature of the map $\Phi$ via an example.
Example VII.1.
Consider an SDFG $G$ representing the traversals of some function's abstract syntax tree. Assume $V(G) = \{v_1, \ldots, v_n\}$ and that PageRank assigns the value $\nu_i$ to node $v_i$. Assume the nodes $v_1, v_2$ both correspond to function calls $g_1(x_1), g_2(x_2)$, where $x_1, x_2$ represent the sets of arguments passed to $g_1, g_2$, so that

$\mathrm{NumPass2Call}(v_1) = |x_1|,$    (8)
$\mathrm{NumPass2Call}(v_2) = |x_2|.$    (9)

Take the partition of $[0,1]$ defined by $0 = t_0 < t_1 < \cdots < t_m = 1$. The Lebesgue antiderivative of NumPass2Call on $G$ takes the form

$F(t_j) = \sum_{\nu_i \le t_j} \mathrm{NumPass2Call}(v_i)\,\nu_i.$
In general, each entry of the vector is a linear combination of the form $\sum_i \nu_i f(v_i)$, where $\nu_i$ is the element of the PageRank vector assigned to node $v_i$ and $f(v_i)$ is a real number resulting from applying $f$ to node $v_i$.
VIII Vectorization of Function Call Graphs
The only feature of the form $f(v)$ extracted from the function call graphs is a flag indicating references to cryptographic functionality. This function, unlike those applied to the SDFGs, is not integrated; we simply include the crypto flag as a feature directly.
The remaining features extracted from the function call graphs are combinatorial and topological in nature.
Let $G$ be the function call graph for a single .NET file and let $C_1, \ldots, C_k$ be the connected components thereof. Let $\deg(v)$ represent the number of edges connected to the vertex $v$. Let $n_i$ be the number of nodes of component $C_i$. We extract the following features:

$\max_i(n_i) / \min_i(n_i)$
mean of $\{\deg(v) : v \in V(G)\}$
std of $\{\deg(v) : v \in V(G)\}$
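These combinatorial features can be computed directly from an adjacency representation. A minimal sketch (names are ours), assuming the call graph is given as a dict of successor sets and treating edges as undirected for the degree and component computations:

```python
import statistics

def call_graph_features(adjacency):
    # Build an undirected view so degree and connectivity ignore
    # edge direction.
    undirected = {v: set(ws) for v, ws in adjacency.items()}
    for v, ws in adjacency.items():
        for w in ws:
            undirected.setdefault(w, set()).add(v)
    degrees = [len(ws) for ws in undirected.values()]
    # Connected component sizes via iterative DFS.
    seen, sizes = set(), []
    for start in undirected:
        if start in seen:
            continue
        stack, size = [start], 0
        while stack:
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            size += 1
            stack.extend(undirected[v] - seen)
        sizes.append(size)
    return {
        "component_size_ratio": max(sizes) / min(sizes),
        "degree_mean": statistics.mean(degrees),
        "degree_std": statistics.pstdev(degrees),
    }
```

For example, a graph with one two-node component and one isolated node yields a component size ratio of 2.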
IX Experiments
We compare PMIV to a baseline method we call Uniform Measure Integration Vectorization (UMIV).
Uniform Measure Integration Vectorization is similar to PMIV in that the method is defined by computing a graphbased integral of functions defined over the node sets, where these functions are exactly those used for PMIV. The critical difference is that UMIV is defined via integration against the uniform measure.
This means that instead of computing

$F(t_j) = \sum_{\nu(v) \le t_j} f(v)\,\nu(v)$

as defined via Equation (3), we compute

$\frac{1}{|V(G)|} \sum_{v \in V(G)} f(v),$    (10)

i.e., a simple average of the given function over the node set of the given graph. PMIV and UMIV similarly leverage the textual information embedded in SDFGs, but UMIV ignores the combinatorial structure of the SDFGs.
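The contrast between the two measures can be sketched in a few lines, where `f_values` maps nodes to values of a fixed feature function and `pagerank` is the node measure (both names are ours):

```python
def umiv_value(f_values):
    # Uniform-measure integral: a plain average over the node set.
    return sum(f_values.values()) / len(f_values)

def pmiv_value(f_values, pagerank):
    # PageRank-measure expectation: nodes are weighted by their
    # long-run visit probability.
    return sum(f_values[v] * p for v, p in pagerank.items())
```

With non-uniform PageRank weights the two integrals generally disagree, which is exactly the topological signal that PMIV retains and UMIV discards.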
IX-A Parsing (same for PMIV and UMIV)
Each .NET file is decompiled, resulting in i) an abstract syntax tree for each function within the file and ii) the function call graph. The abstract syntax trees are traversed individually, resulting in a single SDFG for each function within the file. The function call graph is a directed graph indicating which functions call which other functions.
Example IX.1.
Consider the C# program
using System;
class Hello
{
static void Main()
{
Console.WriteLine(”Hello, World!”);
}
}
Three graphs result from decompilation: an empty function call graph and two linear SDFGs. See the Appendix for the decompiler output.
IX-B Vectorization (PMIV)
Each file is vectorized by applying both the vectorization map (6) to the set of shortsighted data flow graphs (many per file) and the vectorization of the function call graph (one per file) as described in Section VIII.
IX-B1 SDFG
Given a file marked by its hash $h$, we consider the set of SDFGs $\{G_i\}_{i=1}^{s}$ obtained by decompiling $h$.

For each function $f \in \mathcal{F}$ and partition point $t_j$, we can compute the values $F_i(t_j)$, one for each SDFG $G_i$.

We can then compute both the mean and standard deviation of the set $\{F_i(t_j)\}_{i=1}^{s}$ for each pair $(f, t_j)$. As the number of SDFGs varies by file, this is necessary to guarantee that every file in the corpus can be mapped to $\mathbb{R}^N$ for some fixed $N$.

The file is mapped, via integrating over its SDFGs, to the feature space with coordinates indexed by the pairs $(f, t_j)$.

The file is then described by the feature vector given by the concatenation of these means and standard deviations, where $f$ runs over the set of functions $\mathcal{F}$. See Figure 3 for a resulting feature histogram for a particular $(f, t_j)$.
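The per-file aggregation step can be sketched as follows, assuming each SDFG has already been reduced to an equal-length curve of antiderivative values (the function name is ours):

```python
import statistics

def file_vector(per_sdfg_curves):
    # Aggregate per-SDFG curves into a fixed-length file vector:
    # for each (f, t_j) coordinate, emit the mean and population
    # standard deviation across the file's SDFGs.
    vec = []
    for coord in zip(*per_sdfg_curves):
        vec.append(statistics.mean(coord))
        vec.append(statistics.pstdev(coord))
    return vec
```

Because the output length depends only on the number of coordinates, files with different numbers of SDFGs map to the same fixed-dimensional space.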
IX-B2 Function Call Graphs
The function call graph features are included as components of the final filelevel vector directly without computing means and standard deviations, as there is a single such graph per file.
IX-C Vectorization (UMIV)
Files are vectorized in essentially the same way as in PMIV, except that we assign the uniform probability $1/|V(G)|$ to each vertex of a given SDFG $G$.
The vectorization scheme is defined in Equation (10), and the final reduction of these values across SDFGs into a single vector corresponding to a single file is identical to that of PMIV.
IX-D Algorithm
We train a separate random forest ([30], [31]) for each vectorization method, each with identical hyperparameters.
A random forest is an ensemble learning method for classification, regression, and other tasks that operates by constructing a multitude of decision trees at training time and scoring via a voting (classification) or averaging (regression) procedure over its constituent trees.
This algorithm is especially valuable in malware classification as scoring inaccuracy caused by unavoidable label noise is somewhat mitigated by the ensemble.
IX-E Training and Validation (same for PMIV and UMIV)
The .NET corpus was first deduplicated by decompiling each file, hashing each resulting graph, lexicographically sorting and concatenating these hashes, and then hashing the result.
The deduplicated corpus was split into training (70%), validation (10%), and test (20%) sets. We used the grid search functionality of scikit-learn with cross-validation for hyperparameter tuning of the random forest. The optimal model is described in Table I.
TABLE I: Optimal random forest hyperparameters.
max leaf nodes: None
min samples leaf: 1
warm start: False
min weight fraction leaf: 0
oob score: False
min samples split: 2
criterion: gini
class weight: None
min impurity split: 2.09876756095e-05
n estimators: 480
max depth: None
bootstrap: True
max features: sqrt
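A minimal sketch of such a grid search with scikit-learn; the grid values and the tiny synthetic data are illustrative stand-ins, not the paper's actual search space or corpus:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Stand-in for the file vectors and labels (label = first coordinate).
X = [[0, 0], [0, 1], [1, 0], [1, 1]] * 5
y = [0, 0, 1, 1] * 5

# Hypothetical grid; the paper's grid is not published here.
param_grid = {"n_estimators": [10, 20], "max_features": ["sqrt"]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=2)
search.fit(X, y)
best = search.best_estimator_
```

`search.best_params_` and `best` then provide the tuned hyperparameters and the refit model, analogous to the configuration reported in Table I.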
X Experimental Results
X-A Accuracy, Precision, Recall (PMIV and UMIV)
The model is 98.3% accurate on the test set using only 400 features, a small feature space for a static classifier.
The precision on malicious files was 98.94%, meaning that of the files classified as malicious by the model, 98.94% of them were actually malicious. Precision on benign files was 97.88% and recall on benign files was 99.37%.
The recall on malicious files was 96.47%, meaning that of the malicious files, 96.47% of them were correctly scored as malicious. Of the four precision/recall values, malicious recall was the weakest. There are very likely features of malicious .NET files that are not captured by the set of functions we currently leverage to construct our feature space.
As shown in Tables II and III, our graph-structure-based vectorization method PMIV outperforms our baseline UMIV method by wide margins, demonstrating the efficacy of our graph integration construction.
TABLE II: PMIV results on the test set.
Class | Precision | Recall | F1-score | Support
Benign | 97.88% | 99.37% | 98.62% | 696827
Malware | 98.94% | 96.47% | 97.69% | 424420
avg/total | 98.28% | 98.27% | 98.27% | 1121247
False Positive Rate: 1.10%
False Negative Rate: 1.72%
TABLE III: UMIV results on the test set.
Class | Precision | Recall | F1-score | Support
Benign | 90.61% | 87.04% | 88.79% | 696827
Malware | 87.80% | 91.18% | 89.46% | 424420
avg/total | 89.19% | 89.13% | 89.13% | 1121247
False Positive Rate: 8.79%
False Negative Rate: 12.96%
XI Conclusion
We have engineered a robust control-flow-graph-based vectorization scheme for exposing features which reveal semantically interesting constructs of .NET files. The vectorization scheme is interpretable and glass-box by construction, which will facilitate scalable taxonomy operations in addition to high-accuracy classification as benign or malicious.
The control flowtype graphs include both function call graphs, one for each file, and SDFG graphs, one for each function defined within the file. Leveraging the combinatorial structure of these graphs results in a rich feature space, within which even a simple classifier can effectively distinguish between benign and malicious files.
The vectorization scheme introduced here may be leveraged to train a standalone model or to augment the feature space of an existing model. Although we limited our experiments to decompiled .NET, we see no obstruction to applying the PMIV concept to a wider class of graph-based file data, such as disassembly.
Future work will involve the addition of new functions on control-flow-type graphs to the vectorization scheme, as well as the clustering of files, and the functions of which they consist, within both the codomain of the vectorization map and within the graph space $\mathcal{G}_\Sigma$. We will also explore the extent to which these functions and files can be parameterized through manifold learning in Euclidean as well as graph space.
Acknowledgment
The authors would like to thank former colleague Brian Wallace for both deduplicating our .NET corpus and applying the decompiler at scale. Without his efforts, this project would not have been possible.
References
[1] Gong, Y., Li, B., Poellabauer, C., Shi, Y. (2019) Real-Time Adversarial Attacks. arXiv:1905.13399 [cs.CR]
[2] Zhang, W. E., Sheng, Q. Z., Alhazmi, A. (2019) Adversarial Attacks on Deep Learning Models in Natural Language Processing: A Survey. arXiv:1901.06796 [cs.CL]
[3] Suciu, O., Coull, S. E., Johns, J. (2019) Exploring Adversarial Examples in Malware Detection. arXiv:1810.08280 [cs.LG]
[4] Dahl, G.E., Stokes, J.W., Deng, L., Yu, D. (2013) Large-scale malware classification using random projections and neural networks. Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, pages 3422-3426. IEEE, 2013.
[5] Liu, X., Lin, Y., Li, H., Zhang, J. (2018) Adversarial Examples: Attacks on Machine Learning-based Malware Visualization Detection Methods. arXiv:1808.01546 [cs.CR]
[6] Grosse, K., Papernot, N., Manoharan, P., Backes, M., McDaniel, P. (2016) Adversarial perturbations against deep neural networks for malware classification. arXiv:1606.04435.
[7] Demetrio, L., Biggio, B., Lagorio, G., Roli, F., Armando, A. (2019) Explaining Vulnerabilities of Deep Learning to Adversarial Malware Binaries. arXiv:1901.03583 [cs.CR]
[8] Raman, K., et al. (2012) Selecting features to classify malware. InfoSec Southwest 2012.
[9] Allen, F.E. (1970) Control flow analysis. ACM SIGPLAN Notices, volume 5, pages 1-19. ACM, 1970.
[10] Kolter, J.Z., Maloof, M.A. (2004) Learning to detect malicious executables in the wild. Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 470-478. ACM, 2004.
[11] Zhao, Z. (2011) A virus detection scheme based on features of control flow graph. Artificial Intelligence, Management Science and Electronic Commerce (AIMSEC), 2011 2nd International Conference on, pages 943-947, Aug 2011.

[12] Li, C., Zhu, R., Niu, D., Mills, K., Zhang, H., Kinawi, H. (2018) Android Malware Detection based on Factorization Machine. arXiv:1805.11843 [cs.CR]
[13] Anderson, B., Quist, D., Neil, J., Storlie, C., Lane, T. (2011) Graph-based malware detection using dynamic analysis. Journal in Computer Virology 7(4):247-258, 2011.
[14] Chae, D., Ha, J., Kim, S., Kang, B., Im, E.G. (2013) Software plagiarism detection: a graph-based approach. Proceedings of the 22nd ACM International Conference on Information and Knowledge Management, pages 1577-1580. ACM, 2013.
[15] Faruki, P., Laxmi, V., Gaur, M. S., Vinod, P. (2012) Mining control flow graph as API call-grams to detect portable executable malware. Proceedings of the Fifth International Conference on Security of Information and Networks, SIN '12, pages 130-137, New York, NY, USA, 2012. ACM.
[16] Dullien, T., Rolles, R. (2005) Graph-based comparison of executable objects (English version). SSTIC 5 (2005), 13.
[17] Bruschi, D., Martignoni, L., Monga, M. (2006) Detecting self-mutating malware using control-flow graph matching. International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, pages 129-143. Springer, 2006.
[18] Xu, X., et al. (2018) Neural Network-based Graph Embedding for Cross-Platform Binary Code Similarity Detection. arXiv:1708.06525 [cs.CR]
[19] Phan, A.V., Nguyen, M.L., Bui, L.T. (2018) Convolutional Neural Networks over Control Flow Graphs for Software Defect Prediction. arXiv:1802.04986 [cs.SE]
[20] Feng, Q., et al. (2016) Scalable Graph-based Bug Search for Firmware Images. ACM Conference on Computer and Communications Security (CCS '16).

[21] Sun, X., Zhongyang, Y., Xin, Z., Mao, B., Xie, L. (2014) Detecting code reuse in android applications using component-based control flow graph. IFIP International Information Security Conference, pages 142-155. Springer, 2014.
[22] Saxe, J., Berlin, K. (2015) Deep neural network based malware detection using two dimensional binary program features. Malicious and Unwanted Software (MALWARE), 2015 10th International Conference on, pages 11-20. IEEE, 2015.
[23] Goyal, P., Ferrara, E. (2017) Graph Embedding Techniques, Applications, and Performance: A Survey. arXiv:1705.02801 [cs.SI]
[24] Anderson, H.S., Kharkar, A., Filar, B., Evans, D., Roth, P. (2018) Learning to Evade Static PE Machine Learning Malware Models via Reinforcement Learning. arXiv:1801.08917 [cs.CR]
[25] Athiwaratkun, B., Stokes, J.W. (2017) Malware classification with LSTM and GRU language models and a character-level CNN. Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on, pages 2482-2486. IEEE, 2017.
[26] Pascanu, R., Stokes, J.W., Sanossian, H., Marinescu, M., Thomas, A. (2015) Malware classification with recurrent networks. Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on, pages 1916-1920. IEEE, 2015.
[27] VirusTotal. https://www.virustotal.com.
[28] Brin, S., Page, L. (1998) The Anatomy of a Large-Scale Hypertextual Web Search Engine. Computer Networks and ISDN Systems, 30(1-7), pp. 107-117.
[29] Chung, F., Zhao, W. (2010) PageRank and Random Walks on Graphs. Fete of Combinatorics and Computer Science, pages 43-62. Bolyai Society Mathematical Studies, Vol. 20. Springer Berlin Heidelberg, 2010.
[30] Ho, T.K. (1995) Random Decision Forests. Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, 14-16 August 1995, pp. 278-282.
[31] Ho, T.K. (1998) The Random Subspace Method for Constructing Decision Forests. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8): 832-844.
A Complete List of Functions on SDFGs
All functions are assumed to be zero on nodes for which the associated AST member is inconsistent with the function definition. For example, NumPass2Call is trivial on all non-Call nodes.
Note that value and expr are floats; the remaining fields are ints.
B CLR AST Dictionary
The dictionary of terms relating to the CLR is as follows: