DeepAM: Migrate APIs with Multi-modal Sequence to Sequence Learning

Abstract

Computer programs written in one language are often required to be ported to other languages to support multiple devices and environments. When programs use language-specific APIs (Application Programming Interfaces), it is very challenging to migrate these APIs to the corresponding APIs written in other languages. Existing approaches mine API mappings from projects that have corresponding versions in two languages. They rely on the sparse availability of bilingual projects, thus producing a limited number of API mappings. In this paper, we propose an intelligent system called DeepAM for automatically mining API mappings from a large-scale code corpus without bilingual projects. The key component of DeepAM is based on the multi-modal sequence-to-sequence learning architecture that aims to learn joint semantic representations of bilingual API sequences from big source code data. Experimental results indicate that DeepAM significantly increases the accuracy of API mappings as well as the number of API mappings, when compared with the state-of-the-art approaches.

1 Introduction

Programming language migration is an important task in software development [Mossienko, 2003; Hassan and Holt, 2005; Tonelli et al., 2010]. A software product is often required to support a variety of devices and environments. This requires developing the software product in one language and manually porting it to other languages. This procedure is tedious and time-consuming. Building automatic code migration tools is desirable to reduce the effort in code migration.

However, current language migration tools, such as Java2CSharp¹, require users to manually define the migration rules between the respective program constructs and the mappings between the corresponding Application Programming Interfaces (APIs) that are used by the software libraries of the two languages. For example, the API BufferedReader.read in Java should be mapped to StreamReader.read in C#. Such a manual procedure is tedious and error-prone. As a result, only a small number of API mappings are produced [Zhong et al., 2010].

To reduce manual effort in API migration, several approaches have been proposed to automatically mine API mappings from a software repository [Nguyen et al., 2014; Pandita et al., 2015; Zhong et al., 2010]. For example, Nguyen et al. [2014] proposed StaMiner, which applies statistical machine translation (SMT) [Koehn et al., 2003] to bilingual projects, namely, projects that are released in multiple programming languages. It first aligns equivalent functions written in two languages that have similar names. Then, it extracts API mappings from the paired functions using the phrase-based SMT model [Koehn et al., 2003].

However, existing approaches rely on the sparse availability of bilingual projects. The number of available bilingual projects is often limited due to the high cost of manual code migration. For example, we analyzed 11K Java projects on GitHub that were created between 2008 and 2014. Among them, only 15 projects have been manually ported to C# versions. Therefore, the number of API mappings produced by existing approaches is rather limited. In addition, given bilingual projects, they need to align equivalent functions using name similarity heuristics. Only a portion of the functions in a bilingual project have similar function names and can be aligned [Zhong et al., 2010].

In this paper, we propose DeepAM (Deep API Migration), a novel deep learning based system for API migration. Without the restriction of using bilingual projects, DeepAM can directly identify equivalent source and target API sequences from a large-scale commented code corpus. The key idea of DeepAM is to learn the semantic representations of both source and target API sequences and identify semantically related API sequences for the migration. DeepAM assigns to each API sequence a continuous vector in a high-dimensional semantic space in such a way that API sequences with similar vectors, or “embeddings”, tend to have similar natural language descriptions.

In our approach, DeepAM first extracts API sequences (i.e., sequences of API invocations) from each function in the code corpus. For each API sequence, it assigns a natural language description that is automatically extracted from the corresponding code comments. With the ⟨API sequence, description⟩ pairs, DeepAM applies sequence-to-sequence learning [Cho et al., 2014] to embed each API sequence into a fixed-length vector that reflects the intent in the corresponding natural language description. By jointly embedding both source and target API sequences into the same space, DeepAM aligns the equivalent source and target API sequences that have the closest embeddings. Finally, the pairs of aligned API sequences are used to extract general API mappings using SMT.

To our knowledge, DeepAM is the first system that applies deep learning techniques to learn the semantic representations of API sequences from a large-scale code corpus. It has the following key characteristics that make it unique:

  • Big source code: DeepAM enables the construction of large-scale bilingual API sequences from a big code corpus rather than from limited bilingual projects. It learns API semantic representations from 10 million commented code snippets collected over seven years.

  • Deep model: The multi-modal sequence-to-sequence learning architecture enables the system to learn deeper semantic features of API sequences than traditional shallow models.

2 Related Work

API migration has been investigated by many researchers [Nguyen et al., 2014; Pandita et al., 2015; Zhong et al., 2010]. Zhong et al. [2010] proposed MAM, a graph-based approach to mine API mappings. MAM builds on projects that are released in multiple programming languages. It uses name similarity to align client code of both languages. Then, it detects API mappings between these functions by analyzing their API Transformation Graphs. Nguyen et al. [2014] proposed StaMiner, which directly applies statistical machine translation to bilingual projects.

However, these techniques require the same client code to be available on both the source and the target platforms. Therefore, they rely on the availability of software packages that have been ported manually from the source to the target platform. Furthermore, they use name similarity as a heuristic in their API mapping algorithms. Therefore, they cannot align equivalent API sequences from client code that is similar but independently developed.

Pandita et al. [2015] proposed TMAP, which applies the vector space model [Manning et al., 2008], an information retrieval technique, to discover likely mappings between APIs. For each source API, it searches target APIs that have similar text descriptions in their API documentation. However, the vector space model they applied is based on the bag-of-words assumption; it cannot identify sentences with semantically related words and with different sequences of words.

Recently, deep learning technology [Sutskever et al., 2014; Cho et al., 2014] has been shown to be highly effective in various domains (e.g., computer vision and natural language processing). Researchers have begun to apply this technology to tackle some software engineering problems. Huo et al. proposed a neural model to learn unified features from natural and programming languages for locating buggy source code [Huo et al., 2016]. Gu et al. applied sequence-to-sequence learning to generate API sequences from natural language queries [Gu et al., 2016]. To our knowledge, this study is the first attempt to apply deep learning to migrating APIs between two programming languages.

3 Method

Let $\mathcal{A} = \{a^{(i)}\}$ denote a set of API sequences, where $a^{(i)} = [a_1, \ldots, a_{L_a}]$ denotes the sequence of API invocations in a function. Suppose we are given a set of source API sequences $\mathcal{A}_S = \{a_s^{(i)}\}$ (i.e., API sequences in a source language) and a set of target API sequences $\mathcal{A}_T = \{a_t^{(j)}\}$ (i.e., API sequences in a target language). Our goal is to find an alignment between $\mathcal{A}_S$ and $\mathcal{A}_T$, namely,

$$\mathcal{A}_S \leftrightarrow \mathcal{A}_T \tag{1}$$

so that each source API sequence $a_s^{(i)} \in \mathcal{A}_S$ is mapped to an equivalent target API sequence $a_t^{(j)} \in \mathcal{A}_T$.

Since $\mathcal{A}_S$ and $\mathcal{A}_T$ are heterogeneous, it is difficult to discover the correlation $\mathcal{A}_S \leftrightarrow \mathcal{A}_T$ directly. Our approach is based on the intuition of “third party translation”. That is, although $\mathcal{A}_S$ and $\mathcal{A}_T$ are heterogeneous in the sense of vocabulary and usage patterns, they can all be mapped to high-level user intents described in natural language. Thus, we can bridge them through their natural language descriptions. For each $a^{(i)} \in \mathcal{A}$, we assume that there is a corresponding natural language description $d^{(i)} = [w_1, \ldots, w_{L_d}]$ represented as a sequence of words.

The idea can be formulated with joint embedding (a.k.a. multi-modal embedding) [Xu et al., 2015], a technique to jointly embed/correlate heterogeneous data into a unified vector space so that semantically similar concepts across the two modalities occupy nearby regions of the space [Karpathy and Fei-Fei, 2015]. In our approach, the joint embedding of $\mathcal{A}_S$ and $\mathcal{A}_T$ can be formulated as:

$$\mathcal{A}_S \xrightarrow{\;\phi\;} V \xrightarrow{\;\tau\;} \mathcal{D} \xleftarrow{\;\tau\;} V \xleftarrow{\;\psi\;} \mathcal{A}_T \tag{2}$$

where $V$ is a common vector space representing the semantics of API sequences; $\phi: \mathcal{A}_S \to V$ is an embedding function to map $\mathcal{A}_S$ into $V$; $\psi: \mathcal{A}_T \to V$ is an embedding function to map $\mathcal{A}_T$ into $V$; and $\mathcal{D} = \{d^{(i)}\}$ is the space of natural language descriptions. $\tau: V \to \mathcal{D}$ is a function to translate from the semantic representations $V$ to the corresponding natural language descriptions $\mathcal{D}$.

Figure 1: An Illustration of Joint Semantic Embedding

Through joint embedding, $\mathcal{A}_S$ and $\mathcal{A}_T$ can be easily correlated through their semantic vectors $\phi(a_s)$ and $\psi(a_t)$. Figure 1 shows an illustration of joint semantic embedding between Java and C# API sequences. We are given a corpus of API sequences (in both Java and C#) and the corresponding natural language descriptions. Each API sequence is embedded (through $\phi$ or $\psi$) and translated (through $\tau$) to its corresponding description. The yellow and blue points represent embeddings of Java and C# APIs, respectively. Through training, the Java API sequence BufferedWriter.new BufferedWriter.write and the C# API sequence StreamWriter.new StreamWriter.write are embedded into nearby places in order to generate the similar corresponding descriptions “write text to file” and “save to a text file”. Therefore, the two API sequences can be identified as semantically equivalent API sequences.

3.1 Learning Semantic Representations of API Sequences

In our approach, the semantic embedding function ($\phi$ or $\psi$) and the translation function $\tau$ are realized using the RNN-based sequence-to-sequence learning framework [Cho et al., 2014]. Sequence-to-sequence learning is a general framework in which the input sequence is embedded to a vector that represents the semantics of the input, and the semantic vector is then used to generate the target sequence. The model that embeds the sequence to a vector (i.e., $\phi$ or $\psi$) is called the “encoder”, and the model that generates the target sequence (i.e., $\tau$) is called the “decoder”.

The framework of the sequence-to-sequence model applied to API semantic embedding is illustrated in Figure 2. Given a set of ⟨API sequence, description⟩ pairs $\{\langle a^{(i)}, d^{(i)} \rangle\}$, the encoder (a bi-directional recurrent neural network [Mikolov et al., 2010]) converts each API sequence $a = [a_1, \ldots, a_{L_a}]$ to a fixed-length vector $c$ using the following equations iteratively from $t = 1$ to $L_a$:

$$h_t = \tanh(W_{enc}[h_{t-1}; a_t] + b_{enc}) \tag{3}$$
$$c = h_{L_a} \tag{4}$$

where $h_t$ ($t = 1, \ldots, L_a$) represents the hidden states of the RNN at each position $t$ of the input; $[a; b]$ represents the concatenation of two vectors; $W_{enc}$ and $b_{enc}$ are trainable parameters in the RNN; and $\tanh$ is the activation function.

Figure 2: The sequence-to-sequence learning framework for API Semantic Embedding. A bidirectional RNN is used to concatenate the forward and backward hidden states as the semantic representations of API sequences

The decoder then uses the encoded vector $c$ to generate the corresponding natural language description $d$ by sequentially predicting a word $w_t$ conditioned on the vector $c$ as well as the previous words $w_1, \ldots, w_{t-1}$:

$$\Pr(d) = \prod_{t=1}^{L_d} p(w_t \mid w_1, \ldots, w_{t-1}, c) \tag{5}$$
$$p(w_t \mid w_1, \ldots, w_{t-1}, c) = \mathrm{softmax}(W_s s_t + b_s) \tag{6}$$
$$s_t = \tanh(W_{dec}[s_{t-1}; w_{t-1}; c] + b_{dec}) \tag{7}$$

where $s_t$ ($t = 1, \ldots, L_d$) represents the hidden states of the RNN at each position $t$ of the output; $W_s$, $b_s$, $W_{dec}$, and $b_{dec}$ are trainable parameters in the decoder RNN.

Both the encoder and decoder RNNs are implemented as bidirectional gated recurrent unit (GRU) networks [Cho et al., 2014], a widely used RNN variant. Both GRUs have two hidden layers, each with 1000 hidden units.
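
To make the encoder recurrence concrete, the following is a minimal pure-Python sketch of a bidirectional encoder built from a plain tanh cell applied to the concatenation of the previous hidden state and the current input. It is not DeepAM's implementation (which uses GRUs with 1000 hidden units): the tiny dimensions, the random weights, the toy input embeddings, and all function names are assumptions made for illustration.

```python
import math
import random

def rnn_step(W, b, h_prev, x):
    """One recurrence step: h = tanh(W [h_prev; x] + b)."""
    z = h_prev + x  # list concatenation plays the role of [h_prev; x]
    return [math.tanh(sum(w * v for w, v in zip(row, z)) + bias)
            for row, bias in zip(W, b)]

def run_direction(W, b, inputs, hidden_size):
    """Run the cell over the inputs and return the final hidden state."""
    h = [0.0] * hidden_size
    for x in inputs:
        h = rnn_step(W, b, h, x)
    return h

def encode(inputs, fwd, bwd, hidden_size):
    """Concatenate the final forward and backward states into one vector."""
    h_fwd = run_direction(fwd[0], fwd[1], inputs, hidden_size)
    h_bwd = run_direction(bwd[0], bwd[1], list(reversed(inputs)), hidden_size)
    return h_fwd + h_bwd

def random_params(hidden_size, input_size, rng):
    """Untrained toy parameters (W, b) for one direction."""
    W = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size + input_size)]
         for _ in range(hidden_size)]
    return W, [0.0] * hidden_size

rng = random.Random(0)
HIDDEN, EMBED = 4, 3
fwd = random_params(HIDDEN, EMBED, rng)
bwd = random_params(HIDDEN, EMBED, rng)
# Hypothetical 3-dimensional embeddings of two API calls in one sequence.
api_sequence = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
vector = encode(api_sequence, fwd, bwd, HIDDEN)
```

The resulting vector has twice the hidden size, because the forward and backward states are concatenated as described in the caption of Figure 2.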

3.2 Joint Semantic Embedding for Aligning Equivalent API Sequences

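For joint embedding, we train the sequence-to-sequence model on both $\mathcal{A}_S$ and $\mathcal{A}_T$ to minimize the following objective function:

$$\mathcal{L}(\theta) = -\frac{1}{N_S} \sum_{i=1}^{N_S} \sum_{t=1}^{L_d} \log p_\theta\big(w_t^{(i)} \mid a_s^{(i)}\big) - \frac{1}{N_T} \sum_{j=1}^{N_T} \sum_{t=1}^{L_d} \log p_\theta\big(w_t^{(j)} \mid a_t^{(j)}\big) \tag{8}$$

where $N_S$ and $N_T$ are the total numbers of source and target training instances, respectively; $L_d$ is the length of each natural language sentence; $\theta$ denotes the model parameters; and $p_\theta(w_t \mid a)$ (derived from Equations 3 to 7) denotes the likelihood of generating the $t$-th target word given the API sequence $a$ according to the model parameters $\theta$.

After training, each API sequence $a = [a_1, \ldots, a_{L_a}]$ is embedded to a vector that reflects the developer’s high-level intent. We identify equivalent source and target API sequences as those having close semantic vectors.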
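
As a small illustration of a joint objective of this kind, the sketch below computes a negative log-likelihood over description words for a source batch and a target batch and adds the two. The per-word probabilities stand in for decoder outputs and are hypothetical, and averaging each language's loss over its own instances is an assumption of this sketch rather than a detail confirmed by the text.

```python
import math

def sequence_nll(word_probs):
    """Negative log-likelihood of one description given its API sequence."""
    return -sum(math.log(p) for p in word_probs)

def joint_loss(source_batch, target_batch):
    """Average NLL over source instances plus average NLL over target instances."""
    src = sum(sequence_nll(p) for p in source_batch) / len(source_batch)
    tgt = sum(sequence_nll(p) for p in target_batch) / len(target_batch)
    return src + tgt

# Hypothetical probabilities the decoder assigns to each reference word.
java_batch = [[0.9, 0.8], [0.5, 0.5, 0.5]]
csharp_batch = [[0.7, 0.6]]
loss = joint_loss(java_batch, csharp_batch)
```

Because both languages share one decoder and one description space, lowering this loss pushes Java and C# sequences with similar descriptions toward similar encoder vectors.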

4 Implementation

In this section, we describe the detailed implementation of DeepAM, a deep learning based system we propose to migrate API usage sequences. Figure 3 shows the overall workflow of DeepAM. It includes four main steps. We first prepare a large-scale corpus of ⟨API sequence, description⟩ pairs for both Java and C# (Step 1). The pairs of both languages are jointly embedded by the sequence-to-sequence model as described in Section 3.2 (Step 2). Then, we identify related Java and C# API sequences according to their semantic vectors (Step 3). Finally, a statistical machine translation component is used to extract general API mappings from the aligned bilingual API sequences (Step 4).

In theory, our system could migrate APIs between any programming languages. In this paper we limit our scope to the Java-to-C# migration. The details of each step are explained in the following sections.

Figure 3: The Overall Workflow of DeepAM

4.1 Gathering a Large-scale API Sequence-to-Description Corpus

We first construct a large-scale database that contains ⟨API sequence, description⟩ pairs for training the model. We download Java and C# projects created from 2008 to 2014 from GitHub². To remove toy or experimental programs, we only select the projects with at least one star. In total, we collected 442,928 Java projects and 182,313 C# projects from GitHub.

Having collected the code corpus, we extract API sequences and corresponding natural language descriptions: we parse source code files into ASTs (Abstract Syntax Trees) using Eclipse’s JDT compiler³ for Java projects, and Roslyn⁴ for C# projects. Then, we extract the API sequence from individual functions using the same approach as in [Gu et al., 2016].

To obtain natural language descriptions for the extracted API sequences, we extract function-level code summaries from code comments. In both Java and C#, the summary is the first sentence of a documentation comment⁵ for a function. According to the Javadoc guidance⁶, the first sentence of a documentation comment is used as a short summary of a function. Figure 4 shows an example of documentation comments for a C# function TextFile.ReadFile⁷ in the Gitlab CI project.

Figure 4: An example of extracting an API sequence and its description from a C# function TextFile.ReadFile
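
The summary extraction step can be sketched with a few regular expressions. This is only an illustrative approximation, not the parser-based pipeline described above: the function names and the sample comments are hypothetical, and the naive sentence splitter will cut too early on abbreviations such as “e.g.”.

```python
import re

def first_sentence(text):
    """Return the text up to and including the first sentence-ending period."""
    text = " ".join(text.split())
    match = re.search(r"(.+?\.)(\s|$)", text)
    return match.group(1) if match else text

def javadoc_summary(comment):
    """Summary sentence of a Javadoc comment delimited by /** and */."""
    body = comment.strip()
    if body.startswith("/**") and body.endswith("*/"):
        body = body[3:-2]
    # Drop the leading '*' decoration on each line.
    lines = [re.sub(r"^\s*\*\s?", "", line) for line in body.splitlines()]
    # Keep only the main description, i.e. the text before the first block tag.
    text = re.split(r"\s@\w+", " " + " ".join(lines))[0]
    return first_sentence(text)

def csharp_summary(comment):
    """Summary sentence of a C# XML documentation comment."""
    body = "\n".join(re.sub(r"^\s*///\s?", "", line)
                     for line in comment.splitlines())
    match = re.search(r"<summary>(.*?)</summary>", body, re.DOTALL)
    return first_sentence(match.group(1)) if match else ""

# Hypothetical documentation comments used only to exercise the functions.
java_comment = """/**
 * Reads the content of a text file. Returns null on failure.
 * @param path the file to read
 */"""
csharp_comment = """/// <summary>
/// Read the content of a text file. The encoding is UTF-8.
/// </summary>
/// <param name="file">File name</param>"""
java_desc = javadoc_summary(java_comment)
csharp_desc = csharp_summary(csharp_comment)
```

Each extracted sentence would then be paired with the API sequence of the same function to form one training instance.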

Finally, we obtain a database consisting of 9,880,169 ⟨API sequence, description⟩ pairs, including 5,271,526 Java pairs and 4,608,643 C# pairs.

4.2 Model Training

We train the sequence-to-sequence model on the collected ⟨API sequence, description⟩ pairs of both Java and C#. The model is trained using the mini-batch stochastic gradient descent algorithm (SGD) [Bottou, 2010] together with Adadelta [Zeiler, 2012]. We set the batch size to 200. Each batch consists of 100 Java pairs and 100 C# pairs that are randomly selected from the corresponding datasets. The vocabulary sizes of both APIs and natural language descriptions are set to 10,000. The maximum sequence lengths $L_a$ (API sequences) and $L_d$ (descriptions) are both set to 30. Sequences that exceed the maximum lengths are excluded from training.
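
The batching scheme can be sketched as follows, assuming each training pair is a tuple of two token lists. The function names and the toy data are hypothetical; only the length limit of 30 and the 100 + 100 split per batch come from the text.

```python
import random

MAX_LEN = 30  # maximum length for both API sequences and descriptions

def filter_pairs(pairs, max_len=MAX_LEN):
    """Keep pairs whose API sequence and description both fit the length limit."""
    return [(api, desc) for api, desc in pairs
            if len(api) <= max_len and len(desc) <= max_len]

def mixed_batch(java_pairs, csharp_pairs, per_language=100, rng=random):
    """Draw the same number of Java and C# pairs for one mini-batch."""
    return (rng.sample(java_pairs, per_language)
            + rng.sample(csharp_pairs, per_language))

# Toy corpora: Java API sequences of length 1..40, ten short C# pairs.
java_pairs = [(["File.new"] * n, ["read", "file"]) for n in range(1, 41)]
csharp_pairs = [(["File.Exists"], ["check", "file"])] * 10
kept_java = filter_pairs(java_pairs)
batch = mixed_batch(kept_java, filter_pairs(csharp_pairs),
                    per_language=5, rng=random.Random(0))
```

Mixing both languages in every batch means each parameter update sees Java and C# examples together, which is what ties the two encoders' outputs to a shared description space.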

After training, we feed all API sequences into the encoder and obtain the corresponding semantic vectors from the last hidden layer of the encoder.

4.3 API Sequence Alignment

After embedding all API sequences, we build pairs of equivalent Java and C# API sequences according to their semantic vectors. For each Java API sequence, we find the most related C# API sequence to align with by selecting the C# API sequence that has the most similar vector representation. We measure the similarity between the vectors of two API sequences using the cosine similarity, which is defined as:

$$\mathrm{sim}(a_s, a_t) = \frac{v_s \cdot v_t}{\lVert v_s \rVert \, \lVert v_t \rVert} \tag{9}$$

where $v_s$ and $v_t$ are the vectors of the source and target API sequences. The higher the similarity, the more related the source and target API sequences are to each other.

Finally, we obtain a database consisting of aligned pairs of Java and C# API sequences.
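
The alignment step amounts to a nearest-neighbor search under cosine similarity. Below is a minimal brute-force sketch; the three-dimensional vectors are made up for illustration (real embeddings come from the trained encoder), and the function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def align(java_vectors, csharp_vectors):
    """Map each Java sequence to the C# sequence with the most similar vector."""
    aligned = {}
    for java_seq, java_vec in java_vectors.items():
        aligned[java_seq] = max(
            csharp_vectors,
            key=lambda cs_seq: cosine(java_vec, csharp_vectors[cs_seq]))
    return aligned

# Hypothetical embeddings, keyed by API sequence.
java_vectors = {
    "BufferedWriter.new BufferedWriter.write": [0.9, 0.1, 0.0],
    "URL.new URL.openConnection": [0.0, 0.2, 0.9],
}
csharp_vectors = {
    "StreamWriter.new StreamWriter.write": [0.8, 0.2, 0.1],
    "WebRequest.create": [0.1, 0.1, 0.95],
}
pairs = align(java_vectors, csharp_vectors)
```

The exhaustive comparison is quadratic in the number of sequences; at the scale of millions of sequences some form of batched or approximate search would presumably be needed, which this sketch does not attempt.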

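Class migration

| Package | Precision (StaMiner) | Precision (DeepAM) | Recall (StaMiner) | Recall (DeepAM) | F-score (StaMiner) | F-score (DeepAM) |
|---|---|---|---|---|---|---|
| java.io | 70.0% | 80.0% | 63.6% | 75.0% | 66.6% | 72.7% |
| java.lang | 82.5% | 80.0% | 76.7% | 81.3% | 79.5% | 80.7% |
| java.math | 50.0% | 66.7% | 50.0% | 66.7% | 50.0% | 66.7% |
| java.net | 100.0% | 100.0% | 50.0% | 100.0% | 66.7% | 100.0% |
| java.sql | 100.0% | 100.0% | 50.0% | 100.0% | 66.7% | 100.0% |
| java.util | 64.7% | 69.6% | 71.0% | 72.7% | 67.7% | 71.1% |
| All | 77.9% | 82.7% | 60.2% | 82.6% | 66.2% | 81.9% |

Method migration

| Package | Precision (StaMiner) | Precision (DeepAM) | Recall (StaMiner) | Recall (DeepAM) | F-score (StaMiner) | F-score (DeepAM) |
|---|---|---|---|---|---|---|
| java.io | 70.0% | 66.7% | 64.0% | 87.5% | 66.9% | 75.2% |
| java.lang | 86.7% | 83.7% | 76.5% | 87.2% | 81.3% | 85.4% |
| java.math | 66.7% | 66.7% | 66.7% | 66.7% | 66.7% | 66.7% |
| java.net | 100.0% | 100.0% | 33.3% | 100.0% | 50.0% | 100.0% |
| java.sql | 100.0% | 50.0% | 50.0% | 66.7% | 66.7% | 57.2% |
| java.util | 63.0% | 64.3% | 54.8% | 85.7% | 58.6% | 73.5% |
| All | 81.1% | 71.9% | 57.6% | 82.3% | 65.0% | 76.3% |

Table 1: Accuracy of 1-to-1 API mappings mined by DeepAM and StaMiner (%)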

4.4 Extracting General API Mappings

The aligned pairs of API sequences may be project-specific. However, automated code migration tools such as Java2CSharp require commonly used API mappings. To obtain more general API mappings, we summarize mappings that have high co-occurrence probabilities in the aligned pairs of API sequences. To do so, we apply an SMT technique named the phrase-based model [Koehn et al., 2003] to the pairs of aligned API sequences. The phrase-based model was originally designed to extract phrase-to-phrase translation mappings from bilingual sentences. In our system, the phrase model summarizes pairs of API phrases, namely, subsequences of APIs that frequently co-occur in the aligned pairs of API sequences. For each phrase pair, it assigns a score defined as the translation probability $P(t \mid s) = \frac{\mathrm{count}(s, t)}{\mathrm{count}(s) + 1}$, where $\mathrm{count}(s, t)$ is the number of mapping occurrences $s \to t$, and $\mathrm{count}(s)$ is the number of all occurrences of the subsequence $s$. Finally, we select pairs whose translation probabilities are greater than a threshold as the final API mappings. We set the threshold to 0.5 as in StaMiner [Nguyen et al., 2014].
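
The scoring and thresholding step can be sketched as below. The sketch takes already-extracted (source phrase, target phrase) occurrences as input and leaves out the alignment-based phrase extraction that a real phrase-based SMT toolkit performs. The add-one term in the denominator is an assumption of this sketch, and the function name and sample data are hypothetical.

```python
from collections import Counter

def mine_mappings(phrase_pairs, threshold=0.5):
    """Score (source, target) phrase pairs and keep those above the threshold."""
    pair_counts = Counter(phrase_pairs)
    source_counts = Counter(src for src, _ in phrase_pairs)
    mappings = {}
    for (src, tgt), n in pair_counts.items():
        # Assumed score: co-occurrence count over the smoothed source count.
        prob = n / (source_counts[src] + 1)
        if prob > threshold:
            mappings[(src, tgt)] = prob
    return mappings

# Hypothetical phrase-pair occurrences collected from aligned API sequences.
occurrences = (
    [("BufferedReader.read", "StreamReader.read")] * 8
    + [("BufferedReader.read", "TextReader.peek")]
    + [("File.exists", "File.exists")]
)
mappings = mine_mappings(occurrences)
```

With these counts only the frequent BufferedReader.read to StreamReader.read pair survives: the rare alternative scores 0.1, and a pair seen a single time scores 0.5, which does not exceed the threshold.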

5 Experimental Results

5.1 Accuracy in Mining API Mappings

We first evaluate how accurately DeepAM mines API mappings. We focus on 1-to-1 API mappings, which are currently used by many code migration tools such as Java2CSharp. We compare the 1-to-1 API mappings mined by DeepAM (Section 4) with a ground truth set of manually written API mappings provided by Java2CSharp.
Metric. We use the F-score to measure the accuracy. It is defined as $F = \frac{2 \cdot \mathit{precision} \cdot \mathit{recall}}{\mathit{precision} + \mathit{recall}}$, where $\mathit{precision} = \frac{TP}{TP + FP}$ and $\mathit{recall} = \frac{TP}{TP + FN}$. TP is true positive, namely, the number of API mappings that are both in the DeepAM results and in the ground truth set. FP is false positive, which represents the number of resulting mappings that are not in the ground truth set. FN is false negative, which represents the number of mappings that are in the ground truth set but not in the results.
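
As a worked illustration of these definitions, the following sketch computes precision, recall, and F-score from two sets of mappings. The mappings listed are invented for the example and are not taken from the Java2CSharp ground truth.

```python
def evaluate(mined, ground_truth):
    """Return (precision, recall, F-score) of mined mappings against a reference set."""
    mined, ground_truth = set(mined), set(ground_truth)
    tp = len(mined & ground_truth)   # mappings found in both sets
    fp = len(mined - ground_truth)   # mined but not in the ground truth
    fn = len(ground_truth - mined)   # in the ground truth but not mined
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score

# Hypothetical (Java API, C# API) mappings.
mined = {
    ("BufferedReader.read", "StreamReader.read"),
    ("File.exists", "File.Exists"),
    ("String.length", "String.Length"),
    ("HashMap.put", "List.Add"),
}
ground_truth = {
    ("BufferedReader.read", "StreamReader.read"),
    ("File.exists", "File.Exists"),
    ("String.length", "String.Length"),
    ("HashMap.put", "Dictionary.Add"),
    ("File.delete", "File.Delete"),
}
precision, recall, f_score = evaluate(mined, ground_truth)
```

Here three of the four mined mappings are correct (precision 0.75) and three of the five reference mappings are recovered (recall 0.6).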
Baselines. We compare DeepAM with StaMiner [Nguyen et al., 2014] and TMAP [Pandita et al., 2015]. StaMiner is a state-of-the-art API migration approach that directly utilizes statistical machine translation on bilingual projects. TMAP [Pandita et al., 2015] is an API migration approach using information retrieval techniques. It aligns Java and C# APIs by searching similar descriptions in API documentation. For easy comparison, we use the same configuration as in TMAP [Pandita et al., 2015]. We manually examine the numbers of correctly mined API mappings on several Java SDK classes and make a direct comparison with TMAP’s results presented in their paper.
Results. Table 1 shows the accuracy of both DeepAM and StaMiner. We evaluate the accuracy of mappings for both API classes and API methods. The results show that DeepAM is able to mine more correct API mappings. It achieves average recalls of 82.6% and 82.3% for class and method migrations respectively, which are significantly greater than StaMiner’s (60.2% and 57.6%). The average precisions of DeepAM are 82.7% and 71.9%, comparable to StaMiner’s (77.9% and 81.1%): higher for class migration and lower for method migration. Overall, DeepAM performs better than StaMiner, with average F-scores of 81.9% and 76.3% compared to StaMiner’s 66.2% and 65.0%.

Table 2 shows the number of correctly mined API mappings by TMAP and DeepAM. The column # Methods lists the total number of API methods for each class. As shown in the results, DeepAM mines many more correct API mappings overall (192 versus 117) than TMAP, which is based on text similarity matching.

The results indicate that without the restriction of a few bilingual projects, DeepAM yields many more correct API mappings.

| Class | # Methods | # API mappings (TMAP) | # API mappings (DeepAM) |
|---|---|---|---|
| java.io.File | 54 | 26 | 43 |
| java.io.Reader | 10 | 6 | 8 |
| java.io.Writer | 10 | 10 | 7 |
| java.util.Calendar | 47 | 5 | 20 |
| java.util.Iterator | 3 | 1 | 3 |
| java.util.HashMap | 17 | 5 | 14 |
| java.util.ArrayList | 28 | 15 | 26 |
| java.sql.Connection | 52 | 13 | 23 |
| java.sql.ResultSet | 187 | 31 | 33 |
| java.sql.Statement | 42 | 5 | 15 |
| All | 450 | 117 | 192 |

Table 2: Number of correct API mappings mined by DeepAM and TMAP
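
| Tool | # rules, length 1 | # rules, length 2-3 | # rules, length 4-7 | # rules, length 8+ | Total | Corr. | EDR |
|---|---|---|---|---|---|---|---|
| StaMiner | 50,992 | 31,754 | 14,370 | 3,708 | 100,825 | 87.1% | 7.3% |
| DeepAM | 35,973 | 218,957 | 328,290 | 225,268 | 808,488 | 88.2% | 8.2% |

Table 3: Number of API Mappings Mined by DeepAM and StaMiner (# API Mapping Rules by Sequence Length)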

| Task | Java API Sequence | Migrated C# API sequence by DeepAM |
|---|---|---|
| parse datetime from string | SimpleDateFormat.new SimpleDateFormat.parse | DateTimeFormatInfo.new DateTime.parseExact DateTime.parse |
| open a url | URL.new URL.openConnection | WebRequest.create Uri.new HttpWebRequest.getRequestStream |
| get files in folder | File.new File.list File.new File.isDirectory | DirectoryInfo.new DirectoryInfo.getDirectories |
| generate md5 hash code | MessageDigest.getInstance MessageDigest.update MessageDigest.digest | MD5.create UTF8Encoding.new UTF8Encoding.getBytes MD5.computeHash |
| execute sql statement | Connection.prepareStatement PreparedStatement.execute | SqlConnection.open SqlCommand.new SqlCommand.executeReader |
| create directory | File.new File.exists File.createNewFile | FileInfo.new Directory.exists Directory.createDirectory |
| read file | System.getProperty FileInputStream.new InputStreamReader.new BufferedReader.new BufferedReader.read BufferedReader.close | FileInfo.new StreamReader.new StreamReader.read StreamReader.close |
| create socket | InetSocketAddress.new ServerSocket.new ServerSocket.bind ServerSocket.close | Socket.new IPEndPoint.new Socket.bind Socket.close |
| download file from url | URL.new URL.openConnection URLConnection.getInputStream BufferedInputStream.new | WebRequest.create HttpWebRequest.getResponse HttpWebResponse.getResponseStream StreamReader.new |
| save an image to a file | BufferedImage.new Color.new Color.getRGB BufferedImage.setRGB String.endsWith File.new ImageIO.write | Bitmap.new Color.new Color.fromArgb Bitmap.setPixel Bitmap.save |
| parse xml | DocumentBuilderFactory.newInstance DocumentBuilderFactory.newDocumentBuilder DocumentBuilder.parse | XDocument.load HttpUtility.htmlEncode XDocument.parse |
| play audio | AudioSystem.getClip File.new AudioSystem.getAudioInputStream Clip.open Clip.start Clip.isRunning Thread.sleep Clip.close | SoundPlayer.new SoundPlayer.play Thread.sleep SoundPlayer.stop |

Table 4: Examples of Mined API Mappings

5.2 The Scale of Mined API Mappings

We also evaluate the scalability of DeepAM in mining API mappings: we compare the number of API mappings mined by DeepAM and StaMiner [Nguyen et al., 2014] with respect to sequence lengths. We can make this comparison because both DeepAM and StaMiner support sequence-to-sequence mapping. We also consider the quality of mined API mappings in the comparison. We use correctness and edit distance ratio (EDR) to measure the quality, as in [Nguyen et al., 2014]. The correctness is defined as the percentage of correct API sequences among all the migrated results. The EDR is defined as the ratio of elements that a user must delete/add in order to transform a result into a correct one: $\mathrm{EDR} = \frac{\mathrm{EditDist}(s_R, s_T)}{\mathrm{length}(s_T)}$, where $\mathrm{EditDist}$ measures the edit distance between the ground truth sequence $s_R$ and the result sequence $s_T$, and $\mathrm{length}(s_T)$ is the number of symbols in $s_T$. The value of EDR ranges from 0 to 100%; the smaller, the better.
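
A minimal sketch of the EDR computation follows. It uses the standard Levenshtein distance with unit costs over API names and divides by the length of the result sequence; both the cost model and the choice of denominator are assumptions of this sketch, since the text does not pin them down, and the sample sequences are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of API names."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute or match
        prev = curr
    return prev[-1]

def edr(reference, result):
    """Edit distance ratio of a migrated sequence against the ground truth."""
    return edit_distance(reference, result) / len(result)

# Hypothetical ground truth and migrated result for a "read file" task.
reference = ["FileInfo.new", "StreamReader.new",
             "StreamReader.read", "StreamReader.close"]
result = ["FileInfo.new", "StreamReader.new",
          "StreamReader.peek", "StreamReader.close"]
ratio = edr(reference, result)
```

In this example one of the four calls in the result has to be changed, so the ratio is 0.25.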
Results. Table 3 shows the number of API mappings produced by DeepAM and StaMiner. Each column within # API Mapping Rules by Sequence Length shows the number of mined API mappings within a specific range of length: one (column 1), two or three (2-3), four to seven (4-7), and eight or more (8+). As we can see, DeepAM produces many more API mappings than StaMiner, with comparable quality. The total number of mappings mined by DeepAM is 808,488, which is significantly greater than that of StaMiner (100,825). In particular, DeepAM produces more mappings for long API sequences. The correctness of DeepAM is 88.2%, which is slightly greater than that of StaMiner (87.1%). However, mappings produced by DeepAM need slightly more error corrections than StaMiner’s (EDR of 8.2% versus 7.3%).

Overall, the results indicate that DeepAM yields significantly more API mappings than StaMiner, with comparable quality. These results are expected because DeepAM does not rely upon bilingual projects, which significantly increases the size of the available training corpus.

Table 4 shows some concrete examples of API mappings. We selected 12 programming tasks that are commonly used in the literature [Lv et al., 2015; Gu et al., 2016]. The results show that DeepAM can successfully migrate API sequences for these tasks. DeepAM also performs well on longer API sequences such as copy file and play audio.

| Tool | Java version | C# version | Average |
|---|---|---|---|
| IR | 37.4% | 44.1% | 40.8% |
| DeepAM | 60.2% | 84.6% | 72.4% |

Table 5: Accuracy of API pair alignment by DeepAM and IR-based technique

5.3 Effectiveness of Multi-modal API Sequence Embedding

As the most distinctive feature of our approach is the multi-modal semantic embedding of API sequences, we also evaluate DeepAM’s effectiveness in embedding API sequences, namely, whether the joint embedding is effective on API sequence alignment. As described in Section 4.3, we apply the semantic embedding and sequence alignment on raw API sequences, and obtain a database of semantically related Java and C# API sequences. We randomly select 500 aligned pairs of Java and C# API sequences from the database and manually examine whether each pair is indeed related. We calculate the ratio of related pairs of the 500 sampled pairs.
Baseline. We compare our results with an IR-based approach. This approach aligns API sequences by directly matching the corresponding descriptions using text similarities (e.g., the vector space model) [Manning et al., 2008]. We implement it using Lucene⁸. For each Java API sequence, we search for the C# API sequence whose description is most similar to the description of the Java API sequence, and vice versa. We randomly select 500 aligned pairs from the results and manually examine the ratio of correctly aligned pairs.
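
The idea behind such a baseline can be sketched with a generic TF-IDF vector space model. This is a simplified stand-in and does not reproduce Lucene's actual scoring; the smoothed IDF formula, the function names, and the toy descriptions are all assumptions.

```python
import math
from collections import Counter

def tfidf_vectors(documents):
    """Build sparse TF-IDF vectors (term -> weight) for tokenized descriptions."""
    n = len(documents)
    df = Counter(term for doc in documents for term in set(doc))
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0
           for term, count in df.items()}
    return [{term: tf * idf[term] for term, tf in Counter(doc).items()}
            for doc in documents]

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

def align_by_description(java_descs, csharp_descs):
    """For each Java description, return the index of the closest C# description."""
    vectors = tfidf_vectors(java_descs + csharp_descs)
    java_vecs = vectors[:len(java_descs)]
    cs_vecs = vectors[len(java_descs):]
    return [max(range(len(cs_vecs)), key=lambda k: cosine(jv, cs_vecs[k]))
            for jv in java_vecs]

# Hypothetical tokenized descriptions.
java_descs = [["write", "text", "to", "file"], ["open", "a", "url"]]
csharp_descs = [["open", "url", "connection"],
                ["save", "to", "a", "text", "file"]]
matches = align_by_description(java_descs, csharp_descs)
```

Matching of this kind depends on the two descriptions sharing words, so descriptions that express the same intent with different vocabulary would not be aligned.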
Results. Table 5 shows the performance of sequence alignment. The column Java version shows the ratio of Java API sequences that are correctly aligned to C# API sequences. Likewise, the C# version column shows the ratio of C# API sequences that are correctly aligned to Java API sequences. The results show that the joint embedding is effective for API sequence alignment. The average ratio of successful alignments is 72.4%, which significantly outperforms the IR-based approach (average accuracy of 40.8%). The results indicate that the deep learning model is more effective in learning the semantics of API sequences than traditional shallow models such as the vector space model.

6 Conclusion

In this paper, we propose a deep learning based approach to the migration of APIs. Without the restriction of using bilingual projects, our approach can align equivalent API sequences from a large-scale commented code corpus through multi-modal sequence-to-sequence learning. Our experimental results have shown that the proposed approach significantly increases the accuracy and scale of API mappings relative to what state-of-the-art approaches achieve. Our work demonstrates the effectiveness of deep learning in API migration and is one step towards automatic code migration.

Footnotes

  1. http://j2cstranslator.wiki.sourceforge.net/
  2. http://github.com
  3. http://www.eclipse.org/jdt
  4. https://roslyn.codeplex.com/
  5. A documentation comment in Java starts with ‘/**’ and ends with ‘*/’. A documentation comment in C# starts with a “summary” tag and ends with a “/summary” tag.
  6. http://www.oracle.com/technetwork/articles/java/index-137868.html
  7. https://github.com/virtualmarc/gitlab-ci-runner-win/blob/master/gitlab-ci-runner/helper/TextFile.cs
  8. https://lucene.apache.org/

References

  1. Andrej Karpathy and Li Fei-Fei. Deep visual-semantic alignments for generating image descriptions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3128–3137, 2015.
  2. Léon Bottou. Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT’2010, pages 177–186. Springer, 2010.
  3. Kyunghyun Cho, Bart Van Merriënboer, Çağlar Gülçehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using rnn encoder–decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1724–1734, Doha, Qatar, October 2014. Association for Computational Linguistics.
  4. Xiaodong Gu, Hongyu Zhang, Dongmei Zhang, and Sunghun Kim. Deep API learning. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pages 631–642. ACM, 2016.
  5. Ahmed E Hassan and Richard C Holt. A lightweight approach for migrating web frameworks. Information and Software Technology, 47(8):521–532, 2005.
  6. Xuan Huo, Ming Li, and Zhi-Hua Zhou. Learning unified features from natural and programming languages for locating buggy source code. In Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI’16), 2016.
  7. Philipp Koehn, Franz Josef Och, and Daniel Marcu. Statistical phrase-based translation. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology-Volume 1, pages 48–54. Association for Computational Linguistics, 2003.
  8. Fei Lv, Hongyu Zhang, Jianguang Lou, Shaowei Wang, Dongmei Zhang, and Jianjun Zhao. CodeHow: Effective code search based on API understanding and extended boolean model. In Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering (ASE 2015). IEEE, 2015.
  9. Christopher D Manning, Prabhakar Raghavan, Hinrich Schütze, et al. Introduction to information retrieval, volume 1. Cambridge university press Cambridge, 2008.
  10. Tomas Mikolov, Martin Karafiát, Lukas Burget, Jan Cernockỳ, and Sanjeev Khudanpur. Recurrent neural network based language model. In INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, Makuhari, Chiba, Japan, September 26-30, 2010, pages 1045–1048, 2010.
  11. Maxim Mossienko. Automated cobol to java recycling. In Software Maintenance and Reengineering, 2003. Proceedings. Seventh European Conference on, pages 40–50. IEEE, 2003.
  12. Anh Tuan Nguyen, Hoan Anh Nguyen, Tung Thanh Nguyen, and Tien N. Nguyen. Statistical learning approach for mining API usage mappings for code migration. In Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering, ASE ’14, pages 457–468, New York, NY, USA, 2014. ACM.
  13. R. Pandita, R. P. Jetley, S. D. Sudarsan, and L. Williams. Discovering likely mappings between APIs using text mining. In Source Code Analysis and Manipulation (SCAM), 2015 IEEE 15th International Working Conference on, pages 231–240, Sept 2015.
  14. Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, pages 3104–3112, 2014.
  15. Thiago Tonelli Bartolomei, Krzysztof Czarnecki, and Ralf Lämmel. Swing to SWT and back: Patterns for API migration by wrapping. In Software Maintenance (ICSM), 2010 IEEE International Conference on, pages 1–10, Sept 2010.
  16. Ran Xu, Caiming Xiong, Wei Chen, and Jason J Corso. Jointly modeling deep video and compositional text to bridge vision and language in a unified framework. In AAAI, pages 2346–2352. Citeseer, 2015.
  17. Matthew D Zeiler. Adadelta: an adaptive learning rate method. arXiv preprint arXiv:1212.5701, 2012.
  18. Hao Zhong, Suresh Thummalapenta, Tao Xie, Lu Zhang, and Qing Wang. Mining API mapping for language migration. In Proceedings of the 32Nd ACM/IEEE International Conference on Software Engineering - Volume 1, ICSE ’10, pages 195–204, New York, NY, USA, 2010. ACM.