An Empirical Study on the Impact of Refactoring Activities on Evolving Client-Used APIs


Raula Gaikovina Kula (raula-k@is.naist.jp), Ali Ouni (ali@ist.osaka-u.ac.jp), Daniel M. German (dmg@uvic.ca), Katsuro Inoue (inoue@ist.osaka-u.ac.jp) — Osaka University, Japan; Nara Institute of Science and Technology, Japan; Ecole de Technologie Superieure, Montreal, Canada; University of Victoria, Canada
Abstract

Context: Refactoring is recognized as an effective practice to maintain evolving software systems. For software libraries, we study how library developers refactor their Application Programming Interfaces (APIs), especially when it impacts client users by breaking an API of the library.

Objective: Our work aims to understand how clients that use a library API are affected by refactoring activities. We target popular libraries that potentially impact more library client users.

Method: We distinguish between library APIs based on their client usage (referred to as client-used APIs) in order to understand the extent to which API breakages relate to refactorings. Our tool-based approach allows for a large-scale study across eight libraries (i.e., totaling 183 consecutive versions) with around 900 client projects.

Results: We find that library maintainers are less likely to break client-used API classes. Quantitatively, we find that refactoring activities break less than 37% of all client-used APIs. In a more qualitative analysis, we show two documented cases where non-refactoring API breaking changes are motivated by other maintenance issues (i.e., bug fixes and new features) and involve more complex refactoring operations.

Conclusion: Using our automated approach, we find that library developers are less likely to break client-used APIs, and tend to break them mainly when addressing maintenance issues such as bug fixes and new features.

keywords:
Refactoring, API Breakages, Software Libraries, Software Evolution

1 Introduction

Software libraries are constantly evolving, whether responding to client needs, patching bugs, or addressing other maintainability concerns. Refactoring is a controlled and widely used technique for improving the design of existing software, especially in modern, large-scale software systems that depend on a large number of third-party libraries. Fowler recommends refactoring to improve software readability and reusability, while increasing the speed at which developers can write and maintain their code base Fowler1999 (); opdyke1992refactoring ().

An Application Programming Interface (API) is a specification that governs interoperability between a client application and a library. External APIs refer to the APIs available for client usage. Since clients solely rely on APIs for ‘blackbox’ access to the library’s functionality, API backward compatibility is an important consideration for both client and library developers. Clients migrating to a newer library version would be particularly concerned with whether previously invoked external APIs in an older version will continue to link without error. This is known as preserving API compatibility (see the Java standards documentation at http://docs.oracle.com/javase/specs/jls/se8/html/jls-13.html). Hence, any API change between two library versions that violates this linkage is known as an API breakage. From a library viewpoint, a developer refactoring an external API may not consider the effect it has on a client’s chances of adopting the latest version. Conversely, neglecting to refactor the code base may increase complexity and maintenance effort (Lehman’s 2nd law), leading to an eventual degradation in software quality (Lehman’s 7th law) Lehman:1996 ().

In this work, we conduct an empirical study to explore the relationship between API refactorings and breakages based on actual API usage by clients. We distinguish between library APIs based on their client usage (referred to as client-used APIs) in order to get a deeper understanding of the extent to which API breakages can be related to refactoring activities. Our investigation covers over 9,700 breaking classes and around 12,900 refactoring operations from eight popular Java libraries, with each library having between 10 and 38 consecutive releases. We observe the following: (i) library maintainers are less likely to break client-used APIs compared to other classes of the library, (ii) detected refactoring operations break less than 37% of client-used APIs, (iii) qualitatively, the remaining API breakages (63%) are motivated by maintenance issues that are likely to involve more complex refactorings, and (iv) simple refactorings (i.e., move_method, rename_method, move_field) are less frequently applied to client-used API classes compared to other classes.

The main contributions of this paper are three-fold and can be summarized as follows: (1) our study involves the investigation of APIs that are used by actual clients, (2) using automated tooling, we conducted a large-scale empirical study to investigate API breakages and refactorings, and (3) we present a large dataset of API breakages and refactorings which is publicly available as a replication package at: http://sel.ist.osaka-u.ac.jp/people/raula-k/APIBreakage/

The rest of the paper is organized as follows. Section 2 describes the background and definitions. Section 3 presents the approach we use in the empirical study. Section 4 details the research questions and the method used in the study. We then show our results in Section 5, with a discussion of implications and threats of the study in Section 6. Section 7 surveys related work. Finally, Section 8 concludes the paper and presents future research directions.

2 Basic Concepts & Definitions

This section provides the necessary background and concepts that are prerequisites to understand the conducted study.

2.1 Backward Compatibility of APIs

The precise definition of backward compatibility depends in part on the Java language’s notion of binary compatibility (documentation at http://docs.oracle.com/javase/specs/jls/se8/html/jls-13.html):

“binary compatible with (equivalently, does not break binary compatibility with) pre-existing binaries if pre-existing binaries that previously linked without error will continue to link without error.”

Importantly, a class or interface should treat its accessible members (methods and fields) and constructors, their existence and behavior, as a contract with its users.

In this paper, we define any change violating this contract as causing an API breakage between the library and its client user. We show two examples of API breakages. The first example of an API breakage is when a method is modified (i.e., renamed or deleted). For instance, the removal of a method from a class could break the API linkage, resulting in a NoSuchMethodError in the client application. Conversely, adding new members (i.e., new fields, methods, or constructors) to an existing class or interface usually does not break an API.
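As a minimal sketch of the first case (the class, method, and version numbers are hypothetical and not taken from any of the studied libraries), a client compiled against an older library version stops linking once the invoked method is removed:

```java
// --- library class, version 1.0 (normally shipped in its own jar) ---
class TextUtil {
    public static String shorten(String s, int max) {
        return s.length() <= max ? s : s.substring(0, max);
    }
}

// --- client, compiled against version 1.0 ---
public class Client {
    public static void main(String[] args) {
        // Resolved at link time via the method's exact signature.
        System.out.println(TextUtil.shorten("incompatible change", 10));
    }
}

// If version 2.0 of the library removes (or renames) shorten(String, int),
// the client's pre-existing binary no longer links: running it against the
// 2.0 jar throws java.lang.NoSuchMethodError. Adding a brand-new method or
// field to TextUtil, by contrast, leaves this client unaffected.
```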

The second example of an API breakage is when third parties cause an API breakage to the library, which then indirectly breaks the client. In many cases, a library is also a client user of other libraries within its environment. For instance, any change to the library’s environment, such as an update to the Java Development Kit (JDK), may break a method in the library and thereby ripple its effect to any client user of this library API.

2.2 Refactoring Activities and API breakages

Refactoring is a disciplined engineering practice that restructures existing code by altering its internal structure without changing its external behavior Fowler1999 (). Fowler discusses around seventy refactorings, which range from simple to quite complex. In this paper, we determine whether any of the API breakages are related to a refactoring activity. Formally, we define a Refactoring Operation (R) as an atomic refactoring change applied between two library versions.

2.3 API Categorization Based on Client Usage

In this paper, we are interested in the APIs actually used by a client application, assuming that a breaking code change to a client-used API violates the contract between library and client user. To investigate the extent to which developers are breaking their APIs, we must first define the usage dimension of an API. In reality, not all public entities (APIs) are intended for client usage. Based on a developer’s intended use, an API of a library can be either external or internal.

  • External APIs - are APIs designed by library maintainers for usage by clients.

  • Internal APIs - are APIs intended only for internal usage by the library code itself.

An internal API may exist for several reasons. For instance, the Finalizer class within the base.internal package of google-guava is documented as follows caseExample ():

While this class is public, we consider it to be *internal* and not part of our published API. It is public so we can access it reflectively across class loaders in secure environments.

In an ideal world, internal APIs are never used by any client. However, in reality internal APIs may be subject to client usage. For instance, Businge et al. found that a large proportion of plugins used the Eclipse framework’s internal APIs DBLP:journals/sqj/BusingeSB15 (). Moreover, concepts such as Application Binary Interfaces (ABIs) Forman:1995 () and the OSGi framework Baul2009 () have been proposed to differentiate between the two API types. However, unless explicitly documented, it is extremely difficult to distinguish between external and internal APIs.

Figure 1: A conceptual composition of all library class types. The Venn diagram shows the relationship between (a) client-used API, (b) non client-used API and (c) non API class types.

As shown in Figure 1, we describe the different class categories of a library. To distinguish between external and internal APIs, we propose a method to approximate external API classes by mining actual usage by clients, defined as client-used API classes. Details of the method are explained in the subsequent methodology subsection. All library class categories are defined as follows:

  • API class - is a class that has at least one public entity (i.e., method or field members) and is accessible by any client user.

  • non API class - is a class that contains no public API entities (i.e., all its members are private or protected).

  • client-used API class (clientUse) - is an API class that is used by at least one client. It is an approximation of the external APIs: the set of client-used API classes should ideally cover all external APIs, although there exist cases where a client uses an internal API.

  • non client-used API class (non clientUse) - is an API class that is not used by any client. The set of all non client-used API classes should cover all internal APIs.

Henceforth, we classify our API breakages at the class-level. Classes are then classified as either:

  • breaking class - is a changed class that breaks its API at either the class, method or field level, e.g., through rename, move or delete changes.

  • non breaking class - is a changed class that does not affect API compatibility.

We then explore the extent to which breaking changes to client-used APIs are caused by refactoring activities. As defined in Section 2.2, R is an atomic refactoring operation applied between two library versions. We now introduce the following terminology related to R:

  • Ref class - is a changed API class where at least one R has been applied to any of its elements (i.e., fields, methods or class attributes).

  • R density - refers to the number of R applied per class.

3 Approach

In this section, we first present our case study libraries and the methodology used in the empirical study. Our method includes (1) categorization of APIs based on client usage, (2) API breakage detection and (3) API refactoring detection.

Library | Release range | # Versions | Release time period | # Classes (min–max)
guava | r03 – 18.0 | 22 | Apr 10 – Aug 14 | 727 – 1763
httpclient | 4.0 – 4.5 | 25 | Aug 09 – May 15 | 230 – 460
javassist | 2.5.1 – 3.19.0 | 28 | Feb 06 – Jan 15 | 187 – 334
jdom | 1.1 – 2.0.6 | 10 | Sept 04 – Feb 15 | 73 – 258
joda-time | 0.95 – 2.8 | 22 | Nov 05 – May 15 | 191 – 246
log4j | 1.1.3 – 1.2.17 | 17 | Jun 01 – May 12 | 242 – 974
slf4j | 1.1.0 – 1.7.12 | 38 | Dec 06 – Mar 15 | 11 – 28
xerces | 1.2.3 – 2.11.0 | 21 | Dec 00 – Nov 10 | 580 – 1652
Table 1: Studied libraries showing the release range, number of versions, time period, and the range of the number of classes per library (min–max).

3.1 Subject Libraries

We used a systematic method to select our subject libraries. Our selection is based on the following criteria: the library (1) has sufficiently large client API usage and (2) has sufficient evolution history. Additionally, we required diverse libraries that (3) are from different application domains and (4) have been extensively studied in related work. These criteria were used to select libraries from a set of 2,500 client projects collected from GitHub.

Table 1 shows all 183 library versions from the eight selected libraries. For each library, we collected 10 to 38 different library versions. All libraries have a large client base and are from different application domains. Moreover, three out of the eight subject libraries were used in prior work Cossette2012 (); Dig2006 (); Kapur2010 (). The studied libraries span testing, logging, utility and web-related domains. As shown in the table, we selected guava guavaURL (), httpclient httpclientURL (), javassist javassistURL (), jdom jdomURL (), joda-time jodatimeURL (), log4j log4jURL (), slf4j slf4jURL () and xerces xercesURL (). For all libraries, we only selected consecutive version releases, ignoring release candidates. Only the official binaries and available source code for each library were used in this study.

Figure 2: Cumulative count of client-used API classes (y-axis) as a function of the number of client projects (x-axis) for (a) guava, (b) httpclient, (c) javassist, (d) jdom, (e) joda-time, (f) log4j, (g) slf4j and (h) xerces. The coasting (flattening) of the curve indicates that a stable number of client-used API classes has been reached (see Table 2).

3.2 Client-Used API Extraction Method

Actual client usage is needed to distinguish between external and internal APIs. Specifically, we need to compile each individual client system to know what APIs are used by clients. To enable a large-scale analysis, we use the jcabi-aether library jcabiURL () and the Eclipse JavaCompiler (ver. 1.6) javaCompilerURL () to dynamically compile and log all client-loaded classes. As a result, we are able to extract the fully qualified library class names of the APIs used by many clients.
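As a rough sketch of the logging step (illustrative only, not the jcabi-aether/Eclipse-compiler pipeline used in our tooling; the library package prefix and classpath are assumptions), a class loader can record every library class that a client causes to be loaded:

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Set;
import java.util.TreeSet;

// Sketch only: records fully qualified names of classes loaded from a library
// while a client's code is compiled and exercised. Prefix and URLs are hypothetical.
public class LoggingClassLoader extends URLClassLoader {
    private final String libraryPrefix;                    // e.g., "com.google.common."
    private final Set<String> loadedApiClasses = new TreeSet<>();

    public LoggingClassLoader(URL[] classpath, String libraryPrefix) {
        super(classpath, LoggingClassLoader.class.getClassLoader());
        this.libraryPrefix = libraryPrefix;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        if (name.startsWith(libraryPrefix)) {
            loadedApiClasses.add(name);                     // candidate client-used API class
        }
        return super.loadClass(name, resolve);
    }

    public Set<String> loadedApiClasses() {
        return loadedApiClasses;
    }
}
```

Each client is then loaded through such a loader, and the union of the recorded names approximates the client-used API classes of the library.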

Library | # Collected clients | Used clients at SP | Client-used API classes at SP
Guava | 195 | 98 | 184
httpclient | 149 | 67 | 87
Javassist | 14 | 11 | 30
Jdom | 35 | 16 | 26
Joda-time | 69 | 20 | 27
log4j | 195 | 36 | 46
Slf4j | 321 | 20 | 9
Xerces | 17 | 15 | 47
All clients | 995 | |

Table 2: Collected client-used API classes at the Saturation Point (SP), as shown in Figure 2.

One of the main challenges in client-used API collection is determining the coverage of all external APIs. Hence, our technique consists of continuously collecting client systems until full coverage is reached (i.e., no more new APIs are used). We coin this coverage as the saturation point for a library version. So, instead of trying to compile as many clients as possible, we use the saturation point as a heuristic to show that enough clients have been collected. Figure 2 and Table 2 show the saturation point for our case studies. The saturation point is represented as a cumulative count of client-used API classes (y-axis) as a function of the number of client projects (x-axis), with the coasting of the curve giving confidence that a stable number of client-used API classes has been reached. For example, of the 195 collected clients, guava reached saturation with 98 client systems covering 184 API classes. It is important to note that each project was selected at random, making the formation of the curve unintentional. The table also summarizes the number of client GitHub projects that we mined for each of the eight subject libraries (total code base size of 600GB). To ensure the maturity and quality of the client projects, the dataset only includes Java projects that had at least 100 commits. We ran experiments for about 30 days. The client-used API collection for a single project took between 10 minutes and 3 hours.
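A minimal sketch of the saturation-point heuristic, assuming each client's set of used library classes has already been extracted (the fixed stopping window is an assumption; in the study itself saturation is judged from the shape of the curve in Figure 2):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SaturationPoint {
    /**
     * Returns the number of clients after which the cumulative set of
     * client-used API classes has not grown for `window` consecutive clients,
     * or the total number of clients if no such plateau is observed.
     */
    public static int saturationPoint(List<Set<String>> usedApiPerClient, int window) {
        Set<String> covered = new HashSet<>();
        int unchanged = 0;
        for (int i = 0; i < usedApiPerClient.size(); i++) {
            int before = covered.size();
            covered.addAll(usedApiPerClient.get(i));        // grow the cumulative coverage
            unchanged = (covered.size() == before) ? unchanged + 1 : 0;
            if (unchanged >= window) {
                return i + 1;                               // clients needed to reach saturation
            }
        }
        return usedApiPerClient.size();
    }
}
```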

3.3 API Breakage Detection Method

State–of–the–art API breakage detection tools japicmp (); clirr (); jacc (); jdiff (); revapi () have been extensively used by both researchers Jezek2015 (); Raemaekers2014 () and practitioners guavaURL (); httpclientURL () alike, especially for systematically checking backward incompatibilities between library versions. For instance, developers of the google guava library use JDiff to report changes between two versions; e.g., API changes from guava v18 to v19 are at http://google.github.io/guava/releases/19.0/api/diffs/. As noted by Raemaekers Raemaekers2014 (), these tools provide underestimations, as all detected breaking API changes will definitely break an API, but some binary compatible APIs could still be semantically incompatible.

To identify the API differences between two library binaries, we use the Japi-cmp library japicmpURL (). Similar to other tools, Japi-cmp is able to detect and differentiate changes in instrumented and generated classes to determine binary compatibility as well as public or private accessibility. Using the definitions in Section 2, we then map and label all classes as either breaking or non-breaking. Overall, the resulting dataset consists of over 9,700 detected breaking classes from the eight libraries.
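To give an intuition of what such a binary-compatibility check looks like, the following is a deliberately simplified, reflection-based sketch (not the Japi-cmp implementation; it covers only one of the many incompatibility rules that such tools evaluate): a class is flagged as breaking when an accessible method of the old version disappears in the new one.

```java
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

// Simplified sketch: checks only for removed accessible (public/protected) methods.
public class BreakingClassCheck {
    static Set<String> accessibleSignatures(Class<?> c) {
        return Arrays.stream(c.getDeclaredMethods())
                .filter(m -> Modifier.isPublic(m.getModifiers())
                          || Modifier.isProtected(m.getModifiers()))
                .map(m -> m.getName() + Arrays.toString(m.getParameterTypes()))
                .collect(Collectors.toSet());
    }

    /** True if at least one accessible method of oldVersion is missing in newVersion. */
    public static boolean isBreaking(Class<?> oldVersion, Class<?> newVersion) {
        Set<String> removed = new HashSet<>(accessibleSignatures(oldVersion));
        removed.removeAll(accessibleSignatures(newVersion));
        return !removed.isEmpty();
    }
}
```

In practice the two versions would first be loaded from their respective jars (e.g., via separate URLClassLoader instances) before being compared.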

3.4 Refactorings Detection Method

To automatically collect the R applied between two versions, we use the state–of–the–art Ref-Finder tool prete2010template (). Based on template logic rules, the tool identifies up to 52 different refactoring types between two versions. It is important to note that the collected refactorings are structural, only detectable by mechanical transformations; “Ref-Finder does not include changes that may either require restricted conditions to be met, or to some degree of additional specification from a developer that could not be automatically inferred by a tool” Cossette2012 (). As a result, our dataset consists of 12,900 R from all eight libraries.
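Purely for intuition, and far coarser than Ref-Finder's source-level templates (this toy heuristic is not part of our approach), a removed method paired with a newly added method of identical parameter and return types in the same class can be flagged as a rename_method candidate:

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy heuristic, not Ref-Finder: pairs a method that disappeared with one that
// appeared under a different name but with the same parameter and return types.
public class RenameMethodCandidates {
    static String typeSignature(Method m) {
        return m.getReturnType().getName() + Arrays.toString(m.getParameterTypes());
    }

    public static List<String> candidates(Class<?> oldVersion, Class<?> newVersion) {
        List<String> result = new ArrayList<>();
        for (Method removed : oldVersion.getDeclaredMethods()) {
            boolean stillThere = Arrays.stream(newVersion.getDeclaredMethods())
                    .anyMatch(m -> m.getName().equals(removed.getName()));
            if (stillThere) continue;                       // method name survived, not a rename
            for (Method added : newVersion.getDeclaredMethods()) {
                boolean isNew = Arrays.stream(oldVersion.getDeclaredMethods())
                        .noneMatch(m -> m.getName().equals(added.getName()));
                if (isNew && typeSignature(added).equals(typeSignature(removed))) {
                    result.add(removed.getName() + " -> " + added.getName());
                }
            }
        }
        return result;
    }
}
```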

3.5 Mapping Refactorings to API Breakages

Figure 3: Venn diagram of the overlapping relations of refactored and breaking classes.

The study involves a mapping between the collected Ref and breaking classes, where a Ref class contains at least one R. Figure 3 describes this mapping as an intersection between breaking classes and Ref classes. It is important to note the presence of false positives, where the tools detect refactorings in unchanged classes. Upon manual inspection of some cases, we confirmed these were false positives, as the classes were unchanged. As a result, we semi-automatically identified and discarded 2,100 instances of such false positives, finally leaving us with 10,800 R from all eight libraries.

A simple example of a refactoring that breaks an API can be seen with com.google.common.collect.ImmutableMultiset of the Guava library (the API change is documented at http://google.github.io/guava/releases/12.0/api/diffs/changes/com.google.common.collect.ImmutableMultiset.html#methods). According to the API diff report, the ImmutableMultiset<E> of(E[]) method (i.e., which takes E[] and returns an immutable multiset) was removed between versions 11.0.2 and 12.0. In this example, our approach automatically detects this change as the remove_method R. The official Java documentation states that ‘deleting a method or constructor from a class breaks compatibility with any pre-existing binary that referenced this method or constructor; a NoSuchMethodError may be thrown when such a reference from a pre-existing binary is linked. Such an error will occur only if no method with a matching signature and return type is declared in a superclass’.

4 Empirical Study

In this section, we present the goals and motivation, followed by the method used to address each research question.

4.1 Research Questions

Our motivation is to inspect the relationship between refactorings and API breakages. Relatedly, Dig and Johnson Dig2006 () manually inspected library release notes for documented API changes to investigate the role of refactoring during the API evolution of a library. They cited two reasons why they preferred a manual analysis over the use of automated tools: (1) ‘since most API changes follow a long deprecation replace-remove cycle, an obsolete API can coexist with the new API for a long time’ and (2) some behavioral refactoring cases ‘would have been misinterpreted by a tool, but a human expert can easily spot’. In this study, we find that state–of–the–art tools are now able to detect deprecations, thus negating the first reason. Additionally, we find that the automated approach is not as reliant on documentation.

Our goal in this study is to use an automated approach to investigate how client-used APIs are affected by refactoring activities. The automated approach has the benefit of reducing manual inspection and heuristic errors and enables a large-scale empirical study. We designed a rigorous quantitative empirical study, formulating the following research questions:

  • (RQ1). To what extent are library maintainers breaking client-used APIs over time? We want to understand the API breaking tendencies of library maintainers.

  • (RQ2). To what extent are refactoring activities breaking client-used APIs? Sometimes API breakages are unavoidable, even for the more popular client-used APIs. Prior work indicates that refactoring is common with API changes. Therefore, we want to understand how much of client-used API breakage is related to refactoring activities.

In RQ2, we identified many API breakages not related to refactoring activities. We then formulated RQ3 and RQ4 for a deeper analysis of the detected changes (both refactoring and non refactoring related) that break client-used APIs:

  • (RQ3). What non-refactoring-related code changes are breaking client-used APIs? Specifically, our motivation is to understand what API breaking changes are not related to refactorings.

  • (RQ4). What refactoring-related code changes are breaking client-used APIs? From the perspective of all refactoring activities, we would like to understand (i) how many and (ii) which types of refactoring operations are breaking client-used APIs.

4.2 Research Method for RQ1

Class category | Compatible change | Incompatible change
client-used API | API compatible | API breaking code change
non client-used API | API compatible | Incompatible change unintended for clients
non API | Does not affect clients | Incompatible change does not affect clients

Table 3: Library Class Categories Incompatibility Matrix

To answer RQ1, we followed two steps. First, we studied consecutive versions of a library to understand the library evolution. The goal is to study how (i) client-used API classes, (ii) non client-used API classes and (iii) non API classes evolve over several consecutive versions. Next, we investigate the number of code changes that lead to incompatibility with respect to the different class categories defined above. Since the tool is only able to compare two versions at a time, we performed side-by-side comparisons (i.e., each comparison is the current version against the immediately succeeding library version). We introduce a normalized metric, namely break, to describe the rate of breaking changes over all class changes at that version release, as defined in Equation 1:

\[ break(v) = \frac{\#\ \text{breaking classes at } v}{\#\ \text{changed classes at } v} \tag{1} \]

where v refers to a given library version and break(v) ranges from 0 to 1 for each class category of v. Values closer to 1 indicate that there are more breakages per class change.

Table 3 shows the interpretation of the metric based on the class type. For instance, for non API classes, the metric captures significant changes that do not affect clients. We believe that it is important to track which classes are more prone to incompatible code changes. To assess the significance of breakages between the different library class categories, we use the Kruskal-Wallis and Mann-Whitney non-parametric tests. The null hypothesis states that there is no statistical difference between the class types. Furthermore, to assess the magnitude of the difference, we study the effect size based on Cohen’s d tagkey1977iii (). The effect size is considered: (1) small if 0.2 ≤ d < 0.5, (2) medium if 0.5 ≤ d < 0.8, or (3) large if d ≥ 0.8. For the post-hoc comparisons, we use Mann-Whitney tests with Bonferroni correction.
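As a small sketch of how the break metric of Equation 1 can be computed from the per-version change data (the record fields are illustrative, not the names used in our tooling):

```java
import java.util.List;

public class BreakRate {
    /** One analyzed class change between version v and v+1 (illustrative fields). */
    record ClassChange(String className, String category, boolean breaking) {}

    /** Equation 1: breaking changes over all changed classes of one category at version v. */
    public static double breakRate(List<ClassChange> changesAtVersion, String category) {
        long changed = changesAtVersion.stream()
                .filter(c -> c.category().equals(category)).count();
        long breaking = changesAtVersion.stream()
                .filter(c -> c.category().equals(category) && c.breaking()).count();
        return changed == 0 ? 0.0 : (double) breaking / changed;
    }
}
```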

4.3 Research Method for RQ2

For RQ2, our method is to identify library refactorings that are applied to client-used API classes. We followed two steps. To analyze the impact of the refactoring activities, we first identified for each library (i) the number of Ref classes and (ii) the R density. We then identified the Ref classes that are breaking. To map refactorings to API breakages as described in Section 3.5, we introduce a normalized metric, namely the breaking–to–Ref rate, as defined in Equation 2:

\[ breaking\text{-}to\text{-}Ref(v) = \frac{\#\ \text{breaking Ref classes at } v}{\#\ \text{Ref classes at } v} \times 100\% \tag{2} \]

where v refers to a given library version. The metric returns a percentage that ranges from [0..100%] for each class category of v. Values closer to 100% indicate that more of the applied refactorings are breaking that class category. Conversely, from an API breakage perspective, we introduce a normalized metric, namely the Ref–to–breaking rate, to describe the ratio of overlap with respect to all breaking classes, as defined in Equation 3:

\[ Ref\text{-}to\text{-}breaking(v) = \frac{\#\ \text{breaking Ref classes at } v}{\#\ \text{breaking classes at } v} \tag{3} \]

where v refers to a given library version. The metric returns a score that ranges from [0..1] for each class category of v. Values closer to 1 indicate that more breakages are related to refactoring activities.
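Both rates can be read directly off the sets of Ref classes and breaking classes of one class category at a version; a minimal sketch (set contents are assumed to be fully qualified class names):

```java
import java.util.HashSet;
import java.util.Set;

public class OverlapRates {
    /** Equation 2: percentage of Ref classes that are also breaking. */
    public static double breakingToRefRate(Set<String> refClasses, Set<String> breakingClasses) {
        if (refClasses.isEmpty()) return 0.0;
        Set<String> overlap = new HashSet<>(refClasses);
        overlap.retainAll(breakingClasses);
        return 100.0 * overlap.size() / refClasses.size();
    }

    /** Equation 3: fraction of breaking classes that are also Ref classes. */
    public static double refToBreakingRate(Set<String> refClasses, Set<String> breakingClasses) {
        if (breakingClasses.isEmpty()) return 0.0;
        Set<String> overlap = new HashSet<>(breakingClasses);
        overlap.retainAll(refClasses);
        return (double) overlap.size() / breakingClasses.size();
    }
}
```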

4.4 Research Method for RQ3

For RQ3, we used a qualitative approach to investigate the API breaking changes that were not detected by our approach as refactoring operations. Results from RQ2 (see Section 5.2, Table 5) indicate that three of the studied projects (Guava, HttpClient and Xerces) have many client-used API breakages that were not related to refactoring activities. We consulted the related change logs of these three projects, namely Guava (an example is Release 11 at https://github.com/google/guava/wiki/Release11), HttpClient (https://archive.apache.org/dist/httpcomponents/httpclient/RELEASE_NOTES-4.5.x.txt) and Xerces (change logs at https://xerces.apache.org/xerces2-j/releases.html), to understand why these API breakages were performed by the developers. We manually checked the documented change logs of each release to map an API breakage to either a bug fix issue or a new enhancement (feature) in the library. To reduce bias, the manual check was carried out by a team of three researchers (one postdoctoral researcher and two graduate master students) with an intermediate level of Java programming and software development experience. Since the team members do not possess any project-specific knowledge, we solely rely on keywords or issue links (i.e., issueID) in the change log comments to map each API breakage to a bug issue or new feature. Xerces was later removed from the analysis as there were too many ambiguous references with no clear linkage to the source code. Our analysis aggregates all documented API changes as either bug fixes or new features and shows how many can be mapped to the API breakages that did not involve any refactoring operations.

For a deeper analysis and validation, we also investigate and present some examples of these non-refactoring-related API breaking classes.

4.5 Research Method for RQ4

For RQ4, we identified what refactoring operations are breaking client-used APIs. We followed two steps in the analysis. For a library, we aggregated the number of Ref  instances where a certain R (e.g., move_method) has been applied. In the second step, we used a normalized metric to describe the ratio of overlapped breaking refactorings between client-used API classes and non client-used API classes as defined in Equation 4.

\[ prsv(\ell, R) = \frac{\#\ \text{breaking client-used API classes of } \ell \text{ with } R \text{ applied}}{\#\ \text{breaking non client-used API classes of } \ell \text{ with } R \text{ applied}} \tag{4} \]

where \(\ell\) refers to a given library and R refers to a certain refactoring operation type.

Our hypothesis is that a prsv ratio less than 1 (0 ≤ prsv < 1) indicates that developers apply fewer refactoring operations to client-used API classes. Conversely, a high prsv ratio (prsv > 1) indicates that more refactoring operations are applied to client-used API classes. A value of 1 indicates that the R type is applied equally to client-used API classes and non client-used API classes.
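A correspondingly small sketch of the prsv ratio of Equation 4 for one library and one refactoring type (illustrative only; the exact normalization used in our analysis scripts may differ):

```java
public class PrsvRatio {
    /**
     * Equation 4 (sketch): ratio of client-used API classes to non client-used
     * API classes broken by a given refactoring type R in one library.
     */
    public static double prsv(int brokenClientUsedByR, int brokenNonClientUsedByR) {
        if (brokenNonClientUsedByR == 0) {
            return Double.POSITIVE_INFINITY;   // R only broke client-used API classes
        }
        return (double) brokenClientUsedByR / brokenNonClientUsedByR;
    }
}
```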

5 Results

In this section, we present our results of the study by addressing each of the four research questions.

5.1 Findings for RQ1

Figure 4: Evolution of the number of changed classes per class type for (a) guava, (b) httpclient and (c) javassist, shown in chronological order of release.
Figure 5: Evolution of the number of classes per class type for (a) jdom, (b) joda-time and (c) log4j. Similar to Figure 4, these figures show the number of classes identified in chronological order of release.
Figure 6: Evolution of the number of classes per class type for (a) slf4j and (b) xerces. Similar to Figures 4 and 5, these figures show the number of classes identified in chronological order of release.

Figures 4, 5 and 6 depict the class category analysis of each consecutive library version. Each figure shows the evolution of (i) client-used API classes, (ii) non client-used API classes and (iii) non API classes over consecutive library versions. From these figures, we summarize our findings with three observations. First, we observe that most libraries are composed of non client-used API classes (green line), showing that libraries usually have more non client-used API classes than client-used API classes. The exception is log4j, which is shown in Figure 5(c) to have most APIs intended for external API usage. Interestingly, we see in Figure 4(c) that the non client-used API classes of javassist disappear in the more recent versions. Upon closer inspection, we noticed that this was because developers had changed these non client-used API classes into non API classes. Second, we observe a stable number of client-used API classes (red line) across all projects. From a client user viewpoint, this finding indicates that developers of a library are less likely to expand their external APIs. Third, the number of non API classes (blue line) is constantly changing (i.e., illustrated by various peaks) over time. We find that some of the peaks can be correlated to different events, such as major or specially-named releases, beta releases such as xercesImpl2 and log4j, or modifying private non API classes into public APIs, as in the case of httpclient.

(a) For each of the eight libraries, we compare the break rates of (1) client-used API classes in red, (2) non client-used API classes in green and (3) non API classes in blue.
Library | # versions | # breaking class instances | # changed class instances
Guava | 22 | 2,215 | 9,973
httpclient | 25 | 113 | 1,426
Javassist | 28 | 1,017 | 2,572
Jdom | 10 | 106 | 445
Joda-time | 22 | 1,097 | 2,922
log4j | 17 | 583 | 3,051
Slf4j | 38 | 21 | 235
Xerces | 21 | 4,622 | 7,796
Totals | 183 | 9,774 | 28,420
(b) Corresponding to (a), this table shows the number of analyzed (1) library versions, (2) breaking classes and (3) changed classes collected.
Figure 7: Results of the break rates for all eight libraries analyzed.
Figure 8: Summary of break rates comparing (1) client-used API classes in red, (2) non client-used API classes in green and (3) non API classes in blue.

Library maintainers are less likely to apply changes to client-used API classes compared to other class categories.

Figure 7(a) shows the break rates for all eight libraries. From this figure, we observe that, except for javassist and joda-time, library developers are more likely to break non client-used API classes than client-used API classes. Relatedly, Figure 8 depicts the break rates grouped by class category. The figure shows that non client-used API classes are more prone to breakages than client-used API classes for all libraries; as shown in the figure, non client-used API classes are reported to have the most breakages. A Kruskal-Wallis test revealed significant differences between the client-used API class, non client-used API class and non API class values (p < 0.01). The post-hoc Mann-Whitney tests with Bonferroni correction show the effect size to be medium (p < 0.01, r = 0.54) when comparing all class categories.

Findings show that incompatible API code changes are statistically more likely to occur in non client-used API classes compared to client-used API classes.

(a) # Ref classes (columns: clientUse | non clientUse | non API | clientUse | non clientUse | non API)
guava 32 143 44 31 24 139
httpclient 3 11 6 16 7
Javassist 111 8 6 29 10 2
jdom 1 1 3
Joda-time 30 11 29 5 12
log4j 1 2 3
slf4j 1 2 1
xerces 44 244 31 23 104 66

(b) R density (Median)
guava 7 8 5 4 4 8
httpclient 5 5 2 3 2
Javassist 2 5 2 1 1 1
jdom 1 1 4
Joda-time 3 10 1 1 1
log4j 1 2 2
slf4j 1 3 22
xerces 8 6 4 4 4 5

Table 4: The table reports (a) the number of Ref classes and (b) the R density per Ref class. Note that (–) represents no matches.

Columns: # versions | breaking | non breaking | breaking | non breaking | Ref–to–breaking rate (Median) | breaking–to–Ref rate (Median)

clientUse
guava 22 2
httpclient 25
Javassist 28 2 1 3 1 37% 75%
jdom 10
Joda-time 22 15 2 65 45 6% 48%
log4j 17
slf4j 38
xerces 21 64%

non clientUse
guava 22 86%
httpclient 25 14%
Javassist 28 4 1 2 1 5% 44%
jdom 10
Joda-time 22 1 9 7
log4j 17
slf4j 38
xerces 21

Table 5: Matrix that shows the median number of refactored API classes per library. For each library, we summarize the median values across all library versions; the table includes the medians of matched refactored classes. A value of 0 represents a value less than 0.01, and (–) reports no matched classes.

5.2 Findings for RQ2

Table 4 presents a summary of Ref classes and their R density. For instance, we identified 32 guava client-used API classes that were Ref classes. For these 32 Ref classes, we report a median of 7 R applied per Ref class. From this table, we can see that, in general, library maintainers applied more R to non client-used API classes and non API classes than to client-used API classes, except for javassist. For example, the table shows that for xerces, around 244 non client-used API classes were refactored, compared to 44 client-used API classes. In more detail, the results show that, apart from slf4j, the median density of R per class ranges from 1 to at most 10 R. Interestingly, we find that slf4j had a high number of R applied to one non-breaking client-used API class (22). The log4j, slf4j and jdom libraries reported only a few breaking classes matched to R, which is consistent with recent empirical studies conducted by Cossette et al. Cossette2012 ().

Table 5 reports the median values of R that cause API breakages. We use this table to compare client-used API classes against non client-used API classes. For guava, more non client-used API classes (9) were breaking due to refactorings compared to client-used API classes (2). From this table, we find that non-refactoring changes are more likely to break client-used API classes than non client-used API classes. Moreover, applied refactorings tend to break more non client-used API classes than client-used API classes. The results show that many of the API breakages are not mapped to the detected refactorings (i.e., non Ref classes). We find that more refactored non client-used API classes are breaking compared to refactored client-used API classes, with the exception of javassist.

Table 5 also shows the breaking–to–Ref and Ref–to–breaking rates. We report that the median Ref–to–breaking rate for client-used API classes is at most 37% across all projects (ranging from 1% to 37%). Except for javassist, this result provides evidence that most of the detected API breakages could not be mapped to refactoring operations. Alternatively, the breaking–to–Ref rates reported for client-used API classes in Table 5 indicate that breaking refactorings accounted for a median of up to 75% of all R. The highest breaking–to–Ref rate for non client-used API classes was 86%, reported for the guava library.

Findings show that up to 75% of refactored API classes are breaking their client-used APIs. However, these API breaking refactorings account for less than 37% of all client-used API breakages.

5.3 Findings for RQ3

Library | Change log release | # Issues | # New features | # breaking clientUse (# mapped to change logs)
Guava v11 26 13 4 (4)
v12 43 24 4 (4)
v13 26 28 7 (6)
v14 64 10 5 (5)
v15 53 11 7 (6)
v16 19 8 5 (4)
v17 11 5 6 (4)
v18 21 7 3 (2)
263 106 41 (34) 82%
Httpclient v4.1.2 5 - 3 (2)
v4.1.3 4 - 3 (2)
v4.2 15 4 3 (1)
v4.2.1 8 - 4 (2)
v4.2.2 8 - 4 (2)
v4.2.3 21 - 2 (2)
v4.2.4 6 9 3 (1)
v4.2.5 6 4 2 (2)
73 17 24 (14) 58%
Table 6: For each library release, the table shows (i) the number of issues and new features listed in the change log and (ii) the number of non-refactoring-related API breaking classes. In parentheses, we show the number of these API breakages that could be mapped to the change log comments.

Table 6 shows the results of the manual study of API breakages that did not map to any Ref classes, against the developer documentation (i.e., change logs). For instance, release 11 of the guava library listed 26 issues and 13 new features (the release notes are available at https://github.com/google/guava/wiki/Release11). The table confirms our finding that these client-used API class breakages are not related only to refactoring activities (i.e., non Ref classes). For this release, all four API breakages could be mapped to documented API changes. From the table, we were able to map 82% of the non-refactoring client-used API breakages to the API documentation for guava and 58% for httpclient. This finding indicates that many of the API breakages not involved in refactorings are most likely motivated by maintenance issues such as bug fixes and new feature enhancements.

(a) This code change in the method resolve was detected as breaking API compatibility for users of the older JDK.
(b) This code change breaks API compatibility by replacing HashCodes with HashCode in the method. This refactoring was missed by the automated approach.
Figure 9: Two examples of API breaking changes that were not mapped to detected refactoring operations (i.e., non-Ref). We conjecture that these changes are (a) in response to a complex defect in the code and (b) a complex refactoring that is not captured by the automated approach.

From our manual analysis, and similar to a study by Murphy-Hill et al. Murphy-Hill2009 (), we find that not all API changes appear in the API change logs. Figure 9 shows two documented case examples of API breakages that are not mapped to a detected refactoring operation (i.e., non-Ref). These examples provide evidence that many of these client-used API class breakages are (a) motivated by a bug fix or new feature or (b) complex refactorings that are not captured by the automated approach. In the first example (i.e., Figure 9(a)), we show an unavoidable API breaking change, used to fix a complex defect involving a third-party library. This API breaking change was triggered in response to an error reported by a client user, ”JDK and Guava TypeVariable implementations are no longer compatible under 1.7.0 51-b13” (issue at https://github.com/google/guava/issues/1635 and fix at https://goo.gl/bqDpxU). It was widely reported to affect many client users of the library. Developers found that a change in the standard Java library (JDK) caused guava to break API compatibility, as prior guava versions implemented an undocumented internal API of the JDK (i.e., Types.TypeVariable.newTypeVariable()); blog discussions by users are at https://goo.gl/8tcHfY. After much discussion among developers, the accepted API change was documented to ‘conditionally work only under the new JDK’.

In the second example (i.e., Figure 9(b)), we acknowledge cases where the automated approach is unable to detect more complex refactoring operations. Soares et al. Soares2013 () showed that Ref-Finder is unable to correctly detect all types of refactoring operations, which is a validity threat discussed in detail in Sections 6.3 and 6.4. Moreover, this change is listed as a submitted enhancement issue (https://github.com/google/guava/issues/1495) related to ‘Move HashCodes static methods to HashCode’ and involves 17 changed files (261 added and 219 deleted lines of code); the code change is at https://goo.gl/JHVi5J.

Findings indicate that many client-used API breakages are likely to be motivated by other maintenance issues (i.e., bug fixes and new features) and involve more complex refactoring operations.

Classification of R for guava, httpclient and xerces. For each library, the columns are: breaking (clientUse, non clientUse, prsv) and non breaking (clientUse, non clientUse).
change_parameter (53) (11) (273)
cdcf* (118)
extract_method (40) (70)
extract_subclass
extract_superclass
inline_method (9) (30)
inline_temp (8) (28)
introduce_explaining_variable (41)
introduce_null_object
move_field (76) (270)
move_method (34) (9) (268)
pull_up_constructor_body
pull_up_field
push_down_field
ratp* (16)
remove_control_flag (5) (3) (12)
remove_middle_man
remove_parameter (50) (9) (62)
rename_method (82) (363)
rcwfm* (4)
replace_data_with_object (15)
replace_exception_with_test
rmnwc* (50) (214)
rmwmo* (44)
rncgc* (36)
replace_temp_with_query
pull_up_method (3) (7)
extract_interface
Median ()
Mean ()

Refactoring type abbreviations: cdcf = consolidate_duplicate_cond_fragment, rcwfm = replace_constructor_with_factory_method, ratp = remove_assignment_to_parameters, rmnwc = replace_magic_number_with_constant, rmwmo = replace_method_with_method_object, rncgc = replace_nested_cond_guard_clause.

Table 7: Classification of R for API classes with the prsv ratio. Note that one class may be classified under several refactoring types and that (–) represents no matches. We also show the total of all breakages (cu. + ncu.) and use colors to highlight when prsv is low and when prsv is high.

5.4 Findings for RQ4

Table 7 shows a classification of the applied R for the three libraries guava, xerces, and httpclient in our collected dataset. As seen in the table, guava developers applied the change_parameter R 28 times to breaking client-used API classes. Developers also applied the same R and broke 25 non client-used API classes during the evolution of guava. We find that the guava and xerces libraries tend to refactor and break their versions during evolution more than httpclient. Our results align with the findings of Cossette et al. Cossette2012 () on API transformations, where they also used the same libraries in their experiments. Results in Table 7 show that developers apply specific R more frequently when evolving their libraries. For instance, R such as move_method (guava: 34 R), change_parameter (httpclient: 11 R) and rename_method (xerces: 363 R) were the most frequently applied operations that cause API breakages. For client-used API classes, remove_parameter (guava: 30 R), move_method (httpclient: 3 R) and rename_method (xerces: 146 R) are reported as most frequent. Notably, move_method (guava: 190 R, xerces: 256 R) and remove_parameter (httpclient: 8 R) were applied to non client-used API classes.

Table 7 also reports the prsv ratio for each library. This metric measures the degree to which library developers apply certain R to client-used API classes compared to non client-used API classes (i.e., preserving client-used API classes). We use color to highlight the prsv scores: green highlights in the table represent a low prsv ratio (client-used API classes are preserved), while red highlights indicate a high prsv ratio (relatively more R applied to client-used API classes). For example, the library developers of both guava (prsv = 0.08) and xerces (prsv = 0.05) tend to apply fewer move_method refactoring operations to client-used API classes. Our results show that library maintainers are less likely to refactor (using the more frequent R) client-used API classes than non client-used API classes. For example, 5 out of 10 R types in guava, 3 out of 5 R types in httpclient, and 16 out of 17 R types in xerces are less likely to be applied to client-used API classes. We find that many of the high prsv ratios (depicted in red in the table) were for rarely applied R types (e.g., remove_control_flag (guava: 5 R, httpclient: 2 R) and pull_up_method (httpclient: 3 R, xerces: 7 R)).

Findings show that library maintainers are more likely to refactor non client-used API classes compared to client-used API classes.

6 Discussion

In this section, we first discuss the implications of results and then compare with related work. We then discuss some challenges of our approach and finally present threats to the validity of our study.

6.1 Implications

Our results indicate that, out of all code changes applied when evolving libraries, maintainers are less likely to apply incompatible code changes to external API classes compared to other classes. This implies that library developers may understand the effort required by clients to update their library dependencies. Complementary to this finding, Bloch mentions the growing awareness of library maintainers toward APIs Bloch:2006 (). This is also reinforced by Seo et al. Seo:2014 (), who found many cases where API breaking changes are only applied when unavoidable (i.e., in response to vulnerabilities, needed bug fixes, etc.). There are benefits to this awareness of client-used API breakages. In particular, careful evolution of APIs encourages trust and reduces the latency of adoption by client projects, which is currently experienced as a problem by many OSS clients KulaSANER2014 (). Larman LarmanPV () introduced the notion of the Protected Variation (PV) pattern: identify points of predicted variation and create a stable interface around them. This PV pattern could explain how contemporary developers build and evolve libraries in relation to client-used APIs.

6.2 Comparison to Literature

It is important to understand that our work cannot simply be compared at face value to prior studies. As outlined in Section 4.1, there are obvious differences between our approach and the studies of Dig and Johnson Dig2006 () and Cossette et al. Cossette2012 (). Dig and Johnson used the change logs as a heuristic to locate all API changes, and considered all public entities to be APIs. In this study, we detect syntactic changes in classes to infer changes and determine client usage to identify whether the change has an effect on its users. As a result, our approach is unable to detect behavioral API breakages. Dig and Johnson’s study included behavioral breakages, which we do not consider due to the limitations of our approach. Our definition of an API also differs from prior work: Dig and Johnson considered all public entities, while our work is more similar to that of Cossette et al., in which we include the detection of protected entities. In this work, we go further and use client usage to focus API breakage analysis on the more popular APIs.

The usage of tools revealed more API breakages, some of which were not reported in the API change logs, which is also consistent with the findings of Murphy-Hill et al. Murphy-Hill2009 (). These undocumented API changes could also explain the disparity in results between manual (i.e., the Dig and Johnson study) and mechanical refactoring detection. For mechanical refactoring detection, since Ref-Finder is a template-based refactoring reconstruction approach, we were only able to identify 23 out of the 70 refactorings of Fowler’s catalog. In fact, Cossette et al. Cossette2012 () also believed that tools would miss some behavioral refactorings, saying that they ‘…do not believe that some changes would be easily handled by mechanical transformation tools; instead the API maintainer, or the client developer would need to craft some minimal specification that would describe how to remap classes to accommodate these breaking changes.’ Another difference in our method that may have influenced the results is that we analyze API changes between consecutive versions, while prior work analyzed versions that were not consecutive.

Figure 10: Example of a misidentified refactoring-related API breaking change (Ref breaking class). Ref-Finder detects this code change between Guava versions 12 and 13 as an add_parameter R, while the API breakage tool reports it as a binary incompatible modified method.

6.3 Challenges of the Automated Approach

Key threats to the accuracy of the automated approach are when: (i) refactorings are missed by our approach (i.e., by Ref-Finder Soares2013 ()), (ii) developers do not report all API changes Murphy-Hill2009 () and (iii) a breaking API is misidentified, i.e., reported although it did not cause a breakage. Figure 10 presents an example of such a misidentification reported by our automated approach (the commit can be found at goo.gl/CwXoBj and the API change at https://goo.gl/VPPTIX). In this example, Ref-Finder detects the change as an add_parameter R that is also reported as API breaking, since there is a change in the method signature. However, the Java documentation indicates that this change in the superclass extends clause to a ‘type parameter of the class does not, in itself, have any implications for binary compatibility’. We believe these limitations will encourage researchers to further investigate and help us understand how developers evolve their libraries, especially in regard to avoidable API breakages.
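As a small illustration of why such a change can be binary compatible (a hypothetical class, simplified and not the actual Guava code from Figure 10): because generics are erased at compile time, introducing a type parameter changes the source-level declaration, which signature-based checks may flag, but not the erased binary signatures that pre-existing client binaries link against.

```java
// Old version (hypothetical):
//     public class Box { public Object get() {...} public void set(Object v) {...} }
// New version generifies the class:
//     public class Box<T>  { public T get() {...}      public void set(T v) {...} }
//
// After erasure the new methods still compile to get()Object and set(Object),
// so a client binary compiled against the old Box keeps linking: adding the
// type parameter is not, in itself, binary incompatible, even though source
// diffs (and signature-based checks) see a modified declaration.
public class ErasureDemo {
    public static class Box<T> {                 // generified version
        private T value;
        public T get() { return value; }                    // erased: Object get()
        public void set(T value) { this.value = value; }    // erased: set(Object)
    }

    public static void main(String[] args) {
        Box<String> b = new Box<>();
        b.set("still links after generification");
        System.out.println(b.get());
    }
}
```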

6.4 Threats to Validity

Internal Threats: The most significant internal threat is the correctness of the automated tools, especially Ref-Finder. To mitigate this, and as a sanity check, we randomly inspected a small sample of the results for validation. As mentioned earlier in the paper, an example of a false positive was an unchanged file reported as having an identified refactoring. In the end, we understand that recall is not as straightforward to investigate, as the ground truth is unknown. Ref-Finder is the current state of the art and is actively used in research.

Another minor threat to our approach is false positive API breakages caused by the class-level granularity of the analysis. Theoretically, an external API class that has a breakage related only to a private entity could be a false positive. However, even with this assumption in mind, our analysis may still be an underestimation. It is true that the accuracy of the saturation point is fairly dependent on the sample size; we believe our sampled clients are sufficient to at least identify the most popular APIs that reside in the client-used API classes. Sometimes variations between the refactored class path (originating from source code) and the API breakage class path (originating from binary code) may cause a mismatch. To overcome this, we manually validated the file paths to ensure consistency and completeness. Correct ordering of consecutive library releases is another minor threat; we therefore consider Maven Central MavenCentralURL () as the ground truth for the chronological ordering of the released versions of a library. Some of our conclusions are based on statistical analysis. Due to outliers and the nature of the collected data, non-parametric statistical tests were deemed appropriate.

External Threats: As an external threat, we understand that our collected clients and the eight selected OSS libraries are not necessarily complete representations of the real world. However, we believe that the diverse nature (such as size, domain, team) of the libraries is sufficient to assume generalization. Although our approximation of external APIs can only be justified through documentation and developers, we believe our method provides sufficient confidence of external client coverage. Another important threat is the selection of the more popular libraries. As a result, our findings may not be applicable to less popular libraries. In this study, we consider that both library developers and users are more concerned with popular APIs, as they tend to reach a larger client user base. Moreover, the same libraries that we study have been used in prior studies by researchers. As future work, we plan to expand our study to investigate more frameworks and libraries. Since our study is focused only on Java libraries, we cannot make generalizations to other programming languages. We are confident that our research method is scalable and can be replicated with different sets of clients and subject libraries in other languages.

7 Related Work

In this section, we introduce literature related to API usage, library migration support and library evolution.

API usage. There is a variety of work that has collected clients’ API usage. For example, De Roover et al. DeRoover2013 () exploit API usage to understand the popularity and usage patterns of clients. The collected data is visualized for further exploration, providing program comprehension as well as identifying patterns in the code. Another line of research uses API usage as a measure of stability or popularity McDonnell2013 (); Mileva:2009 (). Our previous work 2014VISSOFTKula (), among others, leveraged popularity to recommend when libraries are deemed safe to use by the masses. Other related work studied the impact of API evolution on clients in online forums such as Stack Overflow DBLP:conf/iwpc/VasquezBPOP14 () and in the Android app DBLP:journals/tse/BavotaVBPOP15 (), Pharo DBLP:conf/icsm/HoraRAEDV15 () and Smalltalk DBLP:conf/sigsoft/RobbesLR12 () ecosystems.

Library Migration Support. Much work has been done on transforming client code to support the migration of library API changes. Work by Chow and Notkin Chow:1996 () and Balaban et al. Balaban:2005 () uses a change specification language. Other work provides the client with automatic tool support to accommodate changes made to the APIs of a library. For instance, SemDiff Dagenais:2009 () recommends replacements for framework methods that were accessed by clients. Similar tools were proposed by Xing and Stroulia Xing2007 () and Schafer et al. Schafer:2008 (). Other work on reuse support is through code analysis. This line of work considers code clone detection techniques KamiyaTSE2002 () to identify which library version is the most appropriate candidate for migration. Godfrey et al. Godfrey2005 () proposed origin analysis to recover the context of code changes. Our previous work Kawamitsu2014 () tracked how code is reused across projects. Related work Kapur2010 () focused on support for clients migrating to a newer library version. Likewise, other works McDonnell2013 (); Malpohl:2000 (); Mezini:1997 (); Steyaert:1996 () studied how library maintainers balance API compatibility with an evolving library.

Library Evolution. There is similar work with respect to library maintenance and evolution. Cossette et al. Cossette2012 () manually illustrated the complexities of library changes and transformations. Other work, such as Kim et al. Kim2011 (), studied the role of refactoring during software evolution. Recently, large-scale empirical studies have been conducted on library migrations and evolution. Empirical studies by Raemaekers et al. Raemaekers2014 (); RaemaekersICSM (), Jezek et al. Jezek2015 () and Cox et al. Cox:2015 () studied in depth how libraries that reside in the Maven Central super-repository evolve and break APIs.

8 Conclusions and Future Work

Refactoring is a key maintainability practice, even for library maintainers. When evolving code, we find that library developers are less likely to break client-used APIs. However, we find that many of the API breaking changes relate to bug fixes and new features, with only up to 37% of client-used API breakages related to refactoring operations. The study also reveals challenges faced by current detection tools. As future work, we envision that this study will encourage more research into automated refactoring detection techniques, to advance our understanding of the impact of refactoring activities on API breakages.

9 Acknowledgments

This work is supported by JSPS KAKENHI (Grant Numbers JP25220003 and JP26280021) and the “Osaka University Program for Promoting International Joint Research”.

References
