Do Developers Update Their Library Dependencies?
Third-party library reuse has become common practice in contemporary software development, as it offers several benefits for developers. Library dependencies are constantly evolving, with newly added features and patches that fix bugs in older versions. To take full advantage of third-party reuse, developers should always keep up to date with the latest versions of their library dependencies. In this paper, we investigate the extent to which developers update their library dependencies. Specifically, we conducted an empirical study on library migration that covers over 4,600 GitHub software projects and 2,700 library dependencies. Results show that although many of these systems rely heavily on dependencies, 81.5% of the studied systems still keep their outdated dependencies. In the case of updating a vulnerable dependency, the study reveals that affected developers are not likely to respond to a security advisory. Surveying these developers, we find that 69% of the interviewees claim that they were unaware of their vulnerable dependencies. Furthermore, developers are not likely to prioritize library updates, citing them as extra effort and added responsibility. This study concludes that even though third-party reuse is commonplace, the practice of updating a dependency is not as common for many developers.
Keywords: software reuse, software maintenance, security vulnerabilities
In contemporary software development, developers often rely on third-party libraries to provide a specific functionality in their applications. In 2010, Sonatype reported that Maven Central (http://search.maven.org/) contained over 260,000 Maven libraries (link at http://goo.gl/SV9d68). As of November 2016, this collection had risen to 1,669,639 unique Maven libraries (statistics accessed Nov-26th-2016 at https://search.maven.org/#stats), which is almost six times more than in 2010, making it one of the largest hosting repositories of OSS libraries. Libraries aim to save both time and resources and reduce redundancy by taking advantage of existing quality implementations.
Many libraries are in constant evolution, releasing newer versions that fix defects, patch vulnerabilities and enhance features. In fact, Lehman:1996 states that software either ‘undergoes continual changes or becomes progressively less useful’. As software development transitions into the maintenance phase, a developer becomes the maintainer and is faced with the following software maintenance dilemma: ‘When should I update my current library dependencies?’ We define this dilemma of updating libraries as the library migration process, which involves movement from a specific library version towards a newer replacement version of the same library, or to a different library altogether.
The decision to migrate a library can range from being rather trivial to extremely difficult. Typically, a developer evaluates the overall quality of the new release version, taking into account: (i) new features, (ii) compatibility compared to the current version, (iii) popular usage by other systems and (iv) documentation, support and longevity provided by the library. On the other hand, migration of a vulnerable dependency requires an immediate response from the developer. It is strongly recommended to immediately migrate a vulnerable dependency, as it exposes the dependent application to malicious attacks. In response to these vulnerabilities, awareness mechanisms such as the Common Vulnerabilities and Exposures (CVE) database (http://cve.mitre.org/cve/index.html) have emerged to raise developer awareness and trigger the migration of a vulnerable dependency.
In this paper, we investigate the extent to which library migration is practiced in the real world. Our goals are to investigate (1) whether or not library dependencies are being updated and (2) the level of developer awareness of library migration opportunities. Specifically, we performed a large-scale empirical study to track library migrations between an application client (defined as a system) and its dependent library provider (defined as a library). The study encompasses 4,659 projects, 8 case studies and a developer survey to draw the following conclusions:
(1) Library Migration in Practice: Although systems depend heavily on libraries, findings show that many of these systems rarely update their library dependencies. Developers are less likely to migrate their library dependencies, with up to 81.5% of systems keeping outdated dependencies.
(2) Developer Responsiveness to Awareness Mechanisms: Our findings indicate patterns of either consistent migration or a lack of library migration. We find many cases where developers prefer an older and popular dependency over a newer replacement. Importantly, the study depicts developers as being non-responsive to a security advisory. In a follow-up survey of affected developers, 69% of the interviewees claim that they were unaware of the vulnerability; these developers then promptly migrated away from the vulnerable dependency. Furthermore, developers cite (i) a lack of awareness in regard to library migration opportunities, (ii) impact and priority of the dependency, and (iii) the assigned roles and responsibilities as deciding factors on whether or not they should migrate a library dependency.
Our main contributions are three-fold. Our first contribution is a study on library migration pertaining to developer responsiveness to existing awareness mechanisms (i.e., security advisory). Our second contribution is the modeling of library migration from system and library dimensions, with different metrics and visualizations such as the Library Migration Plot (LMP). Finally, we make available our dataset of 852,322 library dependency migrations. All our tools and data are publicly available from the paper’s replication package at https://raux.github.io/Impact-of-Security-Advisories-on-Library-Migrations/.
1.1 Paper Organization
The rest of the paper is organized as follows. Section 2 describes the basic concepts of library migrations and awareness mechanisms. Section 3 motivates our research questions, while Section 4 describes our research methods to address them. The results and case studies of the empirical study are presented in Section LABEL:sec:prac and Section LABEL:sec:LMT. We then discuss implications of our results and the validity threats in Section LABEL:sec:dis, with Section LABEL:sec:related surveying related works. Finally, Section LABEL:sec:conclude concludes our paper.
2 Basic Concepts & Definitions
In this section, we introduce the library migration process and the related terminologies that will be used in the paper. Building on our previous work of trusting the latest versions of libraries (KulaSANER2014) and visualizing the evolution of libraries (2014VISSOFTKula), this paper is concerned with empirically tracking library migration and understanding the awareness mechanisms that trigger the migration process. We first present the library migration process in Section 2.1. Then later in Section 2.2, we introduce two common awareness mechanisms that are designed to trigger a library migration.
2.1 The Library Migration Process
We identify the following three generic steps performed by a developer during the library migration process:
Step 1: Awareness of a Library Migration Opportunity. Step 1 is triggered when a developer becomes aware of an opportunity to migrate a specific dependency. The awareness mechanism may be in the form of either a new release announcement or a security advisory by authors of the library. For a migration to succeed, a developer must also identify a suitable replacement for the current dependency. In the case of a vulnerable dependency, a developer must identify a safe (patched) library version as a viable replacement candidate for the migration.
Step 2: Migration Effort to Facilitate the Replacement Dependency. Step 2 involves the efforts of a developer to ensure that the replacement dependency is successfully integrated into the system. Specifically, we define this migration effort as the amount of work and testing needed to facilitate the replacement dependency. This step may involve writing additional integration code and testing to make sure that the replacement library does not break current functionality, or affect other dependencies that co-exist within the system.
Step 3: Performing the Library Migration. Step 3 ends the library migration process. Once the migration effort in Step 2 is completed, the prior dependency is then abandoned, with the replacement library adopted by the system.
2.2 Library Migration Awareness Mechanisms
To trigger the library migration process, developers must first become aware of the necessity to migrate a dependency. In this section, we discuss the two most common types of awareness mechanisms: (1) a new version release announcement and (2) a security advisory.
(1) A New Release Announcement:
The traditional method to raise awareness of a new release is through an announcement from the official homepage of the library. Documentation such as the developer change logs are useful guides to estimate the migration effort needed to perform a successful migration. In detail, we can infer the migration effort required from the following two sources:
Change logs of releases - New releases may be caused by newer versions that support the state-of-the-art environments (e.g., support for the Java Development Kit (JDK)). Specific to the library, the change logs detail API changes between releases (Application Programming Interface (API) changes will result in more migration effort for developers), new features and fixes to bugs in the prior versions.
Semantic versioning of releases - The semantic versioning naming convention (http://semver.org/) hints at the migration effort needed to perform the migration. For instance, a major released version may require more migration effort than a minor released version of that library.
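To illustrate how the semantic versioning convention can hint at migration effort, the following is a minimal Python sketch that classifies an update by the first version component that differs. The function names `parse_version` and `classify_update` and the returned labels are illustrative assumptions and are not part of the study; the sketch also assumes plain MAJOR.MINOR.PATCH version strings, whereas real Maven versions may carry qualifiers that it does not handle.

```python
def parse_version(version):
    """Split a 'MAJOR.MINOR.PATCH' string into a tuple of three ints."""
    parts = version.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}")
    return tuple(int(p) for p in parts)


def classify_update(current, candidate):
    """Return 'major', 'minor', 'patch', or 'none' (hypothetical labels).

    'none' means the candidate is not newer than the current version.
    """
    cur = parse_version(current)
    new = parse_version(candidate)
    if new <= cur:
        return "none"
    if new[0] != cur[0]:
        return "major"  # likely the highest migration effort
    if new[1] != cur[1]:
        return "minor"
    return "patch"
```

Under these assumptions, `classify_update("1.2.3", "2.0.0")` yields `"major"`, which a maintainer might read as a signal to budget more integration and testing work than for a `"minor"` or `"patch"` update.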
(2) A Security Advisory:
A security advisory is an official public announcement of a verified vulnerable library dependency. Security advisories are circulated through various mail forums, special mailing lists and security forums with the key objective of raising developer awareness to these vulnerabilities. Figure 1 is an example of a mail announcement of the CVE-2014-0050 vulnerability sent to Apache Open Source developers and maintainers. Vendors and researchers keep track of each vulnerability through a tagged CVE Identifier (i.e., CVE-xxx-xxxx). Generally, the advisory contains the following information: (i) a description of the vulnerability, (ii) a list of affected dependencies and (iii) a set of mitigation steps that usually includes a viable (patched) replacement dependency.
In order to understand the required library migration effort, we first need to understand the role played by a security advisory in the life-cycle of a vulnerability. As defined by CVE, a vulnerability undergoes the following four phases:
Threat detection - this is the phase where the vulnerability threat is first discovered by security analysts.
CVE assessment - this is the phase where the threat is assessed and assigned a rating by the CVE.
Security advisory - this is the phase where the threat is publicly disclosed to awareness mechanisms such as the US National Vulnerability Database (NVD) (https://web.nvd.nist.gov/) to gain the attention of maintainers and developers.
Patch release - this is the phase where the library developers provide mitigation options, such as a replacement dependency to patch the threat.
Once a viable replacement dependency (i.e., patch release) becomes available, developers can proceed to complete the library migration process. There exist cases where the vulnerability life-cycle is not synchronized with the migration process. For instance, a viable replacement dependency may become available before the security advisory. In this case, a developer may migrate their vulnerable dependency before the security advisory is disclosed to the general public.
3 Research Questions
Our motivation stems from reports of outdated and vulnerable libraries being widespread in the software industry. In 2014, Heartbleed (https://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2014-0160), Poodle (https://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2014-3566) and Shellshock (https://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2014-6271), all high-profile library vulnerabilities, were found to have affected a significant portion of the software industry. In that same year, Sonatype determined that over 6% of the download requests from the Maven Central repository were for component versions that included known vulnerabilities. The company reported that in a review of over 1,500 applications, each of them had an average of 24 severe or critical flaws inherited from their components (report published January 02, 2015 at http://goo.gl/i8J1Zq).
The goals of our study are to investigate (1) whether or not dependencies are being updated and (2) the level of developer awareness of dependency migration opportunities. To do so, we design three research questions that involve a rigorous empirical study and a follow-up survey on the reasons why developers did not update their library dependencies. Hence, we first formulate (RQ1) to investigate library migration in practice:
Library Migration in Practice.
(RQ1) To what extent are developers updating their library dependencies? Prior studies have shown that developer responsiveness to library updates is slow and lagging. A study by Robbes:2012 shows how projects from the Smalltalk ecosystem exhibited a slower reaction to Application Programming Interface (API) updates. Similar results were observed for projects developed in the Pharo (hora:2015) and Java (Sawant2016) programming languages. Bavota:2015 studies how changes in an Application Programming Interface (API) may trigger library migrations within the ecosystem of Apache products. These studies are examples of current literature that has analyzed trends of library usage at the API level of abstraction.
In this work, we would like to better understand (i) the extent to which developers use third-party libraries and (ii) the migration trends of these libraries. Therefore, in (RQ1), we define and model library migration as evolving systems and their library dependencies at a higher abstraction than the API level.
In this study, we are particularly interested in the effect of awareness mechanisms on maintainers. Hence, (RQ2) and (RQ3) were formulated to investigate how developers respond to current awareness mechanisms:
Developer Responsiveness to Awareness Mechanisms.
(RQ2) What is the response to important awareness mechanisms such as a new release announcement and a security advisory on library updates? To fully utilize the benefits of a library, developers are recommended to respond immediately to a library migration opportunity. Therefore, in (RQ2) we study maintainer responsiveness to the awareness mechanisms of (i) new releases and (ii) security advisories.
(RQ3) Why are developers non-responsive to a security advisory? Studies show that influencing factors such as personal opinions, organizational structure or technical constraints (Bogart:SCGSE15; Plate:ICSME2015) determine whether or not a developer will migrate a dependency. In fact, these studies conclude that developers often ‘struggle’ with change, citing current awareness mechanisms as being insufficient. However, we conjecture that a vulnerable dependency warrants the immediate attention of all project members. Therefore, in (RQ3) we seek developer feedback to understand why developers would not respond to a vulnerable dependency threat.
4 Research Methods
In this section, we present the research methods used to address each of the three research questions. Firstly, to answer (RQ1), we conduct an empirical study by mining and reconstructing historic library migrations for a set of real-world projects. For (RQ2), we analyze case studies of library migrations pertaining to new releases and vulnerable dependency updates. Finally, to answer (RQ3), we interview developers who currently have vulnerable dependencies in their projects.
4.1 (RQ1) To what extent are developers updating their library dependencies?
Our research method to answer the first research question (RQ1) is a rigorous statistical analysis of library migration for real-world projects. Our method comprises three steps: (1) tracking system and dependency updates, (2) extraction and analysis of system and library dependency measures, and (3) data collection. The results of (RQ1) are presented in Section LABEL:sec:prac.
(1) Tracking System and Library Updates:
To accurately track dependency migrations, we define a model of system and library dependency relations, using the following notation. We write sys for a system and lib for a library. (lib,v) denotes version v of a library lib, and (sys,w) denotes version w of a system sys. Adoption of a library version (lib,v) by a system version (sys,w) creates a dependency relation between them.
Figure 2 illustrates the notation used to represent the dependency relations between systems and libraries over time. This model consists of the following systems and libraries:
Library A has 1 version (A,1).
System B has 2 versions (B,1) and (B,2).
Library C has 2 versions (C,1) and (C,2).
System D has 3 versions (D,1), (D,2) and (D,3).
Figure 2 depicts the following library dependency relationships as orange dotted lines. Below we list all dependencies between these systems and libraries at some point in time:
Library (A,1) is used as a dependency of system B.
Library (C,1) is used as a dependency of systems B and D.
Library (C,2) is used as a dependency of system D.
From a system perspective, our model is able to track how often maintainers update their libraries. Since a system version may contain multiple dependency migrations, we track the number of migrations that occur during one system update, which is denoted as DU.
Dependency Update (DU) is a count of library migrations that occur at one system version update.
Figure 2 depicts an example of a dependency update where, at the release of (B,2), one dependency update occurred (i.e., DU=1). We can see in the figure that for (B,2), a new dependency (A,1) is added while the existing dependency on (C,1) from (B,1) is kept.
From the alternative library viewpoint, our model is able to track library usage trends over time. We track the number of library migrations that occur within the universe of known systems to determine the usage of a library, which is denoted as LU.
Library Usage (LU) is the total population count of dependent systems at a specific point in time.
Figure 2 shows an example of the LU metrics. The figure shows that at an earlier point in time, the LU of (C,1) is two (B and D). However, at a later point in time, since (D,2) migrates its dependency to (C,2), the LU of (C,1) becomes one (B) while the LU of (C,2) is now one (D). Moreover, systems can depend on older versions of a library. This is modeled and shown in the figure as a line branching out from the original line of libraries. For instance, library C separates into two different branches because (C,1) is still being actively depended upon by other systems (i.e., (B,2)).
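The DU and LU definitions above can be sketched in a few lines of Python. This is only an illustrative reading of the model: the `HISTORY` data is a hypothetical reconstruction of the Figure 2 example (the dependencies of (D,1) and (D,3) are assumed), DU is operationalized as the number of dependencies present in a system version but absent from its predecessor, and the function names are our own.

```python
from collections import Counter

# Hypothetical reconstruction of the Figure 2 example: each system version
# (sys, w) maps to the set of library versions (lib, v) it depends on.
# The dependencies of (D,1) and (D,3) are assumptions made for illustration.
HISTORY = {
    ("B", 1): {("C", 1)},
    ("B", 2): {("A", 1), ("C", 1)},
    ("D", 1): {("C", 1)},
    ("D", 2): {("C", 2)},
    ("D", 3): {("C", 2)},
}


def dependency_update(history, sys, w):
    """DU: dependencies of (sys, w) that were absent from (sys, w - 1).

    The first known version of a system has no predecessor, so its DU is 0.
    """
    if (sys, w - 1) not in history:
        return 0
    return len(history[(sys, w)] - history[(sys, w - 1)])


def library_usage(history, latest):
    """LU: number of systems depending on each (lib, v).

    `latest` maps each system to the version that is current at the
    point in time of interest.
    """
    usage = Counter()
    for sys, w in latest.items():
        for dep in history[(sys, w)]:
            usage[dep] += 1
    return usage
```

With this data, `dependency_update(HISTORY, "B", 2)` gives a DU of 1, and `library_usage(HISTORY, {"B": 2, "D": 2})` reports an LU of one for both (C,1) and (C,2), matching the narrative of the example.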
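Table 1: Summary of the system and library metrics.
| Metric | Dimension | Name | Description |
|---|---|---|---|
| m1 | System | Dep. Per System (#Dep.) | # dependencies |
| m2 | System | Dep. Update Per System (DU) | # dependencies updated |
| m3 | Library | Library Usage (LU) | # library users |
| m4 | Library | Peak LU | max. # library users |
| m5 | Library | Current LU | current # library users |
| m6 | Library | Pre-Peak | time to reach Peak LU |
| m7 | Library | Post-Peak | time after Peak LU |
| m8 | Library | Library Residue | % remaining systems after Peak LU |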
(2) Analysis Method:
Table 1 provides a summary of the metrics provided by our model. To fully understand this phenomenon, we analyze library migrations from both the system and library dimensions.
From the system dimension, we use system metrics to investigate the distribution of dependencies per system (m1) and the frequency of library migrations per system (m2). First, we utilize boxplots and descriptive statistics to report the median and mean for each metric. We then test the hypothesis that systems with more dependencies tend to have more frequent updates. We employ the Spearman and Pearson correlation tests (Edgell84) to determine any correlation between metrics m1 and m2. A high correlation score supports the assumption that more complex systems tend to have more updates, while a low correlation supports the alternative that the number of library dependencies does not influence the frequency of updates.
From the library dimension, we investigate how the migration away from a specific library dependency spreads over time. This work is inspired by the Diffusion of Innovation curves (DoI), which seek to explain how, why, and at what rate new ideas and technology spread. Figure 3 is a visual example of the LU metrics from Table 1. We utilize the LU metrics to study (i) the LU trends (i.e., whether or not a library dependency is gaining or losing system users) and (ii) the rate of decline after system users begin to migrate away from the dependency. Based on the LU (m3) metric, Figure 3 introduces a simple example of the derived LU metrics that characterize a LU trend:
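As an illustration of the correlation test between m1 and m2, the following self-contained Python sketch computes the Pearson coefficient and the Spearman coefficient (as Pearson on average ranks, so that ties are handled). It is not the analysis script used in the study; the function names are our own, and any input data would be hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson applied to the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))
```

Here `xs` would hold the number of dependencies per system (m1) and `ys` the number of dependency updates per system (m2). Spearman only captures monotonic association, while Pearson measures linear association, which is one reason to report both.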
LU counts - The Peak LU (m4) metric describes the maximum population count of user systems reached by a dependency. The Current LU (m5) is a related metric that describes the latest population count of user systems that actively use this dependency in their systems.
LU over time - The Pre-Peak (m6) metric refers to the time taken (in days) for a dependency to reach its Peak LU. Conversely, the Post-Peak (m7) metric refers to the time passed (in days) since the Peak LU was reached.
LU rate after Peak LU - The Library Residue (m8) metric describes the percentage of user systems remaining after Peak LU (m4) has been reached for a dependency (i.e., Current LU (m5) / Peak LU (m4)).
In Figure 3, we show the LU metrics as a LU trending curve. In detail, we find that the Peak LU is 5 users at t1, with the Current LU at 2 users. Pre-Peak is the period from the starting point to the peak at t1, and Post-Peak is the time from t1 to the current point in time. Quantitatively, we conjecture that the low Library Residue (i.e., 40% (2/5)) indicates that a developer using this dependency should consider migration towards a replacement dependency.
To address the library dimension of (RQ1), we present four statistical analyses to report the LU trends. First, we use a cumulative frequency distribution graph to understand the distribution of popular library versions (m4 and m5). We then use a cumulative distribution to measure the average time for libraries to reach their peak usages (m6 and m7). Third, we use boxplots to measure the distribution of the Library Residue metric (m8). Finally, we plot and analyze the number of system dependencies and their Library Residue.
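The derived LU metrics (m4 to m8) can be computed from a time series of LU observations, as in the following minimal Python sketch. The function name `lu_metrics`, the dictionary keys, and the representation of the series as `(day, lu)` pairs are assumptions made for illustration; when the peak value occurs more than once, the sketch takes its first occurrence.

```python
def lu_metrics(series):
    """Derive m4..m8 from a list of (day, lu) pairs ordered by day."""
    if not series:
        raise ValueError("series must contain at least one observation")
    start_day = series[0][0]
    current_day, current_lu = series[-1]
    # max() returns the first maximal element, i.e. the earliest peak.
    peak_day, peak_lu = max(series, key=lambda point: point[1])
    if peak_lu == 0:
        raise ValueError("library was never used; residue is undefined")
    return {
        "peak_lu": peak_lu,                        # m4
        "current_lu": current_lu,                  # m5
        "pre_peak_days": peak_day - start_day,     # m6
        "post_peak_days": current_day - peak_day,  # m7
        "library_residue": current_lu / peak_lu,   # m8 = m5 / m4
    }
```

For a hypothetical series such as `[(0, 1), (30, 3), (60, 5), (90, 4), (120, 2)]`, the Peak LU is 5 and the Current LU is 2, so the Library Residue is 2/5, as in the Figure 3 example; the day values themselves are invented.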
(3) Data Collection:
It is important that we test our approach on a quality set of real-world projects to improve confidence in our results. Therefore, we conducted a large-scale empirical evaluation of software systems and library migrations, focusing on popular Java projects that use Maven libraries as their third-party dependencies. We mine and collect projects that reside in GitHub (https://github.com/) as the source of our dataset. To ensure that our dataset is a quality representation of real-world applications, we enforce the following pre-processing data quality filters:
Projects that are mature and well-maintained - The first quality filter is to ensure that migrations are indicative of active and large-scale projects that are hosted on GitHub (i.e., removing toy projects). Hence, we select projects that had more than 100 commits and had at least one commit between January 2015 and November 2015.
Projects that are unique and not duplicates - The second quality filter is to ensure that no duplicates exist within the collected dataset. Hence, we semi-automatically inspect repository names to validate that none of the projects are forks from other projects (i.e., same project name in different repository).
Projects that use a dependency management tool - We conjecture that projects managed by a dependency management tool are more likely to consider library migration practices. Therefore, the third filter distinguishes projects that implement a dependency management tool such as the Maven dependency management tool. For a Maven dependency, every project in the Maven repository includes a Project Object Model file (i.e., pom.xml) that describes the project’s configuration meta-data, including its compile and run time dependencies.
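To make the role of the pom.xml file concrete, the following Python sketch extracts the declared dependencies from a POM using the standard library XML parser. It is an assumption-laden illustration rather than the extraction tooling of the study: `SAMPLE_POM` is an invented project, `extract_dependencies` is a hypothetical helper, and the sketch only reads dependencies declared directly under the project element, without resolving property placeholders, parent POMs or managed versions.

```python
import xml.etree.ElementTree as ET

# Namespace used by Maven POM files (model version 4.0.0).
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}

# Invented example project, used only to exercise the sketch.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.example</groupId>
  <artifactId>demo-system</artifactId>
  <version>1.0.0</version>
  <dependencies>
    <dependency>
      <groupId>commons-fileupload</groupId>
      <artifactId>commons-fileupload</artifactId>
      <version>1.3.1</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.12</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
</project>
"""


def extract_dependencies(pom_text):
    """Return (groupId, artifactId, version) for each declared dependency.

    The version is None when the POM does not state it explicitly.
    """
    root = ET.fromstring(pom_text)
    deps = []
    for dep in root.findall("m:dependencies/m:dependency", POM_NS):
        group = dep.findtext("m:groupId", default="", namespaces=POM_NS)
        artifact = dep.findtext("m:artifactId", default="", namespaces=POM_NS)
        version = dep.findtext("m:version", default=None, namespaces=POM_NS)
        deps.append((group, artifact, version))
    return deps
```

Running `extract_dependencies` on successive revisions of a project's pom.xml and comparing the resulting lists is one plausible way to observe when a dependency version changes between system versions.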