Do Developers Update Their Library Dependencies?

An Empirical Study on the Impact of Security Advisories on Library Migration
Raula Gaikovina Kula, Daniel M. German, Ali Ouni, Takashi Ishio and Katsuro Inoue
Raula Gaikovina Kula, Ali Ouni, Takashi Ishio and Katsuro Inoue: Osaka University, Japan, email: {raula-k, ali, ishio, inoue}
Daniel M. German: University of Victoria, Canada

Third-party library reuse has become common practice in contemporary software development, as it offers several benefits for developers. Library dependencies are constantly evolving, with newly added features and patches that fix bugs in older versions. To take full advantage of third-party reuse, developers should always keep up to date with the latest versions of their library dependencies. In this paper, we investigate the extent to which developers update their library dependencies. Specifically, we conducted an empirical study on library migration that covers over 4,600 GitHub software projects and 2,700 library dependencies. Results show that although many of these systems rely heavily on dependencies, 81.5% of the studied systems still keep their outdated dependencies. In the case of updating a vulnerable dependency, the study reveals that affected developers are not likely to respond to a security advisory. Surveying these developers, we find that 69% of the interviewees claim that they were unaware of their vulnerable dependencies. Furthermore, developers are not likely to prioritize library updates, citing them as extra effort and added responsibility. This study concludes that even though third-party reuse is commonplace, the practice of updating a dependency is not as common for many developers.

Keywords: software reuse, software maintenance, security vulnerabilities

1 Introduction

In contemporary software development, developers often rely on third-party libraries to provide specific functionality in their applications. In 2010, Sonatype reported that Maven Central contained over 260,000 Maven libraries. As of November 2016 (statistics accessed November 26, 2016), this collection had grown to 1,669,639 unique Maven libraries, roughly six times the 2010 figure, making Maven Central one of the largest hosting repositories of OSS libraries. Libraries aim to save both time and resources and to reduce redundancy by taking advantage of existing quality implementations.

Many libraries are in constant evolution, releasing newer versions that fix defects, patch vulnerabilities and enhance features. In fact, Lehman:1996 states that software either ‘undergoes continual changes or becomes progressively less useful’. As software development transitions into the maintenance phase, a developer becomes the maintainer and is faced with the following software maintenance dilemma: ‘When should I update my current library dependencies?’ We define this dilemma of updating libraries as the library migration process, which involves movement from a specific library version towards a newer replacement version of the same library, or to a different library altogether.

The decision to migrate a library can range from being rather trivial to extremely difficult. Typically, a developer evaluates the overall quality of the new release version, taking into account: (i) new features, (ii) compatibility compared to the current version, (iii) popular usage by other systems and (iv) documentation, support and longevity provided by the library. On the other hand, migration of a vulnerable dependency requires an immediate response from the developer. It is strongly recommended to immediately migrate a vulnerable dependency, as it exposes the dependent application to malicious attacks. In response to these vulnerabilities, awareness mechanisms such as the Common Vulnerabilities and Exposures (CVE) database have emerged to raise developer awareness and trigger the migration of a vulnerable dependency.

In this paper, we investigate the extent to which library migration is practiced in the real world. Our goals are to investigate (1) whether or not library dependencies are being updated and (2) the level of developer awareness of library migration opportunities. Specifically, we performed a large-scale empirical study to track library migrations between an application client (defined as a system) and its dependent library provider (defined as a library). The study encompasses 4,659 projects, 8 case studies and a developer survey to draw the following conclusions:

(1) Library Migration in Practice: Although systems depend heavily on libraries, findings show that many of these systems rarely update their library dependencies. Developers are less likely to migrate their library dependencies, with up to 81.5% of systems keeping outdated dependencies.

(2) Developer Responsiveness to Awareness Mechanisms: Our findings indicate patterns of either consistent migration or a lack of library migration. We find many cases where developers prefer an older and popular dependency over a newer replacement. Importantly, the study depicts developers as being non-responsive to a security advisory. In a follow-up survey of affected developers, 69% of the interviewees claim that they were unaware of the vulnerability, and they then promptly migrated away from the vulnerable dependency. Furthermore, developers cite (i) a lack of awareness in regard to library migration opportunities, (ii) impact and priority of the dependency, and (iii) the assigned roles and responsibilities as deciding factors on whether or not they should migrate a library dependency.

Our main contributions are three-fold. Our first contribution is a study on library migration pertaining to developer responsiveness to existing awareness mechanisms (i.e., security advisory). Our second contribution is the modeling of library migration from system and library dimensions, with different metrics and visualizations such as the Library Migration Plot (LMP). Finally, we make available our dataset of 852,322 library dependency migrations. All our tools and data are publicly available in the paper’s replication package.

1.1 Paper Organization

The rest of the paper is organized as follows. Section 2 describes the basic concepts of library migrations and awareness mechanisms. Section 3 motivates our research questions, while Section 4 describes our research methods to address them. The results and case studies of the empirical study are presented in Section LABEL:sec:prac and Section LABEL:sec:LMT. We then discuss implications of our results and the validity threats in Section LABEL:sec:dis, with Section LABEL:sec:related surveying related works. Finally, Section LABEL:sec:conclude concludes our paper.

2 Basic Concepts & Definitions

In this section, we introduce the library migration process and the related terminologies that will be used in the paper. Building on our previous work of trusting the latest versions of libraries (KulaSANER2014) and visualizing the evolution of libraries (2014VISSOFTKula), this paper is concerned with empirically tracking library migration and understanding the awareness mechanisms that trigger the migration process. We first present the library migration process in Section 2.1. Then later in Section 2.2, we introduce two common awareness mechanisms that are designed to trigger a library migration.

2.1 The Library Migration Process

We identify these three generic steps performed by a developer during the library migration process:

  • Step 1: Awareness of a Library Migration Opportunity. Step 1 is triggered when a developer becomes aware of an opportunity to migrate a specific dependency. The awareness mechanism may be in the form of either a new release announcement or a security advisory by authors of the library. In order for a successful migration, a developer must also identify a suitable replacement for the current dependency. In the case of a vulnerable dependency, a developer must identify a safe (patched) library version as a viable replacement candidate for the migration.

  • Step 2: Migration Effort to Facilitate the Replacement Dependency. Step 2 involves the efforts of a developer to ensure that the replacement dependency is successfully integrated into the system. Specifically, we define this migration effort as the amount of work and testing needed to facilitate the replacement dependency. This step may involve writing additional integration code and testing to make sure that the replacement library does not break current functionality, or affect other dependencies that co-exist within the system.

  • Step 3: Performing the Library Migration. Step 3 ends the library migration process. Once the migration effort in Step 2 is completed, the prior dependency is then abandoned, with the replacement library adopted by the system.

2.2 Library Migration Awareness Mechanisms

To trigger the library migration process, developers must first become aware of the necessity to migrate a dependency. In this section, we discuss the two most common types of awareness mechanisms that include (1) a new version release announcement and (2) a security advisory.

Figure 1: Example of a security advisory related to CVE-2014-0050 that was posted in the Apache common developers mailing list.

(1) A New Release Announcement:

The traditional method to raise awareness of a new release is through an announcement from the official homepage of the library. Documentation such as the developer change logs are useful guides to estimate the migration effort needed to perform a successful migration. In detail, we can infer the migration effort required from the following two sources:

  1. Change logs of releases - New releases may be prompted by the need to support state-of-the-art environments (e.g., the Java Development Kit (JDK)). Specific to the library, the change logs detail API changes between releases (Application Programming Interface (API) changes result in more migration effort for developers), new features and fixes to bugs in prior versions.

  2. Semantic versioning of releases - The semantic versioning naming convention hints at the migration effort needed to perform the migration. For instance, a major release may require more migration effort than a minor release of that library.
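
The semantic versioning hint described above can be illustrated with a minimal sketch. It assumes purely numeric MAJOR.MINOR.PATCH version strings with an optional qualifier after a hyphen; the function names `parse_version` and `update_kind` are hypothetical and are not part of the study's tooling. Real Maven versions can be less regular than this.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH', ignoring a qualifier such as '-SNAPSHOT'."""
    core = version.split("-", 1)[0]
    parts = [int(p) for p in core.split(".")]
    parts += [0] * (3 - len(parts))  # pad '1.1' to (1, 1, 0)
    return parts[0], parts[1], parts[2]


def update_kind(current: str, replacement: str) -> str:
    """Classify a migration as 'major', 'minor', 'patch' or 'none'."""
    old, new = parse_version(current), parse_version(replacement)
    if new <= old:
        return "none"  # not a move to a newer numeric version
    if new[0] != old[0]:
        return "major"  # likely the highest migration effort
    if new[1] != old[1]:
        return "minor"
    return "patch"


print(update_kind("1.9.1", "1.9.2"))
```

Under this assumption, a "major" result would suggest reading the change log for breaking API changes before migrating, whereas a "patch" result hints at a lower-effort update.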

(2) A Security Advisory:

A security advisory is an official public announcement of a verified vulnerable library dependency. Security advisories are circulated through various mail forums, special mailing lists and security forums with the key objective of raising developer awareness of these vulnerabilities. Figure 1 is an example of a mail announcement of the CVE-2014-0050 vulnerability sent to Apache Open Source developers and maintainers. Vendors and researchers keep track of each vulnerability through a tagged CVE Identifier (i.e., CVE-xxxx-xxxx). Generally, the advisory contains the following information: (i) a description of the vulnerability, (ii) a list of affected dependencies and (iii) a set of mitigation steps that usually includes a viable (patched) replacement dependency.

In order to understand the required library migration effort, we first need to understand the role played by a security advisory in the life-cycle of a vulnerability. As defined by CVE, a vulnerability undergoes the following four phases:

  1. Threat detection - this is the phase where the vulnerability threat is first discovered by security analysts.

  2. CVE assessment - this is the phase where the threat is assessed and assigned a rating by the CVE.

  3. Security advisory - this is the phase where the threat is publicly disclosed to awareness mechanisms such as the US National Vulnerability Database (NVD) to gain the attention of maintainers and developers.

  4. Patch release - this is the phase where the library developers provide mitigation options, such as a replacement dependency to patch the threat.

Once a viable replacement dependency (i.e., patch release) becomes available, developers can proceed to complete the library migration process. There exist cases where the vulnerability life-cycle is not synchronized with the migration process. For instance, a viable replacement dependency may become available before the security advisory. In this case, a developer may migrate their vulnerable dependency before the security advisory is disclosed to the general public.

3 Research Questions

Our motivation stems from reports of outdated and vulnerable libraries being widespread in the software industry. In 2014, Heartbleed, Poodle and Shellshock, all high-profile library vulnerabilities, were found to have affected a significant portion of the software industry. In that same year, Sonatype determined that over 6% of the download requests from the Maven Central repository were for component versions that included known vulnerabilities. The company reported that, in a review of over 1,500 applications, each had an average of 24 severe or critical flaws inherited from its components (report published January 2, 2015).

The goals of our study are to investigate (1) whether or not dependencies are being updated and (2) the level of developer awareness of dependency migration opportunities. To do so, we design three research questions that involve a rigorous empirical study and a follow-up survey on the reasons why developers did not update their library dependencies. Hence, we first formulate (RQ1) to investigate library migration in practice:

Library Migration in Practice.

  • (RQ1) To what extent are developers updating their library dependencies? Prior studies have shown that developer responsiveness to library updates is slow and lagging. A study by Robbes:2012 shows how projects from the Smalltalk ecosystem exhibited a slower reaction to Application Programming Interface (API) updates. Similar results were observed for projects developed in the Pharo (hora:2015) and Java (Sawant2016) programming languages. Bavota:2015 studies how changes in an Application Programming Interface (API) may trigger library migrations within the ecosystem of Apache products. These studies are examples of current literature that has analyzed trends of library usage at the API level of abstraction.

    In this work, we would like to better understand (i) the extent to which developers use third-party libraries and (ii) the migration trends of these libraries. Therefore, in (RQ1), we define and model library migration as evolving systems and their library dependencies at a higher abstraction than the API level.

In this study, we are particularly interested in the effect of awareness mechanisms on maintainers. Hence, (RQ2) and (RQ3) were formulated to investigate how developers respond to current awareness mechanisms:

Developer Responsiveness to Awareness Mechanisms.

  • (RQ2) What is the response to important awareness mechanisms such as a new release announcement and a security advisory on library updates? To fully utilize the benefits of a library, developers are advised to respond immediately to a library migration opportunity. Therefore, in (RQ2) we study maintainer responsiveness to the awareness mechanisms of (i) new releases and (ii) security advisories.

  • (RQ3) Why are developers non-responsive to a security advisory? Studies show that influencing factors such as personal opinions, organizational structure or technical constraints (Bogart:SCGSE15; Plate:ICSME2015) determine whether or not a developer will migrate a dependency. In fact, these studies conclude that developers often ‘struggle’ with change, citing current awareness mechanisms as being insufficient. However, we conjecture that a vulnerable dependency warrants the immediate attention of all project members. Therefore, in (RQ3) we seek developer feedback to understand why developers would not respond to a vulnerable dependency threat.

4 Research Methods

In this section, we present the research methods used to address each of the three research questions. Firstly, to answer (RQ1), we conduct an empirical study by mining and reconstructing historic library migrations for a set of real-world projects. For (RQ2), we analyze case studies of library migrations pertaining to new releases and vulnerable dependency updates. Finally to answer (RQ3), we interview developers who currently have vulnerable dependencies in their projects.

4.1 (RQ1) To what extent are developers updating their library dependencies?

Our research method to answer the first research question (RQ1) is a rigorous statistical analysis of library migration for real-world projects. Our method comprises three steps: (1) tracking system and dependency updates, (2) extraction and analysis of system and library dependency measures and (3) data collection. The results of (RQ1) are presented in Section LABEL:sec:prac.

Figure 2: Library migration between systems and libraries. The orange arrow depicts dependency relations between them.

(1) Tracking System and Library Updates:

To accurately track dependency migrations, we define a model of system and library dependency relations, using the following notation. We write sys for a system and lib for a library. (lib,v) denotes version v of a library lib, and (sys,w) denotes version w of a system sys. Adoption of a library version (lib,v) by a system version (sys,w) creates a dependency relation between them.

Figure 2 illustrates the notation used to represent the dependency relations between systems and libraries over time. This model consists of the following systems and libraries:

  • Library A has 1 version (A,1).

  • System B has 2 versions (B,1) and (B,2).

  • Library C has 2 versions (C,1) and (C,2).

  • System D has 3 versions (D,1), (D,2) and (D,3).

Figure 2 depicts the following library dependency relationships as an orange dotted line. Below we list all dependencies between these systems and libraries at some point in time:

  • Library (A,1) is used as a dependency of system B.

  • Library (C,1) is used as a dependency of systems B and D.

  • Library (C,2) is used as a dependency of system D.

From a system perspective, our model is able to track how often maintainers update their libraries. Since a system version may contain multiple dependency migrations, we track the number of migrations that occur during one system update, which is denoted as DU.

Dependency Update (DU) is a count of library migrations that occur at one system version update.

Figure 2 depicts an example of a DU where, at the release of (B,2), one dependency update occurred (i.e., DU=1). We can see in the figure that, for (B,2), a new dependency (A,1) is added while the (C,1) dependency of (B,1) is kept.
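
As an illustration of the DU count, the following minimal sketch compares the dependency sets of two consecutive system versions. It assumes that each system version is represented as a mapping from library name to version, and that both added and changed dependencies count as updates (as in the (B,2) example); removed dependencies are ignored here. The function name is hypothetical.

```python
from typing import Dict

Deps = Dict[str, str]  # library name -> library version in use


def dependency_update_count(before: Deps, after: Deps) -> int:
    """Count dependencies that are new or changed version at a system update."""
    return sum(1 for lib, ver in after.items() if before.get(lib) != ver)


# System B from Figure 2: (B,1) uses (C,1); (B,2) adds (A,1) and keeps (C,1).
b1 = {"C": "1"}
b2 = {"C": "1", "A": "1"}
print(dependency_update_count(b1, b2))
```

The same function applied to system D, which moves from (C,1) to (C,2), would also report a single update.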

From the alternative library viewpoint, our model is able to track library usage trends over time. We track the number of library migrations that occur within the universe of known systems to determine the usage of a library, which is denoted as LU.

Library Usage (LU) is the total population count of dependent systems at a specific point in time.

Figure 2 shows an example of the LU metrics. The figure shows that, at one point in time, the LU of (C,1) is two (B and D). However, at a later point in time, since (D,2) migrates its dependency to (C,2), the LU of (C,1) becomes one (B) while the LU of (C,2) is now one (D). Moreover, systems can depend on older versions of a library. This is modeled and shown in the figure as a line branching out from the original line of libraries. For instance, library C separates into two different branches because (C,1) is still being actively depended upon by other systems (i.e., (B,2)).
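
The LU count can be sketched in a similar way. This example assumes that, for a given point in time, we know the dependencies of the latest version of every system; the data structure and the function name `library_usage` are illustrative assumptions rather than the study's implementation.

```python
from collections import Counter
from typing import Dict, Tuple

LibVersion = Tuple[str, str]  # (library name, version)


def library_usage(latest_deps: Dict[str, Dict[str, str]]) -> Counter:
    """Count, per library version, how many systems depend on it at one point in time."""
    usage: Counter = Counter()
    for deps in latest_deps.values():
        for lib, ver in deps.items():
            usage[(lib, ver)] += 1
    return usage


# Figure 2, earlier snapshot: both B and D depend on (C,1).
earlier = {"B": {"C": "1"}, "D": {"C": "1"}}
# Later snapshot: B adds (A,1), while D has migrated to (C,2).
later = {"B": {"A": "1", "C": "1"}, "D": {"C": "2"}}

print(library_usage(earlier)[("C", "1")], library_usage(later)[("C", "1")])
```

Evaluating such snapshots at successive points in time yields the LU trend for each library version.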

Alias | Dimension | Metric                      | Brief Description
m1    | System    | Dep. Per System (#Dep.)     | # Dependencies
m2    | System    | Dep. Update Per System (DU) | # Dependencies updated
m3    | Library   | Library Usage (LU)          | # library users
m4    | Library   | Peak LU                     | max. # library users
m5    | Library   | Current LU                  | current # library users
m6    | Library   | Pre-Peak                    | time to reach Peak LU
m7    | Library   | Post-Peak                   | time after Peak LU
m8    | Library   | Library Residue             | % remaining systems after Peak LU
Table 1: Summary of System and Library migration metrics defined for (RQ1). Note we use dep. = Dependencies and ver. = version
Figure 3: Simple example of the LU-based metrics. We show the Peak LU at time t1, Current LU at time t2 and Library Residue (Current LU / Peak LU).

(2) Analysis Method:

Table 1 provides a summary of the metrics provided by our model. To fully understand this phenomenon, we analyze library migrations from both the system and library dimensions.

From the system dimension, we use system metrics to investigate the distribution of dependencies per system (m1) and the frequency of library migrations per system (m2). First, we utilize boxplots and descriptive statistics to report the median and mean for each metric. We then test the hypothesis that systems with more dependencies tend to have more frequent updates. We employ the Spearman and Pearson correlation tests (Edgell84) to determine any correlation between metrics m1 and m2. A high correlation score supports the assumption that more complex systems tend to have more updates, while a low correlation supports the alternative that the number of library dependencies does not influence the frequency of updates.
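
To make the correlation step concrete, the sketch below computes Pearson and Spearman coefficients in plain Python. The per-system values for m1 and m2 are invented purely for illustration and are not taken from the study's dataset; Spearman is computed as the Pearson coefficient of average ranks, and neither function handles constant inputs.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical systems: number of dependencies (m1) and dependency updates (m2).
m1 = [2, 5, 9, 14, 30]
m2 = [0, 1, 1, 3, 4]
print(round(pearson(m1, m2), 3), round(spearman(m1, m2), 3))
```

Reporting both coefficients is a common precaution: Spearman only assumes a monotonic relation, whereas Pearson measures linear association and is more sensitive to outliers such as systems with very many dependencies.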

From the library dimension, we investigate how the migration away from a specific library dependency spreads over time. This work is inspired by the Diffusion of Innovation curves (DoI), which seeks to explain how, why, and at what rate new ideas and technology spreads. Figure 3 is a visual example of the LU metrics from Table 1. We utilize the LU metrics to study the (i) LU trends (i.e., whether or not a library dependency is gaining or losing system users) and the (ii) rate of decline after system users begin to migrate away from the dependency. Based on the LU (m3) metric, Figure 3 introduces a simple example of the derived LU metrics that characterize a LU trend:

  • LU counts - The Peak LU (m4) metric describes the maximum population count of user systems reached by a dependency. The Current LU (m5) is a related metric that describes the latest population count of user systems that actively use this dependency in their systems.

  • LU over time - The Pre-Peak (m6) metric refers to the time taken for a dependency to reach a peak LU (days). Conversely, Post-Peak (m7) metric refers to the time passed since the peak LU was reached (days).

  • LU rate after Peak LU - The Library Residue (m8) metric describes the percentage of user systems remaining after Peak LU (m4) has been reached for a dependency (i.e., Current LU (m5) / Peak LU (m4)).

In Figure 3, we show the LU metrics as a LU trending curve. In detail, we find that the Peak LU is 5 users at t1, with the Current LU at 2 users at t2. Pre-Peak is the period from the starting point of the curve to t1, and Post-Peak is the period from t1 to t2. Quantitatively, we conjecture that the low Library Residue (i.e., 40% (2/5)) indicates that a developer using this dependency should consider migration towards a replacement dependency.
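
The derived metrics m4 to m8 can be computed from an LU time series as in the following minimal sketch. The dates are hypothetical and chosen only so that the LU values reproduce the Figure 3 example (Peak LU of 5, Current LU of 2); the function name `lu_metrics` is likewise an assumption.

```python
from datetime import date
from typing import Dict, List, Tuple


def lu_metrics(series: List[Tuple[date, int]]) -> Dict[str, float]:
    """Derive Peak LU, Current LU, Pre-Peak, Post-Peak and Library Residue.

    `series` must be in chronological order; on ties the first peak is used.
    """
    start_day, _ = series[0]
    current_day, current_lu = series[-1]
    peak_day, peak_lu = max(series, key=lambda point: point[1])
    return {
        "peak_lu": peak_lu,                                   # m4
        "current_lu": current_lu,                             # m5
        "pre_peak_days": (peak_day - start_day).days,         # m6
        "post_peak_days": (current_day - peak_day).days,      # m7
        "library_residue_pct": 100.0 * current_lu / peak_lu,  # m8
    }


# Hypothetical observations for one library version.
series = [
    (date(2014, 1, 1), 1),
    (date(2014, 4, 1), 5),  # peak (t1)
    (date(2014, 9, 1), 3),
    (date(2015, 1, 1), 2),  # latest observation (t2)
]
print(lu_metrics(series))
```

With these values the Library Residue is 2/5, i.e., 40%, matching the worked example in the text.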

To address the library dimension of (RQ1), we present four statistical analyses to report the LU trends. First, we use a cumulative frequency distribution graph to understand the distribution of popular library versions (m4 and m5). We then use a cumulative distribution to measure the average time for libraries to reach their peak usage (m6 and m7). Third, we use boxplots to measure the distribution of the Library Residue metric (m8). Finally, we plot and analyze the number of system dependencies and their Library Residue.

(3) Data Collection:

It is important that we test our approach on a quality set of real-world projects to improve confidence in our results. Therefore, we conducted a large-scale empirical evaluation of software systems and library migrations, focusing on popular Java projects that use Maven libraries as their third-party dependencies. We mine and collect projects that reside in GitHub as the source of our dataset. To ensure that our dataset is a quality representation of real-world applications, we enforce the following pre-processing data quality filters:

  • Projects that are mature and well-maintained - The first quality filter ensures that migrations are indicative of active and large-scale projects that are hosted on GitHub (i.e., removing toy projects). Hence, we select projects that had more than 100 commits and at least one commit between January 2015 and November 2015.

  • Projects that are unique and not duplicates - The second quality filter is to ensure that no duplicates exist within the collected dataset. Hence, we semi-automatically inspect repository names to validate that none of the projects are forks from other projects (i.e., same project name in different repository).

  • Projects that use a dependency management tool - We conjecture that projects managed by a dependency management tool are more likely to consider library migration practices. Therefore, the third filter selects projects that use a dependency management tool such as Maven. For a Maven dependency, every project in the Maven repository includes a Project Object Model file (i.e., pom.xml) that describes the project’s configuration meta-data, including its compile and run time dependencies.

    ...
    <groupId>GitWalker</groupId>
    <artifactId>GitWalker</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>GitWalker</name>
    ...
    <dependencies>
      <dependency>
        <groupId></groupId>
        <artifactId>javaparser</artifactId>
        <version>1.0.8</version>
      </dependency>
      <dependency>
        <groupId>org.gitective</groupId>
        <artifactId>gitective-core</artifactId>
        <version>0.9.9</version>
      </dependency>
    </dependencies>
    ...

    Listing 1 shows a pom.xml, which lists the dependency relationships between a particular system version and any valid Maven library version. In this example, we extract the dependency relations for system (GitWalker,0.0.1-SNAPSHOT), which uses the (javaparser,1.0.8) and (gitective-core,0.9.9) dependencies. To automatically extract the history of dependency migrations for a project, we mine the historic changes of the pom.xml. We package our method in a tool called PomWalker.

  • Popular and latest dependency versions - LU trends require sufficient usage by systems. As a result, we focus on the more popular libraries for a higher quality result. Moreover, to capture migrations away from a library dependency, we filter out the latest versions of any library in the dataset.

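The extraction of dependency relations from a pom.xml, described in the filters above, might look roughly like the following sketch. It is not PomWalker: it is a simplified illustration using Python's standard XML parser, it treats every <dependency> element alike (including any under dependency management or plugins), and it does not resolve version properties. The replacement version 0.9.10 in the example is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple


def local_name(tag: str) -> str:
    """Drop an XML namespace prefix of the form '{namespace}name'."""
    return tag.rsplit("}", 1)[-1]


def extract_dependencies(pom_text: str) -> Dict[str, str]:
    """Map 'groupId:artifactId' to the declared version of each <dependency>."""
    root = ET.fromstring(pom_text)
    deps: Dict[str, str] = {}
    for element in root.iter():
        if local_name(element.tag) != "dependency":
            continue
        fields = {local_name(child.tag): (child.text or "").strip() for child in element}
        key = f"{fields.get('groupId', '')}:{fields.get('artifactId', '')}"
        deps[key] = fields.get("version", "")
    return deps


def migrations(old_pom: str, new_pom: str) -> Dict[str, Tuple[str, str]]:
    """Dependencies present in both snapshots whose version changed."""
    old, new = extract_dependencies(old_pom), extract_dependencies(new_pom)
    return {k: (old[k], new[k]) for k in old.keys() & new.keys() if old[k] != new[k]}


OLD_POM = """<project>
  <groupId>GitWalker</groupId>
  <artifactId>GitWalker</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <dependencies>
    <dependency>
      <groupId>org.gitective</groupId>
      <artifactId>gitective-core</artifactId>
      <version>0.9.9</version>
    </dependency>
  </dependencies>
</project>"""
NEW_POM = OLD_POM.replace("0.9.9", "0.9.10")  # hypothetical later snapshot

print(migrations(OLD_POM, NEW_POM))
```

Applying such a comparison to each pair of consecutive pom.xml revisions in a repository's history would give a sequence of dependency migrations per system.
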
Dataset statistics
projects creation dates           | 2004-Oct to 2009-Jan
projects last update              | 2015-Jan to 2015-Nov
# unique systems (projects)       | 48,495 (4,659)
# unique library versions         | 2,736
total size of projects            | 630 GB
# commits related to pom.xml      | 4,892,770
# library dependency migrations   | 852,322
Table 2: Summary of the collected dataset

Table 2 presents a summary of the 4,659 projects that remain after pre-processing an original collection of 10,523 GitHub projects. Our study tracks dependency migration between a Maven library and each unique system within each project (i.e., a project may contain multiple systems). We then mine 48,495 systems from the 4,659 software projects to extract 852,322 dependency migrations. For the LU trend analysis, we filter out rarely used libraries (i.e., dependencies with fewer than 4 user systems are defined as unpopular) and 213 of the latest library versions, leaving 2,736 library versions available for our study.

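The library-version filter for the LU trend analysis can be sketched as below. The text does not state which LU value the 4-user threshold is applied to, so this sketch assumes the Peak LU; the input data and the function name are hypothetical.

```python
from typing import Dict, Set, Tuple

LibVersion = Tuple[str, str]  # (library name, version)


def select_library_versions(
    peak_lu: Dict[LibVersion, int],
    latest: Set[LibVersion],
    min_users: int = 4,
) -> Set[LibVersion]:
    """Keep library versions with at least `min_users` user systems that are not the latest release."""
    return {
        lib_version
        for lib_version, users in peak_lu.items()
        if users >= min_users and lib_version not in latest
    }


# Hypothetical usage counts for three library versions.
peak_lu = {("libA", "1.0"): 12, ("libA", "2.0"): 30, ("libB", "0.3"): 2}
latest = {("libA", "2.0")}
print(select_library_versions(peak_lu, latest))
```

Here ("libB", "0.3") is dropped as unpopular and ("libA", "2.0") is dropped as the latest version, since no migration away from it can yet be observed.
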
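4.2 (RQ2) What is the response to important awareness mechanisms such as a new release announcement and a security advisory on library updates?
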
Our method to answer the second research question (RQ2) is a case study analysis of developer responsiveness to the awareness mechanisms. It comprises three steps: (1) tracking library migration in response to awareness mechanisms, (2) analysis method and (3) data collection. Case studies for the new release announcement are presented in Section LABEL:sec:nr, with those for the security advisory presented in Section LABEL:sec:ad.

Figure 4: A Library Migration Plot for libraries (beanutils,1.9.1) and (beanutils,1.9.2). This example shows the release of the related security advisory CVE-2014-0114 (black dashed line), which affects (beanutils,1.9.1) (marked with crossbones). We also show which JDK version (5+) each library version supports.

(1) Tracking Migration in Response to Awareness Mechanisms:

Figure 4 presents the Library Migration Plot (LMP) used to track LU trends over time. Together with documentation, we use LMPs to infer library migration patterns and trends. The LMP shows LU changes in the library (y-axis) with respect to time (x-axis). The LMP curve itself should not be taken at face value, as the smoothed curve is generated by a predictive model and is not a true reflection of all data points. In Figure 4, we observe that the commons-beanutils library version (commons-beanutils,1.9.1) (red line) had 19 user systems using it as a dependency in April 2014. Then by January 2015, its LU had decreased to 11 user systems. The LMP depicts the effect of awareness mechanisms through annotation of either a new release announcement or a security advisory as follows:

  • Official Release Announcement - Figure 4 depicts an example of two versions: (commons-beanutils,1.9.1) and (commons-beanutils,1.9.2). Hence, we can use the LMP to compare the migration patterns between versions of a library. For instance, the LMP presents the effect of the new release of (commons-beanutils,1.9.2), illustrated by the declining LU curve of (commons-beanutils,1.9.1).

  • Security Advisory Disclosure - Figure 4 annotates when the security advisory CVE-2014-0114 was disclosed to the public (i.e., April 2014). In detail, the LMP presents evidence of how a security vulnerability triggers the library migration away from (commons-beanutils,1.9.1), illustrated by its declining LU curve.

    95\paragraph{\textbf{(2) Analysis Method}:}
    96Our approach to answer (RQ2) involves a manual case study analysis to understand developer responsiveness to either a new release announcement or a security advisory.
    97%Therefore, to address RQ2, we use the LMP to visually show the responsiveness of developers to awareness mechanisms.
To cover useful and practical scenarios, we selected case studies from (i) new releases of the more popular libraries (\ie~as they tend to impact more developers) and (ii) the more severe security advisories (\ie~as they warrant immediate developer attention).
    100At the quantitative level, we first visually analyze the LMP, using our LU metrics to quantify the LU trend response towards the awareness mechanism.
We then manually consult online documentation, such as the release logs and the library's semantic versioning scheme, to estimate the effort needed to migrate to a newer replacement dependency.
    102%For a security advisory, we use the LMP to infer the reaction and effect of the awareness mechanisms, which is the security advisory release date.
    103%Furthermore, we use the LMP to investigate LU trends and when the security advisory was disclosed to understand its effect on related library releases.
For the vulnerable dependencies, we consult information from the security advisory and the life-cycle of a vulnerability (see Section \ref{sec:am}) to estimate the needed migration effort.
For example, in Figure \ref{fig:LAP}, we infer from the release notes that the update from \lib{commons-beanutils}{1.9.1} to \lib{commons-beanutils}{1.9.2} is a compatible minor update with 2 bug fixes and 1 new feature.
Since both versions are supported by the same JDK (Java 5 and higher), we assume that the migration effort required is much lower than for a migration to a different JDK environment.
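As a rough illustration of how a semantic versioning scheme can hint at migration effort, the following Python sketch classifies an update by the first version component that changes. The helper names \texttt{parse\_version} and \texttt{classify\_update} are our own assumptions for illustration and are not part of the study's tooling. Not every library follows semantic versioning strictly, so release notes and JDK requirements still need to be checked manually.

```python
def parse_version(version):
    """Split a dotted version string such as '1.9.2' into integers."""
    return [int(part) for part in version.split(".")]


def classify_update(old, new):
    """Return 'major', 'minor', 'patch' or 'none' for the first component that differs."""
    old_parts, new_parts = parse_version(old), parse_version(new)
    # Pad with zeros so that '17.0' can be compared with '16.0.1'.
    width = max(len(old_parts), len(new_parts), 3)
    old_parts += [0] * (width - len(old_parts))
    new_parts += [0] * (width - len(new_parts))
    labels = ["major", "minor", "patch"]
    for index, (a, b) in enumerate(zip(old_parts, new_parts)):
        if a != b:
            # Components beyond the third are treated as patch-level changes.
            return labels[min(index, 2)]
    return "none"


print(classify_update("1.9.1", "1.9.2"))  # patch
print(classify_update("3.8.1", "4.10"))   # major
print(classify_update("16.0.1", "17.0"))  # major
```

Under this reading, the \texttt{commons-beanutils} example changes only the last version component, which is consistent with the low migration effort assumed above.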
    108%we collect feedback from developers with systems that show no reaction to the awareness mechanisms. Our strategy for qualitative analysis of the feedback is through (i) reading each response, (ii) checking and summarizing text by consistency and omissions and (iii) looking for similarities or differences between the responses.
    110  \begin{center}
    111    \fontsize{8}{10}\selectfont
    112    \tabcolsep=0.1cm
    113    \caption{Top 20 LU library versions
    114    }
    115    \label{tab:newCandid}
    116    \begin{tabular}{llccc}
    117      \hline
    118      \multirow{1}{*}{}
    119      & \multirow{1}{*}{\textbf{Library}}
    120      & \multirow{1}{*}{\textbf{Versions}}
    121      \\
    122      *&junit& (4.11), (4.10), (4.8.2), (3.8.1), (4.8.1)\\
    123      &javax.servlet-servlet-api& (2.5)\\
    124      &commons-io-commons-io& (2.4), (2.6)\\
    125      *&log4j-log4j&(1.2.16), (1.2.17)\\
    126      &commons-lang& (2.6)\\
    127      &commons-logging& (1.1.1)\\
    128      &commons-lang& (3-3.1)\\
    129      &commons-collections& (3.2.1)\\
    130      &javax.servlet-jstl& (1.2)\\
    131      &org.mockito-mockito-all& (1.9.5)\\
    132      &commons-httpclient&(3.1)\\
    133      *&guava&(14.0.1), (18.0)\\
    134      &commons-dbcp& (1.4) \\ \hline
    135    \end{tabular}
    136  \end{center}
    141  \begin{center}
    142    \fontsize{8}{10}\selectfont
    143    \tabcolsep=0.1cm
    \caption{New Release case studies from three popular libraries. For each library, we look at the LU trends of three versions.
    145    }
    146    \label{tab:newVer}
    147    \begin{tabular}{lcccc}
    148      \hline
    149      \multirow{1}{*}{\textbf{Alias}}
    150      & \multirow{1}{*}{\textbf{Library}}
    151      & \multirow{1}{*}{\textbf{ver.1}}
    152      & \multirow{1}{*}{\textbf{ver.2}}
    153      & \multirow{1}{*}{\textbf{ver.3}}
    154      \\
    155      NR1&{google-guava}  &16.0.1 (2014-02-03) & 17.0 (2014-04-22) &18.0 (2014-08-25)\\
    156      NR2&{junit}  &3.8.1 (2002-08-24) & 4.10 (2011-09-29) & 4.11 (2012-11-15)\\
    157      NR3&{log4j}  &1.2.15 (2007-08-24) & 1.2.16 (2010-04-06) & 1.2.17 (2012-05-06) \\ \hline
    158    \end{tabular}
    159  \end{center}
    162\paragraph{\textbf{(3) Data Collection}:}
    163Since our research method to answer (RQ2) is through the use of case studies, we systematically select a subset of eligible projects from the dataset collected in (RQ1).
Selection of a new release candidate comprises three steps.
First, since our objective is to find common LU trends in popular libraries, we select the top 20 library versions out of the 2,736 libraries.
The top 20 library versions are shown in Table \ref{tab:newCandid}.
Then, for each of the 20 library versions, we generate an LMP and categorize it based on its curve pattern.
    168Finally, we select three case studies that depict distinctive LU trends.
    169Table \ref{tab:newVer} shows the nine popular library versions of  \texttt{google-guava}\footnote{\url{}}, \texttt{junit}\footnote{{\url{}}} and \texttt{log4j}\footnote{{\url{}}} that meet our selection criteria.
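The first selection step can be sketched as a simple ranking of library versions by the number of systems that declare them. The sketch below is illustrative only: the input list is hypothetical, the function name \texttt{top\_versions} is our own, and the text does not state whether the ranking uses peak or current LU.

```python
from collections import Counter

# Hypothetical input: one (library, version) pair per dependency declared by a system.
declared = [
    ("junit", "4.11"), ("junit", "4.11"), ("junit", "3.8.1"),
    ("log4j", "1.2.17"), ("junit", "4.11"), ("log4j", "1.2.17"),
    ("guava", "18.0"),
]


def top_versions(pairs, n):
    """Rank library versions by how many systems declare them and keep the top n."""
    return Counter(pairs).most_common(n)


for (library, version), users in top_versions(declared, 2):
    print(library, version, users)
```

With the real dataset, \texttt{n} would be 20 and each selected version would then be plotted as an LMP for the manual categorization step.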
    172  \begin{center}
    173    \fontsize{7}{12}\selectfont
    174    \tabcolsep=0.05cm
    175    \caption{Security Advisory case studies from the Apache Family of Maven libraries. Note that the affected versions include all prior versions. Likewise safe versions also include all superseding versions.
    176    }
    177    \label{tab:vulnerableDataset}
    178    \begin{tabular}{lccccc}
    179      \hline
    180      \multirow{1}{*}{\textbf{Alias}}
    181      & \multirow{1}{*}{\textbf{CVE Id}}
    182      & \multirow{1}{*}{\textbf{library}}
    183      & \multirow{1}{*}{\textbf{Release}}
    184      & \multirow{1}{*}{\textbf{Affected ver.}}
    185      & \multirow{1}{*}{\textbf{Attack(CVSS)}}
    186      \\
    187      %Time-period outliers& &(1274)& \\ \hline
      V1&CVE-2014-0114 & commons-beanutils & 2014-04-30 & 1.9.1 & Denial of Service (7.5)\\
      V2&CVE-2014-0050 & commons-fileupload & 2014-01-04 & 1.3 & man-in-the-middle (5.8)\\
      V3&CVE-2012-5783 & commons-httpclient & 2012-04-11 & 3.x & man-in-the-middle (4.3)\\
      V4&CVE-2012-6153 & httpcomponents & 2014-09-04 & 4.2.2 & man-in-the-middle (7.5)\\
      V5&CVE-2012-2098 & commons-compress & 2012-06-29 & 1.4 & man-in-the-middle (5.0)\\ \hline
    193    \end{tabular}
    194  \end{center}
    197Table \ref{tab:vulnerableDataset} shows the 5 security advisory case studies that meet our selection criteria.
As part of the selection process, we manually inspect and match CVE security advisories disclosed between 2009 and 2014 that affected any of our collected systems in (RQ1).
In particular, we select 123 products of the popular Apache Software Foundation (ASF), which are associated with 686 disclosed security advisories\footnote{An updated listing is available online at \url{}}.
    200We find that 15 out of the 123 ASF products were third-party libraries.
    201We then select case studies that had severe risk of malicious exposure to attackers and would require immediate attention of the developer.
Specifically, the security advisory should have a medium to high Common Vulnerability Scoring System (CVSS) score\footnote{It is officially known as the CVSS v2 base score. The calculation is shown at \url{}} (\ie~4 or higher).
Out of these 15 libraries, we select 5 security advisory cases with a CVSS base score of 4 or higher.
    204As shown in Table \ref{tab:vulnerableDataset} our selected case studies exhibit the following malicious exposures: \texttt{V1} causes a \textit{Denial of Service (DoS)} with a high CVSS score.
The remaining four security advisory cases all describe web application exposure to a remote \textit{`man-in-the-middle'} web attack, with a medium-to-high CVSS severity rating.
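The severity criterion above can be expressed as a small filter. The following sketch uses the scores listed in Table \ref{tab:vulnerableDataset}; the band boundaries follow the common low/medium/high reading of CVSS v2 base scores, and the names \texttt{severity} and \texttt{MIN\_SCORE} are our own, introduced only for illustration.

```python
# Alias, CVE id, library and CVSS v2 base score, as listed in the security advisory table.
advisories = [
    ("V1", "CVE-2014-0114", "commons-beanutils", 7.5),
    ("V2", "CVE-2014-0050", "commons-fileupload", 5.8),
    ("V3", "CVE-2012-5783", "commons-httpclient", 4.3),
    ("V4", "CVE-2012-6153", "httpcomponents", 7.5),
    ("V5", "CVE-2012-2098", "commons-compress", 5.0),
]

MIN_SCORE = 4.0  # threshold stated in the text (medium severity or higher)


def severity(score):
    """Map a CVSS v2 base score to a low/medium/high band."""
    if score >= 7.0:
        return "high"
    if score >= 4.0:
        return "medium"
    return "low"


selected = [
    (alias, cve, severity(score))
    for alias, cve, _library, score in advisories
    if score >= MIN_SCORE
]

for alias, cve, band in selected:
    print(alias, cve, band)
```

All five case studies pass the threshold, with V1 and V4 in the high band and the remaining three in the medium band.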
    207\subsection {(RQ3) \RqThree}
    208Our research method to answer the third research question (RQ3) is through a survey targeting developers that belong to projects that were identified as non responsive to a severe security advisory.
The method comprises two steps: (1) survey design and (2) data collection.
    210Results to (RQ3) are presented in Section \ref{sec:barriers}.
    213\paragraph{\textbf{(1) Survey Design}:}
    214Our research method makes use of a qualitative survey interview form.
    215Listing \ref{list:email} shows the template of our survey form\footnote{the complete form is available at \url{}} sent to developers of the contactable projects.
    216Not all projects facilitate a contact medium, so we targeted projects that allowed public communication, either through an issue management system or a mailing list.
    217The survey form is designed with two parts.
    218First, we customize the survey form to include project specifics, such as the exact location of the \texttt{pom.xml} file where the dependency is being relied upon by the project.
We then ask developers to respond to the following two questions: (i) \textit{Were you aware of the vulnerability? If so, then how long ago} and (ii) \textit{What are some factors that influence you not to update?} %and (3) \textit{Does your project employ a update strategy, to check and update?}
    222\begin{lstlisting}[language= email,
    232caption={Email snippet of the survey form sent to developers of the selected projects that were non responsive to a security vulnerability.}, label=list:email,
    233tabsize=2, frame=lrtb]
<!---email snippet/>
Dear GitHub OSS Developer,

As a part of my study I particularity focused on the <library version/> and
the <CVE-xxx-xxxx/> <CVE URL/>, announced on <date>, which affects versions xxx.
We noticed that your project on GitHub is still configured to depend on a
vulnerable version of <library version/> at  <>
We understand that there are many reasons for not migrating, thus we appreciate
if you could simply detail the following:
1. Were you aware of the vulnerability? If so, then how long ago.
2. What are some factors that influence you not to update?

<!---email snippet/>
\end{lstlisting}
For the analysis, we first tally responses according to whether or not the developer was aware of the vulnerability.
    250%We then analyze the feedback.
    251Our strategy to analyze the feedback is through a systematic (i) reading of each response, (ii) checking and summarizing text by consistency and omissions and (iii) looking for similarities or differences between interviewee responses.
    252We perform the analysis in three steps.
    253First, the main author performs a categorization of responses.
Then, another author is tasked to verify and critique each category of responses.
Finally, the categories are presented to the rest of the authors for a group consensus.
    257\paragraph{\textbf{(2) Data Collection}:}
    258Since our approach to answer (RQ3) is through a survey, our data is from the security advisory case studies in (RQ2).
    259From the LMP analysis in (RQ2), we identified candidate projects that are non responsive to the security advisory announcement.
Since we collected 16 developer responses, categorization of the similarities and differences was manageable by one author, and was later critiqued and verified by the other authors for the final consensus.
    261All results of the collected dataset, including the tally of listed and contactable projects are presented in Section \ref{sec:barriers}.
    263\section{Library Migration in Practice}
    266In this section, we present the results for (RQ1) \RqOne
    267In detail, we present the statistical results from both a system (Section \ref{sec:sys}) and library dimension (Section \ref{sec:lib}), before finally answering (RQ1).
    270  \centering
    271  \subfigure[ \# dependencies per system ]{\label{fig:sysLU}%
    272    \includegraphics[width=0.3\columnwidth]{sysDep}
    273  }
    274  \subfigure[\# \dmc~per system]{\label{fig:fLU}
    275    \includegraphics[width=0.3\columnwidth]{updateFreq}
    276  }
    277  \subfigure[\# dependencies vs. \# \dmc]{\label{fig:freqvsDeps}
    278    \includegraphics[width=0.3\columnwidth]{freqvsDeps}
    279  }
  \caption{Updates from a System dimension: (a) \# of dependencies per system ($\bar{x}$=147, $\mu$=267.2, $\sigma$=311.56), (b) frequency of \dmc s per system ($\bar{x}$=1, $\mu$=2.4, $\sigma$=4.2) and (c) relationship between \# of dependencies and \# of \dmc s (log-scale). }
    281  \label{fig:sysStats}
    284\subsection{System Dimension}
    287Figure \ref{fig:sysStats} shows the results on how maintainers manage and update their dependencies from a system viewpoint.
    288Specifically, the distribution of library dependencies per system in Figure \ref{fig:sysLU} confirms that systems show heavy dependence on libraries ($\bar{x}$=147, $\mu$=267.2, $\sigma$=311.56).
A reason for this heavy reliance on libraries is that many of the analyzed projects comprise multiple subsystems, which form a complex set of dependencies.
Furthermore, Figure \ref{fig:fLU} suggests that systems rarely update library dependencies, with a low frequency of \dmcfull s per system (i.e., $\bar{x}$=1, $\mu$=2.4) and each \dmc~containing at least two library dependencies (i.e., $\bar{x}$=2, $\mu$=4.1, $\sigma$=14.9).
Finally, Figure \ref{fig:freqvsDeps} shows no strong visual correlation between the number of library dependencies and the frequency of \dmc, and statistical tests report weak correlations (Pearson = 0.05, Spearman = 0.07). This result supports the hypothesis that the \textit{number of library dependencies in a system does not influence the frequency of updates}.
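For readers who wish to reproduce this kind of check, the two correlation coefficients can be computed as sketched below. This is a minimal pure-Python sketch under our own assumptions: the per-system numbers are hypothetical and the function names are ours, so it does not reproduce the reported values of 0.05 and 0.07.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient (undefined if either input is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """Average 1-based ranks, so that tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1
        for position in range(start, end + 1):
            result[order[position]] = average
        start = end + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical per-system data: number of dependencies and number of update commits.
dependencies = [12, 147, 300, 45, 800, 90]
update_commits = [3, 1, 0, 2, 1, 5]

print(pearson(dependencies, update_commits))
print(spearman(dependencies, update_commits))
```

Spearman is the more robust of the two for skewed distributions such as the dependency counts reported here, since it depends only on rank order.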
    294  \centering
    295  \subfigure[LU Distributions per dependency]{\label{fig:PLU}%
    296    \includegraphics[width=0.5\columnwidth]{cfdLU}
    297  }
    298  \subfigure[Time-frame analysis]{\label{fig:TP}
    299    \includegraphics[width=0.5\columnwidth]{time-frame}
    300  }
    301  \subfigure[Library Residue per dependency]{\label{fig:LR}
    302    \includegraphics[width=0.35\columnwidth]{RQ1UsageResidue}
    303  }
  \caption{Updates from a Library dimension: (a) cumulative frequency distribution of Peak LU and Current LU (log scale), (b) time-frame metric distributions and (c) boxplot of library residue (\%) for 2,736 dependencies.}
    305  \label{fig:RQ1}
    308%From the library perspective, we are particularly interested in the LU trends of Maven libraries. Table \ref{fig:LUmetrics} also describes the metrics from a library perspective. Concretely, we use the library metrics to study (i) the trend of library usage (\ie~whether a library is gaining or losing users) and (ii) how much of the library users remain after a library has fallen in decline. Henceforth, based on the LU (\texttt{m3}) metric introduced in our model, Figure \ref{fig:LUmetrics} describe the different measures of the trend:
    311  \centering
    312  \includegraphics[width=.5\textwidth]{residueVsPLU}
    313  \caption{A correlation of Library Residue against Peak LU, showing that popular library dependencies (with higher peaks) also tend to exhibit higher Library Residue. }
    314  \label{fig:residueVsPLU}
\subsection{Library Dimension}
Figure \ref{fig:RQ1} and Figure \ref{fig:residueVsPLU} present LU trends of library dependencies used by our studied systems.
Figure \ref{fig:PLU} shows that the peak LU for 75\% of the popular libraries is 12.
    321Interestingly, we also found 596 libraries that exhibited no library migration, such that peak library usage is the current library usage (i.e., peak LU=current LU).
    322Additionally, Figure \ref{fig:TP} shows that reaching the peak library usage is slow for most dependencies.
    323Concretely, the figure shows that 25\% of dependencies took less than a day to reach their peak LU.
Afterwards the rate slows down (depicted by the curve), with 75\% of dependencies taking less than 770 days to reach their peak LU (i.e., Pre-Peak).
Upon closer inspection, we found that these dependencies were specialized libraries that were used by a smaller number of systems (\ie~low LU).
After reaching peak usage, dependent systems tend to migrate away slowly, as shown in the figure, with 75\% of dependencies experiencing some migration of their users over the next 450 days (i.e., Post-Peak).
    327Importantly, Figure \ref{fig:LR} suggests that many systems remain with an outdated dependency, even after some library migration away from the dependency has begun.
    328The figure shows that most of the 2,736 studied dependencies exhibit high library residue (i.e., $\bar{x}$=85.7\%, $\mu$=81.5\%, $\sigma$=22.2\%).
An example is the popular \texttt{log4j} logging library \lib{log4j}{1.2.15}, which is an older version but has a library residue of 98\%.
Finally, Figure \ref{fig:residueVsPLU} shows that systems are more likely to remain with the more popular libraries, with higher peaking libraries exhibiting more library residue.
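To make the residue figures concrete, the sketch below computes library residue from a series of LU observations. It assumes that library residue is the share of the peak LU that remains at the most recent observation; this reading, the function name \texttt{library\_residue} and the sample series are our own assumptions for illustration.

```python
def library_residue(lu_series):
    """Percentage of the peak LU that is still present at the end of the series."""
    peak = max(lu_series)
    if peak == 0:
        return 0.0
    return 100.0 * lu_series[-1] / peak


# Hypothetical LU series. Only the values 19 and 11 come from the beanutils
# example, and 19 is treated as the peak purely for illustration.
beanutils_191 = [5, 12, 19, 16, 13, 11]
print(round(library_residue(beanutils_191), 1))  # 57.9

# A version with no migration keeps its peak usage (peak LU = current LU).
no_migration = [120, 250, 342, 342]
print(library_residue(no_migration))  # 100.0
```

A residue of 100\% corresponds to the case of no library migration described above, where the peak LU equals the current LU.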
    331Returning to (RQ1):
  We conducted an empirical study to understand the extent to which (i) systems use and manage their library dependencies and (ii) library usage trends. To answer (RQ1): \textit{ (i) although systems heavily depend on libraries, most systems rarely update their libraries and (ii) systems are less likely to migrate their library dependencies, with 81.5\% of systems remaining with a popular older version.}
    338\section{Developer Responsiveness to Awareness Mechanisms}
    341%(RQ2) \RqTwo \small
    347  \centering
    348  \subfigure[LMP for consecutive releases of the \texttt{google-guava} (NR1) library]{\label{fig:guava}
    349    \includegraphics[width=1\columnwidth]{guava-cropped}
    350  }
    351  \subfigure[LMP for consecutive releases of the \texttt{junit} (NR2) library]{\label{fig:junit}
    352    \includegraphics[width=1\columnwidth]{junit-cropped}
    353  }
    354  \subfigure[LMP for consecutive releases of the \texttt{log4j} (NR3) library]{\label{fig:log4j}
    355    \includegraphics[width=.9\columnwidth]{log4j-cropped}
    356  }
    357  \caption{Library Migration Plots (LMP) of three libraries depicting successive library version releases without vulnerability alerts.}
    358  \label{fig:newVersions}
    362  \begin{center}
    363    \fontsize{7}{10}\selectfont
    364    \tabcolsep=0.1cm
    365    \caption{Alias names for our (RQ2) selected case studies.
    366    }
    367    \label{tab:vulnerableDataset2}
    368    \begin{tabular}{lccrcc}
    369      \hline
    370      \multirow{1}{*}{\textbf{Alias}}
    371      & \multirow{1}{*}{\textbf{Awareness Mechanism}}
    372      & \multirow{1}{*}{\textbf{Library}}
    373      & \multirow{1}{*}{\textbf{Analyzed versions}}
    374      \\
    375      NR1&New Release&{google-guava} &(16.0.1), (17.0), (18.0) \\
    376      NR2&&{junit}& (3.8.1), (4.10), (4.11)\\
    377      NR3&&{log4j}& (1.2.15), (1.2.16), (1.2.17)  \\ \hline
      V1&Security Advisory & commons-beanutils & (1.9.1), (1.9.2) \\
    379      V2& & commons-fileupload & (1.2.2), (1.3), (1.3.1) \\
    380      V3& & commons-httpclient & (3.1), (4.2.2) \\
    381      V4& & httpcomponents & (4.2.2), (4.2.3), (4.2.5)\\
    382      V5& & commons-compress &(1.4), (1.4.1) \\ \hline
    383    \end{tabular}
    384  \end{center}
    387In this section, we present the results for (RQ2) and (RQ3).
We address (RQ2), \RqTwo, in Section \ref{sec:nr} and Section \ref{sec:ad}.
We address (RQ3), \RqThree, in Section \ref{sec:barriers}.
    390Table \ref{tab:vulnerableDataset2} shows the aliases (\ie~NR1, ..., NR3, V1, ..., V5) used as a reference to each of the case studies.
    392\subsection{A New Release Announcement}
Figure \ref{fig:newVersions} depicts our case studies (NR1, NR2, NR3) on responsiveness to a new release, with (A) consistent and (B) non responsive library migration trends.
    396\paragraph{(A) Cases of an Active Developer Response to a New Release.}
Figure \ref{fig:guava} shows an example of a library that has a consistent library migration trend.
Concretely, the LMP of the \texttt{google-guava} (NR1) versions \lib{NR1}{16.0.1} and \lib{NR1}{17.0} depicts a consistent pattern of migration, with a peak LU of 48 and 49, respectively.
    399This pattern is consistent, despite the libraries having a relatively high library residue of 60.4\% and 85\% for all studied versions.
    402We find that the reasons for consistent migration trends are mainly related to the estimated migration effort required to complete the migration process.
    403Through inspection of the online documentation, we find that migration from \lib{NR1}{16.0.1} to \lib{NR1}{17.0} contains 10 changed packages\footnote{details at \url{}}.
    404Similarly, migration from  \lib{NR1}{17.0} to \lib{NR1}{18.0} also contained 7 changed packages.
    405Yet, all three library versions require the same Java 5 environment which indicates no significant changes to the overall architectural design of the library.
From the documentation, we deduce that the popular use of \lib{NR1}{18.0} is due to the prolonged period before the next release, \lib{NR1}{19.0}, on December 10, 2015, which is more than a year after the release of \lib{NR1}{18.0}.
In fact, previous versions had shorter release intervals of around 2-3 months, with \lib{NR1}{16.0.1} released on February 3, 2014, \lib{NR1}{17.0} on April 22, 2014, and \lib{NR1}{18.0} on August 25, 2014.
The prolonged release cycle of the library could be related to the relatively higher peak LU of \lib{NR1}{18.0} at 100 LU, compared to the lower peak LU of \lib{NR1}{16.0.1} at 48 LU and of \lib{NR1}{17.0} at 49 LU. }
    411% Some user systems have a consistent migration pattern effect to newer version releases. The LMP for the \texttt{google-guava} depicts this migration pattern.
    415\paragraph{(B) Cases of a Developer Non Response to a New Release.}
Figure \ref{fig:junit} depicts a developer `no response' reaction to a dependency migration opportunity.
The LMP curve in the figure depicts the older popular versions as exhibiting no migration movement (\ie~peak LU = current LU).
    418Specifically for the \texttt{junit} (NR2) library, the dependency \lib{NR2}{3.8.1} does not follow the typical migration pattern of the \lib{NR2}{4.10} and \lib{NR2}{4.11} dependencies.
    420Similar to the consistent migration to a new release, we find that the reason for a non response to a migration opportunity is related to the estimated migration effort.
For instance, as shown in Figure \ref{fig:junit}, the newer \texttt{junit} version 4 series (\lib{NR2}{4.10} and \lib{NR2}{4.11}) requires a change of platform to Java 5 or higher, implying significant changes to the architectural design of the library.
We see that even though \lib{NR2}{3.8.1} is older, it still maintains its maximum library usage (i.e., current LU = peak LU = 342).
This LMP curve pattern is also apparent in the \texttt{log4j} (NR3) library shown in Figure \ref{fig:log4j}, with the \lib{NR3}{1.2.15} dependency being an older but still active library version (\ie~with over 100 current LU).
We visually observe that as the \lib{NR3}{1.2.17} dependency reaches its peak LU, the \lib{NR3}{1.2.16} dependency remains more popular, with a higher LU than the superseding library release.
This result complements the findings in (RQ1) that popular library dependencies tend to retain most of their users, even if an opportunity to migrate to a new release is available.
    429% There exists older but still popular libraries such as \lib{junit}{3.8.1} (i.e., current usage is also the peak library usage) that does not follow a consistent LMP migration pattern.
    434%The results suggest that sometimes older libraries are favored over newer versions.
    435%Reasons may include social aspects, such as familiarity and popularity and could be the reason why developers do not update.
    436%Additionally, the decision may be influenced by co-dependencies in the software ecosystem.
    437%For instance, compatibility of the newer library with other co-dependencies or even build platforms.
    438%In this study, we notice how library migration can be influenced by the build environment (Java 4 vs. Java 5).