Astor: Exploring the Design Space of Generate-and-Validate Program Repair beyond GenProg

Matias Martinez (University of Valenciennes, France), Martin Monperrus (KTH Royal Institute of Technology, Sweden)
Abstract

In recent years, researchers have proposed novel repair approaches that automatically generate patches for repairing software bugs. Repair approaches can be loosely characterized along their main design philosophy, such as generate-and-validate or synthesis-based. Each of those repair approaches is a point in the design space of program repair. Our goal is to facilitate the design, development and evaluation of repair approaches by providing a framework that: a) contains components commonly present in the implementations of repair approaches, so that new approaches can be built over them, and b) provides built-in implementations of existing repair approaches. This paper presents a framework named Astor that encodes the design space of generate-and-validate repair approaches. Astor provides extension points that form the explicit decision space of program repair. Over those extension points, researchers can reuse existing components or implement new ones. Astor includes 6 Java implementations of repair approaches, including one of the pioneers: GenProg. Researchers have already been defining new approaches over Astor, proposing improvements of the built-in approaches by using the extension points, and executing approach implementations from Astor in their evaluations. The implementations of the repair approaches built over Astor are capable of repairing, in total, 98 real bugs from 5 large Java programs.

keywords:
Software Maintenance, Automated Program Repair, Software Testing, Evaluation Frameworks, Software Bugs, Defects, Generation of Patches

1 Introduction

Automated software repair has emerged during the last decade for repairing real bugs of software applications. The main goal is to reduce the cost and time of software maintenance by proposing to developers automatically synthesized patches that solve bugs present in their applications. Pioneering repair systems are GenProg Weimer2009 (), Semfix Nguyen:2013:SPR (), Prophet prophet (), Nopol nopol (), among many others Kim2013 (); Xiong2017 (); Mechtaev2016 (); Le2017JSR (); Le2017SSS (); Ke2015RPS (); prophet (); Long2017AIC (); Perkins2009 (); SANER2017 (); Durieux2016DDC (); weimer2013AE (); directfix (); qi2014strength (); spr (). Automation of bug fixing is possible by using automated correctness oracles. For instance, GenProg Weimer2009 () introduced the use of a test suite as correctness oracle: a program is considered correct if it passes all tests from its associated test suite.

Program repair systems can be loosely characterized along their main design philosophy, such as being generate-and-validate (aka random search) or synthesis-based. For example, GenProg and JAFF ArcuriEvolutionary () are based on genetic programming. More generally, every repair system is a point in the design space of program repair. By making design decisions explicit in that design space, one can start to have a fine-grained understanding of the core conceptual differences. For example, the main conceptual difference between GenProg and PAR Kim2013 () lies in the repair operators: they do not use the same code transformations for synthesizing patches.

To foster research on program repair, we aim at providing the research community with a generic framework that encodes the design space of generate-and-validate repair approaches. We envision a framework that allows researchers to easily implement new repair approaches.

In this paper, our main contribution is Astor (Automatic Software Transformations fOr program Repair). Astor is a repair framework for Java. It provides 6 publicly available generate-and-validate repair approaches: jGenProg (a Java implementation of GenProg, originally on C), jKali (an implementation of Kali Qi2015 (), originally on C), jMutation (an implementation of MutRepair debroy2010using (), which is not publicly available), DeepRepair (an extension of jGenProg by white2017dl ()), Cardumen CardumenArxiv (), and TIBRA (an extension of jGenProg introduced in this paper). Those repair approaches are based on twelve extension points that form the first ever explicit design space of program repair. Over those twelve extension points, the program repair researcher can either choose an existing component (among 33), or implement a new technique.

Astor has been extensively used by the research community: for extending it or incorporating new functionality into a novel repair system based on Astor Tanikado2017NewStrategies (); white2017dl (); Zu2017Test4Repair (); gpfl2017 (); as a publicly available implementation of existing approaches in the context of comparative evaluations xin2017leveraging (); and for reusing numerical results and/or patches obtained with Astor Xiong2017 (); le2016history (); Saha:2017:EEO (); Arja1712.07804 ().

Astor is publicly available on Github and actively maintained. A user community is able to provide support. Bug fixes and extensions are welcome as external contributions (pull requests). From an open-science perspective, since the whole code base is public, peer researchers can validate the correctness of the implementation and hence minimize threats to internal validity.

To sum up, our contributions are:

  • The explicit design space of generate-and-validate program repair.

  • The realization of that design space in Astor, where the most important design decisions of program repair are encoded as extension points.

  • Twelve extension points, for which the program repair researcher can either choose an existing component (i.e., taking one already implemented design decision in the design space), or can implement a new technique.

  • Six repair approaches that can be used out-of-the-box in comparative evaluations, incl. jGenProg, the most used Java implementation of GenProg according to citation impact Weimer2009 (); LeGoues2012TSEGP (); LeGoues2012 (); Weimer2010 (); forrest2009genetic ().

  • The study of the impact of those extension points on repairability and efficiency based on the Defects4J bug benchmark.

This paper is a mostly rewritten extension of a previous short paper astor2016 (). It includes a detailed explanation of Astor’s architecture and extension points, as well as a large evaluation. The paper continues as follows. Section 2 describes the design of Astor. Section 3 presents the extension points provided by Astor. Section 4 presents the built-in approaches included in Astor. Section 5 presents an evaluation of the built-in approaches and of different implementations for the extension points. Section 6 presents the related work. Finally, section 7 concludes the paper.

2 Architecture

2.1 The Design of Astor

Astor is a framework that allows researchers and developers to implement automated program repair approaches, or to use and extend built-in repair approaches such as jGenProg defects4j-repair () (implementation of GenProg Weimer2009 ()), jKali astor2016 () (implementation of Kali Qi:2015:APP:2771783.2771791 ()), jMutation astor2016 () (implementation of MutRepair debroy2010using ()), DeepRepair white2017dl (), and Cardumen CardumenArxiv ().

Astor encodes the design space of generate-and-validate repair approaches, which first search within a search space to generate a set of patches, and then validate them using a correctness oracle. Astor provides twelve extension points that form the design space of program repair. New approaches can be implemented by choosing an existing component for each extension point, or by implementing new ones.

The extension points allow Astor users to define the design of a repair approach. Main design decisions are: a) software transformations (aka repair operators) used to define the solution search space; b) different strategies for navigating the search space of candidate solutions; and c) mechanism for validating a candidate solution.

Astor was originally conceived for building test-suite based repair approaches Weimer2009 () and the first approach implemented over it was named jGenProg, a Java implementation of GenProg Weimer2009 (), originally written in OCaml for repairing C code. In test-suite based repair, test suites are considered as a proxy to the program specification: a program is considered as fulfilling its specification if it passes all the test cases of its test suite; otherwise, the program has a defect. The test suite is used as a bug oracle, i.e., it asserts the presence of the bug, and as a correctness oracle.

An approach over Astor requires as input a buggy program to be repaired and a correctness oracle such as a test suite. As output, the approach generates, when it is possible, one or more patches that are valid according to the correctness oracle.

Input: P: program under repair
Input: TS: test suite
Output: a list of test-suite adequate patches
1:  suspicious ← run-fault-localization(P, TS) //EP_FL
2:  mpl ← create-modification-points(suspicious) //EP_MPG
3:  ops ← get-operators() //EP_OD
4:  tsa-patches-refined ← ∅
5:  while continue-searching() do
6:     tsa-patches ← tsa-patches + navigation-search-space(P, mpl, ops, TS) //EP_NS
7:  end while
8:  tsa-patches-refined ← refining-patches(tsa-patches) //EP_SP
9:  return  tsa-patches-refined
Algorithm 1 Main steps done by Repair Approaches over Astor (in comments, the invocations to extension points)

Algorithm 1 displays the high-level steps executed, in sequence, by a generate-and-validate repair approach built over Astor. They are: 1) Fault localization (line 1), 2) Creation of a representation (line 2), 3) Navigation of the search space (line 6), and 4) Solution post-processing (line 8). In the remainder of this section, we describe each step.
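The control flow of Algorithm 1 can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not Astor's actual API: the class `RepairLoopSketch`, its type parameters and its fields are hypothetical names, each field standing in for one extension point. The program P and the test suite TS are assumed to be captured by the lambdas rather than passed explicitly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.BooleanSupplier;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.function.UnaryOperator;

/** Hypothetical skeleton of Algorithm 1; the fields stand in for extension points. */
final class RepairLoopSketch<Loc, ModPoint, Op, Patch> {
    Supplier<List<Loc>> faultLocalization;                         // EP_FL  (line 1)
    Function<List<Loc>, List<ModPoint>> modificationPoints;        // EP_MPG (line 2)
    Supplier<List<Op>> operatorSpace;                              // EP_OD  (line 3)
    BooleanSupplier continueSearching;                             // loop guard (line 5)
    BiFunction<List<ModPoint>, List<Op>, List<Patch>> navigation;  // EP_NS  (line 6)
    UnaryOperator<List<Patch>> postProcessing;                     // EP_SP  (line 8)

    List<Patch> repair() {
        List<Loc> suspicious = faultLocalization.get();
        List<ModPoint> mpl = modificationPoints.apply(suspicious);
        List<Op> ops = operatorSpace.get();
        List<Patch> tsaPatches = new ArrayList<>();
        while (continueSearching.getAsBoolean()) {
            // Each iteration may contribute zero or more test-suite adequate patches.
            tsaPatches.addAll(navigation.apply(mpl, ops));
        }
        return postProcessing.apply(tsaPatches);
    }

    public static void main(String[] args) {
        RepairLoopSketch<String, String, String, String> loop = new RepairLoopSketch<>();
        int[] remainingIterations = {2};
        loop.faultLocalization = () -> Arrays.asList("Account.java:10");
        loop.modificationPoints = locations -> locations;
        loop.operatorSpace = () -> Arrays.asList("remove");
        loop.continueSearching = () -> remainingIterations[0]-- > 0;
        loop.navigation = (mps, ops) -> Arrays.asList(ops.get(0) + "@" + mps.get(0));
        loop.postProcessing = patches -> patches;
        System.out.println(loop.repair());
    }
}
```

Modelling each extension point as a replaceable function mirrors the idea that a repair approach is one choice of component per extension point.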

2.2 Fault Localization

The fault localization step is the first step executed by an approach built over Astor. It aims at determining what is wrong in the program received as input. This step is executed on line 1 of Algorithm 1. Fault localization consists of computing the locations that are suspected to contain the bug to repair (which is exposed by at least one failing test case). In the context of repair, fault localization reduces the search space by discarding the code locations that are probably healthy. As a repair approach can use the suspiciousness values of locations to guide the search in the solution space, fault localization has an impact on the effectiveness of the repair approach mao2012FL (). Test-suite based repair approaches from Astor use fault localization techniques based on spectrum analysis. Those techniques execute the test cases of a buggy program and trace the execution of software components (e.g., methods, lines). Then, from the collected traces and the test results (i.e., fail or pass), the techniques use formulas to calculate the suspiciousness value of each component. The suspiciousness value ranges from 0 (lowest probability that the component contains a bug) to 1 (highest). Repair approaches use different formulas: for instance, GenProg uses an ad-hoc formula Weimer2009 (), while MutRepair debroy2010using () uses the Tarantula formula Jones2002 ().

Astor provides fault localization as an extension point named EP_FL, where researchers can plug implementations of any fault localization technique. Astor provides a component (used by default) that implements that point using the fault localization library GZoltar gzoltar2012 () (http://www.gzoltar.com/) and the Ochiai formula abreu2006evaluation ().
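To make the spectrum-based computation concrete, the sketch below implements the standard Ochiai and Tarantula formulas over per-component test counts. It does not use GZoltar and is not Astor code: the class name `SuspiciousnessSketch` and the parameter names are assumptions made for illustration. Here ef and ep count the failing and passing tests that execute a component, while nf and np count the failing and passing tests that do not.

```java
/** Spectrum-based suspiciousness formulas; class and parameter names are illustrative. */
final class SuspiciousnessSketch {
    /** Ochiai: ef / sqrt((ef + nf) * (ef + ep)); 0 when the denominator is 0. */
    static double ochiai(int ef, int nf, int ep) {
        double denominator = Math.sqrt((double) (ef + nf) * (ef + ep));
        return denominator == 0.0 ? 0.0 : ef / denominator;
    }

    /** Tarantula: failRatio / (failRatio + passRatio); 0 when both ratios are 0. */
    static double tarantula(int ef, int nf, int ep, int np) {
        double failRatio = (ef + nf) == 0 ? 0.0 : (double) ef / (ef + nf);
        double passRatio = (ep + np) == 0 ? 0.0 : (double) ep / (ep + np);
        double sum = failRatio + passRatio;
        return sum == 0.0 ? 0.0 : failRatio / sum;
    }

    public static void main(String[] args) {
        // A line executed by the only failing test and by 3 of the 10 passing tests.
        System.out.println("ochiai    = " + ochiai(1, 0, 3));
        System.out.println("tarantula = " + tarantula(1, 0, 3, 7));
    }
}
```

For the line in `main`, Ochiai yields 1 / sqrt(1 * 4) = 0.5: the line is covered by the failing test, but also by several passing ones, so it is only moderately suspicious.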

2.3 Identification of Modification Points

Once the fault localization step returns a list of suspicious code locations (such as statements), Astor proceeds to create a representation of the program under repair.

Definition 1: A Modification point is a code element (e.g., a statement, an expression) from the buggy program under repair that can be modified with the goal of repairing the bug.

Astor creates modification points from the suspicious statements returned by the fault localization approach (section 2.2). This step is executed in line 2 of Algorithm 1. Astor provides an extension point named EP_MPG (section 3.2) to define the granularity of each modification point according to the one targeted by a repair approach built over Astor. For instance, jGenProg creates one modification point for each statement indicated as suspicious by the fault localization. Cardumen, another approach built over Astor, works at a finer-grained level: it creates a modification point for each expression contained in a suspicious statement. Other approaches focus on particular code elements, such as jMutation which creates modification points only for expressions with unary and binary operators. For example, let us imagine that the fault localization marks as suspicious the two lines presented in Listing 1.

 9      ….
10      myAccount = getAccount(name);
11      myAccount.setBalance(previousMonth + currentMonth);
12      ….
Listing 1: Two suspicious statements

jGenProg creates two modification points, both pointing to statements: one to the assignment at line 10, another to the method invocation at line 11. In contrast, Cardumen creates 3 modification points: one pointing to the expression at the right side of the assignment at line 10, a second one to the method invocation at line 11 (note that the method invocation is also an expression), and the last one pointing to the expression (previousMonth + currentMonth), which is the parameter of the method invocation at line 11.

Input: P: program under repair
Input: MPs: list of modification points
Input: OS: list of operators
Input: TS: test suite
Output: a list of patches
1:  mps ← choose-modification-points(MPs) //EP_MPS
2:  transformations ← ∅
3:  for all mp-i ∈ mps  do
4:     op-j ← choose-operator(mp-i, OS) //EP_OS
5:     transformations ← transformations ∪ (mp-i, op-j)
6:  end for
7:  program-variants ← apply-transformations(transformations, P)
8:  for all pv-i ∈ program-variants do
9:     patch-i ← synthesize-patch-from-variant(P, pv-i)
10:     validation-result ← validate(TS, P, patch-i, pv-i) //EP_PV
11:     if is-valid(validation-result) //EP_FF then
12:        tsa-patches ← tsa-patches ∪ patch-i
13:     end if
14:  end for
15:  return  tsa-patches
Algorithm 2 Inspection of Candidate Solutions from the Search Space (in comments, the invocations to extension points)

2.4 Creation of the repair operator space

Astor synthesizes patches by applying code changes over modification points. Those changes are done by repair operators; the set of all repair operators that an approach considers during the repair forms the repair operator space.

Definition 2: A Repair operator is an action that transforms source code (associated to a modification point) into other source code.

Astor provides an extension point named EP_OD (section 3.5) for specifying the operator space that a repair approach will use. The extension point is invoked at line 3 of Algorithm 1. Astor works with two kinds of repair operators:

Synthesis based on autonomous repair operators

An approach synthesizes new code by directly applying an autonomous operator to a modification point, without the need of any extra information. One example is the repair operator from jMutation which changes a relational operator from > to >=. For example, it generates the new code (fa * fb) >= 0.0 from the code (fa * fb) > 0.0.

Synthesis based on ingredients

There are operators that need some extra information before applying a code transformation at a modification point mp. For instance, two operators (Insert and Replace) from GenProg Weimer2009 () need one statement (aka the ingredient) taken from somewhere in the application under repair. The ingredient is then inserted before, or replaces, the code at mp. Such approaches are known as ingredient-based repair approaches Weimer2009 (). Astor supports repair approaches in defining operators that need ingredients and provides different strategies for selecting ingredients. jGenProg and DeepRepair white2017dl () are two ingredient-based approaches built over Astor.
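The difference between the two kinds of operators can be illustrated with a deliberately simplified sketch. It assumes that a program is a list of statement strings and that a modification point is an index into that list; Astor itself works on an AST, so the class `OperatorKindsSketch`, its methods and the ingredient `currentMonth = 0;` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy repair operators over statements stored as strings (an illustrative assumption). */
final class OperatorKindsSketch {
    /** Autonomous operator: needs only the code at the modification point. */
    static String relaxGreaterThan(String condition) {
        // Naive textual rewrite, for illustration only: the first " > " becomes " >= ".
        return condition.replaceFirst(" > ", " >= ");
    }

    /** Ingredient-based operator: inserts the ingredient before the statement at index mp. */
    static List<String> insertBefore(List<String> statements, int mp, String ingredient) {
        List<String> variant = new ArrayList<>(statements);
        variant.add(mp, ingredient);
        return variant;
    }

    /** Ingredient-based operator: replaces the statement at index mp by the ingredient. */
    static List<String> replace(List<String> statements, int mp, String ingredient) {
        List<String> variant = new ArrayList<>(statements);
        variant.set(mp, ingredient);
        return variant;
    }

    public static void main(String[] args) {
        System.out.println(relaxGreaterThan("(fa * fb) > 0.0"));
        List<String> body = Arrays.asList(
                "myAccount = getAccount(name);",
                "myAccount.setBalance(previousMonth + currentMonth);");
        System.out.println(insertBefore(body, 1, "currentMonth = 0;"));
    }
}
```

Both ingredient-based methods copy the statement list, so the buggy program stays untouched and each call yields a separate modified version.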

2.5 Navigation of the search space

Once an approach built over Astor has identified the modification points (i.e., places where code transformations can be applied to repair the bug), it proceeds with the navigation of the search space. The goal is to find, among all possible modified versions of the buggy program, one or more versions that do not contain the bug under repair.

The navigation of the search space has a main loop (Algorithm 1 line 5) that visits, in each iteration, an element of the search space, i.e., a modified version of the buggy program (Algorithm 1 line 6). On each iteration, the approach verifies whether a set of code changes done by repair operators over the modification points (which produce a modified version of the program) repairs the bug. A modified version of a buggy program is represented in Astor by a Program variant.

Definition 3: A Program variant is an entity that stores: a) the repair operators applied to each modification point; b) the source code resulting from the execution of repair operators over the associated modification points.

Then, Astor computes a candidate patch from a program variant.

Definition 5: A Patch produced by Astor is a set of changes between the buggy version and a modified version represented by a program variant.

Algorithm 2 shows the main steps that Astor executes for creating a program variant. Let us analyze each of them.

2.5.1 Selection of modification points

First, a repair approach built over Astor chooses, according to a selection strategy, the modification points (at least one) to which repair operators are applied (Algorithm 2 line 1). Astor provides an extension point named EP_MPS (section 3.4) for specifying customized strategies of modification point selection.

2.5.2 Selection of repair operators

For each selected modification point mp-i, an approach selects one repair operator op-j to apply at mp-i (Algorithm 2 line 4) and adds the tuple (mp-i, op-j) to the set of code transformations (line 5). Astor provides an extension point named EP_OS (section 3.6) for specifying customized strategies of operator selection.

2.5.3 Creation of program variants

A repair approach over Astor generates a program variant by applying all selected operators at the corresponding modification points. This step produces the synthesis of candidate patches (Algorithm 2 line 7).

As an ingredient-based repair operator needs ingredients (i.e., statements) for synthesizing patches (section 2.4), Astor supports such operators in carrying out the patch synthesis. Astor first creates an ingredient pool from the program under repair. Astor provides an extension point named EP_IPD (section 3.7) for plugging a customized strategy of ingredient pool construction. Then, an ingredient-based approach queries the ingredient pool when a repair operator needs ingredients for synthesizing the candidate patch code. Astor provides an extension point named EP_IS (section 3.8) for plugging a customized strategy of ingredient selection.

Moreover, when an operator gets an ingredient from the ingredient pool, it can use it as it is (i.e., without applying any transformation) or apply a transformation over it. For instance, a transformation proposed by Astor is to replace variables from the ingredient that are not in scope at the location pointed to by the modification point affected by the operator. Astor provides an extension point named EP_IT (section 3.9) for plugging a customized strategy of ingredient transformation.
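The pool construction and the ingredient transformation can be sketched as follows. This is an illustration under assumptions, not Astor's implementation: ingredients are modelled as strings tagged with a package and a file, the three scopes follow the File, Package and Global components of Table 1, and the variable replacement is purely textual. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IngredientPoolSketch {
    enum Scope { FILE, PACKAGE, GLOBAL }

    static final class Ingredient {
        final String pkg;
        final String file;
        final String code;

        Ingredient(String pkg, String file, String code) {
            this.pkg = pkg;
            this.file = file;
            this.code = code;
        }
    }

    /** EP_IPD-like step: keep the ingredients that belong to the scope of the patched location. */
    static List<Ingredient> buildPool(List<Ingredient> all, Scope scope, String pkg, String file) {
        List<Ingredient> pool = new ArrayList<>();
        for (Ingredient candidate : all) {
            boolean samePackage = candidate.pkg.equals(pkg);
            boolean sameFile = samePackage && candidate.file.equals(file);
            if (scope == Scope.GLOBAL
                    || (scope == Scope.PACKAGE && samePackage)
                    || (scope == Scope.FILE && sameFile)) {
                pool.add(candidate);
            }
        }
        return pool;
    }

    /** EP_IT-like step: textual renaming of an out-of-scope variable (whole identifiers only). */
    static String replaceVariable(String code, String outOfScope, String inScope) {
        return code.replaceAll("\\b" + Pattern.quote(outOfScope) + "\\b",
                Matcher.quoteReplacement(inScope));
    }

    public static void main(String[] args) {
        List<Ingredient> all = Arrays.asList(
                new Ingredient("bank", "Account.java", "balance += amount;"),
                new Ingredient("util", "Dates.java", "return now;"));
        List<Ingredient> pool = buildPool(all, Scope.PACKAGE, "bank", "Account.java");
        System.out.println(replaceVariable(pool.get(0).code, "amount", "currentMonth"));
    }
}
```

A narrower scope gives a smaller pool and thus a smaller search space, at the risk of excluding the ingredient that would repair the bug.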

2.5.4 Candidate patch validation

Once program variants are created, a repair approach synthesizes from each variant a patch code (Algorithm 2 line 9), then applies it to the buggy version of the program under repair and finally evaluates the modified version (line 10) using the correctness oracle. If the patched version is valid, the corresponding patch is a solution and it is stored (line 12).

Astor provides an extension point named EP_PV (section 3.10) for specifying the validation process to be used by the repair approach. Built-in repair approaches over Astor use the test suite as specification of the program Weimer2009 () and as correctness oracle. No failing test cases means the program is correct according to the specification encoded in the test suite. To validate candidate patches, Astor runs the test suite on the patched version of the buggy program.

Moreover, Astor defines an extension point named EP_FF (section 3.11) to specify the Fitness Function that evaluates the patch using the output from the validation process (Algorithm 2 line 11). The result of this function is used to determine if a patch is a solution (i.e., repairs the bug) or not. By default, the fitness function in Astor counts the number of failing test cases. No failing test case means the patch is a solution, known as a test-suite adequate patch.
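The default fitness described above amounts to counting failing tests, as in the following minimal sketch. The class `FitnessSketch` and the representation of a validation result as a map from test name to pass/fail are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

final class FitnessSketch {
    /** Fitness = number of failing test cases; lower is better. */
    static int fitness(Map<String, Boolean> testResults) {
        int failing = 0;
        for (boolean passed : testResults.values()) {
            if (!passed) {
                failing++;
            }
        }
        return failing;
    }

    /** A patch is test-suite adequate when no test case fails. */
    static boolean isSolution(Map<String, Boolean> testResults) {
        return fitness(testResults) == 0;
    }

    public static void main(String[] args) {
        Map<String, Boolean> results = new HashMap<>();
        results.put("testDeposit", true);
        results.put("testWithdraw", false);
        System.out.println("fitness = " + fitness(results) + ", solution = " + isSolution(results));
    }
}
```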

2.6 Evaluation of conditions for ending navigation

An approach over Astor finishes the search for patches, i.e., stops the loop at line 5 of Algorithm 1, when any of these conditions is fulfilled: a) a given number of valid repairs has been found, b) a given number of iterations has been executed, c) a given time budget in hours has elapsed (timeout).
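A loop guard combining the three conditions could look like the sketch below. The class `StopConditionSketch` and its parameters are hypothetical; the current time is passed in explicitly so that the guard can be exercised without waiting.

```java
import java.time.Duration;

final class StopConditionSketch {
    private final int maxSolutions;
    private final int maxIterations;
    private final long timeoutNanos;
    private final long startNanos;

    StopConditionSketch(int maxSolutions, int maxIterations, Duration timeout, long startNanos) {
        this.maxSolutions = maxSolutions;
        this.maxIterations = maxIterations;
        this.timeoutNanos = timeout.toNanos();
        this.startNanos = startNanos;
    }

    /** Returns false as soon as any of the three ending conditions holds. */
    boolean continueSearching(int solutionsFound, int iterationsDone, long nowNanos) {
        boolean enoughSolutions = solutionsFound >= maxSolutions;
        boolean outOfIterations = iterationsDone >= maxIterations;
        boolean timedOut = nowNanos - startNanos >= timeoutNanos;
        return !(enoughSolutions || outOfIterations || timedOut);
    }

    public static void main(String[] args) {
        StopConditionSketch stop =
                new StopConditionSketch(1, 1000, Duration.ofHours(1), System.nanoTime());
        int iterations = 0;
        while (stop.continueSearching(0, iterations, System.nanoTime())) {
            iterations++;
        }
        System.out.println("stopped after " + iterations + " iterations");
    }
}
```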

2.7 Solution post-processing

After navigating the search space, Astor provides an extension point named EP_SP (section 3.12) for processing the patches found, if any. We envision two kinds of post-processing. First, the post-processing of each patch found, for instance patch minimization or code formatting. Second, the post-processing of the list of patches, for sorting patches according to a given criterion.
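The second kind of post-processing, sorting the list of patches, can be sketched with two comparators. This is a hypothetical illustration: the class `PatchOrderingSketch`, the `FoundPatch` fields and the two orderings (by discovery time, and by number of failing generated tests) are assumptions, not Astor's data model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class PatchOrderingSketch {
    static final class FoundPatch {
        final String id;
        final long foundAtMillis;        // when the patch was discovered
        final int failingGeneratedTests; // failures on additionally generated tests

        FoundPatch(String id, long foundAtMillis, int failingGeneratedTests) {
            this.id = id;
            this.foundAtMillis = foundAtMillis;
            this.failingGeneratedTests = failingGeneratedTests;
        }
    }

    static final Comparator<FoundPatch> CHRONOLOGICAL =
            Comparator.comparingLong(p -> p.foundAtMillis);
    static final Comparator<FoundPatch> LESS_REGRESSION =
            Comparator.comparingInt(p -> p.failingGeneratedTests);

    /** Returns a sorted copy; the list of found patches itself is left unchanged. */
    static List<FoundPatch> prioritize(List<FoundPatch> patches, Comparator<FoundPatch> order) {
        List<FoundPatch> sorted = new ArrayList<>(patches);
        sorted.sort(order);
        return sorted;
    }

    public static void main(String[] args) {
        List<FoundPatch> found = Arrays.asList(
                new FoundPatch("p1", 10L, 3),
                new FoundPatch("p2", 20L, 0));
        System.out.println(prioritize(found, LESS_REGRESSION).get(0).id);
    }
}
```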

3 Extension points provided by Astor

| Extension point | Component | Explanation |
| --- | --- | --- |
| Fault localization (EP_FL) | GZoltar | Use of third-party library GZoltar |
| | CoCoSpoon | Use of third-party library CoCoSpoon |
| Granularity of modification points (EP_MPG) | Statements | Each modification point corresponds to a statement and repair operators generate code at the level of statements |
| | Expression | Each modification point corresponds to an expression and repair operators generate code at the level of expressions |
| | logical-relational-operators | Modification points target binary expressions whose operators are logical (AND, OR) or relational (e.g., >, ==) |
| | if conditions | Modification points target the expression inside if conditions |
| Navigation strategy (EP_NS) | Exhaustive | Complete navigation of the search space |
| | Selective | Partial navigation of the search space guided, by default, by random steps |
| | Evolutionary | Navigation of the search space using a genetic algorithm |
| Selection of suspicious modification points (EP_MPS) | Uniform-random | Every modification point has the same probability of being changed by an operator |
| | Weighted-random | The probability of changing a modification point depends on the suspiciousness of the pointed code |
| | Sequential | Modification points are changed according to the suspiciousness value, in decreasing order |
| Operator space definition (EP_OD) | IRR-statements | Insertion, removal and replacement of statements |
| | Relational-Logical-op | Change of unary operators, and of logical and relational binary operators |
| | Suppression | Suppression of statements, change of if conditions by True or False value, insertion of return statements |
| | R-expression | Replacement of an expression by another expression |
| Selection of operator (EP_OS) | Uniform-Random | Every repair operator has the same probability of being chosen to modify a modification point |
| | Weighted-Random | Selection of operator based on a non-uniform probability distribution over the repair operators |
| Ingredient pool definition (EP_IPD) | File | Pool with ingredients written in the same file where the patch is applied |
| | Package | Pool with ingredients written in the same package where the patch is applied |
| | Global | Pool with all ingredients from the application under repair |
| Selection of ingredients (EP_IS) | Uniform-random | Ingredient randomly chosen from the ingredient pool |
| | Code-similarity-based | Ingredient chosen from a method similar to the one where the candidate patch is written |
| | Name-probability-based | Ingredient chosen based on the frequency of its variable names |
| Ingredient transformation (EP_IT) | No-Transformation | Ingredients are not transformed |
| | Random-variable-replacement | Out-of-scope variables from an ingredient are replaced by randomly chosen in-scope variables |
| | Name-cluster-based | Out-of-scope variables from an ingredient are replaced by similarly named in-scope variables |
| | Name-probability-based | Out-of-scope variables from an ingredient are replaced by in-scope variables based on the frequency of variable names |
| Candidate patch validation (EP_PV) | Test-suite | Original test suite used for validating a candidate patch |
| | Augmented-test-suite | New test cases are generated to augment the original test suite used for validation |
| Fitness function for evaluation (EP_FF) | Number-failing-tests | The fitness is the number of failing test cases. Lower is better. Zero means the patch is a test-suite adequate patch |
| Solution prioritization (EP_SP) | Chronological | Generated valid patches are printed in chronological order, according to the time they were discovered |
| | Less-Regression | Patches are presented according to the number of failing cases among the generated test cases, in ascending order |
Table 1: Summary of extension points and components already implemented in Astor.

In section 2 we presented the main design of generate-and-validate repair approaches built over Astor. In this section we detail the main extension points that are provided by the Astor framework for creating new repair approaches or customizing those already included in the framework. For each extension point we list the name and description of the components already implemented and included in the framework. Table 1 displays the extension points and the components already implemented in Astor for each extension point.


3.1 Fault Localization (EP_FL)

3.1.1 Implemented components

  • GZoltar: use of third-party library GZoltar.

  • CoCoSpoon: use of third-party library CoCoSpoon.

  • Custom: name of class that implements interface FaultLocalizationStrategy.

This extension point allows specifying the fault localization algorithm that Astor executes (at Algorithm 1 line 1) to obtain the suspicious locations, as explained in section 2.2. The extension point takes as input the program under repair and the test suite, and produces as output a list of program locations, each one with a suspiciousness value. The suspiciousness value associated to a location ranges between 0 (very low probability that the location is buggy) and 1 (very high probability that the location is buggy). New fault localization techniques, such as PRFL presented by Zhang et al. Zhang2017BSF (), can be implemented in this extension point.


3.2 Granularity of Modification points (EP_MPG)

3.2.1 Implemented components

  • Statements: each modification point corresponds to a statement. Repair operators generate code at the level of statements.

  • Expressions: each modification point corresponds to an expression. Repair operators generate code at the level of expressions.

  • Logical-relational-operators: modification points target binary expressions whose operators are logical (AND, OR) or relational (e.g., >, ==).

  • Custom: name of class that implements interface TargetElementProcessor.

3.2.2 Description

The extension point EP_MPG allows specifying the granularity of the code that is manipulated by a repair approach over Astor. The granularity impacts two components of Astor. First, it impacts the program representation: Astor creates modification points only for suspicious code elements of a given granularity (Algorithm 1 line 2). Second, it impacts the repair operator space: a repair operator takes as input code of a given granularity and generates a modified version of that piece of code. For example, the approach jGenProg manipulates statements, i.e., the modification points refer to statements and it has 3 repair operators: add, remove and replace of statements. In contrast, jMutation manipulates binary and unary expressions using repair operators that change binary and unary operators.


3.3 Navigation Strategy (EP_NS)

3.3.1 Implemented components

  • Exhaustive: complete navigation of the search space.

  • Selective: partial navigation of search space guided, by default, by random steps.

  • Evolutionary: navigation of the search space using genetic algorithm.

  • Custom: name of class that extends class AstorCoreEngine.

3.3.2 Description

The extension point EP_NS allows defining a strategy for navigating the search space. Algorithm 2 from section 2.5 displays a general navigation strategy, where most of its steps are calls to other extension points. Astor provides three navigation strategies: exhaustive, selective and evolutionary.

Exhaustive navigation

This strategy exhaustively navigates the search space, that is, all the candidate solutions are considered and validated. An approach that carries out an exhaustive search visits every modification point mp and applies to it every repair operator op from the repair operator space. For each combination of mp and op, the approach generates zero or more candidate patches. Then, the approach applies each synthesized patch to the program under repair and executes the validation process as explained in section 2.5.4.
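The exhaustive strategy boils down to nested loops over modification points, operators and synthesized patches. The following generic sketch is an assumption-laden illustration: the class `ExhaustiveNavigationSketch` and the `synthesize` and `validate` parameters are hypothetical stand-ins for patch synthesis and for the validation step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Predicate;

final class ExhaustiveNavigationSketch {
    /** Visits every (modification point, operator) pair and keeps the patches that validate. */
    static <M, O, P> List<P> navigate(List<M> modificationPoints, List<O> operators,
            BiFunction<M, O, List<P>> synthesize, Predicate<P> validate) {
        List<P> solutions = new ArrayList<>();
        for (M mp : modificationPoints) {
            for (O op : operators) {
                // A pair may yield zero or more candidate patches.
                for (P patch : synthesize.apply(mp, op)) {
                    if (validate.test(patch)) {
                        solutions.add(patch);
                    }
                }
            }
        }
        return solutions;
    }

    public static void main(String[] args) {
        List<String> solutions = ExhaustiveNavigationSketch.<String, String, String>navigate(
                Arrays.asList("stmt@10", "stmt@11"),
                Arrays.asList("remove", "replace"),
                (mp, op) -> Collections.singletonList(op + ":" + mp),
                patch -> patch.startsWith("remove"));
        System.out.println(solutions);
    }
}
```

The number of validations grows with the product of the two list sizes, which is why this strategy is only practical for small operator spaces.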

Selective navigation

The selective navigation visits a portion of the search space. This strategy is necessary when the search space is too large to be exhaustively navigated. On each step of the navigation, it uses two strategies for determining where to modify (i.e., modification points) and how (i.e., repair operators). By default, the selective navigation uses weighted random for selecting modification points, where the weight is the suspiciousness value, and uniform random for selecting operators. Those strategies can be customized using extension points EP_MPS (section 3.4) and EP_OS (section 3.6), respectively.

Evolutionary navigation

The Astor framework also provides the Genetic Programming Koza:1992:GPP () technique for navigating the solution search space. This technique was introduced in the domain of automatic program repair by JAFF ArcuriEvolutionary () and GenProg Weimer2009 (). The idea is to evolve a buggy program by applying repair operators until arriving at a modified version that does not have the bug. In Astor, it is implemented as follows: one considers an initial population of program variants of a given size and evolves them across a given number of generations. On each generation, Astor first creates, for each program variant, an offspring (i.e., a new program variant) and applies, with a given probability, repair operators to one or more modification points of that offspring. Then, it applies, with a given probability, the crossover operator between two program variants, which consists of exchanging one or more modification points. Astor finally evaluates each variant (i.e., the patch synthesized from the different operators applied) and then chooses the variants with the best fitness values (section 2.5.4) to be part of the next generation.
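One generation of such an evolutionary loop can be sketched as below. This is a simplified illustration, not Astor's engine: crossover is omitted, parents are assumed to compete with their offspring for a place in the next generation, and the class `EvolutionSketch` with its `mutate` and `fitness` parameters is hypothetical. In the toy usage, each variant is reduced to an integer standing for its number of failing tests.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Random;
import java.util.function.ToIntFunction;
import java.util.function.UnaryOperator;

final class EvolutionSketch {
    /** One generation: create mutated offspring, then keep the fittest population.size() variants. */
    static <V> List<V> nextGeneration(List<V> population, UnaryOperator<V> mutate,
            double mutationRate, ToIntFunction<V> fitness, Random rnd) {
        List<V> pool = new ArrayList<>(population); // parents compete with offspring (assumption)
        for (V parent : population) {
            if (rnd.nextDouble() < mutationRate) {
                pool.add(mutate.apply(parent));     // offspring with repair operators applied
            }
        }
        pool.sort(Comparator.comparingInt(fitness)); // lower fitness = fewer failing tests
        return new ArrayList<>(pool.subList(0, population.size()));
    }

    public static void main(String[] args) {
        List<Integer> population = Arrays.asList(3, 2);
        Random rnd = new Random(42);
        for (int generation = 0; generation < 5; generation++) {
            population = nextGeneration(population, v -> Math.max(0, v - 1), 0.5, v -> v, rnd);
        }
        System.out.println(population);
    }
}
```

Keeping the parents in the selection pool means the best fitness in the population never gets worse from one generation to the next.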


3.4 Selection of suspicious modification points (EP_MPS)

3.4.1 Implemented components

  • Uniform-random: every modification point has the same probability to be selected and later changed by an operator.

  • Weighted-random: the probability of changing a modification point depends on the suspiciousness of the pointed code.

  • Sequential: modification points are changed according to the suspiciousness value, in decreasing order.

  • Custom: name of class that extends class SuspiciousNavigationStrategy.

The extension point EP_MPS allows specifying the strategy to navigate the search space of suspicious components represented by modification points. This extension point is invoked in every iteration of the navigation loop (Algorithm 1 line 5): the strategy selects the modification points where the repair algorithm will apply repair operators (Algorithm 2 line 1).
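Two of the selection strategies listed above, weighted-random and sequential, can be sketched as follows. The class `ModificationPointSelectionSketch` and the `ModPoint` type are hypothetical. The random draw is passed in as a `roll` in [0, 1) to keep the example deterministic, and the list of modification points is assumed to be non-empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class ModificationPointSelectionSketch {
    static final class ModPoint {
        final String location;
        final double suspiciousness;

        ModPoint(String location, double suspiciousness) {
            this.location = location;
            this.suspiciousness = suspiciousness;
        }
    }

    /** Weighted-random: a point is chosen with probability proportional to its suspiciousness. */
    static ModPoint weightedRandom(List<ModPoint> points, double roll) {
        double total = 0.0;
        for (ModPoint mp : points) {
            total += mp.suspiciousness;
        }
        double threshold = roll * total;
        double cumulative = 0.0;
        for (ModPoint mp : points) {
            cumulative += mp.suspiciousness;
            if (threshold < cumulative) {
                return mp;
            }
        }
        return points.get(points.size() - 1); // fallback for rounding or all-zero weights
    }

    /** Sequential: points ordered by decreasing suspiciousness. */
    static List<ModPoint> sequential(List<ModPoint> points) {
        List<ModPoint> ordered = new ArrayList<>(points);
        ordered.sort(Comparator.comparingDouble((ModPoint mp) -> mp.suspiciousness).reversed());
        return ordered;
    }

    public static void main(String[] args) {
        List<ModPoint> points = Arrays.asList(
                new ModPoint("Account.java:10", 0.25),
                new ModPoint("Account.java:11", 0.75));
        System.out.println(weightedRandom(points, Math.random()).location);
        System.out.println(sequential(points).get(0).location);
    }
}
```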


3.5 Operator spaces definition (EP_OD)

3.5.1 Implemented components

  • IRR-Statements: insertion, removal and replacement of statements.

  • Relational-Logical-operators: change of unary operators, and logical and relational binary operators.

  • Suppression: suppression of statements, replacement of if conditions by a True or False value, insertion of return statements.

  • R-expression: replacement of expression by another expression.

  • Custom: name of class that extends class OperatorSpace.

3.5.2 Description

After a modification point is selected, Astor selects a repair operator from the repair operator space to apply to that point. Astor provides the extension point EP_OD for specifying the repair operator space used by a repair approach built on Astor. The extension point is invoked at line 3 of Algorithm 1. The configuration of the operator space depends on the repair strategy. For example, jGenProg has 3 operators (insert, replace and remove statement) whereas Cardumen has one (replace expression).


3.6 Selection of repair operator (EP_OS)

3.6.1 Implemented components

  • Uniform-random: every repair operator has the same probability of being chosen to modify a modification point.

  • Weighted-random: selection of operator based on non-uniform probability distribution over the repair operators.

  • Custom: name of class that extends OperatorSelectionStrategy.

3.6.2 Description

The extension point EP_OS allows Astor’s users to specify a strategy to select, given a modification point, one operator from the operator space. By default, Astor provides a strategy that applies uniform random selection and does not depend on the selected modification point. This strategy is used by approaches that use selective navigation of the search space, such as jGenProg, and it is executed at line 4 of Algorithm 2. This extension point is useful for implementing strategies based on probabilistic models such as those presented by Martinez2013 (). In that work, several repair models are defined from different sets of bug fix commits, where each model is composed of repair operators and their associated probabilities, calculated from the changes found in the commits.
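A probabilistic repair model of this kind could be sketched as below: probabilities are derived from how often each kind of change occurs in a set of commits, and an operator is then sampled according to them. The class RepairModelSketch, the operator names used as map keys and the counts are hypothetical; this is not the model construction used in the cited work nor Astor's OperatorSelectionStrategy API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch of a repair model built from change counts observed in commits.
final class RepairModelSketch {

    // Turns raw change counts per operator into probabilities that sum to 1.
    static Map<String, Double> fromCounts(Map<String, Integer> changeCounts) {
        long total = 0;
        for (int count : changeCounts.values()) {
            total += count;
        }
        Map<String, Double> model = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : changeCounts.entrySet()) {
            model.put(e.getKey(), total == 0 ? 0.0 : (double) e.getValue() / total);
        }
        return model;
    }

    // Samples one operator according to the model's probabilities.
    static String select(Map<String, Double> model, Random rnd) {
        double r = rnd.nextDouble();
        double accumulated = 0;
        String last = null;
        for (Map.Entry<String, Double> e : model.entrySet()) {
            accumulated += e.getValue();
            last = e.getKey();
            if (r < accumulated) {
                return last;
            }
        }
        return last; // rounding fallback; null when the model is empty
    }
}
```

For instance, counts of 3 insertions and 1 removal yield probabilities 0.75 and 0.25, so the insert operator would be selected about three times as often as the remove operator.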


3.7 Ingredient pool definition (EP_IPD)

3.7.1 Implemented components

  • File: pool with ingredients written in the same file where the patch is applied.

  • Package: pool with ingredients written in the same package where the patch is applied.

  • Global: pool with all ingredients from the application under repair.

  • Custom: name of class that extends class AstorIngredientSpace.

3.7.2 Description

The ingredient pool contains all pieces of code that an ingredient-based repair approach can use for synthesizing a patch (section 2.4). The extension point EP_IPD allows customizing the creation of the ingredient pool. Astor provides three methods of building an ingredient pool: file, package and global scopes. When synthesizing a candidate patch to be applied at a given modification point with the “file” scope, the approach selects ingredients from the same file where the patch will be applied. When the scope is “package”, the ingredient pool is formed by all code from the package that contains the modification point. When the scope is “global”, the ingredient pool has all code from the program under repair. The “file” ingredient pool is smaller than the “package” one, which is itself smaller than the “global” one.
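The three scopes can be illustrated with a small filter over candidate ingredients. The class IngredientPoolSketch, the Ingredient record-like type and its fields are hypothetical names introduced for this sketch; they are not Astor's AstorIngredientSpace API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of building an ingredient pool for a given scope.
final class IngredientPoolSketch {

    enum Scope { FILE, PACKAGE, GLOBAL }

    // An ingredient is a piece of code together with the file and package it comes from.
    static final class Ingredient {
        final String code;
        final String file;
        final String pkg;

        Ingredient(String code, String file, String pkg) {
            this.code = code;
            this.file = file;
            this.pkg = pkg;
        }
    }

    // Keeps the ingredients visible from a modification point located in (pkg, file).
    static List<Ingredient> pool(List<Ingredient> all, Scope scope, String file, String pkg) {
        List<Ingredient> result = new ArrayList<>();
        for (Ingredient ingredient : all) {
            boolean keep;
            switch (scope) {
                case FILE:
                    keep = ingredient.pkg.equals(pkg) && ingredient.file.equals(file);
                    break;
                case PACKAGE:
                    keep = ingredient.pkg.equals(pkg);
                    break;
                default: // GLOBAL: every ingredient of the program under repair
                    keep = true;
            }
            if (keep) {
                result.add(ingredient);
            }
        }
        return result;
    }
}
```

Because each scope is a superset of the previous one, the pool sizes grow from file to package to global, which is the size relation stated above.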


3.8 Selection of ingredients (EP_IS)

3.8.1 Implemented components

  • Uniform-random: ingredient randomly chosen from ingredient pool.

  • Code-similarity-based: ingredient chosen from similar methods to the buggy method.

  • Name-probability-based: ingredient chosen based on the frequency of its variable names.

  • Custom: name of class that extends class IngredientSearchStrategy.

3.8.2 Description

The extension point EP_IS allows specifying the strategy that an ingredient-based repair approach from Astor uses for selecting an ingredient from the ingredient pool. Among the implementations of this point provided by Astor, one, used by default by jGenProg, performs uniform random selection of an ingredient from a pool built for a given scope (see section 3.7). Another, defined for the DeepRepair approach, prioritizes ingredients that come from methods similar to the buggy method.


3.9 Ingredient transformation (EP_IT)

3.9.1 Implemented components

  • No-transformation: ingredients are not transformed.

  • Random-variable-replacement: out-of-scope variables from an ingredient are replaced by randomly chosen in-scope variables.

  • Name-cluster-based: out-of-scope variables from an ingredient are replaced by similarly named in-scope variables.

  • Name-probability-based: out-of-scope variables from an ingredient are replaced by in-scope variables based on the frequency of variable names.

  • Custom: name of class that extends class IngredientTransformationStrategy.

3.9.2 Description

The extension point EP_IT allows specifying the strategy used for transforming ingredients selected from the pool. Astor provides four implementations of this extension point. For instance, the strategy defined for the DeepRepair approach replaces each out-of-scope variable from the ingredient by one variable in the scope of the modification point. The selection of that variable is based on clusters of variable names, where each cluster contains variables with semantically related names white2017dl (). Cardumen uses a probabilistic model for selecting the most frequent variable names to be used in the patch. In contrast, jGenProg, like the original GenProg, does not transform any ingredient.
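The random-variable-replacement strategy can be sketched on a textual ingredient as follows. This is a simplification made for illustration: the class IngredientTransformationSketch is hypothetical, the ingredient is handled as a string rather than as an AST, the set of variables used by the ingredient is assumed to be known, and type compatibility between variables is ignored.

```java
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of transforming an ingredient by renaming out-of-scope variables.
final class IngredientTransformationSketch {

    // Returns the transformed ingredient, or null when no in-scope variable is available.
    static String transform(String ingredient, Set<String> ingredientVars,
                            List<String> inScope, Random rnd) {
        String result = ingredient;
        for (String variable : ingredientVars) {
            if (inScope.contains(variable)) {
                continue; // already valid at the modification point
            }
            if (inScope.isEmpty()) {
                return null; // the ingredient cannot be adapted and is discarded
            }
            String replacement = inScope.get(rnd.nextInt(inScope.size()));
            // Replace whole identifiers only, so that "min" does not match inside "minimum".
            result = result.replaceAll("\\b" + Pattern.quote(variable) + "\\b",
                    Matcher.quoteReplacement(replacement));
        }
        return result;
    }
}
```

A name-cluster-based or name-probability-based strategy would differ only in how the replacement variable is chosen among the in-scope candidates.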


3.10 Candidate Patch Validation (EP_PV)

3.10.1 Implemented components

  • Test-suite: original test-suite used for validating a candidate patch.

  • Augmented-test-suite: new test cases are generated to augment the original test suite used for validation.

  • Custom: name of class that extends class ProgramVariantValidator.

3.10.2 Description

The extension point EP_PV executes the validation process of a patch (section 2.5.4). For test-suite based repair approaches, the Astor framework provides a validation process that runs the test suite on the patched program. The validation is executed in Algorithm 2 line 10.

Another strategy implemented in Astor is MinImpact Zu2017Test4Repair (), proposed to alleviate the problem of patch overfitting smith2015cure (). MinImpact uses additional automatically generated test cases to further check the correctness of a list of generated test-suite adequate patches and returns the one with the highest probability of being correct. MinImpact implements the extension point EP_PV by generating new test cases (i.e., inputs and outputs) over the buggy suspicious files, using Evosuite evosuite () as test-suite generation tool. Once the new test cases are generated, MinImpact executes them over the patched version. The intuition is that the more additional test cases fail on a tentatively patched program, the more likely the corresponding patch is an overfitting patch. MinImpact then sorts the generated patches by prioritizing those with fewer failures over the new tests.

Moreover, this extension point can be used to measure other functional and non-functional properties beyond the verification of program correctness. For example, instead of focusing on automated software repair, an approach built over Astor could target the minimization of energy consumption. To that end, the approach would implement this extension point to measure the consumption of a program variant.


3.11 Fitness function for evaluating candidates (EP_FF)

3.11.1 Implemented components

  • Number-failing-tests: the fitness is the number of failing test cases. Lower is better. Zero means the patch is a test-suite adequate patch.

  • Custom: name of class that implements FitnessFunction.

3.11.2 Description

The extension point EP_FF allows specifying the fitness function, which consumes the output of the validation process of a program variant and assigns a fitness value to that variant. Astor provides an implementation of this extension point which takes as fitness value the number of failing test cases (lower is better).

In evolutionary approaches (section 3.3.2) such as jGenProg, this fitness function guides the evolution of a population of program variants throughout a number of generations. In a given generation, the variants with better fitness become part of the population of the next generation. In contrast, in selective or exhaustive approaches, the fitness function is only used to determine whether a patched program is a solution or not.
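The number-failing-tests fitness can be written in a few lines. The class FailingTestsFitness and the representation of the validation output as a list of pass/fail outcomes are assumptions made for this sketch, not Astor's FitnessFunction interface.

```java
import java.util.List;

// Hypothetical sketch: fitness computed from the outcome of each executed test case.
final class FailingTestsFitness {

    // Lower is better; each element of testOutcomes is true when the test passed.
    static int fitness(List<Boolean> testOutcomes) {
        int failing = 0;
        for (boolean passed : testOutcomes) {
            if (!passed) {
                failing++;
            }
        }
        return failing;
    }

    // A variant is a solution (test-suite adequate patch) when no test fails.
    static boolean isSolution(List<Boolean> testOutcomes) {
        return fitness(testOutcomes) == 0;
    }
}
```

An evolutionary approach would rank variants by fitness, whereas a selective or exhaustive approach would only need isSolution.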


3.12 Solution prioritization (EP_SP)

3.12.1 Implemented components

  • Chronological: generated valid patches are printed in chronological order, according to the time they were discovered.

  • Less-regression: patches are presented according to the number of failing cases from those generated test cases, in ascending order.

  • Custom: name of class that implements SolutionVariantSortCriterion.

3.12.2 Description

The extension point EP_SP allows specifying a method for sorting the discovered valid patches. By default, approaches over Astor present patches sorted by time of discovery in the search space. Astor also provides an implementation of this point named Less-regression. The strategy, defined by MinImpact Zu2017Test4Repair (), sorts the original test-suite adequate patches with the goal of minimizing the introduction of regression faults, i.e., the approach prioritizes the patches with fewer failing test cases among the tests automatically generated during the validation process.
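A Less-regression ordering can be sketched as a stable sort over the chronologically ordered patches. The class SolutionPrioritizationSketch and the Patch type are hypothetical; keeping the chronological order among patches with the same number of failures is an assumption of this sketch rather than a documented property of MinImpact.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the Less-regression prioritization of valid patches.
final class SolutionPrioritizationSketch {

    static final class Patch {
        final String id;
        final int failingGeneratedTests; // failures over the automatically generated tests

        Patch(String id, int failingGeneratedTests) {
            this.id = id;
            this.failingGeneratedTests = failingGeneratedTests;
        }
    }

    // Input is in order of discovery; List.sort is stable, so ties keep that order.
    static List<Patch> lessRegression(List<Patch> chronological) {
        List<Patch> sorted = new ArrayList<>(chronological);
        sorted.sort(Comparator.comparingInt((Patch p) -> p.failingGeneratedTests));
        return sorted;
    }
}
```

For patches discovered in the order p1 (2 failures), p2 (0), p3 (2), p4 (1), the resulting order is p2, p4, p1, p3.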

4 Repair Approaches implemented in Astor

| Extension point | jGenProg (4.1) | jKali (4.2) | jMutation (4.3) | DeepRepair (4.4) | Cardumen (4.5) | TIBRA (4.6) |
|---|---|---|---|---|---|---|
| 3.1 Fault localization (EP_FL) | GZoltar | - | GZoltar | GZoltar | GZoltar | GZoltar |
| 3.2 Granularity of modification points (EP_MPG) | Statement | Statements + if | Relational-Logical-operators | Statement | Expression | Statement |
| 3.3 Navigation strategy (EP_NS) | Selective or Evolutionary | Exhaustive | Exhaustive or Selective | Selective or Evolutionary | Selective | Selective |
| 3.4 Selection susp. points (EP_MPS) | Weighted-random | Sequential | Weighted-random | Weighted-random | Weighted-random | Weighted-random |
| 3.5 Operator space definition (EP_OD) | IRR-statements | Suppression | Relational-Logical-operators | Suppression | R-expression | Weighted-random |
| 3.6 Selection operator (EP_OS) | Random | Sequential | Random | Random | - | Random |
| 3.7 Ingredient pool definition (EP_IPD) | Package | - | - | Package | Global | Package |
| 3.8 Selection ingredients (EP_IS) | Uniform-random | - | - | Code-similarity-based | Uniform-random | Uniform-random |
| 3.9 Ingredient transformation (EP_IT) | No-transf. | - | - | Name-cluster-based | Name-probability-based | Random-variable-replacement |
| 3.10 Candidate patch validation (EP_PV) | Test-suite | Test-suite | Test-suite | Test-suite | Test-suite | Test-suite |
| 3.11 Fitness function (EP_FF) | #failing-tests | #failing-tests | #failing-tests | #failing-tests | #failing-tests | #failing-tests |
| 3.12 Solution prioritization (EP_SP) | Chronol. | Chronol. | Chronol. | Chronol. | Chronol. | Chronol. |
Table 2: Main extension points and the decisions adopted by each approach. Each approach and extension point includes a reference to the section that explains it.

In this section we present brief descriptions of the repair approaches built over the Astor framework and publicly available on the GitHub platform. Those approaches were built by combining different components implemented for the extension points presented in section 3. Table 2 displays the components that form each built-in repair approach from Astor. The approaches are presented in the order they were introduced into the Astor framework.

4.1 jGenProg

jGenProg is an implementation of GenProg Weimer2009 () built over the Astor framework. The approach belongs to the family of ingredient-based repair approaches (section 2.4) and has 3 repair operators: insert, replace and remove statements. For the first two operators, jGenProg uses statements written elsewhere in the application under repair to synthesize patches that insert or replace a statement. jGenProg can navigate the search space (section 3.3.2) in two ways: a) using evolutionary search, as the original GenProg does; or b) using selective search, as RSRepair qi2014strength () does.

4.2 jKali

The technique Kali was presented by Qi:2015:APP:2771783.2771791 () for evaluating the incompleteness of the test suites used by repair approaches for validating candidate patches. The intuition of the authors was that removing code from a buggy application is sufficient to pass all tests from an incomplete test suite. Consequently, the generated patches overfit the incomplete test suite and are not valid for inputs not covered by it. jKali is an implementation of Kali built over the Astor framework which removes code and skips the execution of code by adding return statements and by replacing if conditions with True or False. Like Kali, jKali does an exhaustive navigation of the search space (section 3.3.2).

4.3 jMutation

A mutation-based repair system was introduced by Debroy and Wong debroy2010using () to repair bugs using mutation operators proposed by mutation testing approaches DeMillo1978Hint (); Hamlet1977TPA (). We implemented that system over Astor, under the name jMutation, which has as repair operators the mutation of relational operators (e.g., <, >) and logical operators (e.g., AND, OR). jMutation does an exhaustive or selective navigation of the search space (section 3.3.2).

4.4 DeepRepair

DeepRepair white2017dl () is an ingredient-based approach built over jGenProg that applies deep learning based techniques during the patch synthesis process. In particular, DeepRepair proposes new strategies for: a) selecting ingredients (section 3.8) based on the similarity of methods and classes: ingredients are taken from the methods or classes most similar to those that contain the bug; b) transforming ingredients (section 3.9) based on the semantics of variable names: out-of-scope variables from an ingredient are replaced by in-scope variables with semantically related names.

4.5 Cardumen

Cardumen CardumenArxiv () is a repair system that targets fine-grained code elements: expressions. It synthesizes repairs from templates mined from the application under repair. Then, it creates concrete patches from those templates by using a probabilistic model of variable names: it replaces template placeholders by variables with the most frequent names at the buggy location. Cardumen explores the search space using the selective strategy (section 3.3.2).

4.6 TIBRA

TIBRA (Transformed Ingredient-Based Repair Approach) is an extension of jGenProg. Unlike the original GenProg, which discards ingredients with out-of-scope variables, TIBRA applies transformations to the ingredients. The approach adapts an ingredient (i.e., a statement taken from elsewhere in the application) by replacing all its out-of-scope variables by in-scope variables randomly chosen at the buggy location.

5 Evaluation

In this section we present an evaluation of repair approaches built over Astor. We first study the capacity of the approaches presented in section 4 to repair real bugs. Then, we evaluate each extension point from section 3 by comparing different implementations of it.

5.1 Research questions

The research questions that guide the evaluation of Astor framework are:

  1. Repairability:

    1. RQ 1.1 - How many bugs from Defects4J are repaired using the repair approaches built over Astor?

    2. RQ 1.2 - What are the bugs uniquely repaired by approaches from Astor?

  2. Impact of core design decisions:

    1. RQ 2.1 - What is the impact of using different fault localization tools?

    2. RQ 2.2 - Which code granularity repairs more bugs?

    3. RQ 2.3 - Does the use of a threshold on suspiciousness values help to repair faster?

    4. RQ 2.4 - What are good sets of repair operators?

    5. RQ 2.5 - Does prioritizing repair operators impact repairability?

  3. Focus on ingredient-based repair:

    1. RQ 3.1 - Does a reduced ingredient space allow to repair faster?

    2. RQ 3.2 - To what extent does the ingredient selection strategy impact repairability?

    3. RQ 3.3 - To what extent does the ingredient transformation strategy impact repairability?

5.2 Protocol

5.2.1 Description of Defects4J: a dataset of bugs in Java

We have used repair approaches from Astor for a large evaluation consisting of searching for patches for 357 bugs from the Defects4J benchmark JustJE2014 (), namely those from the projects JFreeChart, Closure Compiler, Apache Commons Lang, Apache Commons Math and Joda Time. A patch is said to be test-adequate if it passes all tests, including the failing one. As shown by previous work Qi2015 (), a patch may be test-adequate yet incorrect, when it only works on the inputs from the test suite and does not generalize.

5.2.2 Execution protocol

We carried out an experimental procedure for answering each research question. The procedure is composed of the following steps.

First, we selected the repair approaches to study according to the addressed research question. Then, we determined the configuration of each approach involved in the experiment. For some configurations it was necessary to write a new implementation of particular extension points presented in section 3.

We then created scripts for launching repair attempts for each configuration. A repair attempt consists of the execution of a repair approach with a timeout of 3 hours and a concrete value of the random seed. In the case of an exhaustive approach such as jKali, the seed is not used because the approach does not have any stochastic sub-component. For each configuration, we ran at least 3 repair attempts (i.e., 3 different seeds).

Finally, we collected the results from each repair attempt, grouped the results (i.e., patches and statistics) according to the configuration, and compared the results. Note that, for a fair comparison, when we compare the efficacy of two configurations A and B, the repair attempts use the same seeds. For example, we launched three repair attempts over A using seeds 1, 2 and 3, and another three repair attempts over B using the same seeds 1, 2 and 3.

Note that there are bugs that are repaired in one experiment associated with a research question but not in others. To our knowledge, there are two reasons for this. First, the seeds used could be different, which implies that the navigation through the search space is different. Second, even when using the same seed, a configuration itself impacts the repairability of a bug. For example, different ways of building the ingredient pool (section 3.7) impact the size of the search space. Additionally, as running our algorithms is expensive in terms of CPU time, we also considered repair attempts that we had executed in previous experiments, such as those from the evaluations of Cardumen CardumenArxiv (), DeepRepair white2017dl () and MinImpact Zu2017Test4Repair ().

5.3 Evaluation of Repairability

In this section we focus on the ability of the repair approaches over Astor introduced in section 4 to repair bugs from Defects4J, and we compare their repairability against other repair systems.

5.3.1 RQ 1.1 - How many bugs from Defects4J are repaired using the repair approaches built over Astor?

Table 3 displays the bugs from Defects4J repaired by approaches built over Astor framework. In total, 98 bugs out of 357 (27.4%) are repaired by at least one repair approach. Six approaches were executed: jGenProg (49 bugs repaired), jKali (29), jMutation (23), DeepRepair (51), Cardumen (77), and TIBRA (35).

We observe that there are 9 bugs, such as Chart-1 and Math-2, for which all evaluated repair approaches found at least one test-suite adequate patch. In contrast, 35 bugs were repaired by only one approach: 19 of them, such as Math-101, only by Cardumen, 9 only by DeepRepair (e.g., Math-22), 3 only by jGenProg (e.g., Chart-19), 3 only by TIBRA (e.g., Chart-23), and one only by jMutation (Closure-38).

Project  Bug Id  jGenProg  jKali  jMutation  DeepRepair  Cardumen  TIBRA  # approaches

Chart 1 R R R R R R 6
3 R R R R 4
4 R 1
5 R R R R R 5
6 R R 2
7 R R R R R 5
9 R R 2
11 R R 2
12 R R R R 4
13 R R R R 4
14 R 1
15 R R R R 4
17 R R 2
18 R 1
19 R 1
23 R 1
24 R R 2
25 R R R R R 5
26 R R R R R R 6
Closure 7 R R 2
10 R R R 2
12 R 1
13 R R 2
21 R R R R 3
22 R R R R 4
33 R 1
38 R 1
40 R 1
45 R R 2
46 R R R 3
49 R 1
55 R 1
133 R 1
Lang 7 R R R 3
10 R R R 3
14 R 1
20 R 1
22 R R R 3
24 R R R 3
27 R R R R 4
38 R 1
39 R R R 3
Math 2 R R R R R R 6
5 R R R R 4
6 R R 2
7 R 1
8 R R R R R 5
18 R R 2
20 R R R R R 5
22 R 1

Math 24 R 1
28 R R R R R R 6
30 R 1
32 R R R R R 5
33 R 1
39 R 1
40 R R R R R 5
41 R 1
44 R R 2
46 R 1
49 R R R R R 5
50 R R R R R R 6
53 R R R 3
56 R R 2
57 R R R 3
58 R R R 3
60 R R R 3
62 R 1
63 R R R 3
64 R 1
69 R 1
70 R R R 3
71 R R 2
72 R 1
73 R R R R 4
74 R R R 3
77 R 1
78 R R R R R 5
79 R R 2
80 R R R R R R 6
81 R R R R R R 6
82 R R R R R R 6
84 R R R R R 5
85 R R R R R R 6
88 R R 2
95 R R R R 4
97 R R 2
98 R 1
101 R 1
104 R R 2
105 R 1
Time 4 R R R 3
7 R 1
9 R 1
11 R R R R R 5
17 R 1
18 R 1
20 R 1
Total repaired: 98 (jGenProg 49, jKali 29, jMutation 23, DeepRepair 51, Cardumen 77, TIBRA 35)
Table 3: Bugs from dataset Defects4J repaired by approaches built over Astor. In total, 98 bugs from 5 Java projects were repaired. R means ‘bug with at least one test-suite adequate patch’.

Response to RQ 1.1: The repair approaches built over Astor find test-adequate patches for 98 real bugs out of 357 bugs from Defects4J. The best approach is Cardumen: it finds a test-suite adequate patch for 77 bugs.

Compared with other repair systems evaluated over Defects4J, approaches from Astor repair more bugs (98 bugs repaired) than: ssFix xin2017leveraging () (60 bugs repaired), ARJA Arja1712.07804 () (59 bugs), ELIXIR Saha:2017:EEO () (40 bugs), GP-FS gpfl2017 () (37 bugs), JAID Chen2017CPR () (31 bugs), ACS Xiong2017 () (18 bugs), HDRepair le2016history () (15 bugs from xin2017leveraging ()). In particular, Cardumen alone repairs more than each of those approaches: it found test-suite adequate patches for 77 bugs. In contrast, Nopol nopol () (103 bugs repaired durieux:hal-01480084 ()) repairs more than the approaches of the Astor framework.

5.4 RQ 1.2 - What are the bugs uniquely repaired by approaches from Astor?

We consider automated program repair approaches from the literature for which: 1) the evaluation was done over the dataset Defects4J; 2) the identifiers of the repaired bugs from Defects4J are given in the respective paper or included in its appendix.

We found 10 repair systems that fulfill both criteria: Nopol nopol (); durieux:hal-01480084 (), jGenProg defects4j-repair (), DynaMoth Durieux2016DDC (), HDRepair le2016history (), DeepRepair white2017dl (), ACS Xiong2017 (), GP-FS gpfl2017 (), JAID Chen2017CPR (), ssFix xin2017leveraging () and ARJA Arja1712.07804 (). In the case of HDRepair, as neither the identifiers of the repaired bugs nor the actual patches were reported by le2016history (), we considered the results reported by xin2017leveraging () (ssFix’s authors), who executed the approach. We discarded the Java systems JFix (Le2017JSR ()), S3 (Le2017SSS ()), Genesis (Long2017AIC ()) (evaluated over different bug datasets) and ELIXIR Saha:2017:EEO () (ids of the bugs repaired from Defects4J not publicly available).

Approaches built on the Astor framework find test-suite adequate patches for 11 new bugs of Defects4J, for which no other system has ever managed to find a single one. Those uniquely repaired bugs are: 5 from Math (ids: 62, 64, 72, 77, 101), 2 from Time (9, 20), 2 from Chart (11, 23), and 2 from Closure (13, 46).

Response to RQ 1.2: The repair approaches built over Astor find new unique test-adequate patches. Astor repairs 11 bugs which have never been repaired previously by any repair system.

In the remainder of this section, we evaluate different implementations of each extension point from section 3.

5.5 Evaluation of design of repair approaches

This section focuses on evaluating the impact of using different components for the extension points presented in section 3. Each research question targets one particular extension point and evaluates two or more implementations of it.

5.5.1 RQ 2.1 - What is the impact of using different fault localization tools?

By default, approaches built over Astor use a spectrum-based fault localization library called GZoltar. This experiment evaluates two versions of jGenProg, each using a different implementation of the extension point EP_FL. One uses a component based on GZoltar (default); the other uses a component based on the spectrum-based fault localization library named CoCoSpoon (https://github.com/SpoonLabs/CoCoSpoon). By default, both fault localization tools use the Ochiai ABREU20091780 () metric to compute the suspiciousness values.

We then compared the repairability of both versions. The result of this experiment shows that 20 bugs are repaired by jGenProg using GZoltar and 14 by jGenProg using CoCoSpoon. We found that the difference in the number of bugs repaired is due to technical issues of CoCoSpoon: it throws an OutOfMemoryError, which we could not overcome even by increasing the memory.

Among the 20 bugs repaired by at least one configuration, the number of patches found using the GZoltar configuration is, on average, larger than using the CoCoSpoon configuration: 10.9 vs 5, respectively. For 11 bugs, the number of patches using GZoltar was larger, whereas for 7 bugs the numbers were equal. For the remaining 2 bugs (Math-71 and Math-74), both configurations find test-suite adequate patches, but the one using CoCoSpoon finds a larger number.

Response to RQ 2.1: The extension point EP_FL for implementing the fault localization has an impact on the repairability. jGenProg using GZoltar repairs more bugs than using CoCoSpoon (20 vs 14) and finds a larger or equal number of patches for 18 out of 20 bugs. The reason is that CoCoSpoon fails when instrumenting large buggy programs. This result is aligned with those by Qi et al. mao2012FL (), which showed that using different fault localization techniques over GenProg has an impact on the repair process.

5.5.2 RQ 2.2 - Which code granularity repairs more bugs?

We compared the repairability of two approaches that use different implementations of the extension point EP_MPG for manipulating source code at different granularities. The level of code granularity impacts the size of the search space and thus the ease of finding a patch. On the one hand, the Cardumen approach is able to synthesize fine-grained patches by modifying, for example, an expression inside a statement that contains other expressions. On the other hand, the modifications done by jGenProg (or by any approach that extends it, such as TIBRA and DeepRepair) are coarse-grained: as it works at the level of statements, an entire statement is inserted, deleted or replaced.

Table 3 shows that 52 bugs are repaired by both Cardumen (expression level) and the approaches of the jGenProg family (statement level). In total, they repair 77 and 72 bugs, respectively. This means that there are 25 bugs repaired by Cardumen but not by the jGenProg family, and 20 repaired only by the latter family of approaches.
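These counts can be cross-checked with a short set computation, writing C for the set of bugs repaired by Cardumen and J for the set repaired by the jGenProg family (the symbols C and J are introduced here only for this check):

```latex
\begin{align*}
|C| &= 77, \qquad |J| = 72, \qquad |C \cap J| = 52 \\
|C \setminus J| &= 77 - 52 = 25, \qquad |J \setminus C| = 72 - 52 = 20 \\
\frac{|C \cap J|}{|J|} &= \frac{52}{72} \approx 72.2\%, \qquad
\frac{|C \cap J|}{|C|} = \frac{52}{77} \approx 67.5\% \\
|C \cup J| &= 77 + 72 - 52 = 97
\end{align*}
```

The union covers 97 of the 98 bugs repaired overall, which leaves exactly one bug outside both sets; this is consistent with Closure-38 being repaired only by jMutation.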

Response to RQ 2.2: The extension point EP_MPG has an impact on repair. We compared the statement and expression granularities implemented in jGenProg and Cardumen, respectively. By applying operators at the level of statements and expressions, 52 bugs are repaired by both the jGenProg family and Cardumen (72.2% and 67.5% of the bugs they repair, respectively). The remaining bugs (20 and 25, resp.) are only repaired using a specific granularity.

The implication is that, for those bugs repaired by both approaches, there are patches that: a) produce similar behaviours w.r.t. the test suite (i.e., passing all tests), and b) change code at different granularities. For example, both jGenProg and Cardumen synthesize the following patch for bug Math-70:

72 -            return solve(min, max);
72 +            return solve(f, min, max);
        }
Listing 2: Patch for Math-70 at class BisectionSolver.java

jGenProg synthesizes that patch by applying the replace operator to the modification point that references the return statement at line 72 of class BisectionSolver. The replacement return statement (i.e., the ingredient) is taken from line 59 of the same buggy class. Meanwhile, Cardumen arrives at the same patch by replacing a modification point that references the expression corresponding to the invocation of method solve inside the return statement at line 72. The replacement is synthesized from a template solve(_UnivariateRealFunction_0, _double_1, _double_2) mined from the same class BisectionSolver and instantiated using variables in scope at line 72.

One of the 25 bugs repaired by Cardumen but not by jGenProg is Math-101. The proposed patch modifies the expression used in a variable initialization (endIndex) at line 376 of class ComplexFormat, from startIndex + n to source.length().

376 -   int endIndex = startIndex + n;
376 +   int endIndex = source.length();
Listing 3: Patch for Math-101 by Cardumen

Cardumen is able to synthesize the patch by instantiating the template _String_0.length() mined from the application under repair. Approaches working at a different (and coarser) granularity are not capable of synthesizing that patch: in the case of jGenProg, the statement int endIndex = source.length(); (the ingredient for the fix) does not exist anywhere in the application under repair.
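The instantiation of such templates can be sketched as a textual substitution of typed placeholders. The class TemplateInstantiationSketch is hypothetical, and the selection rule is a deliberate simplification: each placeholder of a given type takes the next in-scope variable of that type, whereas Cardumen, as described in section 4.5, chooses variables with a probabilistic model of variable names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of instantiating a template with placeholders such as _double_1.
final class TemplateInstantiationSketch {

    // A placeholder has the form _<Type>_<index>, e.g. _String_0.
    private static final Pattern PLACEHOLDER =
            Pattern.compile("_([A-Za-z][A-Za-z0-9]*)_(\\d+)");

    // Returns the instantiated expression, or null if some type has no in-scope variable.
    static String instantiate(String template, Map<String, List<String>> inScopeByType) {
        Map<String, Integer> usedPerType = new HashMap<>();
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String type = matcher.group(1);
            List<String> candidates = inScopeByType.get(type);
            if (candidates == null || candidates.isEmpty()) {
                return null;
            }
            // Simplification: the n-th placeholder of a type takes the n-th variable (cyclically).
            int position = usedPerType.merge(type, 1, Integer::sum) - 1;
            String variable = candidates.get(position % candidates.size());
            matcher.appendReplacement(out, Matcher.quoteReplacement(variable));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

With the variables f (of type UnivariateRealFunction) and min, max (of type double) in scope, the template from Math-70 becomes solve(f, min, max); with source (of type String) in scope, the template from Math-101 becomes source.length().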

The Astor framework gives developers of repair approaches the flexibility to manipulate specific code elements at a given granularity level by implementing the extension point EP_MPG. For instance, jMutation implements EP_MPG to manipulate relational and logical binary operators. This allows targeting specific defect classes and reducing the size of the search space. That is the case for bug Closure-38, which is only repaired by jMutation.

245 - if ((x < 0) && (prev == '-')) {
245 + if ((x <= 0) && (prev == '-')) {
Listing 4: Patch for Closure-38 by jMutation at class CodeConsumer

5.5.3 RQ 2.3 - Does the use of a threshold on suspiciousness values help to repair faster?

To measure the degree to which the suspiciousness of code impacts repairability, we executed two versions of jGenProg, which differ in the implementation of the extension point EP_MPS (section 3.4). Those versions use different strategies for selecting a suspicious modification point: a) uniform random; or b) weighted random. The weight of a modification point is the suspiciousness value that the fault localization tool assigns to the source code the point refers to.

| Bug Id | Uniform random: first patch time | Uniform random: median time to patch | Weighted random: first patch time | Weighted random: median time to patch |
|---|---|---|---|---|
| Chart-1 | 1.4 | 7.9 | 0.4 | 10.6 |
| Chart-3 | 0.4 | 9.9 | 0.7 | 23.6 |
| Chart-5 | 0.6 | 1.9 | 0.3 | 0.6 |
| Chart-7 | 1 | 30.8 | 0.2 | 47.4 |
| Chart-12 | 2.1 | 6.9 | 0.3 | 13.2 |
| Chart-26 | 6.9 | 42.3 | 2.8 | 23.6 |
| Math-2 | 4.5 | 33.9 | 6.2 | 26.2 |
| Math-5 | 21.6 | 49.4 | 5.8 | 9.2 |
| Math-8 | 6.6 | 19.5 | 7 | 15.6 |
| Math-20 | 12.8 | 78.4 | 32.7 | 69.2 |
| Math-28 | 5.8 | 45.1 | 6.7 | 32 |
| Math-40 | 18.8 | 107 | 22.4 | 156 |
| Math-44 | - | - | 20.2 | 20.2 |
| Math-49 | 2.9 | 47.9 | 15.8 | 31.8 |
| Math-50 | 5.6 | 27.4 | 2.3 | 14.4 |
| Math-53 | 9.7 | 70.8 | 1.9 | 6.1 |
| Math-60 | 59.7 | 59.7 | 6.6 | 26.9 |
| Math-70 | 0.8 | 2 | 0.3 | 0.7 |
| Math-71 | 32.2 | 127.8 | 80.2 | 80.2 |
| Math-73 | 1.1 | 15.5 | 0.2 | 4.5 |
| Math-74 | 12.2 | 47.1 | 61.1 | 69.5 |
| Math-78 | 3 | 48.9 | 3.5 | 92.3 |
| Math-80 | 0.3 | 64.4 | 2.9 | 28.7 |
| Math-81 | 5.8 | 73.9 | 0.7 | 46.5 |
| Math-82 | 3.4 | 28.4 | 0.3 | 50.9 |
| Math-84 | 44.4 | 96.4 | 81.5 | 98.3 |
| Math-85 | 0.2 | 12.2 | 1.2 | 15.4 |
| Math-95 | 11 | 99.8 | 8.4 | 24.5 |
| Time-4 | 1.2 | 34.3 | 0.4 | 5.2 |
| Time-11 | 6.8 | 46.3 | 5.9 | 61.2 |
Table 4: Comparison of strategies for navigating the suspicious space. Two strategies are presented: uniform random and weighted random, where the weight of a suspicious modification point corresponds to the suspiciousness value assigned by the fault localization approach. Times are expressed in minutes.

Table 4 shows that 30 bugs are repaired using the weighted-random strategy, whereas 29 are repaired using the uniform-random strategy. The weighted-random strategy finds the first patch faster for 17 out of 30 bugs (56.7%), and its median time for finding a patch is also lower for 19 out of 30 bugs (63.3%).

Response to RQ 2.3: The extension point EP_MPS has an impact on the repair time. We compared jGenProg using weighted random selection based on suspiciousness and jGenProg using uniform random selection. jGenProg using suspiciousness has a lower median time to find test-suite adequate patches for 19 out of 30 bugs.

5.5.4 RQ 2.4 - What are good sets of repair operators?

We compared the repairability of repair approaches built over Astor that use different repair operator spaces, i.e., different sets of operators. Repair approaches over Astor use four different operator spaces: 1) IRR-Statements (Insert, Remove, Replace) from jGenProg and its extensions; 2) Suppression from Kali; 3) R-expression from Cardumen; and 4) Relational-Logical-operators from jMutation. Each repair approach implements the operator space that it uses in the extension point EP_OS (section 3.5).

Table 3 displays the bugs repaired by repair approaches over Astor. There are 20 bugs repaired only with GenProg’s operators, that is, insert, remove or replace statement. Math-98 and Time-20 are two of those bugs. Conversely, there are 19 bugs repaired only with the operator from Cardumen, i.e., replacement of expression. Time-17 is one of those bugs. Of the remaining bugs, 52 are repaired by both operator spaces. jMutation is the only approach built over the Astor framework that found a test-suite adequate patch for Closure-38. Results from jKali show that removing or skipping code yields test-suite adequate patches for 21 bugs. However, no bug is repaired only by jKali: all bugs repaired by jKali are also repaired by either Cardumen or an approach that uses GenProg’s operators. This happens because all of Kali’s operators are included in either Cardumen’s or GenProg’s operator space: patches that remove statements or add return statements can be synthesized by jGenProg, whereas patches that replace an if condition with TRUE or FALSE can be synthesized by Cardumen.

Response to RQ 2.4: The extension point EP_OS for specifying the repair operators impacts the repairability of bugs from Defects4J. We compared approaches using 4 different repair operator spaces. Cardumen, which uses replacement of expressions, repairs 77 bugs, whereas approaches that use the GenProg operators (add, remove, replace) repair in total 72 bugs.

As discussed previously, approaches from Astor repair 11 bugs that were not previously repaired by another repair system. However, there are bugs that cannot be repaired by any approach from Astor but are repaired by other systems, such as Math-4 by Nopol. One reason is that the repair operators from GenProg’s family and Cardumen reuse, in different ways, source code present in the application. Neither Cardumen nor jGenProg’s family is capable of synthesizing a patch whose code ingredient or template was not previously written in the application under repair.

5.5.5 RQ 2.5 - Does prioritizing repair operators impact repairability?

We created four different versions of jGenProg, each of which implements a different strategy for selecting repair operators using the extension point EP_OS (section 3.6). The first strategy, used by default by jGenProg, is uniform random selection, i.e., a uniform probability distribution over operators. Each of the other three strategies prioritizes a single repair operator from jGenProg’s operator space by assigning a probability of 70% to it and 15% to each remaining operator. For those three strategies, the probability distributions over operators are asymmetric.
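The four probability distributions can be sketched as follows. This is an illustrative Python sketch and not Astor's Java code; the function names and the list of operator labels are assumptions for this example.

```python
import random

OPERATORS = ["insert", "replace", "remove"]

def operator_distribution(prioritized=None, priority=0.70):
    """Build a probability distribution over the repair operators.

    With no prioritized operator the distribution is uniform; otherwise the
    prioritized operator gets `priority` and the rest share the remainder.
    """
    if prioritized is None:
        return {op: 1 / len(OPERATORS) for op in OPERATORS}
    rest = (1 - priority) / (len(OPERATORS) - 1)
    return {op: (priority if op == prioritized else rest) for op in OPERATORS}

def select_operator(distribution, rng):
    """Draw one operator according to the given distribution."""
    ops = list(distribution)
    weights = [distribution[op] for op in ops]
    return rng.choices(ops, weights=weights, k=1)[0]

baseline = operator_distribution()
insert_first = operator_distribution("insert")
chosen = select_operator(insert_first, random.Random(0))
```

With three operators, prioritizing one at 70% leaves 15% for each of the other two, so the probabilities sum to 100%.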

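Bug Id | Number of patches: Baseline (Uniform) | Number of patches: Insert 70% | Number of patches: Replace 70% | Number of patches: Remove 70%
Chart-1 | 6 | 6 | 6 | 6
Chart-3 | 38 | 44* | 32 | 20
Chart-5 | 2 | 2 | 2 | 2
Chart-7 | 6 | 8 | 8 | 6
Chart-12 | 1 | 1 | 1 | 1
Chart-26 | 21 | 9 | 24* | 11
Math-60 | 1 | 1 | 1 | 1
Math-70 | 2 | 2 | 1 | 2
Math-71 | 2 | 2 | 0 | 0
Math-73 | 10 | 10 | 10 | 10
Math-74 | 6* | 4 | 1 | 2
Math-78 | 4 | 5* | 3 | 3
Math-80 | 13* | 12 | 8 | 6
Math-81 | 28* | 12 | 26 | 12
Math-82 | 9 | 8 | 10* | 5
Math-84 | 2 | 2 | 0 | 0
Math-85 | 5 | 5 | 5 | 3
Math-95 | 13 | 14* | 11 | 11
Time-4 | 15 | 15 | 15 | 15
Time-11 | 27* | 26 | 23 | 13
AVG | 8.79 | 7.83 | 7.79 | 5.38
Best | 4 | 3 | 2 | 0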
Table 5: Comparison of strategies for selecting the repair operator. The baseline strategy selects the operator uniformly at random. The other three strategies prioritize the operators Insert, Replace and Remove, respectively, each with a probability of 70% of being selected; each remaining operator has a probability of 15%. A * marks the strategy with the largest number of patches and the lowest median time to find patches.

The results show that for 2 bugs (Math-71 and Math-84), prioritizing an operator can prevent jGenProg from repairing the bug. For instance, bug Math-71 is repaired by jGenProg using 2 distributions: uniform and Insert prioritization. The two patches found insert code. When jGenProg gives priority to the replace or remove operator, those patches are not found.

For 11 bugs, the number of patches found differs according to the probability distribution used. The uniform distribution allows jGenProg to find more patches for 4 bugs. For example, for Math-74, jGenProg using the uniform distribution finds 6 different patches, whereas jGenProg prioritizing the insert operator finds 4 patches for that bug. In contrast, jGenProg with Insert and Replace prioritization finds more patches for 3 and 2 bugs, respectively, whereas prioritizing the remove operator does not yield a larger number of patches for any bug. For the rest of the bugs, the number of patches does not change.

Table 5 also shows the average number of patches per bug found by each strategy (row AVG). The uniform distribution allows jGenProg to find the largest average number of patches (8.79), whereas prioritizing the remove operator produces the lowest (5.38).

Response to RQ 2.5: The extension point EP_OS impacts the repairability of bugs. We compared jGenProg using a uniform probability distribution over repair operators and three asymmetric probability distributions. The use of asymmetric probability distributions results in jGenProg repairing fewer bugs and finding, on average, fewer patches than with a uniform probability distribution.

5.6 Design of ingredient-based repair approaches

In this section we study and compare different implementations for the extension points related to ingredient-based repair approaches.

5.6.1 RQ 3.1 - Does a reduced ingredient space allow faster repair?

In our previous work astor2016 (), we evaluated the extension point EP_IPD by using different strategies for building an ingredient pool. We executed jGenProg using the baseline ingredient pool used by the original GenProg Weimer2009 () (Global scope) and the optimized modes (File and Package scopes), which are based on empirical evidence about the location of ingredients martinez2014icse (); Barr2014PSH (). Table 6 presents the average median time for finding the first patch: 11.1, 16.7 and 41.1 minutes using the configurations File, Package and Global (baseline), respectively.
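The effect of the three scopes on the ingredient pool can be illustrated with a small sketch. It is written in Python and is not Astor's Java implementation; the `Statement` record, the `ingredient_pool` function and the sample statements are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Statement:
    code: str
    file: str
    package: str

def ingredient_pool(statements, buggy, scope):
    """Return the candidate ingredients for repairing `buggy` under a scope."""
    if scope == "file":
        return [s for s in statements if s.file == buggy.file]
    if scope == "package":
        return [s for s in statements if s.package == buggy.package]
    if scope == "global":
        return list(statements)
    raise ValueError(f"unknown scope: {scope}")

# Hypothetical application with four statements in three files.
statements = [
    Statement("a = b;", "org/m/A.java", "org.m"),
    Statement("return a;", "org/m/A.java", "org.m"),
    Statement("b++;", "org/m/B.java", "org.m"),
    Statement("log(x);", "org/util/Log.java", "org.util"),
]
buggy = statements[0]
sizes = {
    scope: len(ingredient_pool(statements, buggy, scope))
    for scope in ("file", "package", "global")
}
```

Here the pool grows from 2 (File) to 3 (Package) to 4 (Global) ingredients. A smaller pool means fewer candidates to try, but it also excludes any ingredient that lives outside the chosen scope.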

Bug ID | Median time first patch: File | Median time first patch: Package | Median time first patch: Global | Time reduction: File vs Global | Time reduction: Package vs Global
Math-2 | 9 | 21.5 | 31 | 71% | 30.6%
Math-5 | 5.4 | 5.3 | 27.8 | 80.7% | 80.9%
Math-7 | 27.9 | 29.3 | 168.6 | 83.4% | 82.6%
Math-28 | 26.4 | 33.4 | 46.2 | 42.8% | 27.6%
Math-40 | 23 | 52.6 | 31.8 | 27.6% | -39.6%
Math-44 | 12.1 | 10.9 | 47.1 | 74.3% | 76.8%
Math-49 | 8.7 | 20.8 | 19.7 | 55.6% | -5.6%
Math-50 | 4.6 | 5.6 | 7.4 | 38.5% | 25.1%
Math-53 | 2 | 2.2 | 86.2 | 97.7% | 97.5%
Math-60 | - | - | 51.1 | - | -
Math-70 | 0.2 | 0.3 | 31.3 | 99.3% | 99%
Math-71 | 4.9 | 7.6 | - | - | -
Math-73 | 0.4 | 0.5 | 15.5 | 97.2% | 96.7%
Math-74 | - | 67.4 | 12.2 | - | -81.9%
Math-78 | 2.4 | 4.1 | 12.6 | 80.9% | 67.7%
Math-80 | 10.2 | 3.7 | 11 | 7.3% | 66.4%
Math-81 | 6.6 | 4.5 | 3.6 | -45.7% | -21%
Math-82 | 10.3 | 29.9 | 116.1 | 91.1% | 74.2%
Math-84 | 36.2 | 23.1 | 46.6 | 22.2% | 50.3%
Math-85 | 18.3 | 7 | 46.6 | 60.7% | 85.1%
Math-95 | 2.3 | 4.3 | 9.9 | 76.7% | 56.5%
Total | 11.1 | 16.7 | 41.1 | 58.97% | 45.7%
Table 6: Experimental results on repairing bugs of Math project with 3 different ingredient scope: File, Package and Global. The table shows the median time for finding the first patch (in minutes) using the three ingredient scopes. Last two columns correspond to the percentage of time saved by the optimization (File and Package vs Global).
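The time-reduction columns can be reproduced approximately with the sketch below. The formula is an assumption inferred from the table values, not one stated in the text: the difference to the Global time appears to be divided by the larger of the two times, which matches the positive entries and the large negative entries up to rounding of the inputs. The function name is hypothetical.

```python
def time_reduction(scope_minutes, global_minutes):
    """Percentage of time saved by a reduced scope relative to the Global scope.

    Assumption: the difference is normalized by the larger of the two times.
    """
    difference = global_minutes - scope_minutes
    return 100 * difference / max(global_minutes, scope_minutes)

math_2_file = time_reduction(9, 31)          # about 71.0
math_2_package = time_reduction(21.5, 31)    # about 30.6
math_74_package = time_reduction(67.4, 12.2) # about -81.9
overall_file = time_reduction(11.1, 41.1)    # about 73.0
```

The last value is computed from the averaged times in the Total row (11.1 and 41.1 minutes); it differs from the 58.97% in that row, which would be consistent with that figure being an average of the per-bug percentages.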

This result supports the observation that locality-aware repair speeds up repair Barr2014PSH (); martinez2014icse (). The table also shows the percentage of time saved by the optimization. Compared to the baseline, the File scope allows faster repair for 17 out of 21 bugs. We can see that bug Math-60 could not be repaired with the optimized ingredient pools. The reason is that the only valid patch ingredient must be taken from a file (resp. package) other than the file (resp. package) of the buggy statement.

Response to RQ 3.1: The extension point EP_IPD impacts the repair time. We compared jGenProg using File, Package and Global (default by GenProg) scopes for building the ingredient pool. The File ingredient pool allows jGenProg to reduce repair time from 41.1 to 11.1 minutes (73%), without hampering the repairability.

5.6.2 RQ 3.2 - To what extent does the ingredient selection strategy impact repairability?

We evaluated the extension point EP_IS (section 3.8) by comparing two strategies for selecting ingredients from the ingredient pool: 1) Uniform random selection, used by default by jGenProg; and 2) Executable-level similarity ingredient sorting: this strategy sorts ingredients according to the similarity between the ingredients’ parent methods and the method where the candidate patch will be applied. DeepRepair is an extension of jGenProg that uses this latter strategy. The results of this evaluation were presented in our previous work that introduces the DeepRepair approach white2017dl (). In this experiment, we consider the configuration of DeepRepair named ‘ED’ in white2017dl ().
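The idea of similarity-based ingredient sorting can be sketched as follows. This Python sketch deliberately swaps in a simple token-based Jaccard similarity as a stand-in; it is not the similarity measure used by DeepRepair, and all names and sample code fragments are hypothetical.

```python
import re

def tokens(code):
    """Extract the set of identifier-like tokens from a code fragment."""
    return set(re.findall(r"[A-Za-z_]\w*", code))

def jaccard(a, b):
    """Stand-in similarity: overlap of token sets (not DeepRepair's measure)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def sort_ingredients(ingredients, target_method):
    """Sort (ingredient, parent_method) pairs, most similar parent method first."""
    return sorted(
        ingredients,
        key=lambda pair: jaccard(pair[1], target_method),
        reverse=True,
    )

target = "int sum(int[] xs) { int total = 0; for (int x : xs) total += x; return total; }"
ingredients = [
    ("return name;", "String getName() { return name; }"),
    ("total += x;", "int add(int total, int x) { total += x; return total; }"),
]
ranked = sort_ingredients(ingredients, target)
```

Under this stand-in measure, the ingredient whose parent method shares more tokens with the target method (`total += x;`) is tried first, whereas uniform random selection would ignore that information.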

The results show that jGenProg finds test-suite adequate patches for 48 bugs, whereas DeepRepair finds patches for 40 bugs (all also repaired by jGenProg): 10 from Chart, 7 from Lang and 23 from Math. We also observed that approximately 99%, 25%, and 36% of the patches found by DeepRepair for Chart, Lang, and Math, respectively, are not found by jGenProg, which means that DeepRepair is able to find alternative patches.

Response to RQ 3.2: The extension point EP_IS has an impact on the number of bugs repaired. We compared the uniform random ingredient selection strategy and executable-level similarity strategy, implemented by jGenProg and DeepRepair, respectively. jGenProg repairs more bugs than DeepRepair (48 vs 40). However, DeepRepair inspects a portion of the search space not covered by jGenProg, producing new and original test-suite adequate patches.

A deeper analysis of different configurations of the strategy for navigating the ingredient pool can be found in white2017dl ().

5.6.3 RQ 3.3 - To what extent does the ingredient transformation strategy impact repairability?

We evaluated the extension point EP_IT by comparing the performance of jGenProg using two implementations for that point: 1) No transformation (default by jGenProg and GenProg) and 2) cluster-based ingredient transformation strategy. This latter strategy is proposed by DeepRepair white2017dl (). In that work we executed both jGenProg and DeepRepair using the cluster-based strategy (configuration named ‘RE’ in white2017dl ()).

The results showed that 48 bugs are repaired by jGenProg, 49 bugs by DeepRepair (RE), and 45 bugs by both. This means that transforming ingredients with the cluster-based strategy makes it possible to repair 4 bugs that jGenProg could not. Moreover, we found notable differences between DeepRepair and jGenProg patches: 53%, 3%, and 53% of DeepRepair’s patches for Chart, Lang, and Math, respectively, are not found by jGenProg. Note that, due to the transformation of ingredients, the search space of DeepRepair is larger than that of jGenProg (which is a subset of the former).

To compare two spaces of the same size, we carried out a new experiment. We used the extension point EP_IT to implement another transformation strategy, named Random-variable-replacement, which replaces each out-of-scope variable in an ingredient with a randomly chosen in-scope variable. We call TIBRA (section 4.6) the extension of jGenProg that uses this strategy. We then compared TIBRA and DeepRepair. TIBRA repairs 35 bugs in total, of which 21 are also repaired by DeepRepair. This means that 28 and 14 bugs are repaired only by DeepRepair (RE) and TIBRA, respectively. This last experiment shows the benefits of using a customized strategy based on clusters of variable names. Moreover, TIBRA repairs 11 bugs (e.g., Math-63) that jGenProg does not repair, showing that the transformation of ingredients makes it possible to find patches for previously unrepaired bugs.
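The Random-variable-replacement idea can be sketched as below. This is a simplified Python illustration, not TIBRA's Java implementation: ingredients are treated as plain strings, type compatibility between variables is ignored (a simplifying assumption), and the function name and sample variables are hypothetical.

```python
import random
import re

def replace_out_of_scope(ingredient, ingredient_vars, in_scope, rng):
    """Replace each out-of-scope variable by a randomly chosen in-scope one.

    Returns None when a replacement is needed but no variable is in scope.
    """
    out_of_scope = sorted(v for v in ingredient_vars if v not in in_scope)
    if not out_of_scope:
        return ingredient  # nothing to transform
    candidates = sorted(in_scope)
    if not candidates:
        return None
    mapping = {v: rng.choice(candidates) for v in out_of_scope}
    # Single pass over whole-word matches, so replacements are never re-replaced.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], ingredient)

rng = random.Random(1)
transformed = replace_out_of_scope(
    "total = total + step;", {"total", "step"}, {"sum", "i"}, rng
)
unchanged = replace_out_of_scope("i = i + 1;", {"i"}, {"sum", "i"}, rng)
```

Each out-of-scope variable is mapped to one in-scope variable for the whole ingredient, so repeated occurrences (such as `total` above) are replaced consistently.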

Response to RQ 3.3: The extension point EP_IT impacts the repairability. We compared three extensions: no-transformation, cluster-based, and random-variable-replacement, implemented in jGenProg, DeepRepair and TIBRA, respectively. DeepRepair and TIBRA discover new test-suite adequate patches that cannot be synthesized by jGenProg for 4 and 11 bugs, respectively.

6 Related Work

6.1 Works that extend approaches from Astor

In this section we present the repair approaches and extensions from the literature that were built over the Astor framework. Tanikado et al. Tanikado2017NewStrategies () extended the jGenProg implementation provided by the Astor framework to introduce two novel strategies. The first one, named similarity-order, which extends the extension point EP_IS, chooses ingredients according to code fragment similarities. The second one, named freshness-order, which extends the extension point EP_MPS, selects, with a certain priority, modification points whose statements were more recently updated. Wen et al. gpfl2017 () presented a systematic empirical study that explores the influence of the fault space on search-based repair techniques. For that experiment, the authors created the approach GP-FS, an extension of jGenProg, which receives a fault space as input. In their experiment, the authors generated several fault spaces with different accuracy, finding that GP-FS is capable of fixing more bugs correctly when it is fed fault spaces with high accuracy. White et al. white2017dl () presented DeepRepair, an extension of jGenProg, which navigates the search space guided by method and class similarity measures inferred with deep unsupervised learning. DeepRepair was incorporated into the Astor framework as a built-in approach.

6.2 Works that execute built-in approaches from Astor

Works from the literature have executed repair approaches from the Astor framework during the evaluation of their own approaches. For example, Liu LiuYuefei2017 () presents a study for understanding and generating patches for bugs introduced by third-party library upgrades. The author ran jGenProg from Astor to repair the 6 bugs, finding correct patches for 2 bugs and a test-suite adequate but incorrect patch for another bug. The approach ssFix xin2017leveraging () performs syntactic code search to find existing code from a code database (composed of the application under repair and external applications) that is syntax-related to the context of a buggy statement. In their evaluation, the authors executed two approaches from Astor, jGenProg and jKali, using the same machines and configuration that they used for executing ssFix.

6.3 Works that compare repairability against that of built-in approaches from Astor

We have previously executed jGenProg and jKali over bugs from Defects4J JustJE2014 () and analyzed the correctness of the generated patches defects4j-repair (). Note that the number of repaired bugs we reported in that experiment, executed in 2016, is lower than the results we present in this paper in section 5. The main reason is that we have applied several improvements and bug fixes to the Astor framework since that experiment.

Other works have used the evaluation of jGenProg and jKali presented in defects4j-repair () to measure the improvement introduced by their new repair approaches. For example, Le et al. presented a new repair approach named HDRepair le2016history (), which leverages the development history to effectively guide and drive a program repair process. The approach first mines bug fix patterns from the history of many projects and then employs existing mutation operators to generate fix candidates for a given buggy program. The approach ACS (Automated Condition Synthesis) Xiong2017 () aims to insert or modify an “if” condition to repair defects by combining three heuristic ranking techniques that exploit 1) the structure of the buggy program, 2) the documentation of the buggy program (i.e., Javadoc comments embedded in the source code), and 3) the conditional expressions in existing projects. Yuan and Banzhaf Arja1712.07804 () present ARJA, a genetic-programming based approach for automated repair of Java programs. ARJA introduces a test filtering procedure that can speed up the fitness evaluation and three types of rules that can be applied to avoid unnecessary manipulations of the code. ARJA also considers the different representations of the ingredient pool introduced by the Astor framework astor2016 (). In addition to the evaluation on Defects4J, the authors evaluated the capacity to repair real multi-location bugs on another dataset that they built themselves. Saha et al. presented Elixir Saha:2017:EEO (), a repair technique that uses a fixed set of parameterized program transformation schemas to synthesize candidate patches. JAID Chen2017CPR () is a state-based dynamic program analysis that synthesizes patches based on schemas (5 in total). Each schema triggers a fix action when a suspicious state in the system is reached during a computation. JAID has 4 types of fix actions, such as modifying the state directly by assignment and affecting the state that is used in an expression.

6.4 Works that analyze patches from built-in approaches from Astor

Other works have analyzed the publicly available patches of jGenProg and jKali from our previous evaluation of repair approaches over the Defects4J dataset defects4j-repair (). Motwani et al. motwani2017automated () analyzed the characteristics of the defects that repair approaches (including jGenProg and jKali) can repair. They found that automated repair techniques are less likely to produce patches for defects that required developers to write a lot of code or edit many files. They also found that the approaches that target Java code, such as those from Astor, are more likely to produce patches for high-priority defects than the techniques that target C code. Yokoyama et al. Yokoyama2017Evaluating () extracted characteristics of defects, such as priority, from defect reports and evaluated the performance of repair approaches against 138 defects in open source Java projects included in Defects4J. They found that jGenProg is able to find patches for many high-priority defects (1 Blocker, 2 Critical, and 11 Major). Liu et al. liu2017identifying () presented an approach that heuristically determines the correctness of the generated patches by exploiting the behavior similarity of test case executions. The approach is capable of automatically detecting as incorrect 47.1% and 52.9% of the patches from jGenProg and jKali, respectively. Jiang et al. jiang2017can () analyzed the Defects4J dataset to find bugs with weak test cases. Their results show that 42 (84.0%) of the 50 defects could be fixed with weak test suites, indicating that, although current techniques have a lot of room for improvement, weak test suites may not be the key limiting factor for current techniques.

6.5 Other test-suite based repair approaches

During the last decade, other approaches have been presented that target other programming languages (such as C) or were evaluated over datasets other than Defects4J. Arcuri ArcuriEvolutionary () applies co-evolutionary computation to automatically generate bug fixes for Java programs. GenProg Weimer2009 (); LeGoues2012TSEGP (), one of the earliest generate-and-validate techniques, uses genetic programming to search the repair space and generates patches from existing code found elsewhere in the same program. It has three repair operators: add, replace or remove statements. Other approaches have extended GenProg: for example, AE weimer2013AE () employs a novel deterministic search strategy and uses program equivalence relations to reduce the patch search space. The original implementation of GenProg Weimer2009 () targets C code and was evaluated against datasets of C bugs such as ManyBugs and IntroClass LeGoues2015MB (). Astor provides a Java version of GenProg called jGenProg, which also employs genetic programming for navigating the search space. RSRepair rsrepair () has the same search space as GenProg but uses random search instead, and its empirical evaluation shows that random search can be as effective as genetic programming. Astor is able to execute a Java version of RSRepair by choosing random strategies for the selection of modification points (extension point EP_MPS) and operators (extension point EP_OS). Debroy & Wong debroy2010using () propose a mutation-based repair method inspired by mutation testing. This work combines fault localization with program mutation to exhaustively explore a space of possible patches. Astor includes a Java version of this approach called jMutation. Kali Qi2015 () has recently been proposed to examine the fixability power of simple actions, such as statement removal. Like GenProg, Kali targets C code. Astor provides a Java version of Kali, which includes all transformations proposed by Kali.

Other approaches have proposed new sets of repair operators. For instance, PAR Kim2013 (), which shares the same search strategy with GenProg, uses patch templates derived from human-written patches to construct the search space. The PAR tool used in the original evaluation is not publicly available. However, it is possible to implement PAR over the Astor framework by implementing the repair operators based on those templates using the extension point EP_OD. The approach SPR spr () uses a set of predefined transformation schemas to construct the search space, and patches are generated by instantiating the schemas with condition synthesis techniques. SPR is publicly available but targets C programs. An extension of SPR, Prophet prophet (), applies probabilistic models of correct code learned from successful human patches to prioritize candidate patches so that correct patches can be ranked higher.

There are approaches that leverage human-written bug fixes. For example, Genesis Long2017AIC () automatically infers code transforms for automatic patch generation. The code transforms used by Genesis are automatically inferred from previous successful patches. HDRepair le2016history () first mines bug fix patterns from the history of many projects and then employs existing mutation operators to generate fix candidates for a given buggy program. Both approaches need as input, in addition to the buggy program and its test suite, a set of bug fixes. Two approaches leverage semantics-based examples. SearchRepair Ke2015RPS () uses a large database of human-written code fragments, encoded as satisfiability modulo theories (SMT) constraints on their input-output behavior, for synthesizing candidate repairs. S3 (Syntax- and Semantic-Guided Repair Synthesis) Le2017SSS () is a repair synthesis engine that leverages the programming-by-examples methodology to synthesize repairs.

Other approaches belong to the family of synthesis-based repair approaches. For example, SemFix Nguyen:2013:SPR () is a constraint-based repair approach for C. This approach provides patches for assignments and conditions by combining symbolic execution and code synthesis. Nopol nopol () is also a constraint-based method, which focuses on fixing bugs in if conditions and missing preconditions. Like Astor, it is implemented for Java and publicly available. DynaMoth Durieux2016DDC () is based on Nopol, but replaces the SMT-based synthesis component of Nopol with a new synthesizer, based on dynamic exploration, that is able to generate richer patches than Nopol, e.g., patches on if conditions with method invocations inside their condition. DirectFix directfix () achieves the simplicity of patch generation with a Maximum Satisfiability (MaxSAT) solver to find the most concise patches. Angelix Mechtaev2016 () uses a lightweight repair constraint representation called “angelic forest” to increase the scalability of DirectFix.

6.6 Studies analyzing generated patches

Recent studies have analyzed the patches generated by repair approaches from the literature. The results of those studies show that generated patches may just overfit the available test cases, meaning that they will break untested but desired functionality. For example, Qi et al. Qi2015 () found, using the Kali system, that the vast majority of patches produced by GenProg, RSRepair, and AE avoid bugs simply by deleting functionality. A subsequent study by Smith et al. smith2015cure () further confirms that the patches generated by GenProg and RSRepair fail to generalize.

Due to the problem of test overfitting, recent works Liu2017IPC (); Zu2017Test4Repair () propose to extend existing automated repair approaches such as Nopol, ACS and jGenProg. Those extended approaches generate new test inputs to enhance the test suites and use their behavior similarity to determine patch correctness. For example, Liu et al. Liu2017IPC () reported that their approach, based on patch and test similarity analysis, successfully prevented 56.3% of the incorrect patches from being generated, without blocking any correct patches. Yang et al. presented a framework named Opad (Overfitted PAtch Detection) Yang2017BTC () to detect overfitted patches by enhancing existing test cases using fuzz testing and employing two new test oracles. Opad filters out 75.2% (321/427) of the overfitted patches generated by GenProg/AE, Kali, and SPR.

7 Conclusion

In this paper we presented Astor, a framework developed in Java for repairing bugs in Java applications that encodes the design space of generate-and-validate repair approaches. The framework contains the implementation of 6 repair approaches and provides extension points for facilitating the reuse of components during the implementation of new approaches. The built-in implementations of repair approaches provided by Astor have already been used by researchers during the evaluations of their new repair approaches. Moreover, researchers have already implemented new components for extension points provided by the framework, resulting in new repair approaches and extensions. This paper also presented an evaluation of the approaches provided by Astor, which repair 98 real bugs from the Defects4J dataset. We hope that Astor will facilitate the construction of new repair approaches and comparative evaluations in automatic repair. Astor is publicly available at https://github.com/SpoonLabs/astor.

References

  • [1] Rui Abreu, Peter Zoeteweij, and Arjan J. C. van Gemund. An evaluation of similarity coefficients for software fault localization. In Proceedings of the 12th Pacific Rim International Symposium on Dependable Computing, PRDC ’06, pages 39–46, 2006.
  • [2] Rui Abreu, Peter Zoeteweij, Rob Golsteijn, and Arjan J.C. van Gemund. A practical evaluation of spectrum-based fault localization. Journal of Systems and Software, 82(11):1780 – 1792, 2009. SI: TAIC PART 2007 and MUTATION 2007.
  • [3] Andrea Arcuri. Evolutionary repair of faulty software. Appl. Soft Comput., 11(4):3494–3514, June 2011.
  • [4] Earl T. Barr, Yuriy Brun, Premkumar Devanbu, Mark Harman, and Federica Sarro. The plastic surgery hypothesis. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE 2014, pages 306–317, New York, NY, USA, 2014. ACM.
  • [5] J. Campos, A. Riboira, A. Perez, and R. Abreu. Gzoltar: an eclipse plug-in for testing and debugging. In 2012 Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering, pages 378–381, Sept 2012.
  • [6] Liushan Chen, Yu Pei, and Carlo A. Furia. Contract-based program repair without the contracts. In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, ASE 2017, pages 637–647, Piscataway, NJ, USA, 2017. IEEE Press.
  • [7] Vidroha Debroy and W. Eric Wong. Using mutation to automatically suggest fixes for faulty programs. In Proceedings of the 2010 Third International Conference on Software Testing, Verification and Validation, ICST ’10, pages 65–74, 2010.
  • [8] R. A. DeMillo, R. J. Lipton, and F. G. Sayward. Hints on test data selection: Help for the practicing programmer. Computer, 11(4):34–41, April 1978.
  • [9] T. Durieux, B. Cornu, L. Seinturier, and M. Monperrus. Dynamic patch generation for null pointer exceptions using metaprogramming. In 2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER), pages 349–358, Feb 2017.
  • [10] Thomas Durieux, Benjamin Danglot, Zhongxing Yu, Matias Martinez, Simon Urli, and Martin Monperrus. The Patches of the Nopol Automatic Repair System on the Bugs of Defects4J version 1.1.0. Research Report hal-01480084, Université Lille 1 - Sciences et Technologies, 2017.
  • [11] Thomas Durieux and Martin Monperrus. Dynamoth: Dynamic code synthesis for automatic program repair. In Proceedings of the 11th International Workshop on Automation of Software Test, AST ’16, pages 85–91, New York, NY, USA, 2016. ACM.
  • [12] S. Forrest, T.V. Nguyen, W. Weimer, and C. Le Goues. A genetic programming approach to automated software repair. In Proceedings of the 11th Annual conference on Genetic and evolutionary computation, pages 947–954. ACM, 2009.
  • [13] Gordon Fraser and Andrea Arcuri. Evosuite: automatic test suite generation for object-oriented software. In Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering, pages 416–419. ACM, 2011.
  • [14] C. Le Goues, N. Holtschulte, E. K. Smith, Y. Brun, P. Devanbu, S. Forrest, and W. Weimer. The manybugs and introclass benchmarks for automated repair of c programs. IEEE Transactions on Software Engineering, 41(12):1236–1256, Dec 2015.
  • [15] C. Le Goues, T. Nguyen, S. Forrest, and W. Weimer. Genprog: A generic method for automatic software repair. IEEE Transactions on Software Engineering, 38(1):54–72, Jan 2012.
  • [16] R. G. Hamlet. Testing programs with the aid of a compiler. IEEE Trans. Softw. Eng., 3(4):279–290, July 1977.
  • [17] Jiajun Jiang and Yingfei Xiong. Can defects be fixed with weak test suites? an analysis of 50 defects from defects4j. arXiv preprint arXiv:1705.04149, 2017.
  • [18] James A. Jones, Mary Jean Harrold, and John Stasko. Visualization of test information to assist fault localization. In Proceedings of the 24th International Conference on Software Engineering, ICSE ’02, pages 467–477, New York, NY, USA, 2002. ACM.
  • [19] René Just, Darioush Jalali, and Michael D. Ernst. Defects4J: A database of existing faults to enable controlled testing studies for Java programs. In Proceedings of the International Symposium on Software Testing and Analysis (ISSTA), pages 437–440, San Jose, CA, USA, July 23–25 2014.
  • [20] Yalin Ke, Kathryn T. Stolee, Claire Le Goues, and Yuriy Brun. Repairing programs with semantic code search (t). In Proceedings of the 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), ASE ’15, pages 295–306, Washington, DC, USA, 2015. IEEE Computer Society.
  • [21] Dongsun Kim, Jaechang Nam, Jaewoo Song, and Sunghun Kim. Automatic patch generation learned from human-written patches. In Proceedings of the 2013 International Conference on Software Engineering, ICSE ’13, pages 802–811, Piscataway, NJ, USA, 2013. IEEE Press.
  • [22] John R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA, USA, 1992.
  • [23] Xuan-Bach D. Le, Duc-Hiep Chu, David Lo, Claire Le Goues, and Willem Visser. Jfix: Semantics-based repair of java programs via symbolic pathfinder. In Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2017, pages 376–379, New York, NY, USA, 2017. ACM.
  • [24] Xuan-Bach D. Le, Duc-Hiep Chu, David Lo, Claire Le Goues, and Willem Visser. S3: Syntax- and semantic-guided repair synthesis via programming by examples. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2017, pages 593–604, New York, NY, USA, 2017. ACM.
  • [25] Xuan Bach D Le, David Lo, and Claire Le Goues. History driven program repair. In Software Analysis, Evolution, and Reengineering (SANER), 2016 IEEE 23rd International Conference on, volume 1, pages 213–224. IEEE, 2016.
  • [26] Claire Le Goues, Michael Dewey-Vogt, Stephanie Forrest, and Westley Weimer. A systematic study of automated program repair: Fixing 55 out of 105 bugs for $8 each. In Proceedings of the 34th International Conference on Software Engineering, ICSE ’12, pages 3–13, Piscataway, NJ, USA, 2012. IEEE Press.
  • [27] Xinyuan Liu, Muhan Zeng, Yingfei Xiong, Lu Zhang, and Gang Huang. Identifying patch correctness in test-based automatic program repair. arXiv preprint arXiv:1706.09120, 2017.
  • [28] Xinyuan Liu, Muhan Zeng, Yingfei Xiong, Lu Zhang, and Gang Huang. Identifying patch correctness in test-based automatic program repair, 2017.
  • [29] Yuefei Liu. Understanding and generating patches for bugs introduced by third-party library upgrades. Master’s thesis, 2017.
  • [30] Fan Long, Peter Amidon, and Martin Rinard. Automatic inference of code transforms for patch generation. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2017, pages 727–739, New York, NY, USA, 2017. ACM.
  • [31] Fan Long and Martin Rinard. Staged program repair with condition synthesis. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2015, pages 166–178, New York, NY, USA, 2015. ACM.
  • [32] Fan Long and Martin Rinard. Automatic patch generation by learning correct code. SIGPLAN Not., 51(1):298–312, January 2016.
  • [33] Matias Martinez, Thomas Durieux, Romain Sommerard, Jifeng Xuan, and Martin Monperrus. Automatic repair of real bugs in java: A large-scale experiment on the defects4j dataset. Empirical Software Engineering, pages 1–29, 2016.
  • [34] Matias Martinez and Martin Monperrus. Mining software repair models for reasoning on the search space of automated program fixing. Empirical Software Engineering, pages 1–30, 2013.
  • [35] Matias Martinez and Martin Monperrus. Astor: A program repair library for java (demo). In Proceedings of the 25th International Symposium on Software Testing and Analysis, ISSTA 2016, pages 441–444, New York, NY, USA, 2016. ACM.
  • [36] Matias Martinez and Martin Monperrus. Open-ended exploration of the program repair search space with mined templates: the next 8935 patches for defects4j, 2017.
  • [37] Matias Martinez, Westley Weimer, and Martin Monperrus. Do the fix ingredients already exist? an empirical inquiry into the redundancy assumptions of program repair approaches. In Companion Proceedings of the 36th International Conference on Software Engineering, ICSE Companion 2014, pages 492–495, 2014.
  • [38] Sergey Mechtaev, Jooyong Yi, and Abhik Roychoudhury. Directfix: Looking for simple program repairs. In Proceedings of the 37th International Conference on Software Engineering-Volume 1, pages 448–458. IEEE Press, 2015.
  • [39] Sergey Mechtaev, Jooyong Yi, and Abhik Roychoudhury. Angelix: Scalable multiline program patch synthesis via symbolic analysis. In Proceedings of the 38th International Conference on Software Engineering, ICSE ’16, pages 691–701, New York, NY, USA, 2016. ACM.
  • [40] Manish Motwani, Sandhya Sankaranarayanan, René Just, and Yuriy Brun. Do automated program repair techniques repair hard and important bugs? Empirical Software Engineering, pages 1–47, 2017.
  • [41] Hoang Duong Thien Nguyen, Dawei Qi, Abhik Roychoudhury, and Satish Chandra. Semfix: Program repair via semantic analysis. In Proceedings of the 2013 International Conference on Software Engineering, ICSE ’13, pages 772–781, Piscataway, NJ, USA, 2013. IEEE Press.
  • [42] Jeff H. Perkins, Sunghun Kim, Sam Larsen, Saman Amarasinghe, Jonathan Bachrach, Michael Carbin, Carlos Pacheco, Frank Sherwood, Stelios Sidiroglou, Greg Sullivan, Weng-Fai Wong, Yoav Zibin, Michael D. Ernst, and Martin Rinard. Automatically patching errors in deployed software. pages 87–102, 2009.
  • [43] Yuhua Qi, Xiaoguang Mao, Yan Lei, Ziying Dai, and Chengsong Wang. Does genetic programming work well on automated program repair? In Computational and Information Sciences (ICCIS), 2013 Fifth International Conference on, pages 1875–1878. IEEE, 2013.
  • [44] Yuhua Qi, Xiaoguang Mao, Yan Lei, Ziying Dai, and Chengsong Wang. The strength of random search on automated program repair. In Proceedings of the 36th International Conference on Software Engineering, ICSE 2014, pages 254–265, New York, NY, USA, 2014. ACM.
  • [45] Yuhua Qi, Xiaoguang Mao, Yan Lei, and Chengsong Wang. Using automated program repair for evaluating the effectiveness of fault localization techniques. In Proceedings of the 2013 International Symposium on Software Testing and Analysis, ISSTA 2013, pages 191–201, New York, NY, USA, 2013. ACM.
  • [46] Zichao Qi, Fan Long, Sara Achour, and Martin Rinard. An analysis of patch plausibility and correctness for generate-and-validate patch generation systems. In Proceedings of the 2015 International Symposium on Software Testing and Analysis, ISSTA 2015, pages 24–36, New York, NY, USA, 2015. ACM.
  • [47] Zichao Qi, Fan Long, Sara Achour, and Martin Rinard. An analysis of patch plausibility and correctness for generate-and-validate patch generation systems. In Proceedings of the 2015 International Symposium on Software Testing and Analysis, ISSTA 2015, pages 24–36, New York, NY, USA, 2015. ACM.
  • [48] Ripon K. Saha, Yingjun Lyu, Hiroaki Yoshida, and Mukul R. Prasad. Elixir: Effective object oriented program repair. In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, ASE 2017, pages 648–659, Piscataway, NJ, USA, 2017. IEEE Press.
  • [49] Edward K Smith, Earl T Barr, Claire Le Goues, and Yuriy Brun. Is the cure worse than the disease? overfitting in automated program repair. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, pages 532–543. ACM, 2015.
  • [50] A. Tanikado, H. Yokoyama, M. Yamamoto, S. Sumi, Y. Higo, and S. Kusumoto. New strategies for selecting reuse candidates on automated program repair. In 2017 IEEE 41st Annual Computer Software and Applications Conference (COMPSAC), volume 2, pages 266–267, July 2017.
  • [51] W. Weimer, Z.P. Fry, and S. Forrest. Leveraging program equivalence for adaptive program repair: Models and first results. In Automated Software Engineering (ASE), 2013 IEEE/ACM 28th International Conference on, pages 356–366, Nov 2013.
  • [52] Westley Weimer, Stephanie Forrest, Claire Le Goues, and ThanhVu Nguyen. Automatic program repair with evolutionary computation. Communications of the ACM, 53(5):109, May 2010.
  • [53] Westley Weimer, ThanhVu Nguyen, Claire Le Goues, and Stephanie Forrest. Automatically finding patches using genetic programming. In Proceedings of the 31st International Conference on Software Engineering, ICSE ’09, pages 364–374, 2009.
  • [54] Ming Wen, Junjie Chen, Rongxin Wu, Dan Hao, and Shing-Chi Cheung. An empirical analysis of the influence of fault space on search-based automated program repair, 2017.
  • [55] Martin White, Michele Tufano, Matias Martinez, Martin Monperrus, and Denys Poshyvanyk. Sorting and transforming program repair ingredients via deep learning code similarities, 2017.
  • [56] Qi Xin and Steven P Reiss. Leveraging syntax-related code for automated program repair. In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 660–670. IEEE, 2017.
  • [57] Yingfei Xiong, Jie Wang, Runfa Yan, Jiachen Zhang, Shi Han, Gang Huang, and Lu Zhang. Precise condition synthesis for program repair. In Proceedings of the 39th International Conference on Software Engineering, ICSE ’17, pages 416–426, Piscataway, NJ, USA, 2017. IEEE Press.
  • [58] Jifeng Xuan, Matias Martinez, Favio Demarco, Maxime Clément, Sebastian Lamelas, Thomas Durieux, Daniel Le Berre, and Martin Monperrus. Nopol: Automatic repair of conditional statement bugs in java programs. IEEE Transactions on Software Engineering, 2016.
  • [59] Jinqiu Yang, Alexey Zhikhartsev, Yuefei Liu, and Lin Tan. Better test cases for better automated program repair. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2017, pages 831–841, New York, NY, USA, 2017. ACM.
  • [60] H. Yokoyama, Y. Higo, and S. Kusumoto. Evaluating automated program repair using characteristics of defects. In 2017 8th International Workshop on Empirical Software Engineering in Practice (IWESEP), pages 47–52, March 2017.
  • [61] Zhongxing Yu, Matias Martinez, Benjamin Danglot, Thomas Durieux, and Martin Monperrus. Test Case Generation for Program Repair: A Study of Feasibility and Effectiveness. Technical Report 1703.00198, Arxiv, 2017.
  • [62] Yuan Yuan and Wolfgang Banzhaf. Arja: Automated repair of java programs via multi-objective genetic programming, 2017.
  • [63] Mengshi Zhang, Xia Li, Lingming Zhang, and Sarfraz Khurshid. Boosting spectrum-based fault localization using pagerank. In Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2017, pages 261–272, New York, NY, USA, 2017. ACM.