Program that repairs programs: how to achieve 78.3 percent precision in automated program repair

Publié

By Lily Sun, Research Program Manager of Microsoft Research Asia

In February 2017, Microsoft and Cambridge University announced a DeepCoder algorithm that produces programs from problem inputs/outputs. DeepCoder, which operates on a novel yet greatly simplified programming language, cannot handle complex problems—general programming languages are still too hard for DeepCoder to master. So, currently, programmers don’t have to worry about being replaced by machines.

Spotlight: blog post

GraphRAG auto-tuning provides rapid adaptation to new domains

GraphRAG uses LLM-generated knowledge graphs to substantially improve complex Q&A over retrieval-augmented generation (RAG). Discover automatic tuning of GraphRAG for new datasets, making it more accurate and relevant.

But programmers have plenty of other worries, including programming bugs. Could machines assist programmers by taking over the task of bug fixes?

To test this idea, researchers from Peking University, Microsoft Research Asia (MSRA) and University of Electronic Science and Technology of China (UESTC) have developed a new approach, called Accurate Condition System (ACS), to automatically repair defects in software systems without human intervention.

For example, consider the following piece of code from Apache Math, which is used to calculate the least common multiplier from two numbers. This piece of code uses Math.abs to ensure the return value is positive. However, code defects may still return negative results for some input values.

int lcm=Math.abs(mulAndCheck(a/gdc(a,b), b));
 return lcm;

The root cause of this error is that there is one more negative number than there are positive numbers in the range of signed integers. As a result, when the value passed to Math.abs is Integer.MIN_VALUE, Math.abs cannot convert the input into a positive number, causing a negative return. A correct implementation should throw ArithmeticException() at such cases.

We could create a test to capture this fault. The input of the test is a=Integer.MIN_VALUE and b=1, and the expected output is to throw ArithmeticException. Obviously, the program fails the test because no exception will be thrown.

But when we pass this program and the corresponding tests to ACS, the following path is generated, which precisely repairs the defect:

int lcm=Math.abs(mulAndCheck(a/gdc(a,b), b));
 + if (lcm == Integer.MIN_VALUE) {
 +  throw new ArithmeticException();
 + }
 return lcm;

This latest approach stems from a legacy of historic program repair approaches. Since Genprog in 2009, there have been many different approaches offered to repair programs. However, these techniques have a significant problem: typically only a very small portion of generated patches are correct, that is, low precision in patch generation. This is because traditional program repair approaches aim for passing all the tests. However, in real-world software systems, there are only a limited number of tests, and passing the tests does not mean that the program is correct.

For example, current approaches may generate the following patch for the above program:

 int lcm=Math.abs(mulAndCheck(a/gdc(a,b), b));
 + if (b == 1) {
 +  throw new ArithmeticException();
 + }
 return lcm;

Or even the following patch:

 int lcm=Math.abs(mulAndCheck(a/gdc(a,b), b));
 - return lcm;
 + throw new ArithmeticException();

All these patches can pass the test, but are far from being correct. In fact, we can easily construct hundreds of patches to pass the test in this example. Yet few would be of high precision.

Based on the findings of Prof. Martin Rinard’s group at MIT, revealed in an ISSTA 2015 paper, the precisions of mainstream program repair approaches are less than 10 percent. Though some improved approaches have been proposed, such as Prophet and Angelix, the precision of these approaches remains less than 40 percent. In other words, the patches generated by these approaches are mostly incorrect, rendering these approaches difficult to use in practice.

The precision of ACS is a significant improvement over previous approaches. Based on the result over the Defects4J benchmark, ACS produced 23 patches, of which 18 (almost 80 percent) are correct—a significantly better result than existing approaches. In addition, the number of defects correctly repaired by ACS is also the highest among the approaches evaluated on Defects4J.

The key to ACS’s high precision is the use of multiple information sources, especially the “big code” existing on the Internet. Compared with existing techniques, ACS uses three new types of information sources.

  • First, researchers noticed that the principle of locality holds on variable uses, and apply such information to sort the variables to be used in the patches.
  • Second, ACS uses natural language analysis techniques to analyze Javadoc, and then uses the information in Javadoc to filter incorrect patches.
  • Last, and most importantly, ACS performs statistical analysis on the open-source program on the Internet, discovers the conditional probabilities of the operations over the variables and further generates correct patches.

In the above example, ACS first learns from the failed test that a conditional check throwing ArithmeticException is missing, and then uses the principle of locality to determine that variable lcm should be used in the conditional check. Lastly, based on the conditional probability, it determines “==Integer.MIN_VALUE” should be applied to lcm, generating the whole patch.

Left: Shi Han, lead researcher, Microsoft Research Asia; Middle: Lily Sun, Research Program Manager, Microsoft Research Asia; Right: Prof. Yingfei Xiong, Peking University.

The paper describing ACS “Precise Condition Synthesis for Program Repair” has been published at ICSE 2017. The authors include Yingfei Xiong, Jie Wang, Guang Huang and Lu Zhang from Peking University, Runfa Yan from UESTC and Shi Han from Microsoft Research Asia.

Related:

Publications connexes

Lire la suite

Voir tous les articles de blog