
1   Broken Code:
2   #include<stdio.h>
3   #include<stdlib.h>
4   int N;
5   int main()
6   {
7       int n,i;
8       scanf("%d",&n);
9       int A;
10      N=n;
11      A=(int *)malloc(n*sizeof(int));
12      for(i=0;i<n;i++) scanf("%d ",&A[i]);
13  }
GCC Feedback: line 12  Error Message: subscripted value is neither array nor pointer nor vector
Figure 1: The broken code with its compiler message.
For example, DrRepair [50] proposed to construct a program-feedback graph by connecting the same identifiers in the source code and the symbols (e.g., identifiers, types, operators) in the compiler feedback to encode their semantic correspondence, and further utilized a graph attention network to capture the relations between the program and the message in order to fix the broken program. DrRepair achieved state-of-the-art performance and significantly outperforms previous approaches that ignore the compiler feedback. However, through an in-depth analysis of the feedback produced by the compiler, we find that the correspondence between the location of the broken code and the error message is not completely accurate. A simple example is illustrated in Figure 1. It shows that the feedback produced by the GCC compiler consists of a reported line number (i.e., line 12 in Figure 1) and an error message. The root cause, however, is at line 9: the identifier A should be declared as a pointer type (i.e., “int A” → “int *A”), yet the feedback produced by GCC reports an error at line 12. The location of the root cause in the broken program and the line number produced in the feedback are mismatched, which demonstrates that the error message fails to reveal the reason for this error. Hence, a graph constructed based on the feedback may not capture the essence of the error. Furthermore, in Figure 1, we also find that no symbol exists in the feedback, so the program-feedback graph cannot be constructed at all. Finally, the context (highlighted in blue in Figure 1) indicates that the identifier A is a pointer rather than an integer, but this contextual information is ignored in current works.
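For reference, applying the single-token fix described above (“int A” → “int *A”) yields the following repaired program; every other line of Figure 1 is unchanged:

#include<stdio.h>
#include<stdlib.h>
int N;
int main()
{
    int n, i;
    scanf("%d", &n);
    int *A;   /* fixed: A is now a pointer, so the malloc assignment and A[i] compile */
    N = n;
    A = (int *)malloc(n * sizeof(int));
    for (i = 0; i < n; i++) scanf("%d ", &A[i]);
}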
On the other hand, high-quality training data is in great demand for learning-based program repair [43]. There are two open-source datasets with compilation errors for the C programming language (i.e., DeepFix [19] and TRACER [2]). The DeepFix dataset contains 37,415 correct programs and 6,971 broken programs that fail to compile, and TRACER contains 21,994 single-line error programs (this number differs from the one reported in the original paper [2], since we filter out some obvious error samples). Although the dataset has been further augmented [50] by a program corruption approach, the synthesized code is limited in its error types, so the repair performance degrades greatly in the face of the arbitrary errors encountered in reality. Additionally, the data for training a repair model has not yet been extensively evaluated, so it is unclear which types of errors cannot be learned well and what the underlying cause is.
To address the aforementioned challenges, in this study we propose a context-aware program repair technique to fix compilation errors. To enrich the diversity of the broken programs, we conduct a comprehensive analysis of compilation errors from two real-world datasets (i.e., DeepFix and TRACER) and relevant questions on StackOverflow. We summarize these common compilation errors and obtain 74 compilation error patterns in terms of syntax and semantics, which we further classify into 5 different groups. We propose fine-grained perturbation strategies for each type of token in a program and develop an automated approach to break programs with specific errors (a sketch of one such perturbation is shown below). In this manner, we synthesize a dataset of 1,821,275 broken programs that is in line with real error scenarios. We further devise a Transformer-based program repair model (i.e., TransRepair) that takes as input each line of a broken program, the context of each line, and the error message to locate the errors and then fix them. A pointer mechanism is incorporated into the model, which proves to be effective in resolving errors involving out-of-vocabulary code tokens. Extensive experiments on the two open-source datasets DeepFix and TRACER demonstrate that TransRepair outperforms the current state of the art, DrRepair, in repair accuracy by 4.66% and 5.7% on DeepFix and TRACER, respectively. The ablation studies on both the model components and the training data reveal their importance in lifting the repair efficacy. The result analysis concludes that our approach performs best at fixing “statement” errors and gains more advantage on “type mismatch” and “variable declaration” errors compared to DrRepair.
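To make the perturbation idea concrete, the following is a minimal, hypothetical sketch in C (not the authors' actual tooling) of one corruption rule: dropping the “*” from a pointer declaration in an otherwise correct program, which reproduces exactly the class of error shown in Figure 1.

#include <stdio.h>
#include <string.h>

/* Corrupt a source line by deleting its first '*' character (if any),
   e.g. "int *A;" becomes "int A;". */
static void drop_pointer_star(char *line)
{
    char *star = strchr(line, '*');
    if (star != NULL)
        memmove(star, star + 1, strlen(star));  /* shift the tail left, keeping '\0' */
}

int main(void)
{
    char line[] = "int *A;";
    drop_pointer_star(line);
    printf("%s\n", line);  /* prints "int A;", a synthesized broken declaration */
    return 0;
}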
Contributions. We summarize the main contributions as follows:
• We empirically analyze the common compilation errors from two public datasets and StackOverflow, concluding 74 concrete patterns of compilation errors in 5 categories. Based on that, we further design a number of fine-grained perturbation strategies to create a dataset of diverse broken programs.
• We propose a Transformer-based repair model, which takes each line of a broken program, its context, and the error messages as input to locate and repair the erroneous code. To the best of our knowledge, we are the first to consider the context information for repairing compilation errors.
• Extensive experiments on two open-source datasets demonstrate that TransRepair outperforms the state of the art in both single repair and full repair. Moreover, the ablation and failure case studies identify the inherent advantages and limitations with respect to different types of errors.
More details about the code, model, and experimental results can be accessed from [28] to benefit both academia and industry. The rest of this paper is organized as follows. Section 2 presents an overview of our approach. Section 3 introduces the data synthesis used to construct a corrupted dataset. Section 4 and Section 5 present the data parsing and model design in detail. We introduce the experimental setup and analyze the experimental results in Section 6 and Section 7, respectively. Section 8 details the threats to the validity of our work, followed by the related work in Section 9. We conclude our paper in Section 10.
2 SYSTEM OVERVIEW
In this section, we first formulate the research problem and then provide an overview of our approach.