TransRepair: Context-aware Program Repair
for Compilation Errors
Xueyang Li
SKLOIS, IIE, CAS
School of Cybersecurity, UCAS
China
Shangqing Liu
Nanyang Technological University
Singapore
Ruitao Feng
University of New South Wales
Australia
Guozhu Meng
SKLOIS, IIE, CAS
School of Cybersecurity, UCAS
China
Xiaofei Xie
Singapore Management University
Singapore
Kai Chen
SKLOIS, IIE, CAS
School of Cybersecurity, UCAS
BAAI
China
Yang Liu
Nanyang Technological University
Singapore
ABSTRACT
Automatically fixing compilation errors can greatly raise the productivity of
software development by guiding novice or AI programmers to write and debug code.
Recently, learning-based program repair has gained extensive attention and become
the state of the art in practice, but it still leaves plenty of room for
improvement. In this paper, we propose an end-to-end solution, TransRepair, that
locates the error lines and creates the correct substitutes for a C program
simultaneously. Unlike its counterparts, our approach takes into account the
context of the erroneous code and the diagnostic compilation feedback. We devise
a Transformer-based neural network to learn the ways of repair from the erroneous
code as well as its context and the diagnostic feedback. To increase the
effectiveness of TransRepair, we summarize 5 types and 74 fine-grained sub-types
of compilation errors from two real-world program datasets and the Internet. A
program corruption technique is then developed to synthesize a large dataset with
1,821,275 erroneous C programs. Through extensive experiments, we demonstrate
that TransRepair outperforms the state of the art in both single repair accuracy
and full repair accuracy. Further analysis sheds light on the strengths and
weaknesses of contemporary solutions for future improvement.
CCS CONCEPTS
• Software and its engineering → Software defect analysis; Automatic programming;
• Computing methodologies → Machine translation.
Xueyang Li and Shangqing Liu contributed equally to this research.
Guozhu Meng is the corresponding author.
Permission to make digital or hard copies of part or all of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for third-party components of this work must be honored.
For all other uses, contact the owner/author(s).
ASE ’22, October 10–14, 2022, Rochester, MI, USA
©2022 Copyright held by the owner/author(s).
ACM ISBN 978-1-4503-9475-8/22/10.
https://doi.org/10.1145/3551349.3560422
KEYWORDS
Program repair, compilation error, deep learning, context-aware
ACM Reference Format:
Xueyang Li, Shangqing Liu, Ruitao Feng, Guozhu Meng, Xiaofei Xie, Kai
Chen, and Yang Liu. 2022. TransRepair: Context-aware Program Repair for
Compilation Errors. In 37th IEEE/ACM International Conference on Auto-
mated Software Engineering (ASE ’22), October 10–14, 2022, Rochester, MI, USA.
ACM, New York, NY, USA, 13 pages. https://doi.org/10.1145/3551349.3560422
1 INTRODUCTION
Automated program repair, which aims at fixing the underlying errors in a program,
plays a critical role in the software development cycle. It can be roughly
categorized into logical error fixing and compilation error fixing. Compared with
the widespread attention on repairing logical errors [13, 36, 44, 53], compilation
error fixing has only come to the attention of researchers in the past few years
[2, 30, 69]. Besides raising the productivity of software development, it can also
facilitate AI programming, such as code generation [11, 20] and binary
decompilation [25, 39]. Recent research shows that AI programmers may produce a
lot of erroneous code (including compilation errors), as human novice programmers
do [62]. However, it is still non-trivial to automatically fix compilation errors
in an undocumented program [21]. Moreover, the error messages returned by a
compiler may be obscure and cryptic, considering that the compiler keeps evolving
with new features and optimization techniques [61]. As a consequence, it is
desirable and beneficial that programs with compilation errors can be
automatically repaired to raise programming productivity and promote AI
programming.
Automated program repair for compilation errors is a far-from-settled problem.
Prior studies [2, 10, 30, 56] directly utilized an RNN-based encoder-decoder
framework that takes the broken program as input and generates the exact fix.
However, the selected model architecture has limited learning capacity and known
drawbacks; for example, RNNs struggle with long-range dependencies in a sequence.
Furthermore, other studies [1, 54, 69] have demonstrated that the compiler's
diagnostic feedback is valuable for improving accuracy.
arXiv:2210.03986v1 [cs.SE] 8 Oct 2022
 1  Broken Code:
 2  #include<stdio.h>
 3  #include<stdlib.h>
 4  int N;
 5  int main()
 6  {
 7      int n,i;
 8      scanf("%d",&n);
 9      int A;
10      N=n;
11      A=(int *)malloc(n*sizeof(int));
12      for(i=0;i<n;i++) scanf("%d ",&A[i]);
13  }
14  GCC Feedback: line 12  Error Message: subscripted value is neither array nor pointer nor vector
Figure 1: The broken code with its compiler message.
For example, DrRepair [69] proposed to construct a program-feedback graph by
connecting the same identifiers in the source code and symbols (e.g., identifiers,
types, operators) in the compiler feedback to encode their semantic
correspondence, and further utilized a graph attention network to capture the
relations between program and message to fix the broken program. DrRepair has
achieved state-of-the-art performance and significantly outperforms previous
approaches that ignore the compiler feedback. However, through our in-depth
analysis of the feedback produced by the compiler, we find that the
correspondence between the location of the broken code and the error message is
not completely accurate. A simple example is illustrated in Figure 1. The
feedback produced by the GCC compiler consists of the reported line number (i.e.,
line 12 in Figure 1) and the error message. The root cause is at line 9, where
the identifier A should be declared as a pointer type (i.e., “int A” → “int *A”).
However, the feedback produced by GCC states that there is an error at line 12.
The location of the root cause in the broken program and the line number reported
in the feedback are mismatched, which demonstrates that the error message fails
to reveal the reason for this error. Hence, a graph constructed from the feedback
may not capture the essence of errors. Furthermore, in Figure 1, no symbol
appears in the feedback, so the program-feedback graph cannot be constructed.
Finally, the context (highlighted in blue in Figure 1) implies that the
identifier A is a pointer rather than an integer, but this part of the context
information is ignored in current works.
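To make the Figure 1 discussion concrete, the following is a minimal sketch of the repaired logic, not code from the paper. The only change that matters is the declaration at line 9 (`int A` becomes `int *A`). Wrapping the logic in a helper named `read_values` that parses from a string instead of standard input is an assumption made here so that the snippet can be called and checked in isolation.

```c
#include <stdio.h>
#include <stdlib.h>

int N; /* global kept from Figure 1 */

/* Repaired counterpart of lines 7-12 in Figure 1.
 * read_values is a hypothetical helper name (not from the paper).
 * Input format: a count n followed by n integers, e.g. "3 10 20 30".
 * Returns a heap-allocated array of n ints, or NULL on malformed input. */
int *read_values(const char *input)
{
    int n, i, used;
    int *A; /* the fix: Figure 1 declared `int A;` here */

    if (sscanf(input, "%d%n", &n, &used) != 1 || n <= 0)
        return NULL;
    input += used;
    N = n;

    A = (int *)malloc((size_t)n * sizeof(int));
    if (A == NULL)
        return NULL;

    for (i = 0; i < n; i++) {
        /* A[i] is now a valid subscript because A is a pointer */
        if (sscanf(input, "%d%n", &A[i], &used) != 1) {
            free(A);
            return NULL;
        }
        input += used;
    }
    return A;
}
```

With the pointer declaration, the subscript `A[i]` that GCC complained about at line 12 becomes valid even though that line itself is unchanged, which is why a repair model has to look beyond the reported line.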
On the other hand, learning-based program repair demands high-quality training
data [60]. There are two open-source datasets with compilation errors in the C
programming language, i.e., DeepFix [30] and TRACER [2]. The DeepFix dataset
contains 37,415 correct programs and 6,971 broken programs that fail to compile,
and TRACER contains 21,994 single-line error programs¹. Although the dataset is
further augmented [69] by a program corruption approach, the synthesized code is
limited in error types, so the repair performance degrades greatly when facing
arbitrary errors in reality. Additionally, the data for training a repair model
has not yet been extensively evaluated, so it is unclear which types of errors
cannot be learned well and what the underlying cause is.
¹ The exact number does not match the number reported in the original paper [2],
since we filter out some obvious error samples.

To address the aforementioned challenges, in this study, we propose a
context-aware program repair technique to fix compilation errors. To enrich the
diversity of the broken programs, we conduct a comprehensive analysis of
compilation errors from two real-world program datasets (i.e., DeepFix and
TRACER) and relevant questions on StackOverflow. We summarize these common
compilation errors and obtain 74 compilation error patterns in terms of syntax
and semantics, which we further classify into 5 different groups. We propose
fine-grained perturbation strategies for each type of token in a program, and
develop an automated approach to break programs with specific errors. In this
manner, we synthesize a dataset of 1,821,275 broken programs in line with real
error scenarios. We further devise a Transformer-based program repair model
(i.e., TransRepair) that takes as input each line of a broken program, the
context of each line and the error message to locate the errors and then fix
them. A pointer mechanism is incorporated into the model and proves to be
effective in solving errors that involve out-of-vocabulary code tokens. Extensive
experiments on two open-source datasets, DeepFix and TRACER, demonstrate that
TransRepair outperforms the current state-of-the-art DrRepair in repair accuracy
by 4.66% and 5.7% on DeepFix and TRACER, respectively. The ablation studies on
both model components and training data reveal their importance in lifting the
repair efficacy. The result analysis concludes that our approach performs best in
fixing “statement” errors and gains more advantages for “type mismatch” and
“variable declaration” errors compared to DrRepair.
Contributions. We summarize the main contributions as follows:
• We empirically analyze the common compilation errors from two public datasets
  and StackOverflow, concluding 74 concrete patterns of compilation errors in 5
  categories. Based on that, we further design a number of fine-grained
  perturbation strategies to create a dataset of diverse broken programs.
• We propose a Transformer-based repair model, which takes each line of a broken
  program, its context and the error messages as input to locate and repair the
  erroneous code. To the best of our knowledge, we are the first to consider
  context information for repairing compilation errors.
• Extensive experiments on two open-source datasets demonstrate that TransRepair
  outperforms the state of the art in both single repair and full repair.
  Moreover, the ablation and failure case studies identify the inherent
  advantages and limits with respect to different types of errors.
More details about the code, model and experimental results can be accessed from
[43] to benefit academia and industry. The rest of this paper is organized as
follows. Section 2 presents an overview of our approach. Section 3 introduces the
data synthesis used to construct a corrupted dataset. Section 4 and Section 5
present data parsing and model design in detail. We introduce the experimental
setup and analyze the experimental results in Section 6 and Section 7,
respectively. Section 8 details the threats to validity of our work, followed by
the related work in Section 9. We conclude the paper in Section 10.
2 SYSTEM OVERVIEW
In this section, we first formulate the research problem, then provide
an overview of our approach.
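[Figure 2 diagram omitted. Data Synthesis: broken code from StackOverflow, DeepFix
and TRACER yields perturbation strategies, which are applied to correct code to
build the corrupted dataset. Data Parsing: a compiler produces the diagnostic
feedback and a context analyzer produces the context of the broken code. Model
Architecture: the inputs (l1, c1, m_err), (l2, c2, m_err), ..., (ln, cn, m_err)
feed a Transformer encoder, followed by an MLP that outputs the error line number
and a pointer decoder that outputs the repaired statement.]
Figure 2: The overview of TransRepair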
2.1 Problem Formulation
Following existing works [1, 54, 69], TransRepair aims at repairing program
compilation errors by learning program semantics through deep learning
techniques. Formally, given a broken program $p$ from a dataset $D$ (i.e.,
$p \in D$), where $p = (l_1, l_2, \ldots, l_n)$ and $n$ is the total number of
lines in $p$, its diagnostic feedback provided by a compiler is defined as a list
of $(i_{\mathrm{err}}, m_{\mathrm{err}})$, where $i_{\mathrm{err}}$ is the
reported line number and $m_{\mathrm{err}}$ is the error message. Since the line
number in the diagnostic feedback may not match the line of the root cause in a
broken program (as shown in Figure 1), the goal of TransRepair is to learn a
function $f$ from the dataset $D$ that takes
$(p, i_{\mathrm{err}}, m_{\mathrm{err}})$ as input and identifies the location
$k$ of the erroneous code $l_k$, where $k \in \{1, \ldots, n\}$, and a repaired
version of this statement (i.e., $l'_k$). The formulation can be expressed as
$l'_k = f(p, i_{\mathrm{err}}, m_{\mathrm{err}})$.
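The formulation can be read as a small data flow: the compiler feedback carries a reported line, while the model's output names a possibly different line k and its replacement l'_k. The sketch below illustrates only this interface. The `Feedback` and `Repair` types and the `apply_repair` function are hypothetical names introduced here, and the learned function f itself is not implemented.

```c
#include <stddef.h>

/* Compiler feedback (i_err, m_err); hypothetical type for illustration. */
typedef struct {
    int line;            /* i_err: line number reported by the compiler */
    const char *message; /* m_err: error message text */
} Feedback;

/* Output of the repair function f; hypothetical type for illustration. */
typedef struct {
    int k;             /* 1-based index of the erroneous line l_k */
    const char *fixed; /* l'_k: repaired version of that line */
} Repair;

/* Replace line k of program p (an array of n line pointers) in place.
 * Returns 0 on success, -1 if k is out of range or the fix is missing. */
int apply_repair(const char **p, int n, Repair r)
{
    if (r.k < 1 || r.k > n || r.fixed == NULL)
        return -1;
    p[r.k - 1] = r.fixed;
    return 0;
}
```

For the program in Figure 1, the feedback would be (12, "subscripted value is neither array nor pointer nor vector"), while the desired repair targets k = 9 with the replacement `int *A;`. The two line numbers differ, which is exactly why k is predicted rather than copied from the feedback.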
2.2 Approach Overview
Figure 2 presents an overview of our approach, which consists of three sequential
modules: data synthesis, data parsing and model architecture. In data synthesis,
we first empirically summarize the common compilation errors from multiple error
sources, including DeepFix, TRACER and a self-curated dataset from StackOverflow.
We further design a set of perturbation strategies based on the summarized
compilation errors to corrupt the correct programs from DeepFix and construct a
new high-quality dataset $D$ that is in line with real scenarios. For each broken
program $p$ in the constructed dataset, we compile it to obtain the diagnostic
feedback (i.e., $(i_{\mathrm{err}}, m_{\mathrm{err}})$) provided by the compiler.
Furthermore, we design a context analyzer to extract the context of each line of
code so that the model can learn from it. We take each line $l_i$, its context
$c_i$ and the diagnostic feedback $(i_{\mathrm{err}}, m_{\mathrm{err}})$ as the
input of the Transformer encoder to learn vector representations. We then apply a
fully-connected feedforward network (MLP) to locate the erroneous line, and a
pointer-based Transformer decoder to generate a repair for the erroneous code.
3 DATA SYNTHESIS
In this section, we introduce our data synthesis module, which corrupts correct
programs using the summarized perturbation strategies to construct a high-quality
corrupted dataset in line with real scenarios.
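As an illustration of what a single perturbation might look like, the sketch below removes one delimiter from a correct program, which yields a broken program paired with its known fix. The function name `drop_delimiter` and its interface are assumptions made for illustration; the paper's actual perturbation strategies are not reproduced here.

```c
/* Copy src to dst while removing the k-th (0-based) occurrence of delim.
 * dst must have room for at least strlen(src) + 1 bytes.
 * Returns 1 if a character was removed, 0 if src has fewer than k + 1
 * occurrences of delim (dst is then an unchanged copy of src). */
int drop_delimiter(const char *src, char delim, int k, char *dst)
{
    int seen = 0;    /* occurrences of delim encountered so far */
    int removed = 0; /* set once the target occurrence is skipped */

    for (; *src != '\0'; src++) {
        if (*src == delim && !removed && seen++ == k) {
            removed = 1;
            continue; /* skip this delimiter: the injected error */
        }
        *dst++ = *src;
    }
    *dst = '\0';
    return removed;
}
```

Calling `drop_delimiter("int a; a=1;", ';', 0, out)` leaves `int a a=1;` in `out`. Since the original line is known, each corrupted program comes with its ground-truth repair at no extra labeling cost.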
3.1 Taxonomy of Compilation Errors
High-quality data (e.g., large volume, good diversity and accurate error triage)
helps a model learn the repair rules better. The study [69] summarizes common
compilation errors for the Java, C and C++ programming languages from DeepDelta
[54], DeepFix [30] and SPoC [42], respectively. Five types of errors are then
specified, together with the corresponding corruption rules for broken code
synthesis. However, as we observe, more types of compilation errors appear in
reality than in these datasets.
In this study, we construct our own dataset by manually analyzing 6,971 erroneous
programs in DeepFix and 21,994 programs in TRACER. Furthermore, we conduct an
intensive search on StackOverflow to include more diverse errors. Specifically,
to obtain a collection of compilation errors, we query StackOverflow with the
keywords “[syntax-error] [c]” or “[compile-error] [c]” and get 200 questions
ranked by “Highest score”². All the programs in these StackOverflow questions,
together with their error messages, are included in our dataset.
Manual analysis. We recruited four experts, all of whom have more than five years
of programming experience, to analyze the collected program errors from DeepFix,
TRACER and StackOverflow. First, we normalize the error messages by removing
specific information such as identifier names and line numbers, and group errors
with the same normalized message into distinct clusters. Then, we spend about six
person-months identifying the type of each error and whether its error message is
accurate, for example, in revealing the cause of the code error. Specifically, we
divide these clusters into four analysis tasks and assign two of them to each
expert, so that every error message is analyzed by two experts for
cross-validation. If a disagreement occurs, a third expert is involved to make
the final decision.
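The normalization step can be sketched as follows. This is a minimal illustration, not the authors' tooling. It assumes that identifiers appear between ASCII single quotes (GCC may emit typographic quotes depending on the locale) and that any run of digits is a line or column number. The function `normalize_message` and the placeholders `<id>` and `<num>` are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Append s to out (capacity cap, current length *len), truncating safely. */
static void append(char *out, size_t cap, size_t *len, const char *s)
{
    while (*s != '\0' && *len + 1 < cap)
        out[(*len)++] = *s++;
    out[*len] = '\0';
}

/* Write a normalized copy of msg into out: quoted spans become '<id>'
 * and digit runs become <num>, so messages that differ only in such
 * details compare equal and fall into the same cluster. */
void normalize_message(const char *msg, char *out, size_t cap)
{
    size_t len = 0;
    char one[2] = {'\0', '\0'};

    if (cap == 0)
        return;
    out[0] = '\0';

    while (*msg != '\0') {
        if (*msg == '\'') {
            const char *close = strchr(msg + 1, '\'');
            if (close != NULL) { /* quoted identifier found */
                append(out, cap, &len, "'<id>'");
                msg = close + 1;
                continue;
            }
        }
        if (isdigit((unsigned char)*msg)) { /* collapse a digit run */
            while (isdigit((unsigned char)*msg))
                msg++;
            append(out, cap, &len, "<num>");
            continue;
        }
        one[0] = *msg++; /* ordinary character: copy through */
        append(out, cap, &len, one);
    }
}
```

Grouping then reduces to comparing normalized strings with `strcmp`. For example, `line 12: 'A' undeclared` and `line 7: 'count' undeclared` both normalize to `line <num>: '<id>' undeclared`.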
The compiler usually conducts syntax analysis and semantic analysis to ensure the
correctness of a program. For example, misspelling a reserved word incurs a
syntax error, and using a variable without declaration produces a semantic error.
As mentioned above, we manually analyze the collected erroneous programs and
distill a list of 74 error patterns in total. As shown in Table 1, we further
cluster these patterns into five categories within the syntax and semantic
analysis phases. This taxonomy is built mainly on the principles of compilers [4]
and the objects analyzed in each phase. In particular, during syntax analysis a
compiler checks whether the program complies with the context-free grammar of C
and produces syntax errors if it does not. As observed in the dataset, there are
two types of errors, structure errors and statement errors, which vary
significantly in scope of influence and repair strategy. A structure error is the
misuse or absence of delimiter(s) (e.g., “{”, “}”, “;”) in a statement or a
block. It may propagate its influence to the entire program when, for example, a
brace is missing. On the
² The queried results are as of April 2022.