Novice Type Error Diagnosis with Natural
Language Models
Chuqin Geng1,2[0000-0002-3563-1596], Haolin Ye1[0000-0002-7402-617X], Yixuan
Li1[0000-0001-9349-5476], Tianyu Han1[0000-0001-6582-165X], Brigitte
Pientka1[0000-0002-2549-4276], and Xujie Si1,2,3[0000-0002-3739-2269]
1 McGill University
2 Mila - Quebec AI Institute
3 CIFAR AI Research Chair
{chuqin.geng,haolin.ye,yixuan.li,tianyu.han2}@mail.mcgill.ca
{bpientka,xsi}@cs.mcgill.ca
Abstract. Strong static type systems help programmers eliminate many
errors without much burden of supplying type annotations. However, this
flexibility makes it highly non-trivial to diagnose ill-typed programs, es-
pecially for novice programmers. Compared to classic constraint solving
and optimization-based approaches, the data-driven approach has shown
great promise in identifying the root causes of type errors with higher ac-
curacy. Instead of relying on hand-engineered features, this work explores
natural language models for type error localization, which can be trained
in an end-to-end fashion without requiring any features. We demonstrate
that, for novice type error diagnosis, the language model-based approach
significantly outperforms the previous state-of-the-art data-driven approach.
Specifically, our model correctly predicts the location of type errors 62% of the
time, outperforming Nate, the state-of-the-art data-driven model, by 11% under
a more rigorous accuracy metric. Furthermore, we also apply structural probes
to explain the performance differences between language models.
Keywords: Type Error Diagnosis · Language Model · Natural Language
Processing · Type System
1 Introduction
Diagnosing type errors has received much attention from both industry and
academia due to its potential to reduce effort in software development. Existing
approaches, such as standard compilers with type systems, report type errors
through type checking and constraint analysis. Thus, they merely point to
locations where constraint inconsistencies occur, and such locations might be far
away from the true error source. Moreover, type error localization requires
programmers to understand how the type system works and to check which part
of the code contradicts their intent. Languages such as C and Java force
programmers to write type annotations, which make the code neat and also make
it easier to find the roots of type errors. In strongly typed functional languages
such as OCaml and Haskell, however, programmers need not bother with
annotations, since the type system automatically synthesizes the types. The
absence of type annotations comes at a price: novices can easily get lost when
debugging their programs, and the locations of constraint inconsistencies reported
in error messages can be misleading.
Joosten et al. [7] suggest that beginners usually pay more attention to underlined
error locations than to the error messages themselves when fixing programs. It is
therefore important to improve the localization accuracy of these type systems.
Let us consider the ill-typed OCaml program in Fig. 1a. Although the programmer
intends to write a function that sums up all the numbers in a list, they mistakenly
put the empty list, [], at line 3 as the base case. This should instead be 0, as
shown in Fig. 1b. The compiler reports the type error at line 5, saying that the
head of the list, h, has type list rather than int as required by the integer
addition operator.
1  let rec sumList xs =
2    match xs with
3    | [] -> []                  (* root cause *)
4    | h :: [] -> h
5    | h :: t -> h + sumList t   (* misleading complaint *)

   Error: This expression has type 'a list but was expected of type int
(a) an ill-typed OCaml program that aims to sum all the elements from a list
   match xs with
   | [] -> 0                    (* <= correct fix *)
   | h :: [] -> h
   | h :: t -> h + sumList t
(b) the fixed version of the OCaml code above
Fig. 1: A simple example of an OCaml type error and its fix.
This illustrates that the programmer's intent plays an important role in localizing
type errors. To tackle this issue, Nate [18] proposes to use data-driven models
to diagnose type errors; in this way, programmers' intent can be learned and
incorporated into machine learning models. Nate's best model achieves over 90%
accuracy in diagnosing type errors. Although this is an exciting result, Nate's
models are evaluated with a rather loose metric and rely heavily on a considerable
amount of hand-designed feature engineering. In addition, these features are
designed in an ad-hoc fashion, which prevents them from being directly applied
to other language compilers.
Our approach adopts transformer-based language models to avoid considerable
feature engineering. As we treat programs as natural language text, these models
do not rely on any knowledge or features of the specific programming language,
so they can easily be applied to any language. This method may seem to ignore
the syntactic structure of a given programming language. However, in Section 4
we use structural probes [11] to demonstrate that this structure is implicitly
embedded in the deep learning models' vector geometry. We also propose a more
rigorous metric and show that, under the new metric, language models outperform
not only the standard OCaml compiler and constraint-based approaches but also
the state-of-the-art Nate models.
Transformer-based models have achieved great success in a wide range of do-
mains in computer science including natural language processing. BERT [4] and
GPT [15,1], popular transformer variants, have shown an incredible capability
to understand natural language. Together with their pre-training and fine-tuning
paradigm, these models can transfer knowledge learned from a large text corpus
to many downstream tasks such as token classification and next sentence
prediction. Empirical results suggest that the performance of these language
models even exceeds the human level on several benchmarks. In this work, we show how
to take advantage of these powerful language models to localize type errors.
First, we process programs as if they were natural language text and decompose
the processed programs at the term or subterm level into token sequences so
that they can be fed to language models. This allows us to turn the type er-
ror diagnosis problem into a token classification problem. In this way, language
models can learn how to localize type errors in an end-to-end fashion.
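To make this formulation concrete, the sketch below (in Python, using the
Hugging Face transformers library) shows how a program, flattened into a token
sequence, can be scored token by token by a pre-trained encoder. The checkpoint,
the binary labelling convention, and the flattened program string are our own
illustrative assumptions, not the exact pipeline used in this work.

    # Minimal sketch: type error localization as binary token classification.
    # Checkpoint and labelling convention are illustrative assumptions.
    import torch
    from transformers import AutoTokenizer, AutoModelForTokenClassification

    program = ("let rec sumList xs = match xs with "
               "| [] -> [] | h :: [] -> h | h :: t -> h + sumList t")

    tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
    model = AutoModelForTokenClassification.from_pretrained(
        "microsoft/codebert-base", num_labels=2)   # label 1 = blamed token, 0 = innocent

    inputs = tokenizer(program, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits            # shape: (1, sequence_length, 2)
    predictions = logits.argmax(dim=-1)            # per-token blame decision

    # During fine-tuning, per-token labels derived from the known fix location
    # would be supplied so that the model learns to blame the right sub-expressions.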
Contributions We propose a natural language model-based approach to the
type error localization problem. Our main contributions are as follows:
- Without any feature engineering or constraint analysis, we apply different
  language models, including BERT, CodeBERT, and a Bidirectional LSTM, to
  type error localization.
- We study training methodology such as positive/negative transfer to improve
  our models' performance. Instead of using the loose evaluation metric proposed
  in previous work, we define a more rigorous, yet realistic, accuracy metric for
  type error diagnosis.
- Empirical results suggest that our best model can correctly predict expressions
  responsible for type errors 62% of the time, 24 points higher than SHErrLoc
  and 11 points higher than the state-of-the-art Nate tool.
- We study the interpretability of our models using structural probes and
  identify the link between language models' performance and their ability to
  encode structural information of programs such as ASTs.
We start by presenting the baseline, our model architecture, and the structural
probe in Section 2. Section 3 introduces the dataset and evaluation metric, while
Section 4 presents the experimental results and our discussion. Then, Section 5
gives an overview of related work. Finally, Section 6 concludes the paper and
proposes some directions for future work.
2 Approach
In this section, we introduce deep learning-based language models, including
RNNs, BERT, and CodeBERT. We take advantage of the pre-training and
fine-tuning paradigm of language models and show how to transform the type
error diagnosis problem into a token classification problem, a common downstream
task in fine-tuning. We also present the structural probe, which allows us to
uncover the structural information about programs embedded in the models'
vector geometry.
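As a preview of the probing methodology, the following sketch shows the core of
a distance-style structural probe in the spirit of Hewitt and Manning [11]: a
learned linear map under which squared distances between token embeddings are
trained to approximate pairwise distances in the program's AST. The dimensions
and training details are illustrative assumptions, not the exact configuration
used in our experiments.

    # Sketch of a structural probe: squared distances in the projected space
    # should approximate pairwise AST distances between program tokens.
    import torch

    class StructuralProbe(torch.nn.Module):
        def __init__(self, hidden_dim=768, probe_rank=128):   # sizes are placeholders
            super().__init__()
            self.proj = torch.nn.Parameter(torch.randn(hidden_dim, probe_rank) * 0.01)

        def forward(self, embeddings):                # embeddings: (seq_len, hidden_dim)
            transformed = embeddings @ self.proj
            diff = transformed.unsqueeze(1) - transformed.unsqueeze(0)
            return (diff ** 2).sum(dim=-1)            # (seq_len, seq_len) predicted distances

    # Training (sketch): minimise the absolute error between predicted distances
    # and gold tree distances computed from the program's abstract syntax tree.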
2.1 Language models
Deep learning has achieved great success in modelling languages since the inven-
tion of recurrent neural networks (RNNs) [9]. RNNs adopt “internal memory”
to retain information of prior states to facilitate the computation of the current
state. Unlike traditional deep neural networks, the output of RNNs depends on
the prior elements within the sequence, which makes them ideal for processing
sequential inputs such as natural language and programs.
In this study, we also choose a bidirectional long short-term memory network
(Bidirectional LSTM) [16] as our baseline model. However, RNNs are known to
have several drawbacks, such as a lack of parallelization and weak handling of
long-range dependencies. These two limitations are later addressed by the
self-attention mechanism introduced by the transformer. Self-attention [20] is an
attention mechanism that relates different positions of a single sequence in order
to compute a representation of the sequence. Transformers also follow an
encoder-decoder architecture, like other successful neural sequence models. Both
the encoder and the decoder have been studied and have shown great capability
for modelling natural languages and solving many downstream tasks.
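For reference, a baseline of this kind can be sketched in a few lines of PyTorch;
the vocabulary size, embedding and hidden dimensions below are placeholders
rather than the configuration we actually train.

    # Rough sketch of a Bidirectional LSTM token tagger for error localization.
    import torch

    class BiLSTMTagger(torch.nn.Module):
        def __init__(self, vocab_size, embed_dim=128, hidden_dim=256, num_labels=2):
            super().__init__()
            self.embed = torch.nn.Embedding(vocab_size, embed_dim)
            self.lstm = torch.nn.LSTM(embed_dim, hidden_dim,
                                      batch_first=True, bidirectional=True)
            self.classify = torch.nn.Linear(2 * hidden_dim, num_labels)

        def forward(self, token_ids):                 # token_ids: (batch, seq_len)
            states, _ = self.lstm(self.embed(token_ids))
            return self.classify(states)              # (batch, seq_len, num_labels)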
BERT, which stands for Bidirectional Encoder Representations from Transformers,
takes advantage of the encoder part of the transformer, while the GPT-n series
is based on the decoder. In this work, we focus on BERT rather than GPT-3 [1]
for several reasons. First, BERT requires a fine-tuning process that adapts the
pre-trained model to specific downstream tasks, which fits our formalization of
type error diagnosis as a downstream task. Second, GPT-3 is enormous compared
to BERT, making it hard to train and to run inference with. Third, BERT is open
source and easily accessible to users, while GPT-3 is not.
2.2 The pre-training and fine-tuning scheme
The pre-training and fine-tuning scheme allows machine learning models to apply
knowledge gained from solving one task to different yet related tasks. Compared
to fine-tuning, pre-training is more essential, as it determines what knowledge
is learned and stored in the model. As a result, there have been several recent
works on improving the pre-training scheme of language models.
BERT stands out by proposing two critical unsupervised tasks during pre-training:
Masked Language Modeling (MLM) and Next Sentence Prediction (NSP).
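As a small illustration of the MLM objective, the snippet below masks one token
of a code-like sequence and asks a pre-trained masked language model to recover
it from bidirectional context; the checkpoint and the example text are our own
choices, used only to show the mechanics of the objective.

    # Sketch of masked language modelling: predict a masked token from context.
    import torch
    from transformers import AutoTokenizer, AutoModelForMaskedLM

    tok = AutoTokenizer.from_pretrained("bert-base-uncased")   # illustrative checkpoint
    mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

    text = "match xs with | [] -> " + tok.mask_token + " | h :: t -> h + sumList t"
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        logits = mlm(**inputs).logits

    mask_pos = (inputs.input_ids == tok.mask_token_id).nonzero()[0, 1]
    guess = logits[0, mask_pos].argmax().item()
    print(tok.decode([guess]))    # the model's guess for the masked token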