Novice Type Error Diagnosis with Natural Language Models
Our approach adopts transformer-based language models to avoid considerable feature engineering. Because we treat programs as natural language text, these models do not rely on any knowledge or features specific to a given programming language and can therefore be easily applied to any language. This method may seem to ignore the syntactic structure of the programming language. However, in Section 4 we use structural probes [11] to demonstrate that this structure is implicitly embedded in the deep learning models' vector geometry. We also propose a more rigorous metric and show that language models outperform not only the standard OCaml compiler and constraint-based approaches but also the state-of-the-art Nate models under the new metric.
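To make the probing idea concrete: a Hewitt-and-Manning-style structural probe learns a linear map B such that squared L2 distances between mapped token embeddings approximate parse-tree distances between the corresponding tokens. The sketch below is a rough illustration of that idea only, not the paper's actual implementation; all names are hypothetical.

```python
import numpy as np

def probe_distances(embeddings, B):
    """Pairwise squared distances ||B(h_i - h_j)||^2 for all token pairs.

    embeddings: (n_tokens, hidden_dim) contextual vectors from the model.
    B: (probe_rank, hidden_dim) learned linear probe.
    """
    transformed = embeddings @ B.T                      # (n_tokens, probe_rank)
    diffs = transformed[:, None, :] - transformed[None, :, :]
    return (diffs ** 2).sum(-1)                         # (n_tokens, n_tokens)

def probe_loss(embeddings, tree_dists, B):
    """Mean absolute gap between predicted and gold tree distances.

    Minimizing this over B (by gradient descent) trains the probe; a low
    loss suggests the embeddings implicitly encode the parse tree.
    """
    pred = probe_distances(embeddings, B)
    return np.abs(pred - tree_dists).mean()
```

If the loss can be driven low on held-out programs, the model's vector geometry encodes syntactic structure even though the model was never shown a parse tree.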
Transformer-based models have achieved great success in a wide range of domains in computer science, including natural language processing. BERT [4] and GPT [15,1], popular transformer variants, have shown a remarkable capability for understanding natural language. Together with their pre-training and fine-tuning paradigm, these models can transfer knowledge learned from a large text corpus to many downstream tasks, such as token classification and next sentence prediction. Empirical results suggest that these language models even exceed human-level performance on several benchmarks. In this work, we show how to take advantage of these powerful language models to localize type errors. First, we process programs as if they were natural language text and decompose the processed programs at the term or subterm level into token sequences that can be fed to language models. This turns the type error diagnosis problem into a token classification problem, allowing language models to learn to localize type errors in an end-to-end fashion.
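As a rough illustration of this framing (the tokenizer and labels here are deliberately simplified and hypothetical, not the paper's actual pipeline), a program can be split into tokens and each token given a binary label, where 1 marks tokens inside the expression blamed for the type error:

```python
import re

def tokenize(program):
    """Naively split source text into identifiers, number literals,
    and single operator/punctuation characters."""
    return re.findall(r"[A-Za-z_][A-Za-z0-9_']*|\d+|[^\sA-Za-z0-9_]", program)

def make_example(program, blamed_span):
    """Pair each token with a 0/1 label from a known blamed character span,
    yielding a token classification training example."""
    tokens, labels, pos = [], [], 0
    for tok in tokenize(program):
        pos = program.index(tok, pos)          # character offset of this token
        inside = blamed_span[0] <= pos < blamed_span[1]
        tokens.append(tok)
        labels.append(1 if inside else 0)
        pos += len(tok)
    return tokens, labels

# Example: the int literal "1" is blamed in an int/float mismatch.
prog = "let x = 1 +. 2.0"
tokens, labels = make_example(prog, (8, 9))
# labels == [0, 0, 0, 1, 0, 0, 0, 0, 0]
```

A language model fine-tuned on such (token, label) pairs then predicts, for each token of an unseen program, whether it belongs to the expression responsible for the type error.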
Contributions We propose a natural language model-based approach to the type error localization problem. Our main contributions are as follows:
• Without any feature engineering or constraint analysis, we apply different language models, including BERT, CodeBERT, and bidirectional LSTMs, to type error localization.
• We study training methodology, such as positive/negative transfer, to improve our models' performance. Instead of using the loose evaluation metric proposed in previous work, we define a more rigorous, yet realistic, accuracy metric for type error diagnosis.
• Empirical results suggest that our best model can correctly predict the expressions responsible for a type error 62% of the time, 24 points higher than SHErrLoc and 11 points higher than the state-of-the-art Nate tool.
• We study the interpretability of our models using structural probes and identify a link between language models' performance and their ability to encode structural information of programs, such as ASTs.
We start by presenting the baseline, our model architecture, and the structural probe in Section 2. Section 3 introduces the dataset and evaluation metric, while Section 4 presents the experimental results and our discussion. Then, Section 5 gives an overview of related work. Finally, Section 6 concludes the paper and proposes some directions for future work.