Novice Type Error Diagnosis with Natural Language Models
Our approach adopts transformer-based language models to avoid considerable feature engineering. Because we treat programs as natural language text, these models do not rely on any knowledge or features specific to a given programming language and can therefore be easily applied to any language. This method may seem to ignore the syntactic structure of the programming language. However, in Section 4 we use structural probes [11] to demonstrate that this structure is implicitly embedded in the deep learning models' vector geometry. We also propose a more rigorous metric and show that language models outperform not only the standard OCaml compiler and constraint-based approaches but also the state-of-the-art Nate models under the new metric.
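To make the probing idea concrete: a Hewitt-and-Manning-style structural probe learns a linear map B such that squared L2 distances between mapped token embeddings approximate parse-tree distances between the corresponding tokens. The sketch below is a rough illustration of that idea only, not the paper's actual implementation; all names are hypothetical.

```python
import numpy as np

def probe_distances(embeddings, B):
    """Pairwise squared distances ||B(h_i - h_j)||^2 for all token pairs.

    embeddings: (n_tokens, hidden_dim) contextual vectors from the model.
    B: (probe_rank, hidden_dim) learned linear probe.
    """
    transformed = embeddings @ B.T                      # (n_tokens, probe_rank)
    diffs = transformed[:, None, :] - transformed[None, :, :]
    return (diffs ** 2).sum(-1)                         # (n_tokens, n_tokens)

def probe_loss(embeddings, tree_dists, B):
    """Mean absolute gap between predicted and gold tree distances.

    Minimizing this over B (by gradient descent) trains the probe; a low
    loss suggests the embeddings implicitly encode the parse tree.
    """
    pred = probe_distances(embeddings, B)
    return np.abs(pred - tree_dists).mean()
```

If the loss can be driven low on held-out programs, the model's vector geometry encodes syntactic structure even though the model was never shown a parse tree.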
Transformer-based models have achieved great success in a wide range of domains in computer science, including natural language processing. BERT [4] and GPT [15,1], popular transformer variants, have shown a remarkable capability for understanding natural language. Together with their pre-training and fine-tuning paradigm, these models can transfer knowledge learned from a large text corpus to many downstream tasks, such as token classification and next sentence prediction. Empirical results suggest that these language models even exceed human-level performance on several benchmarks. In this work, we show how to take advantage of these powerful language models to localize type errors. First, we process programs as if they were natural language text and decompose the processed programs at the term or subterm level into token sequences that can be fed to language models. This turns the type error diagnosis problem into a token classification problem, allowing language models to learn to localize type errors in an end-to-end fashion.
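As a rough illustration of this framing (the tokenizer and labels here are deliberately simplified and hypothetical, not the paper's actual pipeline), a program can be split into tokens and each token given a binary label, where 1 marks tokens inside the expression blamed for the type error:

```python
import re

def tokenize(program):
    """Naively split source text into identifiers, number literals,
    and single operator/punctuation characters."""
    return re.findall(r"[A-Za-z_][A-Za-z0-9_']*|\d+|[^\sA-Za-z0-9_]", program)

def make_example(program, blamed_span):
    """Pair each token with a 0/1 label from a known blamed character span,
    yielding a token classification training example."""
    tokens, labels, pos = [], [], 0
    for tok in tokenize(program):
        pos = program.index(tok, pos)          # character offset of this token
        inside = blamed_span[0] <= pos < blamed_span[1]
        tokens.append(tok)
        labels.append(1 if inside else 0)
        pos += len(tok)
    return tokens, labels

# Example: the int literal "1" is blamed in an int/float mismatch.
prog = "let x = 1 +. 2.0"
tokens, labels = make_example(prog, (8, 9))
# labels == [0, 0, 0, 1, 0, 0, 0, 0, 0]
```

A language model fine-tuned on such (token, label) pairs then predicts, for each token of an unseen program, whether it belongs to the expression responsible for the type error.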
Contributions We propose a natural language model-based approach to the type error localization problem. Our main contributions are as follows:
• Without any feature engineering or constraint analysis, we apply different language models, including BERT, CodeBERT, and bidirectional LSTMs, to type error localization.
• We study training methodology, such as positive/negative transfer, to improve our models' performance. Instead of using the loose evaluation metric proposed in previous work, we define a more rigorous, yet realistic, accuracy metric for type error diagnosis.
• Empirical results suggest that our best model can correctly predict the expressions responsible for a type error 62% of the time, 24 points higher than SHErrLoc and 11 points higher than the state-of-the-art Nate tool.
• We study the interpretability of our models using structural probes and identify a link between language models' performance and their ability to encode structural information of programs, such as ASTs.
We start by presenting the baseline, our model architecture, and the structural probe in Section 2. Section 3 introduces the dataset and evaluation metric, while Section 4 presents the experimental results and our discussion. Then, Section 5 gives an overview of related work. Finally, Section 6 concludes the paper and proposes some directions for future work.