
(Gu et al., 2022), which are common in the legal domain. Conveniently, the legal domain, with abundant unlabelled data (from public forums) and limited labelled data, is an ideal setting for domain adaptation using large public datasets. To this end, we introduce prefix domain adaptation, which performs domain adaptation for prompt tuning to improve few-shot performance on various legal tasks.
Overall, our main contributions are as follows:
• We introduce prefix adaptation, a method of domain adaptation using a prompt-based learning approach.
• We show empirically that the performance and calibration of prefix adaptation match or exceed those of LEGAL-BERT in few-shot settings while tuning only approximately 0.1% of the model parameters.
• We contribute two new datasets to facilitate different legal NLP tasks on the questions asked by laypersons, towards the ultimate objective of helping make legal services more accessible to the public.
2 Related Works
Forums-based Datasets
Public forums have been used extensively as sources of data for machine learning. Sites like Stack Overflow and Quora have been used for duplicate question detection (Wang et al., 2020; Sharma et al., 2019). Additionally, many prior works have used posts from specific sub-communities (called "subreddits") on Reddit for NLP tasks, likely due to the diversity of communities and large amount of data provided. Barnes et al. (2021) used a large number of internet memes from multiple meme-related subreddits to predict how likely a meme is to be popular. Other works, such as Basaldella et al. (2020), label posts from biomedical subreddits for biomedical entity linking. Similar to the legal judgement prediction task, Lourie et al. (2021) suggest using "crowdsourced data" from Reddit to perform ethical judgement prediction; that is, they use votes from the "r/AmITheAsshole" subreddit to classify who is "in the wrong" for a given real-life anecdote. We explore using data from Stack Exchange and Reddit, sources that have been vastly underexplored in prior work for the legal domain.
Full Domain Adaptation
Previous works such as BioBERT (Lee et al., 2019) and SciBERT (Beltagy et al., 2019) have shown positive results when adapting models to a specific domain. In industry, companies often use full domain adaptation for legal applications.¹ Chalkidis et al. (2020) introduce LEGAL-BERT, a BERT-like model domain adapted for legal tasks. They show improvements across various legal tasks by training on a domain-specific corpus. Zheng et al. (2021) also perform legal domain adaptation, using the Harvard Law case corpus, showing better performance on the CaseHOLD multiple-choice question answering task. Unlike existing works, we perform domain adaptation parameter-efficiently, showing similar performance in a few-shot setting. We compare our approach against LEGAL-BERT as a strong baseline.
Parameter-efficient Learning
Language models have scaled to billions of parameters (He et al., 2021; Brown et al., 2020), making research memory- and storage-intensive. Recently, parameter-efficient training methods, techniques that tune only a small percentage of a neural network's parameters, have been a prominent research topic in natural language processing. More recently, prefix tuning (Li and Liang, 2021) has attracted much attention due to its simplicity, ease of implementation, and effectiveness. In this paper, we use P-Tuning v2 (Liu et al., 2022), which includes an implementation of prefix tuning.
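As a rough sketch of the idea (not P-Tuning v2's actual implementation), prefix tuning can be pictured as a small bank of trainable key/value vectors prepended to every attention layer of a frozen backbone; the class name and tensor shapes below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PrefixEncoder(nn.Module):
    """Trainable prefix key/value vectors for a frozen transformer.

    Only these parameters receive gradients; the backbone stays frozen,
    which is how the trainable-parameter count stays around 0.1%.
    """

    def __init__(self, prefix_len=16, n_layers=12, n_heads=12, head_dim=64):
        super().__init__()
        self.shape = (prefix_len, n_layers, 2, n_heads, head_dim)
        # One vector per (position, layer, key-or-value, attention head).
        self.prefix = nn.Parameter(0.02 * torch.randn(*self.shape))

    def forward(self, batch_size):
        # Returns (n_layers, 2, batch, n_heads, prefix_len, head_dim),
        # which can be split per layer into (key, value) pairs for
        # `past_key_values`-style interfaces.
        p = self.prefix.unsqueeze(0).expand(batch_size, *self.shape)
        return p.permute(2, 3, 0, 4, 1, 5)
```

At every layer, the frozen model then attends to these extra key/value positions as if they were additional context tokens; P-Tuning v2 optionally reparameterizes the prefix through a small MLP, which the sketch omits.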
Previously, Gu et al. (2022) explored improving prefix tuning's few-shot performance with pre-training, reformulating downstream tasks as multiple-choice questions (their "unified PPT") and synthesizing multiple-choice pre-training data from OpenWebText. Unlike them, we focus on domain adaptation rather than general pre-training. We show a much simpler method of prompt pre-training that uses the masked language modelling (MLM) task while preserving the format of downstream tasks. Ge et al. (2022) domain adapt continuous prompts (not prefix tuning) to improve performance with vision-transformer models for different image types (e.g., "clipart", "photo", or "product").
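To make the MLM-based prompt pre-training described above concrete, the following is a minimal sketch rather than our exact training code: it assumes a Hugging Face tokenizer and collator, and a hypothetical `model` in which a frozen masked-LM backbone has been wrapped to inject trainable prefix vectors (e.g., the `PrefixEncoder` sketched earlier).

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# Standard MLM masking (15% of tokens), applied to unlabelled legal text.
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

def adaptation_step(model, optimizer, domain_texts):
    """One step of prefix domain adaptation on raw domain text.

    `model` is assumed to be a masked LM whose backbone is frozen and whose
    only trainable parameters are the injected prefix vectors (hypothetical
    wrapper, omitted here for brevity).
    """
    features = [tokenizer(t, truncation=True) for t in domain_texts]
    batch = collator(features)     # pads and adds masked `labels` for MLM
    loss = model(**batch).loss     # gradients flow only into the prefix
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

The intent, as described above, is that pre-training the prefix with MLM keeps the same format later used when prompting the downstream tasks.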
Zhang et al. (2021) domain adapt an adapter (Houlsby et al., 2019), another type of parameter-efficient training method in which small neural networks inserted between the layers of a large language model are trained.
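For contrast, an adapter in the style of Houlsby et al. (2019) can be sketched as a small bottleneck network with a residual connection inserted after a sublayer of the frozen model; this is a simplified illustration, not their exact architecture.

```python
import torch.nn as nn

class Adapter(nn.Module):
    """Simplified bottleneck adapter (after Houlsby et al., 2019).

    Inserted after a transformer sublayer; only the adapters (and usually
    the layer norms) are trained while the backbone stays frozen.
    """

    def __init__(self, hidden_size=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)  # project down
        self.up = nn.Linear(bottleneck, hidden_size)    # project back up
        self.act = nn.GELU()

    def forward(self, hidden_states):
        # Residual connection: a near-zero adapter leaves the frozen
        # model's behaviour essentially unchanged.
        return hidden_states + self.up(self.act(self.down(hidden_states)))
```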
Vu et al. (2022) explored the transferability of prompts between tasks. They trained a general prompt for the "prefix LM"
¹ https://vectorinstitute.ai/2020/04/02/how-thomson-reuters-uses-nlp-to-enable-knowledge-workers-to-make-faster-and-more-accurate-business-decisions/