Parameter-Efficient Legal Domain Adaptation
Jonathan Li1, Rohan Bhambhoria1,2, Xiaodan Zhu1,2
1Ingenuity Labs, Queen’s University
2Department of Electrical and Computer Engineering, Queen’s University
{jxl, r.bhambhoria, xiaodan.zhu}@queensu.ca
Abstract
Seeking legal advice is often expensive. Recent advancements in machine learning for solving complex problems can be leveraged to help make legal services more accessible to the public. However, real-life applications encounter significant challenges. State-of-the-art language models are growing increasingly large, making parameter-efficient learning increasingly important. Unfortunately, parameter-efficient methods perform poorly with small amounts of data (Gu et al., 2022), which are common in the legal domain (where data labelling costs are high). To address these challenges, we propose parameter-efficient legal domain adaptation, which uses vast unsupervised legal data from public legal forums to perform legal pre-training. This method exceeds or matches the fewshot performance of existing models such as LEGAL-BERT (Chalkidis et al., 2020) on various legal tasks while tuning only approximately 0.1% of model parameters. Additionally, we show that our method can achieve calibration comparable to existing methods across several tasks. To the best of our knowledge, this work is among the first to explore parameter-efficient methods of tuning language models in the legal domain.
1 Introduction
Seeking legal advice from lawyers can be expensive. However, a machine learning system that can help answer legal questions could greatly aid laypersons in making informed legal decisions. Existing legal forums, such as Legal Advice Reddit and Law Stack Exchange, are valuable data sources for various legal tasks. On one hand, they provide good sources of labelled data, such as mapping legal questions to their areas of law (for classification), as shown in Figure 1. On the other hand, they contain hundreds of thousands of legal questions that can be leveraged for domain adaptation. Furthermore, questions on these forums can serve as a starting point for tasks that do not have labels found directly in the dataset, such as classifying the severity of a legal question. In this paper, we show that this vast unlabelled corpus can improve performance on question classification, opening up the possibility of studying other tasks on these public legal forums.

Figure 1: Example classification task using legal questions from the Legal Advice Subreddit (top) and Law Stack Exchange (bottom). Reddit data is generally more informal than Stack Exchange.
In the past few years, large language models have shown effectiveness in legal tasks (Chalkidis et al., 2022). A widespread method used to train these models is finetuning. Although finetuning is very effective, it is prohibitively expensive; training all the parameters requires large amounts of memory and requires a full copy of the language model to be saved for each task. Recently, prefix tuning (Li and Liang, 2021; Liu et al., 2022) has shown great promise by tuning under 1% of the parameters while still achieving performance comparable to finetuning. Unfortunately, prefix tuning performs poorly in low-data (i.e., fewshot) settings (Gu et al., 2022), which are common in the legal domain. Conveniently, domain adaptation using large public datasets is an ideal setting for the legal domain, with abundant unlabelled data (from public forums) and limited labelled data. To this end, we introduce prefix domain adaptation, which performs domain adaptation for prompt tuning to improve fewshot performance on various legal tasks.
Overall, our main contributions are as follows:

- We introduce prefix adaptation, a method of domain adaptation using a prompt-based learning approach.
- We show empirically that the performance and calibration of prefix adaptation match or exceed those of LEGAL-BERT in fewshot settings while tuning only approximately 0.1% of the model parameters.
- We contribute two new datasets to facilitate different legal NLP tasks on questions asked by laypersons, towards the ultimate objective of helping make legal services more accessible to the public.
2 Related Works
Forums-based Datasets
Public forums have been used extensively as sources of data for machine learning. Sites like Stack Overflow and Quora have been used for duplicate question detection (Wang et al., 2020; Sharma et al., 2019). Additionally, many prior works have used posts from specific sub-communities (each called a "subreddit") on Reddit for NLP tasks, likely due to the diversity of communities and the large amount of data provided. Barnes et al. (2021) used a large number of internet memes from multiple meme-related subreddits to predict how likely a meme is to be popular. Other works, such as Basaldella et al. (2020), label posts from biomedical subreddits for biomedical entity linking. Similar to the legal judgement prediction task, Lourie et al. (2021) suggest using "crowdsourced data" from Reddit to perform ethical judgement prediction; that is, they use votes from the "r/AmITheAsshole" subreddit to classify who is "in the wrong" in a given real-life anecdote. We explore using data from Stack Exchange and Reddit, which has been vastly underexplored in previous works for the legal domain.
Full Domain Adaptation
Previous works such as BioBERT (Lee et al., 2019) and SciBERT (Beltagy et al., 2019) have shown positive results from domain adapting models. In industry, companies often use full domain adaptation for legal applications.¹ Chalkidis et al. (2020) introduce LEGAL-BERT, a BERT-like model domain adapted for legal tasks. They show improvements across various legal tasks by training on a domain-specific corpus. Zheng et al. (2021) also perform legal domain adaptation, using the Harvard Law case corpus, and show better performance on the CaseHOLD multiple-choice question answering task. Unlike existing works, we perform domain adaptation parameter-efficiently, showing similar performance in a fewshot setting. We compare our approach against LEGAL-BERT as a strong baseline.

¹ https://vectorinstitute.ai/2020/04/02/how-thomson-reuters-uses-nlp-to-enable-knowledge-workers-to-make-faster-and-more-accurate-business-decisions/
Parameter-efficient Learning
Language models have scaled to billions of parameters (He et al., 2021; Brown et al., 2020), making research memory- and storage-intensive. Recently, parameter-efficient training methods (techniques that tune only a small percentage of the parameters in a neural network) have become a prominent research topic in natural language processing. More recently, prefix tuning (Li and Liang, 2021) has attracted much attention due to its simplicity, ease of implementation, and effectiveness. In this paper, we use P-Tuning v2 (Liu et al., 2022), which includes an implementation of prefix tuning.
Previously, Gu et al. (2022) explored improving prefix tuning's fewshot performance with pre-training, rewriting downstream tasks as a multiple-choice answering task (their "unified PPT") and synthesizing multiple-choice pre-training data (from OpenWebText). Unlike them, we focus on domain adaptation rather than general pre-training. We show a much simpler method of prompt pre-training that uses the masked language modelling (MLM) task while preserving the format of downstream tasks. Ge et al. (2022) domain adapt continuous prompts (not prefix tuning) to improve performance with vision-transformer models for different image types (e.g., "clipart", "photo", or "product").
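To make the pre-training objective concrete, the sketch below is our own illustration (not the authors' released code): unlabelled forum questions are dynamically masked with the standard MLM objective over a frozen RoBERTa backbone. The example questions, masking rate, and sequence length are assumptions, and the trainable per-layer prefixes of prefix tuning are omitted for brevity.

```python
# Sketch of domain-adaptive MLM pre-training data flow with a frozen backbone.
from transformers import AutoTokenizer, AutoModelForMaskedLM, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")

# Freeze the backbone; in prefix adaptation, the trainable per-layer prefixes
# (not shown here) would be the only parameters receiving gradients.
for param in model.parameters():
    param.requires_grad = False

# Hypothetical unlabelled questions standing in for scraped forum data.
forum_questions = [
    "My landlord kept my security deposit without giving a reason. What can I do?",
    "Can my employer change my contract without notice?",
]

# Dynamic masking keeps pre-training inputs in the same plain-question format
# as the downstream classification inputs.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)
features = [tokenizer(q, truncation=True, max_length=128) for q in forum_questions]
batch = collator(features)

loss = model(**batch).loss  # the MLM loss that would drive the prefix updates
print(float(loss))
```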
Zhang et al. (2021) domain adapt an adapter (Houlsby et al., 2019), another type of parameter-efficient training method in which small neural networks inserted between the layers of the large language model are trained. Vu et al. (2022) explored the transferability of prompts between tasks. They trained a general prompt for the "prefix LM" objective (Raffel et al., 2020) on the Colossal Clean Crawled Corpus (Raffel et al., 2020). They do not study the efficacy of their general-purpose prompt in fewshot scenarios. Though we use a similar unsupervised language modelling task (Devlin et al., 2019), we aim to train a domain-adapted prompt rather than a general-purpose prompt.
3 Background
Legal Forums
Seeking legal advice from a lawyer can be incredibly expensive. Public legal forums, however, are highly accessible places for laypersons to ask legal questions. One popular community is the Legal Advice Reddit community (2M+ members), where users can freely ask personal legal questions. Typically, the questions asked on the Legal Advice Subreddit are written informally and receive informal answers. Another forum is the Law Stack Exchange, a community for questions about the law. Questions there are more formal than on Reddit. Additionally, users are not allowed to ask about a specific case and must instead ask about the law more hypothetically, as specified in the site's rules.

In particular, data from the Legal Advice Subreddit is especially helpful for training machine learning models to help laypersons in law, as its questions are in the format and language that regular people would write in (see Figure 1). We run experiments on Law Stack Exchange (LSE) for comprehensiveness, though we believe that the non-personal nature of LSE data makes it less valuable than Reddit data for helping laypersons.
Prefix Tuning
As language models grow very large, storage and memory constraints make training impractical or very expensive. Deep prefix tuning addresses these issues by prepending continuous prompts to the transformer. These continuous prefix prompts, which are prepended to each attention layer in the model, are trained jointly with a task-specific linear head (such as a classification head).
More formally, for each attention layer $L_i$ (as per Vaswani et al., 2017) in BERT's encoder, we prepend trainable prefixes $P_k$ (trained key prefix) and $P_v$ (trained value prefix) of length $n$ to the key and value matrices:

$$L_i = \mathrm{Attn}\left(xW^{(i)}_q,\ \mathrm{Cat}\left(P^{(i)}_k,\, xW^{(i)}_k\right),\ \mathrm{Cat}\left(P^{(i)}_v,\, xW^{(i)}_v\right)\right) \tag{1}$$

where $W^{(i)}_{\{q,k,v\}}$ denote the respective query, key, and value matrices of the attention at layer $i$, and $x$ denotes the input to layer $i$. Here, we assume single-headed attention for simplicity, and the $\mathrm{Cat}$ function concatenates two matrices along the dimension corresponding to the sequence length.

Note that in Equation 1 we do not need to left-pad any query values, as the shape of the query matrix does not need to match that of the key and value matrices.
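To make Equation 1 concrete, the following is a minimal single-head PyTorch sketch under our own assumed shapes and initialization (it is not the P-Tuning v2 implementation): trainable key and value prefixes of length $n$ are concatenated in front of the projected keys and values, the query is left unchanged, and only the prefixes are marked trainable.

```python
# Minimal sketch of prefix-augmented single-head attention (Equation 1).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrefixSelfAttention(nn.Module):
    def __init__(self, d_model: int, prefix_len: int):
        super().__init__()
        # Frozen backbone projections W_q, W_k, W_v.
        self.W_q = nn.Linear(d_model, d_model, bias=False)
        self.W_k = nn.Linear(d_model, d_model, bias=False)
        self.W_v = nn.Linear(d_model, d_model, bias=False)
        # Trainable prefixes P_k, P_v of length `prefix_len` (initialization is an assumption).
        self.P_k = nn.Parameter(torch.randn(prefix_len, d_model) * 0.02)
        self.P_v = nn.Parameter(torch.randn(prefix_len, d_model) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        batch = x.size(0)
        q = self.W_q(x)
        # Cat(P_k, xW_k) and Cat(P_v, xW_v) along the sequence-length dimension.
        k = torch.cat([self.P_k.expand(batch, -1, -1), self.W_k(x)], dim=1)
        v = torch.cat([self.P_v.expand(batch, -1, -1), self.W_v(x)], dim=1)
        # The query is not padded: its sequence length may differ from that of the keys/values.
        scores = q @ k.transpose(-2, -1) / (x.size(-1) ** 0.5)
        return F.softmax(scores, dim=-1) @ v   # (batch, seq_len, d_model)

# Only the prefixes (and, in practice, a task head) receive gradients.
layer = PrefixSelfAttention(d_model=768, prefix_len=16)
for name, param in layer.named_parameters():
    param.requires_grad = name.startswith("P_")
out = layer(torch.randn(2, 10, 768))
```

In the full method, prefixes like these are attached at every attention layer and, together with the linear head, account for the roughly 0.1% of parameters that are tuned.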
Expected Calibration Error
First suggested by Pakdaman Naeini et al. (2015) and later applied to neural networks by Guo et al. (2017), expected calibration error (ECE) measures how well a model is calibrated. In other words, ECE evaluates how closely a model's output confidences reflect the actual accuracy of its predictions. Calibration is important for two main reasons. First, a properly calibrated model reduces misuse of the model; if output confidences accurately reflect real-world likelihoods, then software systems using such models can better handle cases where the model is uncertain. Second, better calibration improves the interpretability of a model, as we can better understand how confident the model is under different scenarios (Guo et al., 2017). Bhambhoria et al. (2022) used ECE in the legal domain, where it is especially important due to the high-stakes nature of legal decision making.
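As a concrete reference, the following short NumPy sketch computes ECE with equal-width confidence bins, following the formulation popularized by Guo et al. (2017); the bin count and the toy inputs are illustrative assumptions rather than values from our experiments.

```python
# Illustrative ECE computation with equal-width bins; not tied to any specific model.
import numpy as np

def expected_calibration_error(probs: np.ndarray, labels: np.ndarray, n_bins: int = 10) -> float:
    """probs: (N, C) predicted class probabilities; labels: (N,) gold class indices."""
    confidences = probs.max(axis=1)            # confidence of the predicted class
    predictions = probs.argmax(axis=1)
    accuracies = (predictions == labels).astype(float)

    ece = 0.0
    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            # Gap between accuracy and confidence, weighted by the bin's share of samples.
            ece += in_bin.mean() * abs(accuracies[in_bin].mean() - confidences[in_bin].mean())
    return ece

# Toy example: two correct predictions with confidences 0.9 and 0.8 give an ECE of 0.15.
probs = np.array([[0.9, 0.1], [0.2, 0.8]])
labels = np.array([0, 1])
print(expected_calibration_error(probs, labels))
```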
4 Methods
Here we outline our approach and other baselines
for comparison.
RoBERTa
To establish a baseline, we train RoBERTa (Liu et al., 2019) for downstream tasks using full model tuning (referred to as "full finetuning"). In addition to the state-of-the-art performance RoBERTa achieves on many general NLP tasks, it has also shown very strong performance on legal tasks (Shaheen et al., 2020; Bhambhoria et al., 2022). Unlike some transformer models, RoBERTa has an encoder-only architecture and is normally pre-trained on the masked language modelling task (Devlin et al., 2019). We evaluate both of its size variants, RoBERTa-base (approximately 125M parameters) and RoBERTa-large (approximately 355M parameters).
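For reference, a minimal sketch of this full-finetuning baseline using the Hugging Face transformers library is shown below; the label count, example question, label index, and learning rate are illustrative assumptions, not the authors' actual training configuration.

```python
# Minimal full-finetuning step: all RoBERTa parameters plus the classification head are updated.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=11)  # label count is illustrative

# Placeholder fewshot example: a legal question paired with an area-of-law label index.
questions = ["My landlord is withholding my security deposit. Is that legal?"]
labels = torch.tensor([3])  # hypothetical index of a "housing" label

batch = tokenizer(questions, padding=True, truncation=True, max_length=256, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # every parameter is trainable

model.train()
loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()
```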
LEGAL-BERT
We evaluate the effectiveness
of our approach against LEGAL-BERT, a fully