Parameter-Efficient Legal Domain Adaptation
Jonathan Li1, Rohan Bhambhoria1,2, Xiaodan Zhu1,2
1Ingenuity Labs, Queen’s University
2Department of Electrical and Computer Engineering, Queen’s University
{jxl, r.bhambhoria, xiaodan.zhu}@queensu.ca
Abstract
Seeking legal advice is often expensive. Recent advancements in machine learning for solving complex problems can be leveraged to help make legal services more accessible to the public. However, real-life applications encounter significant challenges. State-of-the-art language models are growing increasingly large, making parameter-efficient learning increasingly important. Unfortunately, parameter-efficient methods perform poorly with small amounts of data (Gu et al., 2022), which are common in the legal domain (where data labelling costs are high). To address these challenges, we propose parameter-efficient legal domain adaptation, which uses vast unsupervised legal data from public legal forums to perform legal pre-training. This method exceeds or matches the fewshot performance of existing models such as LEGAL-BERT (Chalkidis et al., 2020) on various legal tasks while tuning only approximately 0.1% of model parameters. Additionally, we show that our method can achieve calibration comparable to existing methods across several tasks. To the best of our knowledge, this work is among the first to explore parameter-efficient methods of tuning language models in the legal domain.
1 Introduction
Seeking legal advice from lawyers can be expensive. However, a machine learning system that can help answer legal questions could greatly aid laypersons in making informed legal decisions. Existing legal forums, such as Legal Advice Reddit and Law Stack Exchange, are valuable data sources for various legal tasks. On one hand, they provide good sources of labelled data, such as mapping legal questions to their areas of law (for classification), as shown in Figure 1. On the other hand, they contain hundreds of thousands of legal questions that can be leveraged for domain adaptation. Furthermore, questions on these forums can serve as a starting point for tasks that do not have labels found directly in the dataset, such as classifying the severity of a legal question. In this paper, we show that this vast unlabelled corpus can improve performance on question classification, opening up the possibility of studying other tasks on these public legal forums.

Figure 1: Example classification task using legal questions from the Legal Advice Subreddit (top) and Law Stack Exchange (bottom). Reddit data is generally more informal than Stack Exchange.
In the past few years, large language models have shown effectiveness in legal tasks (Chalkidis et al., 2022). A widespread method used to train these models is finetuning. Although finetuning is very effective, it is prohibitively expensive; training all the parameters requires large amounts of memory and requires a full copy of the language model to be saved for each task. Recently, prefix tuning (Li and Liang, 2021; Liu et al., 2022) has shown great promise by tuning under 1% of the parameters while still achieving performance comparable to finetuning. Unfortunately, prefix tuning performs poorly in low-data (i.e., fewshot) settings (Gu et al., 2022), which are common in the legal domain. Conveniently, domain adaptation using large public datasets is an ideal setting for the legal domain, with abundant unlabelled data (from public forums) and limited labelled data. To this end, we introduce prefix domain adaptation, which performs domain adaptation for prompt tuning to improve fewshot performance on various legal tasks.
Overall, our main contributions are as follows:

- We introduce prefix adaptation, a method of domain adaptation using a prompt-based learning approach.
- We show empirically that the performance and calibration of prefix adaptation match or exceed those of LEGAL-BERT in fewshot settings while tuning only approximately 0.1% of the model parameters.
- We contribute two new datasets to facilitate different legal NLP tasks on questions asked by laypersons, towards the ultimate objective of helping make legal services more accessible to the public.
2 Related Works
Forums-based Datasets
Public forums have been used extensively as sources of data for machine learning. Sites like Stack Overflow and Quora have been used for duplicate question detection (Wang et al., 2020; Sharma et al., 2019). Additionally, many prior works have used posts from specific sub-communities (each called a "subreddit") on Reddit for NLP tasks, likely due to the diversity of communities and the large amount of data provided. Barnes et al. (2021) used a large number of internet memes from multiple meme-related subreddits to predict how likely a meme is to be popular. Other works, such as Basaldella et al. (2020), label posts from biomedical subreddits for biomedical entity linking. Similar to the legal judgement prediction task, Lourie et al. (2021) suggest using "crowdsourced data" from Reddit to perform ethical judgement prediction; that is, they use votes from the "r/AmITheAsshole" subreddit to classify who is "in the wrong" in a given real-life anecdote. We explore using data from Stack Exchange and Reddit, which has been vastly underexplored in previous works for the legal domain.
Full Domain Adaptation
Previous works such as BioBERT (Lee et al., 2019) and SciBERT (Beltagy et al., 2019) have shown positive results from domain adapting models. In industry, companies often use full domain adaptation for legal applications.¹ Chalkidis et al. (2020) introduce LEGAL-BERT, a BERT-like model domain adapted for legal tasks. They show improvements across various legal tasks by training on a domain-specific corpus. Zheng et al. (2021) also perform legal domain adaptation, using the Harvard Law case corpus, and show better performance on the CaseHOLD multiple-choice question answering task. Unlike existing works, we perform domain adaptation parameter-efficiently, showing similar performance in a fewshot setting. We compare our approach against LEGAL-BERT as a strong baseline.

¹ https://vectorinstitute.ai/2020/04/02/how-thomson-reuters-uses-nlp-to-enable-knowledge-workers-to-make-faster-and-more-accurate-business-decisions/
Parameter-efficient Learning
Language models have scaled to billions of parameters (He et al., 2021; Brown et al., 2020), making research memory- and storage-intensive. Recently, parameter-efficient training methods (techniques that tune only a small percentage of the parameters in a neural network) have become a prominent research topic in natural language processing. More recently, prefix tuning (Li and Liang, 2021) has attracted much attention due to its simplicity, ease of implementation, and effectiveness. In this paper, we use P-Tuning v2 (Liu et al., 2022), which includes an implementation of prefix tuning.
Previously, Gu et al. (2022) explored improving prefix tuning's fewshot performance with pre-training, rewriting downstream tasks as a multiple-choice answering task (their "unified PPT") and synthesizing multiple-choice pre-training data (from OpenWebText). Unlike them, we focus on domain adaptation rather than general pre-training. We show a much simpler method of prompt pre-training that uses the masked language modelling (MLM) task while preserving the format of downstream tasks. Ge et al. (2022) domain adapt continuous prompts (not prefix tuning) to improve performance with vision-transformer models for different image types (e.g., "clipart", "photo", or "product").
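To make the pre-training objective concrete, the sketch below is our own illustration (not the authors' released code): unlabelled forum questions are dynamically masked with the standard MLM objective over a frozen RoBERTa backbone. The example questions, masking rate, and sequence length are assumptions, and the trainable per-layer prefixes of prefix tuning are omitted for brevity.

```python
# Sketch of domain-adaptive MLM pre-training data flow with a frozen backbone.
from transformers import AutoTokenizer, AutoModelForMaskedLM, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")

# Freeze the backbone; in prefix adaptation, the trainable per-layer prefixes
# (not shown here) would be the only parameters receiving gradients.
for param in model.parameters():
    param.requires_grad = False

# Hypothetical unlabelled questions standing in for scraped forum data.
forum_questions = [
    "My landlord kept my security deposit without giving a reason. What can I do?",
    "Can my employer change my contract without notice?",
]

# Dynamic masking keeps pre-training inputs in the same plain-question format
# as the downstream classification inputs.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)
features = [tokenizer(q, truncation=True, max_length=128) for q in forum_questions]
batch = collator(features)

loss = model(**batch).loss  # the MLM loss that would drive the prefix updates
print(float(loss))
```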
Zhang et al. (2021) domain adapt an adapter (Houlsby et al., 2019), another type of parameter-efficient training method in which small neural networks inserted between the layers of the large language model are trained. Vu et al. (2022) explored the transferability of prompts between tasks. They trained a general prompt for the "prefix LM" objective (Raffel et al., 2020) on the Colossal Clean Crawled Corpus (Raffel et al., 2020). They do not study the efficacy of their general-purpose prompt in fewshot scenarios. Though we use a similar unsupervised language modelling task (Devlin et al., 2019), we aim to train a domain-adapted prompt rather than a general-purpose prompt.
3 Background
Legal Forums
Seeking legal advice from a lawyer can be incredibly expensive. Public legal forums, however, are highly accessible places for laypersons to ask legal questions. One popular community is the Legal Advice Reddit community (2M+ members), where users can freely ask personal legal questions. Typically, the questions asked on the Legal Advice Subreddit are written informally and receive informal answers. Another forum is the Law Stack Exchange, a community for questions about the law. Questions there are more formal than on Reddit. Additionally, users are not allowed to ask about a specific case and must instead ask about the law more hypothetically, as specified in the site's rules.

In particular, data from the Legal Advice Subreddit is especially helpful for training machine learning models to help laypersons in law, as its questions are in the format and language that regular people would write in (see Figure 1). We run experiments on Law Stack Exchange (LSE) for comprehensiveness, though we believe that the non-personal nature of LSE data makes it less valuable than Reddit data for helping laypersons.
Prefix Tuning
As language models grow very large, storage and memory constraints make training impractical or very expensive. Deep prefix tuning addresses these issues by prepending continuous prompts to the transformer. These continuous prefix prompts, which are prepended to each attention layer in the model, are trained jointly with a task-specific linear head (such as a classification head).
More formally, for each attention layer $L_i$ (as per Vaswani et al., 2017) in BERT's encoder, we prepend trainable prefixes $P_k$ (trained key prefix) and $P_v$ (trained value prefix) of length $n$ to the key and value matrices:

$$L_i = \mathrm{Attn}\left(xW^{(i)}_q,\ \mathrm{Cat}\left(P^{(i)}_k,\, xW^{(i)}_k\right),\ \mathrm{Cat}\left(P^{(i)}_v,\, xW^{(i)}_v\right)\right) \tag{1}$$

where $W^{(i)}_{\{q,k,v\}}$ denote the respective query, key, and value matrices of the attention at layer $i$, and $x$ denotes the input to layer $i$. Here, we assume single-headed attention for simplicity, and the $\mathrm{Cat}$ function concatenates two matrices along the dimension corresponding to the sequence length.

Note that in Equation 1 we do not need to left-pad any query values, as the shape of the query matrix does not need to match that of the key and value matrices.
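To make Equation 1 concrete, the following is a minimal single-head PyTorch sketch under our own assumed shapes and initialization (it is not the P-Tuning v2 implementation): trainable key and value prefixes of length $n$ are concatenated in front of the projected keys and values, the query is left unchanged, and only the prefixes are marked trainable.

```python
# Minimal sketch of prefix-augmented single-head attention (Equation 1).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrefixSelfAttention(nn.Module):
    def __init__(self, d_model: int, prefix_len: int):
        super().__init__()
        # Frozen backbone projections W_q, W_k, W_v.
        self.W_q = nn.Linear(d_model, d_model, bias=False)
        self.W_k = nn.Linear(d_model, d_model, bias=False)
        self.W_v = nn.Linear(d_model, d_model, bias=False)
        # Trainable prefixes P_k, P_v of length `prefix_len` (initialization is an assumption).
        self.P_k = nn.Parameter(torch.randn(prefix_len, d_model) * 0.02)
        self.P_v = nn.Parameter(torch.randn(prefix_len, d_model) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        batch = x.size(0)
        q = self.W_q(x)
        # Cat(P_k, xW_k) and Cat(P_v, xW_v) along the sequence-length dimension.
        k = torch.cat([self.P_k.expand(batch, -1, -1), self.W_k(x)], dim=1)
        v = torch.cat([self.P_v.expand(batch, -1, -1), self.W_v(x)], dim=1)
        # The query is not padded: its sequence length may differ from that of the keys/values.
        scores = q @ k.transpose(-2, -1) / (x.size(-1) ** 0.5)
        return F.softmax(scores, dim=-1) @ v   # (batch, seq_len, d_model)

# Only the prefixes (and, in practice, a task head) receive gradients.
layer = PrefixSelfAttention(d_model=768, prefix_len=16)
for name, param in layer.named_parameters():
    param.requires_grad = name.startswith("P_")
out = layer(torch.randn(2, 10, 768))
```

In the full method, prefixes like these are attached at every attention layer and, together with the linear head, account for the roughly 0.1% of parameters that are tuned.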
Expected Calibration Error
First suggested by Pakdaman Naeini et al. (2015) and later applied to neural networks by Guo et al. (2017), expected calibration error (ECE) measures how well a model is calibrated. In other words, ECE evaluates how closely a model's output confidences reflect the actual accuracy of its predictions. Calibration is important for two main reasons. First, a properly calibrated model reduces misuse of the model; if output confidences accurately reflect real-world likelihoods, then software systems using such models can better handle cases where the model is uncertain. Second, better calibration improves the interpretability of a model, as we can better understand how confident the model is under different scenarios (Guo et al., 2017). Bhambhoria et al. (2022) used ECE in the legal domain, where it is especially important due to the high-stakes nature of legal decision making.
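As a concrete reference, the following short NumPy sketch computes ECE with equal-width confidence bins, following the formulation popularized by Guo et al. (2017); the bin count and the toy inputs are illustrative assumptions rather than values from our experiments.

```python
# Illustrative ECE computation with equal-width bins; not tied to any specific model.
import numpy as np

def expected_calibration_error(probs: np.ndarray, labels: np.ndarray, n_bins: int = 10) -> float:
    """probs: (N, C) predicted class probabilities; labels: (N,) gold class indices."""
    confidences = probs.max(axis=1)            # confidence of the predicted class
    predictions = probs.argmax(axis=1)
    accuracies = (predictions == labels).astype(float)

    ece = 0.0
    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            # Gap between accuracy and confidence, weighted by the bin's share of samples.
            ece += in_bin.mean() * abs(accuracies[in_bin].mean() - confidences[in_bin].mean())
    return ece

# Toy example: two correct predictions with confidences 0.9 and 0.8 give an ECE of 0.15.
probs = np.array([[0.9, 0.1], [0.2, 0.8]])
labels = np.array([0, 1])
print(expected_calibration_error(probs, labels))
```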
4 Methods
Here we outline our approach and other baselines
for comparison.
RoBERTa
To establish a baseline, we train RoBERTa (Liu et al., 2019) for downstream tasks using full model tuning (referred to as "full finetuning"). In addition to the state-of-the-art performance RoBERTa achieves on many general NLP tasks, it has also shown very strong performance on legal tasks (Shaheen et al., 2020; Bhambhoria et al., 2022). Unlike some transformer models, RoBERTa has an encoder-only architecture and is normally pre-trained on the masked language modelling task (Devlin et al., 2019). We evaluate both of its size variants, RoBERTa-base (approximately 125M parameters) and RoBERTa-large (approximately 355M parameters).
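For reference, a minimal sketch of this full-finetuning baseline using the Hugging Face transformers library is shown below; the label count, example question, label index, and learning rate are illustrative assumptions, not the authors' actual training configuration.

```python
# Minimal full-finetuning step: all RoBERTa parameters plus the classification head are updated.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=11)  # label count is illustrative

# Placeholder fewshot example: a legal question paired with an area-of-law label index.
questions = ["My landlord is withholding my security deposit. Is that legal?"]
labels = torch.tensor([3])  # hypothetical index of a "housing" label

batch = tokenizer(questions, padding=True, truncation=True, max_length=256, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # every parameter is trainable

model.train()
loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()
```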
LEGAL-BERT
We evaluate the effectiveness
of our approach against LEGAL-BERT, a fully