
On the Transformation of Latent Space in Fine-Tuned NLP Models
WARNING: This paper contains model outputs which may be disturbing to the reader
Nadir Durrani♢  Hassan Sajjad♣∗  Fahim Dalvi♢  Firoj Alam♢
♢Qatar Computing Research Institute, Hamad Bin Khalifa University, Qatar
♣Faculty of Computer Science, Dalhousie University, Canada
{ndurrani, faimaduddin, fialam}@hbku.edu.qa, hsajjad@dal.ca
Abstract
We study the evolution of latent space in fine-tuned NLP models. Different from the commonly used probing framework, we opt for an unsupervised method to analyze representations. More specifically, we discover latent concepts in the representational space using hierarchical clustering. We then use an alignment function to gauge the similarity between the latent space of a pre-trained model and its fine-tuned version. We use traditional linguistic concepts to facilitate our understanding and also study how the model space transforms towards task-specific information. We perform a thorough analysis, comparing pre-trained and fine-tuned models across three models and three downstream tasks. The notable findings of our work are: i) the latent space of the higher layers evolves towards task-specific concepts, ii) the lower layers retain the generic concepts acquired in the pre-trained model, iii) some concepts in the higher layers acquire polarity towards the output class, and iv) these concepts can be used for generating adversarial triggers.
1 Introduction
The revolution of deep learning models in NLP can be attributed to transfer learning from pre-trained language models. Contextualized representations learned within these models capture rich linguistic knowledge that can be leveraged towards novel tasks, e.g., classification of COVID-19 tweets (Alam et al., 2021; Valdes et al., 2021), disease prediction (Rasmy et al., 2020), or natural language understanding tasks such as SQuAD (Rajpurkar et al., 2016) and GLUE (Wang et al., 2018).
Despite their success, the opaqueness of deep neural networks remains a cause of concern and has spurred a new area of research to analyze these models. A large body of work has analyzed the knowledge learned within the representations of pre-trained models (Belinkov et al., 2017; Conneau et al., 2018; Liu et al., 2019; Tenney et al., 2019; Durrani et al., 2019; Rogers et al., 2020) and has shown the presence of core-linguistic knowledge in various parts of the network. Although transfer learning using pre-trained models has become ubiquitous, very few papers (Merchant et al., 2020; Mosbach et al., 2020; Durrani et al., 2021) have analyzed the representations of fine-tuned models. Given their widespread use, interpreting fine-tuned models and highlighting task-specific peculiarities is critical for their deployment in real-world scenarios, where it is important to ensure fairness and trust when applying AI solutions.

∗This work was carried out while the author was at QCRI.
In this paper, we focus on analyzing fine-tuned models and investigate: how does the latent space evolve in a fine-tuned model? Different from the commonly used probing framework of training a post-hoc classifier (Belinkov et al., 2017; Dalvi et al., 2019a), we opt for an unsupervised method to analyze the latent space of pre-trained models. More specifically, we cluster contextualized representations in high-dimensional space using hierarchical clustering and term these clusters the Encoded Concepts (Dalvi et al., 2022). We then analyze how these encoded concepts evolve as the models are fine-tuned towards a downstream task. Specifically, we target the following questions: i) how do the latent spaces compare between the base¹ and the fine-tuned models? ii) how does the presence of core-linguistic concepts change during transfer learning? and iii) how is the knowledge of downstream tasks structured in a fine-tuned model?
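To make the concept-discovery step concrete, the sketch below extracts contextualized token representations from one layer of a pre-trained model and groups them with agglomerative (hierarchical) clustering, treating each resulting cluster as an encoded concept. The model name, layer index, cluster count, and example sentences are illustrative assumptions, not the paper's exact experimental settings.

```python
# Minimal sketch: discover "encoded concepts" by hierarchically clustering
# contextualized token representations from one layer of a pre-trained model.
# Model, layer, and cluster count below are illustrative choices.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.cluster import AgglomerativeClustering

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModel.from_pretrained("bert-base-cased", output_hidden_states=True)

sentences = ["The movie was surprisingly good .", "Stocks fell sharply on Monday ."]
layer = 9         # layer whose latent space we inspect (illustrative)
n_clusters = 5    # tiny for this toy input; realistic runs use far more data/clusters

tokens, vectors = [], []
for sent in sentences:
    enc = tokenizer(sent, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).hidden_states[layer].squeeze(0)  # (seq_len, dim)
    for tok, vec in zip(tokenizer.convert_ids_to_tokens(enc["input_ids"][0]), hidden):
        if tok not in ("[CLS]", "[SEP]"):
            tokens.append(tok)
            vectors.append(vec.numpy())

# Each cluster of token representations is treated as one latent encoded concept.
labels = AgglomerativeClustering(n_clusters=n_clusters, linkage="ward").fit_predict(vectors)
concepts = {c: [t for t, l in zip(tokens, labels) if l == c] for c in sorted(set(labels))}
print(concepts)
```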
We use an alignment function (Sajjad et al., 2022) to compare the concepts encoded in the fine-tuned models with: i) the concepts encoded in their pre-trained base models, ii) the human-defined concepts (e.g., parts-of-speech tags or semantic properties), and iii) the labels of the downstream task towards which the model is fine-tuned.
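The following is a hedged sketch of such an alignment step: two concepts (token clusters) are considered aligned when their token overlap exceeds a threshold. The exact matching criterion of Sajjad et al. (2022) may differ; the Jaccard overlap and the 0.9 threshold here are illustrative assumptions, and `align_concepts` is a hypothetical helper name.

```python
# Hedged sketch of concept alignment: match clusters from two models when
# their token sets overlap sufficiently (Jaccard overlap, illustrative threshold).
from typing import Dict, List, Set, Tuple

def align_concepts(
    base_concepts: Dict[int, List[str]],
    tuned_concepts: Dict[int, List[str]],
    threshold: float = 0.9,
) -> List[Tuple[int, int]]:
    """Return (base_id, tuned_id) pairs whose token sets overlap >= threshold."""
    matches = []
    for b_id, b_tokens in base_concepts.items():
        b_set: Set[str] = set(b_tokens)
        for t_id, t_tokens in tuned_concepts.items():
            t_set = set(t_tokens)
            overlap = len(b_set & t_set) / max(len(b_set | t_set), 1)
            if overlap >= threshold:
                matches.append((b_id, t_id))
    return matches

# Toy usage: concept 3 of the fine-tuned model aligns with concept 0 of the base model.
base = {0: ["good", "great", "fine"], 1: ["Monday", "Tuesday"]}
tuned = {3: ["good", "great", "fine"], 7: ["fell", "rose"]}
print(align_concepts(base, tuned))  # -> [(0, 3)]
```

The same matching idea extends to human-defined concepts (e.g., tokens sharing a part-of-speech tag) or task labels, by treating each annotation group as a "concept" on one side of the comparison.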
¹We use "base" and "pre-trained" models interchangeably.