On the Transformation of Latent Space in Fine-Tuned NLP Models
WARNING: This paper contains model outputs which may be disturbing to the reader
Nadir Durraniµ, Hassan Sajjadº∗, Fahim Dalviµ, Firoj Alamµ
µQatar Computing Research Institute, Hamad Bin Khalifa University, Qatar
ºFaculty of Computer Science, Dalhousie University, Canada
{ndurrani,faimaduddin,fialam}@hbku.edu.qa, hsajjad@dal.ca
∗This work was carried out while the author was at QCRI.
Abstract
We study the evolution of latent space in fine-tuned NLP models. Different from the commonly used probing framework, we opt for an unsupervised method to analyze representations. More specifically, we discover latent concepts in the representational space using hierarchical clustering. We then use an alignment function to gauge the similarity between the latent space of a pre-trained model and its fine-tuned version. We use traditional linguistic concepts to facilitate our understanding and also study how the model space transforms towards task-specific information. We perform a thorough analysis, comparing pre-trained and fine-tuned models across three models and three downstream tasks. The notable findings of our work are: i) the latent space of the higher layers evolves towards task-specific concepts, ii) whereas the lower layers retain generic concepts acquired in the pre-trained model, iii) we discovered that some concepts in the higher layers acquire polarity towards the output class, and iv) that these concepts can be used for generating adversarial triggers.
1 Introduction
The revolution of deep learning models in NLP can be attributed to transfer learning from pre-trained language models. Contextualized representations learned within these models capture rich linguistic knowledge that can be leveraged towards novel tasks, e.g. classification of COVID-19 tweets (Alam et al., 2021; Valdes et al., 2021), disease prediction (Rasmy et al., 2020), or natural language understanding tasks such as SQuAD (Rajpurkar et al., 2016) and GLUE (Wang et al., 2018).
Despite their success, the opaqueness of deep neural networks remains a cause of concern and has spurred a new area of research to analyze these models. A large body of work analyzed the knowledge learned within the representations of pre-trained models (Belinkov et al., 2017; Conneau et al., 2018; Liu et al., 2019; Tenney et al., 2019; Durrani et al., 2019; Rogers et al., 2020) and showed the presence of core-linguistic knowledge in various parts of the network. Although transfer learning using pre-trained models has become ubiquitous, very few papers (Merchant et al., 2020; Mosbach et al., 2020; Durrani et al., 2021) have analyzed the representations of fine-tuned models. Given their massive usability, interpreting fine-tuned models and highlighting task-specific peculiarities is critical for their deployment in real-world scenarios, where it is important to ensure fairness and trust when applying AI solutions.
In this paper, we focus on analyzing fine-tuned models and investigate: how does the latent space evolve in a fine-tuned model? Different from the commonly used probing framework of training a post-hoc classifier (Belinkov et al., 2017; Dalvi et al., 2019a), we opt for an unsupervised method to analyze the latent space of pre-trained models. More specifically, we cluster contextualized representations in high-dimensional space using hierarchical clustering and term these clusters the Encoded Concepts (Dalvi et al., 2022). We then analyze how these encoded concepts evolve as the models are fine-tuned towards a downstream task. Specifically, we target the following questions: i) how do the latent spaces compare between the base¹ and the fine-tuned models? ii) how does the presence of core-linguistic concepts change during transfer learning? and iii) how is the knowledge of downstream tasks structured in a fine-tuned model?
We use an alignment function (Sajjad et al., 2022) to compare the concepts encoded in the fine-tuned models with: i) the concepts encoded in their pre-trained base models, ii) the human-defined concepts (e.g. parts-of-speech tags or semantic properties), and iii) the labels of the downstream task towards which the model is fine-tuned.
¹ We use "base" and "pre-trained" models interchangeably.
Figure 1: Comparing encoded concepts of a model across different layers with: i) the concepts encoded in its base model (dashed lines), ii) human-defined concepts (e.g. POS tags or semantic properties), and iii) task-specific concepts (e.g. positive or negative sentiment class).
We carried out our study using three pre-trained transformer language models: BERT (Devlin et al., 2019), XLM-RoBERTa (Conneau et al., 2020) and ALBERT (Lan et al., 2019), analyzing how their representation space evolves as they are fine-tuned towards the tasks of Sentiment Analysis (SST-2, Socher et al., 2013), Natural Language Inference (MNLI, Williams et al., 2018) and Hate Speech Detection (HSD, Mathew et al., 2020). Our analysis yields interesting insights such as:
• The latent space of the models substantially evolves from their base versions after fine-tuning.
• The latent space representing core-linguistic concepts is limited to the lower layers in the fine-tuned models, contrary to the base models where it is distributed across the network.
• We found task-specific polarity concepts in the higher layers of the Sentiment Analysis and Hate Speech Detection tasks.
• These polarized concepts can be used as triggers to generate adversarial examples.
• Compared to BERT and XLM-RoBERTa, the representational space in ALBERT changes significantly during fine-tuning.
2 Methodology
Our work builds on the Latent Concept Analysis method (Dalvi et al., 2022) for interpreting representational spaces of neural network models. We cluster contextualized embeddings to discover Encoded Concepts in the model and study the evolution of the latent space in the fine-tuned model by aligning the encoded concepts of the fine-tuned model to: i) their pre-trained version, ii) the human-defined concepts, and iii) the task-specific concepts (for the task the pre-trained model is fine-tuned on). Figure 1 presents an overview of our approach. In the following, we define the scope of Concept and discuss each step of our approach in detail.
2.1 Concept
We define a concept as a group of words that are clustered together based on any linguistic relation such as lexical, semantic, syntactic, morphological, etc. Formally, consider $C_t(n)$ as a concept consisting of a unique set of words $\{w_1, w_2, \ldots, w_J\}$, where $J$ is the number of words in $C_t$, $n$ is a concept identifier, and $t$ is the concept type, which can be an encoded concept (ec), a human-defined concept (pos:verbs, sem:loc, ...) or a class-based concept (sst:+ive, hsd:toxic, ...).
Encoded Concepts: Figure 2 shows a few examples of the encoded concepts discovered in the BERT model, where the concept is defined by a group based on nouns ending with "y" (Figure 2a) or a group based on TV-related named entities (Figure 2b). Similarly, Figure 2c is a concept representing racial slurs in a BERT model tuned for the Hate Speech Detection (HSD) task. We denote this concept as $C_{ec}(\text{bert-hsd-layer10-c227}) = \{paki, nigger, mudslime, redneck, \ldots\}$, i.e. the concept was discovered in layer 10 of the BERT-HSD model and c227 is the concept number.

Figure 2: Examples of encoded concepts: (a) nouns ending with "y", (b) named entities – TV, (c) racial slurs. The size of a specific word is based on its frequency in the cluster, defined by the number of times different contextual representations of a word were grouped in the same cluster.
Human Concepts: Each individual tag in the human-defined concepts, such as parts-of-speech (POS) or semantic tagging (SEM), represents a concept $C$. For example, $C_{pos}(JJR) = \{greener, taller, happier, \ldots\}$ defines a concept containing comparative adjectives in the POS tagging task, and $C_{sem}(MOY) = \{January, February, \ldots, December\}$ defines a concept containing the months of the year in the semantic tagging task.
Task-specific Concepts: Another kind of concept that we use in this work is the task-specific concept, where the concept represents the affinity of its members with respect to the task labels. Consider a sentiment classification task with two labels, "positive" and "negative". We define $C_{sst}(+ve)$ as a concept containing words that only appear in sentences labeled as positive. Similarly, we define $C_{hsd}(toxic)$ as a concept containing words that only appear in sentences marked as toxic.
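As a concrete illustration of this class-based definition, the following is a minimal sketch (with hypothetical function and variable names, not the paper's code) of how the $C_{sst}(+ve)$ and $C_{sst}(-ve)$ word sets could be built from a labeled sentiment corpus: a word joins a polarity concept only if every sentence containing it carries that label.

```python
# Minimal sketch (hypothetical names): build class-based concepts from a
# labeled corpus. A word enters C_sst(+ve) only if it never occurs in a
# negatively labeled sentence, and vice versa for C_sst(-ve).
from collections import defaultdict

def task_specific_concepts(sentences, labels):
    """sentences: list of token lists; labels: parallel list of 'pos'/'neg'."""
    seen_with = defaultdict(set)              # word -> labels it co-occurs with
    for tokens, label in zip(sentences, labels):
        for tok in tokens:
            seen_with[tok].add(label)
    c_pos = {w for w, ls in seen_with.items() if ls == {"pos"}}
    c_neg = {w for w, ls in seen_with.items() if ls == {"neg"}}
    return c_pos, c_neg

# Toy example: c_pos = {"delightful"}, c_neg = {"dull"}; "a" and "film"
# occur under both labels and therefore belong to neither concept.
c_pos, c_neg = task_specific_concepts(
    [["a", "delightful", "film"], ["a", "dull", "film"]], ["pos", "neg"])
```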
2.2 Latent Concept Discovery
A vector representation in the neural network model is composed of feature attributes of the input words. We group the encoded vector representations using the clustering approach discussed below. The resulting clusters, which we term encoded concepts, are then matched with the human-defined concepts using an alignment function.
Formally, consider a pre-trained model $M$ with $L$ layers $\{l_1, l_2, \ldots, l_L\}$. Given a dataset $W = \{w_1, w_2, \ldots, w_N\}$, we generate feature vectors, a sequence of latent representations $W \xrightarrow{\,M\,} \mathbf{z}^l = \{z^l_1, \ldots, z^l_n\}$,² by doing a forward pass over the data for any given layer $l$. Our goal is to cluster the representations $\mathbf{z}^l$, obtained from the task-specific training data, to obtain the encoded concepts.
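For illustration, this forward pass can be sketched with the HuggingFace transformers library (the paper itself extracts representations with the NeuroX toolkit, see Section 3.2; the code below is an assumed, simplified alternative that returns sub-word pieces without aggregating them back to words).

```python
# Hedged sketch: extract layer-l contextualized representations z^l with
# HuggingFace transformers. Sub-word pieces are returned as-is here; in
# practice they would be aggregated back to word-level vectors.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-cased"                      # one of the studied models
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
model.eval()

def layer_representations(sentence: str, layer: int):
    """Return (tokens, representations) for one sentence at a given layer."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc)
    # hidden_states is a tuple of L+1 tensors; index 0 is the embedding layer.
    reps = out.hidden_states[layer][0]              # (seq_len, hidden_dim)
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
    return tokens, reps
```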
We use agglomerative hierarchical clustering (Gowda and Krishna, 1978), which assigns each word to its individual cluster and iteratively combines the clusters based on Ward's minimum variance criterion, using intra-cluster variance. The distance between two vector representations is calculated with the squared Euclidean distance. The algorithm terminates when the required $K$ clusters (i.e. encoded concepts) are formed, where $K$ is a hyper-parameter. Each encoded concept represents a latent relationship between the words present in the cluster.
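A minimal sketch of this clustering step, assuming the layer-wise representations have already been extracted into a matrix (function names are illustrative; scikit-learn's Ward linkage mirrors the minimum-variance criterion described above):

```python
# Hedged sketch: group contextualized representations into K encoded concepts
# with agglomerative clustering under Ward's minimum variance criterion.
import numpy as np
from collections import defaultdict
from sklearn.cluster import AgglomerativeClustering

def discover_concepts(reps: np.ndarray, words: list, K: int = 600):
    """reps: (num_tokens, hidden_dim) layer-l vectors; words: parallel tokens."""
    clustering = AgglomerativeClustering(n_clusters=K, linkage="ward")
    labels = clustering.fit_predict(reps)
    concepts = defaultdict(list)                # cluster id -> word instances
    for word, label in zip(words, labels):
        concepts[label].append(word)
    return concepts                             # each value is one encoded concept
```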
2.3 Alignment
Once we have obtained a set of encoded concepts in the base (pre-trained) and fine-tuned models, we want to align them to study how the latent space has evolved during transfer learning. Sajjad et al. (2022) calibrated the representational space in transformer models with different linguistic concepts to generate their explanations. We extend their alignment function to align latent spaces within a model and its fine-tuned version. Given a concept $C_1(n)$ with $J$ number of words, we consider it to be $\theta$-aligned ($\Lambda_\theta$) with a concept $C_2(m)$ if they satisfy the following constraint:

$$
\Lambda_\theta(C_1, C_2) =
\begin{cases}
1, & \text{if } \frac{1}{J}\sum_{w \in C_1}\sum_{w' \in C_2} \delta(w, w') \geq \theta \\
0, & \text{otherwise,}
\end{cases}
\qquad (1)
$$

where the Kronecker function $\delta(w, w')$ is defined as

$$
\delta(w, w') =
\begin{cases}
1, & \text{if } w = w' \\
0, & \text{otherwise.}
\end{cases}
$$

² Each element $z_i$ denotes the contextualized word representation for the corresponding word $w_i$ in the sentence.
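Since concepts are unique word sets, the double sum in Eq. (1) reduces to counting how many of $C_1$'s words also occur in $C_2$. A minimal sketch of this reading (illustrative names, not the authors' implementation):

```python
# Hedged sketch of Eq. (1): C1 is theta-aligned with C2 if at least a theta
# fraction of C1's J words also appear in C2 (exact-match Kronecker delta).
def theta_aligned(c1: set, c2: set, theta: float = 0.95) -> bool:
    if not c1:
        return False
    overlap = sum(1 for w in c1 if w in c2)     # sum over delta(w, w')
    return overlap / len(c1) >= theta
```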
Human-defined Concepts: The function can be used to draw a mapping between the different types of concepts discussed in Section 2.1. To investigate how transfer learning impacts human-defined knowledge, we align the latent space to human-defined concepts such as $C_{pos}(NN)$ or $C_{chunking}(PP)$.
Task Concepts: Lastly, we compare the encoded concepts with the task-specific concepts. Here, we use the alignment function to mark the affinity of an encoded concept. For the Sentiment Analysis task, let a task-specific concept $C_{sst}(+ve) = \{w^+_1, w^+_2, \ldots, w^+_n\}$ be defined by the set of words that only appeared in positively labeled sentences $S = \{s^+_1, s^+_2, \ldots, s^+_n\}$. We call a concept $C_{ec} = \{x_1, x_2, \ldots, x_n\}$ aligned to $C_{sst}(+ve)$, and mark it positive, if all words in the encoded concept (i.e. $\theta = 1$) appeared in positively labeled sentences. Note that here a word represents an instance based on its contextualized embedding. We similarly align $C_{ec}$ with $C_{sst}(-ve)$ to discover negative polarity concepts.
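As a usage sketch (reusing the hypothetical helpers from the earlier snippets), marking the polarity of an encoded concept then amounts to calling the alignment function with $\theta = 1$ against each class-based concept:

```python
# Hedged usage sketch: an encoded concept is marked positive (or negative)
# only if every one of its words is covered by the corresponding
# class-based concept, i.e. theta = 1. Relies on theta_aligned() above.
def concept_polarity(encoded_concept: set, c_pos: set, c_neg: set):
    if theta_aligned(encoded_concept, c_pos, theta=1.0):
        return "positive"
    if theta_aligned(encoded_concept, c_neg, theta=1.0):
        return "negative"
    return None                                  # no clear polarity
```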
3 Experimental Setup
3.1 Models and Tasks
We experimented with three popular transformer architectures, namely BERT-base-cased (Devlin et al., 2019), XLM-RoBERTa (Conneau et al., 2020) and ALBERT (v2) (Lan et al., 2019), using the base versions (13 layers and 768 dimensions). To carry out the analysis, we fine-tuned the base models for the tasks of sentiment analysis using the Stanford Sentiment Treebank dataset (SST-2, Socher et al., 2013), natural language inference (MNLI, Williams et al., 2018) and Hate Speech Detection (HSD, Mathew et al., 2020).
3.2 Clustering
We used the task-specific training data for clustering with both the base (pre-trained) and fine-tuned models. This enables us to accurately compare the representational spaces generated from the same data. We do a forward pass over both the base and fine-tuned models to generate contextualized feature vectors³ of the words in the data and run agglomerative hierarchical clustering over these vectors. We do this for every layer independently, obtaining $K$ clusters (a.k.a. encoded concepts) for both the base and fine-tuned models. We used $K = 600$ for our experiments.⁴ We carried out preliminary experiments (all the BERT-base-cased experiments) using $K = 200, 400, \ldots, 1000$ and all our experiments using $K = 600$ and $K = 1000$. We found that our results are not sensitive to these parameters and that the patterns are consistent across different cluster settings (please see Appendix B).
3.3 Human-defined Concepts
We experimented with traditional tasks that are defined to capture core-linguistic concepts such as word morphology: part-of-speech tagging using the Penn Treebank data (Marcus et al., 1993); syntax: chunking using the CoNLL 2000 shared task dataset (Tjong Kim Sang and Buchholz, 2000); CCG supertagging using the CCG Treebank (Hockenmaier, 2006); and semantic tagging using the Parallel Meaning Bank data (Abzianidze et al., 2017). We trained BERT-based sequence taggers for each of the above tasks and annotated the task-specific training data. Each core-linguistic task serves as a human-defined concept that is aligned with encoded concepts to measure the representation of linguistic knowledge in the latent space. Appendix A presents the details on the human-defined concepts, data statistics and tagger accuracy.
3.4 Alignment Threshold
We consider an encoded concept to be aligned with another concept if it has at least a 95%⁵ match in the number of words. We only consider concepts that have more than 5 word types. Note that the encoded concepts are based on contextualized embeddings, where a word has different embeddings depending on the context.

³ We use the NeuroX toolkit (Dalvi et al., 2019b) to extract contextualized representations.
⁴ We experimented with the Elbow (Thorndike, 1953) and Silhouette (Rousseeuw, 1987) methods to find the optimal number of clusters, but could not observe a reliable pattern. Selecting between 600 and 1000 clusters gives the right balance to avoid over-clustering (many small clusters) and under-clustering (a few large clusters).
⁵ Using an overlap of 95% provides a very tight threshold, allowing only 5% noise. Our patterns were consistent at lower and higher thresholds.

Figure 3: Comparing encoded concepts of base models with their SST fine-tuned versions. X-axis = base model, Y-axis = fine-tuned model. Each cell in the matrix represents a percentage (aligned concepts / total concepts in a layer) between the base and fine-tuned models. Darker color means a higher percentage. Detailed plots with actual overlap values are provided in the Appendix.
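One plausible way to compute the per-layer overlap statistic visualized in Figure 3 is sketched below, under the assumptions of the earlier snippets; the exact counting direction used by the authors may differ, so treat this as an illustration rather than the paper's implementation.

```python
# Hedged sketch of the Figure 3 matrix: percentage of base-layer concepts
# that theta-align with at least one concept in a fine-tuned layer.
# Relies on theta_aligned() defined above; names are illustrative.
def layer_overlap_matrix(base_concepts, ft_concepts, theta=0.95):
    """base_concepts, ft_concepts: dict mapping layer -> list of word sets."""
    matrix = {}
    for lb, base_list in base_concepts.items():
        for lf, ft_list in ft_concepts.items():
            aligned = sum(
                any(theta_aligned(c, c_ft, theta) for c_ft in ft_list)
                for c in base_list)
            matrix[(lb, lf)] = 100.0 * aligned / max(len(base_list), 1)
    return matrix
```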
4 Analysis
Language model pre-training has been shown to capture rich linguistic features (Tenney et al., 2019; Belinkov et al., 2020) that are redundantly distributed across the network (Dalvi et al., 2020; Durrani et al., 2020). We analyze how the representational space transforms when tuning towards a downstream task: i) how much knowledge is carried forward and ii) how it is redistributed, using our alignment framework.
4.1 Comparing Base and Fine-tuned Models
How do the latent spaces compare between base and fine-tuned models? We measure the overlap between the concepts encoded in the different layers of the base and fine-tuned models to gauge the extent of transformation. Figure 3 compares the concepts in the base BERT, XLM-RoBERTa and ALBERT models versus their fine-tuned variants on the SST-2 task.⁶ We observe a high overlap in concepts in the lower layers of the model that starts decreasing as we go deeper in the network, completely diminishing towards the end. We conjecture that the lower layers of the model retain generic language concepts learned in the base model, whereas the higher layers are now learning task-specific concepts.⁷ Note, however, that the lower layers also do not completely align between the models, which shows that all the layers go through substantial changes during transfer learning.

⁶ Please see all results in Appendix C.1.
⁷ Our next results comparing the latent space with human-defined language concepts (Section 4.2) and the task-specific concepts (Section 4.3) reinforce this hypothesis.
Comparing Architectures: The spread of the shaded area along the x-axis, particularly in XLM-R, reflects that some higher-layer latent concepts in the base model have shifted towards the lower layers of the fine-tuned model. The latent space in the higher layers now reflects task-specific knowledge which was not present in the base model. ALBERT shows a strikingly different pattern, with only the first 2-3 layers exhibiting an overlap with base concepts. This could be attributed to the fact that ALBERT shares parameters across layers while the other models have separate parameters for every layer. ALBERT has less of a luxury to preserve previous knowledge and therefore its space transforms significantly towards the downstream task. Notice that the overlap is comparatively smaller (38% vs. 52% and 46% compared to BERT and XLM-R, respectively), even in the embedding layer, where the words are primarily grouped based on lexical similarity.
4.2 Presence of Linguistic Concepts in the Latent Space
How does the presence of core-linguistic concepts change during transfer learning? To validate our hypothesis that generic language concepts are now predominantly retained in the lower half, we analyze how the linguistic concepts spread across the layers in the pre-trained and fine-tuned models by aligning the latent space to the human-defined concepts. Figure 4 shows that the latent space of the models captures POS concepts (e.g., determiners, past-tense verbs, superlative adjectives, etc.)