An Empirical Revisiting of Linguistic Knowledge Fusion in Language Understanding Tasks
Changlong Yu1, Tianyi Xiao1, Lingpeng Kong2, Yangqiu Song1, Wilfred Ng1
1HKUST, Hong Kong  2The University of Hong Kong, Hong Kong
{cyuaq, yqsong, wilfred}@cse.ust.hk, txiao@connect.ust.hk, lpk@cs.hku.hk
Abstract
Though linguistic knowledge emerges during large-scale language model pretraining, recent work attempts to explicitly incorporate human-defined linguistic priors into task-specific fine-tuning. Infusing language models with syntactic or semantic knowledge from parsers has shown improvements on many language understanding tasks. To further investigate the effectiveness of structural linguistic priors, we conduct an empirical study in which parsed graphs or trees are replaced with trivial ones (carrying little linguistic knowledge, e.g., balanced trees) for tasks in the GLUE benchmark. Encoding with trivial graphs achieves competitive or even better performance in fully-supervised and few-shot settings. This reveals that the gains might not be attributable mainly to explicit linguistic priors but rather to the additional feature interactions introduced by fusion layers. We therefore call for trivial graphs to be used as necessary baselines when designing advanced knowledge fusion methods in the future.
1 Introduction
Recently, large-scale pretrained language models (Devlin et al., 2019; Liu et al., 2019; Raffel et al., 2020) have been shown to acquire linguistic knowledge from unlabeled corpora and to achieve strong performance on many downstream natural language processing (NLP) tasks. Though probing analyses indicate that, to some extent, they implicitly capture syntactic or semantic structures (Hewitt and Manning, 2019; Goldberg, 2019; Tenney et al., 2018; Hou and Sachan, 2021), whether they can further benefit from more explicit linguistic knowledge remains an open problem. Attempts have been made to inject syntactic biases into language model pretraining (Kuncoro et al., 2020; Wang et al., 2021; Xu et al., 2021b) or to infuse finetuning with semantic information (Zhang et al., 2020a; Wu et al., 2021), and positive results have been reported on downstream tasks.
However, concerns have been raised about the effect and viability of such linguistic knowledge. On the one hand, the performance gains rely heavily on the availability of human-annotated dependency parses (Sachan et al., 2021) or oracle semantic graphs (Prange et al., 2022), which limits real-world applications, and developing accurate semantic graph parsers remains challenging (Oepen et al., 2019; Bai et al., 2022). On the other hand, incorporating trees induced from pretrained language models (Wu et al., 2020) can outperform fusion with dependency-parsed trees for aspect-level sentiment analysis (Dai et al., 2021). This discovery is in line with similar findings on trivial trees for tree-LSTM encoders in sequence modeling tasks (Shi et al., 2018). In this work, we push the envelope and answer the following two questions. Do the knowledge fusion methods of Wu et al. (2021) benefit from trivial graphs that contain no linguistic information? If so, where might the performance gains come from?
With the above questions in mind, we empirically revisit the effectiveness of linguistic knowledge fusion in language understanding tasks. Motivated by Shi et al. (2018), we compare the performance of original dependency-parsed trees against balanced trees for syntax fusion, and of parsed semantic graphs against sequential graphs for semantic fusion. To our surprise, trivial graphs outperform syntactic trees or semantic graphs in the fully-supervised setting and achieve competitive results in the few-shot setting. All the evidence suggests that linguistic inductive bias might not be the major contributor to the consistent improvements over baselines. Additional analysis indicates that the likely causes are the extra model parameters and feature interactions introduced by the fusion modules. This work encourages future research to add trivial graphs as necessary baselines when designing more advanced knowledge fusion methods for downstream tasks. Our experimental code is available at https://github.com/HKUST-KnowComp/revisit-nlu-linguistic-knowledge.
2 Study Design
In this section, we briefly introduce two types of linguistic graphs, namely syntactic dependency trees and semantic graphs. For comparison, we manually construct two trivial graphs to infuse into task-specific finetuning.
2.1 Linguistic Graph
Graphs intuitively represent a variety of linguistic phenomena in natural language, including sentence structure (Chomsky, 1957) and meaning (Koller et al., 2019).
Syntactic Dependency Tree. Syntactic trees are among the most commonly used linguistic structures and have long been shown to be useful for many NLP tasks. Syntactic dependencies mainly model head-dependent relations between words. Dependency parsers parse a sentence into a tree structure, which is then incorporated into LMs via syntax-aware attention (Nguyen et al., 2019) or graph neural networks (GNNs; Sachan et al., 2021).
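As a concrete illustration, the sketch below converts parser-produced dependency heads into an adjacency matrix and runs a single graph-convolution-style aggregation over contextual token representations. This is a minimal sketch of the general recipe rather than the exact fusion architecture of Sachan et al. (2021); the 8-dimensional token states, the single layer, and the mean aggregation are illustrative assumptions.

```python
import numpy as np

def dependency_adjacency(heads):
    """Build a symmetric adjacency matrix from 1-indexed dependency heads
    (0 marks the artificial root), as produced by a standard parser."""
    n = len(heads)
    adj = np.eye(n)                      # self-loops keep each token's own features
    for child, head in enumerate(heads):
        if head > 0:                     # skip the artificial root
            adj[child, head - 1] = 1.0
            adj[head - 1, child] = 1.0
    return adj

def gnn_fusion_step(token_states, adj, weight):
    """One graph-convolution-style step over contextual token states,
    e.g., the top-layer outputs of a pretrained encoder: ReLU(D^-1 A H W)."""
    norm = adj / adj.sum(axis=1, keepdims=True)   # row-normalised neighbourhood
    return np.maximum(norm @ token_states @ weight, 0.0)

# Toy usage: "dogs chase cats" with heads [2, 0, 2] ("chase" is the root).
H = np.random.randn(3, 8)                # hypothetical 8-dim contextual states
W = np.random.randn(8, 8)
fused = gnn_fusion_step(H, dependency_adjacency([2, 0, 2]), W)
```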
Semantic Graphs. Different from syntactic dependencies, semantic graphs aim to map sentences to higher-order meaning representations with more complex structures. Semantics normally concerns predicate-argument relations, where predicates evoke relations of various arity and arguments are filled by semantic roles tied to each specific predicate.1 One example is shown in Figure 1, and the characteristics of semantic graphs are the following: 1) argument sharing leads to nodes whose in-degree is greater than one; 2) some tokens do not contribute to the meaning and do not appear in the graph; 3) multiple roots may exist. These complex structures enable semantic graphs to capture information that is not explicit in single-rooted syntactic trees. Semantics can be formalized by different frameworks reflecting particular linguistic assumptions; representative formalisms include AMR (Abstract Meaning Representation; Banarescu et al., 2013) and UCCA (Abend and Rappoport, 2013).

1 We refer the readers to the ACL tutorial of Koller et al. (2019) for detailed explanations.

Figure 1: An example of a dependency tree (blue) and a DM semantic graph (red).

Recently, Wu et al. (2021) proposed semantics-infused finetuning (SIFT) to infuse DM (DELPH-IN Minimal Recursion Semantics; Ivanova et al., 2012) graphs and achieved consistent improvements over RoBERTa (Liu et al., 2019) baselines on the GLUE benchmark (Wang et al., 2019).
DM graphs (Ivanova et al., 2012) define 59 relation types to characterize predicate-argument relationships. To investigate the effect of different semantic relations, we also consider keeping only six common relation types that appear in most parsed graphs, yielding what we call skeleton graphs. These relations are ARG1, ARG2, ARG3, ARG4, compound, and BV. We are interested in whether downstream tasks would still benefit from this core semantics rather than the entire linguistic graph.
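To make this construction concrete, the sketch below represents a DM graph as labeled (head, relation, dependent) edges over token indices and filters it down to the six core relations. The sentence and edge list are illustrative stand-ins, not actual parser output.

```python
# Relations kept in the skeleton graph.
SKELETON_RELATIONS = {"ARG1", "ARG2", "ARG3", "ARG4", "compound", "BV"}

def to_skeleton_graph(edges):
    """Keep only edges whose relation belongs to the six core types;
    each edge is (head token index, relation label, dependent token index)."""
    return [(h, rel, d) for (h, rel, d) in edges if rel in SKELETON_RELATIONS]

# Illustrative DM-style edges for "the dog chased the cat there" (tokens 0-5).
dm_edges = [
    (2, "ARG1", 1),   # chased(ARG1 = dog)
    (2, "ARG2", 4),   # chased(ARG2 = cat)
    (0, "BV", 1),     # the -> dog (determiner as BV)
    (3, "BV", 4),     # the -> cat
    (2, "loc", 5),    # a non-core relation, dropped by the filter
]
print(to_skeleton_graph(dm_edges))   # the "loc" edge disappears
```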
2.2 Trivial Graph
Though linguistic graphs convey useful structure, high-quality parsers are not easily available owing to limited annotated graph banks (Oepen et al., 2019). When structural priors are unavailable, Shi et al. (2018) demonstrated that trivial trees, such as Gumbel trees, outperform syntactic trees when incorporated into tree-LSTM encoders (Tai et al., 2015). However, infusing trivial linguistic graphs into pretrained transformer models has not been explored. In a similar spirit, we create two types of trivial trees or graphs, which carry little linguistic inductive bias, to reproduce the knowledge fusion experiments of Wu et al. (2021).
Binary Balanced Tree. Compared with syntactic trees, binary balanced trees are shallower and possibly make it easier to propagate information from the leaves to the root. We assume GNN layers might benefit from the shallowness of balanced trees.
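The following is a minimal sketch of how such a tree could be materialised as parent-child edges by recursively halving the token span; how the internal (non-terminal) nodes are then handled inside the fusion layers follows the setup of Wu et al. (2021) and is not shown here.

```python
def balanced_tree_edges(start, end, next_id):
    """Recursively split the token span [start, end) in half and return
    (edges, root, next_free_internal_id). Leaves are token indices; internal
    nodes receive fresh ids starting at the sentence length."""
    if end - start == 1:
        return [], start, next_id
    mid = (start + end) // 2
    left_edges, left_root, next_id = balanced_tree_edges(start, mid, next_id)
    right_edges, right_root, next_id = balanced_tree_edges(mid, end, next_id)
    parent = next_id
    edges = left_edges + right_edges + [(parent, left_root), (parent, right_root)]
    return edges, parent, next_id + 1

# Toy usage for a 5-token sentence: tree depth grows as O(log n) instead of O(n).
edges, root, _ = balanced_tree_edges(0, 5, next_id=5)
```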
Sequential Bidirectional Graph. As the most natural and straightforward construction, tokens in the sentence are connected in sequential order, combining a left-to-right and a right-to-left chain. By doing so, GNN layers aggregate only local information rather than the potentially long-range dependencies encoded in linguistic graphs.
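A corresponding sketch for this construction, using the same illustrative edge-list representation as above:

```python
def sequential_bidirectional_edges(num_tokens):
    """Connect adjacent tokens in both directions, i.e., the union of a
    left-to-right chain and a right-to-left chain over the sentence."""
    edges = []
    for i in range(num_tokens - 1):
        edges.append((i, i + 1))   # left-to-right link
        edges.append((i + 1, i))   # right-to-left link
    return edges

# Toy usage: a 4-token sentence yields 6 directed edges.
print(sequential_bidirectional_edges(4))
```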