An Empirical Revisiting of Linguistic Knowledge Fusion in Language Understanding Tasks
Changlong Yu1, Tianyi Xiao1, Lingpeng Kong2, Yangqiu Song1, Wilfred Ng1
1HKUST, Hong Kong  2The University of Hong Kong, Hong Kong
{cyuaq, yqsong, wilfred}@cse.ust.hk, txiao@connect.ust.hk, lpk@cs.hku.hk
Abstract
Though linguistic knowledge emerges during large-scale language model pretraining, recent work attempts to explicitly incorporate human-defined linguistic priors into task-specific fine-tuning. Infusing language models with syntactic or semantic knowledge from parsers has shown improvements on many language understanding tasks. To further investigate the effectiveness of structural linguistic priors, we conduct an empirical study in which parsed graphs or trees are replaced with trivial ones (carrying little linguistic knowledge, e.g., balanced trees) for tasks in the GLUE benchmark. Encoding with trivial graphs achieves competitive or even better performance in fully-supervised and few-shot settings. This reveals that the gains might not be attributable mainly to explicit linguistic priors but rather to the additional feature interactions introduced by fusion layers. We therefore call for trivial graphs to be used as necessary baselines when designing advanced knowledge fusion methods in the future.
1 Introduction
Recently, large-scale pretrained language models (Devlin et al., 2019; Liu et al., 2019; Raffel et al., 2020) have been shown to acquire linguistic knowledge from unlabeled corpora and to achieve strong performance on many downstream natural language processing (NLP) tasks. Though probing analyses indicate that, to some extent, they implicitly capture syntactic or semantic structures (Hewitt and Manning, 2019; Goldberg, 2019; Tenney et al., 2018; Hou and Sachan, 2021), whether they can further benefit from more explicit linguistic knowledge remains an open problem. Attempts have been made to inject syntactic biases into language model pretraining (Kuncoro et al., 2020; Wang et al., 2021; Xu et al., 2021b) or to infuse finetuning with semantic information (Zhang et al., 2020a; Wu et al., 2021), and positive results have been reported on downstream tasks.
However, concerns have been raised about the effect and viability of such linguistic knowledge. On the one hand, the performance gains rely heavily on the availability of human-annotated dependency parses (Sachan et al., 2021) or oracle semantic graphs (Prange et al., 2022), which limits real-world applications, and developing accurate semantic graph parsers remains challenging (Oepen et al., 2019; Bai et al., 2022). On the other hand, incorporating trees induced from pretrained language models (Wu et al., 2020) can outperform fusion with dependency-parsed trees for aspect-level sentiment analysis (Dai et al., 2021). This discovery is in line with similar findings on trivial trees for tree-LSTM encoders in sequence modeling tasks (Shi et al., 2018). In this work, we push the envelope and answer the following two questions. Do the knowledge fusion methods of Wu et al. (2021) benefit from trivial graphs that contain no linguistic information? If so, where might the performance gains come from?
With the above questions in mind, we empirically revisit the effectiveness of linguistic knowledge fusion in language understanding tasks. Motivated by Shi et al. (2018), we compare the performance of original dependency-parsed trees against balanced trees for syntax fusion, and of parsed semantic graphs against sequential graphs for semantic fusion. To our surprise, trivial graphs outperform syntactic trees or semantic graphs in the fully-supervised setting and achieve competitive results in the few-shot setting. All the evidence suggests that linguistic inductive bias might not be the major contributor to the consistent improvements over baselines. Additional analysis indicates that the likely causes are the extra model parameters and feature interactions introduced by the fusion modules. This work encourages future research to add trivial graphs as necessary baselines when designing more advanced knowledge fusion methods for downstream tasks. Our experimental code is available at https://github.com/HKUST-KnowComp/revisit-nlu-linguistic-knowledge.
2 Study Design
In this section, we briefly introduce two types of linguistic graphs, namely syntactic dependency trees and semantic graphs. For comparison, we manually construct two trivial graphs to infuse into task-specific finetuning.
2.1 Linguistic Graph
Graphs intuitively represent a variety of linguistic phenomena in natural language, including sentence structure (Chomsky, 1957) and meaning (Koller et al., 2019).
Syntactic Dependency Tree. Syntactic trees are among the most commonly used linguistic structures and have long been shown to be useful for many NLP tasks. Syntactic dependencies mainly model head-dependent relations between words. Dependency parsers parse a sentence into a tree structure, which is then incorporated into LMs via syntax-aware attention (Nguyen et al., 2019) or graph neural networks (GNNs; Sachan et al., 2021).
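As a concrete illustration, the sketch below converts parser-produced dependency heads into an adjacency matrix and runs a single graph-convolution-style aggregation over contextual token representations. This is a minimal sketch of the general recipe rather than the exact fusion architecture of Sachan et al. (2021); the 8-dimensional token states, the single layer, and the mean aggregation are illustrative assumptions.

```python
import numpy as np

def dependency_adjacency(heads):
    """Build a symmetric adjacency matrix from 1-indexed dependency heads
    (0 marks the artificial root), as produced by a standard parser."""
    n = len(heads)
    adj = np.eye(n)                      # self-loops keep each token's own features
    for child, head in enumerate(heads):
        if head > 0:                     # skip the artificial root
            adj[child, head - 1] = 1.0
            adj[head - 1, child] = 1.0
    return adj

def gnn_fusion_step(token_states, adj, weight):
    """One graph-convolution-style step over contextual token states,
    e.g., the top-layer outputs of a pretrained encoder: ReLU(D^-1 A H W)."""
    norm = adj / adj.sum(axis=1, keepdims=True)   # row-normalised neighbourhood
    return np.maximum(norm @ token_states @ weight, 0.0)

# Toy usage: "dogs chase cats" with heads [2, 0, 2] ("chase" is the root).
H = np.random.randn(3, 8)                # hypothetical 8-dim contextual states
W = np.random.randn(8, 8)
fused = gnn_fusion_step(H, dependency_adjacency([2, 0, 2]), W)
```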
Semantic Graphs. Different from syntactic dependencies, semantic graphs aim to map sentences to higher-order meaning representations with more complex structures. Semantics normally concerns predicate-argument relations, where predicates evoke relations of various arity and arguments are filled by semantic roles tied to each specific predicate.1 One example is shown in Figure 1, and the characteristics of semantic graphs are the following: 1) argument sharing leads to nodes whose in-degree is greater than one; 2) some tokens do not contribute to the meaning and do not appear in the graph; 3) multiple roots may exist. These complex structures enable semantic graphs to capture information that is not explicit in single-rooted syntactic trees. Semantics can be formalized by different frameworks reflecting particular linguistic assumptions; representative formalisms include AMR (Abstract Meaning Representation; Banarescu et al., 2013) and UCCA (Abend and Rappoport, 2013).

1 We refer the readers to the ACL tutorial of Koller et al. (2019) for detailed explanations.

Figure 1: An example of a dependency tree (blue) and a DM semantic graph (red).

Recently, Wu et al. (2021) proposed semantics-infused finetuning (SIFT) to infuse DM (DELPH-IN Minimal Recursion Semantics; Ivanova et al., 2012) graphs and achieved consistent improvements over RoBERTa (Liu et al., 2019) baselines on the GLUE benchmark (Wang et al., 2019).
DM graphs (Ivanova et al., 2012) define 59 relation types to characterize predicate-argument relationships. To investigate the effect of different semantic relations, we also consider keeping only six common relation types that appear in most parsed graphs, yielding what we call skeleton graphs. These relations are ARG1, ARG2, ARG3, ARG4, compound, and BV. We are interested in whether downstream tasks would still benefit from this core semantics rather than the entire linguistic graph.
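To make this construction concrete, the sketch below represents a DM graph as labeled (head, relation, dependent) edges over token indices and filters it down to the six core relations. The sentence and edge list are illustrative stand-ins, not actual parser output.

```python
# Relations kept in the skeleton graph.
SKELETON_RELATIONS = {"ARG1", "ARG2", "ARG3", "ARG4", "compound", "BV"}

def to_skeleton_graph(edges):
    """Keep only edges whose relation belongs to the six core types;
    each edge is (head token index, relation label, dependent token index)."""
    return [(h, rel, d) for (h, rel, d) in edges if rel in SKELETON_RELATIONS]

# Illustrative DM-style edges for "the dog chased the cat there" (tokens 0-5).
dm_edges = [
    (2, "ARG1", 1),   # chased(ARG1 = dog)
    (2, "ARG2", 4),   # chased(ARG2 = cat)
    (0, "BV", 1),     # the -> dog (determiner as BV)
    (3, "BV", 4),     # the -> cat
    (2, "loc", 5),    # a non-core relation, dropped by the filter
]
print(to_skeleton_graph(dm_edges))   # the "loc" edge disappears
```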
2.2 Trivial Graph
Though linguistic graphs convey useful structure, high-quality parsers are not easily available owing to limited annotated graph banks (Oepen et al., 2019). When structural priors are unavailable, Shi et al. (2018) demonstrated that trivial trees, such as Gumbel trees, outperform syntactic trees when incorporated into tree-LSTM encoders (Tai et al., 2015). However, infusing trivial linguistic graphs into pretrained transformer models has not been explored. In a similar spirit, we create two types of trivial trees or graphs, which carry little linguistic inductive bias, to reproduce the knowledge fusion experiments of Wu et al. (2021).
Binary Balanced Tree. Compared with syntactic trees, binary balanced trees are shallower and possibly make it easier to propagate information from the leaves to the root. We assume GNN layers might benefit from the shallowness of balanced trees.
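The following is a minimal sketch of how such a tree could be materialised as parent-child edges by recursively halving the token span; how the internal (non-terminal) nodes are then handled inside the fusion layers follows the setup of Wu et al. (2021) and is not shown here.

```python
def balanced_tree_edges(start, end, next_id):
    """Recursively split the token span [start, end) in half and return
    (edges, root, next_free_internal_id). Leaves are token indices; internal
    nodes receive fresh ids starting at the sentence length."""
    if end - start == 1:
        return [], start, next_id
    mid = (start + end) // 2
    left_edges, left_root, next_id = balanced_tree_edges(start, mid, next_id)
    right_edges, right_root, next_id = balanced_tree_edges(mid, end, next_id)
    parent = next_id
    edges = left_edges + right_edges + [(parent, left_root), (parent, right_root)]
    return edges, parent, next_id + 1

# Toy usage for a 5-token sentence: tree depth grows as O(log n) instead of O(n).
edges, root, _ = balanced_tree_edges(0, 5, next_id=5)
```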
Sequential Bidirectional Graph. As the most natural and straightforward construction, tokens in the sentence are connected in sequential order, combining a left-to-right and a right-to-left chain. By doing so, GNN layers aggregate only local information rather than the potentially long-range dependencies encoded in linguistic graphs.
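A corresponding sketch for this construction, using the same illustrative edge-list representation as above:

```python
def sequential_bidirectional_edges(num_tokens):
    """Connect adjacent tokens in both directions, i.e., the union of a
    left-to-right chain and a right-to-left chain over the sentence."""
    edges = []
    for i in range(num_tokens - 1):
        edges.append((i, i + 1))   # left-to-right link
        edges.append((i + 1, i))   # right-to-left link
    return edges

# Toy usage: a 4-token sentence yields 6 directed edges.
print(sequential_bidirectional_edges(4))
```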