An Empirical Revisiting of Linguistic Knowledge Fusion in
Language Understanding Tasks
Changlong Yu1  Tianyi Xiao1  Lingpeng Kong2  Yangqiu Song1  Wilfred Ng1
1HKUST, Hong Kong 2The University of Hong Kong, Hong Kong
{cyuaq, yqsong, wilfred}@cse.ust.hk, txiao@connect.ust.hk, lpk@cs.hku.hk
Abstract
Though linguistic knowledge emerges during large-scale language model pretraining, recent work attempts to explicitly incorporate human-defined linguistic priors into task-specific fine-tuning. Infusing language models with syntactic or semantic knowledge from parsers has shown improvements on many language understanding tasks. To further investigate the effectiveness of structural linguistic priors, we conduct an empirical study that replaces parsed graphs or trees with trivial ones (which carry little linguistic knowledge, e.g., balanced trees) on tasks in the GLUE benchmark. Encoding with trivial graphs achieves competitive or even better performance in both fully-supervised and few-shot settings. This reveals that the gains might not be attributable primarily to explicit linguistic priors but rather to the additional feature interactions introduced by the fusion layers. Hence we call attention to using trivial graphs as necessary baselines when designing advanced knowledge fusion methods in the future.
1 Introduction
Recently, large-scale pretrained language models (Devlin et al., 2019; Liu et al., 2019; Raffel et al., 2020) have been shown to acquire linguistic knowledge from unlabeled corpora and achieve strong performance on many downstream natural language processing (NLP) tasks. Though probing analyses indicate that, to some extent, they can implicitly capture syntactic or semantic structures (Hewitt and Manning, 2019; Goldberg, 2019; Tenney et al., 2018; Hou and Sachan, 2021), whether they can further benefit from more explicit linguistic knowledge remains an open problem. Attempts have been made to inject syntactic biases into language model pretraining (Kuncoro et al., 2020; Wang et al., 2021; Xu et al., 2021b) or to infuse finetuning with semantic information (Zhang et al., 2020a; Wu et al., 2021), and positive results have been reported on downstream tasks.
However, concerns have been raised about the effect and viability of such linguistic knowledge. On the one hand, the performance gains rely heavily on the availability of human-annotated dependency parses (Sachan et al., 2021) or oracle semantic graphs (Prange et al., 2022), which limits real-world applications, and developing accurate semantic graph parsers remains challenging (Oepen et al., 2019; Bai et al., 2022). On the other hand, incorporating trees induced from pretrained language models (Wu et al., 2020) can outperform fusing dependency-parsed trees for aspect-level sentiment analysis (Dai et al., 2021). This discovery is in line with similar findings on trivial trees for tree-LSTM encoders in sequence modeling tasks (Shi et al., 2018). In this work, we push the envelope and answer the following two questions. Do the knowledge fusion methods of Wu et al. (2021) benefit from trivial graphs that contain no linguistic information? If so, where might the performance gains come from?
Guided by these questions, we empirically revisit the effectiveness of linguistic knowledge fusion in language understanding tasks. Motivated by Shi et al. (2018), we compare the performance of original dependency-parsed trees against balanced trees for syntax fusion, and compare parsed semantic graphs against sequential graphs for semantic fusion. To our surprise, trivial graphs outperform syntactic trees and semantic graphs in the fully-supervised setting and achieve competitive results in the few-shot setting. All of this evidence again suggests that linguistic inductive bias might not be the major contributor to the consistent improvements over baselines. Additional analysis indicates that the likely causes are the extra model parameters and feature interactions introduced by the fusion modules. This work encourages future research to include trivial graphs as necessary baselines when designing more advanced knowledge fusion methods for language understanding tasks.
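To make the two trivial structures concrete, the sketch below is our own illustration rather than the paper's released code; the helper names sequential_graph and balanced_tree are hypothetical. It shows how both structures can be built from a tokenized sentence: the sequential graph simply links adjacent tokens, and the balanced tree recursively halves the token span, so neither encodes any parser-derived linguistic knowledge.

# Minimal sketch (assumed helpers, not the original implementation) of the two
# trivial structures substituted for parser output: a sequential (linear-chain)
# graph and a balanced binary tree over the tokens of a sentence.

def sequential_graph(tokens):
    """Link each token to its right neighbour; carries no linguistic knowledge."""
    return [(i, i + 1) for i in range(len(tokens) - 1)]

def balanced_tree(tokens):
    """Build a balanced binary tree whose leaves are the token positions.

    Returns (edges, num_nodes); internal nodes are numbered after the leaves.
    """
    edges = []
    next_id = [len(tokens)]  # internal-node ids start after the leaf tokens

    def build(lo, hi):  # half-open span [lo, hi) of token indices
        if hi - lo == 1:
            return lo  # a single token is a leaf
        mid = (lo + hi) // 2
        left, right = build(lo, mid), build(mid, hi)
        parent = next_id[0]
        next_id[0] += 1
        edges.extend([(parent, left), (parent, right)])
        return parent

    build(0, len(tokens))
    return edges, next_id[0]

tokens = "the cat sat on the mat".split()
print(sequential_graph(tokens))  # [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
print(balanced_tree(tokens)[1])  # 11 nodes: 6 leaves + 5 internal nodes

Either structure can then be fed to the same fusion layers in place of a parsed tree or semantic graph, so any performance difference isolates the contribution of the linguistic annotation itself.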