Entity Tracking via Effective Use of Multi-Task Learning Model and Mention-guided Decoding
Janvijay Singh, Fan Bai, Zhen Wang
School of Interactive Computing, Georgia Institute of Technology
Department of Computer Science and Engineering, The Ohio State University
iamjanvijay@gatech.edu
fan.bai@cc.gatech.edu
wang.9215@osu.edu
Abstract
Cross-task knowledge transfer via multi-task learning has recently made remarkable progress in general NLP tasks. However, entity tracking on procedural text has not benefited from such knowledge transfer because of its distinct formulation, i.e., tracking the event flow while following structural constraints. State-of-the-art entity tracking approaches either design complicated model architectures or rely on task-specific pre-training to achieve good results. To this end, we propose MEET, a Multi-task learning-enabled entity Tracking approach, which utilizes knowledge gained from general-domain tasks to improve entity tracking. Specifically, MEET first fine-tunes T5, a pre-trained multi-task learning model, with entity tracking-specialized QA formats, and then employs our customized decoding strategy to satisfy the structural constraints. MEET achieves state-of-the-art performance on two popular entity tracking datasets, even though it does not require any task-specific architecture design or pre-training. Our code and data are available at https://github.com/iamjanvijay/MeeT.
1 Introduction
Pre-trained language models have revolutionized the NLP field in recent years (Devlin et al., 2019; Liu et al., 2019; Brown et al., 2020) and have also become more versatile with the encoder-decoder architecture (Raffel et al., 2020; Lewis et al., 2020), which allows them to handle different types of NLP tasks without further architectural changes. This versatility inherently facilitates cross-task knowledge transfer via multi-task learning (Raffel et al., 2020; Aribandi et al., 2022), and thus helps push the boundary of many popular NLP tasks such as question answering (Khashabi et al., 2020) and semantic parsing (Xie et al., 2022).
Figure 1: Overview of MEET (Multi-task learning-enabled entity Tracking). MEET utilizes the multi-task learning in T5 to boost entity tracking performance, with a customized decoding strategy addressing the structural constraints in state prediction (e.g., "move" cannot happen after "destroy"). [The figure depicts an example procedure, "How is hydroelectricity generated?", with the query entity "water": T5 produces per-step state predictions (exist, move, move) and location predictions (unknown, dam, turbine), and the state predictions are refined by mention-guided CRF decoding.]
However, entity tracking, which tracks the states and locations of an entity throughout procedural text such as scientific processes or recipes, has not been impacted by this multi-task learning wave, for two main reasons. First, entity tracking requires the model to make step-wise predictions while satisfying structural constraints (e.g., an entity cannot be "moved" after being "destroyed" in a previous step). This requirement is usually tackled by designing task-specific architectures (Gupta and Durrett, 2019b; Tang et al., 2020; Huang et al., 2021), which generic multi-task models with the encoder-decoder architecture cannot address easily. Second, understanding procedural text
requires domain-specific knowledge, which usually does not exist in the general-domain tasks that multi-task learning models are trained on, so it is not clear how effective the knowledge transfer will be given this domain gap (Zhang et al., 2021; Bai et al., 2021; Shi et al., 2022).
In this paper, we study how entity tracking can benefit from the current multi-task learning paradigm and present MEET, a Multi-task learning-enabled entity Tracking approach. The approach includes two parts. The first part fine-tunes T5 (Raffel et al., 2020), a model that has been pre-trained on a diverse set of NLP tasks and has shown strong cross-task generalizability. Here, we design entity tracking-specialized QA formats to accommodate the need to make step-specific predictions while facilitating effective knowledge transfer from T5. The second part resolves conflicting state predictions under structural constraints: we use a customized offline CRF inference algorithm whose main idea is to emphasize the predictions at steps in which the query entity is explicitly mentioned, because the fine-tuned model performs better in those cases (Table 5). On two benchmark datasets, ProPara (Dalvi et al., 2018) and Recipes (Bosselut et al., 2018), MEET outperforms previous state-of-the-art methods, which require extra domain-specific pre-training or data augmentation. We verify the importance of multi-task learning in T5 and of our proposed decoding strategy through careful analyses and ablation studies.
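To make the second part concrete, below is a minimal sketch of what such mention-guided, constraint-aware decoding could look like: a Viterbi-style pass over the per-step state distributions that forbids invalid transitions and up-weights steps in which the query entity is explicitly mentioned. The state inventory, the forbidden-transition set (beyond the "move after destroy" example above), the boost value, and all function names are illustrative assumptions, not the exact algorithm or hyperparameters used in MEET.

```python
# A minimal sketch (not MEET's exact algorithm) of mention-guided,
# constraint-aware Viterbi decoding over per-step state distributions.
import math
from typing import Dict, List

STATES = ["not_exist", "exist", "create", "move", "destroy"]

# Hypothetical hard constraints on (previous_state -> current_state) transitions,
# e.g. an entity cannot move after it has been destroyed.
FORBIDDEN = {
    ("destroy", "move"),
    ("destroy", "destroy"),
    ("not_exist", "move"),
}

def mention_guided_decode(
    step_log_probs: List[Dict[str, float]],  # model log-probs over STATES per step
    mentioned: List[bool],                    # entity explicitly mentioned in step?
    boost: float = 2.0,                       # weight for mentioned steps (assumed value)
) -> List[str]:
    """Return the highest-scoring state sequence that violates no constraint,
    with mentioned steps weighted more heavily so the decoder prefers flipping
    predictions at steps where the entity is not mentioned."""
    T = len(step_log_probs)
    weights = [boost if m else 1.0 for m in mentioned]

    # Viterbi tables: best score and backpointer for every (step, state).
    best = [{s: -math.inf for s in STATES} for _ in range(T)]
    back: List[Dict[str, str]] = [{} for _ in range(T)]
    for s in STATES:
        best[0][s] = weights[0] * step_log_probs[0][s]

    for t in range(1, T):
        for cur in STATES:
            emit = weights[t] * step_log_probs[t][cur]
            for prev in STATES:
                if (prev, cur) in FORBIDDEN:
                    continue  # skip transitions that break the structural constraints
                score = best[t - 1][prev] + emit
                if score > best[t][cur]:
                    best[t][cur] = score
                    back[t][cur] = prev

    # Trace back the best constraint-satisfying path.
    last = max(STATES, key=lambda s: best[T - 1][s])
    path = [last]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))
```

In this sketch, scaling the log-probabilities of mentioned steps by a factor greater than one makes deviating from the model's top prediction costlier at those steps, so the decoder prefers to repair constraint violations by relabeling steps where the entity is not mentioned.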
To sum up, our contributions are three-fold: (1) Our work is the first to explore cross-task knowledge transfer for entity tracking on procedural text; (2) Our proposed approach, MEET, effectively uses the off-the-shelf pre-trained multi-task learning model T5 with a customized decoding strategy, and thus achieves state-of-the-art performance on two benchmark datasets; (3) Our comprehensive analyses verify the benefits of multi-task learning on entity tracking.
2 Related Work
Tracking the progression of an entity within procedural text, such as cooking recipes (Bosselut et al., 2018) or scientific protocols (Tamari et al., 2021; Le et al., 2022; Bai et al., 2022), is challenging, as it calls for a model to understand both the superficial and the intrinsic dynamics of the process. Recent work on entity tracking can be divided into two lines. One line focuses on designing task-specific fine-tuning architectures to ensure that the model makes step-grounded predictions while following the structural constraints. For instance, Rajaby Faghihi and Kordjamshidi (2021) introduce time-stamp embeddings into RoBERTa (Liu et al., 2019) to encode the index of the query step. Gupta and Durrett (2019b) frame entity tracking as a structured prediction problem and use a CRF layer to promote global consistency under those structural constraints. In our case, we show that, with the QA formulation, simply appending the index of the query step to the question and indexing the procedure produces step-specific predictions. Moreover, we propose a customized offline CRF-decoding strategy for the structural constraints, which compensates for the fact that it is hard to jointly train T5, our backbone LM, with a CRF layer as in previous methods.
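As a rough illustration of such a step-indexed QA input for a text-to-text model, consider the sketch below; the prompt wording, separators, and helper names are our assumptions and not necessarily the exact format used in MEET.

```python
# Illustrative construction of step-indexed QA inputs for a text-to-text model
# such as T5. The templates below are assumed for the sake of the example.
from typing import List

def build_state_query(entity: str, step: int, sentences: List[str]) -> str:
    """Ask for the entity's state at a specific step, over an indexed procedure."""
    context = " ".join(f"step {i + 1}: {s}" for i, s in enumerate(sentences))
    return f"question: What is the state of {entity} in step {step}? context: {context}"

def build_location_query(entity: str, step: int, sentences: List[str]) -> str:
    """Ask for the entity's location at a specific step."""
    context = " ".join(f"step {i + 1}: {s}" for i, s in enumerate(sentences))
    return f"question: Where is {entity} located in step {step}? context: {context}"

sentences = [
    "Water flows downwards thanks to gravity.",
    "Enters the dam at high pressure.",
    "Moving water spins the turbines in the power plant.",
]
print(build_state_query("water", 2, sentences))
# -> "question: What is the state of water in step 2? context: step 1: ... step 2: ..."
```

Because the question names the query step and every sentence in the context carries its own index, the same model can be queried once per (entity, step) pair and produce step-specific answers without any architectural change.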
The other line of work focuses on domain-specific knowledge transfer (Zhang et al., 2021; Bai et al., 2021; Shi et al., 2022; Ma et al., 2022). Concretely, LEMON (Shi et al., 2022) achieves strong performance by performing in-domain pre-training on one million procedural paragraphs. CGLI (Ma et al., 2022) shows that adding high-quality pseudo-labeled data (generated via self-training) during fine-tuning can also boost model performance. In contrast, our work explores how entity tracking can benefit from out-of-domain knowledge by using off-the-shelf pre-trained multi-task learning models.
3 Method
In this section, we present MEET, a Multi-task learning-enabled entity Tracking approach. Here, we first review the problem definition, and then lay out the details of MEET.
3.1 Problem Definition
Entity tracking aims at monitoring the status of an entity throughout a procedure. The input of this task contains two items: 1) a procedural paragraph $P$, composed of a sequence of sentences $\{s_1, s_2, \ldots, s_T\}$; and 2) a procedure-specific query entity $e$. Given the input, our goal is to predict the state and location of the query entity at each timestamp of the procedure (see an example from the ProPara dataset in Figure 1).
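For concreteness, the inputs and outputs of the task can be represented roughly as follows, mirroring the Figure 1 example; the field names and the exact label strings are illustrative assumptions rather than a prescribed schema.

```python
# Illustrative containers for the entity tracking task: a procedure P with
# sentences s_1..s_T, a query entity e, and per-step (state, location) outputs.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class EntityTrackingInput:
    sentences: List[str]   # the procedural paragraph P, one sentence per step
    entity: str            # the query entity e

@dataclass
class EntityTrackingOutput:
    # One (state, location) pair per step, e.g. ("move", "dam"); "unknown" marks
    # a location that cannot be determined from the text.
    per_step: List[Tuple[str, str]]

example = EntityTrackingInput(
    sentences=[
        "Water flows downwards thanks to gravity.",
        "Enters the dam at high pressure.",
        "Moving water spins the turbines in the power plant.",
    ],
    entity="water",
)
gold = EntityTrackingOutput(
    per_step=[("exist", "unknown"), ("move", "dam"), ("move", "turbine")]
)
```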