Modular Approach to Machine Reading Comprehension: Mixture of
Task-Aware Experts
Anirudha Rayasam
Language Technologies Institute
School of Computer Science
Carnegie Mellon University
arayasam@andrew.cmu.edu
Anusha Kamath
Language Technologies Institute
School of Computer Science
Carnegie Mellon University
akamath1@andrew.cmu.edu
Gabriel Bayomi Tinoco Kalejaiye
Language Technologies Institute
School of Computer Science
Carnegie Mellon University
gbayomi@andrew.cmu.edu
Abstract
In this work we present a Mixture of Task-Aware Experts Network for Machine Reading Comprehension on a relatively small dataset. We particularly focus on the issue of commonsense learning, enforcing common-ground knowledge by specifically training different expert networks to capture different kinds of relationships between each passage, question, and choice triplet. Moreover, we take inspiration from recent advancements in multi-task and transfer learning by training each network on a relevant, focused task. By making the mixture of networks aware of a specific goal through an enforced task and relationship, we achieve state-of-the-art results and reduce over-fitting.
1 Introduction
Teaching a computer to read and comprehend human language is a challenging task, as it requires understanding natural language and the ability to reason over various clues. The task of Machine Reading Comprehension (MRC) is a useful benchmark to demonstrate natural language understanding. In recent years, several datasets have been created that focus on answering questions as a way to evaluate machine comprehension. The machine is first presented with a piece of text, such as a news article or a story, and is then expected to answer one or multiple questions related to the text. One of the big challenges of the field is to provide a system that is able to infer relationships that are not entirely based on the passage, that is, to reach commonsense.
Powerful approaches have been explored to solve this issue by applying attention mechanisms over the passage, question, and choice. However, by trying to understand the three different aspects concomitantly, they usually fail to capture the distinct commonsense relationship between question-choice or passage-choice pairs.
We propose the expert networks QC (question-choice) and PC (passage-choice), which are specifically trained to learn the different structures necessary to answer the questions. Our goal is to attain commonsense knowledge by directly enforcing the network to build question-choice and passage-choice relationships. For a human being, regardless of the passage, it should be clear that “fireworks” is a more likely answer than “water” to the question “how did the fire start?”, unless the passage clearly states the contrary. Additionally, inspired by the efficacy of multi-task learning techniques, we propose Task-Aware Expert Training, where each network is trained on a different but relevant task in order to improve its overall inference capability.
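As an illustrative sketch of this design, the code below shows two expert scorers mixed by a learned gate. The module names, dimensions, and the gating scheme are our own assumptions for illustration, not the exact architecture:

```python
import torch
import torch.nn as nn

class TaskAwareMixture(nn.Module):
    """Sketch: QC and PC experts mixed by a learned gate (illustrative, not the exact model)."""

    def __init__(self, hidden: int = 128):
        super().__init__()
        # Expert QC scores a choice against the question alone.
        self.expert_qc = nn.Bilinear(hidden, hidden, 1)
        # Expert PC scores a choice against the passage alone.
        self.expert_pc = nn.Bilinear(hidden, hidden, 1)
        # The gate decides how much to trust each expert for this triplet.
        self.gate = nn.Sequential(nn.Linear(3 * hidden, 2), nn.Softmax(dim=-1))

    def forward(self, p_vec, q_vec, c_vec):
        # p_vec, q_vec, c_vec: (batch, hidden) encodings of passage, question, choice.
        s_qc = self.expert_qc(q_vec, c_vec)                       # (batch, 1)
        s_pc = self.expert_pc(p_vec, c_vec)                       # (batch, 1)
        w = self.gate(torch.cat([p_vec, q_vec, c_vec], dim=-1))   # (batch, 2)
        return (w * torch.cat([s_qc, s_pc], dim=-1)).sum(dim=-1)  # (batch,) plausibility
```

In this sketch, each expert only ever sees its own pair of inputs, which is what forces the question-choice and passage-choice relationships to be learned separately before being mixed.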
2 Related work
Much of the work in neural machine comprehension focuses on how to extract the required information from the given passage. Recent approaches have had enormous success, for instance R-net (Wang et al., 2017b) and Match-LSTM (Wang and Jiang, 2016). However, this might not be a good solution for MRC when there is a need to generate additional text not included in the passage or the question, or to augment information across multiple passage spans and the question as and when required. The works of (Weston et al., 2014), (Tan et al., 2017), and (Cui et al., 2016) are examples of this change of paradigm. An interesting adaptation involves using single or multiple turns of reasoning to effectively exploit the relation among queries, documents, and answers. (Trischler et al., 2016), (Hermann et al., 2015), (Shen et al., 2016), (Xu et al., 2017), (Gupta et al., 2019), (Anirudha et al., 2014b), and (Larionov et al., 2018) are great examples of recent relevant strategies.
Some more relevant work is available in (Anirudha et al., 2014a,c).
One intuitive line of work in machine comprehension uses commonsense knowledge along with the comprehension text to generate answers. Commonsense knowledge increases the accuracy of machine comprehension systems; the challenge is to find a way to include this additional data and improve the system's performance. There are many possible sources of commonsense knowledge. Generally, script knowledge, that is, sequences of events describing typical human actions in everyday situations, is used.
The work from (Lin et al., 2017a) shows how a multi-knowledge reasoning method, which explores heterogeneous knowledge relationships, can be powerful for commonsense MRC. This is achieved by combining different kinds of knowledge structures: narrative knowledge, entity semantic knowledge, and sentiment-coherence knowledge. Using data mining techniques, they provide the model with cost-based inference rules as an encoding mechanism for knowledge. They are then able to produce a multi-knowledge reasoning model that can select which inference rule to use for each context.
Another interesting approach comes from (Wang et al., 2017a), where the authors propose using a Conditional Generative Adversarial Network (CGAN) to tackle the problem of insufficient data for reading comprehension tasks by generating additional synthetic sentences; the proposed generator is conditioned on the given context, achieving state-of-the-art results.
The work by (Wang, 2018a) assesses how a Three-Way Attentive Network (TriAN) with the inclusion of commonsense knowledge benefits multiple-choice reading comprehension. The combination of attention mechanisms has been shown to strongly improve performance for reading comprehension. In addition, commonsense knowledge can help in inferring nontrivial implicit events within the comprehension passage.
The work by (Lin et al., 2017b) focuses on reasoning with heterogeneous commonsense knowledge. They use three kinds of commonsense knowledge: causal relations; semantic relations such as co-reference and associative relations; and sentiment knowledge, i.e., sentiment coherence (positivity and negativity) between two elements. In human reasoning, not all inference rules are equally likely to be applied: the more reasonable an inference, the more likely it is to be proposed. They use attention to weigh the inferences based on the nature of the rule and the given context. Their attention mechanism models the probability that an inference rule is applied during the inference from a premise document to a hypothesis by considering the relatedness between the elements and the knowledge category, as well as the relatedness between the two elements. They answer the comprehension task by aggregating over all valid inference rules.
Although Mixture-of-Experts models are widely used for NLP tasks, they are still underused for Machine Reading Comprehension. Recent work includes the language model of (Le et al., 2016), which introduces an LSTM-based mixture method for the dynamic integration of a group of word-prediction experts, yielding a conditional language model that excels simultaneously at several subtasks. Moreover, the work from (Xiong et al., 2017) also includes a sparse mixture-of-experts layer for a question-answering task, inherited from the previous work of (Shazeer et al., 2017) on a Sparsely-Gated Mixture-of-Experts layer. The success of the aforementioned approaches motivates introducing Mixture-of-Experts deep learning for Machine Reading Comprehension.
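For concreteness, the sparsely-gated layer can be sketched roughly as follows. This is a simplified top-k gate that omits the noisy gating and load-balancing loss of (Shazeer et al., 2017), and all names and sizes here are our own:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    """Simplified top-k sparsely-gated mixture-of-experts layer (a sketch)."""

    def __init__(self, dim: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
            for _ in range(n_experts)
        )
        self.gate = nn.Linear(dim, n_experts)
        self.k = k

    def forward(self, x):
        # x: (batch, dim). Route every example to its k highest-scoring experts.
        top_vals, top_idx = self.gate(x).topk(self.k, dim=-1)  # (batch, k)
        weights = F.softmax(top_vals, dim=-1)                  # renormalize over the top-k
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e                   # examples routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out
```

Because only k experts run per example, the layer's capacity can grow with the number of experts at roughly constant per-example cost, which is the property that made the original layer attractive.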
3 Model description
We experiment with a mixture-of-experts model to tackle the task of commonsense machine comprehension. The model is inspired by an analysis of the errors made by the triple attention network from (Wang, 2018b), which achieves state-of-the-art results by using a Three-Way Attentive Network (TriAN) with commonsense knowledge included in the form of relational embeddings obtained from ConceptNet, a large-scale commonsense knowledge graph consisting of over 21 million edges and 8 million nodes.
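To make the base model concrete, below is a minimal sketch of the kind of word-level sequence attention that such three-way attentive models build on. This is our simplification with a single shared projection, not the exact TriAN formulation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def seq_attention(u: torch.Tensor, v: torch.Tensor, proj: nn.Linear) -> torch.Tensor:
    """Attend every word of u over the words of v and return a v-summary per u position.

    u: (batch, len_u, dim), v: (batch, len_v, dim); returns (batch, len_u, dim).
    """
    scores = torch.bmm(F.relu(proj(u)), F.relu(proj(v)).transpose(1, 2))  # (batch, len_u, len_v)
    alpha = F.softmax(scores, dim=-1)  # attention weights over v for each position of u
    return torch.bmm(alpha, v)

# Hypothetical usage: enrich passage words with question-aware summaries.
# proj = nn.Linear(300, 300)
# p_aware = seq_attention(passage_emb, question_emb, proj)
```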
A training example in the commonsense comprehension task consists of a passage (P), a question (Q), an answer (A), and a label y which is 0 or 1. P, Q and A are all sequences of words. For a word P_i in the given passage, the input representation of P_i is the concatenation of several vectors: a pre-trained GloVe embedding, a part-of-speech (POS) embedding, a named-entity (NE) embedding, and a ConceptNet relation embedding.
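Sketched in code, this per-word representation is a simple concatenation of embedding lookups; the vocabulary sizes and embedding dimensions below are illustrative assumptions:

```python
import torch
import torch.nn as nn

class WordRepresentation(nn.Module):
    """Concatenate GloVe, POS, NE, and ConceptNet-relation embeddings per word (a sketch)."""

    def __init__(self, glove_weights: torch.Tensor,
                 n_pos: int = 50, n_ne: int = 20, n_rel: int = 40,
                 pos_dim: int = 12, ne_dim: int = 8, rel_dim: int = 10):
        super().__init__()
        # Pre-trained GloVe vectors, kept frozen (assumed; shape: vocab x 300).
        self.glove = nn.Embedding.from_pretrained(glove_weights, freeze=True)
        self.pos = nn.Embedding(n_pos, pos_dim)   # part-of-speech tags
        self.ne = nn.Embedding(n_ne, ne_dim)      # named-entity types
        self.rel = nn.Embedding(n_rel, rel_dim)   # ConceptNet relation ids

    def forward(self, word_ids, pos_ids, ne_ids, rel_ids):
        # All inputs: (batch, seq_len) integer id tensors.
        return torch.cat([self.glove(word_ids), self.pos(pos_ids),
                          self.ne(ne_ids), self.rel(rel_ids)], dim=-1)
```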