ples of recent relevant strategies. Some more relevant work is available in (Anirudha et al., 2014a,c).
One intuitive line of work in machine comprehension uses commonsense knowledge alongside the comprehension text to generate answers. Commonsense knowledge increases the accuracy of machine comprehension systems; the challenge is to find a way to include this additional data and improve the system's performance. There are many possible sources of commonsense knowledge. Most commonly, script knowledge is used: sequences of events that describe typical human actions in everyday situations.
The work of (Lin et al., 2017a) shows how a multi-knowledge reasoning method that explores heterogeneous knowledge relationships can be powerful for commonsense MRC. The approach combines different kinds of knowledge: narrative knowledge, entity semantic knowledge, and sentiment-coherence knowledge. Using data mining techniques, they equip a model with cost-based inference rules as an encoding mechanism for knowledge, and then build a multi-knowledge reasoning model that selects which inference rule to apply for each context.
Another interesting approach comes from (Wang et al., 2017a), where the authors propose a Conditional Generative Adversarial Network (CGAN) to tackle the problem of insufficient data for reading comprehension tasks: additional synthetic sentences are produced by a generator conditioned on the given context, achieving state-of-the-art results.
The work by (Wang, 2018a) assesses how a Three-Way Attentive Network (TriAN) augmented with commonsense knowledge benefits multiple-choice reading comprehension. The combination of attention mechanisms has been shown to strongly improve reading comprehension performance. In addition, commonsense knowledge can help infer nontrivial implicit events within the comprehension passage.
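A minimal sketch of the word-level attention building block that such a three-way network applies between sequence pairs (passage-question, passage-answer, question-answer) is given below. This is not the authors' code; all tensor names and dimensions are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def seq_attention(seq_a, seq_b):
    """Re-represent each word of seq_a as an attention-weighted sum of seq_b.

    seq_a: (batch, len_a, dim), seq_b: (batch, len_b, dim) -> (batch, len_a, dim)
    """
    scores = torch.bmm(seq_a, seq_b.transpose(1, 2))  # (batch, len_a, len_b)
    weights = F.softmax(scores, dim=-1)                # attend over seq_b positions
    return torch.bmm(weights, seq_b)

# Illustrative usage: passage attends to question and answer, question attends to answer.
p, q, a = torch.randn(2, 30, 64), torch.randn(2, 10, 64), torch.randn(2, 5, 64)
p_q = seq_attention(p, q)   # question-aware passage representation
p_a = seq_attention(p, a)   # answer-aware passage representation
q_a = seq_attention(q, a)   # answer-aware question representation
```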
The work by (Lin et al., 2017b) focuses on reasoning with heterogeneous commonsense knowledge. They use three kinds of commonsense knowledge: causal relations; semantic relations such as co-reference and associative relations; and sentiment knowledge, i.e. sentiment coherence (positivity and negativity) between two elements. In human reasoning, not all inference rules are equally likely to be applied: the more reasonable an inference is, the more likely it is to be proposed. They therefore use attention to weigh the inferences based on the nature of the rule and the given context. Their attention mechanism models the possibility that an inference rule is applied during inference from a premise document to a hypothesis by considering the relatedness between an element and the knowledge category, as well as the relatedness between the two elements. They answer the comprehension task by summarizing over all valid inference rules, as sketched below.
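The following is an illustrative sketch, not Lin et al.'s implementation, of weighting heterogeneous inference rules with attention and summing over the valid rules; the category inventory and scoring functions are assumptions made for exposition.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RuleAttention(nn.Module):
    def __init__(self, dim, num_categories=3):  # e.g. causal / semantic / sentiment
        super().__init__()
        self.category_emb = nn.Embedding(num_categories, dim)
        self.bilinear = nn.Bilinear(dim, dim, 1)

    def forward(self, premise_elems, hypothesis_elems, categories, rule_evidence):
        # premise_elems, hypothesis_elems: (num_rules, dim) element representations
        # categories: (num_rules,) category id of each rule
        # rule_evidence: (num_rules,) contribution of each rule to the hypothesis
        pair_score = self.bilinear(premise_elems, hypothesis_elems).squeeze(-1)
        cat_score = (premise_elems * self.category_emb(categories)).sum(-1)
        weights = F.softmax(pair_score + cat_score, dim=0)  # possibility a rule is applied
        return (weights * rule_evidence).sum()              # summarize over valid rules
```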
Although Mixture of Experts models are widely used for NLP tasks, they are still underused for Machine Reading Comprehension. Recent work includes the language model of (Le et al., 2016), which introduces an LSTM-based mixture method for dynamically integrating a group of word prediction experts to obtain a conditional language model that excels simultaneously at several subtasks. Moreover, the work of (Xiong et al., 2017) includes a sparse mixture-of-experts layer for a question answering task, inherited from the earlier work of (Shazeer et al., 2017) on a Sparsely-Gated Mixture-of-Experts layer; a minimal sketch of such a layer is given below. The success of the aforementioned approaches motivates introducing Mixture of Experts deep learning for Machine Reading Comprehension.
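The sketch below shows a sparsely-gated mixture-of-experts layer in the spirit of Shazeer et al. (2017): a gating network selects the top-k experts per example and combines their outputs. The expert architecture, hyperparameters, and the dense per-example loop are simplifications assumed for illustration, not the reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, dim, num_experts=4, k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
            for _ in range(num_experts)
        )
        self.gate = nn.Linear(dim, num_experts)
        self.k = k

    def forward(self, x):                       # x: (batch, dim)
        logits = self.gate(x)                   # (batch, num_experts)
        topk_vals, topk_idx = logits.topk(self.k, dim=-1)
        weights = F.softmax(topk_vals, dim=-1)  # renormalize over the selected experts
        out = torch.zeros_like(x)
        for slot in range(self.k):              # only the top-k experts are evaluated
            for b in range(x.size(0)):
                e = topk_idx[b, slot].item()
                out[b] += weights[b, slot] * self.experts[e](x[b])
        return out

moe = SparseMoE(dim=64)
y = moe(torch.randn(8, 64))                     # (8, 64)
```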
3 Model description
We experiment with a mixture of experts model to tackle the task of commonsense machine comprehension. The model is inspired by an analysis of the errors made by the triple attention network of (Wang, 2018b), which achieves state-of-the-art results using a Three-Way Attentive Network (TriAN) and incorporates commonsense knowledge in the form of relational embeddings obtained from the ConceptNet knowledge graph, a large-scale graph of commonsense knowledge consisting of over 21 million edges and 8 million nodes.
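As a hedged sketch of how such relational embeddings could be built, the snippet below looks up the ConceptNet relation (if any) linking a passage word to any question or answer word and maps the relation type to a learned vector. The pre-extracted lookup table `conceptnet_relations`, the relation subset, and the embedding size are assumptions for illustration.

```python
import torch
import torch.nn as nn

RELATIONS = ["<none>", "RelatedTo", "IsA", "Causes", "UsedFor"]  # illustrative subset
rel2id = {r: i for i, r in enumerate(RELATIONS)}
rel_embedding = nn.Embedding(len(RELATIONS), 10)  # small learned relation vectors

# Hypothetical pre-extracted edges: (passage word, question/answer word) -> relation
conceptnet_relations = {("rain", "wet"): "Causes", ("umbrella", "rain"): "RelatedTo"}

def relation_features(passage_words, qa_words):
    ids = []
    for p in passage_words:
        rel = "<none>"
        for w in qa_words:
            rel = conceptnet_relations.get((p, w), rel)
        ids.append(rel2id[rel])
    return rel_embedding(torch.tensor(ids))      # (len(passage_words), 10)

print(relation_features(["rain", "fell"], ["why", "wet"]).shape)  # torch.Size([2, 10])
```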
A training example in the commonsense comprehension task consists of a passage (P), a question (Q), an answer (A), and a label y which is 0 or 1. P, Q, and A are all sequences of words. For a word P_i in the given passage, the input representation of P_i is the concatenation of several vectors: pre-trained GloVe embeddings, a part-of-speech (POS) embedding, a named-entity (NE) embedding, Con-