
$P$ documents, where $K < P \ll N$. Here, $K$ refers to the number of sampled documents, while $P$ represents the size of the pool of documents from which the top-$K$ documents are selected. This truncation provides two key advantages: i) it enables efficient caching or retention of document scores, as only $P$ documents need to be stored in memory, and ii) the value of $P$ serves as an exploration-exploitation threshold: a higher value of $P$ yields greater diversity in document sampling, promoting exploration, while a smaller value of $P$ ensures that, during training, all documents in the set $\mathcal{T}_\phi$ are more likely to be visited, facilitating exploitation of the available information.
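To make the truncation concrete, the sketch below (illustrative PyTorch, not the paper's implementation; the function name `sample_from_top_p_pool` and the plain multinomial sampling are assumptions) caches only the top-$P$ scores and samples $K$ documents from the resulting pool, with $P$ acting as the exploration-exploitation knob described above.

```python
import torch

def sample_from_top_p_pool(scores: torch.Tensor, p: int, k: int) -> torch.Tensor:
    """Sample K document indices from the pool of the top-P scored documents.

    scores: (N,) retrieval scores over the full corpus (assumed precomputed).
    p:      pool size P, with K < P << N.
    k:      number of documents K sampled per training step.
    """
    # Keep only the top-P documents: only these P scores need to be cached.
    top_p_scores, top_p_ids = scores.topk(p)

    # Renormalise over the pool and sample K documents without replacement.
    # A larger P spreads probability mass over more documents (exploration);
    # a smaller P concentrates it on a few documents (exploitation).
    probs = torch.softmax(top_p_scores, dim=-1)
    sampled = torch.multinomial(probs, num_samples=k, replacement=False)
    return top_p_ids[sampled]

# Example: a corpus of N = 10,000 documents, a pool of P = 100, K = 8 samples.
scores = torch.randn(10_000)
doc_ids = sample_from_top_p_pool(scores, p=100, k=8)
```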
Assuming the retrieval distributions are described by score functions $f_\theta : \Omega^2 \to \mathbb{R}$ and $f_\phi : \Omega^3 \to \mathbb{R}$, we define the truncated retrievers as:⁶

$$p_\theta(d \mid q) := \frac{\mathbb{1}[d \in \mathcal{T}_\phi]\,\exp f_\theta(d, q)}{\sum_{d' \in \mathcal{T}_\phi} \exp f_\theta(d', q)} \tag{8a}$$

$$r_\phi(d \mid a, q) := \frac{\mathbb{1}[d \in \mathcal{T}_\phi]\,\exp f_\phi(a, q, d)}{\sum_{d' \in \mathcal{T}_\phi} \exp f_\phi(a, q, d')} \tag{8b}$$
where $\mathcal{T}_\phi$ is the set of the top $P \le N$ documents ranked by the score $f_\phi(a, q, d)$. The score functions $f_\theta$ and $f_\phi$ can be implemented using BM25 and/or contextual vector representations extracted with pretrained language models such as DPR or ColBERT (Karpukhin et al., 2020; Khattab & Zaharia, 2020). For instance, using a dual-encoder model, $f_\theta(d, q) = \mathrm{BERT}_\theta(d)^\top \mathrm{BERT}_\theta(q)$ and $f_\phi(a, q, d) = \mathrm{BERT}_\phi([q; a])^\top \mathrm{BERT}_\phi(d)$, where $\mathrm{BERT}$ is the function that returns the output of a BERT model at the [CLS] token and $[\cdot\,;\cdot]$ is the concatenation operator.
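As a concrete illustration of eqs. (8a)-(8b) and the dual-encoder scores above, the following sketch uses Hugging Face `transformers` (an assumption, not the paper's code); the helper `encode_cls`, the `bert-base-uncased` checkpoint, and the plain string concatenation standing in for $[q; a]$ are all hypothetical choices.

```python
import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
bert_theta = BertModel.from_pretrained("bert-base-uncased")  # parameterises f_theta
bert_phi = BertModel.from_pretrained("bert-base-uncased")    # parameterises f_phi

def encode_cls(model: BertModel, texts: list[str]) -> torch.Tensor:
    """Return the output of the BERT model at the [CLS] token for each text."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    return model(**batch).last_hidden_state[:, 0]  # shape (batch, hidden)

def truncated_log_softmax(scores: torch.Tensor) -> torch.Tensor:
    """Eqs. (8a)/(8b): normalise scores over the top-P pool T_phi only."""
    return scores - torch.logsumexp(scores, dim=-1, keepdim=True)

question = "What causes tides?"
answer = "The gravitational pull of the Moon."
pool_docs = ["Tides are caused by ...", "The Moon orbits the Earth ..."]  # top-P pool T_phi

# f_theta(d, q) = BERT_theta(d)^T BERT_theta(q)
f_theta = (encode_cls(bert_theta, pool_docs) @ encode_cls(bert_theta, [question]).T).squeeze(-1)

# f_phi(a, q, d) = BERT_phi([q; a])^T BERT_phi(d); [q; a] approximated by string concatenation
f_phi = (encode_cls(bert_phi, pool_docs) @ encode_cls(bert_phi, [question + " " + answer]).T).squeeze(-1)

log_p_theta = truncated_log_softmax(f_theta)  # log p_theta(d | q) over T_phi
log_r_phi = truncated_log_softmax(f_phi)      # log r_phi(d | a, q) over T_phi
```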
Retrieving the top $P$ documents is efficient when using Elasticsearch⁷ and/or Faiss (Johnson et al., 2021).
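For example, a minimal Faiss sketch of top-$P$ maximum-inner-product retrieval over precomputed document embeddings might look as follows (illustrative only; the exact-search index, corpus size, and variable names are assumptions):

```python
import faiss
import numpy as np

hidden = 768
# Precomputed BERT_theta(d) embeddings for all N corpus documents (random here).
doc_embeddings = np.random.randn(100_000, hidden).astype("float32")
query_embedding = np.random.randn(1, hidden).astype("float32")  # BERT_theta(q)

# Exact maximum-inner-product search; approximate indexes (IVF, HNSW, ...)
# can be substituted for larger corpora.
index = faiss.IndexFlatIP(hidden)
index.add(doc_embeddings)

P = 100
scores, doc_ids = index.search(query_embedding, P)  # top-P pool and its scores
```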
2.4. Applying VOD
In this paper, we show how to apply the VOD framework to multiple-choice ODQA. Nevertheless, VOD is general-purpose and designed for latent variable models defined on a discrete and finite space. In NLP, it applies to a wide range of settings, such as generative, extractive, and multiple-choice ODQA, as well as retrieval-augmented language modelling. A non-exhaustive list of examples can be found in Appendix E.
3. Related work
VOD aids the development of retrieval-augmented models for language modeling (LM) tasks. In this section, we review previous work on retrieval for LM and compare it to VOD (summarized with references in Table 1).
⁶ When $P > K$, evaluating the retriever density in eq. (8a) is generally intractable due to the sum over $P$ documents.
⁷ http://www.elastic.co/
Table 1. Deep retrievers in the literature, detailing whether training was end-to-end and posterior-guided (variational), as well as the size of the support during training.

| Method | Retriever training | End-to-end learning | Posterior-guided retriever | Support |
|---|---|---|---|---|
| DPR¹ | Supervised | ✗ | ✗ | – |
| ColBERT² | Supervised | ✗ | ✗ | – |
| Contriever³ | Self-supervised | ✗ | ✗ | – |
| FiD⁴ | Frozen DPR dual-encoder | ✗ | ✗ | – |
| RETRO⁵ | Frozen BERT dual-encoder | ✗ | ✗ | – |
| ORQA⁶ | Self-supervised + MLL* | (✓) | ✗ | top-K doc. |
| RAG⁷ | MLL* + frozen DPR doc. encoder | (✓) | ✗ | top-K doc. |
| REALM⁸ | Self-supervised + MLL* | ✓ | ✗ | top-K doc. |
| EMDR-2⁹ | Self-supervised + Expect.-Max. | ✓ | ✓ | top-K doc. |
| Hindsight¹⁰ | ColBERT init. + ELBO + MLL* | ✓ | ✓ | top-K doc. |
| VOD | Rényi variational bound | ✓ | ✓ | top-P doc.† |

¹ Karpukhin et al. (2020), ² Khattab et al. (2021), ³ Izacard et al. (2021), ⁴ Izacard & Grave (2020), ⁵ Borgeaud et al. (2021), ⁶ Lee et al. (2019), ⁷ Lewis et al. (2020), ⁸ Guu et al. (2020), ⁹ Sachan et al. (2021), ¹⁰ Paranjape et al. (2021). * MLL: marginal log-likelihood. † K ≤ P ≤ N (K: # of documents in a batch, N: corpus size, P: chosen).
Learning to search Retrieval-based training has gained much attention for improving pre-trained LMs. ORQA and Contriever proposed a self-supervised approach using contrastive learning to match a text passage with its context, which is widely adopted in pre-training to enable zero-shot retrieval (Inverse Cloze Task; Lee et al. (2019)). In contrast, DPR and ColBERT use supervised contrastive learning with questions paired to annotated documents. This method has sparked many retrieval-augmented attempts, such as FiD, RETRO, and RAG, to enhance auto-regressive LMs conditioned on a frozen retriever. ORQA and REALM, later followed by RAG, EMDR, Hindsight, and VOD, proposed optimizing both a retrieval component and a reader or language modelling component end-to-end by maximizing the marginal log-likelihood (MLL).
Posterior guided supervision Many efforts have been devoted to leveraging external knowledge with posterior-guided supervision. EMDR learns a retriever end-to-end with an Expectation-Maximization objective evaluated under the posterior distribution $p_\theta(d \mid a, q) \propto p_\theta(d \mid q)\, p_\theta(a \mid d, q)$, while Hindsight optimizes the variational lower bound (ELBO) evaluated under a target-aware approximate posterior $r_\phi(d \mid a, q)$. Among previous methods, Hindsight is the most akin to VOD, as both methods rely on maximizing a variational bound. Nonetheless, VOD introduces the more general Rényi variational bound, which allows modelling the sampling distribution explicitly. Ultimately, this more principled approach makes VOD more versatile and capable of handling a wider range of problems.
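For reference, a generic form of the Rényi variational bound, written in the notation of this section with Rényi order $\alpha \ge 0$, is sketched below; this is the standard form of such bounds, not necessarily the paper's exact objective or estimator:

$$\mathcal{L}_\alpha(a, q) \;=\; \frac{1}{1-\alpha} \log \mathbb{E}_{d \sim r_\phi(d \mid a, q)}\!\left[\left(\frac{p_\theta(a \mid d, q)\, p_\theta(d \mid q)}{r_\phi(d \mid a, q)}\right)^{1-\alpha}\right].$$

Setting $\alpha = 0$ recovers the marginal log-likelihood $\log p_\theta(a \mid q)$, while $\alpha \to 1$ recovers the ELBO, so the bound interpolates between the two objectives.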
Navigating large knowledge bases The large size of knowledge bases such as Wikipedia makes it computationally intractable to consider all $N$ documents when computing the MLL. To address this, all related methods rely on a strict truncation of the retriever to the top-$K$ cached documents. In contrast to these aforementioned approaches, which limit themselves to a fixed set of $K$ documents, we propose a truncated