Speeding Up Question Answering Task of
Language Models via Inverted Index
Xiang Ji, Yesim Sungu-Eryilmaz, Elaheh Momeni, and Reza Rawassizadeh
jjixiang@bu.edu, yesims@bu.edu, momeni.elaheh@gmail.com, rezar@bu.edu
1 Boston University, Boston MA 02215, USA
2 eMentalist, Vienna, Austria
Abstract. Natural language processing applications, such as conversational agents and their question-answering capabilities, are widely used in the real world. Despite the wide popularity of large language models (LLMs), few real-world conversational agents take advantage of them, because the extensive resources LLMs consume prevent developers from integrating them into end-user applications. In this study, we combine an inverted indexing mechanism with LLMs to improve the efficiency of question-answering models for closed-domain questions. Our experiments show that using the index improves the average response time by 97.44%. In addition, due to the reduced search scope, the average BLEU score improves by 0.23 when the inverted index is used.
Keywords: Inverted Index · Question Answering · Large Language Model.
1 Introduction and background
Advances in large language models (LLMs), especially those employing the transformer architecture [5], have revolutionized the quality of natural language processing applications. However, training an LLM with a state-of-the-art architecture, including several layers of encoders/decoders and attention, is computationally very expensive and out of reach for small and medium enterprises with limited budgets. There are promising approaches to reducing LLM model sizes, such as compression [9,19], quantization [20], and knowledge distillation [11], but larger models are still favored for their accuracy.
This issue has led to the introduction of services such as Hugging Face [7] and spaCy [10], which share trained (pre-trained and fine-tuned) LLMs. These services let developers benefit from trained LLMs without dealing with the expensive model training process. However, executing a query on a trained LLM still takes a significant amount of time: as the number of words in the input text grows, the response time of these models grows rapidly, since the cost of self-attention scales quadratically with sequence length. For example, if an LLM is used to answer questions over an entire book, it takes an extremely long time to produce an answer, which is impractical in real-world applications. Response time correlates directly with the usability of an application [18,15]; therefore, slow response times hinder the adoption of large language models in end-user applications.
arXiv:2210.13578v1 [cs.CL] 24 Oct 2022
In this work, we employ five popular transformer models for question answering (Q&A), i.e., BERT-base [5], BERT-large [5], DistilBERT [17], RoBERTa [12], and Tiny-RoBERTa [12]. We then develop an inverted index layer [21] that reduces execution time while maintaining or improving the accuracy of the Q&A models. When extracting answers from a large body of text (such as a book), our results show that our approach significantly improves both response time and accuracy over the baseline models (which do not use an inverted index).
2 Method
Our proposed architecture (see Figure 1) includes two phases and five steps.
Phase 1 includes two steps. Step 1 analyzes the large text corpus and extracts keywords from each paragraph. For example, the keyword "bench press" was extracted from several paragraphs of a fitness book, and "SVM" was extracted from several paragraphs of a machine learning book.
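Step 1 can be sketched as follows. This is a minimal illustration using a simple frequency-based extractor as a stand-in for the Rake and keyBERT extractors the paper actually uses; the function name and the stopword list are illustrative, not part of the original method.

```python
import re
from collections import Counter

# A small illustrative stopword list; a real extractor would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for",
             "on", "with", "that", "as", "are", "was", "from"}

def extract_keywords(paragraph: str, top_k: int = 2) -> list:
    """Return the top_k most frequent non-stopword terms of a paragraph.
    A frequency-based stand-in for Rake/keyBERT keyword extraction."""
    tokens = re.findall(r"[a-z]+", paragraph.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return [word for word, _ in counts.most_common(top_k)]
```

For a fitness-book paragraph repeatedly mentioning the bench press, this extractor would surface "bench" and "press" as the paragraph's keywords, mirroring the "bench press" example above.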
Fig. 1. Our proposed architecture. First, it identifies the keywords in the user's question and then searches the relevant paragraphs based on the extracted keywords. Dotted lines denote Phase 1 and solid lines denote Phase 2.
Step 2 builds an inverted index over the extracted keywords at the paragraph level for the given book. Phase 1 (Steps 1 and 2) is performed offline and only once per book. To build the inverted index, we do not use an existing library such as Lucene [2] or Solr [3]. Instead, we perform the indexing locally so that we can control the accuracy of the extracted keywords, which is not possible with off-the-shelf inverted indexing tools.
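A paragraph-level inverted index of this kind can be sketched as a plain mapping from keyword to posting list. The representation below (triples of page, paragraph id, and text) is an assumption for illustration; the paper does not specify its internal data layout.

```python
from collections import defaultdict

def build_inverted_index(book, extract_keywords):
    """Step 2 (sketch): build a paragraph-level inverted index.

    book: iterable of (page, paragraph_id, text) triples.
    extract_keywords: any per-paragraph keyword extractor (Step 1).
    Returns a dict mapping keyword -> list of (page, paragraph_id) postings.
    """
    index = defaultdict(list)
    for page, para_id, text in book:
        for kw in extract_keywords(text):
            index[kw].append((page, para_id))
    return dict(index)
```

Because this runs once per book, even a naive single pass over all paragraphs is acceptable; the payoff comes at query time, when lookups are dictionary accesses rather than full-text scans.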
In Phase 2, Step 3 parses the user's question and extracts its keywords. Step 4 matches the keywords from the question against the keywords in the inverted index. If there is a match, Step 5 finds the corresponding page and paragraph via the index and feeds that text into the LLM module (e.g., BERT) to find the answer. Instead of searching the entire text corpus, the model only searches the locations specified by the inverted index.
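Steps 3 to 5 can be sketched as below. The retrieval logic is a faithful outline of the described phase; the `qa_pipeline` call in the final comment is a hypothetical placeholder for the transformer Q&A module (e.g., a Hugging Face question-answering pipeline), not code from the paper.

```python
def retrieve_candidates(question, index, paragraphs, extract_keywords):
    """Steps 3-4 (sketch): match question keywords against the inverted
    index and return only the candidate paragraph texts, instead of the
    whole corpus.

    index: keyword -> list of (page, paragraph_id) postings (Step 2).
    paragraphs: dict mapping (page, paragraph_id) -> paragraph text.
    """
    locations = []
    for kw in extract_keywords(question):          # Step 3
        locations.extend(index.get(kw, []))        # Step 4
    # Deduplicate postings while preserving order, then fetch the texts.
    seen, candidates = set(), []
    for loc in locations:
        if loc not in seen:
            seen.add(loc)
            candidates.append(paragraphs[loc])
    return candidates

# Step 5 (sketch): feed only the candidates to the Q&A model, e.g.
#   answer = qa_pipeline(question=question, context=" ".join(candidates))
# where qa_pipeline is a question-answering model such as BERT.
```

The speedup comes from this narrowing: the LLM reads a handful of matched paragraphs rather than the whole book.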
To implement Step 1 of Phase 1, we experiment with two keyword extraction algorithms (Rake [16] and keyBERT [8]) to extract one or two keywords for each paragraph.