Textual Entailment Recognition with Semantic Features from Empirical Text Representation


TEXTUAL ENTAILMENT RECOGNITION WITH

SEMANTIC FEATURES FROM EMPIRICAL TEXT

REPRESENTATION

A PREPRINT

Md Shajalal1,6, Md Atabuzzaman2,4, Maksuda Bilkis Baby2,

Md Rezaul Karim1,3, Alexander Boden1,5

1Fraunhofer Institute for Applied Information Technology FIT, Germany

2Hajee Mohammad Danesh Science and Technology University, Bangladesh

3RWTH Aachen University, Germany

4Bangladesh University of Engineering and Technology

5Bonn-Rhein-Sieg University of Applied Sciences, Germany

6University of Siegen, Germany

atabuzzaman@gmail.com

ABSTRACT

Textual entailment recognition is one of the basic natural language understanding (NLU) tasks. Understanding the meaning of sentences is a prerequisite before applying any natural language processing (NLP) technique to recognize textual entailment automatically. A text entails a hypothesis if and only if the truth of the hypothesis follows from the text. Classical approaches generally use the feature values of each word from a word embedding to represent the sentences. In this paper, we propose a novel approach to identifying the textual entailment relationship between a text and a hypothesis, introducing a new semantic feature based on an empirical threshold-based semantic text representation. We employ an element-wise Manhattan distance vector-based feature that can identify the semantic entailment relationship between the text-hypothesis pair. We carried out several experiments on a benchmark entailment classification dataset (SICK-RTE). We train several machine learning (ML) algorithms on both semantic and lexical features to classify the text-hypothesis pair as entailment, neutral, or contradiction. Our empirical sentence representation technique enriches the semantic information of the texts and hypotheses and is found to be more efficient than the classical ones. In the end, our approach significantly outperforms known methods in understanding the meaning of the sentences for the textual entailment classification task.

This is the preprint version of our paper accepted and presented at the International Conference on Speech & Language Technology for Low-resource Languages (SPELLL’2022).

arXiv:2210.09723v4 [cs.CL] 19 Jun 2023

Textual Entailment Recognition with Semantic Features A PREPRINT

Keywords Textual entailment · Semantic representation · Word embedding · Machine learning

1 Introduction

Recognizing Textual Entailment (RTE) is one of the basic tasks of Natural Language Understanding (NLU), and NLU is a subfield of Natural Language Processing (NLP). Textual entailment is the relationship between two texts where one text fragment, referred to as the ‘Hypothesis (H)’, can be inferred from another text fragment, referred to as the ‘Text (T)’ [Dagan et al., 2005, Sharma et al., 2015]. In other words, Text T entails Hypothesis H if H is considered to be true according to the context of the corresponding text T [Dagan et al., 2005]. Let us consider a text-hypothesis pair to illustrate an entailment relationship. Suppose “A mother is feeding milk to a baby” is a particular text T and “A baby is drinking milk” is a hypothesis H. We see that the hypothesis H is a true statement that can easily be inferred from the corresponding text T. Let us consider another hypothesis H, “A man is eating rice”. For the same text fragment T, we can see that there is no entailment relationship between T and H. Hence this text-hypothesis pair does not hold any entailment relationship, meaning it is neutral. The identification of entailment relationships has a significant impact on different NLP applications, including question answering, text summarization, machine translation, information extraction, and information retrieval [Almarwani and Diab, 2017, Sharma et al., 2015].

Since the first PASCAL challenge [Dagan et al., 2005] on recognizing textual entailment, the research community has proposed many machine learning approaches. These approaches employ supervised machine learning (ML) techniques using different underlying lexical, syntactic, and semantic features of the text-hypothesis pair. Recently, deep learning-based approaches including LSTM (Long Short-Term Memory), CNN (Convolutional Neural Network), and transfer learning have been applied to detect the entailment relationship between the text-hypothesis pair [Kiros et al., 2015, Vaswani et al., 2017, Devlin et al., 2018, Conneau et al., 2017]. Almost all methods utilize the semantic information of the text-hypothesis pair by representing them as semantic vectors. To do so, they consider all the values of the words’ vectors returned from the word embedding model. Classical approaches also apply the average of real-valued words’ vectors as the sentence representation. We hypothesize that some values of a particular word’s vector might have a negative impact, since they are passed through an arithmetic average function. Considering this intuition, we observed that the elements of the words’ vectors whose relevant elements are already present in the semantic vectors of the text-hypothesis pair can be eliminated to get a better semantic representation. Following this observation, we propose a threshold-based representation technique that considers the mean and standard deviation of the words’ vectors.
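To make the contrast with plain averaging concrete, the following is a minimal sketch and not the method of this paper: the exact thresholding rule is not stated in this section, so the filter below (keep, per dimension, only the values within k standard deviations of that dimension's mean) is purely an illustrative assumption. The function names, the parameter `k`, and the toy vectors are all hypothetical.

```python
from statistics import fmean, pstdev


def mean_representation(word_vectors):
    """Classical baseline: average each dimension over all word vectors."""
    return [fmean(dim) for dim in zip(*word_vectors)]


def thresholded_representation(word_vectors, k=1.0):
    """Hypothetical threshold rule (an assumption, not the paper's rule):
    per dimension, average only the values that lie within k standard
    deviations of that dimension's mean."""
    sentence = []
    for dim in zip(*word_vectors):
        mu, sigma = fmean(dim), pstdev(dim)
        kept = [x for x in dim if abs(x - mu) <= k * sigma]
        # Fall back to the plain mean if nothing survives the filter.
        sentence.append(fmean(kept) if kept else mu)
    return sentence


# Toy 2-dimensional "embeddings" for a three-word sentence (made-up values).
toy_words = [[1.0, 0.0], [1.0, 0.0], [10.0, 0.0]]
baseline = mean_representation(toy_words)
filtered = thresholded_representation(toy_words)
```

In the toy input, the outlying value 10.0 pulls the plain average of the first dimension up to 4.0, while the filtered average ignores it. This only illustrates the intuition that some element values can distort an arithmetic mean.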

Applying the threshold-based semantic sentence representation, the text and hypothesis are represented by two real-valued high-dimensional vectors. We then introduce an element-wise Manhattan distance vector (EMDV) between the vectors for text and hypothesis to obtain a semantic representation of the text-hypothesis pair. This EMDV is fed directly as a feature vector to ML algorithms to identify the entailment relationship of the text-hypothesis pair. In addition, we introduce another feature by calculating the absolute average of the element-wise Manhattan distance vector of the text-hypothesis pair. We also extract several handcrafted lexical and semantic features, including a Bag-of-Words (BoW) based similarity score, the Jaccard similarity score (JAC), and a BERT-based semantic textual similarity score (STS), for the corresponding text-hypothesis pair. To classify the text-hypothesis pair, we apply multiple machine learning classifiers that use different textual features, including the ones we introduce. An ensemble of the ML algorithms with majority voting then provides the final entailment relationship for the corresponding text-hypothesis pair. To validate the performance of our method, a wide range of experiments is carried out on the benchmark SICK-RTE dataset. The experimental results show that our method recognizes the different textual entailment relations efficiently. The results also demonstrate that our approach outperforms some state-of-the-art methods.
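The features and the voting step named above can be sketched as follows. This is an illustration under stated assumptions: the EMDV is read here as the vector of per-dimension absolute differences |t_i - h_i|, the "absolute average" as the mean of those values, and Jaccard similarity is computed over lowercased whitespace tokens. The function names and the tokenization are hypothetical choices, not taken from the paper.

```python
from collections import Counter


def emdv(text_vec, hyp_vec):
    """Element-wise Manhattan distance vector: |t_i - h_i| per dimension.
    (Summing this vector gives the ordinary Manhattan distance.)"""
    if len(text_vec) != len(hyp_vec):
        raise ValueError("text and hypothesis vectors must have equal length")
    return [abs(t - h) for t, h in zip(text_vec, hyp_vec)]


def emdv_average(text_vec, hyp_vec):
    """Scalar feature: mean of the EMDV entries (assumed reading)."""
    distances = emdv(text_vec, hyp_vec)
    return sum(distances) / len(distances)


def jaccard(text, hypothesis):
    """Jaccard similarity over lowercased whitespace tokens (assumed tokenizer)."""
    a, b = set(text.lower().split()), set(hypothesis.lower().split())
    return len(a & b) / len(a | b) if (a | b) else 0.0


def majority_vote(predicted_labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(predicted_labels).most_common(1)[0][0]


pair_feature = emdv([0.5, -1.0, 2.0], [1.0, 1.0, 2.0])
overlap = jaccard("A mother is feeding milk to a baby", "A baby is drinking milk")
final_label = majority_vote(["entailment", "neutral", "entailment"])
```

For the example sentences from the introduction, the two token sets share four of eight distinct tokens, so the Jaccard score is 0.5. In practice each label passed to `majority_vote` would come from a separately trained classifier.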

The rest of the paper is organized as follows: Section 2 presents some related work on RTE. Our method is then discussed in Section 3. The details of the experiments and their results are presented in Section 4. Finally, Section 5 presents the conclusion and future directions.

2 Related Work

With the first PASCAL challenge, textual entailment recognition gained considerable attention from the research community [Dagan et al., 2005]. Several research groups participated in this challenge, but most of the methods applied lexical features (i.e., word overlap) with ML algorithms to recognize the entailment relation [Dagan et al., 2005]. Several RTE challenges have since been organized, and some methods with promising performance on different downstream tasks have been proposed [Haim et al., 2006, Giampiccolo et al., 2007, 2008, Bentivogli et al., 2009, 2011, Dzikovska et al., 2013, Paramasivam and Nirmala, 2021]. Malakasiotis and Androutsopoulos [Malakasiotis and Androutsopoulos, 2007] proposed a method employing string matching-based lexical and shallow syntactic features with a support vector machine (SVM). Four distance-based features with SVM have also been employed [Castillo and Alemany, 2008]. The features include edit distance, distance in WordNet, and longest common substring between texts.
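Two of the distance-based lexical features mentioned above, edit distance and longest common substring, can be sketched with textbook dynamic programming. This is a generic illustration at the character level; the cited systems may compute these features over tokens or with different costs, and the function names here are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character insertions,
    deletions, and substitutions that turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def longest_common_substring(a, b):
    """Longest contiguous run of characters shared by a and b
    (the first such run in a if several have the same length)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i, ca in enumerate(a, 1):
        curr = [0] * (len(b) + 1)
        for j, cb in enumerate(b, 1):
            if ca == cb:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


distance = edit_distance("kitten", "sitting")
shared = longest_common_substring("abcdef", "zcdez")
```

A small edit distance or a long shared substring indicates high surface overlap between a text and a hypothesis, which is the kind of signal these early SVM-based systems relied on.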

Similarly, Pakray et al. [Pakray et al., 2009] applied multiple lexical features including WordNet-based unigram match, bigram match, longest common subsequence, skip-gram, stemming, and named entity matching. Finally, they applied SVM classifiers using lexical and syntactic similarity. Basak et al. [Basak et al., 2015] visualized the text and hypothesis as directed networks (dependency graphs), with nodes denoting words or phrases and edges denoting connections between nodes. The entailment relationship is then identified by matching the graphs with vertex and edge substitution. Some other methods made use of bag-of-words, word overlap, logic-based reasoning, lexical entailment, ML-based methods, and graph matching to recognize textual entailment [Ghuge and Bhattacharya, 2014, Renjit and Sumam, 2022, Liu et al., 2016].

Bowman et al. [Bowman et al., 2015] introduced the Stanford Natural Language Inference (SNLI) corpus, a dataset consisting of labeled sentence pairs that can be used as a benchmark in NLP tasks. This is a very large entailment (inference) dataset that gives researchers the opportunity to apply deep learning-based approaches to identify the entailment relation between text and hypothesis. Therefore, different deep learning-based approaches including LSTM (Long Short-Term Memory), CNN (Convolutional Neural Network), BERT, and transfer learning have been applied to RTE [Kiros et al., 2015, Vaswani et al., 2017, Devlin et al., 2018, Conneau et al., 2017]. All of these methods used either lexical or semantic features. Our proposed method, by contrast, uses both lexical and semantic features, including the element-wise Manhattan distance vector (EMDV),
