Textual Entailment Recognition with Semantic Features from Empirical Text Representation

A PREPRINT
Md Shajalal1,6, Md Atabuzzaman2,4, Maksuda Bilkis Baby2,
Md Rezaul Karim1,3, Alexander Boden1,5
1Fraunhofer Institute for Applied Information Technology FIT, Germany
2Hajee Mohammad Danesh Science and Technology University, Bangladesh
3RWTH Aachen University, Germany
4Bangladesh University of Engineering and Technology
5Bonn-Rhein-Sieg University of Applied Sciences, Germany
6University of Siegen, Germany
atabuzzaman@gmail.com
ABSTRACT
Textual entailment recognition is one of the basic natural language understanding
(NLU) tasks. Understanding the meaning of sentences is a prerequisite for applying
any natural language processing (NLP) technique to recognize textual entailment
automatically. A text entails a hypothesis if and only if the truth of the
hypothesis follows from the text. Classical approaches generally use the feature
values of each word from a word embedding to represent the sentences. In this
paper, we propose a novel approach to identifying the textual entailment
relationship between a text and a hypothesis, introducing a new semantic feature
based on an empirical threshold-based semantic text representation. We employ an
element-wise Manhattan distance vector-based feature that can identify the
semantic entailment relationship between the text-hypothesis pair. We carried out
several experiments on a benchmark entailment classification dataset (SICK-RTE).
We train several machine learning (ML) algorithms on both semantic and lexical
features to classify the text-hypothesis pair as entailment, neutral, or
contradiction. Our empirical sentence representation technique enriches the
semantic information of the texts and hypotheses and is found to be more
effective than the classical ones. Overall, our approach significantly
outperforms known methods in understanding the meaning of the sentences for the
textual entailment classification task.
This is the pre-print version of our paper accepted and presented at the International Conference on
Speech & Language Technology for Low-resource Languages (SPELLL’2022).
arXiv:2210.09723v4 [cs.CL] 19 Jun 2023
Textual Entailment Recognition with Semantic Features A PREPRINT
Keywords Textual entailment · Semantic representation · Word embedding · Machine learning
1 Introduction
Recognizing Textual Entailment (RTE) is one of the basic tasks of Natural Language Understanding
(NLU), and NLU is a subfield of Natural Language Processing (NLP). Textual entailment is the
relationship between two texts where one text fragment, referred to as the ‘Hypothesis (H)’, can be
inferred from another text fragment, referred to as the ‘Text (T)’ [Dagan et al., 2005, Sharma
et al., 2015]. In other words, Text T entails Hypothesis H if H is considered to be true according
to the context of the corresponding text T [Dagan et al., 2005]. Let’s consider a text-hypothesis
pair to illustrate an entailment relationship. Suppose “A mother is feeding milk to a baby” is a
particular text T and “A baby is drinking milk” is a hypothesis H. We see that the hypothesis H is
a true statement that can easily be inferred from the corresponding text T. Now consider another
hypothesis H, “A man is eating rice”. For the same text fragment T, there is no entailment
relationship between T and H, so this text-hypothesis pair is labeled neutral. The identification
of entailment relationships has a significant impact on different NLP applications, including
question answering, text summarization, machine translation, information extraction, and
information retrieval [Almarwani and Diab, 2017, Sharma et al., 2015].
Since the first PASCAL challenge on recognizing textual entailment [Dagan et al., 2005], the
research community has proposed different machine learning approaches. These approaches employ
supervised machine learning (ML) techniques using different underlying lexical, syntactic, and
semantic features of the text-hypothesis pair. Recently, deep learning-based approaches including
LSTM (Long Short-Term Memory), CNN (Convolutional Neural Network), and transfer learning have been
applied to detect the entailment relationship between the text-hypothesis pair [Kiros et al., 2015,
Vaswani et al., 2017, Devlin et al., 2018, Conneau et al., 2017]. Almost all methods utilize the
semantic information of the text-hypothesis pair by representing the two sentences as semantic
vectors. To do so, they consider all the values of the word vectors returned by the word embedding
model. Classical approaches also use the average of the real-valued word vectors as the sentence
representation. We hypothesize that some values of a particular word vector might have a negative
impact, since they are passed through an arithmetic average function. Following this intuition, we
observed that the elements of the word vectors whose relevant elements are already present in the
semantic vectors of the text-hypothesis pair can be eliminated to obtain a better semantic
representation. Based on this observation, we propose a threshold-based representation technique
that considers the mean and standard deviation of the word vectors.
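To make the contrast with plain averaging concrete, the following is a minimal sketch. The exact thresholding rule belongs to the method section and is not stated in this introduction, so the rule used here (per dimension, average only the values that lie within `k` standard deviations of the mean) is purely an assumption for illustration. The function names and the parameter `k` are hypothetical.

```python
from statistics import mean, pstdev


def average_representation(word_vectors):
    """Classical baseline: element-wise mean of all word vectors."""
    return [mean(column) for column in zip(*word_vectors)]


def threshold_representation(word_vectors, k=1.0):
    """Threshold-based sentence vector under an ASSUMED rule.

    For every dimension, only the values within k standard deviations of
    that dimension's mean are averaged. This rule is an illustrative guess,
    not the definition used in the paper.
    """
    sentence_vector = []
    for column in zip(*word_vectors):
        mu, sigma = mean(column), pstdev(column)
        kept = [v for v in column if abs(v - mu) <= k * sigma]
        # Fall back to the plain mean if the threshold removes everything.
        sentence_vector.append(mean(kept) if kept else mu)
    return sentence_vector


# Toy 2-dimensional "embeddings" for a four-word sentence.
toy_vectors = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [5.0, 0.0]]
baseline = average_representation(toy_vectors)
thresholded = threshold_representation(toy_vectors)
```

In the toy input, the single large value in the first dimension shifts the plain average, whereas the assumed threshold rule discards it before averaging.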
Applying the threshold-based semantic sentence representation, the text and the hypothesis are each
represented by a real-valued high-dimensional vector. We then introduce an element-wise Manhattan
distance vector (EMDV) between the vectors for the text and the hypothesis to obtain a semantic
representation of the text-hypothesis pair. This EMDV is directly employed as a feature vector for
ML algorithms to identify the entailment relationship of the text-hypothesis pair. In addition, we
introduce another feature by calculating the absolute average of the element-wise Manhattan
distance vector of the text-hypothesis pair. We also extract several handcrafted lexical and
semantic features, including a Bag-of-Words (BoW) based similarity score, the Jaccard similarity
score (JAC), and the BERT-based semantic textual similarity score (STS) for the
corresponding text-hypothesis pair. To classify the text-hypothesis pair, we apply multiple
machine learning classifiers that use different textual features, including the ones we introduce.
An ensemble of the ML algorithms with majority voting then provides the final entailment
relationship for the corresponding text-hypothesis pair. To validate the performance of our
method, we carry out a wide range of experiments on the benchmark SICK-RTE dataset. The
experimental results show that our method recognizes the different textual entailment relations
effectively. The results also demonstrate that our approach outperforms some state-of-the-art
methods.
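The pair-level features and the voting step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the EMDV is read as the vector of absolute element-wise differences, the "absolute average" feature is read as the mean of that vector, tokenization is a lowercase whitespace split, and the function names are hypothetical. The BoW and BERT-based scores and the trained classifiers are not shown.

```python
from collections import Counter


def emdv(text_vec, hyp_vec):
    """Element-wise Manhattan distance vector: |t_i - h_i| for every dimension."""
    return [abs(t - h) for t, h in zip(text_vec, hyp_vec)]


def emdv_average(text_vec, hyp_vec):
    """Mean of the EMDV entries (assumed reading of the 'absolute average')."""
    distances = emdv(text_vec, hyp_vec)
    return sum(distances) / len(distances)


def jaccard(text, hypothesis):
    """Jaccard similarity over lowercase whitespace tokens (assumed tokenization)."""
    a, b = set(text.lower().split()), set(hypothesis.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def majority_vote(predictions):
    """Most frequent label; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]


pair_features = emdv([0.5, -1.0, 2.0], [1.0, 1.0, 2.0])
overlap = jaccard("A mother is feeding milk to a baby", "A baby is drinking milk")
final_label = majority_vote(["entailment", "neutral", "entailment"])
```

With three or more classifiers and three labels, a tie is possible; the sketch resolves it by first occurrence, which is only one of several reasonable choices.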
The rest of the paper is organized as follows: Section 2 presents some related work on RTE.
Our method is then discussed in Section 3. The details of the experiments and their results are
presented in Section 4. Finally, Section 5 presents the conclusion and future directions.
2 Related Work
With the first PASCAL challenge, textual entailment recognition gained considerable attention
from the research community [Dagan et al., 2005]. Several research groups participated in this
challenge, but most of the methods applied lexical features (i.e., word overlap) with ML
algorithms to recognize the entailment relation [Dagan et al., 2005]. Several RTE challenges have
since been organized, and some methods with promising performance on different downstream tasks
have been proposed [Haim et al., 2006, Giampiccolo et al., 2007, 2008, Bentivogli et al., 2009,
2011, Dzikovska et al., 2013, Paramasivam and Nirmala, 2021]. Malakasiotis et al. [Malakasiotis
and Androutsopoulos, 2007] proposed a method employing string matching-based lexical and shallow
syntactic features with a support vector machine (SVM). Four distance-based features with SVM
have also been employed [Castillo and Alemany, 2008]. The features include edit distance,
distance in WordNet, and longest common substring between texts.
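Two of the distance-based features named above have standard dynamic-programming definitions, sketched below for illustration. These are the textbook Levenshtein distance and longest common substring length; they are not necessarily the exact variants used in the cited work, and the function names are hypothetical. Both functions accept strings or token lists.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def longest_common_substring(a, b):
    """Length of the longest contiguous run shared by both sequences."""
    best = 0
    previous = [0] * (len(b) + 1)
    for item_a in a:
        current = [0]
        for j, item_b in enumerate(b, start=1):
            # Extend the run ending at the previous positions, or reset to 0.
            current.append(previous[j - 1] + 1 if item_a == item_b else 0)
            best = max(best, current[j])
        previous = current
    return best


char_distance = edit_distance("kitten", "sitting")
shared_run = longest_common_substring("drinking milk", "feeding milk")
```

Passing `text.split()` instead of the raw string gives word-level versions of the same features.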
Similarly, Pakray et al. [Pakray et al., 2009] applied multiple lexical features including
WordNet-based unigram match, bigram match, longest common subsequence, skip-gram, stemming, and
named entity matching. Finally, they applied SVM classifiers using lexical and syntactic
similarity. Basak et al. [Basak et al., 2015] visualized the text and the hypothesis as directed
networks (dependency graphs), with nodes denoting words or phrases and edges denoting connections
between nodes. The entailment relationship is then identified by matching the graphs with vertex
and edge substitution. Some other methods made use of bag-of-words, word overlap, logic-based
reasoning, lexical entailment, ML-based methods, and graph matching to recognize textual
entailment [Ghuge and Bhattacharya, 2014, Renjit and Sumam, 2022, Liu et al., 2016].
Bowman et al. [Bowman et al., 2015] introduced the Stanford Natural Language Inference (SNLI)
corpus, a dataset of labeled sentence pairs that can be used as a benchmark in NLP tasks. This
very large entailment (inference) dataset gives researchers the opportunity to apply deep
learning-based approaches to identify the entailment relation between text and hypothesis.
Therefore, different deep learning-based approaches including LSTM (Long Short-Term Memory), CNN
(Convolutional Neural Network), BERT, and transfer learning are being applied to RTE [Kiros
et al., 2015, Vaswani et al., 2017, Devlin et al., 2018, Conneau et al., 2017]. All these methods
used either lexical or semantic features. In contrast, our proposed method uses both lexical and
semantic features, including the element-wise Manhattan distance vector (EMDV),