
CDCONV: A Benchmark for Contradiction Detection in
Chinese Conversations
Chujie Zheng1∗, Jinfeng Zhou1,2∗, Yinhe Zheng3, Libiao Peng3, Zhen Guo4,
Wenquan Wu4, Zheng-Yu Niu4, Hua Wu4, Minlie Huang1,3†
1The CoAI Group, Institute for Artificial Intelligence, State Key Lab of Intelligent Technology and Systems, Beijing National Research Center for Information Science and Technology, DCST, Tsinghua University, Beijing 100084, China
2College of Intelligence and Computing, Tianjin University, Tianjin, China
3Lingxin AI, Beijing 100084, China
4Baidu Inc., China
chujiezhengchn@gmail.com jfzhou.mail@gmail.com aihuang@tsinghua.edu.cn
{guozhenguozhen, wuwenquan01, niuzhengyu, wu_hua}@baidu.com
Abstract
Dialogue contradiction is a critical issue in open-domain dialogue systems. The context-dependent nature of conversations makes dialogue contradiction detection rather challenging. In this work, we propose a benchmark for Contradiction Detection in Chinese Conversations, namely CDCONV. It contains 12K multi-turn conversations annotated with three typical contradiction categories: Intra-sentence Contradiction, Role Confusion, and History Contradiction. To efficiently construct the CDCONV conversations, we devise a series of methods for automatic conversation generation, which simulate common user behaviors that trigger chatbots to make contradictions. We conduct careful manual quality screening of the constructed conversations and show that state-of-the-art Chinese chatbots can be easily goaded into making contradictions. Experiments on CDCONV show that properly modeling contextual information is critical for dialogue contradiction detection, but there are still unresolved challenges that require future research.1
1 Introduction
Large-scale pre-training for dialogue generation (Zhang et al., 2020; Freitas et al., 2020) has advanced the development of engaging and human-like dialogue systems. Unfortunately, state-of-the-art open-domain chatbots, such as BlenderBot (Roller et al., 2021), EVA (Zhou et al., 2021; Gu et al., 2022), and PLATO (Bao et al., 2021b), still often behave inconsistently with their role or identity and produce utterances that are self-contradictory or contradict the dialogue history (Shuster et al., 2022; Gu et al., 2022; Xu et al., 2022a).
∗Equal contribution.
†Corresponding author.
1Our data and code are available at https://www.github.com/thu-coai/CDConv and https://github.com/PaddlePaddle/Knover/tree/dygraph/projects/cdconv
Non-contradiction:
u1: 你喜欢吃面条吗? (Do you like noodles?)
b1: 我喜欢吃米饭! (I love to eat rice!)
u2: 你不喜欢面条吗? (Don't you like noodles?)
b2: 不喜欢! (No, I don't!)
Contradiction:
u1: 那你喜欢什么小动物呢? (What animals do you like?)
b1: 狗,猫 (Dogs, cats)
u2: 你不喜欢什么小动物? (What animals do you dislike?)
b2: 不喜欢猫,其他的都喜欢 (I dislike cats. I like all the other animals)
{"model": "plato", "method": "反义-否定词", "u1": "那你喜欢看什么类型的电影?", "b1": "科幻,动作,恐怖,喜剧片都可以", "u2": "你不喜欢什么类型的电影?", "b2": "恐怖,血腥,血腥暴力的,还有恶心的", "label": 3, "persona": 2}
{"model": "plato", "method": "反义-否定词", "u1": "那你喜欢看什么类型的电影?", "b1": "科幻,动作,恐怖,喜剧片都可以", "u2": "你不喜欢什么类型的电影?", "b2": "不喜欢韩国电影", "label": 0}
{"model": "plato", "method": "反义-否定词", "u1": "那你喜欢看什么类型的电影?", "b1": "科幻,动作,恐怖,喜剧片都可以", "u2": "你不喜欢什么类型的电影?", "b2": "我不喜欢剧情片", "label": 0}
{"model": "plato", "method": "反义-否定词", "u1": "你难道不喜欢她吗?", "b1": "不喜欢,我有喜欢的人", "u2": "你喜欢她吗?", "b2": "不喜欢我还这么纠结啊?", "label": 3, "persona": 2}
{"model": "plato", "method": "反义-否定词", "u1": "你难道不喜欢她吗?", "b1": "不喜欢,我有喜欢的人", "u2": "你喜欢她吗?", "b2": "我也不知道,我感觉我不喜欢他了", "label": 1}
{"model": "plato", "method": "反义-否定词", "u1": "你难道不喜欢她吗?", "b1": "不喜欢,我有喜欢的人", "u2": "你喜欢她吗?", "b2": "不喜欢,不讨厌", "label": 0}
{"model": "plato", "method": "反义-否定词", "u1": "你难道不喜欢她吗?", "b1": "不喜欢,我有喜欢的人", "u2": "你喜欢她吗?", "b2": "不喜欢,只是有好感", "label": 0}
{"model": "plato", "method": "反义-否定词", "u1": "你喜欢干什么平时?", "b1": "玩游戏,逛街,看电影", "u2": "你不喜欢做什么?", "b2": "不喜欢逛街,太累了", "label": 3, "persona": 2}
{"model": "eva", "method": "反义-否定词", "u1": "那你下班之后最喜欢干的事是什么?", "b1": "吃饭睡觉打豆豆", "u2": "下班后你最不喜欢做什么?", "b2": "不喜欢睡觉,也不喜欢吃东西", "label": 3, "persona": 2}
Figure 1: Dialogue contradiction detection requires the full contextual information (including u1 and u2) rather than only the bot's utterances (i.e., b1 and b2).
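Conversations like those in Figure 1 are released in the repositories listed in footnote 1, apparently as JSON Lines with one (u1, b1, u2, b2) record per line. Below is a minimal loading sketch; the field names and file layout are inferred from the released files rather than a documented schema, and the file name in the usage comment is hypothetical.

import json

def load_cdconv(path):
    """Yield (u1, b1, u2, b2, label) tuples from a CDConv-style JSON Lines file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)  # assumed fields: u1, b1, u2, b2, label
            yield (record["u1"], record["b1"],
                   record["u2"], record["b2"], record["label"])

# Hypothetical usage:
# for u1, b1, u2, b2, label in load_cdconv("cdconv.jsonl"):
#     ...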
Such inconsistency or contradiction phenomena violate Grice's cooperative principle (Grice, 1975) and greatly impair users' long-term trust (Huang et al., 2020; Lee et al., 2022).
Dialogue contradiction detection has been shown to be an effective means of improving the consistency of chatbots (Welleck et al., 2019; Nie et al., 2021); however, it remains a challenging task. Specifically, the context-dependent nature of conversations makes it necessary to consider and model contextual information. For instance, in the "Contradiction" example in Figure 1, b2 does not explicitly contradict b1. However, given u1, the actual meaning of b1 is "I like dogs, cats", and b1 and b2 are thus contradictory. In contrast, in the "Non-contradiction" example, while b1 and b2 seem inconsistent ("love" vs. "dislike"), b2 actually means "I dislike noodles" considering the dialogue context. Hence, b2 is compatible with b1 and does not make a contradiction.
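To make the role of context concrete, the following sketch contrasts the two input formulations for a BERT-style pair classifier: encoding only the bot utterances (b1, b2) versus prepending the user turns. It is a minimal illustration, not the paper's baseline setup; the bert-base-chinese checkpoint, the simple concatenation scheme, and the binary label space are all assumptions made here for brevity.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Illustrative stand-ins; CDConv's actual baselines and label set may differ
# (the benchmark also distinguishes three contradiction categories).
tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-chinese", num_labels=2)  # head is untrained here: shapes only

u1, b1 = "你喜欢吃面条吗?", "我喜欢吃米饭!"
u2, b2 = "你不喜欢面条吗?", "不喜欢!"

# (1) Bot-only input: "我喜欢吃米饭!" vs. "不喜欢!" looks contradictory,
# because the elided object of "不喜欢" cannot be recovered without u1/u2.
bot_only = tokenizer(b1, b2, return_tensors="pt", truncation=True)

# (2) Context-aware input: with the user turns, "不喜欢" clearly refers to
# noodles, so b2 is in fact compatible with b1.
with_context = tokenizer(u1 + b1, u2 + b2, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**with_context).logits  # shape: (1, 2)
print(logits.shape)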
Despite the above challenge, existing datasets for contradiction detection (Dziri et al., 2019; Welleck