
Unsupervised Domain Adaptation for COVID-19
Information Service with Contrastive Adversarial
Domain Mixup
Huimin Zeng∗, Zhenrui Yue∗, Ziyi Kou∗, Lanyu Shang∗, Yang Zhang†, Dong Wang∗
∗School of Information Sciences
University of Illinois Urbana-Champaign, IL, USA
{huiminz3, zhenrui3, ziyikou2, lshang3, dwang24}@illinois.edu
†Department of Computer Science and Engineering
University of Notre Dame, IN, USA
yzhang42@nd.edu
Abstract—In the real-world application of COVID-19 misinformation detection, a fundamental challenge is the lack of labeled COVID-19 data to enable supervised end-to-end training of the models, especially at the early stage of the pandemic.
To address this challenge, we propose an unsupervised domain
adaptation framework using contrastive learning and adversarial
domain mixup to transfer the knowledge from an existing source
data domain to the target COVID-19 data domain. In particular,
to bridge the gap between the source domain and the target
domain, our method reduces a radial basis function (RBF)-based discrepancy between these two domains. Moreover, we leverage domain adversarial examples to establish an intermediate domain mixup, in which the latent representations of the input text from both domains are mixed during the training process. Extensive experiments on multiple real-world datasets suggest that our method can effectively adapt
misinformation detection systems to the unseen COVID-19 target
domain with significant improvements compared to the state-of-
the-art baselines.
I. INTRODUCTION
In this work, we focus on COVID-19 misinformation detection, given the global impact of the ongoing pandemic and the “Infodemic”¹ it causes on social media [1]. Regarding COVID-19 misinformation detection, language models trained on non-COVID datasets without any fine-tuning on COVID-19-specific data might suffer from a severe generalization issue and perform poorly on COVID-19 data, due to the domain shift between the non-COVID training data distribution and the COVID-19 test data distribution. Recently, the ongoing COVID-19 pandemic has inspired a variety of studies [2] that develop NLP models to provide reliable COVID-19 information services across various social media platforms (e.g., Twitter, Facebook). However, supervised learning approaches often require a large-scale
training dataset, while collecting annotations for COVID-19 training data is extremely expensive and time-consuming due to the cost and complexity of recruiting qualified annotators and keeping the annotations up to date to accommodate the dynamics of COVID-19 knowledge (e.g., different variants of the virus) [3]. Moreover, our unsupervised domain adaptation setting is motivated by the more general setting of any early-stage pandemic (not limited to COVID-19), where no ground-truth information about the novel disease is available at all, yet the need for correct information is urgent. Therefore, it is critical to develop unsupervised domain adaptation frameworks to train COVID-19 models so that knowledge from an existing data domain can be adapted and transferred to the unseen COVID-19 data domain without requiring any ground-truth training labels.
¹https://www.who.int/health-topics/infodemic#tab=tab 1
In this paper, we explore an unsupervised domain adaptation
problem for COVID-19 misinformation detection on social
media. In particular, we propose an unsupervised domain
adaptation framework, Contrastive Adversarial Domain Mixup (CADM), which uses adversarial domain mixup and contrastive
learning to bridge the gap between the source training data
domain and the target COVID data domain. The overview
of our framework is shown in Figure 1. To demonstrate
the effectiveness of the proposed CADM, we evaluate it
on several real-world COVID-19 datasets. Our experimental
results suggest that CADM effectively adapts pre-trained language models to the target COVID-19 domain and consistently
outperforms state-of-the-art baselines.
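To make the two components described above concrete, the sketch below shows one common instantiation of an RBF-kernel domain discrepancy (an MMD-style statistic over latent representations) and a simple convex-combination form of latent-space mixup. This is an illustrative sketch under our own assumptions, not the paper's implementation; the function names `rbf_discrepancy` and `mix_latents` and all parameters are hypothetical.

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel matrix between two sets of latent vectors."""
    # Pairwise squared Euclidean distances between rows of x and rows of y.
    sq_dists = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def rbf_discrepancy(source, target, gamma=1.0):
    """MMD-style discrepancy under an RBF kernel: near zero when the
    source and target latent distributions match, larger as they diverge."""
    k_ss = rbf_kernel(source, source, gamma).mean()
    k_tt = rbf_kernel(target, target, gamma).mean()
    k_st = rbf_kernel(source, target, gamma).mean()
    return k_ss + k_tt - 2.0 * k_st

def mix_latents(h_src, h_tgt, lam):
    """Intermediate-domain mixup: convex combination of paired source and
    target latent representations with mixing coefficient lam in [0, 1]."""
    return lam * h_src + (1.0 - lam) * h_tgt
```

In this illustrative form, minimizing `rbf_discrepancy` over encoder outputs pulls the source and target latent distributions together, while `mix_latents` populates the region between the two domains with intermediate representations for training.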
II. RELATED WORK
Misinformation Detection. Great efforts have been made to detect misinformation on online platforms (e.g., social
media). In [2], knowledge graphs are integrated into the
misinformation detection framework to enhance the model’s
performance. The concurrent work [4] also proposed a domain adaptation framework to address COVID-19 misinformation detection using label correction. However, such misinformation
detection systems are built under a supervised or semi-
supervised learning setting, but in practice labeled COVID-19
misinformation data is not always accessible. Therefore, this
paper focuses on unsupervised domain adaptation of language
models for COVID-19 misinformation detection, where the
arXiv:2210.03250v1 [cs.CL] 6 Oct 2022