
Time  Topics
t0    Scientist Doctor, Covid, Corona Virus, No Mask, Return To Work, Govern, Nose, Mouth, Dread
t1    Real Patriots Wear Mask, Failed Lock Down, 2020 US Election, Save, Immune Compromise, Covid19 Outbreak, Schools Wear a Mask, Corona Virus Pakistan, Corona Virus Canada
t2    Symptom, Temperature, Lock Down, Panic Buy, Cough, I m With Fauci, Trump Kills Us, Mask Up
t3    Therapy, Inject, Wear Your Mask, Trump Is A National Disgrace, Trump Land Slid, Surgeon, Covid 2019 India, Wear You Masks Dont Work
t4    Red State, Blue State, Trump Lies Americans, Lying Trump, Trump Melt Down, End Lock Down

Figure 1: Results from the Twitter stance detection dataset for COVID-19 topics (Glandt et al., 2021). t0 refers to the time span of the earliest 40% of tweets; the rest are equally split into four segments in chronological order, corresponding to t1, t2, t3, and t4. Latent topics from t0 to t4 are shown on the left (reproduced in the table above), with topic words learned by a VAE. Stance detection results over time are shown on the right, where the x-axis indicates test sets from t0 to t4 and the y-axis the prediction accuracy; LSTM results are displayed in light blue and BERT in dark blue.
for topics related to his COVID-19 policies.
To empirically examine how dynamicity affects NLU performance, we experiment in a dynamic setup: the data is split at an absolute time point, where messages posted beforehand are used for training while those posted afterwards are used for testing. In contrast, most social media benchmarks adopt a static setup, where training and test sets are randomly split and hence tend to exhibit similar data distributions (Glandt et al., 2021; Hansen et al., 2021; Mathew et al., 2021). The static setup is thus incapable of reflecting realistic application scenarios: a model usually has to tackle data created after it is trained, while evolving features continuously shift the data distributions.
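To make this contrast concrete, the sketch below implements such a chronological split following the 40%/four-segment protocol of Figure 1. The function name and the `created_at` field are illustrative assumptions, not the benchmarks' actual interface; a static setup would instead shuffle the data before splitting.

```python
# Minimal sketch of the dynamic setup (illustrative names, not the
# benchmarks' actual code): sort messages by timestamp, train on the
# earliest 40%, and cut the remainder into four chronological test
# segments t1..t4.
from typing import List, Tuple

def dynamic_split(posts: List[dict], train_ratio: float = 0.4,
                  n_segments: int = 4) -> Tuple[List[dict], List[List[dict]]]:
    ordered = sorted(posts, key=lambda p: p["created_at"])
    cut = int(len(ordered) * train_ratio)
    train, rest = ordered[:cut], ordered[cut:]
    seg = len(rest) // n_segments
    # Earlier segments hold exactly `seg` posts; the last one absorbs
    # the remainder from integer division.
    segments = [rest[i * seg:(i + 1) * seg] for i in range(n_segments - 1)]
    segments.append(rest[(n_segments - 1) * seg:])
    return train, segments
```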
Language learning under distribution shift (a.k.a. OOD, short for out-of-distribution) has drawn growing attention in the NLP community (Shen et al., 2021; Arora et al., 2021). Most previous work focuses on OOD across different domains (Muandet et al., 2013; Ganin et al., 2015) and studies how to learn generalizable cross-domain features. Here we instead examine OOD in a dynamic environment, whose time-sensitive nature makes the data evolve progressively and continuously, whereas most prior empirical studies discuss OOD across domains and hence focus on relatively discrete shifts from source to target domains (Volpi et al., 2018; Krueger et al., 2021).
To further examine how NLU models adapt to time evolution (henceforth time-adaptive learning), we exploit a small set of unlabeled data posted after a model is trained (henceforth trans-data) and investigate its potential in mitigating the time-shaped feature gap. For methodology, we start from existing solutions in unsupervised domain adaptation (UDA) (Ramponi and Plank, 2020) and employ two popular baselines in this line: one is feature-centric, based on auto-encoding (specifically a variational autoencoder, VAE), and the other is data-centric, based on pseudo-labeling (PL). Furthermore, a joint-training framework is explored to study their coupled effects in fighting the possible performance deterioration over time.
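As a rough illustration of how the two baselines can be coupled, one joint training step could combine the supervised task loss with a VAE loss and a confidence-filtered PL loss on trans-data. This is our hedged reconstruction, not the released implementation: `encoder`, `classifier`, `decoder`, `mu_head`, and `logvar_head` are hypothetical modules, and inputs are assumed to be dense (e.g., bag-of-words) feature vectors.

```python
# A sketch of one joint step coupling the feature-centric VAE and the
# data-centric pseudo-labeling (PL) baselines (illustrative, under the
# assumptions stated above).
import torch
import torch.nn.functional as F

def joint_step(encoder, classifier, decoder, mu_head, logvar_head,
               x_lab, y_lab, x_trans, optimizer,
               vae_weight=0.1, pl_threshold=0.9):
    optimizer.zero_grad()

    # (1) Supervised task loss on labeled data from before the time split.
    task_loss = F.cross_entropy(classifier(encoder(x_lab)), y_lab)

    # (2) VAE loss on unlabeled trans-data: reconstructing inputs from a
    # latent code pushes features to track the shifted distribution.
    h = encoder(x_trans)
    mu, logvar = mu_head(h), logvar_head(h)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
    vae_loss = (F.mse_loss(decoder(z), x_trans)
                - 0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp()))

    # (3) PL loss: self-train on trans-data examples the current model
    # labels with high confidence.
    with torch.no_grad():
        probs = F.softmax(classifier(encoder(x_trans)), dim=-1)
        conf, pseudo = probs.max(dim=-1)
        keep = conf > pl_threshold
    pl_loss = (F.cross_entropy(classifier(encoder(x_trans[keep])), pseudo[keep])
               if keep.any() else x_trans.new_zeros(()))

    loss = task_loss + vae_weight * vae_loss + pl_loss
    loss.backward()
    optimizer.step()
    return loss.item()
```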
The experiments are based on three trending social media tasks concerning the detection of COVID-19 stance (Glandt et al., 2021), fake news (Hansen et al., 2021), and hate speech (Mathew et al., 2021), with benchmark data from Twitter. We also gather a new corpus for hashtag prediction to broaden our scope to the noisy user-generated labels that are abundant on social media.¹ The dynamic setup is adopted, and models are tested on multiple test sets varying in their time gap to the training data, to quantify model sensitivity to time evolution.
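The per-segment evaluation itself reduces to scoring each test set separately, for instance as below (hypothetical names), so that degradation can be read off as a function of the gap to the training period:

```python
# Illustrative sketch: compute accuracy on each time segment.
def accuracy_over_time(predict_fn, segments):
    """`segments` maps segment names (e.g., 't1') to (text, label) pairs."""
    return {name: sum(predict_fn(x) == y for x, y in pairs) / len(pairs)
            for name, pairs in segments.items()}
```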
In the main results, the performance of all models generally degrades over time, implying that the dynamic social media environment may universally and negatively affect NLU effectiveness. With some trans-data, both VAE and PL can help tackle dynamicity, and their joint framework consistently achieves the best results over time. We then analyze the effects of trans-data scale and creation time, and find that both PL and VAE may benefit from larger-scale trans-data with a smaller time gap to the training data. Finally, case studies interpret how VAE and PL collaboratively handle dynamic environments.
To conclude, we present, to the best of our knowledge, the first empirical study of the universal effects of the dynamic social media environment on NLU, and provide insights into when and how UDA methods help advance model robustness over time.
¹ Hashtags are tagged by the author of a post to indicate its topic label and start with a hash symbol "#", e.g., "#COVID19".