NSL entered into force. This historical dataset includes
posts and accounts that may have subsequently been
deleted or made private. The archived data consists of
about 2 million Tweets from various user populations
during 2019. Second, we compile datasets of currently
available Tweets and Twitter users from before and af-
ter enactment of the NSL. These datasets contain over
7 million Tweets from Hong Kong users and 8 million
Tweets from a set of control users.
In our analysis to answer RQ1, we find that Hong
Kong users are over a third more likely than a control
sample to protect their accounts and over twice as likely
as control Twitter users to delete past Tweets.
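A statement such as "over a third more likely" can be read as a ratio of two rates. The following is a minimal sketch of that arithmetic, not the analysis code used in the study; the function name and all counts are hypothetical and chosen only for illustration.

```python
def relative_likelihood(group_events, group_total, control_events, control_total):
    """Ratio of the event rate in a group to the event rate in a control group."""
    if group_total <= 0 or control_total <= 0:
        raise ValueError("group sizes must be positive")
    group_rate = group_events / group_total
    control_rate = control_events / control_total
    if control_rate == 0:
        raise ValueError("control rate is zero, so the ratio is undefined")
    return group_rate / control_rate


# Hypothetical counts (NOT taken from the paper): 40 of 1000 users in one
# group protected their accounts, versus 30 of 1000 users in the control group.
ratio = relative_likelihood(40, 1000, 30, 1000)
print(f"relative likelihood: {ratio:.2f}")
```

A ratio above 1 means the behavior is more common in the first group; a ratio of about 1.33 corresponds to "a third more likely", and a ratio above 2 to "over twice as likely".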
To address RQ2, we additionally curate a dataset of
Tweet keywords that were common among Hong Kong
users before the NSL and that are associated with polit-
ically sensitive topics that are censored on social media
platforms in mainland China. We analyze the relative
frequency of Tweets containing politically sensitive key-
words over time for Hong Kong users and for a control
group. We find that Hong Kong users continue to speak
less online about politically sensitive topics.
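The relative-frequency measure described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code from the study: the record layout `(group, month, text)`, the function name, and the placeholder keywords are all hypothetical, and keyword matching is simplified to a case-insensitive substring test.

```python
from collections import defaultdict


def keyword_share_by_month(tweets, keywords):
    """Return {(group, month): fraction of Tweets containing any keyword}.

    `tweets` is an iterable of (group, month, text) tuples. This layout is
    an assumption made for the sketch, not the study's data format.
    """
    lowered = [k.lower() for k in keywords]
    totals = defaultdict(int)
    hits = defaultdict(int)
    for group, month, text in tweets:
        key = (group, month)
        totals[key] += 1
        text_lower = text.lower()
        if any(k in text_lower for k in lowered):
            hits[key] += 1
    return {key: hits[key] / totals[key] for key in totals}


# Placeholder data and keywords, for illustration only.
sample = [
    ("hk", "2020-05", "a post mentioning keyword_a"),
    ("hk", "2020-05", "an unrelated post"),
    ("hk", "2020-08", "another unrelated post"),
    ("control", "2020-05", "KEYWORD_B appears here"),
    ("control", "2020-08", "keyword_a again"),
]
shares = keyword_share_by_month(sample, ["keyword_a", "keyword_b"])
for key in sorted(shares):
    print(key, shares[key])
```

Normalizing by the total number of Tweets per group and month keeps the comparison meaningful when overall posting volume differs between the two groups or changes over time.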
Our case study presents large-scale quantitative evi-
dence that aggressive legislation and policy can quickly
and starkly alter the nature of online political discourse.
2 Background
In this section we present prior research measuring self-censorship
in online discourse, and we offer background
for our Hong Kong case study.
2.1 Measuring self-censorship in online discourse
We define self-censorship, or the chilling effect, consistent
with prior scholarship: when an individual withholds
or falsifies discourse for fear of repercussion [20].
There is a vast literature on measuring online political
discourse [11,13,25]. There is also a large body of qualitative
research, especially in law, public policy, and politics,
about self-censorship and chilling effects [14,19,20].
The media also frequently reports on this phenomenon, often
based on anecdotal evidence or hypotheses by policymakers
[16–18]. There is, however, very little large-scale
empirical research on changes in online political
discourse that are attributable to self-censorship.
Past research has shown that measurable differences
can surface around discrete events that increase the perception
of online surveillance. In the most similar prior
work, Tanash et al. quantified the change in Tweeting
behavior by Turkish users specifically after the 2016
attempted coup in Turkey [23]. The Turkish govern-
ment subsequently arrested thousands of people that it
blamed for plotting the coup, with little due process.
Many of these arrests resulted from investigations into
social media activity, solely on the basis of individu-
als’ online speech and actions. Notably, Tanash et al.
measured both a surge in retroactively deleted tweets
by Turkish users and a significant decrease in certain
politically sensitive tweets from Turkish accounts.
2.2 The Hong Kong national security law
The new national security law for Hong Kong creates
penalties for people who participate in secession, sub-
version of the governments of mainland China or Hong
Kong, terrorist activities, or collusion with a foreign
country to endanger national security. In addition to
having a vague and sweeping scope, the law extends
beyond Hong Kong: Article 38 establishes liability for
offenses that occur “outside the region by a person who
is not a permanent resident of the region” [27].
In the six months after the NSL entered into force,
Hong Kong law enforcement arrested at least 100 indi-
viduals on the basis of the new law [26]. At least 24
of the arrests involved charges related to “seditious” or
“secessionist” speech. The arrestees included legislators,
protestors, student activists, journalists, and an Amer-
ican human rights lawyer. Journalists in Hong Kong
have described their fear of declining press freedoms and
increased self-censorship in the media [3]. Because en-
forcement of the NSL has already targeted online polit-
ical speech, Hong Kongers may have a strong incentive
to self-censor their online social media activity.
3 Methodology
In this section, we describe how we collect data to an-
swer our two research questions. For each, we curate
several large datasets, and we perform various analyses
on the data.
To curate these datasets, we combine various sources
of Twitter data both from archives and from Twitter’s
Full-Archive Search API. We then augment the data
with additional live data from the Twitter API. Next,
we filter and curate these large data sources into smaller
datasets, which we use for analysis. We enumerate our
datasets in Table 1.
Figure 1 illustrates the data collection process for our
study. The code for the data collection and analysis can
be found at https://github.com/citp/hk-twitter.
3.1 Post and account deletion
RQ1: Comparing social media activity by Hong Kong
users before and after enactment of the national secu-