Weakly Supervised Learning for Analyzing Political Campaigns on Facebook Tunazzina Islam Shamik Roy Dan Goldwasser Department of Computer Science Purdue University

Weakly Supervised Learning for Analyzing Political Campaigns on Facebook
Tunazzina Islam*, Shamik Roy, Dan Goldwasser
Department of Computer Science, Purdue University
West Lafayette, Indiana 47907
{islam32, roy98, dgoldwas}@purdue.edu
Abstract
Social media platforms are currently the main channel for po-
litical messaging, allowing politicians to target specific demo-
graphics and adapt based on their reactions. However, making
this communication transparent is challenging, as the messag-
ing is tightly coupled with its intended audience and often
echoed by multiple stakeholders interested in advancing spe-
cific policies. Our goal in this paper is to take a first step
towards understanding these highly decentralized settings. We
propose a weakly supervised approach to identify the stance
and issue of political ads on Facebook and analyze how political
campaigns use demographic targeting by location, gender, or age.
Furthermore, we analyze the temporal dynamics of political ads
in relation to election polls.
Introduction
Over the last decade, social media has impacted public
discourse and communication, particularly in the political
context (Kushin and Yamamoto 2010; Wattal et al. 2010;
Ratkiewicz et al. 2011; Stieglitz and Dang-Xuan 2013; Jensen
2017; Marozzo and Bessi 2018; Badawy, Ferrara, and Ler-
man 2018; Ferrara et al. 2020; Sharma, Ferrara, and Liu
2021). Social media has a transformative effect on how po-
litical candidates interact with potential voters by adapting
their messaging to different demographic groups’ specific
concerns and interests. This process, known as microtarget-
ing (Hersh 2015), relies on data-driven campaigning tech-
niques that exploit the rich information collected by social
networks about their users. By measuring the users’ engage-
ment with political content, candidates can identify the issues,
and even the specific phrases and slogans, that resonate with
each demographic group. Furthermore, political campaigns
on social media are highly distributed, with multiple stake-
holders and interest groups using the platforms to advance
their interests and show support for different candidates by
specifically focusing on agenda items relevant for their in-
terests (e.g., the National Rifle Association (NRA) might
emphasize the track-record of each candidate on protecting
gun rights).
Our goal in this paper is to take a first step towards analyzing and monitoring the landscape of political advertising on social media. We focus our experiments on the U.S. 2020 presidential elections, analyzing content supporting either the Biden-Harris or the Trump-Pence campaigns. Our goal is twofold: first, to characterize the different stakeholders and analyze their content, and second, to build on this characterization to analyze political messaging across different demographics. We deal with the decentralized nature of political advertising on social media. We analyze over 5K advertisers (referred to as funding entities) that funded over 800K political ads on Facebook, associating advertisers with a binary label (Pro-Biden or Pro-Trump) and ads with four categories capturing positive or negative messaging and its target. We also identify the specific policy issue discussed in the ad, a 13-class classification problem.
*Corresponding author. Email: islam32@purdue.edu
To clarify, consider the following two ads:
Ad1:
From COVID-19 to the environment to racial justice, Donald
Trump has failed. Joe Biden and Kamala Harris can set us on a
new course. The stakes for Pennsylvanians could not be higher.
Ad2:
President Trump PROTECTED Social Security and
Medicare. Joe Biden tried to cut them MULTIPLE times. President
Trump LOWERED drug costs, and Medicare Advantage Premiums
fell 34%. Under Biden, drug prices SKYROCKETED. Joe Biden
and Kamala Harris’s FAR left plan threatens private insurance and
limits choices.
Ad1 takes Anti-Trump and Pro-Biden stances and covers multiple
issues: covid, climate, and racial justice. Ad2, on the other
hand, takes Pro-Trump and Anti-Biden stances and focuses on
the healthcare issue.
https://www.facebook.com/ads/library/api
arXiv:2210.10669v2 [cs.CL] 9 May 2023
In this paper, we suggest a weakly supervised graph-embedding based framework in which ads and advertisers are learned jointly. In some cases, the names of the advertisers capture their bias, e.g., ‘BIDEN FOR PRESIDENT’, ‘TRUMP MAGA COMMITTEE’; we refer to these as explicit advertisers. Other advertisers do not explicitly mention any candidate name or party affiliation in their names, e.g., ‘Union 2020’, ‘Plains PAC’; we call them implicit advertisers. We leverage weak supervision from explicit advertisers for the stance prediction. Our embedding objective is derived from dedicated lexicons developed for identifying policy issues, and by identifying the political position of a small number of advertisers (e.g., the position of the Trump-Pence campaign is known). During learning, our model learns the associations between advertisers and their positions and the content they publish, as well as the issues they are mostly concerned with. Fig. 1 represents the embedding graph of our framework, containing nodes (ads, funding entities, stances, issues, issue lexicons) and edges representing their relationships; e.g., as Ad1 has a Pro-Biden stance, there is an edge between the nodes Ad1 and Pro-Biden. We learn a graph embedding to maximize the similarity between neighboring nodes (Perozzi, Al-Rfou, and Skiena 2014; Tang et al. 2015; Grover and Leskovec 2016). Fig. 1 shows that Ad1 and Ad2 have weak labels for stances and issues, obtained from their funding entities and issue lexicons, respectively. Initially, Ad3 does not have a label. It has edges to Funding Entity1 and to the issue lexicon nodes ‘covid’ and ‘mask’. We exploit these resources to train our embedding model, which captures the context and, as a result, can generalize and grasp stances and issues in new ads (e.g., Ad3 in Fig. 1).
Figure 1: Political advertising graph capturing relations among ads, funding entities, stances, issues, and issue lexicons. A funding entity connects to Ad1, which has both ‘pro-biden’ and ‘anti-trump’ stances, on issues related to ‘climate’ and ‘coronavirus’.
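As a rough illustration of the neighbor-similarity objective described above, the following sketch trains node embeddings on a toy graph shaped like Fig. 1. Everything in it is an assumption made for illustration: the node names and edges, the dot-product similarity with a logistic loss, the sampling of non-neighbors as negatives, and the hyperparameters. It is not the authors' objective or implementation.

```python
import numpy as np

# Toy graph loosely modeled on Fig. 1 (all nodes and edges are illustrative).
EDGES = [
    ("FundingEntity1", "Ad1"), ("FundingEntity1", "Ad3"),
    ("FundingEntity2", "Ad2"),
    ("Ad1", "pro-biden"), ("Ad1", "anti-trump"),
    ("Ad2", "pro-trump"), ("Ad2", "anti-biden"),
    ("Ad1", "issue:covid"), ("Ad2", "issue:healthcare"),
    ("Ad3", "lex:covid"), ("Ad3", "lex:mask"),
    ("lex:covid", "issue:covid"), ("lex:mask", "issue:covid"),
]
NODES = sorted({n for edge in EDGES for n in edge})
INDEX = {name: i for i, name in enumerate(NODES)}
PAIRS = [(INDEX[u], INDEX[v]) for u, v in EDGES]
NEIGHBORS = {(u, v) for u, v in PAIRS} | {(v, u) for u, v in PAIRS}


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def positive_loss(emb):
    """Mean negative log-likelihood of the observed edges."""
    scores = np.array([emb[u] @ emb[v] for u, v in PAIRS])
    return float(np.mean(-np.log(sigmoid(scores))))


def train(emb, epochs=300, lr=0.1, seed=0):
    """SGD: pull neighbors together, push one sampled non-neighbor apart."""
    rng = np.random.default_rng(seed)
    emb = emb.copy()
    n_nodes = len(NODES)
    for _ in range(epochs):
        for u, v in PAIRS:
            # Positive pair: d/ds of -log(sigmoid(s)) is sigmoid(s) - 1.
            g = sigmoid(emb[u] @ emb[v]) - 1.0
            grad_u, grad_v = g * emb[v], g * emb[u]
            emb[u] -= lr * grad_u
            emb[v] -= lr * grad_v
            # Negative pair: a random node that is not a neighbor of u.
            candidates = [k for k in range(n_nodes)
                          if k != u and (u, k) not in NEIGHBORS]
            k = int(rng.choice(candidates))
            g = sigmoid(emb[u] @ emb[k])  # d/ds of -log(sigmoid(-s))
            grad_u, grad_k = g * emb[k], g * emb[u]
            emb[u] -= lr * grad_u
            emb[k] -= lr * grad_k
    return emb


init = np.random.default_rng(42).normal(scale=0.1, size=(len(NODES), 16))
trained = train(init)
loss_before, loss_after = positive_loss(init), positive_loss(trained)

# Rank the stance nodes for the unlabeled Ad3 by embedding similarity.
STANCES = ["pro-biden", "anti-trump", "pro-trump", "anti-biden"]
ad3_ranking = sorted(
    STANCES, key=lambda s: -(trained[INDEX["Ad3"]] @ trained[INDEX[s]])
)
```

Comparing `loss_before` with `loss_after` shows whether connected nodes have moved closer in the embedding space. `ad3_ranking` shows how an unlabeled ad could be scored against the stance nodes; on a graph this small the ranking is not meaningful and only illustrates the mechanics.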
The learned model allows us to analyze how political can-
didates and stakeholders micro-target specific demographics.
In this work, we examine a novel dataset of advertisements
posted to Facebook during the 2020 U.S. presidential election
and make the full dataset available to the research community.
Using the information provided by Facebook Political Ads
API, we analyze the issues used for supporting (and attack-
ing) each candidate on ads targeting different geographical
regions, age, and gender groups. We evaluate the quality of
the learned model by applying it to detect the stance of im-
plicit funding entities and compare it with their views (ground
truth). Further, we discuss how election polls affect political
campaigns using Granger causality (Granger 1988). We focus
on the following research questions (RQ) to analyze political
campaigns:
RQ1. Can we analyze political campaigns without direct supervision? (Section Results and Analysis)
RQ2. Are messages distinctive in ads? (Subsection Descriptive Insights)
RQ3. Which demographics are reached by advertisers? (Subsection Audience Demographics)
RQ4. How are specific regions reached by advertisers and their messages? (Subsection State-wise Issue and Demographics)
RQ5. Are election polls represented in ad campaigns? (Subsection Granger Causality with Polls)
Our contributions are summarized as follows:
1. We formulate a novel problem of exploiting weak supervision to analyze the landscape of political advertising on social media.
2. We propose a weakly supervised graph embedding based framework to identify the political stance of advertisers as well as of the published content, and the issues of that content. We show that our model outperforms the baselines.
3. We conduct quantitative and qualitative analysis on a real-world dataset to demonstrate the effectiveness of our proposed model.
Our code and data are publicly available at https://github.com/tunazislam/weaklysup-FB-ad-political.
Related Work
During elections, political candidates use social media for their campaigns. Recent works monitor and analyze targeted advertising on social media (Andreou et al. 2019; Silva et al. 2020b; Serrano et al. 2020). Islam and Goldwasser (2022b) analyzed the Covid-19 vaccine campaign on Facebook. Ribeiro et al. (2019) analyzed political ads on Facebook that are linked to a Russian propaganda group, the Internet Research Agency (IRA). Silva et al. (2020a) analyzed engagement with Facebook ads (created by the IRA) targeting the 2016 U.S. general election. Silva et al. (2020b) designed a system to monitor political ads on Facebook in Brazil and deployed it during the Brazilian 2018 elections. Capozzi et al. (2020, 2021) examined advertising concerning the issue of immigration in Italy. Our paper detects the stance and issue of political ads on Facebook. It analyzes political campaigns for both candidates based on the target audience’s demographic and geographic information and presents a temporal analysis for the 2020 U.S. presidential election.
Recent works frame the issue of perspective detection as a text categorization problem (Greene and Resnik 2009; Klebanov, Beigman, and Diermeier 2010; Recasens et al. 2013; Iyyer et al. 2014; Johnson and Goldwasser 2016). It is typically studied as a supervised learning task (Lin et al. 2006; Durant and Smith 2006; Greene and Resnik 2009). In contrast, our approach relies on weak supervision and lexicon based approaches (Roy and Goldwasser 2020; Field, Kliger et al. 2018). Weakly supervised methods reduce dependence on labeled texts. Graph based semi-supervised algorithms have received considerable attention over the years (Zhu and Ghahramani 2002; Belkin, Niyogi, and Sindhwani 2006; Subramanya and Bilmes 2008; Talukdar et al. 2008; Sindhwani and Melville 2008; Yang, Cohen, and Salakhudinov 2016; Hisano 2018). Tang, Qu, and Mei (2015) and Zhang et al. (2020) used graph-based methods to build text networks with words, documents, and labels and propagate labeling information along the graph via embedding learning. Han and Shen (2016) encoded weakly supervised information in positive unlabelled learning tasks into pairwise constraints between training instances imposed on graph embedding. Recently, Islam and Goldwasser (2022a) proposed a weakly supervised graph embedding based EM-style framework to characterize user types on social media. Our embedding model is similar to contrastive learning-based embedding (Wu et al. 2020; Giorgi et al. 2020). However, contrastive learning is self-supervised, where labels are generated from the data without any manual or weak label sources. In our case, we generate the labels using weak supervision. Our work is also closely related to entity-targeted sentiment analysis (Mohammad et al. 2016; Field and Tsvetkov 2019; Mitchell et al. 2013; Meng et al. 2012). In our work, we use weak supervision to identify the stance and issue of political ads and analyze political campaigns. To the best of our knowledge, this is the first work to utilize a weakly supervised graph embedding based framework to analyze political campaigns on social media.
Data
We collect around 0.8 million political ads from January to October 2020 using the Facebook Ad Library API with the search terms ‘biden’, ‘harris’, ‘trump’, ‘pence’. All advertisements are written in English. For each ad, the API provides the ad ID, title, ad body, and URL, the ad creation time and the time span of the campaign, the Facebook page authoring the ad, the funding entity, and the cost of the ad (given as a range). The API also provides information on the users who have seen the ad (called ‘impressions’): the total number of impressions (given as a range; we take the average of the end points of the range) and the distribution over impressions broken down by gender (male, female, unknown), age (7 groups), and location down to states in the USA. We have duplicate content among the collected ads because the same ad has been targeted to different regions and demographics, each time with a unique ad ID. We have 35327 ads with different contents and 5431 unique funding entities, among which 537 explicitly mention candidate names and/or party affiliations, e.g., BIDEN FOR PRESIDENT, DONALD J. TRUMP FOR PRESIDENT, INC.
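The two preprocessing steps mentioned above, taking the midpoint of a reported range and collapsing ads that share the same content, could look roughly like the sketch below. The record layout (`id`, `body`, `impressions` with `lower_bound` and `upper_bound`) and the whitespace and case normalization used to decide that two ads have the same content are assumptions for illustration, not details given in the paper.

```python
from collections import defaultdict


def range_midpoint(bounds):
    """Average of the end points of a reported range (impressions or cost)."""
    return (float(bounds["lower_bound"]) + float(bounds["upper_bound"])) / 2


def group_by_content(ads):
    """Group ad IDs whose bodies match after whitespace/case normalization."""
    groups = defaultdict(list)
    for ad in ads:
        key = " ".join(ad["body"].split()).lower()  # assumed notion of "same content"
        groups[key].append(ad["id"])
    return dict(groups)


# Hypothetical records; the real API response may be shaped differently.
ads = [
    {"id": "a1", "body": "Vote early!",
     "impressions": {"lower_bound": "1000", "upper_bound": "1999"}},
    {"id": "a2", "body": "vote  early!",
     "impressions": {"lower_bound": "0", "upper_bound": "999"}},
    {"id": "a3", "body": "Protect Medicare.",
     "impressions": {"lower_bound": "5000", "upper_bound": "5999"}},
]
unique_contents = group_by_content(ads)
midpoints = {ad["id"]: range_midpoint(ad["impressions"]) for ad in ads}
```

Here `unique_contents` maps each distinct ad text to the ad IDs that carry it, which is the kind of grouping that separates the roughly 0.8 million collected ads from the 35327 distinct contents.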
Holdout Data
For validation purposes, we manually annotate 667 ads for stances and issues. We consider 4 stances (‘pro-biden’, ‘pro-trump’, ‘anti-biden’, ‘anti-trump’) and 13 issues§: ‘abortion’, ‘covid’, ‘climate’, ‘criminal justice reform, race, law & order’, ‘economy and taxes’, ‘education’, ‘foreign policy’, ‘guns’, ‘healthcare’, ‘immigration’, ‘supreme court’, ‘terrorism’, ‘lgbtq’. We also mark ‘non-stance’, ‘non-issue’ ads. Two annotators from the Computer Science department manually annotate a subset of ads to calculate inter-annotator agreement using Cohen’s Kappa coefficient (Cohen 1960). This subset has inter-annotator agreements of 77.50% for stance and 69.60% for the issue, which are substantial agreements. In case of a disagreement, we resolve it by discussion. The rest of the data was annotated by one graduate student from the Computer Science department.
§https://ballotpedia.org/
ISSUE                                        UNI   BI  TRI
Abortion                                      56   20    1
Covid                                         52   23    5
Climate                                       66   22    3
Criminal justice reform, race, law & order    93   26    5
Economy & taxes                               41   16    2
Education                                     62   22    2
Foreign policy                                95   31    6
Guns                                          92   20    6
Healthcare                                    62   21    4
Immigration                                   78   25    3
Supreme court                                 80   25    4
Terrorism                                     73   19    3
LGBTQ                                         55   12    1
Table 1: Number of unigrams, bigrams, and trigrams in each issue.
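For reference, Cohen's kappa compares the observed agreement p_o between two annotators with the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below computes it from scratch for single-label annotations; the example labels are made up, and how multi-label ads were handled in the paper's agreement computation is not stated, so this covers only the single-label case.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the annotators' label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1.0 - expected)


# Made-up annotations for four ads.
annotator_1 = ["pro-biden", "pro-biden", "pro-trump", "pro-trump"]
annotator_2 = ["pro-biden", "pro-trump", "pro-trump", "pro-trump"]
kappa = cohens_kappa(annotator_1, annotator_2)  # p_o = 0.75, p_e = 0.5
```

In this toy case the annotators agree on three of four ads, but half of that agreement is expected by chance, so kappa is 0.5.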
Methodology
We represent political advertising activity on social media as a graph, connecting funding entities to their ads. We represent the outcome of our analysis, stance and issue predictions, as separate label nodes in the graph connected via edges to ads and funding entities. Each issue label-node is associated with an n-gram lexicon, a set of nodes representing lexical indicators for the issue. Based on known associations between funding entities and stances, we associate 10% of the funding entities and their ads with stance labels. The lexicon and observed stance relations act as a weak form of supervision for graph embedding. Our model learns to generalize the stance predictor to new ads, and by contextualizing the lexicon n-grams based on their occurrence in ads, we learn to associate other ads with the relevant issue even when the lexicon items are not present. These settings are described in Fig. 1. Note that each ad can be associated with multiple issues and stances (e.g., pro-biden and anti-trump).
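The graph construction described above can be sketched as an edge list in which stance labels flow from explicit funding entities to their ads. The function names, the example entities, and the stance assigned to each entity are hypothetical; the paper's exact rule for mapping an entity's known position onto its ads is not reproduced here.

```python
def weak_stance_labels(ad_to_entity, entity_stances):
    """Copy the known stances of explicit funding entities onto their ads."""
    return {
        ad: sorted(entity_stances[entity])
        for ad, entity in ad_to_entity.items()
        if entity in entity_stances  # ads of implicit entities stay unlabeled
    }


def build_edges(ad_to_entity, ad_stances, ad_lexicon_hits, lexicon_to_issue):
    """Edges among funding entities, ads, stances, lexicon n-grams, and issues."""
    edges = [(entity, ad) for ad, entity in ad_to_entity.items()]
    edges += [(ad, s) for ad, stances in ad_stances.items() for s in stances]
    edges += [(ad, w) for ad, words in ad_lexicon_hits.items() for w in words]
    edges += [(w, issue) for w, issue in lexicon_to_issue.items()]
    return edges


# Hypothetical toy inputs.
ad_to_entity = {"Ad1": "BIDEN FOR PRESIDENT", "Ad3": "Union 2020"}
entity_stances = {"BIDEN FOR PRESIDENT": {"pro-biden"}}  # explicit entity only
ad_lexicon_hits = {"Ad3": ["covid", "mask"]}  # lexicon words found in the ad text
lexicon_to_issue = {"covid": "issue:covid", "mask": "issue:covid"}

labels = weak_stance_labels(ad_to_entity, entity_stances)
edges = build_edges(ad_to_entity, labels, ad_lexicon_hits, lexicon_to_issue)
```

Ads funded by implicit entities, such as `Ad3` here, receive no stance edge at this point. They are connected to the rest of the graph only through their funding entity and lexicon hits, which is what the embedding has to generalize from.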
Issue Lexicon
To create the issue lexicon, we collect 30 news articles covering each issue from left leaning, right leaning, and neutral news media. We know the news source bias from https://mediabiasfactcheck.com/. We calculate the Pointwise Mutual Information (PMI) (Church and Hanks 1990) to identify issue-specific lexicons. We calculate the PMI for an n-gram w with issue i as $\mathrm{PMI}(w, i) = \log \frac{P(w \mid i)}{P(w)}$. To compute $P(w \mid i)$, we take all news articles related to an issue i and compute $\frac{\mathrm{count}(w)}{\mathrm{count}(\text{all } n\text{-grams})}$. We have 30 news articles per issue. $P(w)$ is computed by counting the n-gram w over the whole corpus (390 news articles). We assign each n-gram to the issue with the highest PMI and build an n-gram lexicon for each issue. Table 1 shows the number of unigrams, bigrams, and trigrams with a PMI threshold of 0.5 per issue. In this paper, we use only unigrams, resulting in 905 issue-indicating words.
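The PMI-based lexicon construction above can be sketched for unigrams as follows. Whitespace tokenization, lowercasing, the natural logarithm, and treating 0.5 as an inclusive lower bound are assumptions; the function name and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def build_issue_lexicon(articles_by_issue, threshold=0.5):
    """Assign each unigram to its highest-PMI issue, keeping PMI >= threshold."""
    issue_counts = {
        issue: Counter(tok for text in texts for tok in text.lower().split())
        for issue, texts in articles_by_issue.items()
    }
    issue_totals = {issue: sum(c.values()) for issue, c in issue_counts.items()}
    corpus_counts = Counter()
    for counts in issue_counts.values():
        corpus_counts.update(counts)
    corpus_total = sum(corpus_counts.values())

    lexicon = {issue: {} for issue in articles_by_issue}
    for word, corpus_count in corpus_counts.items():
        p_w = corpus_count / corpus_total  # P(w) over the whole corpus
        best_issue, best_pmi = None, -math.inf
        for issue, counts in issue_counts.items():
            if counts[word] == 0:
                continue  # the word never occurs for this issue
            p_w_given_i = counts[word] / issue_totals[issue]  # P(w | i)
            pmi = math.log(p_w_given_i / p_w)
            if pmi > best_pmi:
                best_issue, best_pmi = issue, pmi
        if best_pmi >= threshold:
            lexicon[best_issue][word] = best_pmi
    return lexicon


# Toy corpus: one short "article" per issue.
toy_articles = {
    "guns": ["the gun rights gun"],
    "covid": ["the mask covid vaccine"],
}
toy_lexicon = build_issue_lexicon(toy_articles)
```

In the toy corpus, "gun" is twice as frequent within the guns articles as in the corpus overall, giving a PMI of log 2 (about 0.69), so it enters the guns lexicon. "the" is equally frequent everywhere, has a PMI of 0, and is dropped.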
Model
To identify stances and issues, we do the following:
Inferring Stance Labels Using Knowledge.
In some cases,
the names of funding entities capture their bias. For example:
‘Biden Victory Fund’, ‘Keep Trump in office’ clearly state