
(2020) used graph-based methods to build text networks with
words, documents and labels and propagate labeling informa-
tion along the graph via embedding learning. Han and Shen
(2016) encoded weakly supervised information in positive
unlabelled learning tasks into pairwise constraints between training instances imposed on the graph embedding. Recently, Islam and Goldwasser (2022a) proposed a weakly supervised graph embedding-based EM-style framework to characterize user types on social media. Our embedding model is similar
to contrastive learning-based embedding (Wu et al. 2020;
Giorgi et al. 2020). However, contrastive learning is self-
supervised, where labels are generated from the data without
any manual or weak label sources. In our case, we generate
the label using weak supervision. Our work is also closely
related to the entity-targeted sentiment analysis (Mohammad
et al. 2016; Field and Tsvetkov 2019; Mitchell et al. 2013;
Meng et al. 2012). In our work, we use weak supervision to identify the stance and issue of political ads and to analyze political
campaigns. To the best of our knowledge, this is the first
work to utilize a weakly supervised graph embedding based
framework to analyze political campaigns on social media.
Data
We collect around 0.8 million political ads from January–October 2020 using the Facebook Ad Library API with the search terms 'biden', 'harris', 'trump', and 'pence'. All advertisements are written in English. For each ad, the API provides the ad ID, title, ad body and URL, ad creation time, the time span of the campaign, the Facebook page authoring the ad, the funding entity, and the cost of the ad (given as a range). The API also provides information on the users who have seen the ad (called 'impressions'): the total number of impressions (given as a range; we take the average of the endpoints of the range), and the distribution of impressions broken down by gender (male, female, unknown), age (7 groups), and location down to states in the USA. The collected ads contain duplicate content because the same ad is targeted to different regions and demographics under distinct ad IDs. We have 35,327 ads with distinct content and 5,431 unique funding entities, of which 537 explicitly mention candidate names and/or party affiliations, e.g., BIDEN FOR PRESIDENT, DONALD J. TRUMP FOR PRESIDENT, INC.
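Collapsing a reported impressions range to the average of its endpoints, as described above, can be sketched as follows (the function name and the example range are our own illustration, not values from the API):

```python
def impressions_midpoint(lower: int, upper: int) -> float:
    """Collapse an impressions range reported by the API into a single
    value by averaging the range's endpoints."""
    return (lower + upper) / 2

# A hypothetical ad whose impressions are reported as 1,000-4,999.
print(impressions_midpoint(1000, 4999))  # → 2999.5
```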
Holdout Data
For validation purposes, we manually annotate 667 ads for stances and issues. We consider 4 stances, 'pro-biden', 'pro-trump', 'anti-biden', and 'anti-trump', and 13 issues§: 'abortion', 'covid', 'climate', 'criminal justice reform, race, law & order', 'economy and taxes', 'education', 'foreign policy', 'guns', 'healthcare', 'immigration', 'supreme court', 'terrorism', and 'lgbtq'. We also mark 'non-stance' and 'non-issue' ads. Two annotators from the Computer Science department manually annotate a subset of ads to calculate inter-annotator agreement using Cohen's Kappa coefficient (Cohen 1960). This subset has inter-annotator agreement of 77.50% for stance and 69.60% for issue, which constitutes substantial agreement. Disagreements are resolved by discussion.
§https://ballotpedia.org/
ISSUE (UNI, BI, TRI)
Abortion (56, 20, 1)
Covid (52, 23, 5)
Climate (66, 22, 3)
Criminal justice reform, race, law & order (93, 26, 5)
Economy & taxes (41, 16, 2)
Education (62, 22, 2)
Foreign policy (95, 31, 6)
Guns (92, 20, 6)
Healthcare (62, 21, 4)
Immigration (78, 25, 3)
Supreme court (80, 25, 4)
Terrorism (73, 19, 3)
LGBTQ (55, 12, 1)

Table 1: Number of unigrams, bigrams, and trigrams in each issue.
The rest of the data was annotated by one graduate student
from the Computer Science department.
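The agreement figures above are Cohen's kappa values; a minimal computation over two annotators' label sequences can be sketched as follows (the example labels are hypothetical, not from our holdout set):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two annotators."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators label identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under chance, from each annotator's label marginals.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[l] * counts_b[l] for l in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical stance labels from two annotators on five ads.
ann1 = ["pro-biden", "anti-trump", "pro-trump", "pro-biden", "anti-biden"]
ann2 = ["pro-biden", "anti-trump", "pro-trump", "anti-trump", "anti-biden"]
print(round(cohens_kappa(ann1, ann2), 4))  # → 0.7368
```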
Methodology
We represent political advertising activity on social media as
a graph, connecting funding entities to their ads. We repre-
sent the outcome of our analysis, stance and issue predictions,
as separate label nodes in the graph connected via edges to
ads and funding entities. Each issue label-node is associated
with an n-gram lexicon, a set of nodes representing lexical indicators for the issue. Based on known associations between funding entities and stances, we associate 10% of the funding entities and their ads with stance labels. The lexicon and observed stance relations act as a weak form of supervision for graph embedding. Our model learns to generalize the stance predictor to new ads, and by contextualizing the lexicon n-grams based on their occurrences in ads, it learns to associate other ads with the relevant issue even when the lexicon items are not present. This setting is illustrated in Fig. 1. Note that each ad can be associated with multiple issues and stances (e.g., pro-biden and anti-trump).
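A minimal sketch of this graph construction, using only the standard library; the node names, ad IDs, and lexicon entries are illustrative, not our implementation or data:

```python
from collections import defaultdict

# Undirected adjacency sets for the heterogeneous ad graph.
graph = defaultdict(set)

def connect(u, v):
    graph[u].add(v)
    graph[v].add(u)

# Funding entities are connected to the ads they paid for.
connect("funder:Biden Victory Fund", "ad:101")
connect("funder:Keep Trump in office", "ad:202")

# A small fraction (~10%) of funders and their ads carry observed
# stance labels, which act as weak supervision for the embedding.
connect("stance:pro-biden", "funder:Biden Victory Fund")
connect("stance:pro-biden", "ad:101")

# Each issue label node links to its n-gram lexicon nodes, and each
# lexicon n-gram links to the ads in which it occurs.
connect("issue:healthcare", "ngram:medicare")
connect("ngram:medicare", "ad:101")

print(len(graph))  # total number of nodes
```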
Issue Lexicon
To create the issue lexicon, we collect 30 news articles covering each issue from left-leaning, right-leaning, and neutral news media. We know the news source bias from https://mediabiasfactcheck.com/. We calculate Pointwise Mutual Information (PMI) (Church and Hanks 1990) to identify issue-specific lexicons. We calculate the PMI of an n-gram w with issue i as PMI(w, i) = log(P(w|i) / P(w)). To compute P(w|i), we take all news articles related to issue i and compute count(w) / count(all n-grams). We have 30 news articles per issue. P(w) is computed by counting the n-gram w over the whole corpus (390 news articles). We assign each n-gram to the issue with the highest PMI and build an n-gram lexicon for each issue. Table 1 shows the number of unigrams, bigrams, and trigrams with PMI ≥ 0.5 per issue. In this paper, we use only unigrams, resulting in 905 issue-indicating words.
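The lexicon construction described above can be sketched as follows; the toy documents and tokenization are our own illustration, not the news-article corpus:

```python
import math
from collections import Counter

def build_lexicon(articles_by_issue, min_pmi=0.5):
    """Assign each unigram w to the issue i maximizing
    PMI(w, i) = log(P(w|i) / P(w)), keeping it if PMI >= min_pmi."""
    per_issue = {i: Counter(w for doc in docs for w in doc)
                 for i, docs in articles_by_issue.items()}
    issue_totals = {i: sum(c.values()) for i, c in per_issue.items()}
    corpus = Counter()
    for c in per_issue.values():
        corpus.update(c)
    total = sum(corpus.values())

    lexicon = {i: set() for i in articles_by_issue}
    for w, freq in corpus.items():
        p_w = freq / total                       # P(w) over the whole corpus
        best_issue, best_pmi = None, float("-inf")
        for i, c in per_issue.items():
            if c[w] == 0:
                continue
            pmi = math.log((c[w] / issue_totals[i]) / p_w)
            if pmi > best_pmi:
                best_issue, best_pmi = i, pmi
        if best_pmi >= min_pmi:                  # PMI threshold from Table 1
            lexicon[best_issue].add(w)
    return lexicon

# Toy corpus: one short tokenized "article" per issue.
docs = {
    "guns": [["firearm", "background", "checks", "the", "the"]],
    "healthcare": [["medicare", "premiums", "coverage", "the", "the"]],
}
lexicon = build_lexicon(docs)
```

Note that a word such as 'the', which is evenly spread across issues, gets PMI near zero everywhere and is filtered out by the threshold.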
Model
To identify stances and issues, we do the following:
Inferring Stance Labels Using Knowledge.
In some cases,
the names of funding entities capture their bias. For example:
‘Biden Victory Fund’, ‘Keep Trump in office’ clearly state