RESEARCH
Exposing Influence Campaigns in the Age of
LLMs: A Behavioral-Based AI Approach to
Detecting State-Sponsored Trolls
Fatima Ezzeddine1,2*, Omran Ayoub1, Silvia Giordano1, Gianluca Nogara1, Ihab Sbeity2, Emilio Ferrara3 and Luca Luceri1,3
*Correspondence:
fatima.ezzeddine@supsi.ch
1University of Applied Sciences
and Arts of Southern Switzerland,
Department of Innovative
Technologies, Lugano, Switzerland
Full list of author information is
available at the end of the article
Abstract
The detection of state-sponsored trolls operating in influence campaigns on social
media is a critical and unsolved challenge for the research community, which has
significant implications beyond the online realm. To address this challenge, we
propose a new AI-based solution that identifies troll accounts solely through
behavioral cues associated with their sequences of sharing activity, encompassing
both their actions and the feedback they receive from others. Our approach does
not incorporate any textual content shared and consists of two steps: First, we
leverage an LSTM-based classifier to determine whether account sequences
belong to a state-sponsored troll or an organic, legitimate user. Second, we
employ the classified sequences to calculate a metric named the “Troll Score”,
quantifying the degree to which an account exhibits troll-like behavior. To assess
the effectiveness of our method, we examine its performance in the context of the
2016 Russian interference campaign during the U.S. Presidential election. Our
experiments yield compelling results, demonstrating that our approach can
identify account sequences with an AUC close to 99% and accurately differentiate
between Russian trolls and organic users with an AUC of 91%. Notably, our
behavioral-based approach holds a significant advantage in the ever-evolving
landscape, where textual and linguistic properties can be easily mimicked by Large
Language Models (LLMs): In contrast to existing language-based techniques, it
relies on more challenging-to-replicate behavioral cues, ensuring greater resilience
in identifying influence campaigns, especially given the potential increase in the
usage of LLMs for generating inauthentic content. Finally, we assess the generalizability of our solution to various entities driving different information operations and find promising results that will guide future research.
Keywords: social network; troll; misinformation
1 Introduction
Social Media Networks (SMNs) are a crucial constituent of societies, providing a
primary platform for individuals to engage in social and political discourse, as well
as to disseminate critical messages and promote propaganda. SMNs have undergone
a significant transformation, evolving from a simple aggregation medium to a com-
plex ecosystem where the line between offline and online realms is often blurred [1].
Recent studies have shown that the impact of discussions on SMNs extends beyond
the online platform and can have a significant effect on societies, such as undermin-
ing the integrity of political elections and public health [2–6].
In this context, the accuracy, confidentiality, and authenticity of shared content
are crucial elements for safe communication and, therefore, the well-being of so-
cieties. However, SMNs often fall short of these requirements, as their growth has led to an increase in deceptive and fraudulent accounts that intentionally damage the credibility of online discussions [7,8]. The activity of these accounts
often results in online harms that threaten the honesty and ethics of conversations,
such as the propagation of hate speech, incitement of violence, and dissemination
of misleading and controversial content. This has been observed in recent debates concerning the Ukraine-Russia conflict [9] and the COVID-19 pandemic [10–14], as well as in the rise of conspiracy theories [15–17]. These fraudulent accounts, whose activity has the potential to exacerbate societal divisions and affect the sovereignty of elections [18–23], represent a significant threat to healthy online conversations.
In the political sphere, Russian meddling in the 2016 U.S. Presidential election represents the most prominent case of a deceptive online interference campaign [24,25].
The Mueller report [26] suggests that Russia engaged in extensive attacks on the
U.S. election system to manipulate the outcome of the 2016 voting event. The
“sweeping and systematic” interference allegedly used bots (i.e., automated ac-
counts) and trolls (i.e., state-sponsored human operators) to spread politically bi-
ased and false information [27]. In the aftermath of the election, the U.S. Congress
released a list of 2,752 Twitter accounts associated with Russia’s “Internet Research
Agency” (IRA), known as Russian trolls. As a result, significant research efforts were
launched to identify fraudulent accounts and deceptive activity on several SMNs.
Among these platforms, Twitter has been continuously working to eliminate malicious entities involved in information operations across different countries [28–30] and different geopolitical events [31,32]. While there are several proven techniques for uncovering bot accounts [33–39], the detection of troll accounts remains an unsolved problem for the research community, due to several factors tied to the human nature of trolls [40]. Note that throughout this manuscript, our definition of troll is limited to state-sponsored human actors who have a political agenda and operate in coordinated influence campaigns, thus disregarding other hateful and harassing online activities associated with Internet-mediated trolling behavior.
Recent efforts have devised approaches for identifying trolls by leveraging linguistic cues and profile metadata [41–45]. Although these approaches have shown
promising results, they suffer from certain limitations. Some of these methods are
language-dependent, focusing solely on specific spoken languages associated with
the trolls under investigation [46,47]. Others are constrained to a single SMN, re-
lying on profile metadata and platform-specific information. Furthermore, the ease
of imitating language and linguistic cues has increased with the emergence of Large
Language Models (LLMs), such as ChatGPT and similar technologies. As we look
ahead, our ability to detect influence operations based solely on linguistic cues
may be hindered by the growing reliance on LLMs for such operations [48,49].
These significant limitations have prompted research efforts to develop language-
and content-agnostic approaches, as demonstrated in the work of Luceri et al. [50].
This approach distinguishes troll accounts by uncovering behavioral incentives from
their observed activities using an Inverse Reinforcement Learning (IRL) framework.
Given that mimicking behaviors and incentives is notably more challenging than
imitating language, incorporating behavioral cues either in addition to or as an al-
ternative to purely linguistic-based methods emerges as a promising strategy in an
uncertain future, particularly when the cost of generating inauthentic, yet credible,
content appears to be exceptionally low [51,52].
In this work, we advance along this research line and propose a novel approach
to identify state-sponsored troll activity solely based on behavioral cues linked to
accounts’ sharing activities on Twitter. Specifically, we consider online activities
regardless of the content shared, the language used, and the linked metadata to
classify accounts as trolls or organic, legitimate users (from now on, simply users).
Our approach aims to capture cues of behavior that differentiate trolls from users
by analyzing their interactions and responses to feedback. For this purpose, we
consider both the actions performed by an account, namely active online activities,
and the feedback received by others, namely passive online activities, e.g., received
replies and retweets. We refer to the sequence of active and passive activities as a
trajectory, in accordance with [50]. We demonstrate the validity of our approach by
detecting Russian trolls involved in the interference with the 2016 U.S. Presidential
election. We also evaluate whether the proposed approach can be effectively used to
identify various entities involved in diverse Twitter information operations during
the 2020 U.S. Presidential election.
Contributions of this work. The core contributions of this work are summa-
rized as follows:
• We propose a novel approach based on Long Short-Term Memory (LSTM) networks for classifying accounts' trajectories. Our approach correctly identifies trolls' and users' trajectories with an AUC and an F1-score of about 99%.
• Leveraging the classified trajectories, we introduce a metric, the Troll Score, that quantifies the extent to which an account exhibits behavior akin to that of a state-sponsored troll. We propose a Troll Score-based classifier that effectively detects troll accounts, achieving an AUC of about 91% (F1-score 90%); an illustrative code sketch of the trajectory classifier and the Troll Score follows this list. Our approach outperforms existing behavioral-based methods and approaches the classification performance of existing linguistic solutions, all without requiring access to the content of shared messages. This feature enhances its robustness, especially given the possibility of increased usage of LLMs for influence operations.
• By analyzing the active and passive activities in which accounts engage, we uncover three distinct, naturally emerging behavioral clusters in which trolls intermingle with user accounts. This finding confirms the difficulty of differentiating these two account classes when their trajectories are not considered.
• We demonstrate the capability of our approach to generalize and accurately identify diverse actors responsible for driving information operations. Our methodology achieves an AUC of 80% (F1-score 82%) in detecting the drivers of different campaigns, indicating promising applicability across countries, languages, and various malicious entities.
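To make the first two contributions concrete, the following is a minimal sketch of how such a pipeline could be implemented; it is not the authors' released code. The integer encoding of state-action pairs, the layer sizes, the decision threshold, and the fraction-based definition of the Troll Score are assumptions made purely for exposition.

import torch
import torch.nn as nn

NUM_PAIRS = 11  # the 11 valid state-action combinations defined in Section 3

class TrajectoryClassifier(nn.Module):
    """Binary LSTM classifier over sequences of state-action pair tokens."""
    def __init__(self, embed_dim=16, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(NUM_PAIRS, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, x):                # x: (batch, seq_len) integer tokens
        e = self.embed(x)                # (batch, seq_len, embed_dim)
        _, (h, _) = self.lstm(e)         # h[-1]: final hidden state per sequence
        return torch.sigmoid(self.head(h[-1])).squeeze(-1)  # P(troll-like)

def troll_score(model, trajectories, threshold=0.5):
    """Hypothetical Troll Score: the fraction of an account's trajectories
    classified as troll-like. The paper's exact formula may differ."""
    with torch.no_grad():
        probs = torch.cat([model(t.unsqueeze(0)) for t in trajectories])
    return (probs > threshold).float().mean().item()

An account-level classifier can then be obtained by thresholding the Troll Score itself, mirroring the Troll Score-based classifier evaluated above.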
2 Related Work
In this Section, we survey research on the automated detection of malicious accounts
operated by trolls, with a focus on the troll farm connected to the IRA [53]. Some
of these efforts have proposed linguistic approaches that rely on the content posted
by trolls to identify and detect them. For instance, [46] presented a theory-driven
linguistic study of Russian trolls’ language and demonstrated how deceptive linguis-
tic signals can contribute to accurate troll identification. Similarly, [47] proposed an
automated reasoning mechanism for hunting trolls on Twitter during the COVID-19
pandemic, which leverages a unique linguistic analysis based on adversarial machine
learning and ambient tactical deception. In [54], the authors proposed a deep learn-
ing solution for troll detection on Reddit and analyzed the shared content using
natural language processing techniques. Other works have considered fusing users’
metadata and linguistic features, such as [41], which used profile description, stop
word usage, language distribution, and bag-of-words features for detecting Rus-
sian trolls. Other approaches have relied on multimedia analysis, combining text,
audio, and video analysis to detect improper material or behavior [55,56]. For in-
stance, [55] designed a platform for monitoring social media networks with the aim
of automatically tracking malicious content by analyzing images, videos, and other
media. In [56], the authors attempted to capture disinformation and trolls based on
the existence of a firearm in images using the Object Detection API. A limitation of
these works is their reliance on the content posted by accounts and on the extraction
of linguistic features for troll identification. In contrast, our approach solely relies
on the online behavior of accounts, specifically, the temporal sequence of online
activities performed by a user. This presents an advantage over previous works, as
it is independent of the language used or content shared and has, therefore, the po-
tential to generalize to influence campaigns originating from diverse countries and
be resilient to the use of LLMs for generating inauthentic content.
Previous studies have proposed sequence analysis approaches for identifying ma-
licious accounts. For example, Kim et al. [57] used text and time as features to
categorize trolls into subgroups based on the temporal and semantic similarity of
their shared content. Luceri et al. [50] proposed a solution that only relies on the
sequence of users’ activity on online platforms to capture the incentives that the
two classes of accounts (trolls vs. users) respond to. They detect troll accounts with
a supervised learning approach fed by the incentives estimated via Inverse Rein-
forcement Learning (IRL). In [58], the authors proposed a model based on users’
sequence of online actions to identify clusters of accounts with similar behavior.
However, this approach was found to be ineffective in detecting Russian trolls, as
reported in [50]. Similarly to these approaches, we propose a language- and content-
agnostic method for identifying trolls based only on the sharing activities performed
by the accounts on Twitter. We utilize deep learning, specifically LSTM, to classify
the sequence of activities as belonging to either troll accounts or organic users. We
leverage the classified sequences to quantify the extent to which an account behaves
like a troll, a feature not available in earlier methods.
3 Problem Formulation and Trajectory Definition
This section outlines the objectives of the proposed framework and elucidates the
features, variables, and learning tasks integral to it. While existing methods for
identifying troll activity in SMNs rely on linguistic and metadata features, our
approach strives to be language- and content-agnostic. To achieve this, we rely only
on behavioral cues and do not incorporate any textual or media content shared
by the accounts, nor do we use their profile metadata. Consequently, our approach
holds the potential for application across various SMNs and is robust against the
increasing use of LLMs and their potential role in influence campaigns [48,49,51].
To extract the unique online behaviors of trolls and organic users on Twitter, we
extract the accounts’ sequences of online activities. We consider their active online
activities, including generating an original post (i.e., tweet), re-sharing another post
(i.e., retweet), commenting on an existing post (i.e., reply), or mentioning another user
in a post (i.e., mention). In addition, we also propose to consider the feedback
the accounts receive from other accounts, namely passive online activities, such
as receiving a retweet (i.e., being retweeted), a comment (i.e., being replied to),
or a mention (i.e., being mentioned in a post). By considering both the actions
performed by the accounts and the feedback received, we aim to capture the distinct
motivations driving trolls’ activity, which may differ from those of organic users [50].
The rationale is that trolls might be motivated to pursue their agenda regardless
of the feedback received from others, while organic users may be influenced by the
level of endorsement they receive from the online community. For example, users
might be more motivated to generate a new tweet when their content is re-shared
by others, which is also viewed as a form of social endorsement [59,60], or when
they receive positive comments.
To formalize this approach, we model the SMN Twitter as a Markov Decision
Process (MDP). Similarly to Luceri et al. [50], we represent Twitter as an environ-
ment composed of multiple agents (i.e., Twitter accounts) that can perform a set
of actions (i.e., active online activities) and receive feedback from the environment
(i.e., passive online activities). Consistent with the IRL formulation in [50], we
refer to the latter as states, as they represent the response of the Twitter environ-
ment, whereas we refer to the former as actions, as they indicate the active online
activities that an account can perform on Twitter.
We consider four actions that can be performed by Twitter accounts:
• Original tweet (tw): to generate original content;
• Retweet (rt): to re-share content generated by others;
• Interact with others (in): to interact with other users via replies or mentions;
• No Action (no): to keep silent, i.e., the account does not perform any action.
As for states, we consider three possible types of feedback that Twitter accounts can receive:
• Retweeted (RT): an original tweet generated by the account is re-shared;
• Interacted with (IN): the account is involved by others via replies (i.e., comments to a tweet generated by the account) or mentions;
• No Interaction (NO): no feedback is received by the account.
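In compact form (a restatement of the definitions above; the set notation and time-indexed symbols are ours, not taken from the paper):

\[
\mathcal{A} = \{\mathrm{tw}, \mathrm{rt}, \mathrm{in}, \mathrm{no}\},
\qquad
\mathcal{S} = \{\mathrm{RT}, \mathrm{IN}, \mathrm{NO}\},
\]

so that an account's observed history is a sequence of pairs $(s_t, a_t) \in \mathcal{S} \times \mathcal{A}$ indexed by time $t$.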
Every account can move from one state to another when performing an action,
and we refer to such a transition as a state-action pair. Note that an account can
be only in one of the above-mentioned states and can perform only one action in
any given state. By considering the accounts’ timeline, we construct a sequence of
state-action pairs that reconstructs the (observed) history of the account on Twitter. Overall, there exist only 11 possible combinations of state-action pairs that can form an account's sequence: the state-action pair (NO, no) is not considered, as it does not correspond to any observable activity on the platform.
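To make the construction concrete, the sketch below enumerates the valid pairs and assembles a trajectory from a timeline. The event format and the state-reset rule are illustrative assumptions, and explicit no (keep silent) actions are omitted, since observing silence would require a time-discretization step not detailed in this excerpt.

ACTIONS = ["tw", "rt", "in", "no"]  # active online activities
STATES = ["RT", "IN", "NO"]         # feedback, i.e., passive online activities

# All valid state-action pairs: (NO, no) is excluded because it corresponds
# to no observable activity, leaving 3 * 4 - 1 = 11 combinations.
VALID_PAIRS = [(s, a) for s in STATES for a in ACTIONS if (s, a) != ("NO", "no")]
assert len(VALID_PAIRS) == 11
PAIR_TO_TOKEN = {pair: i for i, pair in enumerate(VALID_PAIRS)}

def build_trajectory(timeline):
    """Map a chronological timeline to a sequence of state-action tokens.

    `timeline` is assumed to be a list of events such as ("tw", False) for an
    original tweet (active) or ("RT", True) for being retweeted (passive).
    Each action is paired with the most recent feedback received."""
    trajectory, state = [], "NO"   # start with no feedback received
    for kind, is_passive in timeline:
        if is_passive:
            state = kind           # feedback updates the current state
        else:
            trajectory.append(PAIR_TO_TOKEN[(state, kind)])
            state = "NO"           # reset until new feedback arrives
    return trajectory

# Example: the account is retweeted, then tweets, then replies to someone;
# under the enumeration order above this prints [0, 10], i.e., (RT, tw)
# followed by (NO, in).
print(build_trajectory([("RT", True), ("tw", False), ("in", False)]))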