RESEARCH
Exposing Influence Campaigns in the Age of
LLMs: A Behavioral-Based AI Approach to
Detecting State-Sponsored Trolls
Fatima Ezzeddine1,2*, Omran Ayoub1, Silvia Giordano1, Gianluca Nogara1, Ihab Sbeity2, Emilio Ferrara3 and Luca Luceri1,3
*Correspondence:
fatima.ezzeddine@supsi.ch
1University of Applied Sciences
and Arts of Southern Switzerland,
Department of Innovative
Technologies, Lugano, Switzerland
Full list of author information is
available at the end of the article
Abstract
The detection of state-sponsored trolls operating in influence campaigns on social
media is a critical and unsolved challenge for the research community, which has
significant implications beyond the online realm. To address this challenge, we
propose a new AI-based solution that identifies troll accounts solely through
behavioral cues associated with their sequences of sharing activity, encompassing
both their actions and the feedback they receive from others. Our approach does
not incorporate any textual content shared and consists of two steps: First, we
leverage an LSTM-based classifier to determine whether account sequences
belong to a state-sponsored troll or an organic, legitimate user. Second, we
employ the classified sequences to calculate a metric named the “Troll Score”,
quantifying the degree to which an account exhibits troll-like behavior. To assess
the effectiveness of our method, we examine its performance in the context of the
2016 Russian interference campaign during the U.S. Presidential election. Our
experiments yield compelling results, demonstrating that our approach can
identify account sequences with an AUC close to 99% and accurately differentiate
between Russian trolls and organic users with an AUC of 91%. Notably, our
behavioral-based approach holds a significant advantage in the ever-evolving
landscape, where textual and linguistic properties can be easily mimicked by Large
Language Models (LLMs): In contrast to existing language-based techniques, it
relies on more challenging-to-replicate behavioral cues, ensuring greater resilience
in identifying influence campaigns, especially given the potential increase in the
usage of LLMs for generating inauthentic content. Finally, we assess the generalizability of our solution to various entities driving different information operations and find promising results that will guide future research.
Keywords: social network; troll; misinformation
1 Introduction
Social Media Networks (SMNs) are a crucial constituent of societies, providing a
primary platform for individuals to engage in social and political discourse, as well
as to disseminate critical messages and promote propaganda. SMNs have undergone
a significant transformation, evolving from a simple aggregation medium to a com-
plex ecosystem where the line between offline and online realms is often blurred [1].
Recent studies have shown that the impact of discussions on SMNs extends beyond
the online platform and can have a significant effect on societies, such as undermin-
ing the integrity of political elections and public health [2–6].
In this context, the accuracy, confidentiality, and authenticity of shared content
are crucial elements for safe communication and, therefore, the well-being of so-
cieties. However, SMNs often fall short of these requirements, as their growth has led to an increase in deceptive and fraudulent accounts that intentionally damage the credibility of online discussions [7,8]. The activity of these accounts
often results in online harms that threaten the honesty and ethics of conversations,
such as the propagation of hate speech, incitement of violence, and dissemination
of misleading and controversial content. This has been observed in recent debates concerning the Ukraine-Russia conflict [9] and the COVID-19 pandemic [10–14], as well as in the rise of conspiracy theories [15–17]. These fraudulent accounts, whose activity has the potential to exacerbate societal divisions and affect the sovereignty of elections [18–23], represent a significant threat to healthy online conversations.
In the political sphere, Russian meddling in the 2016 U.S. Presidential election represents the most prominent case of a deceptive online interference campaign [24,25].
The Mueller report [26] suggests that Russia engaged in extensive attacks on the
U.S. election system to manipulate the outcome of the 2016 voting event. The
“sweeping and systematic” interference allegedly used bots (i.e., automated ac-
counts) and trolls (i.e., state-sponsored human operators) to spread politically bi-
ased and false information [27]. In the aftermath of the election, the U.S. Congress
released a list of 2,752 Twitter accounts associated with Russia’s “Internet Research
Agency” (IRA), known as Russian trolls. As a result, significant research efforts were
launched to identify fraudulent accounts and deceptive activity on several SMNs.
Among these platforms, Twitter has been continuously working to eliminate malicious entities involved in information operations across different countries [28–30] and different geopolitical events [31,32]. While there are several proven techniques for uncovering bot accounts [33–39], the detection of troll accounts remains an unsolved problem for the research community, due to several factors tied to the human nature of trolls [40]. Note that throughout this manuscript, our definition of troll is limited to state-sponsored human actors who have a political agenda and operate in coordinated influence campaigns, thus disregarding other hateful and harassing online activities associated with Internet-mediated trolling behavior.
Recent efforts have devised approaches for identifying trolls by leveraging linguistic cues and profile metadata [41–45]. Although these approaches have shown
promising results, they suffer from certain limitations. Some of these methods are
language-dependent, focusing solely on specific spoken languages associated with
the trolls under investigation [46,47]. Others are constrained to a single SMN, re-
lying on profile metadata and platform-specific information. Furthermore, the ease
of imitating language and linguistic cues has increased with the emergence of Large
Language Models (LLMs), such as ChatGPT and similar technologies. As we look
ahead, our ability to detect influence operations based solely on linguistic cues
may be hindered by the growing reliance on LLMs for such operations [48,49].
These significant limitations have prompted research efforts to develop language-
and content-agnostic approaches, as demonstrated in the work of Luceri et al. [50].
This approach distinguishes troll accounts by uncovering behavioral incentives from
their observed activities using an Inverse Reinforcement Learning (IRL) framework.
Given that mimicking behaviors and incentives is notably more challenging than
imitating language, incorporating behavioral cues either in addition to or as an al-
ternative to purely linguistic-based methods emerges as a promising strategy in an
uncertain future, particularly when the cost of generating inauthentic, yet credible,
content appears to be exceptionally low [51,52].
In this work, we advance along this research line and propose a novel approach
to identify state-sponsored troll activity solely based on behavioral cues linked to
accounts’ sharing activities on Twitter. Specifically, we consider online activities
regardless of the content shared, the language used, and the linked metadata to
classify accounts as trolls or organic, legitimate users (from now on, simply users).
Our approach aims to capture cues of behavior that differentiate trolls from users
by analyzing their interactions and responses to feedback. For this purpose, we
consider both the actions performed by an account, namely active online activities,
and the feedback received by others, namely passive online activities, e.g., received
replies and retweets. We refer to the sequence of active and passive activities as a
trajectory, in accordance with [50]. We demonstrate the validity of our approach by
detecting Russian trolls involved in the interference with the 2016 U.S. Presidential
election. We also evaluate whether the proposed approach can be effectively used to
identify various entities involved in diverse Twitter information operations during
the 2020 U.S. Presidential election.
Contributions of this work. The core contributions of this work are summa-
rized as follows:
• We propose a novel approach based on Long Short-Term Memory (LSTM) networks for classifying accounts' trajectories. Our approach correctly identifies trolls' and users' trajectories with an AUC and an F1-score of about 99%.
• Leveraging the classified trajectories, we introduce a metric, the Troll Score, that quantifies the extent to which an account exhibits behavior akin to that of a state-sponsored troll. We propose a Troll Score-based classifier that effectively detects troll accounts, achieving an AUC of about 91% (F1-score 90%); an illustrative code sketch of the trajectory classifier and the Troll Score follows this list. Our approach outperforms existing behavioral-based methods and approaches the classification performance of existing linguistic solutions, all without requiring access to the content of shared messages. This feature enhances its robustness, especially given the possibility of increased usage of LLMs for influence operations.
• By analyzing the active and passive activities in which accounts engage, we uncover three distinct, naturally emerging behavioral clusters in which trolls intermingle with user accounts. This finding confirms the difficulty of differentiating these two account classes when their trajectories are not considered.
• We demonstrate the capability of our approach to generalize and accurately identify diverse actors responsible for driving information operations. Our methodology achieves an AUC of 80% (F1-score 82%) in detecting the drivers of different campaigns, indicating promising applicability across countries, languages, and various malicious entities.
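To make the first two contributions concrete, the following is a minimal sketch of how such a pipeline could be implemented; it is not the authors' released code. The integer encoding of state-action pairs, the layer sizes, the decision threshold, and the fraction-based definition of the Troll Score are assumptions made purely for exposition.

import torch
import torch.nn as nn

NUM_PAIRS = 11  # the 11 valid state-action combinations defined in Section 3

class TrajectoryClassifier(nn.Module):
    """Binary LSTM classifier over sequences of state-action pair tokens."""
    def __init__(self, embed_dim=16, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(NUM_PAIRS, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, x):                # x: (batch, seq_len) integer tokens
        e = self.embed(x)                # (batch, seq_len, embed_dim)
        _, (h, _) = self.lstm(e)         # h[-1]: final hidden state per sequence
        return torch.sigmoid(self.head(h[-1])).squeeze(-1)  # P(troll-like)

def troll_score(model, trajectories, threshold=0.5):
    """Hypothetical Troll Score: the fraction of an account's trajectories
    classified as troll-like. The paper's exact formula may differ."""
    with torch.no_grad():
        probs = torch.cat([model(t.unsqueeze(0)) for t in trajectories])
    return (probs > threshold).float().mean().item()

An account-level classifier can then be obtained by thresholding the Troll Score itself, mirroring the Troll Score-based classifier evaluated above.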
2 Related Work
In this Section, we survey research on the automated detection of malicious accounts
operated by trolls, with a focus on the troll farm connected to the IRA [53]. Some
of these efforts have proposed linguistic approaches that rely on the content posted
by trolls to identify and detect them. For instance, [46] presented a theory-driven
linguistic study of Russian trolls’ language and demonstrated how deceptive linguis-
tic signals can contribute to accurate troll identification. Similarly, [47] proposed an
automated reasoning mechanism for hunting trolls on Twitter during the COVID-19
pandemic, which leverages a unique linguistic analysis based on adversarial machine
learning and ambient tactical deception. In [54], the authors proposed a deep learn-
ing solution for troll detection on Reddit and analyzed the shared content using
natural language processing techniques. Other works have considered fusing users’
metadata and linguistic features, such as [41], which used profile description, stop
word usage, language distribution, and bag-of-words features for detecting Rus-
sian trolls. Other approaches have relied on multimedia analysis, combining text,
audio, and video analysis to detect improper material or behavior [55,56]. For in-
stance, [55] designed a platform for monitoring social media networks with the aim
of automatically tracking malicious content by analyzing images, videos, and other
media. In [56], the authors attempted to capture disinformation and trolls based on
the existence of a firearm in images using the Object Detection API. A limitation of
these works is their reliance on the content posted by accounts and on the extraction
of linguistic features for troll identification. In contrast, our approach solely relies
on the online behavior of accounts, specifically, the temporal sequence of online
activities performed by a user. This presents an advantage over previous works, as
it is independent of the language used or content shared and has, therefore, the po-
tential to generalize to influence campaigns originating from diverse countries and
be resilient to the use of LLMs for generating inauthentic content.
Previous studies have proposed sequence analysis approaches for identifying ma-
licious accounts. For example, Kim et al. [57] used text and time as features to
categorize trolls into subgroups based on the temporal and semantic similarity of
their shared content. Luceri et al. [50] proposed a solution that only relies on the
sequence of users’ activity on online platforms to capture the incentives that the
two classes of accounts (trolls vs. users) respond to. They detect troll accounts with
a supervised learning approach fed by the incentives estimated via Inverse Rein-
forcement Learning (IRL). In [58], the authors proposed a model based on users’
sequence of online actions to identify clusters of accounts with similar behavior.
However, this approach was found to be ineffective in detecting Russian trolls, as
reported in [50]. Similarly to these approaches, we propose a language- and content-
agnostic method for identifying trolls based only on the sharing activities performed
by the accounts on Twitter. We utilize deep learning, specifically LSTM, to classify
the sequence of activities as belonging to either troll accounts or organic users. We
leverage the classified sequences to quantify the extent to which an account behaves
like a troll, a feature not available in earlier methods.
3 Problem Formulation and Trajectory Definition
This section outlines the objectives of the proposed framework and elucidates the
features, variables, and learning tasks integral to it. While existing methods for
identifying troll activity in SMNs rely on linguistic and metadata features, our
approach strives to be language- and content-agnostic. To achieve this, we rely only
on behavioral cues and do not incorporate any textual or media content shared
by the accounts, nor do we use their profile metadata. Consequently, our approach
holds the potential for application across various SMNs and is robust against the
increasing use of LLMs and their potential role in influence campaigns [48,49,51].
To extract the unique online behaviors of trolls and organic users on Twitter, we
extract the accounts’ sequences of online activities. We consider their active online
activities, including generating an original post (i.e., tweet), re-sharing another post
(i.e., retweet), commenting on an existing post (i.e., reply), or mentioning another user
in a post (i.e., mention). In addition, we also propose to consider the feedback
the accounts receive from other accounts, namely passive online activities, such
as receiving a retweet (i.e., being retweeted), a comment (i.e., being replied to),
or a mention (i.e., being mentioned in a post). By considering both the actions
performed by the accounts and the feedback received, we aim to capture the distinct
motivations driving trolls’ activity, which may differ from those of organic users [50].
The rationale is that trolls might be motivated to pursue their agenda regardless
of the feedback received from others, while organic users may be influenced by the
level of endorsement they receive from the online community. For example, users
might be more motivated to generate a new tweet when their content is re-shared
by others, which is also viewed as a form of social endorsement [59,60], or when
they receive positive comments.
To formalize this approach, we model the SMN Twitter as a Markov Decision
Process (MDP). Similarly to Luceri et al. [50], we represent Twitter as an environ-
ment composed of multiple agents (i.e., Twitter accounts) that can perform a set
of actions (i.e., active online activities) and receive feedback from the environment
(i.e., passive online activities). Consistent with the IRL formulation in [50], we
refer to the latter as states, as they represent the response of the Twitter environ-
ment, whereas we refer to the former as actions, as they indicate the active online
activities that an account can perform on Twitter.
We consider four actions that can be performed by Twitter accounts:
• Original tweet (tw): to generate original content;
• Retweet (rt): to re-share content generated by others;
• Interact with others (in): to interact with other users via replies or mentions;
• No Action (no): to keep silent, i.e., the account does not perform any action.
As for states, we consider three possible types of feedback that Twitter accounts can receive:
• Retweeted (RT): an original tweet generated by the account is re-shared;
• Interacted with (IN): the account is involved by others via replies (i.e., comments to a tweet generated by the account) or mentions;
• No Interaction (NO): no feedback is received by the account.
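In compact form (a restatement of the definitions above; the set notation and time-indexed symbols are ours, not taken from the paper):

\[
\mathcal{A} = \{\mathrm{tw}, \mathrm{rt}, \mathrm{in}, \mathrm{no}\},
\qquad
\mathcal{S} = \{\mathrm{RT}, \mathrm{IN}, \mathrm{NO}\},
\]

so that an account's observed history is a sequence of pairs $(s_t, a_t) \in \mathcal{S} \times \mathcal{A}$ indexed by time $t$.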
Every account can move from one state to another when performing an action,
and we refer to such a transition as a state-action pair. Note that an account can
be only in one of the above-mentioned states and can perform only one action in
any given state. By considering the accounts’ timeline, we construct a sequence of
state-action pairs that reconstructs the (observed) history of the account on Twitter. Overall, there exist only 11 possible combinations of state-action pairs that can form an account's sequence: the state-action pair (NO, no) is not considered, as it does not correspond to any observable activity on the platform.
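To make the construction concrete, the sketch below enumerates the valid pairs and assembles a trajectory from a timeline. The event format and the state-reset rule are illustrative assumptions, and explicit no (keep silent) actions are omitted, since observing silence would require a time-discretization step not detailed in this excerpt.

ACTIONS = ["tw", "rt", "in", "no"]  # active online activities
STATES = ["RT", "IN", "NO"]         # feedback, i.e., passive online activities

# All valid state-action pairs: (NO, no) is excluded because it corresponds
# to no observable activity, leaving 3 * 4 - 1 = 11 combinations.
VALID_PAIRS = [(s, a) for s in STATES for a in ACTIONS if (s, a) != ("NO", "no")]
assert len(VALID_PAIRS) == 11
PAIR_TO_TOKEN = {pair: i for i, pair in enumerate(VALID_PAIRS)}

def build_trajectory(timeline):
    """Map a chronological timeline to a sequence of state-action tokens.

    `timeline` is assumed to be a list of events such as ("tw", False) for an
    original tweet (active) or ("RT", True) for being retweeted (passive).
    Each action is paired with the most recent feedback received."""
    trajectory, state = [], "NO"   # start with no feedback received
    for kind, is_passive in timeline:
        if is_passive:
            state = kind           # feedback updates the current state
        else:
            trajectory.append(PAIR_TO_TOKEN[(state, kind)])
            state = "NO"           # reset until new feedback arrives
    return trajectory

# Example: the account is retweeted, then tweets, then replies to someone;
# under the enumeration order above this prints [0, 10], i.e., (RT, tw)
# followed by (NO, in).
print(build_trajectory([("RT", True), ("tw", False), ("in", False)]))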