Trust and Believe – Should We?
Evaluating the Trustworthiness of Twitter Users
1st Tanveer Khan
Network and Information Security Group
Tampere University
Tampere, Finland
tanveer.khan@tuni.fi
2nd Antonis Michalas
Network and Information Security Group
Tampere University
Tampere, Finland
antonios.michalas@tuni.fi
Abstract—Social networking and micro-blogging services, such as Twitter, play an important role in sharing digital information. Despite the popularity and usefulness of social media, they are regularly abused by corrupt users. One of these nefarious activities is so-called fake news, a “virus” that has been spreading rapidly thanks to the hospitable environment provided by social media platforms. The extensive spread of fake news is becoming a major problem with far-reaching negative repercussions for both individuals and society. Hence, the identification of fake news on social media is a problem of utmost importance that has attracted the interest not only of the research community but also of most of the big players on both sides, such as Facebook on the industry side and political parties on the societal one. In this work, we create a model through which we hope to offer a solution that will instill trust in social network communities. Our model analyses the behaviour of 50,000 politicians on Twitter and assigns an influence score to each evaluated user based on several collected and analysed features and attributes. Next, we classify political Twitter users as either trustworthy or untrustworthy using random forest and support vector machine classifiers. An active learning model is used to classify any unlabeled, ambiguous records in our dataset. Finally, to measure the performance of the proposed model, we use accuracy as the main evaluation metric.
Index Terms—Credibility, Fake News, Influence Score, Sentiment Analysis, Trust, Twitter, Active Learning
I. INTRODUCTION
With one-third of the world’s population using some form
of social media [61], it is evident that the popularity of social
networking sites has rapidly increased in recent years. This has
significantly changed the dynamics of communication across
all age groups; the way we work, the way we live, the way we
interact with other people and the way we share information
have already changed drastically. Furthermore, social media
enables sharing of important information with many people
simultaneously, allowing users to reach a bigger audience.
While social media has its positive sides, it is also important
to consider the flip side and properly evaluate its negative
impacts. One of the latest negative effects of social media is
the so-called fake news phenomenon. It has been shown that the massive distribution of fake news plays an important role in the success or failure of important events and causes [10], [11]. Apart from the dissemination and circulation of false information, social networks provide the ideal toolkit for corrupt users to perform a wide range of illegitimate actions, such as spamming and political Astroturfing [7], [9].
This research has received funding from the EU research projects ASCLEPIOS (No. 826093) and CYBELE (No. 825355).
Twitter, with around half a billion users, is one of the three most popular social media platforms. It generates on average around 6,000 tweets per second (approximately 500 million tweets per day¹) [47]. It is considered a valuable resource for government agencies, businesses, political parties, financial institutions, fundraisers and many other actors, as it enables the easy extraction and dissemination of important information.
A recent study [1] examined 10 million tweets generated by
700,000 different Twitter accounts and linked to 600 fake and
conspiracy news sites. It identified clusters of Twitter accounts
that linked back to these sites repeatedly, often in ways that
seemed coordinated or even automated. Another study found that 6.6 million tweets containing fake news were distributed before the 2016 US elections. Social and political events such as the 2016 US presidential election [15] have been tainted by a growing amount of fake news.
Global concern about the impact of fake news on our
societies is on the rise. Hence, there is an immediate need
for the design, implementation, and adoption of new systems
and algorithms that are able to identify and differentiate
between fake and real news. However, as the number of social media users² increases, the quantity of generated content is growing rapidly, which hinders the identification of fabricated stories [16] and prevents a significant amount of information that could give rise to false rumours from being identified. Therefore, verifying the credibility of a tweet or assigning a score to users based on the information they share is a problem that has caught the interest of many academic and industrial researchers [17], [18], [20]–[25].
A. Our Contribution
In this work, we present a model for analysing Twitter users that assigns a score calculated based on their social profiles, tweet credibility and h-index score (i.e. retweets and likes). Users with a higher score are not only considered to be more influential, but their tweets are also given greater credibility.
¹ https://www.omnicoreagency.com/twitter-statistics/
² In 2018, an estimated 2.65 billion people were using social media worldwide, a number projected to increase to almost 3.1 billion in 2021 [61].
arXiv:2210.15214v1 [cs.SI] 27 Oct 2022
Our main contributions can be summarised as follows:
First, we generated a dataset of 50,000 Twitter users. For each user, we created a unique profile containing 19 features (discussed in Section III). Our dataset contains only users whose tweets are public and who have friends and followers.
For each of the analysed users, we calculated their Social Reputation Score, h-Index Score, Sentiment Score and Tweet Credibility (Section III-B), as well as their Influence Score (Section III-C).
Furthermore, we classified each Twitter user account as either trustworthy or untrustworthy. This flag was assigned to each user based on their social reputation, tweet credibility, tweet sentiment score, h-index score of retweets and likes, and influence score.
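The h-index score of retweets and likes is not defined in this section. As a minimal sketch, assume the usual bibliometric definition carried over to tweets: a user has a retweet h-index of h if h of their tweets each received at least h retweets (and likewise for likes). The function name `h_index` and the sample counts are hypothetical illustrations, not the authors' implementation.

```python
from typing import Iterable


def h_index(counts: Iterable[int]) -> int:
    """Return the largest h such that at least h tweets have >= h interactions.

    Assumption: `counts` holds one retweet (or like) count per tweet, and the
    standard bibliometric h-index definition is applied unchanged.
    """
    ranked = sorted(counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` tweets all have at least `rank` interactions
        else:
            break  # counts are sorted, so no larger h is possible
    return h


# Hypothetical per-tweet counts for a single user.
retweets_per_tweet = [10, 8, 5, 4, 3]
likes_per_tweet = [25, 3, 1, 0]

retweet_h = h_index(retweets_per_tweet)
like_h = h_index(likes_per_tweet)
print(retweet_h, like_h)  # 4 2
```

How the two values are combined into a single h-index score is not specified here, so the sketch deliberately keeps them separate.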
To classify a large pool of unlabeled data, we used an active learning model (a semi-supervised learning algorithm), a technique ideal for situations in which unlabeled data is abundant but manual labeling is expensive [63], [67].
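The active learning step can be illustrated with pool-based uncertainty sampling, one common query strategy. This section does not say which strategy the authors use, so treat it as an assumption. In each round, the current model scores the unlabeled pool, the records whose predicted probability of being trustworthy is closest to 0.5 are sent to a human annotator (simulated here by a hypothetical `oracle` function), and the model is retrained on the enlarged labeled set. The one-feature threshold model is a hypothetical stand-in for the random forest and SVM classifiers, and all names are illustrative.

```python
import math
from typing import List, Sequence


def most_uncertain(probabilities: Sequence[float], k: int) -> List[int]:
    """Uncertainty sampling: indices of the k records whose P(trustworthy) is closest to 0.5."""
    order = sorted(range(len(probabilities)), key=lambda i: abs(probabilities[i] - 0.5))
    return order[:k]


def active_learning(labeled, unlabeled, train, predict_proba, oracle, batch_size=2, rounds=3):
    """Pool-based active learning loop; label 1 = trustworthy, 0 = untrustworthy."""
    labeled = list(labeled)
    unlabeled = list(unlabeled)
    for _ in range(rounds):
        if not unlabeled:
            break
        model = train(labeled)
        probs = [predict_proba(model, x) for x in unlabeled]
        picked = set(most_uncertain(probs, batch_size))
        # Only the most ambiguous records are sent for (expensive) manual labeling.
        labeled.extend((unlabeled[i], oracle(unlabeled[i])) for i in sorted(picked))
        unlabeled = [x for i, x in enumerate(unlabeled) if i not in picked]
    return train(labeled), labeled, unlabeled


# Hypothetical toy classifier: the decision threshold is the midpoint of the class means.
def train_threshold(labeled):
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def proba_threshold(threshold, x):
    return 1.0 / (1.0 + math.exp(-10.0 * (x - threshold)))


model, labeled, remaining = active_learning(
    labeled=[(0.1, 0), (0.9, 1)],  # the seed set must contain both classes
    unlabeled=[0.05, 0.45, 0.5, 0.55, 0.95],
    train=train_threshold,
    predict_proba=proba_threshold,
    oracle=lambda x: int(x > 0.5),  # simulated manual labeling
    batch_size=2,
    rounds=2,
)
print(len(labeled), remaining)  # 6 [0.05]
```

In this sketch, the confidently classified record (0.05) is never sent for manual labeling. That is the point of the technique: annotation effort is spent only where the current model is unsure.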
We measured the performance of our model using accuracy, i.e. the percentage of Twitter users (trustworthy and untrustworthy) that are classified correctly.
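Accuracy, as described above, is the number of correctly classified users divided by the total number of users, times 100. A minimal sketch; the label strings `"T"` (trustworthy) and `"U"` (untrustworthy) are hypothetical.

```python
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Percentage of users whose predicted label matches their true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


# Hypothetical labels for four users: one trustworthy user is misclassified.
truth = ["T", "T", "U", "U"]
predicted = ["T", "U", "U", "U"]
print(accuracy(truth, predicted))  # 75.0
```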
We hope that this work will inspire others to perform further
research on this emerging problem while at the same time
kick-starting a period of greater trust on social media through
sustained collaboration between humans and machines.
B. Organisation
The rest of this paper is organised as follows: Section II discusses related work, followed by Section III, in which we describe our proposed approach in detail. The active learning approach and the classifiers used are discussed in Section IV. Section V presents the data collection, the experimental results and the model evaluation. Finally, Section VI concludes the paper.
II. RELATED WORK
Twitter is considered one of the top Online Social Networks (OSNs) and provides a fertile environment for a variety of research purposes. Compared to other popular OSNs, Twitter receives significantly more attention from the research community due to its open data-sharing policy and distinctive features [4]. In 2011, the network had about 175 million unique accounts [27], a figure that has since grown to an estimated 1.3 billion³, making it one of the most popular social media platforms.
³ https://www.brandwatch.com/blog/twitter-stats-and-statistics/
Even though openness and vulnerability are two separate issues, there have been many cases where malicious users have taken advantage of Twitter’s openness to exploit the service in several ways, such as political Astroturfing, sending unsolicited spam messages and posting malicious links.
Despite the important negative impact that the distribution
of fake news has on our society, only a handful of techniques
for identifying fake news on social media have been pro-
posed [4], [7], [9], [30], [31]. One of the most popular and
promising ideas is to evaluate Twitter users and assign them
a credit/reputation score.
The authors of [7] argued that posting duplicate tweets should affect a user’s reputation score, since legitimate users typically do not engage in this behaviour. Therefore, posting the same tweet several times would have a negative effect on the user’s overall credit score. To detect duplication, the authors calculated the edit distance between two tweets posted from the same account. Furthermore, users have exploited the staggering quantity of messages and information exchanged on Twitter to hijack trending topics [8], a technique used to send unsolicited messages to legitimate users. Additionally, some Twitter accounts exist solely to artificially boost the popularity of a hashtag, with the ultimate aim of making the underlying topic a trend. One BBC report mentioned that £150 was paid to Twitter users to increase the popularity of a hashtag and make it a trend⁴.
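Duplicate detection with edit distance, as in [7], can be sketched with the standard Levenshtein distance. The normalisation by the longer tweet and the threshold `max_ratio = 0.1` are hypothetical choices made for illustration; neither they nor the helper names come from [7], and how the resulting count lowers the credit score is not specified here.

```python
from itertools import combinations
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = cur
    return prev[-1]


def count_near_duplicates(tweets: List[str], max_ratio: float = 0.1) -> int:
    """Count tweet pairs whose edit distance, divided by the longer tweet's length, is <= max_ratio."""
    pairs = 0
    for a, b in combinations(tweets, 2):
        longest = max(len(a), len(b)) or 1  # avoid dividing by zero for two empty tweets
        if levenshtein(a, b) / longest <= max_ratio:
            pairs += 1
    return pairs


# Hypothetical timeline of one account.
timeline = ["Buy cheap followers now!", "Buy cheap followers now!!", "Hello world"]
print(count_near_duplicates(timeline))  # 1
```

Normalising by length means that a one-character change counts as a near-duplicate for a long tweet but not for a very short one.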
To tackle these problems, researchers have used different ways to assess the trustworthiness of tweets and assign an overall rank to users [31]. Castillo et al. [35] measured the credibility of tweets (news topics) based on Twitter features. More precisely, they used an automated classification technique to distinguish news topics from conversational ones. Alex Hai Wang [7] used the numbers of followers and friends to calculate a reputation score, which in turn aided user classification (i.e. spammer detection). Saito and Masuda [60] also considered these metrics when ranking Twitter users. In [36], the authors analysed tweets relevant to the Mumbai attacks⁵. Their analysis showed that most information providers were unknown, while the reputation of the others (based on their number of followers) was very low. Another study [37] of the same event used information retrieval and machine learning techniques and found that only 17% of the tweets related to the attacks were credible.
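A reputation score based on followers and friends can be sketched as the share of followers among all of an account's connections, R = followers / (followers + friends). We believe this matches the form used in [7], but the formula is not given in this section, so treat it as an assumption; the function name `reputation` is hypothetical.

```python
def reputation(followers: int, friends: int) -> float:
    """Followers as a fraction of followers + friends (0.0 for an account with neither).

    Assumption: this ratio form is illustrative; the exact formula of [7] is not restated here.
    """
    if followers < 0 or friends < 0:
        raise ValueError("counts must be non-negative")
    total = followers + friends
    return followers / total if total else 0.0


# A hypothetical account followed by many users scores high; one that follows
# many users but has few followers scores low, the imbalance this score exposes.
print(reputation(followers=900, friends=100))  # 0.9
print(reputation(followers=100, friends=900))  # 0.1
```

The score is bounded in [0, 1], which makes it easy to combine with other per-user features in a classifier.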
Gilani et al. [43] found that, compared to normal users, bots and fake accounts include a large number of external links in their tweets. Hence, analysing other Twitter features such as URLs is of paramount importance for correctly evaluating the overall credibility of a user. While Twitter has built tools to filter out such URLs, several masking techniques can effectively bypass Twitter’s safeguards.
⁴ https://www.bbc.com/news/blogs-trending-43218939
⁵ https://www.theguardian.com/world/blog/2011/jul/13/mumbai-blasts
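Following the observation of Gilani et al. [43], URL usage can be turned into per-user features. A minimal sketch, assuming a simple regular expression for links and two hypothetical features: links per tweet and the share of tweets that contain at least one link. It inspects only the visible link text, so it would not see through the masking techniques mentioned above.

```python
import re
from typing import List, Tuple

# Hypothetical pattern: matches plain http(s) links only.
URL_PATTERN = re.compile(r"https?://\S+")


def url_features(tweets: List[str]) -> Tuple[float, float]:
    """Return (URLs per tweet, fraction of tweets containing at least one URL)."""
    if not tweets:
        return 0.0, 0.0
    counts = [len(URL_PATTERN.findall(t)) for t in tweets]
    return sum(counts) / len(tweets), sum(1 for c in counts if c) / len(tweets)


# Hypothetical timeline of one account.
timeline = ["check http://a.co/x", "hello", "see https://t.co/y and https://t.co/z"]
per_tweet, with_url = url_features(timeline)
print(per_tweet, round(with_url, 3))  # 1.0 0.667
```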