Follower–Followee Ratio Category and User Vector
for Analyzing Following Behavior
Hayato Oshimo, Shiori Hironaka, Mitsuo Yoshida, and Kyoji Umemura
Department of Computer Science and Engineering
Toyohashi University of Technology
Aichi, Japan
Email: oshimo.hayato.zk@tut.jp, hironaka.shiori.ru@tut.jp, umemura@tut.jp
Faculty of Business Sciences
University of Tsukuba
Tokyo, Japan
Email: mitsuo@gssm.otsuka.tsukuba.ac.jp
Abstract—Analyzing following behavior is important in many
applications. Following behavior may depend on the main inten-
tion of the follower. Users may either follow their friends or they
may follow celebrities to know more about them. It is difficult
to estimate users’ intention from their following relationships.
In this paper, we propose an approach to analyze following
relationships. First, we investigated the similarity between users.
Similar followers and followees are likely to be friends. However,
when the follower and followee are not similar, it is likely
that the follower seeks to obtain more information on the followee.
Second, we categorized users by the network structure. We
then proposed an analysis of following behavior based on the similarity
and category of users estimated from tweets and user data.
We confirmed the feasibility of the proposed method through
experiments. Finally, we examined users in different categories
and analyzed their following behavior.
Keywords: User analysis, User embeddings, Network science,
Twitter, Following behavior
I. INTRODUCTION
Twitter is a social media platform where people post short
messages called tweets and communicate with each other.
Twitter users follow other users by subscribing to their tweets.
Twitter users can follow without the permission of the targeted
user; thus, the following relationship is directed.
Analyzing following behavior is important in many appli-
cations, such as friend recommendations [1] or information
diffusion analysis [2]. Users’ following behavior depends on
their intention, and following links can carry various intentions [3].
These links are difficult to classify because data indicating the
intention behind each link is hard to collect.
We assume that different categories of users have different
preferences for whom to follow. For example, users who
want to learn more about celebrities follow them. We
classified users by the follower–followee ratio, which is the
ratio of the number of followees to the number of followers.
The follower–followee ratio has been used to analyze social
media users [4].
We analyzed users’ following preferences based on the
user category and topical similarity. Topical similarity reflects
the similarity between the users’ tweets. The user category
was defined using the follower–followee ratio, which reflects
the user’s characteristics. First, we confirmed the feasibility
of the computed topical similarity. Then, we confirmed the
feasibility of the category using topical similarity. We found
that describing following behavior by the topical similarity
between the follower and followee reasonably explains the
following relationships among users in different categories.
This suggests that both category and
topical similarity are useful for analyzing following behavior.
II. RELATED WORK
A. User Categories on Social Media
Java et al. [5] considered that Twitter users can mainly be
categorized as Information Source, Friends, and Information
Seeker, based on their link structure. Yan et al. [4] used
the follower–followee ratio to determine user characteristics
on ResearchGate, a social media platform for scientists and
researchers. ResearchGate users can share their research pa-
pers and follow other researchers. Yan et al. adopted the user
categories proposed by Java et al. and classified users based
on the follower–followee ratio.
Other researchers have classified users without using link
structures, for example by assigning users to five types
based on social theory [6] or by estimating Big Five
personalities from user profiles [7]. These classifications
require training data collected through surveys or
crowdsourcing.
We classified Twitter users into four categories based on
their follower–followee ratio. The categories of Information
Source and Information Seeker were the same as in previous
studies [4], [5]. In addition, we divided the Friends category
into two groups, according to whether the follower–followee
ratio was greater than 1. We assumed that more general users
would have a smaller number of followers than followees and
would exhibit different characteristics.
arXiv:2210.13874v1 [cs.SI] 25 Oct 2022
B. Purpose and Intention of Following
Following behavior depends on the purpose of the following,
which relates to edge types. Barbieri et al. [1] proposed
a user recommendation method based on whether the edge is
topical or social. Komori et al. [8] analyzed following
relationships by classifying them into virtual and real friendships.
Takemura et al. [3] classified following relationships into
eight types, combining three axes: user-orientation, content-
orientation, and mutuality. These researchers collected data
for each following relationship by using surveys to build a
classification model. However, it is difficult to collect training
data on individual following relationships automatically. Ya-
maguchi et al. [9] proposed a method to explain the reason
for following through coupled tensor analysis using tagging
actions (adding users to lists). However, only a few users use
the list feature on Twitter.
We consider that different categories of users tend to have
different main purposes for following. We simply classified
user categories by follower–followee ratio and analyzed the
following behavior according to user category.
C. Homophily on Social Media
Homophily is a phenomenon where users tend to be friends
with similar people [10]. Various types of homophily have
been observed on the online social graph [11]–[13]. In a
previous study [14], topical homophily was reported based on
the user’s topics of interest recognized from tweets using latent
Dirichlet allocation (LDA) and the following relationship, and
the authors concluded that the topics of users with following
relationships are similar. We also focused on the topical
homophily of the users’ tweet content.
Homophily relates to network structure. The follower–
followee ratio and homophily of various attributes have been
investigated [15]. Homophily is an important assumption in
network-based user attribute estimation. Hironaka et al. [16]
examined the relationship between the follower–followee ratio
and location homophily using home location estimation. Using
data from the ten countries with the most Twitter users,
they reported that the follower–followee ratio contributes to
the estimation performance. In this study, we examined the
relationship between topical homophily and follower–followee
ratio.
III. DATA COLLECTION
First, we randomly extracted users for analysis using the Twitter
API. Then, we collected data on their followees and followers.
In addition, we collected their tweets to calculate topical
homophily.
We collected English tweets from July 11 to July 17, 2021,
using the Twitter Streaming API¹. We randomly selected 50,000
unique users who tweeted at least once in this period.
Next, we collected followee and follower data using the Twitter API².
We also collected each user’s latest 3,200 tweets using the Twitter API³.
If a user had posted fewer than 3,200 tweets, we collected as many as
possible. As a result, 48,881 user timelines were collected.
In the analysis, we used the data of 48,829 users, that is, the
users whose tweet and follower data we could successfully
collect. We detected 59,778 following relationships among
them.
¹ https://developer.twitter.com/en/docs/twitter-api/v1/tweets/filter-realtime/api-reference/post-statuses-filter (viewed 2022-06-10)
² https://developer.twitter.com/en/docs/twitter-api/v1/accounts-and-users/follow-search-get-users/api-reference/get-followers-ids and https://developer.twitter.com/en/docs/twitter-api/v1/accounts-and-users/follow-search-get-users/api-reference/get-friends-ids (viewed 2022-06-10)
³ https://developer.twitter.com/en/docs/twitter-api/v1/tweets/timelines/api-reference/get-statuses-user_timeline (viewed 2022-06-10)
[Fig. 1. Research workflow: generate user vectors from tweets; categorize users by follower–followee ratio; investigate topical homophily; investigate preferred user category for each user category; investigate similarities of topics for each user category.]
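One way the following relationships among the sampled users could be counted is sketched below. This is only an illustration under stated assumptions, since the paper does not describe its procedure: the function name `edges_within_sample` and the toy user IDs are hypothetical, and a following relationship is assumed to be a directed (follower, followee) pair whose two endpoints are both in the sample.

```python
from typing import Dict, Iterable, Set, Tuple


def edges_within_sample(followees: Dict[int, Iterable[int]]) -> Set[Tuple[int, int]]:
    """Return directed (follower, followee) pairs whose endpoints are both sampled.

    `followees` maps each sampled user ID to the IDs that user follows
    (assumed here to come from the followee lists collected per user).
    """
    sample = set(followees)  # the sampled users are the dictionary keys
    return {
        (follower, followee)
        for follower, followee_ids in followees.items()
        for followee in followee_ids
        if followee in sample and followee != follower
    }


# Toy data invented for illustration: user 99 is outside the sample.
toy = {1: [2, 3, 99], 2: [1], 3: []}
edges = edges_within_sample(toy)
print(len(edges))  # 3 edges: (1, 2), (1, 3), (2, 1)
```

Because the pairs are directed, a mutual follow such as users 1 and 2 contributes two relationships in this sketch.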
IV. USER CLASSIFICATION AND USER VECTOR
In this study, we analyzed users’ following behavior based
on user category and topical homophily. Our analysis
followed the workflow shown in Figure 1.
First, we explain the follower–followee ratio used to classify
users and then describe the classification method. Second, we
define the user vector for calculating topical homophily and
then describe how topical homophily is calculated.
A. Follower–Followee Ratio
The follower–followee ratio is the ratio of the number of
followees $N_{\text{followee}}$ to the number of followers $N_{\text{follower}}$, as
defined in Equation (1).
\[
\text{follower–followee ratio} = \frac{N_{\text{followee}} + 1}{N_{\text{follower}} + 1} \tag{1}
\]
In Equation (1), we add 1 to the denominator to avoid
division by zero and to the numerator to guarantee that the
ratio of a user with an equal number of followees and followers
becomes 1.
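Equation (1) can be illustrated with a minimal sketch. The function name `follower_followee_ratio` and the sample counts are hypothetical, chosen only to show the effect of adding 1 to both terms.

```python
def follower_followee_ratio(n_followee: int, n_follower: int) -> float:
    """Equation (1): (N_followee + 1) / (N_follower + 1)."""
    if n_followee < 0 or n_follower < 0:
        raise ValueError("counts must be non-negative")
    # Adding 1 to both terms avoids division by zero and keeps the
    # ratio at exactly 1 when the two counts are equal.
    return (n_followee + 1) / (n_follower + 1)


print(follower_followee_ratio(0, 0))      # 1.0 (no links, no division by zero)
print(follower_followee_ratio(150, 150))  # 1.0 (equal counts)
print(follower_followee_ratio(199, 99))   # 2.0 (200 / 100)
```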
Figures 2a and 2b show examples of users with high and low
follower–followee ratios, respectively. Consistent with
Equation (1), users with a high follower–followee ratio are
those whose number of followees $N_{\text{followee}}$ significantly
outnumbers the number of their followers $N_{\text{follower}}$. The
reverse is the case for users with a low follower–followee ratio.
B. User Classification Using Follower–Followee Ratio
In this study, we classify users into four categories, A
through D, according to the follower–followee ratio. Category
A (Information Seeker) represents users with a follower–
followee ratio of 2.0 or higher, B (Friend) represents users
with a ratio between 1.0 and 1.25, C (Friend Hub) represents
users with a ratio between 0.8 and 1.0, and D (Information
Source) represents users with a ratio of 0.5 or lower. In this
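
The thresholds listed above can be sketched as a small lookup function. This is an illustration, not the authors' implementation. The name `classify_user` is hypothetical, and two points are assumptions because the text leaves them open: a ratio of exactly 1.0 is assigned to category C (following the earlier statement that the Friends category is split by whether the ratio is greater than 1), and ratios that fall between the stated ranges are left uncategorized (`None`).

```python
from typing import Optional

CATEGORY_NAMES = {
    "A": "Information Seeker",
    "B": "Friend",
    "C": "Friend Hub",
    "D": "Information Source",
}


def classify_user(ratio: float) -> Optional[str]:
    """Map a follower-followee ratio to category A-D, or None if unassigned."""
    if ratio >= 2.0:
        return "A"
    if 1.0 < ratio <= 1.25:   # assumption: exactly 1.0 is not B
        return "B"
    if 0.8 <= ratio <= 1.0:   # assumption: exactly 1.0 is C
        return "C"
    if ratio <= 0.5:
        return "D"
    return None               # assumption: gaps between ranges stay uncategorized


for r in (3.0, 1.1, 0.9, 0.2, 1.5):
    label = classify_user(r)
    print(r, label, CATEGORY_NAMES.get(label, "uncategorized"))
```

Returning `None` for the gaps keeps the sketch from silently assigning users that the stated ranges do not cover.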