
a user recommendation method based on whether the edge is topical or social. Komori et al. [8] analyzed following relationships by classifying them into virtual and real friendships. Takemura et al. [3] classified following relationships into eight types by combining three axes: user-orientation, content-orientation, and mutuality. These researchers used surveys to collect data on each following relationship and built a classification model from these data. However, training data on individual following relationships are difficult to collect automatically. Yamaguchi et al. [9] proposed a method that explains the reasons for following through coupled tensor analysis of tagging actions (adding users to lists). However, only a few Twitter users use the list feature.
We hypothesize that users in different categories tend to follow others for different main purposes. We therefore classified users into categories using a simple rule based on the follower–followee ratio and analyzed their following behavior by user category.
C. Homophily on Social Media
Homophily is the phenomenon whereby users tend to befriend similar people [10]. Various types of homophily have been observed in online social graphs [11]–[13]. A previous study [14] reported topical homophily by combining following relationships with users' topics of interest, which were inferred from their tweets using latent Dirichlet allocation (LDA); the authors concluded that users connected by following relationships have similar topics. We also focused on topical homophily in the content of users' tweets.
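To make the notion of topical homophily concrete, one common way to quantify it is to compare the average similarity of topic distributions (for example, LDA document-topic vectors) for user pairs connected by a following relationship against that of randomly drawn user pairs. The sketch below is an illustration under that assumption only; it is not the procedure of [14] or of this study. The function names, the use of cosine similarity, and the toy topic vectors are all hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def homophily_gap(
    topics: Dict[str, List[float]],
    follows: List[Tuple[str, str]],
    n_random: int = 1000,
    seed: int = 0,
) -> float:
    """Mean similarity of following pairs minus mean similarity of random pairs.

    A positive value suggests that users linked by following relationships
    post about more similar topics than arbitrary user pairs do.
    """
    linked = [cosine(topics[a], topics[b]) for a, b in follows]
    rng = random.Random(seed)  # fixed seed for a reproducible baseline
    users = list(topics)
    baseline = []
    for _ in range(n_random):
        a, b = rng.sample(users, 2)  # two distinct users
        baseline.append(cosine(topics[a], topics[b]))
    return sum(linked) / len(linked) - sum(baseline) / len(baseline)


if __name__ == "__main__":
    # Hypothetical 3-topic distributions (e.g., sports, politics, music).
    topics = {
        "u1": [0.8, 0.1, 0.1],
        "u2": [0.7, 0.2, 0.1],
        "u3": [0.1, 0.1, 0.8],
        "u4": [0.1, 0.2, 0.7],
    }
    follows = [("u1", "u2"), ("u3", "u4")]  # u1 follows u2, u3 follows u4
    print(homophily_gap(topics, follows))  # positive: linked pairs are more similar
```

Comparing against a random-pair baseline, rather than reporting the raw similarity of linked pairs, separates genuine homophily from the general similarity of all users in the sample.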
Homophily is related to network structure. The follower–followee ratio and the homophily of various attributes have been investigated [15]. Homophily is also an important assumption in network-based user attribute estimation. Hironaka et al. [16] examined the relationship between the follower–followee ratio and location homophily using home location estimation. Based on data from the ten countries with the most Twitter users, they reported that the follower–followee ratio contributes to estimation performance. In this study, we examined the relationship between topical homophily and the follower–followee ratio.
III. DATA COLLECTION
First, we randomly extracted users for analysis using the Twitter API. Then, we collected data on their followees and followers. In addition, we collected their tweets to calculate topical homophily.
We collected English tweets from July 11 to July 17, 2021, using the Twitter Streaming API¹. We then randomly selected 50,000 unique users who had tweeted at least once during this period.
Next, we collected followee and follower data using API². We also collected each user's latest 3200 tweets using API³; if a user had posted fewer than 3200 tweets, we collected all of their available tweets. As a result, 48,881 user timelines were collected.
¹ https://developer.twitter.com/en/docs/twitter-api/v1/tweets/filter-realtime/api-reference/post-statuses-filter (viewed 2022-06-10)
² https://developer.twitter.com/en/docs/twitter-api/v1/accounts-and-users/follow-search-get-users/api-reference/get-followers-ids and https://developer.twitter.com/en/docs/twitter-api/v1/accounts-and-users/follow-search-get-users/api-reference/get-friends-ids (viewed 2022-06-10)
³ https://developer.twitter.com/en/docs/twitter-api/v1/tweets/timelines/api-reference/get-statuses-user_timeline (viewed 2022-06-10)
Fig. 1. Research workflow (steps: generate user vectors from tweets; categorize users by follower–followee ratio; investigate topical homophily; investigate the preferred user category for each user category; investigate similarities of topics for each user category).
In the analysis, we used the data of 48,829 users, that is, the users whose tweets and follower data we could successfully collect. We detected 59,778 following relationships among these users.
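The sampling of unique users and the detection of following relationships among the sampled users can be sketched as follows. This is a hypothetical illustration, not the authors' code: it assumes the stream records and each user's followee ID list have already been fetched and parsed into plain Python structures (it does not call the Twitter API), the field name `user_id` is an assumption, and each directed follow is counted as one relationship.

```python
import random
from typing import Dict, Iterable, List, Set, Tuple


def sample_users(stream_records: Iterable[dict], k: int, seed: int = 0) -> List[str]:
    """Randomly pick k unique users among the authors of the collected tweets."""
    # Deduplicate authors, then sort so that sampling is reproducible for a given seed.
    authors = sorted({rec["user_id"] for rec in stream_records})
    return random.Random(seed).sample(authors, k)


def following_edges(
    followees: Dict[str, Set[str]], users: Set[str]
) -> List[Tuple[str, str]]:
    """Return (follower, followee) pairs in which both users belong to `users`."""
    edges = []
    for u in sorted(users):
        # Keep only followees that are themselves in the analyzed user set.
        for v in sorted(followees.get(u, set()) & users):
            if v != u:
                edges.append((u, v))
    return edges


if __name__ == "__main__":
    # Hypothetical stream records; u1 tweeted twice but is counted once.
    records = [{"user_id": "u1"}, {"user_id": "u2"}, {"user_id": "u1"}, {"user_id": "u3"}]
    sampled = sample_users(records, k=3)
    # Hypothetical followee ID lists; "x9" is outside the sample and is ignored.
    followees = {"u1": {"u2", "x9"}, "u2": {"u1"}, "u3": {"u1"}}
    print(following_edges(followees, set(sampled)))
    # [('u1', 'u2'), ('u2', 'u1'), ('u3', 'u1')]
```

Intersecting each followee list with the sampled user set is what restricts the graph to relationships among the analyzed users; most followees of a randomly sampled user fall outside the sample.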
IV. USER CLASSIFICATION AND USER VECTOR
In this study, we analyzed users' following behavior based on user category and topical homophily, following the workflow shown in Figure 1.
First, we explain the follower–followee ratio used to classify users and then describe the classification method. Second, we define the user vector used to calculate topical homophily and then describe how topical homophily is calculated.
A. Follower–Followee Ratio
The follower–followee ratio is the ratio of the number of followees $N_{\mathrm{followee}}$ to the number of followers $N_{\mathrm{follower}}$, as defined in Equation (1).
$$\text{follower--followee ratio} = \frac{N_{\mathrm{followee}} + 1}{N_{\mathrm{follower}} + 1} \tag{1}$$
In Equation (1), we add 1 to the denominator to avoid division by zero and to the numerator to guarantee that the ratio of a user with equal numbers of followees and followers is exactly 1.
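Equation (1) translates directly into code. The following minimal Python sketch is illustrative only; the function name and the example counts are hypothetical and serve to show the effect of the two +1 terms.

```python
def follower_followee_ratio(n_followee: int, n_follower: int) -> float:
    """Equation (1): (N_followee + 1) / (N_follower + 1).

    Adding 1 to the denominator avoids division by zero for users with no
    followers; adding 1 to the numerator keeps the ratio at exactly 1.0 when
    the two counts are equal.
    """
    if n_followee < 0 or n_follower < 0:
        raise ValueError("counts must be non-negative")
    return (n_followee + 1) / (n_follower + 1)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(follower_followee_ratio(0, 0))      # 1.0 (no division by zero)
    print(follower_followee_ratio(150, 150))  # 1.0 (equal counts)
    print(follower_followee_ratio(300, 99))   # 3.01 (follows far more users than follow back)
    print(follower_followee_ratio(9, 999))    # 0.01 (followed by far more users than it follows)
```

Without the +1 in the numerator, a user with equal counts $n$ would get $n/(n+1) < 1$, so the neutral point would drift with account size.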
Figures 2a and 2b show examples of users with high and low follower–followee ratios, respectively. Users with a high follower–followee ratio are those whose number of followees $N_{\mathrm{followee}}$ significantly exceeds their number of followers $N_{\mathrm{follower}}$. The reverse holds for users with a low follower–followee ratio.
B. User Classification Using Follower–Followee Ratio
In this study, we classify users into four categories, A
through D, according to the follower–followee ratio. Category
A (Information Seeker) represents users with a follower–
followee ratio of 2.0 or higher, B (Friend) represents users
with a ratio between 1.0 and 1.25, C (Friend Hub) represents
users with a ratio between 0.8 and 1.0, and D (Information
Source) represents users with a ratio of 0.5 or lower. In this