Understanding or Manipulation: Rethinking Online Performance Gains of Modern Recommender Systems

ZHENGBANG ZHU, Shanghai Jiao Tong University, China
RONGJUN QIN, National Key Laboratory for Novel Software Technology, Nanjing University, China and Polixir Technologies, China
JUNJIE HUANG, Shanghai Jiao Tong University, China
XINYI DAI, Shanghai Jiao Tong University, China
YANG YU†, National Key Laboratory for Novel Software Technology, Nanjing University, China and Polixir Technologies, China
YONG YU, Shanghai Jiao Tong University, China
WEINAN ZHANG†, Shanghai Jiao Tong University, China

Recommender systems are expected to be assistants that help human users find relevant information automatically without explicit queries. As recommender systems evolve, increasingly sophisticated learning techniques are applied and have achieved better performance in terms of user engagement metrics such as clicks and browsing time. The increase in the measured performance, however, can have two possible attributions: a better understanding of user preferences, and a more proactive ability to utilize human bounded rationality to seduce user over-consumption. A natural following question is whether current recommendation algorithms are manipulating user preferences. If so, can we measure the manipulation level? In this paper, we present a general framework for benchmarking the degree of manipulations of recommendation algorithms, in both slate recommendation and sequential recommendation scenarios. The framework consists of four stages: initial preference calculation, training data collection, algorithm training and interaction, and metrics calculation that involves two proposed metrics, Manipulation Score and Preference Shift. We benchmark some representative recommendation algorithms in both synthetic and real-world datasets under the proposed framework. We have observed that a high online click-through rate does not necessarily mean a better understanding of user initial preference, but ends in prompting users to choose more documents they initially did not favor. Moreover, we find that the training data have notable impacts on the manipulation degrees, and algorithms with more powerful modeling abilities are more sensitive to such impacts. The experiments also verified the usefulness of the proposed metrics for measuring the degree of manipulations. We advocate that future recommendation algorithm studies should be treated as an optimization problem with constrained user preference manipulations.

CCS Concepts: • Information systems → Recommender systems.

Additional Key Words and Phrases: Recommender System, User Model, Bounded Rationality

†Corresponding authors.

Authors’ addresses: Zhengbang Zhu, Shanghai Jiao Tong University, China, zhengbangzhu@sjtu.edu.cn; Rongjun Qin, National Key Laboratory for Novel Software Technology, Nanjing University, China and Polixir Technologies, China, qinrj@polixir.ai; Junjie Huang, Shanghai Jiao Tong University, China, legend0018@sjtu.edu.cn; Xinyi Dai, Shanghai Jiao Tong University, China, daixinyi@sjtu.edu.cn; Yang Yu†, National Key Laboratory for Novel Software Technology, Nanjing University, China and Polixir Technologies, China, yuy@polixir.ai; Yong Yu, Shanghai Jiao Tong University, China, yyu@apex.sjtu.edu.cn; Weinan Zhang†, Shanghai Jiao Tong University, China, wnzhang@sjtu.edu.cn.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

ACM 1046-8188/2023/12-ART

https://doi.org/10.1145/3637869

ACM Trans. Inf. Syst., Vol. 1, No. 1, Article . Publication date: December 2023.

arXiv:2210.05662v2 [cs.IR] 18 Dec 2023

ACM Reference Format:

Zhengbang Zhu, Rongjun Qin, Junjie Huang, Xinyi Dai, Yang Yu†, Yong Yu, and Weinan Zhang†. 2023. Understanding or Manipulation: Rethinking Online Performance Gains of Modern Recommender Systems. ACM Trans. Inf. Syst. 1, 1 (December 2023), 33 pages. https://doi.org/10.1145/3637869

1 INTRODUCTION

With the popularity of the Internet and the growth of User Generated Content (UGC) [ ] during recent years, we are being overwhelmed by a massive volume of information. To save users from information overload, recommender systems have been widely applied in today’s short video [ ], news [ ] and e-commerce [ ] platforms. Different from traditional information retrieval techniques, recommender systems are considered to be more advanced by using user information to personalize the recommendations. This is accomplished through the prediction of the relevance between users and documents. There are various recommendation algorithms proposed for relevance modeling, and their development is evidenced by the progressive improvement in various offline and online metrics. Offline metrics refer to those that can be evaluated with a static dataset and without interactions with real users. For example, Normalized Discounted Cumulative Gain (NDCG) is commonly used in recommendation literature to measure how well the model ranks more relevant documents at the top of the list. On the other hand, online metrics have to be computed by deploying the trained model online and gathering feedback from real users. As a typical online metric, Click-Through Rate (CTR) shows how often people click on the documents displayed by a recommendation algorithm.
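To make the two kinds of metrics concrete, the following is a minimal sketch of one offline metric (NDCG) and one online metric (CTR). It assumes the linear-gain variant of DCG and 0/1 click indicators; the function names and the example values are illustrative and do not come from the paper.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of a ranked list (linear gain, log2 discount)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(relevances):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


def click_through_rate(clicks):
    """Fraction of displayed documents that were clicked (clicks are 0/1)."""
    return sum(clicks) / len(clicks) if clicks else 0.0


# Hypothetical relevance grades in the order the model ranked the documents.
ranked_relevance = [3, 2, 0, 1]
ndcg_value = ndcg(ranked_relevance)

# Hypothetical click log for five displayed documents.
ctr_value = click_through_rate([1, 0, 0, 1, 0])
```

NDCG needs only logged relevance labels, whereas the click log used for CTR has to be gathered by showing the list to users, which is the distinction drawn in the paragraph above.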

While improved online metrics bring more traffic and revenue to the platform, it is questionable whether existing online metrics are aligned with the aspiration of recommender systems, which is to help people find relevant or favored content. As revealed by research in behavioral economics, humans make decisions with bounded rationality [ ]. Bounded rationality is contrary to expected-utility models [ ] where decisions made are always optimal under some expected utility function. Instead, humans are trying to make optimal decisions, but are bounded by various cognitive limitations, e.g., limited memory and decision time. Under the bounded rationality framework, a variety of phenomena have been observed and investigated, such as the decoy effect, confirmation bias, anchoring effect, etc. [ ]. Most of them are possible in online recommendations, which are already studied by prior works [ ]. Once a recommender system is deployed online, it will actively interact with human users of bounded rationality. Therefore, we cannot distinguish whether the active participation of users is due to the system’s accurate grasp of their preferences or to an over-exploitation of their psychological weaknesses. To our knowledge, there is no existing work that studies whether the performance gain of current recommendation algorithms in online metrics comes from the utilization of bounded rationality.

Since intrusive evaluations directly on the production platform can have a negative impact on the company’s benefits, simulated interactive environments are built to assess the properties of algorithms when interacting with users without actually going online. Agent-based user simulations are adopted by early works in building dialogue systems [ ], where user behaviors are simulated by either predefined rules [ ] or statistical models trained on a small amount of data [ ]. In recent years, a series of simulation frameworks or specific environments have been proposed in the field of recommender systems. RecSim [ ] is a general framework for simulating the interactive process of online recommendations, where the whole recommender system is decomposed into separate configurable components. Shi et al. [ ] propose Virtual-Taobao as an interactive e-commerce environment, in which the user behaviors are given by a neural model adversarially trained on real logged data. However, the evaluation criteria adopted in existing works are aligned with traditional online metrics, and there is no simulation framework to specifically test the extent to which recommendation algorithms manipulate user preferences.

Fig. 1. Examples of two typical scenarios in recommender systems. On the left is the search result in an e-commerce platform, referred to as slate recommendation. On the right is the illustration of a recommendation queue in music platforms, referred to as sequential recommendation.

To relieve these concerns, in this paper, we provide a general simulation framework Mirror to evaluate the manipulations of user preferences from recommendation algorithms. The evaluation consists of four stages. In the first stage, each simulated user is presented with all the documents one by one to get a score for each document. Since the scoring happens before the interactions with recommender systems, and each document itself makes up a slate, such scores are not influenced by historically viewed documents or by other documents within the same recommended slate. Therefore, the obtained scores can serve as the initial preferences of users. In the second stage, we generate training datasets by using predefined initial recommendation strategies to interact with users. To acquire a diverse set of training datasets for studying how training data influence the trained models, we mix the datasets from multiple initial strategies with different ratios. After that, in the third stage, the recommendation model is trained on those datasets and evaluated by interacting with simulated users. At the final stage, the interaction data, as well as users’ initial preferences, are jointly used to compute evaluation metrics. By incorporating the initial preferences, we define the notion of favorite documents and propose several manipulation-aware metrics. Those metrics can be used in both slate and sequential recommendations, which are two typical recommendation scenarios as illustrated in Fig. 1. Intuitively, if the users click on more documents but a smaller proportion of favorite documents, we say the recommendation algorithms are manipulating the users’ preferences. Also, the manipulations can be quantified by the long-term shift in users’ unbiased preferences.
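The intuition in the last two sentences can be sketched in a few lines. This is not the paper's Manipulation Score or Preference Shift, whose definitions appear later. It assumes, purely for illustration, that a user's favorite documents are the top-k documents by initial preference score; `favorite_documents`, `click_summary`, `top_k` and all data values are hypothetical.

```python
def favorite_documents(initial_scores, top_k):
    """Return the ids of the top_k documents by initial preference score.

    Assumption: "favorite" means top_k by initial score (illustrative only).
    """
    ranked = sorted(initial_scores, key=initial_scores.get, reverse=True)
    return set(ranked[:top_k])


def click_summary(clicked_ids, favorites):
    """Return (number of clicks, proportion of clicks on favorite documents)."""
    n_clicks = len(clicked_ids)
    if n_clicks == 0:
        return 0, 0.0
    n_favorite = sum(1 for doc in clicked_ids if doc in favorites)
    return n_clicks, n_favorite / n_clicks


# Stage 1 (sketch): scores collected by showing each document alone.
initial_scores = {"d1": 0.9, "d2": 0.7, "d3": 0.2, "d4": 0.1}
favorites = favorite_documents(initial_scores, top_k=2)

# Clicks observed under two hypothetical recommendation algorithms.
baseline = click_summary(["d1", "d2"], favorites)
candidate = click_summary(["d1", "d3", "d4"], favorites)

# More clicks, but a smaller share of them on favorite documents.
more_clicks_fewer_favorites = (
    candidate[0] > baseline[0] and candidate[1] < baseline[1]
)
```

In this toy case the candidate algorithm earns three clicks against two, yet only one of its three clicks lands on a favorite document, which is the pattern the text describes as manipulation.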

Under the proposed framework, we conduct extensive studies based on four different simulation experiments, two of which are rule-based and the other two are data-driven. Those environments cover both slate and sequential recommendations. In slate recommendations, we find that compared to simple point-wise models, the reranking methods with the ability to model item relationships within one slate tend to generate recommendations that have higher clicks but a lower proportion of clicks on favorite documents. As the training data contains more slates with manipulations, all the algorithms recommend slates with increasing clicks and decreasing proportion of favorite documents clicked. Moreover, our case study reveals that the reranking method makes use of comparison bias, where people tend to choose those documents if they have already seen similar but worse alternatives within the same list. In sequential recommendations, the experimental results show that sequential recommendation algorithms that take users’ recent interaction history into account can lead to more significant manipulation of users’ preferences.

To summarize, the main contributions of our work are:

• We put forward the potential issue of manipulations on user preference in current recommendation systems, and propose a general and configurable framework named Mirror that can quantify the degree of such manipulations of recommendation algorithms. The core procedure inside the benchmark is to make simulated users view and score all the documents one by one, through which we can access the initial preferences of each user.
• By utilizing the proposed framework, we instantiate four benchmark scenarios, covering both slate recommendations and sequential recommendations, and including both synthetic and data-driven user behavior models.
• We benchmark several recommendation algorithms under these scenarios, and draw up a number of key findings on manipulations from recommendation systems.
  – In both slate and sequential recommendations, better performances on traditional online metrics are accompanied by more manipulations on user preference.
  – In both slate and sequential recommendations, the degree of manipulation from recommendation algorithms is overall positively correlated with the manipulation degree of the recommendation algorithm that collects the training data.
  – In slate recommendations, reranking methods are more actively using the mutual influence of documents within the slate to improve the overall clicks compared to point-wise ranking methods. Such influence often leads users to click on documents they do not originally favor.
  – In sequential recommendations, recommendation models that take the user’s recent interaction behavior into account can make users choose a smaller proportion of originally favored documents and induce greater changes to user preferences.

The remaining part of this paper is organized as follows. Section 2 provides an overview of current recommender systems and the emerging concerns on negative impacts. In Section 3, we make two necessary assumptions to delimit the scope of manipulations studied in this paper. In Section 4, we introduce our benchmark framework Mirror, including its components, benchmark procedure and evaluation metrics. In Section 5, we conduct four benchmark experiments and analyze the results. Related works are summarized in Section 6. We finally conclude this paper in Section 7.

2 AN OVERVIEW OF CURRENT RECOMMENDER SYSTEMS

We are living in an era of information explosion. With the popularity of the internet, we have access to far more information than ever before, making it difficult to find what we need among the vast amount of content. To save people from information overload and find the most relevant content for each of us, modern recommender systems make use of rich user profiles to model user preferences. These systems utilize techniques like collaborative filtering to guess what users might favor, using not only their own but also other users’ behaviors. The more you use the recommender system, and the more other people are using the system, the more you will be recommended with relevant results [49, 73].

If we look at the other side, the situation is also thriving. For companies who are deploying large-scale recommender systems on their products or platforms, personalized recommendation greatly improves the retention rate of users, as well as the clicks or purchases, depending on the service they provide. The method from a pioneer study [ ] from Google achieves an average increase of 38% in click rates on its news platforms with the personalized recommendation, compared to general methods. Even more gratifyingly, the e-commerce giant eBay reported a 500% spike in Gross Merchandise Volume in online A/B Testing [ ] with their personalized recommendation. In industry, recommender systems are continuing to iterate, taking advantage of the increasing amount of data and computing power available to deliver significant revenue for businesses.

Fig. 2. The overview of the workflow of a typical recommender system. Here we show user feedback as clicks, but in practice, user feedback can also be purchases, ratings, or just browsing, etc.

Let us briefly review the working mechanism of a standard recommender system. As illustrated in Fig. 2, we consider the multi-stage recommendation setting with user clicks as feedback. The users can interact with the recommender system for one or multiple rounds. Each round starts with a user $u$ proposing a query $q$. During round $r$, the user is recommended with a sorted list of $T$ documents $D^r = \{d^r_1, d^r_2, \ldots, d^r_T\}$, and the user can choose to click on any number of documents, forming a click sequence $C^r = \{c^r_1, c^r_2, \ldots, c^r_T\}$. The process continues until the user exits.
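As a minimal sketch of this notation, one round can be stored as the displayed list together with its click sequence. The sketch assumes that each click entry is a 0/1 indicator aligned with the document at the same position, which the text suggests but does not state outright; the `Round` class and the document ids are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Round:
    """One interaction round r: displayed list D^r and click sequence C^r."""

    documents: List[str]  # D^r: sorted list of T document ids
    clicks: List[int]     # C^r: assumed 0/1 click indicator per position

    def __post_init__(self):
        if len(self.documents) != len(self.clicks):
            raise ValueError("documents and clicks must both have length T")

    def clicked_documents(self) -> List[str]:
        """Documents whose click indicator is 1, in display order."""
        return [d for d, c in zip(self.documents, self.clicks) if c == 1]


# A hypothetical two-round session with T = 3 documents per round.
session = [
    Round(documents=["d3", "d1", "d7"], clicks=[0, 1, 1]),
    Round(documents=["d2", "d5", "d9"], clicks=[1, 0, 0]),
]
total_clicks = sum(sum(r.clicks) for r in session)
```

Keeping the displayed list and the clicks side by side preserves which documents were shown but not clicked, which is needed to compute rate-based metrics over a session.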

The recommended list $D^r$ is generated by the recommender system from a three-stage procedure. A small candidate set $\mathcal{D}_q$ is first retrieved based on $q$ by a recall algorithm from the full document set. Then the documents in the candidate set are scored and sorted by a ranking algorithm. Top-$T$ ranked items form the initial ranking list and are subsequently processed by a reranking algorithm to get a refined order. The reordered list is used as the recommended list $D^r$ and displayed to the
