
most invariably accepted by UG Responders (see Camerer,
2011). Relatedly, UG Responders often reject offers below
30%, presumably as retaliation for being treated unfairly
(Güth et al., 1982; Thaler, 1988; Güth & Tietz, 1990; Bolton
& Zwick, 1995; Nowak et al., 2000; Camerer & Fehr, 2006).
A growing body of experimental work has revealed that in-
duced emotions strongly affect UG Responder’s accept/reject
behavior, with positive emotions increasing the chance of low
offers being accepted (e.g., Riepl et al., 2016; Andrade &
Ariely, 2009), and negative emotions decreasing the chance
of low offers being accepted (e.g., Bonini et al., 2011; Harlé
& Sanfey, 2010; Liu et al., 2016; Moretti & Di Pellegrino,
2010; Vargas et al., 2019). Experimentally, these emotions
are often induced by a movie clip or recall task.
3 A Computational Model of UG Responder
Recently, Nobandegani et al. (2020) presented a process
model of UG Responder, called sample-based expected util-
ity (SbEU). SbEU provides a unified account of several dis-
parate empirical findings in UG (i.e., the effects of expecta-
tion, competition, and time pressure on UG Responder), and
also explains the effect of a wide range of emotions on UG
Responder (Lizotte, Nobandegani, & Shultz, 2021).
Nobandegani et al.’s process-level account rests on two
main assumptions. First, UG Responder uses SbEU to esti-
mate the expected-utility gap between their expectation and
the offer, i.e., E[u(offer) − u(expectation)], where u(·) denotes
Responder's utility function. If this estimate is pos-
itive — indicating that the offer made is, on average, higher
than Responder’s expectation — Responder accepts the offer;
otherwise, Responder rejects the offer. This assumption is
supported by substantial empirical evidence showing that Re-
sponder’s expectation serves as a reference point for subjec-
tive valuation of offers (Sanfey, 2009; Battigalli et al., 2015;
Vavra et al., 2018; Xiang et al., 2013; Chang & Sanfey, 2013).
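The first assumption can be sketched as a simple sampling procedure. In the sketch below, the linear utility function and the Gaussian distribution of Responder's expectation are illustrative assumptions, not part of the model specification:

```python
import random

def responder_decision(offer, expectation_samples, u=lambda x: x):
    # Assumption 1: estimate the expected-utility gap
    # E[u(offer) - u(expectation)] from sampled expectations,
    # and accept exactly when the estimate is positive.
    gap = sum(u(offer) - u(e) for e in expectation_samples) / len(expectation_samples)
    return "accept" if gap > 0 else "reject"

random.seed(0)
# Hypothetical Responder whose expectation fluctuates around 40% of the pie.
expectations = [random.gauss(40, 5) for _ in range(1000)]
print(responder_decision(50, expectations))  # offer above expectation -> accept
print(responder_decision(20, expectations))  # offer below expectation -> reject
```

On this reading, rejection of low offers needs no separate retaliation mechanism: any offer whose estimated utility gap relative to expectation is negative is declined.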
The second assumption is that negative emotions elevate
loss-aversion while positive emotions lower loss-aversion
(Lizotte et al., 2021). Again, this assumption is supported by
mounting empirical evidence (e.g., De Martino et al., 2010;
Sokol-Hessner et al., 2015, 2009) suggesting that emotions
modulate loss-aversion — the tendency to overweight losses
as compared to gains (Kahneman & Tversky, 1979).
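The second assumption can be illustrated with a prospect-theoretic value function. Treating the offer–expectation shortfall as a loss, and the specific loss-aversion coefficients below, are simplifying assumptions made only for illustration:

```python
def value(x, lam):
    # Prospect-theoretic value function (Kahneman & Tversky, 1979):
    # losses loom lam times larger than equivalent gains.
    return x if x >= 0 else lam * x

# Shortfall of a low offer (30% of the pie) below an expectation of 40%.
shortfall = 30 - 40

# Positive emotion: lower loss aversion -> milder subjective loss.
print(value(shortfall, lam=1.5))  # -15.0
# Negative emotion: higher loss aversion -> harsher subjective loss.
print(value(shortfall, lam=3.0))  # -30.0
```

Under this framing, a negative emotion that raises lam makes the same low offer feel subjectively worse, pushing a borderline offer from acceptance toward rejection, consistent with the induced-emotion findings reviewed above.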
Concretely, SbEU assumes that an agent estimates expected
utility:
\[
\mathbb{E}[u(o)] = \int p(o)\, u(o)\, \mathrm{d}o, \tag{1}
\]
using self-normalized importance sampling (Nobandegani et
al., 2018; Nobandegani & Shultz, 2020b, 2020c), with its im-
portance distribution q∗ aiming to optimally minimize mean-
squared error (MSE):
\[
\hat{E} = \frac{1}{\sum_{j=1}^{s} w_j} \sum_{i=1}^{s} w_i\, u(o_i), \quad \forall i: o_i \sim q^*, \; w_i = \frac{p(o_i)}{q^*(o_i)}, \tag{2}
\]
\[
q^*(o) \propto p(o)\, |u(o)| \sqrt{\frac{1 + |u(o)|\sqrt{s}}{|u(o)|\sqrt{s}}}. \tag{3}
\]
MSE is a standard measure of estimation quality, widely used
in decision theory and mathematical statistics (Poor, 2013).
In Eqs. (1)–(3), o denotes an outcome of a risky gamble, p(o)
the objective probability of outcome o, u(o) the subjective
utility of outcome o, Ê the importance-sampling estimate of
expected utility given in Eq. (1), q∗ the importance-sampling
distribution, o_i an outcome randomly sampled from q∗, and s
the number of samples drawn from q∗.
SbEU has so far explained a broad range of empirical
findings in human decision-making, e.g., the fourfold pat-
terns of risk preferences in both outcome probability and out-
come magnitude (Nobandegani et al., 2018), risky decoy and
violation of betweenness (Nobandegani et al., 2019c), vio-
lation of stochastic dominance (Xia, Nobandegani, Shultz,
& Bhui, 2022), violation of cumulative independence (Cao,
Nobandegani, & Shultz, 2022), the three contextual effects of
similarity, attraction, and compromise (da Silva Castanheira,
Nobandegani, Shultz, & Otto, 2019), the Allais, St. Peters-
burg, and Ellsberg paradoxes (Nobandegani & Shultz, 2020b,
2020c; Nobandegani et al., 2021), cooperation in Prisoner’s
Dilemma (Nobandegani et al., 2019a), and human coordina-
tion behavior in coordination games (Nobandegani & Shultz,
2020a). Notably, SbEU is the first, and thus far the only,
resource-rational process model that bridges between risky,
value-based, and game-theoretic decision-making.
4 Training RL Agents in UG
In this section, we substantiate the idea of cognitive mod-
els as simulators in the context of moral decision-making,
by having RL agents learn about fairness through interacting
with a cognitive model of UG Responder (Nobandegani et
al., 2020), as a proxy for human Responders, thereby making
their training process both less costly and faster.
To train RL Proposers, we leverage the broad framework
of multi-armed bandits in reinforcement learning (Katehakis
& Veinott, 1987; Gittins, 1979), and adopt the well-known
Thompson Sampling method (Thompson, 1933). Specifi-
cally, we assume that RL Proposer should decide what per-
centage of the total money T they are willing to offer to
SbEU Responder. For ease of analysis, here we assume
that RL Proposer chooses between a finite set of options:
\[
A = \left\{0, \frac{T}{10}, \frac{2T}{10}, \cdots, \frac{9T}{10}, T\right\}.
\]
In reinforcement learning terminology, RL Proposer
learns, through trial and error while striking a balance be-
tween exploration and exploitation, which option a∈A
yields the highest mean reward. Here, we train RL Proposers
using Thompson Sampling, a well-established method in the
reinforcement learning literature enjoying strong optimality
guarantees (Agrawal & Goyal, 2012, 2013); see Algorithm 1.
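In outline, the training loop maintains a Beta posterior over each offer's acceptance rate and plays the offer with the highest sampled expected payoff. In the sketch below, the simulated accept probabilities are a hypothetical stand-in for SbEU Responder, and the payoff-weighted reward is one plausible reading of the setup, not the exact specification of Algorithm 1:

```python
import random

def thompson_sampling(accept_prob, actions, n_rounds, T, rng=random):
    # S[a]: times offer a was accepted; F[a]: times it was rejected.
    S = {a: 0 for a in actions}
    F = {a: 0 for a in actions}
    for _ in range(n_rounds):
        # Sample a plausible acceptance rate for each offer from its
        # Beta posterior, then play the offer with the highest
        # sampled expected payoff (Proposer keeps T - a on acceptance).
        draws = {a: rng.betavariate(S[a] + 1, F[a] + 1) for a in actions}
        a = max(actions, key=lambda x: draws[x] * (T - x))
        if rng.random() < accept_prob(a):
            S[a] += 1
        else:
            F[a] += 1
    return S, F

random.seed(1)
T = 100
actions = [T * k // 10 for k in range(11)]  # A = {0, T/10, ..., T}
# Hypothetical proxy Responder: low offers are mostly rejected.
accept_prob = lambda a: min(1.0, a / 40)
S, F = thompson_sampling(accept_prob, actions, n_rounds=5000, T=T)
best = max(actions, key=lambda a: S[a] + F[a])  # most-played offer
print(best)
```

Under these assumed acceptance rates, offers around 40% maximize the Proposer's expected payoff, so the most-played arm concentrates there; substituting the SbEU model for accept_prob yields the training regime described above.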
Algorithm 1 can be described in simple terms as follows.
At the start, i.e., prior to any learning, the number of times
an offer a ∈ A has so far been accepted, S_a (S for success), and the
number of times it has been rejected, F_a (F for failure), are both set to