Cognitive Models as Simulators: The Case of Moral Decision-Making
Ardavan S. Nobandegani1,2,5, Thomas R. Shultz1,3, & Irina Rish4,5
{ardavan.salehinobandegani, thomas.shultz}@mcgill.ca
{irina.rish}@mila.quebec
1Department of Psychology, McGill University
2Department of Electrical & Computer Engineering, McGill University
3School of Computer Science, McGill University
4Department of Computer Science & Operations Research, Université de Montréal
5Mila - Quebec AI Institute
Abstract
To achieve desirable performance, current AI systems often require huge amounts of training data. This is especially problematic in domains where collecting data is both expensive and time-consuming, e.g., where AI systems require having numerous interactions with humans, collecting feedback from them. In this work, we substantiate the idea of cognitive models as simulators, which is to have AI systems interact with, and collect feedback from, cognitive models instead of humans, thereby making their training process both less costly and faster. Here, we leverage this idea in the context of moral decision-making, by having reinforcement learning (RL) agents learn about fairness through interacting with a cognitive model of the Ultimatum Game (UG), a canonical task in behavioral and brain sciences for studying fairness. Interestingly, these RL agents learn to rationally adapt their behavior depending on the emotional state of their simulated UG responder. Our work suggests that using cognitive models as simulators of humans is an effective approach for training AI systems, presenting an important way for computational cognitive science to make contributions to AI.
Keywords: reinforcement learning; moral decision-making;
Ultimatum game; fairness; emotions; cognitive models
1 Introduction
Recent years have witnessed artificial intelligence (AI) systems with remarkable abilities (e.g., Devlin et al., 2018; Silver et al., 2016; Goyal et al., 2021), whose success critically depends on having access to huge amounts of training data. Examples include the famous Google BERT language model pre-trained on 800M words from BooksCorpus and 2,500M words from Wikipedia (Devlin et al., 2018), the DeepMind AlphaGo system trained on over 30M expert moves (Silver et al., 2016), the OpenAI GPT-3 model pre-trained on 300 billion tokens (Brown et al., 2020), and the recent Facebook SEER image recognition model trained on one billion images from Instagram photos (Goyal et al., 2021).
Indeed, an influential subfield of AI, reinforcement learning (RL), requires AI agents to learn by interacting with their environment to collect feedback in the form of rewards (Sutton & Barto, 2018). This is especially challenging in settings where the environment consists of human agents, making these interactions both expensive and time-consuming and thus slowing the training process. Could we instead use cognitive models, as a proxy for humans, to address this issue?
In this work, we substantiate the idea of using cognitive models as simulators, which is to have AI systems interact with, and collect feedback from, cognitive models instead of humans, thereby making their training process both less costly and faster. Here, for the first time in the literature, we leverage this idea in the context of moral decision-making (Haidt, 2007; Lapsley, 2018), by having RL agents learn about fairness through interacting with a cognitive model of the Ultimatum Game (UG), a well-established game in behavioral and brain sciences for studying fairness (e.g., Sanfey, 2009; Battigalli et al., 2015; Vavra et al., 2018; Sanfey et al., 2003; Xiang et al., 2013; Chang & Sanfey, 2013). Interestingly, these RL agents learn to rationally adapt their behavior depending on the emotional state of their UG Responder (see Sec. 2 for an explanation of how UG works). Our work suggests that using cognitive models as simulators of humans is an effective approach for training AI systems, presenting an important way for computational cognitive science to make contributions to the field of AI.
We begin by describing UG and presenting an overview of the relevant psychological findings on the role of emotions in UG (Sec. 2). We then discuss in Sec. 3 a process model of UG Responder under a variety of emotional states (Lizotte, Nobandegani, & Shultz, 2021; Nobandegani, Destais, & Shultz, 2020), and subsequently present our RL training results under various emotional states of the UG Responder (Sec. 4). We conclude by discussing the implications of our work for the fields of cognitive science and AI, and the synergistic interactions between the two (Sec. 5).
2 UG and the Role of Emotions in UG
The Ultimatum Game (UG; Güth et al., 1982) is a canonical task for studying fairness, and has been extensively studied in psychology (e.g., Sanfey, 2009; Battigalli et al., 2015; Vavra et al., 2018), neuroscience (Sanfey et al., 2003; Xiang et al., 2013; Chang & Sanfey, 2013), philosophy (Guala, 2008), and behavioral economics (e.g., Güth et al., 1982; Thaler, 1988; Camerer & Thaler, 1995; Fehr & Schmidt, 1999; Sutter et al., 2003; Camerer & Fehr, 2006). UG has a simple design: Two players, Proposer and Responder, must agree on how to split a sum of money. Proposer makes an offer. If Responder accepts, the deal goes through; if Responder rejects, neither player gets anything. In both cases, the game is over.
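For concreteness, the payoff structure of a single UG round can be sketched as follows (a minimal illustration; the function and variable names are ours, not drawn from the original studies):

```python
def ultimatum_round(total, offer, responder_accepts):
    """Payoffs for one round of the Ultimatum Game.

    total:             the sum of money to be split
    offer:             the amount Proposer offers to Responder
    responder_accepts: Responder's binary accept/reject decision
    Returns (Proposer payoff, Responder payoff).
    """
    if responder_accepts:
        return total - offer, offer  # the deal goes through
    return 0, 0                      # rejection: neither player gets anything
```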
An extensive body of empirical work has established that UG Proposers predominantly respect fairness by offering about 50% of the endowed amount, and that this split is almost invariably accepted by UG Responders (see Camerer, 2011). Relatedly, UG Responders often reject offers below 30%, presumably as retaliation for being treated unfairly (Güth et al., 1982; Thaler, 1988; Güth & Tietz, 1990; Bolton & Zwick, 1995; Nowak et al., 2000; Camerer & Fehr, 2006).
A growing body of experimental work has revealed that induced emotions strongly affect UG Responder's accept/reject behavior, with positive emotions increasing the chance of low offers being accepted (e.g., Riepl et al., 2016; Andrade & Ariely, 2009), and negative emotions decreasing the chance of low offers being accepted (e.g., Bonini et al., 2011; Harlé & Sanfey, 2010; Liu et al., 2016; Moretti & Di Pellegrino, 2010; Vargas et al., 2019). Experimentally, these emotions are often induced by a movie clip or a recall task.
3 A Computational Model of UG Responder
Recently, Nobandegani et al. (2020) presented a process model of UG Responder, called sample-based expected utility (SbEU). SbEU provides a unified account of several disparate empirical findings in UG (i.e., the effects of expectation, competition, and time pressure on UG Responder), and also explains the effect of a wide range of emotions on UG Responder (Lizotte, Nobandegani, & Shultz, 2021).
Nobandegani et al.'s process-level account rests on two main assumptions. First, UG Responder uses SbEU to estimate the expected-utility gap between their expectation and the offer, i.e., E[u(offer) − u(expectation)], where u(·) denotes Responder's utility function. If this estimate is positive (indicating that the offer made is, on average, higher than Responder's expectation), Responder accepts the offer; otherwise, Responder rejects the offer. This assumption is supported by substantial empirical evidence showing that Responder's expectation serves as a reference point for the subjective valuation of offers (Sanfey, 2009; Battigalli et al., 2015; Vavra et al., 2018; Xiang et al., 2013; Chang & Sanfey, 2013).
The second assumption is that negative emotions elevate loss aversion while positive emotions lower it (Lizotte et al., 2021). Again, this assumption is supported by mounting empirical evidence (e.g., De Martino et al., 2010; Sokol-Hessner et al., 2015, 2009) suggesting that emotions modulate loss aversion, i.e., the tendency to overweight losses as compared to gains (Kahneman & Tversky, 1979).
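A minimal sketch combining these two assumptions is given below. The prospect-theory-style value function, the treatment of the expectation as a sampled reference point, and the linear mapping from emotional state to the loss-aversion parameter are illustrative choices on our part, not the exact parameterization of Nobandegani et al. (2020) or Lizotte et al. (2021).

```python
import random

def value(x, loss_aversion):
    # Prospect-theory-style value of a gain/loss x relative to the reference point:
    # losses loom larger than gains by a factor of loss_aversion (> 1).
    return x if x >= 0 else loss_aversion * x

def responder_accepts(offer, sample_expectation, emotion, n_samples=10):
    """Accept iff a small-sample estimate of the expected-utility gap is positive.

    offer:              the amount offered by Proposer
    sample_expectation: callable returning one sample from Responder's distribution
                        of anticipated offers (the reference point)
    emotion:            illustrative scalar in [-1, 1]; negative values raise loss
                        aversion, positive values lower it (second assumption)
    """
    loss_aversion = 2.25 - emotion  # illustrative mapping around the classic lambda of about 2.25
    gaps = [offer - sample_expectation() for _ in range(n_samples)]
    estimate = sum(value(g, loss_aversion) for g in gaps) / n_samples
    return estimate > 0
```

For example, with `sample_expectation = lambda: random.gauss(5.0, 1.0)` (expecting roughly half of a $10 endowment), an offer of 3 is rejected more often when `emotion` is negative, because the negative gap is weighted more heavily.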
Concretely, SbEU assumes that an agent estimates expected utility:
$$\mathbb{E}[u(o)] = \int p(o)\, u(o)\, do, \qquad (1)$$
using self-normalized importance sampling (Nobandegani et al., 2018; Nobandegani & Shultz, 2020b, 2020c), with its importance distribution $q$ aiming to optimally minimize mean-squared error (MSE):
$$\hat{E} = \frac{1}{\sum_{j=1}^{s} w_j} \sum_{i=1}^{s} w_i\, u(o_i), \qquad \forall i:\ o_i \sim q, \quad w_i = \frac{p(o_i)}{q(o_i)}, \qquad (2)$$
$$q(o) \propto p(o)\, |u(o)| \sqrt{\frac{1 + |u(o)|\sqrt{s}}{|u(o)|\sqrt{s}}}. \qquad (3)$$
MSE is a standard measure of estimation quality, widely used in decision theory and mathematical statistics (Poor, 2013). In Eqs. (1-3), $o$ denotes an outcome of a risky gamble, $p(o)$ the objective probability of outcome $o$, $u(o)$ the subjective utility of outcome $o$, $\hat{E}$ the importance-sampling estimate of the expected utility given in Eq. (1), $q$ the importance-sampling distribution, $o_i$ an outcome randomly sampled from $q$, and $s$ the number of samples drawn from $q$.
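For a discrete gamble, the estimator in Eq. (2) with the importance distribution of Eq. (3) can be sketched as follows; restricting to a finite outcome set and normalizing q explicitly are implementation conveniences on our part, not part of the original formulation.

```python
import random

def sbeu_estimate(outcomes, probs, u, s):
    """Self-normalized importance-sampling estimate of E[u(o)], Eqs. (1-3).

    outcomes: possible outcomes o of the gamble
    probs:    their objective probabilities p(o)
    u:        subjective utility function u(o)
    s:        number of samples drawn from the importance distribution q
    """
    eps = 1e-12  # guards against division by zero when u(o) = 0
    # Eq. (3): q(o) proportional to p(o)|u(o)| * sqrt((1 + |u(o)|sqrt(s)) / (|u(o)|sqrt(s)))
    q_raw = [p * abs(u(o)) * ((1 + abs(u(o)) * s ** 0.5) / (abs(u(o)) * s ** 0.5 + eps)) ** 0.5
             for o, p in zip(outcomes, probs)]
    q = [w / sum(q_raw) for w in q_raw]
    # Eq. (2): draw s outcomes from q and form the self-normalized estimate.
    idx = random.choices(range(len(outcomes)), weights=q, k=s)
    w = [probs[i] / q[i] for i in idx]  # importance weights w_i = p(o_i) / q(o_i)
    return sum(wi * u(outcomes[i]) for wi, i in zip(w, idx)) / sum(w)
```

For instance, `sbeu_estimate([0, 100], [0.99, 0.01], lambda o: o, s=4)` over-samples the rare, high-utility outcome relative to its objective probability, which is how the importance distribution gives extra weight to rare but consequential outcomes.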
SbEU has so far explained a broad range of empirical findings in human decision-making, e.g., the fourfold patterns of risk preferences in both outcome probability and outcome magnitude (Nobandegani et al., 2018), risky decoy and violation of betweenness (Nobandegani et al., 2019c), violation of stochastic dominance (Xia, Nobandegani, Shultz, & Bhui, 2022), violation of cumulative independence (Cao, Nobandegani, & Shultz, 2022), the three contextual effects of similarity, attraction, and compromise (da Silva Castanheira, Nobandegani, Shultz, & Otto, 2019), the Allais, St. Petersburg, and Ellsberg paradoxes (Nobandegani & Shultz, 2020b, 2020c; Nobandegani et al., 2021), cooperation in Prisoner's Dilemma (Nobandegani et al., 2019a), and human coordination behavior in coordination games (Nobandegani & Shultz, 2020a). Notably, SbEU is the first, and thus far the only, resource-rational process model that bridges risky, value-based, and game-theoretic decision-making.
4 Training RL Agents in UG
In this section, we substantiate the idea of cognitive models as simulators in the context of moral decision-making, by having RL agents learn about fairness through interacting with a cognitive model of UG Responder (Nobandegani et al., 2020), as a proxy for human Responders, thereby making their training process both less costly and faster.
To train RL Proposers, we leverage the broad framework of multi-armed bandits in reinforcement learning (Katehakis & Veinott, 1987; Gittins, 1979), and adopt the well-known Thompson Sampling method (Thompson, 1933). Specifically, we assume that RL Proposer must decide what percentage of the total money T they are willing to offer to SbEU Responder. For ease of analysis, here we assume that RL Proposer chooses from a finite set of options: A = {0, T/10, 2T/10, ..., 9T/10, T}.
In reinforcement learning terminology, RL Proposer learns, through trial and error while striking a balance between exploration and exploitation, which option a ∈ A yields the highest mean reward. Here, we train RL Proposers using Thompson Sampling, a well-established method in the reinforcement learning literature enjoying strong optimality guarantees (Agrawal & Goyal, 2012, 2013); see Algorithm 1.
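As a rough sketch of this training loop (not the paper's Algorithm 1 verbatim), the standard Beta-Bernoulli form of Thompson Sampling can be written as below, treating the SbEU Responder as a black-box accept/reject simulator; the assumption that Proposer's reward on acceptance is its own share, T − a, is ours.

```python
import random

def train_proposer(T, responder_accepts, n_rounds=10_000):
    """Thompson Sampling over the discrete offer set A = {0, T/10, 2T/10, ..., T}.

    T:                 total amount of money to be split
    responder_accepts: black-box simulator (here, the SbEU model of UG Responder)
                       returning True if a given offer is accepted
    """
    offers = [k * T / 10 for k in range(11)]
    S = {a: 0 for a in offers}  # number of times offer a has been accepted (successes)
    F = {a: 0 for a in offers}  # number of times offer a has been rejected (failures)
    for _ in range(n_rounds):
        # Sample each offer's acceptance probability from its Beta posterior and
        # choose the offer with the highest sampled expected payoff for Proposer.
        a = max(offers, key=lambda x: (T - x) * random.betavariate(S[x] + 1, F[x] + 1))
        if responder_accepts(a):
            S[a] += 1
        else:
            F[a] += 1
    return S, F
```

Plugging in a simulated SbEU Responder such as the one sketched in Sec. 3 then lets one inspect which offers the trained Proposer converges to under different emotional states of the simulated Responder.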
Algorithm 1 can be described in simple terms as follows. At the start, i.e., prior to any learning, the number of times an offer a ∈ A has so far been accepted, S_a (S for success), and the number of times it has been rejected, F_a (F for failure), are both set to