
MBTI Personality Prediction for Fictional Characters Using Movie Scripts

Yisi Sang1∗, Xiangyang Mou2∗, Mo Yu3∗, Dakuo Wang4, Jing Li5, Jeffrey Stanton1

1 Syracuse University; 2 Rensselaer Polytechnic Institute; 3 Pattern Recognition Center, WeChat AI; 4 IBM Research, Northeastern University; 5 New Jersey Institute of Technology

yisang@syr.edu moux4@rpi.edu moyumyu@tencent.com

Abstract

An NLP model that understands stories should be able to understand the characters in them. To support the development of neural models for this purpose, we construct a benchmark, Story2Personality. The task is to predict a movie character’s MBTI or Big 5 personality types based on the narratives of the character. Experiments show that our task is challenging for existing text classification models, as none is able to substantially outperform random guessing. We further propose a multi-view model for personality prediction using both verbal and non-verbal descriptions, which improves over using only verbal descriptions. The uniqueness and challenges of our dataset call for the development of narrative comprehension techniques from the perspective of understanding characters.1

1 Introduction

Character comprehension is commonly regarded as the cornerstone of story comprehension in psychology and education (Bower and Morrow, 1990; Paris and Paris, 2003; Zhao et al., 2022). The NLP community has done some work on character comprehension in reading comprehension tasks, but most existing studies focus on short or expository texts (e.g., story summaries) (Urbanek et al., 2019; Brahman et al., 2021). Moreover, most of them are limited to factoid understanding of characters, such as coreference resolution (Chen and Choi, 2016) and character relationships (Iyyer et al., 2016), and few studies have explored deeper comprehension of characters’ personas (Flekova and Gurevych, 2015; Sang et al., 2022a), on which humans generally do well.

∗ Authors contributed equally to this paper. Mo Yu is the corresponding author.

1 Our code and data are released at https://github.com/YisiSang/Story2Personality

Figure 1: An example excerpt from the movie script of “The Matrix”. Blue utterances are the character Morpheus’s scene descriptions, red are his dialogues. Morpheus’s MBTI personality was rated as ENFJ by 300 user votes.

We propose Story2Personality, a new narrative understanding benchmark to encourage the study of character understanding. The goal of Story2Personality is to predict a character’s personality from the character’s narrative texts in the script.

Personality prediction from narratives poses several challenges. First, stories often use a variety of narrative clues (e.g., scenery changes), sequences (e.g., flashbacks), and rhetorical techniques (e.g., metaphor) (Xu et al., 2022b). Second, the inputs of the task are long (10K words on average), which challenges the application of Transformer-based models (Vaswani et al., 2017). Third, both the scene descriptions and the dialogues are informative for the prediction, requiring models to jointly consider multiple views of the input.

This study makes the following contributions:

• We establish a large-scale dataset for personality prediction of narrative characters that can support the development of neural models. Our dataset consists of 3,543 characters from 507 movies with MBTI labels of four dimensions. In comparison, the only existing related dataset (Flekova and Gurevych, 2015) contains only 298 characters and focuses on a single dimension. Our dataset proves challenging: on this binary classification task, none of the baselines achieves higher than 60% macro-F1.

• We develop a movie script parser that automatically converts a script into a structured form, with the verbal character dialogues and the non-verbal scene descriptions illustrating backgrounds. A human study shows that our parser is more accurate than previous rule-based tools.

• We propose an extension to the BERT classifier (Devlin et al., 2018) to handle the long and multi-view (verbal and non-verbal) inputs. Our model improves 2-3% over the baselines. This shows the potential of exploiting both verbal and non-verbal narratives of characters, which is consistent with psychological theory (McCroskey and Richmond, 1996; Richmond et al., 2008), and suggests directions for future model design.

arXiv:2210.10994v1 [cs.AI] 20 Oct 2022

2 Related Work

Character-Centric Narrative Understanding

There are existing studies on character-centric narrative understanding. Many of them, however (Massey et al., 2015; Srivastava et al., 2016; Brahman et al., 2021), work on summaries of stories or summaries of characters. Their scope thus serves a different assessment purpose from ours and greatly reduces the challenge of understanding long narrative inputs.

Among works that use long narratives, most study inter-character relationships (Elson et al., 2010; Elsner, 2012; Elangovan and Eisenstein, 2015; Iyyer et al., 2016; Chaturvedi et al., 2016, 2017; Kim and Klinger, 2019). Inter-character relationships are also related to social network theories. Various relationships have been considered in these studies, but most of them rely on unsupervised learning and do not provide labeled data for direct automatic evaluation. TVSHOWGUESS explored multiple perspectives of persona using long narratives, but its task format is different from ours (Sang et al., 2022b).

Finally, there is work on fundamental NLP annotation techniques over books and screenplays, such as named entity recognition (Bamman et al., 2019), coreference resolution (Chen and Choi, 2016), event-centric extraction (Xu et al., 2022a), and entity-centric natural language modeling (Clark et al., 2018), which differ from narrative understanding. These techniques can be helpful to our task, but the scope of this research is different from character-centric comprehension.

Latent Persona Induction

Besides Flekova and Gurevych (2015), which is similar to our work in its focus on personality classification, there is another line of related work on latent persona induction (Bamman et al., 2013). That work learns a topic model over character behaviors from books, and each latent topic corresponds to an induced persona. The induced persona vectors can then be used in potential applications as a type of character representation.

From the perspective of practicality, our work and that of Bamman et al. (2013) each have their own strengths. From our motivation of story comprehension assessment, the difference is whether character understanding is evaluated directly or through downstream tasks. As with the aforementioned relationship detection work, it is difficult to provide an automatic and objective evaluation for the task of Bamman et al. (2013). The advantage of our task is that it supports direct automatic evaluation by itself, without the need for further downstream tasks, and it can also be used to evaluate methods for the task of Bamman et al. (2013). Moreover, compared to a direct evaluation, performance on a downstream task can be affected by factors other than persona, so good performance on downstream tasks may not be directly caused by a good persona representation. The drawback of our task is that it is limited to the personality types that have human annotations.

3 Background of MBTI

Personality is a “stable and measurable” individual characteristic (Vinciarelli and Mohammadi, 2014) which can “distinguish internal properties of the person from overt behaviors” (Matthews et al., 2003). Understanding the personalities of the characters is essential for grasping the story’s greater message. The Myers–Briggs Type Indicator (MBTI) (Myers, 1962) and the Big-5 Personality are two of the most popular personality scales. We used MBTI as the annotation criterion because, despite some controversy over the validity of self-report measurement, research shows that a person’s friend can accurately judge his or her MBTI personality (Cohen et al., 1981). In our narrative comprehension scenario, a fictional character’s MBTI personality is judged by other human raters in an online community, which is quite similar to the third-person evaluation scenario and should yield reasonable validity. We also conducted our study on Big-5 and reported the results in Appendix 6.

MBTI assesses psychological preferences in how people perceive the world and make decisions along four dimensions. E/I: the extravert (E) is seen as generally active and objective, while the introvert (I) is seen as generally passive and subjective (Sipps and Alexander, 1987). S/N: sensing (S) is seen as attending to sensory stimuli; intuition (N) describes a more detached, insightful analysis of events and stimuli (Boyle, 1995). T/F: thinking (T) involves logical reasoning and decision making; feeling (F) involves a more subjective and interpersonal approach (Thomas, 1983). J/P: the judging (J) attitude is associated with prompt decision making; perception (P) involves greater patience and waiting for more information before making a decision. An individual’s MBTI type is labeled by the dominant preference on each dimension. In Figure 1, Morpheus is an extraverted person who understands the world with intuition, deals with things with feeling, and organizes the world around him by judging. Together, these give the ENFJ type.
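
Since each of the four dimensions is a two-way choice, a four-letter type can be treated as four binary labels, one per dimension. The following is a minimal sketch of that decomposition; the function names and the 0/1 encoding (first letter of each pair mapped to 1) are illustrative assumptions, not part of the released code.

```python
from typing import Dict

# The four MBTI dimensions in their conventional letter order.
MBTI_DIMENSIONS = (("E", "I"), ("S", "N"), ("T", "F"), ("J", "P"))


def mbti_to_binary(mbti_type: str) -> Dict[str, int]:
    """Map a four-letter MBTI type to one binary label per dimension.

    Assumed encoding: the first letter of each pair is 1, the second is 0.
    """
    letters = mbti_type.strip().upper()
    if len(letters) != 4:
        raise ValueError(f"expected four letters, got {mbti_type!r}")
    labels = {}
    for letter, (first, second) in zip(letters, MBTI_DIMENSIONS):
        if letter not in (first, second):
            raise ValueError(f"{letter!r} is not valid for {first}/{second}")
        labels[f"{first}/{second}"] = 1 if letter == first else 0
    return labels


def binary_to_mbti(labels: Dict[str, int]) -> str:
    """Inverse mapping: rebuild the four-letter type from binary labels."""
    return "".join(
        first if labels[f"{first}/{second}"] else second
        for first, second in MBTI_DIMENSIONS
    )


# Morpheus in Figure 1 is rated ENFJ.
print(mbti_to_binary("ENFJ"))  # {'E/I': 1, 'S/N': 0, 'T/F': 0, 'J/P': 1}
```

Keeping the dimensions separate in this way allows each one to be trained and scored as its own binary classification problem.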

4 Story2Personality Dataset

We constructed our dataset in three stages: extracting movie scripts from the Internet Movie Script Database (IMSDB2), parsing the collected movie scripts into dialogue and scene sections, and matching characters’ personality types from The Personality Database (PDB3) with their dialogues and scenes.

4.1 Movie Scripts Collection

We collected HTML files from IMSDB and combined them with the movie scripts in NarrativeQA (Kočiský et al., 2018). After removing corrupted or empty files, we obtained 1,464 usable scripts.

4.2 Our Statistical Movie Script Parser

As shown in Figure 1, a movie script usually has four basic format elements (Riley, 2009): Scene headings, a one-line description of each scene’s type, location, and time (e.g., INT. ROOM 1313); Scene descriptions, the descriptions of the actions of the characters (i.e., text in blue); Dialogues, the names of characters and the actual words they speak (i.e., text in red); and Transitions, instructions for linking scenes together (e.g., FADE IN ON).

2 https://imsdb.com/

3 https://www.personality-database.com/

In order to extract dialogues and scene descriptions in a structured form, we first split the scripts into sections, i.e., text chunks between two adjacent bolded chunks (which are scene headings or character names), and stored the bolded texts as section titles. Then we designed a statistical method to classify the section types:

Rule-Based Pre-Processing

We start with a rule to classify the sections into dialogues and scenes. As Figure 1 shows, a common format of movie scripts is to align the shot headings, transitions, and scene descriptions vertically and to use a larger indent for dialogues, so the indent size can be used to identify dialogues. Since the indent size may vary across scripts, our rule treats sections as dialogues if they have a larger indent than FADE IN in the same script, and treats the others as scenes.
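
The section splitting and the indent rule described above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the released parser: it assumes bolded chunks appear as `<b>...</b>` tags in the script HTML, and that a section's indent can be read from the first non-empty line of its body. All function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Assumption: bolded chunks appear as <b>...</b> in the script HTML.
BOLD = re.compile(r"<b>(.*?)</b>", re.DOTALL | re.IGNORECASE)


def indent_of(line: str) -> int:
    """Number of leading whitespace columns, with tabs expanded."""
    expanded = line.expandtabs()
    return len(expanded) - len(expanded.lstrip())


def split_into_sections(script_html: str) -> List[Tuple[str, str]]:
    """Return (title, body) pairs; each body runs up to the next bold chunk."""
    matches = list(BOLD.finditer(script_html))
    sections = []
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(script_html)
        sections.append((match.group(1), script_html[match.end():end]))
    return sections


def section_indent(body: str) -> Optional[int]:
    """Indent of the first non-empty line of a section body, if any."""
    for line in body.splitlines():
        if line.strip():
            return indent_of(line)
    return None


def reference_indent(script_html: str, marker: str = "FADE IN") -> Optional[int]:
    """Indent of the first line whose text starts with the marker."""
    for line in script_html.splitlines():
        text = re.sub(r"<[^>]+>", "", line)  # drop tags, keep the spacing
        if text.strip().startswith(marker):
            return indent_of(text)
    return None


def classify_by_rule(script_html: str) -> Optional[List[Tuple[str, str]]]:
    """Label each section 'dialogue' or 'scene'; None if no marker is found."""
    ref = reference_indent(script_html)
    if ref is None:
        return None
    labeled = []
    for title, body in split_into_sections(script_html):
        indent = section_indent(body)
        is_dialogue = indent is not None and indent > ref
        labeled.append((title.strip(), "dialogue" if is_dialogue else "scene"))
    return labeled
```

Returning `None` when no `FADE IN` line exists mirrors the fact that the rule has no reference indent to compare against in such scripts.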

Silver Parses Construction

The rule-based pre-processing introduces considerable noise. We therefore designed a statistical method to automatically determine the threshold indent of dialogues. First, we compute the average ratio µ of dialogues in a script and its standard deviation σ. Second, we keep adding sections with the largest indent sizes to the set of dialogues until the ratio of added sections becomes larger than µ+σ. Finally, we keep the remaining sections as scenes. If no indentation size yields a dialogue ratio within the range µ±σ, the movie script is treated as a failure case. We designated the successfully processed scripts, with their dialogue/scene labels, as the “silver” set, which consists of 29% of the scripts.
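
One possible reading of this threshold search is sketched below. It is an interpretation, not the authors' implementation: it assumes µ and σ are supplied as inputs, adds sections one indent level at a time from the largest indent downward, stops before the dialogue ratio would exceed µ+σ, and reports failure if the resulting ratio falls below µ−σ. The function names are hypothetical.

```python
from collections import Counter
from typing import List, Optional


def dialogue_indent_threshold(
    indents: List[int], mu: float, sigma: float
) -> Optional[int]:
    """Return the smallest indent treated as dialogue, or None on failure.

    indents holds one indent size per section of a single script.
    """
    if not indents:
        return None
    counts = Counter(indents)
    total = len(indents)
    added = 0
    threshold = None
    # Walk indent levels from largest to smallest.
    for indent in sorted(counts, reverse=True):
        ratio = (added + counts[indent]) / total
        if ratio > mu + sigma:
            break  # adding this level would overshoot the upper bound
        added += counts[indent]
        threshold = indent
    # Failure case: nothing was added, or the ratio never reached mu - sigma.
    if threshold is None or added / total < mu - sigma:
        return None
    return threshold


def label_sections(indents: List[int], threshold: int) -> List[str]:
    """Sections at or above the threshold indent are dialogues."""
    return ["dialogue" if i >= threshold else "scene" for i in indents]
```

For example, with section indents `[20, 10, 10, 10, 10, 10, 4, 4, 4, 4]`, `mu=0.6`, and `sigma=0.1`, the sketch selects a threshold of 10, so six of the ten sections are labeled as dialogues.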

Section Classiﬁer

For the failure scripts from the previous step and the scripts without FADE IN markers, we trained a BERT-based section classifier on 137,042 labeled sections from the silver set and used it to label them. The classifier achieved 99.31% accuracy on a held-out validation set. Its outputs are our final parses.

4.3 Personality Collection and Mapping

We collect human-rated MBTI types from PDB. Movie scripts are the blueprint for the actor’s performance. An actor’s body language, dialogue, and contexts are all described in the scripts (Jhala,
