In addition to its event segmentation capability, our model could form event representations.
1.3 Event representations
Representations are mental objects with semantic properties (Pitt, 2020). To express the strength of the relationships between them, a representational space can be formed by taking the pairwise distances between all representations (Shepard, 1980, 1987; Shepard & Arabie, 1979). This correspondence between similarity and distance makes similarity a valuable metric for revealing how a system organizes knowledge, since representations form the basis of categorization and generalization. One aim of artificial intelligence is to learn valuable and representative information from data (Bengio, Courville, & Vincent, 2014). Multi-layer perceptrons, like deep neural networks in general, can learn distributed and semantically meaningful representations (Bengio et al., 2014; Urban & Gates, 2021). The similarity between the representations of a deep learning model (i.e., the semantic relationships between the represented entities) can be measured with Euclidean distance or cosine similarity. For example, the semantic relationships between words and sentences (Mikolov, Sutskever, Chen, Corrado, & Dean, 2013; Rogers & McClelland, 2005), objects (Deselaers & Ferrari, 2011), scenes (Eslami et al., 2018), and episodes (Rothfuss, Ferreira, Aksoy, Zhou, & Asfour, 2018) can be captured with the help of representations learned by a deep learning system.
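As a concrete illustration, the following Python sketch computes both measures over a set of representations stored as rows of a NumPy array. The array reps and its dimensions are hypothetical stand-ins for the hidden activations of a trained network, not data from any study cited above.

import numpy as np

def cosine_similarity_matrix(reps):
    # Normalize each representation to unit length; dot products of
    # unit vectors are cosine similarities.
    unit = reps / np.linalg.norm(reps, axis=1, keepdims=True)
    return unit @ unit.T

def euclidean_distance_matrix(reps):
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, clipped at zero to
    # absorb small negative values from floating-point error.
    sq = np.sum(reps ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (reps @ reps.T)
    return np.sqrt(np.maximum(d2, 0.0))

reps = np.random.randn(5, 64)        # five items, 64-dimensional codes
similarity_space = cosine_similarity_matrix(reps)
distance_space = euclidean_distance_matrix(reps)

Either matrix defines a representational space of the kind described above: entries close together in the space are treated as semantically related by the model.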
Since representations give researchers insight into how humans organize knowledge, generalize between instances, and make analogical transfers (Blough, 2001; Nosofsky, 1992; Shepard, 1980, 1987; Tversky, 1977), they have a fundamental place in cognitive science. As might be expected, researchers have exploited human similarity judgments to recover human mental representations (Shepard, 1980, 1987; Shepard & Arabie, 1979). The role of representations and similarity judgments in artificial intelligence and cognitive science suggests that they might provide a basis for comparing people and machines. In fact, recent research provides excellent examples of this comparison (Hebart, Zheng, Pereira, & Baker, 2020; Peterson, Abbott, & Griffiths, 2018).
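One common way to make such a comparison concrete is to correlate the off-diagonal entries of a model-derived similarity matrix with a matrix of averaged human similarity judgments. The sketch below is our own illustration of this idea, not the procedure of the studies cited above; the matrices model_sim and human_sim are hypothetical inputs.

import numpy as np
from scipy.stats import spearmanr

def compare_similarity_spaces(model_sim, human_sim):
    # Both inputs are symmetric n_items x n_items similarity matrices.
    # Only the upper triangle (excluding the diagonal) carries distinct
    # pairwise information, so the correlation is computed over it.
    iu = np.triu_indices_from(model_sim, k=1)
    rho, p_value = spearmanr(model_sim[iu], human_sim[iu])
    return rho, p_value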
The event representation literature is rich and spans a diverse set of studies (Blom, Feuerriegel, Johnson, Bode, & Hogendoorn, 2020; Day & Bartels, 2008; Fivush, Kuebli, & Clubb, 1992; Kominsky, Baker, Keil, & Strickland, 2021; Schütz-Bosbach & Prinz, 2007; Sheldon & El-Asmar, 2018; Wang, Cherkassky, & Just, 2017). In the context of computational modeling, recent studies both use (Shen, Fu, Deng, & Ino, 2020) and learn (Dias & Dimiccoli, 2018) event representations. Despite this interest, however, event similarity judgments remain a relatively unexplored area, subsumed under action similarity judgments (Tarhan, de Freitas, Alvarez, & Konkle, 2020; Tarhan & Konkle, 2018). In our work, we exploit event similarity judgments to compare the event representations of our computational model with those of human participants.
1.4 Our contribution
In this study, inspired by the EST (Zacks et al., 2007), predictive processing (Clark, 2013; Wiese & Metzinger, 2017), and Gumbsch's robotic model (Gumbsch et al., 2016, 2017), we developed a novel computational model for event segmentation. Our model consists of multi-layer perceptrons (i.e., event models) that are managed by a cognitive mechanism, which in turn determines event boundaries. Our contributions to the literature are threefold. (1) Our model is capable of learning to represent and predict multi-modal event segments with sensory associations from passive observation, unlike the models of Gumbsch et al. (2016, 2017), which segment unimodal events based on actions in a simulation environment. (2) With the help of a parameter that changes the sensitivity of the event models to prediction-error signals (a simplified sketch is given below), our model can also segment events at varying granularities, which was not addressed by Reynolds et al. (2007) or Metcalf and Leake (2017). (3) Moreover, the segmentation and representation capabilities of our model were tested against ground-truth data obtained from psychological experiments.
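To make the role of the sensitivity parameter concrete, the following sketch is a deliberately simplified illustration of prediction-error-driven segmentation, not our exact implementation; event_model.predict and sensitivity are hypothetical names.

import numpy as np

def segment(observations, event_model, sensitivity):
    # Declare a boundary whenever the active event model's one-step
    # prediction error exceeds a threshold set by the sensitivity
    # parameter: higher sensitivity -> lower threshold -> finer events.
    boundaries = []
    for t in range(len(observations) - 1):
        predicted = event_model.predict(observations[t])
        error = np.linalg.norm(observations[t + 1] - predicted)
        if error > 1.0 / sensitivity:
            boundaries.append(t + 1)
    return boundaries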
A multi-layer perceptron is a plain feedforward neural network that consists of an input layer, one or more intermediate (hidden) layers, and an output layer. The network learns the relationship between inputs and outputs by updating its weights in each iteration. Thanks to the hidden units, multi-layer perceptrons can classify complex patterns (Lippmann, 1989) and approximate non-linear functions (Hornik, Stinchcombe, & White, 1989). Moreover, the knowledge developed throughout training is stored in the weights, and what is learned by the model can be explored by analyzing the representations of the network (Fleming & Storrs, 2019; Hebart et al., 2020).
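For reference, a multi-layer perceptron of this kind can be written in a few lines of PyTorch. The sketch below is a generic example with arbitrary layer sizes and random data, not the architecture used in our model.

import torch
import torch.nn as nn

# Input layer -> hidden layer -> output layer; the ReLU between the
# linear maps is what enables non-linear function approximation.
mlp = nn.Sequential(
    nn.Linear(32, 64),
    nn.ReLU(),
    nn.Linear(64, 32),
)

optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.randn(16, 32)   # a batch of 16 inputs
y = torch.randn(16, 32)   # corresponding targets

# One training iteration: weights are updated to reduce prediction error.
loss = loss_fn(mlp(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# What the model has learned can be probed through its hidden-layer
# representations, e.g., the activations after the first layer and ReLU.
hidden = mlp[:2](x)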