Exploring Interactions and Regulations in Collaborative Learning: An
Interdisciplinary Multimodal Dataset
YANTE LI, YANG LIU, KHÁNH NGUYEN, HENGLIN SHI, EIJA VUORENMAA, SANNA JÄRVELÄ,
and GUOYING ZHAO, University of Oulu, Finland
Collaborative learning is an educational approach that enhances learning through shared goals and working together. Interaction and regulation are two essential factors related to the success of collaborative learning. Since information from various modalities can reflect the quality of collaboration, a new multimodal dataset with cognitive and emotional triggers is introduced in this paper to explore how regulations affect interactions during the collaborative process. Specifically, a learning task with intentional interventions is designed and assigned to high school students aged 15 years on average (N = 81). Multimodal signals, including video, Kinect, audio, and physiological data, are collected and exploited to study regulations in collaborative learning in terms of individual-participant single-modality, individual-participant multiple-modality, and multiple-participant multiple-modality analyses. Analysis of annotated emotions, body gestures, and their interactions indicates that our multimodal dataset with designed treatments could effectively examine moments of regulation in collaborative learning. In addition, preliminary experiments based on baseline models suggest that the dataset provides a challenging in-the-wild scenario, which could further contribute to the fields of education and affective computing.
Additional Key Words and Phrases: multimodal dataset, collaborative learning, facial expression, gesture, physiological signal
ACM Reference Format:
Yante Li, Yang Liu, Khánh Nguyen, Henglin Shi, Eija Vuorenmaa, Sanna Järvelä, and Guoying Zhao. 2022. Exploring Interactions and Regulations in Collaborative Learning: An Interdisciplinary Multimodal Dataset. 1, 1 (October 2022), 17 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn
1 INTRODUCTION
Collaborative learning is a social system in which groups of learners solve problems or construct knowledge by working together [4]. Recent findings demonstrate that collaborative learning can promote higher-level thinking, oral communication, leadership skills, student-faculty interaction, and student responsibility [40]. Although many factors can affect collaborative learning, social interaction has been considered one of the most important [31, 38]. To succeed in collaboration, learners should actively exchange their ideas, experience, resources, skills, and feelings within a team [35, 36]. According to research on the promises of interactivity [36], interactions enable collaborators to learn and encourage them to be focused, participative, and dedicated to exchanging ideas with each other. To this end, studying and promoting interactions in a collaborative setting will provide valuable insight into the quality of collaboration and be significant and helpful in various fields, especially education research [7].
In recent years, there has been growing interest in studying interactions in a collaborative learning context by utilizing emotional and physiological measures [3, 4].
Authors’ address: Yante Li, yante.li@oulu.fi; Yang Liu, yang.liu@oulu.fi; Khánh Nguyen, Andy.Nguyen@oulu.fi; Henglin Shi, henglin.shi@oulu.fi; Eija Vuorenmaa, eija.vuorenmaa@oulu.fi; Sanna Järvelä, sanna.jarvela@oulu.fi; Guoying Zhao, guoying.zhao@oulu.fi, University of Oulu, Oulu, Finland.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
©2022 Association for Computing Machinery.
Manuscript submitted to ACM
arXiv:2210.05419v1 [cs.CV] 11 Oct 2022
Fig. 1. The data collection setup. (a) An illustration of the seating plan and the location of the devices; (b) The real environment of data collection.
Thanks to the development of hardware and AI technologies [14-16, 20, 21], it is convenient to unobtrusively and automatically capture the physiological and visual signals of team members, which makes it possible to study the correspondence between multimodal signals for observing interactive processes during collaborative learning.
A variety of research has studied emotional interactions among learners [3, 17]. The results have revealed that positive emotional interactions are related to better collaboration. However, previous studies only focus on facial expressions and physiological signals in interactions independently. Many studies illustrate that body gestures can also provide emotional clues [18, 22], and that body gestures in interactions can contribute to solving cooperative and collaborative problems [5]. In this paper, we design a collaborative learning task and collect a multimodal dataset, including video, Kinect video, audio, and physiological data, to analyse collaborative learning interactions, as shown in Fig. 1.
Another critical factor for successful learning is regulation [9]. Socially shared regulation can promote productive collaborative learning. Thus, we introduce regulation processes in collaborative learning by designing interventions, i.e., a cognitive trigger and an emotional trigger, in our task setting. By analyzing different features, such as emotional reflection during triggers, researchers can inspect whether and how these external events influence the interactions of group members.
In this paper, a new dataset is collected in terms of multiple modalities to comprehensively explore the regulation of learning and the aroused emotional interactions in collaborative learning. As far as we know, this is the first multimodal dataset for studying regulation in collaborative learning with regulatory triggers. The main contributions are as follows:
• We design a collaborative learning task with two kinds of triggers, i.e., cognitive and emotional triggers, to study the regulation inference in collaborative learning.
• We collect a multimodal dataset consisting of video, Kinect, audio, and physiological data to analyse collaborative learning interactions.
• Statistical analysis and baseline experiments demonstrate that the regulation significantly impacts interactions during the collaborative process.
2 PRIOR WORK
2.1 Collaborative learning
Collaborative learning is an educational approach for enhanced learning [7] where two or more learners work together to solve problems, complete tasks, or learn new concepts. Learners work as a group rather than individually to obtain a complete understanding by interchanging their ideas, processing and synthesizing information instead of using rote memorization of texts [35]. According to recent studies, collaborative learning can boost higher-level thinking, oral communications, leadership skills, student-faculty interactions, self-esteem and responsibility of students [13].
Various interactions emerging in the collaborative learning process are essential features of effective learning. Many researchers have studied emotional interactions in collaborative learning. Webb et al. [42] found that positive emotional interactions, like support and respect, are related to better collaboration. Dindar et al. [3] revealed that video-based facial emotion recognition helped explain social and affective dynamics in collaborative learning research. Besides emotions, physiological signals also serve crucial functions in studying collaborative processes [4]. Several studies have investigated interaction in collaborative learning by measuring the extent to which physiological signals synchronize [3]. Additional work verified that gestures served a variety of signalling functions in collaborative problem-solving communication and had a diagnostic role for team members [5, 34].
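As a concrete illustration of the physiological synchrony idea mentioned above, the following sketch computes windowed Pearson correlations between two participants' EDA streams. It is not the method used in the cited studies; the signals, sampling rate, and window sizes are assumptions chosen for illustration.

```python
# Illustrative sketch only (not the method of the cited studies): estimating
# physiological synchrony between two participants as the windowed Pearson
# correlation of their EDA streams. `eda_a` and `eda_b` are assumed to be
# sampled at the same rate and already aligned in time.
import numpy as np

def windowed_synchrony(eda_a: np.ndarray, eda_b: np.ndarray,
                       fs: float = 128.0, win_s: float = 30.0, step_s: float = 5.0):
    """Return the Pearson correlation of the two streams in sliding windows."""
    win, step = int(win_s * fs), int(step_s * fs)
    scores = []
    for start in range(0, min(len(eda_a), len(eda_b)) - win + 1, step):
        a = eda_a[start:start + win]
        b = eda_b[start:start + win]
        # Correlation is undefined on flat segments, so mark them as NaN.
        if a.std() == 0 or b.std() == 0:
            scores.append(np.nan)
        else:
            scores.append(np.corrcoef(a, b)[0, 1])
    return np.array(scores)

# Synthetic example: two noisy observations of the same slowly drifting signal.
rng = np.random.default_rng(0)
base = np.cumsum(rng.normal(size=128 * 300)) * 0.01  # 5 minutes at 128 Hz
sync = windowed_synchrony(base + rng.normal(size=base.size),
                          base + rng.normal(size=base.size))
print("mean windowed synchrony:", np.nanmean(sync))
```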
Due to the contribution of different modalities, existing research has focused on studying the collaborative learning process by exploring multimodal signals [25, 33]. Nguyen et al. [25] developed a deep learning model to automatically detect interaction types for regulations in collaborative learning by analyzing electrodermal activity (EDA), video, and audio data. Reilly et al. [33] studied Kinect and speech data and demonstrated how specific movements and gestures positively correlate with collaboration and learning gains. Unlike previous methods that introduce only two or three modalities, in this paper we collect a multimodal dataset including facial video, audio, gesture, EDA, heart rate (HR), and accelerometer data to explore the collaborative process comprehensively.
2.2 Regulation in collaborative learning
Recent research has highlighted the importance of co-regulation and socially shared regulation of learning to a group's collaborative learning success [23]. While self-regulated learning depicts the individual process of monitoring, reflecting on, and correcting one's emotion, motivation, and cognition towards attaining learning goals, co-regulation and socially shared regulation refer to this process in collaborative learning at the group level [47]. Co-regulation of learning relates to the co-operation of regulation, in which self-regulated learning occurs with support from another learner. In contrast, socially shared regulation of learning involves learners in a group interdependently regulating the group's collaborative learning process and jointly regulating individual learning processes through social interactions [6]. Therefore, besides multiple modalities, we also study regulation in collaborative learning tasks by introducing designed interventions.
2.3 Relevant datasets
There are currently various multimodal datasets used for studying emotion and gesture in collaborative learning. The CMU Multimodal Opinion Sentiment and Emotion Intensity (CMU-MOSEI) dataset [43] contains more than 23,500 sentence utterance videos from online YouTube speakers. It has three modalities: language, visual, and acoustic, annotated for six basic emotions and five sentiments. The EmoReact dataset [27] is a multimodal emotion dataset of children which contains 1102 audio-visual clips of 17 different emotional states: six basic emotions, neutral, valence, and nine
complex emotions, including uncertainty, curiosity, and frustration. Moreover, the Persuasive Opinion Multimedia (POM) corpus [29] consists of 1,000 movie review videos obtained from a social multimedia website. This dataset includes three modalities, video, text, and acoustic, and is annotated for multiple speaker traits. Although the above multimodal datasets are compatible with analyzing emotions in collaborative learning, they only consider individuals instead of interactions among multiple members in a group. Zhang et al. [45] proposed a dataset for studying the social relation between two or more people in one image or video. Alternatively, Kosti et al. [12] presented the EMOTIC dataset, which involves scene context in addition to facial expression and body pose for extra information on emotion perception. However, these datasets only consider the visual modality and aim to study social or semantic relations. By contrast, our work establishes a multimodal dataset in a collaborative learning scenario. It explores the interactions among group members and introduces regulations through designed interventions, which provides an interdisciplinary platform for studying emotion regulation and its impact in collaborative learning.
3 DATASET COLLECTION
To systematically and comprehensively study the process of collaborative learning, we collect a multimodal dataset that
contains facial videos, audio, physiological signals (including EDA, heart rate, and accelerometer), and Kinect data. This
multimodal dataset is valuable for exploring:
Whether dierent modalities have underlying correlations in the collaborative process.
Whether the fusion of various data sources could facilitate the task of collaborative learning.
Whether the regulation has impacts on multiple modalities during the collaborative process.
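To make the fusion question more concrete, the sketch below contrasts a simple feature-level fusion baseline with single-modality baselines. It is only a schematic example: the per-segment feature matrices, their dimensions, and the binary labels are synthetic stand-ins, not features or baseline models from this paper.

```python
# Schematic example only: comparing a feature-level fusion baseline with
# single-modality baselines. The per-segment feature matrices, their dimensions,
# and the binary labels below are synthetic stand-ins, not the paper's features
# or baselines.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
n_segments = 200
video_feats = rng.normal(size=(n_segments, 64))  # e.g., pooled facial-expression features
audio_feats = rng.normal(size=(n_segments, 32))  # e.g., prosodic statistics
eda_feats = rng.normal(size=(n_segments, 8))     # e.g., tonic/phasic EDA statistics
labels = rng.integers(0, 2, size=n_segments)     # e.g., regulation moment present or not

# Feature-level fusion: concatenate the modality features before classification.
fused = np.concatenate([video_feats, audio_feats, eda_feats], axis=1)
fused_acc = cross_val_score(LogisticRegression(max_iter=1000), fused, labels, cv=5).mean()
print("fused CV accuracy:", fused_acc)

# Single-modality baselines for comparison.
for name, feats in [("video", video_feats), ("audio", audio_feats), ("eda", eda_feats)]:
    acc = cross_val_score(LogisticRegression(max_iter=1000), feats, labels, cv=5).mean()
    print(f"{name} CV accuracy:", acc)
```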
Details of the data collection and annotation are explained in the following subsections.
3.1 Equipment setup and data synchronization
Our data recording was held in a laboratory studio, and the setup is shown in Fig. 1 (a) and (b). Specifically, three participants sit in front of laptops. A two-meter COVID-19 social distance was kept between participants during data collection.
A 360-degree camera (Insta360 Pro), which contains six camera spots and a microphone, was placed in the center. The six cameras are hardware synchronized, and the grabbed frames from the six channels are used for building the whole environment in 360 degrees. During the collection, each participant was facing one camera directly. This way, we could obtain a compact frontal face view for every participant, as shown in Fig. 2. The resolutions of the individual videos and the reconstructed video are 3840 x 2160 and 1920 x 960, respectively, with an average recording rate of 30 fps. Furthermore, a surveillance camera was used for monitoring and recall. Three individual microphones were employed to record the audio data.
Two Kinect cameras (Azure Kinect DK) were utilized to capture the gestures of the three participants. The two devices were denoted as 'Master Kinect' and 'Slave Kinect' and were synchronized automatically by a cable. Their average frame rate was around 30 fps. Five sensor streams are aggregated in each Kinect camera, including a depth camera, a color camera, an infrared camera, an IMU (Inertial Measurement Unit), and microphones. The Azure Kinect Viewer can visualize all the streams, as shown in Fig. 3 (a).
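The sketch below illustrates one way such cable-synchronized recordings can be paired afterwards, by matching each master frame to the nearest slave frame in time. The timestamp arrays, the tolerance, and the fixed inter-device offset are hypothetical; they stand in for timestamps exported from the actual recordings.

```python
# Hypothetical post-hoc check of the two-device recording: pair each master frame
# with the nearest slave frame in time. The timestamp arrays, the tolerance, and
# the fixed inter-device offset are illustrative stand-ins for timestamps exported
# from the actual recordings.
import numpy as np

def pair_frames(master_ts_us: np.ndarray, slave_ts_us: np.ndarray,
                tolerance_us: int = 16_000):
    """Match each master frame to the closest slave frame within roughly half a
    frame interval at 30 fps (about 16.6 ms)."""
    pairs = []
    for i, t in enumerate(master_ts_us):
        j = int(np.argmin(np.abs(slave_ts_us - t)))
        if abs(int(slave_ts_us[j]) - int(t)) <= tolerance_us:
            pairs.append((i, j))
    return pairs

# Synthetic example: two ~30 fps streams with a small constant offset between devices.
master = np.arange(0, 10_000_000, 33_333)   # ~300 frames over 10 seconds, in microseconds
slave = master + 1_500                      # slave stream delayed by 1.5 ms
print(len(pair_frames(master, slave)), "matched frame pairs")
```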
Physiological data, including EDA, HR, and accelerometer signals, were captured by physiological sensors (Shimmer GSR3+), as shown in Fig. 3 (b). All the signals were collected at a sampling rate of 128 Hz, which could be used to reveal new insights into the emotional and cognitive processes of regulation in collaborative learning [4].
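Because the physiological streams (128 Hz) and the videos (about 30 fps) run at different rates, analyses that relate them need a common timeline. The following sketch shows one simple option, a nearest-sample lookup of an EDA value for each video frame; the timestamps and the placeholder signal are synthetic and assume a shared start time, which may not hold for the actual recordings without additional synchronization.

```python
# A minimal alignment sketch: for each ~30 fps video frame, look up the nearest
# 128 Hz EDA sample. The timestamps and the placeholder signal are synthetic and
# assume the two streams share a start time, which may not hold for the actual
# recordings without additional synchronization.
import numpy as np

fs_eda, fps_video, duration_s = 128.0, 30.0, 60.0

eda_t = np.arange(0, duration_s, 1.0 / fs_eda)       # physiological timestamps (s)
eda = np.sin(2 * np.pi * 0.05 * eda_t)               # placeholder EDA signal
frame_t = np.arange(0, duration_s, 1.0 / fps_video)  # video frame timestamps (s)

# Nearest-sample lookup: index of the EDA sample closest to each frame time.
idx = np.clip(np.searchsorted(eda_t, frame_t), 1, len(eda_t) - 1)
left_closer = (frame_t - eda_t[idx - 1]) < (eda_t[idx] - frame_t)
idx = idx - left_closer.astype(int)

eda_per_frame = eda[idx]
print(eda_per_frame.shape)  # one EDA value per video frame
```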