
interactive visualization tool extracted from and made available
through open source software.
2 RELATED WORK AND BACKGROUND
Open Source Audio and Visual Feature Extraction Tools. Systems for measuring the nature and intensity of vocal and facial expressions are advancing from manual raters to computerized toolkits [23, 42]. These audio and visual toolkits leverage advances in machine learning and artificial intelligence, such as natural language processing and computer vision [10, 15, 30, 32]. A growing number of open source software
projects are starting to make vocal and facial feature extraction
toolkits freely available online. For vocal feature extraction, Parselmouth [29], Natural Language Toolkit [34], LexicalRichness [44], and VaderSentiment [56] have been cited for calculating a wide range of speech and acoustic DBM variables. For facial feature extraction, OpenFace is a commonly cited behavior analysis toolkit for detecting and measuring facial landmarks, facial action units, head pose, and gaze [3, 4, 20]. Collectively,
these software toolkits provide a rich and diverse suite of extracted
features for a more comprehensive analysis of emotional communi-
cation behavior over time. However, none of these projects provides visualization tools that can aid data interpretation. The project presented in this paper builds on the OpenDBM solution, which integrates all of the previously mentioned vocal and facial toolkits, to generate a collective visualization of the extracted audiovisual features.
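To make the division of labor among these toolkits concrete, the sketch below shows how a few vocal DBM-style features could be computed with the cited libraries. This is a minimal illustration under assumed inputs (the audio file name and transcript are hypothetical), not OpenDBM's actual pipeline code.

```python
# Minimal sketch: vocal DBM-style features via the open source toolkits
# cited above. Inputs are hypothetical; OpenDBM wires these libraries
# together in its own pipeline.
import parselmouth
from lexicalrichness import LexicalRichness
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

# Acoustics (Parselmouth/Praat): fundamental frequency per analysis frame.
sound = parselmouth.Sound("patient_interview.wav")  # hypothetical file
f0 = sound.to_pitch().selected_array["frequency"]   # Hz; 0 marks unvoiced frames

# Speech (LexicalRichness + VaderSentiment): lexical and sentiment measures.
transcript = "I have been feeling a little better this week."
ttr = LexicalRichness(transcript).ttr  # type-token ratio
compound = SentimentIntensityAnalyzer().polarity_scores(transcript)["compound"]

print(f"mean voiced F0: {f0[f0 > 0].mean():.1f} Hz, "
      f"TTR: {ttr:.2f}, sentiment: {compound:.2f}")
```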
Visualization in Healthcare. Visualization techniques in healthcare informatics often target cohort data exploration, covering applications in disease evolution from electronic medical records [28, 50, 51, 54], heterogeneous longitudinal clinical data [19, 27], or volumetric patient data [25, 52]. However, prior work in healthcare visual analytics mostly focuses on chronic conditions such as cancer [9], stroke [33], and diabetes [17], or on infectious disease control due to the COVID-19 pandemic [5, 45], and less on psychiatric and neurodegenerative disorders. Our work introduces a new approach to promote individual patient data exploration, while adopting established approaches for cohort data exploration. There is prior visual analytics work on facial activity and head movement and, separately, on voice acoustics and speech measurements [13, 36, 46, 47, 55]; however, some of it does not use video data, and none accounts for all four measurement categories together. We aim to provide efficient
tools for psychiatric and neurodegenerative health studies using
heterogeneous, audiovisual, behavioral biomarker measurements
extracted during clinical assessments.
3 DESIGN PROCESS AND REQUIREMENTS
The design process followed an Activity-Centered Design approach [35]. Our team held remote meetings for nine weeks with five research groups in DBM therapeutic areas, collectively representing academia, clinics, and industry. While most collaborators were principal investigators with faculty positions conducting behavioral or biomedical research, all of them were familiar with the OpenDBM
software. Throughout this process, the team iteratively gained in-
sight into user approaches to explore mappings between DBMs
and conditions and disorders of interest (e.g., major depression and
schizophrenia), gathered functional specifications for a DBM inter-
face, and prototyped and evaluated the interface. Due to the large
variety in patient behavior for these disorders, we gathered many
specific requirements. However, we focused on the following subset
of high-level requirements to serve all our collaborators and the open
source community:
R1: Provide flexibility in showing details about any subset of DBM variables available through the OpenDBM pipeline. For instance, for early detection of Parkinson's disease, head movement measurements are of greater importance than other DBMs, such as voice acoustics. Adaptability to different workflows is an essential factor in open source software. Additionally, analyzing hundreds of variables can be very challenging, and researchers sometimes do not know where to start their analyses. Thus, having the means and the freedom to choose what to explore visually is very important.
R2: Support interactive visualizations for both raw and derived data. Visualizing derived, mean variables is important for getting an effective overview of the cohort data and for providing context for individual patients, while visualizing raw, temporal variables supports in-depth analysis of individual patients. Raw views are also critical for checking data quality: for example, researchers might want to exclude from their analyses videos where the audio or the patient's face was not captured.
R3: Emphasize trends and outliers in DBM data. For example, patients are expected to show negative emotions when talking about unpleasant or uncomfortable subjects. Domain experts should be able to easily observe patterns across patients, which is helpful for further studies. Furthermore, highlighting correlations between biomarkers is fundamental for better understanding these conditions. The sketch below illustrates the data handling behind R1-R3.
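As a concrete, non-authoritative illustration of R1-R3, the following Python sketch filters OpenDBM's derived output to one DBM domain, screens out low-quality videos, and computes pairwise correlations. The file name, the domain prefixes, and the quality column are illustrative assumptions, not OpenDBM's exact schema.

```python
# Hedged sketch of R1-R3 on OpenDBM's derived (per-video) output.
# The CSV path, the domain prefixes ("mov_", "aco_"), and the quality
# column are assumptions for illustration only.
import pandas as pd

derived = pd.read_csv("derived_dbm.csv", index_col="video_id")  # hypothetical

# R1: explore only one DBM domain, e.g. head movement for Parkinson's studies.
head_movement = derived.filter(regex=r"^mov_")

# R2: quality check -- keep videos where a face was detected in most frames
# (assumes a per-video detection-rate column exists).
usable = derived[derived["face_detection_rate"] > 0.9]

# R3: highlight correlations between biomarkers across the cohort.
corr = usable.filter(regex=r"^(mov_|aco_)").corr(method="spearman")
print(corr.round(2))
```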
4 VISUALIZATION DESIGN
The visual system is open source and can be operated through the OpenDBM GitHub project² from the visualization interface folder.
It is not part of the DBM extraction pipeline, but serves as a comple-
mentary application that visualizes the output of the DBM extraction.
The interface has two interactive panels: the Cohort Panel and the In-
dividual Panel. These panels are composed of multiple coordinated
views that support brushing and linking operations.
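Our interface is a standalone application, but as a rough sketch of what brushing and linking between coordinated views involves, the following Altair snippet links an interval brush on one view to a filtered second view. Data and column names are invented for illustration.

```python
# Sketch of brushing and linking between two coordinated views using
# Altair. This only illustrates the interaction concept; it is not the
# interface's implementation.
import altair as alt
import pandas as pd

videos = pd.DataFrame({
    "pc1": [0.1, -0.8, 0.5, 1.1],
    "pc2": [1.2, 0.3, -0.4, 0.2],
    "aco_jitter_mean": [0.011, 0.018, 0.009, 0.015],
})

brush = alt.selection_interval()  # rectangular brush on the overview

overview = alt.Chart(videos).mark_point().encode(
    x="pc1", y="pc2",
    color=alt.condition(brush, alt.value("steelblue"), alt.value("lightgray")),
).add_params(brush)

# Linked detail view: histogram restricted to the brushed videos.
detail = alt.Chart(videos).mark_bar().encode(
    x=alt.X("aco_jitter_mean", bin=True), y="count()",
).transform_filter(brush)

chart = overview | detail  # coordinated side-by-side views
```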
4.1 Data Description
Vocal and facial expressions convey emotion and communication behavior and are among the most researched topics in psychology and related disciplines; as a result, audiovisual DBMs extend from these basic and applied science measurement tools [23]. When a video is processed through OpenDBM, the various vocal and facial feature extraction toolkits combine to produce hundreds of unique variables across four audiovisual DBM domains: speech, acoustics, facial expression, and head movement.
Each audiovisual DBM domain provides two sets of quantitative
variables: raw, captured as a frame-by-frame time sequence mea-
surement, and derived, capturing summary statistics on the total
collection of frames. These raw and derived variables provide a
wide range of objective behavioral cues, such as transcription and
lexical richness for speech, jitter and shimmer for acoustics, eye
blink and facial tremor for head movement, and facial action units
and facial asymmetry for facial expressions. The proposed interface uses these raw and derived variables to display relevant details and statistics about video cohorts and individual videos using two panels: the Cohort and the Individual Panels. The official documentation² provides the full list of DBM variables extracted by OpenDBM.
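To make the raw/derived distinction concrete, here is a small illustration; the column names are ours, not necessarily OpenDBM's exact variable names. A raw variable is a frame-indexed series, and its derived counterparts are summary statistics over all frames.

```python
# Illustration of raw vs. derived DBM variables; names are illustrative,
# not necessarily OpenDBM's exact schema.
import pandas as pd

# Raw: one value per video frame, e.g. intensity of facial action unit 12
# (lip corner puller, associated with smiling).
raw = pd.DataFrame({
    "frame": [0, 1, 2, 3],
    "fac_au12_intensity": [0.0, 0.4, 0.9, 0.7],
})

# Derived: summary statistics over the total collection of frames.
derived = raw["fac_au12_intensity"].agg(["mean", "std", "min", "max"])
print(derived)
```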
4.2 Cohort Panel
The Cohort Panel (Fig. 2) has three main views serving three functions: providing a cohort overview based on a selected set of variables, showing variable distributions, and finding correlations between variables.
Two query subpanels support variable and video ID selection; the variable query subpanel (Fig. 2.A) has three alternative components, one for each of the three main views (Fig. 2.B, D, E). In the video ID query subpanel (Fig. 2.C), selected IDs are highlighted in the other views, while unselected videos can be hidden from them. All views have accompanying print buttons to generate plot images that can be used in further studies.
PCA View. This view (Fig. 2.B.1) uses a scatterplot for a cohort overview by arranging videos in 2D based on a selected set of biomarker variables (R1, R2, R3). The axes correspond to the
²https://github.com/AiCure/open_dbm