only begun to crop up in recent years, in recognition of
the significant domain gap between infants and adults from
the point of view of computer vision representations. In this
paper, we employ an infant face landmark estimation model
from [20] together with an infant body landmark estimation
model from [8], both of which employ domain adaptation
techniques to tune existing adult-focused models to the infant
domain. We work with a subset of the InfAnFace dataset
from [20], and compare the values of our six geometric
symmetry measures derived from both predicted and ground
truth landmarks. We propose modifications of these measures
from their original definitions in the literature to enable
compatibility with the landmarks used in pose estimation
techniques.
Our findings show that predictions derived from the recent
infant-domain pose models exhibit “strong” or “very strong”
Spearman’s ρ rank correlation with the ground truth values
on a precisely labeled test set of infant faces in the wild, with
best performance on the gaze angle (ga) between the line
connecting the outer corners of the eyes and the midsternal
plumb line, and on the habitual head deviation (hhd), the
angle between the eye line and the acromion process (shoulder)
line. Predictions of three other measurements (including
non-angles) correlated strongly, but we found only moderate
success in predicting the orbit slopes angle (osa), the angle
between the lines connecting the outer and inner corners of
the eyes, which is arguably the most subtle metric. Based
on this and our further analysis involving additional performance
metrics in Section V, we conclude that computer vision
infant pose estimation techniques can successfully measure
a range of quantities pertaining to torticollis.
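As a minimal illustration of the rank correlation referred to above, the following Python sketch computes Spearman's ρ between ground-truth and predicted values of a single measure. The function names and the toy values are hypothetical and are not results from this paper; tied values are assumed to receive their average rank.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy values (illustrative only): one measure, in degrees, for five images.
ground_truth = [2.0, 5.5, 1.0, 8.0, 3.5]
predicted = [2.5, 3.0, 1.5, 6.0, 4.0]
rho = spearman_rho(ground_truth, predicted)
```

Because ρ depends only on ranks, it rewards predictions that order the images correctly by severity even when the predicted magnitudes are biased.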
II. BACKGROUND: TORTICOLLIS AND INFANT DEVELOPMENT
A. Quantifying torticollis
While there is a large corpus of research and established
methodology in the diagnosis and treatment of torticollis, it
is largely based on in-person physical assessments by experts
and follow-ups with imaging or other deeper techniques [11].
By contrast, our work is inspired by a smaller cluster of
papers dealing explicitly with measuring signs and symptoms
of torticollis geometrically from still images.
We start with congenital muscular torticollis (CMT). The
author in [14] studied the effectiveness of a specific ther-
apeutic intervention for CMT by comparing changes in
an infant’s “habitual head deviation” (also “head tilt”) as
measured by hand from still photographs. The same author
later studied the reliability of this photograph-based method
itself, in [15]. Separately, [3] measured the “gaze angle”
and “transformational deformity” of child subjects, again
from still photographs, to gauge improvement in response
to surgical intervention. Measurements from photographs
offer researchers a repeatable, objective way to quantify the
change in severity of torticollis after an intervention.
Sometimes torticollis is not congenital (present from birth)
but rather acquired, as is the case with ocular torticollis,
where the abnormal head posture is adopted to compensate
for a defect in vision. In such cases, diagnoses often occur
only in adulthood, and can be informed by examination of
head pose in childhood photographs. Correspondingly, oph-
thalmologists have also studied the quantification of facial
asymmetry via geometric measurements from still images.
An overview of such methods and quantities is given in [1],
which in turn cites [6] for definitions of facial measurements
such as the “orbit slopes angle,” “relative face size,” and
“facial bulk mass,” and [22] for definitions of the “facial
angle” and the “nasal tip deviation.” These measurements are
studied not as a means to quantify the effect of interventions,
but as part of a more comprehensive study on the differential
diagnosis of ocular torticollis and other conditions related to
facial asymmetries, especially superior oblique palsy [1].
B. Computer vision for infant health and development
We are not aware of prior computer vision research
intended to detect torticollis or gauge head asymmetry in
infants. The closest in spirit might be a pair of related papers,
[19], [24], in which researchers employ computer vision
to analyze head posture and tremor with a view towards
algorithmic understanding of cervical dystonia (also known
as spasmodic torticollis), which occurs largely in the adult
population. Accordingly, those studies can take advantage
of far more mature adult-focused head pose estimation
techniques like OpenFace 2.0 [2], whereas our efforts are
highly constrained by data scarcity in the infant domain. In
the infant domain, there is prior work on bodies but not
faces: [4] develops an infant-specific body pose estimation
deep network to extract body motion information from infant
videos, in an attempt to assess infant neuromotor risk; and
[10] uses 3D infant pose estimation techniques to assess
infant body symmetry, with a view towards applications in
infant development and torticollis, but without a specific
implementation for either. As mentioned, all of this work comes
in the context of recent attention in computer vision to the
small data domain problem of infant pose comprehension,
for both faces [20] and bodies [7], [8], [9], [13].
III. CONCEPTS: MEASUREMENTS OF SYMMETRY
From the sources cited in Section II-A, we identified all
clearly defined geometric symmetry measures used by re-
searchers in the study of torticollis and facial asymmetries—
six in all. We altered the definitions to base them explicitly
on the 68 facial landmark coordinates and two body joint
(shoulder) coordinates used by our pose estimators, as illustrated
and enumerated in Fig. 2; this also enables more
consistent comparisons. The final measures are defined in
Table I and illustrated in Fig. 1. In the rest of this section,
we clarify and discuss these definitions.
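To make the landmark-based formulation concrete, the following Python sketch computes an angle between two lines, each given by a pair of 2D landmark coordinates, and applies it to the habitual head deviation as described in Section I (eye line versus shoulder line). This is an illustration under stated assumptions, not the definition in Table I: the function names are hypothetical, the eye line is assumed to join the outer eye corners, and the angle is assumed to be unsigned and reported in degrees.

```python
import math


def line_angle_deg(p1, p2, q1, q2):
    """Unsigned angle in degrees, in [0, 90], between line p1-p2 and line q1-q2.

    Each point is an (x, y) pair in image coordinates. Lines are treated as
    undirected, so swapping the endpoints of either line gives the same angle.
    """
    a = math.atan2(p2[1] - p1[1], p2[0] - p1[0])
    b = math.atan2(q2[1] - q1[1], q2[0] - q1[0])
    d = abs(a - b) % math.pi
    return math.degrees(min(d, math.pi - d))


def habitual_head_deviation(eye_outer_a, eye_outer_b, shoulder_a, shoulder_b):
    """Assumed form of hhd: angle between the eye line and the shoulder line."""
    return line_angle_deg(eye_outer_a, eye_outer_b, shoulder_a, shoulder_b)


# Toy coordinates (illustrative only): level shoulders, tilted eye line.
hhd = habitual_head_deviation((100, 120), (200, 100), (50, 300), (250, 300))
```

The same helper can be evaluated on ground-truth and on predicted landmarks, so that the two resulting values of a measure are directly comparable.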
A. Assumptions and context
Since we work with both ground truth and estimated land-
mark coordinates in two dimensions, we generally assume
that every infant is facing forward into the camera, so that the
infant’s face plane is roughly parallel with the camera image
plane. In principle, all of the measurements we consider are