monic series. The spatial accuracy increases with the number
of harmonics being used. A first-order Ambisonic
signal set is four channels wide, third-order is sixteen
channels, fifth-order is 36, and so forth. Each increase in
Ambisonic order adds spherical harmonics to the signal
set and increases the spatial accuracy of the representation
of the sound field. We use a shorthand notation to specify
the signal set. For example, 3H2V means third-order
horizontal, second-order vertical, with the set of spherical
harmonics according to the HV convention [2].
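The channel counts quoted above follow from a simple rule: a full-sphere signal set of order N contains every spherical harmonic of degree 0 through N, and degree l contributes 2l + 1 components, for a total of (N + 1)^2 channels. A minimal sketch of that arithmetic follows; the function name is ours and is used only for illustration.

```python
def full_order_channels(order: int) -> int:
    """Channel count of a full-sphere Ambisonic signal set of the given order."""
    if order < 0:
        raise ValueError("order must be non-negative")
    # Degree l contributes 2*l + 1 spherical harmonics; the sum equals (order + 1)**2.
    return sum(2 * l + 1 for l in range(order + 1))


for n in (1, 3, 5):
    print(f"order {n}: {full_order_channels(n)} channels")
```

The loop prints the counts for the first-, third-, and fifth-order cases mentioned in the text.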
Once an Ambisonic signal set has been captured or
generated, appropriate speaker feeds are produced by a
decoder. Designing an optimal decoder, specifically the
low- and high-frequency matrices, for a given signal set
and loudspeaker array is the central topic of this paper.
Other aspects of decoder design have been covered in
earlier papers by the present authors [3].
2.1 Mixed-Order Ambisonics
A physical encoder (an Ambisonic microphone) needs
to have enough capsules covering the sphere to accurately
sample the spherical harmonics of the order it is intended to
capture. Likewise, a speaker array needs to have enough
loudspeakers covering the sphere to excite the spherical
harmonics for the maximum order it is intended to reproduce.
That is not always the case, leading to arrays with
different densities of transducers in different directions.
The consequence is that the order that can be encoded or
decoded will change according to the direction.
For example, nine years ago, one of the present authors
published the design for a second-order Ambisonic microphone [4].
There have been four proprietary [5, 6, 7, 8] and one free and
open-source implementation [9] of this design. A compromise made
was to use only eight capsules. This simplifies calibration and
allows the use of widely-available eight-channel recorders.
While commonly referred to as a second-order microphone,
only eight of the nine spherical harmonic components
needed for the second-order signal set can be derived
from the capsule signals. The missing spherical harmonic
is degree 2 and order 0, which is called “R” in the
Furse-Malham convention. R is a “zonal” harmonic and varies
only with elevation. Eliminating this component coarsens
the description of the sound field at elevations other than
horizontal, making it a 2H1V mixed-order encoder. As we
shall see, decoding this signal set with a decoder designed
for full second order is suboptimal.
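The component count for this microphone can be reproduced with a short enumeration. The sketch below assumes that the HV convention keeps a harmonic of degree l and order m when l does not exceed the horizontal order and l - |m| does not exceed the vertical order. That selection rule is our reading of the convention, not a quotation from [2], and the function name is hypothetical; under this assumption the second-order horizontal, first-order vertical set has eight components and lacks only the degree 2, order 0 harmonic, in agreement with the description above.

```python
def hv_components(h_order: int, v_order: int) -> list:
    """(degree, order) pairs kept under the ASSUMED HV selection rule.

    Assumption: keep (l, m) when l <= h_order and l - |m| <= v_order.
    """
    return [
        (l, m)
        for l in range(h_order + 1)
        for m in range(-l, l + 1)
        if l - abs(m) <= v_order
    ]


full_second = hv_components(2, 2)   # full second order
mixed = hv_components(2, 1)         # second-order horizontal, first-order vertical
missing = sorted(set(full_second) - set(mixed))
print(len(full_second), len(mixed), missing)
```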
Small speaker arrays with a limited number of speakers
in the vertical direction are another case in which the array
does not have uniform density of speakers and cannot
excite the spherical harmonics in all directions equally.
Physical restrictions in the placement of speakers can also
dictate that an array might not be capable of rendering the
same order in both the horizontal and vertical directions.
Such an array will need a mixed-order decoder.
3 Ambisonic Decoders
The task of the decoder is to create the best perceptual
impression possible that the sound field is being reproduced
accurately, given the available resources. In practical
terms, the following criteria are necessary [10]:
1. Constant amplitude gain for all source directions
2. Constant energy gain for all source directions
3. At low frequencies, correct reproduced wavefront
direction and velocity (Gerzon’s velocity-model
localization vector, $r_V$)
4. At high frequencies, maximum concentration of
energy in the source direction (Gerzon’s energy-model
localization vector, $r_E$)
5. Matching high- and low-frequency perceived
directions ($\hat{r}_E = \hat{r}_V$)
Recent work shows that (4) is the most important [11]; it
is also the most difficult to get right. After that, (2) and
(5) are important, as it is thought that we use a majority
voting system to resolve conflicting directional cues [10].
Decoders that ignore (5) can be fatiguing due to conflicting
perceptual cues [12]. Note that to satisfy all of these
criteria we must use decoders that have different gain
matrices for high and low frequencies, so-called “two-band”
or “Vienna” decoders [13].
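The criteria above can be checked numerically for a given set of speaker gains. The sketch below is an illustration under stated assumptions, not the ADT's implementation: it uses a horizontal first-order decode to a regular square array with gains (1 + 2 w cos(theta - phi_i)) / N, where w is a first-order weight of our choosing (w = 1 for a velocity decode, w = sqrt(1/2) for the usual horizontal first-order max-rE weighting). It computes the amplitude gain P as the sum of the gains, the energy gain E as the sum of the squared gains, and Gerzon's localization vectors as the gain-weighted (rV) and energy-weighted (rE) averages of the speaker direction vectors. All function names are hypothetical.

```python
import math


def polygon_gains(source_az, order1_weight=1.0, n_speakers=4):
    """Speaker azimuths and gains for an assumed first-order horizontal decode."""
    azimuths = [2 * math.pi * (k + 0.5) / n_speakers for k in range(n_speakers)]
    gains = [
        (1 + 2 * order1_weight * math.cos(source_az - az)) / n_speakers
        for az in azimuths
    ]
    return azimuths, gains


def localization_vectors(azimuths, gains):
    """Return amplitude gain P, energy gain E, and the vectors rV and rE as (x, y)."""
    p = sum(gains)
    e = sum(g * g for g in gains)
    rv = (
        sum(g * math.cos(a) for g, a in zip(gains, azimuths)) / p,
        sum(g * math.sin(a) for g, a in zip(gains, azimuths)) / p,
    )
    re = (
        sum(g * g * math.cos(a) for g, a in zip(gains, azimuths)) / e,
        sum(g * g * math.sin(a) for g, a in zip(gains, azimuths)) / e,
    )
    return p, e, rv, re


source = math.radians(20.0)
for label, weight in (("velocity", 1.0), ("max-rE", math.sqrt(0.5))):
    az, g = polygon_gains(source, weight)
    p, e, rv, re = localization_vectors(az, g)
    print(label, round(p, 3), round(e, 3),
          round(math.hypot(*rv), 3), round(math.hypot(*re), 3))
```

For this array both weightings keep P and E independent of source direction and keep rV and rE pointing at the source, but the velocity weighting maximizes the magnitude of rV while the max-rE weighting gives the larger rE magnitude. This illustrates why separate low- and high-frequency matrices are needed to meet criteria (3) and (4) at the same time.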
The ADT includes a full-featured decoder engine written
in the FAUST DSP specification language [14] that
implements dual-band decoding, near-field correction, and
level and time-of-arrival compensation. The ADT incorporates
several design techniques that produce decoders
that perform well according to these criteria for
partial-coverage loudspeaker arrays, such as domes and stacked
rings, but assumes that within those limits the speakers are
(more or less) uniformly distributed. It also assumes that
the decoders produced by these techniques are optimal for
mixed-order signal sets.