
RIGID-BODY SOUND SYNTHESIS WITH DIFFERENTIABLE MODAL RESONATORS
Rodrigo Diaz, Ben Hayes, Charalampos Saitis, György Fazekas, Mark Sandler
Centre for Digital Music, Queen Mary University of London
ABSTRACT
Physical models of rigid bodies are used for sound synthesis in applications ranging from virtual environments to music production. Traditional methods such as modal synthesis often rely on computationally expensive numerical solvers, while recent deep learning approaches are limited by the need to post-process their outputs. In this work, we present a novel end-to-end framework for training a deep neural network to generate modal resonators for a given 2D shape and material, using a bank of differentiable IIR filters. We demonstrate our method on a dataset of synthetic objects, but train our model using an audio-domain objective, paving the way for physically-informed synthesisers to be learned directly from recordings of real-world objects.
Index Terms—differentiable signal processing, machine
learning, sound synthesis, physical modelling
1. INTRODUCTION
The synthesis of contact sounds from rigid bodies of various materials has been of continual interest in applications ranging from music production and sound design to the rendering of object sounds in virtual environments [1, 2]. For this reason, the problem has been studied extensively, and physically-based numerical methods are often chosen to solve it. However, such approaches typically incur significant computational cost, as well as storage cost for cached solutions if real-time interaction is required. These limitations reduce flexibility in such applications, where adapting to a new object shape or material requires a prohibitively time-consuming recomputation of the solution.
Of these numerical approaches, which ultimately rely on solving the wave equation, the finite-difference time-domain (FDTD) method and the finite element method (FEM) are the most commonly used due to their adaptability. FEM solvers are used to precompute the vibrational modes of arbitrarily shaped rigid objects, allowing contact sounds for these objects to be synthesized using modal synthesis.
The computation of the modes is usually posed as a generalized eigenvalue problem using the mass and stiffness matrices of an object. Using the solution, sound can be rendered with an oscillator bank by projecting an impulse onto the modes at each discrete location within the object. This approach to computing an object's modes can, however, be computationally expensive. Moreover, every time the shape or material characteristics of the object change, the system must be solved again. Thus, numerous approaches have been proposed to accelerate this process [3, 4, 5].
Rodrigo Diaz and Ben Hayes are supported by UK Research and Innovation [grant number EP/S022694/1].
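To make the modal pipeline described above concrete, the following is a minimal sketch assuming NumPy and SciPy. The "object" is a hypothetical toy (a fixed-fixed chain of equal point masses joined by springs) standing in for FEM mass and stiffness matrices M and K. Solving the generalized eigenproblem K φ = λ M φ yields squared angular frequencies λ = ω² and mode shapes φ. The Rayleigh damping model (hypothetical coefficients alpha and beta) and the choice of per-mode gains are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.linalg import eigh


def chain_matrices(n, k=1.0e6, m=0.01):
    """Hypothetical toy object: n point masses m joined by springs k, both ends fixed."""
    K = k * (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))  # stiffness matrix
    M = m * np.eye(n)                                               # lumped mass matrix
    return K, M


def modal_impulse_response(K, M, hit_index, fs=44100, duration=0.5, alpha=1.0, beta=1e-7):
    """Render an impulse at node `hit_index` with an oscillator bank of damped sinusoids."""
    # Generalized eigenproblem K phi = lam M phi; eigh returns ascending eigenvalues
    # and M-orthonormal eigenvectors (the columns of phi).
    lam, phi = eigh(K, M)
    omega = np.sqrt(lam)                                     # undamped angular frequencies (rad/s)
    # Assumed Rayleigh damping C = alpha*M + beta*K gives a per-mode decay rate (1/s).
    decay = 0.5 * (alpha + beta * lam)
    omega_d = np.sqrt(np.maximum(omega**2 - decay**2, 0.0))  # damped angular frequencies
    # Project the impulse onto the modes: each gain is the mode-shape value at the hit node
    # (a simplification that ignores radiation and listener position).
    gains = phi[hit_index, :]
    t = np.arange(int(fs * duration)) / fs
    bank = gains[:, None] * np.exp(-decay[:, None] * t) * np.sin(omega_d[:, None] * t)
    audible = omega_d < np.pi * fs                           # discard modes above Nyquist
    return t, bank[audible].sum(axis=0), omega_d[audible] / (2.0 * np.pi)
```

For example, `t, y, freqs = modal_impulse_response(*chain_matrices(8), hit_index=2)` renders one strike. Any change to K or M (shape or material) means calling `eigh` again, which is the recomputation cost discussed in the text; dense `eigh` scales cubically with the number of nodes, so real FEM meshes usually rely on sparse eigensolvers instead.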
1.1. Data-driven methods
A recent generative method proposed by Traer et al. [6] used a perceptually derived statistical model to approximate object impulse responses, reproducing the modes only to within some degree of perceptual tolerance. While this method circumvents the need for a numerical solver and uses filters to account for different object interactions, it relies on abstract intermediate representations of an object's physical characteristics and does not account for object geometry.
Jin et al. [7, 8, 9] trained a deep neural network to produce an object's modes, circumventing the need for an FEM solver. Their model predicted eigenvalues and eigenvectors from sparsely voxelized objects, using the corresponding modes obtained from the solver as supervision. Additionally, to generalize to different sizes and materials, the authors suggest adaptively scaling the modes based on the object's size and material parameters (except for Poisson's ratio). Similarly, our method uses a discretized shape representation as input to a neural network. However, unlike Jin et al.'s model, ours requires no post-processing to account for different material parameters. Further, our model is not limited to querying modes at discrete positions in the shape: it accepts arbitrary coordinates as input.
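The adaptive scaling mentioned above can be illustrated under a standard assumption. For a homogeneous, linear-elastic object of fixed shape, the stiffness matrix scales linearly with Young's modulus E and the mass matrix with density ρ, so every eigenvalue λ = ω² scales with E/ρ. Uniformly scaling all dimensions of the object by a factor s further divides every frequency by s. Poisson's ratio, by contrast, does not scale the stiffness matrix uniformly, so in general it changes frequency ratios and mode shapes, which is why it cannot be handled by rescaling. The helper below is hypothetical and written for illustration only; it is not Jin et al.'s code.

```python
import numpy as np


def rescale_modal_frequencies(freqs_hz, E_ref, rho_ref, E_new, rho_new, size_ratio=1.0):
    """Map reference modal frequencies to a new material and size (illustrative sketch).

    Assumes a homogeneous linear-elastic object, uniformly scaled by `size_ratio`,
    with unchanged Poisson's ratio, so mode shapes are unchanged and every
    frequency scales as sqrt(E / rho) / size.
    """
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    material_factor = np.sqrt((E_new / E_ref) * (rho_ref / rho_new))
    return freqs_hz * material_factor / size_ratio
```

For example, quadrupling E (or quartering ρ) doubles every frequency, while doubling the object's size halves them; the mode shapes themselves are reused unchanged.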
1.2. Differentiable resonators
While it is possible to synthesize modes using an oscillator bank in which each damped sinusoid corresponds to an impulse-excited mode, resonant filter banks are commonly used for synthesis due to their flexibility, as they can be excited with a variety of signals [10]. Realising such a filter bank, however, requires solving the corresponding eigenvalue problem (for both eigenvalues and eigenvectors) to derive the filter coefficients. In this work, we propose an end-to-end learning method for generating resonant filter banks without explicitly solving the system.
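The relationship between the oscillator bank and a resonant filter bank can be sketched as follows, assuming NumPy and SciPy. Each mode with frequency f, decay rate d and gain g becomes a two-pole resonator with pole radius r = exp(-d/fs) and pole angle theta = 2*pi*f/fs; choosing the numerator b = [0, g*r*sin(theta)] makes its impulse response exactly g * r^n * sin(n*theta), i.e. the damped sinusoid of an oscillator bank. The function names are hypothetical, and here the coefficients are computed from given modal parameters, which is precisely the step that otherwise requires solving the eigenproblem.

```python
import numpy as np
from scipy.signal import lfilter


def resonator_coeffs(freq_hz, decay, gain, fs):
    """Two-pole resonator with impulse response gain * exp(-decay*n/fs) * sin(2*pi*freq_hz*n/fs)."""
    theta = 2.0 * np.pi * freq_hz / fs       # pole angle (rad/sample)
    r = np.exp(-decay / fs)                  # pole radius from decay rate (1/s)
    b = np.array([0.0, gain * r * np.sin(theta)])
    a = np.array([1.0, -2.0 * r * np.cos(theta), r * r])
    return b, a


def filter_bank(excitation, freqs_hz, decays, gains, fs):
    """Parallel bank of resonators driven by an arbitrary excitation signal."""
    out = np.zeros(len(excitation))
    for f, d, g in zip(freqs_hz, decays, gains):
        b, a = resonator_coeffs(f, d, g, fs)
        out += lfilter(b, a, excitation)     # each mode filters the same excitation
    return out
```

Because the bank is a linear filter, the same coefficients can be driven by a unit impulse (reproducing the oscillator bank exactly) or by any other excitation, such as a short noise burst for a scrape-like contact.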
Our method relies on the use of differentiable infinite impulse response (IIR) filters, which allow a neural network to
arXiv:2210.15306v2 [cs.SD] 28 Oct 2022