
SINUSOIDAL FREQUENCY ESTIMATION BY GRADIENT DESCENT
Ben Hayes, Charalampos Saitis, György Fazekas
Centre for Digital Music, Queen Mary University of London
ABSTRACT
Sinusoidal parameter estimation is a fundamental task in applications
from spectral analysis to time-series forecasting. Estimating the
sinusoidal frequency parameter by gradient descent is, however, often
impossible as the error function is non-convex and densely populated
with local minima. The growing family of differentiable signal
processing methods has therefore been unable to tune the frequency of
oscillatory components, preventing their use in a broad range of
applications. This work presents a technique for joint sinusoidal
frequency and amplitude estimation using the Wirtinger derivatives of
a complex exponential surrogate and any first-order gradient-based
optimizer, enabling end-to-end training of neural network controllers
for unconstrained sinusoidal models.
Index Terms—differentiable signal processing, machine
learning, sinusoidal parameter estimation
1. INTRODUCTION
Estimating sinusoidal parameters from a signal is a crucial step in
numerous signal processing algorithms, and a wealth of techniques have
been proposed in both the single and multiple sinusoid formulations.
Most seek the maximum likelihood (ML) estimate of sinusoidal model
parameters in the presence of white Gaussian noise, the statistical
properties of which are well established [1].
Estimators of sinusoidal frequency must circumvent the nonlinearity of
the model and non-convexity of the corresponding objective as a
function of the frequency parameter. The most common approach is thus
to apply a multi-stage algorithm in which an initial frequency estimate
is obtained through search heuristics [2, 3, 4], spectral peak
interpolation [5], discrete-time Fourier transform (DTFT) decorrelation
[6], or other procedures [7, 8, 9], and then refined using an
optimization method. Alternative approaches include iteratively
updating a model-based relaxation of the problem [10, 8], linearizing
the problem using delay operators [11, 12], or defining a surrogate
model where an equivalence can be drawn between solutions [13].
Such methods achieve accurate estimates but are unsuitable for use in
the context of end-to-end models fit by gradient descent, where
integrating derivative-free operations or complex heuristics is
challenging and often unstable. In particular, the recent proliferation
of models applying differentiable digital signal processing (DDSP)
[14], a family of techniques which allow neural networks to directly
control digital signal processors, highlights the need for a method for
sinusoidal frequency estimation by gradient descent.
Ben Hayes is supported by UK Research and Innovation [grant number
EP/S022694/1]. This research utilised Queen Mary’s Apocrita HPC
facility, supported by QMUL Research-IT.
http://doi.org/10.5281/zenodo.438045
Applications of DDSP have included providing high-level controls for
harmonic-plus-noise synthesizers [14], controlling digital synthesis
methods with neural networks [15, 16], modelling [17] and controlling
[18] audio effects, and direct filter design [19]. Yet, despite success
at these complex tasks, DDSP-based models have so far been unable to
predict sinusoidal frequency parameters. Aspects of the problem have
been acknowledged in the literature. Turian & Henry [20] showed that
frequency domain distances lack a stable and informative frequency
gradient, whilst Engel et al. [21] used a parameter regression
pretraining scheme to circumvent issues with local minima when
optimizing sinusoidal frequencies. Caspe et al. [16] similarly note
that gradient descent fails to tune the modulation frequencies of a
differentiable FM synthesizer due to ripple in the error function.
In this work, we propose a simple surrogate for the sinusoidal
oscillator with gradients that allow first-order gradient-based
optimization. With this approach, we take a first step towards
end-to-end learning of neural network controllers for a broader family
of differentiable audio synthesizers and signal processors.
2. SINUSOIDAL FREQUENCY ESTIMATION
We are concerned with modelling the class of discrete-time signals
that can be expressed as:
$$x_n = v_n + \sum_{k \in K} \alpha_k \cos(\omega_k n + \phi_k), \tag{1}$$
where $v_n \sim \mathcal{N}(0, \sigma^2)$, and $\alpha_k$, $\omega_k$,
$\phi_k$ are the amplitude, frequency, and phase parameters,
respectively, of unordered sinusoidal components with index set
$K \subseteq \mathbb{N}$. Following the standard ML derivations,
finding estimates $\hat{\alpha}_k$, $\hat{\omega}_k$, $\hat{\phi}_k$ is
equivalent to minimizing the mean squared error of the model. In many
applications of machine learning to audio, we are concerned with other
formulations of the error. These can be accounted for by expressing the
likelihood in terms of other signal representations, such as the
discrete Fourier transform (DFT).
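The difficulty of minimizing the mean squared error with respect to frequency can be seen in a small numerical experiment. The NumPy sketch below is an illustration only and not part of the method proposed in this work: the helper names `synthesize`, `mse_over_frequency`, and `count_local_minima`, the signal length, the true frequency, and the grid resolution are all assumptions chosen for the example. It synthesizes a single noise-free sinusoid following Eq. (1), assumes the amplitude and phase are known, and evaluates the error over a grid of candidate frequencies.

```python
import numpy as np


def synthesize(amps, freqs, phases, n_samples, noise_std=0.0, rng=None):
    """Return x_n = v_n + sum_k amp_k * cos(freq_k * n + phase_k), as in Eq. (1)."""
    n = np.arange(n_samples)
    amps, freqs, phases = (
        np.atleast_1d(np.asarray(p, dtype=float)) for p in (amps, freqs, phases)
    )
    # Rows index sinusoidal components, columns index time steps.
    x = (amps[:, None] * np.cos(freqs[:, None] * n + phases[:, None])).sum(axis=0)
    if noise_std > 0.0:
        rng = np.random.default_rng() if rng is None else rng
        x = x + rng.normal(0.0, noise_std, size=n_samples)
    return x


def mse_over_frequency(target, candidate_freqs, amp=1.0, phase=0.0):
    """Mean squared error of a one-sinusoid model at each candidate frequency."""
    n = np.arange(target.shape[0])
    preds = amp * np.cos(candidate_freqs[:, None] * n + phase)
    return np.mean((preds - target) ** 2, axis=1)


def count_local_minima(values):
    """Count interior grid points strictly lower than both neighbours."""
    interior = values[1:-1]
    return int(np.sum((interior < values[:-2]) & (interior < values[2:])))


# Illustrative settings (assumptions, not values taken from the paper).
N = 256
true_freq = 0.8  # radians per sample
target = synthesize([1.0], [true_freq], [0.0], N)

grid = np.linspace(0.05, np.pi - 0.05, 4000)
loss = mse_over_frequency(target, grid)

best = grid[np.argmin(loss)]
n_minima = count_local_minima(loss)
print(f"grid minimizer: {best:.4f} rad/sample, local minima on grid: {n_minima}")
```

In this setting the grid search recovers the true frequency, but the error curve contains many local minima away from it, so a gradient step from a poor initial estimate would typically settle in the nearest ripple rather than at the global minimum.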
It is well established that when $\omega_k$ and $\phi_k$ are known,
this problem is linear in $\alpha_k$ [22], a property which, for
example, allows DDSP models to directly predict harmonic amplitudes [14].
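The linearity in amplitude can be illustrated with an ordinary least-squares fit. The following NumPy sketch is a hedged example rather than the procedure of [22] or [14]: the function name `estimate_amplitudes`, the two-component test signal, and the noise level are assumptions made for illustration. With the frequencies and phases fixed, each component contributes one known basis column, and the amplitudes are the least-squares coefficients of those columns.

```python
import numpy as np


def estimate_amplitudes(x, freqs, phases):
    """Least-squares amplitudes of sinusoids with known frequencies and phases."""
    n = np.arange(x.shape[0])
    freqs = np.asarray(freqs, dtype=float)
    phases = np.asarray(phases, dtype=float)
    # Column k holds cos(freq_k * n + phase_k), so the model is x ≈ basis @ amps.
    basis = np.cos(np.outer(n, freqs) + phases)
    amps, *_ = np.linalg.lstsq(basis, x, rcond=None)
    return amps


# Illustrative two-component signal (all values are assumptions).
rng = np.random.default_rng(0)
true_amps = np.array([1.0, 0.5])
freqs = np.array([0.3, 1.1])     # radians per sample, assumed known
phases = np.array([0.2, -0.7])   # radians, assumed known

n = np.arange(512)
clean = (true_amps * np.cos(np.outer(n, freqs) + phases)).sum(axis=1)
noisy = clean + rng.normal(0.0, 0.1, size=n.shape[0])

amps_clean = estimate_amplitudes(clean, freqs, phases)
amps_noisy = estimate_amplitudes(noisy, freqs, phases)
print("noise-free estimate:", amps_clean)
print("noisy estimate:     ", amps_noisy)
```

No iterative optimization is needed here, which is the contrast with the frequency parameter: the amplitude objective is a convex quadratic, so a single linear solve gives its minimizer.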
When $\phi$ is unknown, an optimal estimate can be found by evaluating
the DTFT at the known frequencies $\omega_k$. In the case where
frequency is unknown, however, the optimization problem is more
arXiv:2210.14476v2 [eess.SP] 18 Nov 2022