
1.2 Other related work
Narasimhan et al. [2015] investigate when influence is PAC learnable. Basu et al. [2020b] use second-order
influence functions and find that they make better predictions than first-order influence functions.
Cohen et al. [2020] use influence functions to detect adversarial examples. Kong et al. [2021] propose
an influence-based relabeling function that can relabel harmful examples to improve generalization,
instead of simply discarding them. Zhang and Zhang [2022] use Neural Tangent Kernels to rigorously
understand influence functions for highly overparameterized networks.
Pruthi et al. [2020] give another notion of influence by tracing the effect of data points on the loss
throughout gradient descent. Chen et al. [2020] define multi-stage influence functions that trace influence
all the way back to pre-training, identifying which samples were most helpful during that stage.
Basu et al. [2020a] find that influence functions are fragile, in the sense that the quality of influence
estimates depends on the architecture and training procedure. Alaa and Van Der Schaar [2020] use
higher-order influence functions to characterize uncertainty in a jackknife estimate. Teso et al. [2021]
introduce Cincer, which uses influence functions to identify suspicious pairs of examples for interactive
label cleaning. Rahaman et al. [2019] use harmonic analysis to decompose a neural network into a
piecewise-linear Fourier series, finding that neural networks exhibit spectral bias.
Other instance-based interpretability techniques include Representer Point Selection [Yeh et al., 2018],
Grad-Cos [Charpiat et al., 2019], Grad-dot [Hanawa et al., 2020], MMD-Critic [Kim et al., 2016], and
unconditional counterfactual explanations [Wachter et al., 2017].
Variants on influence functions have also been proposed, including those using Fisher kernels [Khanna
et al., 2019], tricks for faster and more scalable inference [Guo et al., 2021, Schioppa et al., 2022], and
identifying relevant training samples with relative influence [Barshan et al., 2020].
Discrete influence played a prominent role in the surprising discovery of the long-tail phenomenon in
Feldman [2020] and Feldman and Zhang [2020]: the experimental finding that in large datasets such as
ImageNet, a significant fraction of training points are atypical, in the sense that the model does not
easily learn to classify such a point correctly if it is removed from the training set.
2 Harmonic analysis, influence functions and datamodels
In this section we introduce notation for standard harmonic analysis of functions on the hypercube
[O'Donnell, 2014], and establish connections between the corresponding Fourier coefficients, the
discrete influence of data points, and the linear datamodels of Ilyas et al. [2022].
2.1 Preliminaries: harmonic analysis
In the conceptual framework of Section 1.1, let $[N] := \{1, 2, 3, \ldots, N\}$. Viewing $f : \{\pm 1\}^N \to \mathbb{R}$ as
a vector in $\mathbb{R}^{2^N}$, for any distribution $\mathcal{D}$ on $\{\pm 1\}^N$, the set of all such functions can be treated as
a vector space with inner product defined as $\langle f, g \rangle_{\mathcal{D}} = \mathbb{E}_{x \sim \mathcal{D}}[f(x)g(x)]$, leading to a norm defined
as $\|f\|_{\mathcal{D}} = \sqrt{\mathbb{E}_{x \sim \mathcal{D}}[f(x)^2]}$. Harmonic analysis involves identifying special orthonormal bases for this
vector space. We are interested in $f$'s values at or near $p$-biased points $x \in \{\pm 1\}^N$, where $x$ is viewed
as a random variable each of whose coordinates is independently set to $+1$ with probability $p$. We denote
this distribution by $\mathcal{B}_p$. Properties of $f$ in this setting are best studied using the orthonormal basis
functions $\{\phi_S : S \subseteq [N]\}$ defined as $\phi_S(x) = \prod_{i \in S} \frac{x_i - \mu}{\sigma}$, where $\mu = 2p - 1$ and $\sigma^2 = 4p(1-p)$ are the
mean and variance of each coordinate of $x$. Orthonormality implies that $\mathbb{E}_x[\phi_S(x)] = 0$ when $S \neq \emptyset$ and
$\langle \phi_S, \phi_{S'} \rangle_{\mathcal{B}_p} = \mathbb{1}[S = S']$. Then every $f : \{\pm 1\}^N \to \mathbb{R}$ can be expressed as $f = \sum_{S \subseteq [N]} \hat{f}_S \phi_S$. Our discussion
will often refer to the $\hat{f}_S$'s as the "Fourier" coefficients of $f$ when the orthonormal basis is clear from context.
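To make these definitions concrete, the following is a minimal numerical sketch (our own illustration, not code from the paper): for a small $N$ it enumerates all $2^N$ points of the hypercube, builds the $p$-biased basis functions $\phi_S$, and verifies both their orthonormality under $\mathcal{B}_p$ and the expansion $f = \sum_S \hat{f}_S \phi_S$ for an arbitrary $f$. All helper names (`phi`, `f_hat`, the toy choice of $f$) are ours and purely illustrative.

```python
import itertools
import numpy as np

# Illustrative sketch of p-biased harmonic analysis on {±1}^N (assumed setup,
# not the paper's code). Works by brute-force enumeration, so keep N small.
N, p = 3, 0.7
mu = 2 * p - 1                      # per-coordinate mean under B_p
sigma = np.sqrt(4 * p * (1 - p))    # per-coordinate standard deviation

# All 2^N points of the hypercube and their probabilities under B_p
# (each coordinate is +1 with probability p, independently).
points = np.array(list(itertools.product([1, -1], repeat=N)))
probs = np.array([p ** np.sum(x == 1) * (1 - p) ** np.sum(x == -1) for x in points])

def phi(S, x):
    """Basis function phi_S(x) = prod_{i in S} (x_i - mu) / sigma; phi_emptyset = 1."""
    return float(np.prod([(x[i] - mu) / sigma for i in S])) if S else 1.0

subsets = [S for r in range(N + 1) for S in itertools.combinations(range(N), r)]

# An arbitrary f : {±1}^N -> R (here, random values on the 2^N points).
rng = np.random.default_rng(0)
f_vals = rng.normal(size=len(points))

# Fourier coefficient: \hat{f}_S = <f, phi_S>_{B_p} = E_{x ~ B_p}[f(x) phi_S(x)].
f_hat = {S: float(np.sum(probs * f_vals * np.array([phi(S, x) for x in points])))
         for S in subsets}

# f is exactly recovered by its expansion f(x) = sum_S \hat{f}_S phi_S(x).
recon = np.array([sum(f_hat[S] * phi(S, x) for S in subsets) for x in points])
assert np.allclose(recon, f_vals)

# Orthonormality check: <phi_S, phi_T>_{B_p} = 1 if S == T, else 0.
gram = np.array([[np.sum(probs * np.array([phi(S, x) * phi(T, x) for x in points]))
                  for T in subsets] for S in subsets])
assert np.allclose(gram, np.eye(len(subsets)))
```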