tions in question are rare. Chandrasekhar (2016) argues that many economic networks are
sparse, providing evidence from commonly used social network data (e.g. AddHealth; Kar-
nataka Villages (Banerjee et al. 2013); Harvard social network (Leider et al. 2009)). Sparsity
poses a challenge to estimation and inference: if networks are largely empty, there might not
be enough variation in centrality measures to identify the parameters of interest. Despite
its importance, sparsity has received relatively little attention in the network econometrics
literature.
Secondly, the observed network may differ from the true network of interest. Centrality
measures are often calculated on data which are obtained by survey or constructed using
some proxy for interaction between agents, though subsequent analysis frequently treats
the true network as known. Ignoring measurement error may thus lead to estimates that
perform poorly. A growing literature works with networks that are assumed to be measured
with error, but it generally does not consider sparse settings. This omission matters
since sparsity and measurement error are mutually reinforcing: sparser networks contain
weaker signals, which are in turn more difficult to pick out from noisy measurements. The
upshot is that OLS estimators computed on sparse, noisy networks may have particularly
poor properties. Asymptotic theory that ignores these features will provide similarly poor
approximations to their finite sample behavior. Consequently, estimation and inference
procedures based on these theories may lead to invalid conclusions about the economic
significance of centrality measures.
This paper studies the statistical properties of OLS on centrality measures in an asymp-
totic framework which features both measurement error and sparsity. Our analysis is centered
on degree, diffusion and eigenvector centralities, which are among the most popular mea-
sures. Our contribution is threefold: (1) We characterize the amount of sparsity at which
OLS estimators become inconsistent with and without measurement error, finding that this
threshold varies depending on the centrality measure used. Specifically, regression on eigen-
vector centrality is less robust to sparsity than that on degree and diffusion. This suggests
that researchers should be cautious about comparing regressions on different centrality mea-
sures, since they may differ in statistical properties in addition to economic significance. (2)
We develop distributional theory for OLS estimators under measurement error and sparsity.
We restrict ourselves to sparsity ranges under which OLS is consistent, but we find that
asymptotic bias can be large even in this case. Furthermore, the bias may be of larger order
than variance, in which case bias correction would be necessary for obtaining non-degenerate
asymptotic distributions. Additionally, we find that under sparsity, the estimator converges
at a slower rate than is reflected by the usual heteroskedasticity-consistent (HC)/robust
standard errors, requiring a different estimator. (3) In view of the distributional theory, we