Examples of such centrality measures are degree, betweenness [13], closeness [2, 3, 11],
eigenvector [5, 6], PageRank [22], and Katz centrality [15]. The latter is the focus of this
paper.
Katz centrality, developed by Leo Katz in 1953 [15], has been used in numerous applica-
tions [10, 24]. The Katz centrality of a node is a weighted count of all walks of any length
starting at the node. Each walk of length kis weighted by αk, where αis called the Katz
parameter. We give a formal definition of Katz centrality in Section 1.1. Since the Katz
parameter has a decaying effect, we can approximate the Katz centrality by ignoring the
contribution of walks past a given length L. In [19] and [20], the authors numerically explore
this type of approximation. In Section 2, we give a lower bound on this value L(in terms of
α) that guarantees a desired level of accuracy in terms of the Katz centrality and its node
ranking.
As an application, we look at a model developed by the National Aeronautics and Space
Administration (NASA) called the Susceptibility Inference Network (SIN). This network
models a set of medical conditions that may progress from one to another with some prob-
ability. The application of Katz centrality in this setting estimates the effect of a medical
condition and its consequent progressions on astronaut productivity. NASA wishes to use
the progression risk to identify a condition (or group of conditions) that highly influences
the risk of the crew members.
This paper is organized as follows. Section 1.1 reviews useful graph theory concepts,
definitions and basic results. Section 1.2 introduces a network of medical conditions created
by subject matter experts, which we will later use as a case study. In Section 2, we bound
the error generated from approximating the Katz centrality by restricting the number of
steps allowed in a walk, and develop a relationship between that number and the Katz
parameter α. An example of the relationship between αand the α-Katz centrality node
ranking is in Section 2 and our medical application in Section 3. Additionally, we assess
the upper bound given in Section 2 to the true length in Section 4. Finally, the results and
future work are addressed in Section 5.
1.1 Definitions
In this section, we provide basic definitions for the graph theoretical structures and tools
used for the results in Section 2.
Definition 1. (Weighted, directed network) Let N= (V, E, w)be an edge-weighted,
directed network consisting of V, the set of n nodes,E⊆V×V, the set of edges, and a
weight function w:E→R+.
We represent such a network by an adjacency matrix A=A(N), where the entry Aij
is the weight of the edge from node ito node j, or Aij = 0 if there is no edge from ito j
in N. Let Wbe an n-dimensional vector of non-negative node weights. In a setting where
edges and/or nodes are unweighted, weights in Aand Ware all set to 1.
For example, in our application in Section 3, our weighted, directed network has nodes
that represent medical conditions and the node weights represent their severity, while edge
weights represent the probability of one medical condition progressing to another.
The spectral radius ρof N(or A) is the maximum modulus of the eigenvalues of A. A
walk of length kfrom node uto vis a sequence of kedges (vi, vi+1)∈E,i∈[1, k]such that
v1=uand vk+1 =w. The distance from uto vis the length of a shortest walk from uto
v. The k-hop neighborhood of a node v∈Vis the set of nodes at a distance less than or
2