when playing against team $j$ and, by symmetry, the number of yards team $j$ allowed
in the game against team $i$. In case teams $i$ and $j$ faced each other $L_{ij}$ times
during the season, we add another index $l$. Then $y_{ijl}$ corresponds to the performance of
team $i$ in category $x$ during its $l$th meeting with team $j$, $i, j = 1, \ldots, n$, $l = 1, \ldots, L_{ij}$.
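The triple index $(i, j, l)$ can be illustrated with a small sketch that turns a game log into $y_{ijl}$ records. This assumes a hypothetical log of `(team_a, team_b, points_a, points_b)` tuples in chronological order, with points standing in for a generic category. Each game yields two records, one per offense, and the meeting index $l$ counts how often that pair of teams has met so far.

```python
from collections import defaultdict

# Hypothetical game log in chronological order:
# (team_a, team_b, points scored by team_a, points scored by team_b)
games = [
    (0, 1, 24, 17),
    (1, 0, 20, 27),  # second meeting of teams 0 and 1
    (2, 0, 10, 14),
]

def build_records(games):
    """Map (i, j, l) -> y_ijl: the value team i produced against team j
    in their l-th meeting (l starts at 1)."""
    meetings = defaultdict(int)  # unordered pair -> meetings seen so far
    y = {}
    for a, b, pts_a, pts_b in games:
        pair = frozenset((a, b))
        meetings[pair] += 1
        l = meetings[pair]
        y[(a, b, l)] = pts_a  # team a's offense vs team b's defense
        y[(b, a, l)] = pts_b  # team b's offense vs team a's defense
    return y

y = build_records(games)
```

By construction, `y[(i, j, l)]` is at the same time what team $i$ scored and what team $j$ allowed, which is the symmetry described above.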
Next, let $h_{ijl}$ denote a home-field indicator for the $l$th game between teams $i$ and $j$,
taking the value 1 if team $i$ is at home, 0 if the game site is neutral, and $-1$ if team $i$
is on the road. This numerical encoding is intuitive (it is typically easier to play at home
than at a neutral site, and easier at a neutral site than on the road), and it was also supported
by fitting a dummy-variable encoding, which showed increases (in points,
touchdowns scored and yards gained) for home games and decreases for road games relative to
the neutral-site baseline.
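The two encodings can be compared in a short sketch. The site labels `"home"`, `"neutral"` and `"road"` and the helper names are hypothetical. The sketch shows that the numeric code $h_{ijl}$ equals the dummy coding under the constraint that the home effect is exactly minus the road effect. It also shows that, for the same game, $h_{jil} = -h_{ijl}$.

```python
# Numeric home-field indicator h_ijl from team i's point of view.
SITE_TO_H = {"home": 1, "neutral": 0, "road": -1}

def homefield_indicator(site: str) -> int:
    return SITE_TO_H[site]

def opponent_site(site: str) -> str:
    """Site label of the same game from the opponent's point of view."""
    return {"home": "road", "neutral": "neutral", "road": "home"}[site]

def homefield_dummies(site: str) -> tuple[int, int]:
    """Alternative dummy coding (is_home, is_road); neutral site is the baseline."""
    return int(site == "home"), int(site == "road")

for site in SITE_TO_H:
    is_home, is_road = homefield_dummies(site)
    # eta * h == eta * is_home + (-eta) * is_road: one shared effect size.
    assert homefield_indicator(site) == is_home - is_road
    # Antisymmetry: the opponent's indicator is the negation.
    assert homefield_indicator(opponent_site(site)) == -homefield_indicator(site)
```

The dummy coding spends two coefficients where the numeric code spends one. Estimated home and road effects of roughly equal size and opposite sign are what make the single numeric indicator a reasonable simplification.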
To adjust for strength of schedule, we introduce the concepts of the offensive (defensive)
worth of the "league-average opponent" and the offensive (defensive) margin of a team.
For a particular statistical category $y$, the league-average opponent can be defined
via two parameters: offensive and defensive worth. For example, for points per game,
the offensive (defensive) worth of the average opponent is the average number of points
per game scored (allowed) by all teams across all games that could have been played
between them over the course of the season. Due to symmetry (team $i$ scoring $y_{ij}$
points against team $j$ is equivalent to team $j$ allowing $y_{ij}$ points to team $i$),
the offensive and defensive worth of the league-average opponent take the same value,
which we denote by $\mu$. Now, for each team $i$ we posit two parameters capturing two
aspects of its performance within a statistical category: an offensive margin $\alpha_i$
and a defensive margin $\beta_i$. The offensive (defensive) margin describes by how much
the team would outperform the aforementioned defensive (offensive) worth $\mu$ of the
average opponent. The main assumption when adjusting for strength of schedule is that
the performance of team $i$ against team $j$ in category $y$ is attributable to both the
offensive margin $\alpha_i$ of team $i$ and the defensive margin $\beta_j$ of team $j$.
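This section does not state how $\mu$, $\alpha_i$ and $\beta_j$ are estimated. The sketch below is one possibility under assumptions of our own. It uses an additive form $y_{ijl} = \mu + \alpha_i + \beta_j + \eta\, h_{ijl} + \varepsilon$, with a hypothetical home-field coefficient $\eta$, and fits it by ordinary least squares with sum-to-zero constraints on the margins. In this sign convention, a negative $\beta_j$ means that team $j$ holds opponents below $\mu$.

```python
import numpy as np

def fit_margins(records, n_teams):
    """Least-squares fit of y = mu + alpha[i] + beta[j] + eta * h (assumed model).

    records: iterable of (i, j, h, y), where team i's offense faced team j's
    defense, h is the home-field indicator from team i's view and y is the
    observed statistic. Sum-to-zero constraints on alpha and beta make
    (mu, alpha, beta) identifiable.
    """
    n_par = 2 + 2 * n_teams  # mu, alpha_1..n, beta_1..n, eta
    rows, target = [], []
    for i, j, h, y in records:
        x = np.zeros(n_par)
        x[0] = 1.0                # league-average worth mu
        x[1 + i] = 1.0            # offensive margin alpha_i
        x[1 + n_teams + j] = 1.0  # defensive margin beta_j
        x[-1] = h                 # home-field effect eta
        rows.append(x)
        target.append(y)
    # Constraint rows: sum(alpha) = 0 and sum(beta) = 0.
    sum_alpha = np.zeros(n_par)
    sum_alpha[1:1 + n_teams] = 1.0
    sum_beta = np.zeros(n_par)
    sum_beta[1 + n_teams:1 + 2 * n_teams] = 1.0
    X = np.vstack(rows + [sum_alpha, sum_beta])
    t = np.array(target + [0.0, 0.0])
    theta, *_ = np.linalg.lstsq(X, t, rcond=None)
    return theta[0], theta[1:1 + n_teams], theta[1 + n_teams:-1], theta[-1]

# Noise-free synthetic check: every ordered pair meets once at home, once away.
alpha_true = np.array([3.0, 1.0, -1.0, -3.0])
beta_true = np.array([-2.0, 0.0, 0.5, 1.5])
records = [(i, j, h, 20.0 + alpha_true[i] + beta_true[j] + 2.0 * h)
           for i in range(4) for j in range(4) if i != j for h in (1, -1)]
mu, alpha, beta, eta = fit_margins(records, 4)
```

Under the sum-to-zero constraints, $\mu$ equals the average neutral-site prediction $\mu + \alpha_i + \beta_j$ over all $n(n-1)$ ordered pairings of teams. This links the constrained fit to the definition of the league-average opponent over all games that could have been played.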
Lastly, assuming we consider $C$ complementary football statistics, let
$x_{c,jil}$, $c = 1, \ldots, C$, denote the value of the $c$th statistic complementary to
$y_{ijl}$. That is, $x_{c,jil}$ is recorded while the defense of team $i$ (the unit complementary
to its offense) and the offense of team $j$ (the unit complementary to its defense) were
on the field during the $l$th game of the season between these two teams.
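The index swap from $y_{ijl}$ to $x_{c,jil}$ can be shown with a minimal lookup sketch. The `stats` layout and the statistic names (`turnovers`, `time_of_possession`) are hypothetical examples and are not taken from the paper.

```python
def complementary_features(stats, i, j, l, names):
    """Return [x_{c,jil} for c in names] for the response y_{ijl}.

    stats[(a, b, m)] is a hypothetical dict of statistics recorded by team a's
    offense against team b's defense in their m-th meeting. Team i's defense
    (the unit complementary to its offense) faced team j's offense, so the key
    swaps i and j but keeps the meeting index l.
    """
    other_side = stats[(j, i, l)]
    return [other_side[c] for c in names]

# Hypothetical values for one game between teams 0 and 1.
stats = {
    (0, 1, 1): {"turnovers": 1, "time_of_possession": 32.5},
    (1, 0, 1): {"turnovers": 2, "time_of_possession": 27.5},
}
features = complementary_features(stats, 0, 1, 1, ["turnovers", "time_of_possession"])
```

When modelling team 0's offensive output in this game, the complementary features are team 1's offensive turnovers and possession time, since those were produced against team 0's defense.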
2.2.2. Natural cubic splines
To model potentially non-linear effects of complementary football features, we used
natural cubic splines [15], a well-known method that combines piecewise cubic polynomials
between knots with linear pieces beyond the boundary knots, smoothly connected at a set of
$K$ knots placed across the range of the explanatory variable. As a result, each
complementary statistic $x_c$ is represented by a set of basis functions
$N_1(x_c), N_2(x_c), \ldots, N_{K-1}(x_c)$. For more detail, see [15]. Note that in our case the
constant basis function $N(x_c) = 1$ of each individual complementary statistic $x_c$ is
omitted from the basis, because it is absorbed into the overall model's intercept. We chose
to use $K = 5$ knots placed at the 0.00, 0.25, 0.50, 0.75 and 1.00 quantiles. This provides
just enough flexibility to capture any clear non-linearity while reducing the risk of
overfitting and the loss of interpretability that come with overly flexible fits. Consequently,
each complementary statistic $x_c$ is represented by four basis functions, and its partial
effect on the response is computed as a linear combination of these functions.
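One standard construction of such a basis is the truncated-power form given in Hastie, Tibshirani and Friedman's *The Elements of Statistical Learning*. It uses $N_1(x) = 1$, $N_2(x) = x$ and $N_{k+2}(x) = d_k(x) - d_{K-1}(x)$, where $d_k(x) = [(x - \xi_k)_+^3 - (x - \xi_K)_+^3]/(\xi_K - \xi_k)$. The sketch below assumes this form, drops the constant column and places knots at quantiles. Whether [15] uses this exact parameterization is not stated here, and the data and coefficients are hypothetical.

```python
import numpy as np

def natural_cubic_basis(x, knots):
    """Natural cubic spline basis with the constant N_1 = 1 omitted.

    Returns len(knots) - 1 columns: x itself, then d_k(x) - d_{K-1}(x) for
    k = 1, ..., K - 2 (1-based, as in the truncated-power construction).
    Knots must be distinct and sorted.
    """
    x = np.asarray(x, dtype=float)
    xi = np.asarray(knots, dtype=float)
    K = len(xi)

    def d(k):
        # d_k(x) with 0-based k; xi[K - 1] is the upper boundary knot.
        return (np.maximum(x - xi[k], 0.0) ** 3
                - np.maximum(x - xi[K - 1], 0.0) ** 3) / (xi[K - 1] - xi[k])

    cols = [x]                                    # linear term
    cols += [d(k) - d(K - 2) for k in range(K - 2)]
    return np.column_stack(cols)

x = np.linspace(0.0, 10.0, 101)                   # hypothetical statistic values
knots = np.quantile(x, [0.0, 0.25, 0.5, 0.75, 1.0])
B = natural_cubic_basis(x, knots)                 # K = 5 knots -> 4 columns
partial_effect = B @ np.array([0.5, -0.2, 0.1, 0.05])  # hypothetical coefficients
```

Beyond the boundary knots, every column is linear in $x$. This is what distinguishes a natural cubic spline from an unrestricted cubic spline. If a statistic takes only a few distinct values, quantile knots can coincide, and the division by $\xi_K - \xi_k$ would then need separate handling.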