
methods. By dramatically reducing discriminatory bias, our method outperforms the state-of-the-art methods while maintaining reasonably short interval lengths.
Related works. Existing approaches for building a fair mean regression broadly fall into three classes: pre-processing, in-processing, and post-processing. Pre-processing methods transform the data to remove unwanted bias [6, 34, 50]; in-processing methods build fairness constraints into the training step [2, 5, 28, 32]; post-processing methods modify the trained predictor [12, 13, 31]. Since few previous works have addressed quantile fairness or fair prediction intervals, the most closely related work is Yang et al. [46], where a different fairness measure was used. Agarwal et al. [2] mentioned that their reduction-based approach can be adapted to quantile regression, and Williamson and Menon [44] introduced a novel conditional-value-at-risk fairness measure aiming to control the largest subgroup risk. As for interval fairness, the approach of Romano et al. [37] achieves equalized coverage among groups, but without fairness on the interval endpoints. Methodologically, integrating algorithmic fairness with the Wasserstein-distance-based barycenter problem has been studied in [3, 12, 13, 21, 26]; both in-processing [2, 26] and post-processing [12, 13] methods were proposed to solve classification and mean regression problems. As a post-processing method, our work is distinct from the above-mentioned methods in that it constructs DP-fairness for each population quantile and generates a fair prediction interval accordingly.
Notations. We denote by [K] the set {1, . . . , K} for an arbitrary integer K, and by |S| the cardinality of a finite set S. E and P denote expectation and probability, and 1{·} is the indicator function. Let {Z_n}_{n=1}^∞ be a sequence of random variables and {k_n}_{n=1}^∞ a sequence of positive numbers; we say that Z_n = O_p(k_n) if lim_{T→∞} lim sup_{n→∞} P(|Z_n| > T k_n) = 0, i.e., Z_n/k_n = O_p(1). To denote equality in distribution of two random variables A and B, we write A =_d B.
2 Problem statement
Consider the regression problem where a “sensitive characteristic” S is available, which by U.S. law [21, 37] can be enumerated as sex, race, age, disability, etc. We observe the triplets (X_1, S_1, Y_1), . . . , (X_n, S_n, Y_n); denote (X_i, S_i, Y_i) by Z_i, i = 1, . . . , n, where each Z_i is a random variable in R^p × [K] × R. The aim is to predict the unknown value of Y_{n+1} at a test point (X_{n+1}, S_{n+1}). Let P be the joint distribution of Z; we assume that all the samples {Z_i}_{i=1}^{n+1} are drawn exchangeably, with i.i.d. sampling as a special case.
Our goal is to construct a marginal distribution-free prediction band C(X_{n+1}, S_{n+1}) ⊆ R that is likely to cover the unknown response Y_{n+1} with finite-sample (non-asymptotic) validity. Formally, given a desired miscoverage rate α, the prediction interval satisfies

P{Y_{n+1} ∈ C(X_{n+1}, S_{n+1})} ≥ 1 − α,    (1)

for any joint distribution P and any sample size n, while the left and right endpoints of C(X_{n+1}, S_{n+1}) satisfy the fairness constraint of Demographic Parity with respect to the sensitive variable S.
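Guarantee (1) is the standard marginal-coverage property of conformal prediction for exchangeable data. As a generic illustration only (not the method proposed in this paper), the following minimal split-conformal sketch attains it with absolute-residual conformity scores; the toy data, the least-squares predictor, and all variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: feature X, binary sensitive attribute S, response Y.
n = 2000
X = rng.normal(size=n)
S = rng.integers(0, 2, size=n)
Y = 2.0 * X + 0.5 * S + rng.normal(size=n)

# Split: first half trains the predictor, second half calibrates the band.
train, cal = np.arange(0, n // 2), np.arange(n // 2, n)

# Any fitted regressor works here; we use least squares on (X, S, 1).
A = np.column_stack([X, S, np.ones(n)])
beta, *_ = np.linalg.lstsq(A[train], Y[train], rcond=None)

# Conformity scores on the calibration set: absolute residuals.
scores = np.abs(Y[cal] - A[cal] @ beta)

# Finite-sample-valid quantile index: ceil((n_cal + 1)(1 - alpha)).
alpha = 0.1
n_cal = len(cal)
k = int(np.ceil((n_cal + 1) * (1 - alpha)))
qhat = np.sort(scores)[min(k, n_cal) - 1]

# Prediction band C(x, s) = [f(x, s) - qhat, f(x, s) + qhat] at a new point.
x_new, s_new = 0.3, 1
pred = np.array([x_new, s_new, 1.0]) @ beta
lo, hi = pred - qhat, pred + qhat
```

Under exchangeability of the n + 1 samples, the resulting interval covers Y_{n+1} with probability at least 1 − α, matching (1); the construction itself imposes no fairness constraint on lo and hi.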
Demographic Parity. We introduce the quantitative definition of DP in fair regression and connect DP-fairness with a quantile regressor q_α. The result that q_α can be projected to its fair counterpart using optimal transport will be invoked later.
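To make the optimal-transport projection concrete, here is a sketch of the standard one-dimensional Wasserstein-barycenter construction used in the fair-regression literature cited above: a group-s prediction is mapped through its group CDF and then through the weighted mixture of group quantile functions. The synthetic group distributions and all names below are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-ins for the group-conditional prediction distributions nu_{q|s}.
preds = {0: rng.normal(0.0, 1.0, 800), 1: rng.normal(1.0, 1.5, 1200)}
weights = {s: len(v) / 2000 for s, v in preds.items()}  # group frequencies p_s

def cdf(sample, t):
    """Empirical CDF F(t): fraction of the sample that is <= t."""
    return np.searchsorted(np.sort(sample), t, side="right") / len(sample)

def quantile(sample, u):
    """Generalized inverse Q(u) = inf{y : F(y) >= u} of the empirical CDF."""
    srt = np.sort(sample)
    k = max(int(np.ceil(u * len(srt))), 1)
    return srt[k - 1]

def fair_projection(q_value, s):
    """Transport a group-s prediction to the Wasserstein barycenter:
    q_fair = sum_k p_k * Q_k(F_s(q_value))."""
    u = cdf(preds[s], q_value)
    return sum(weights[k] * quantile(preds[k], u) for k in preds)
```

After projection, predictions from both groups share a single distribution (the barycenter), so equal-rank predictions from different groups receive the same fair value.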
Fix a quantile level α; it may refer to α_lo or α_hi, indicating the lower and upper quantile estimates for the prediction band C(X_{n+1}, S_{n+1}). Let q_α : R^p × [K] → R represent an arbitrary conditional quantile predictor. Denote by ν_{q_α|s} the distribution of (q_α(X, S) | S = s); the Cumulative Distribution Function (CDF) of ν_{q_α|s} is given by

F_{ν_{q_α|s}}(t) = P(q_α(X, S) ≤ t | S = s).    (2)
The quantile function Q_{ν_{q_α|s}} = F^{-1}_{ν_{q_α|s}} : [0, 1] → R, namely the generalized inverse of F_{ν_{q_α|s}}, can thus be defined for all levels t ∈ (0, 1] as

Q_{ν_{q_α|s}}(t) = inf{y ∈ R : F_{ν_{q_α|s}}(y) ≥ t},  with  Q_{ν_{q_α|s}}(0) = Q_{ν_{q_α|s}}(0+).    (3)

To simplify the notations, we will write F_{q_α|s} and Q_{q_α|s} instead of F_{ν_{q_α|s}} and Q_{ν_{q_α|s}}, respectively, for any prediction rule q_α.
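Definitions (2) and (3) have direct empirical counterparts on a finite sample. The sketch below (variable names assumed for illustration) computes the group-conditional empirical CDF and its generalized inverse, restricted to the subsample with S = s.

```python
import numpy as np

def empirical_cdf(values):
    """Return the empirical CDF F as a callable: F(t) = #{v <= t} / n, as in (2)."""
    srt = np.sort(values)
    return lambda t: np.searchsorted(srt, t, side="right") / len(srt)

def generalized_inverse(values, t):
    """Q(t) = inf{y : F(y) >= t}, the generalized inverse of (3), with Q(0) = Q(0+)."""
    srt = np.sort(values)
    n = len(srt)
    if t <= 0:
        return srt[0]  # convention Q(0) = Q(0+): smallest support point
    k = int(np.ceil(t * n))  # smallest index k with F(srt[k-1]) >= t
    return srt[min(k, n) - 1]

# Group-conditional use: evaluate Q_{q_alpha|s} on the subsample with S == s.
rng = np.random.default_rng(1)
q_values = rng.normal(size=1000)       # stand-in for q_alpha(X_i, S_i)
S = rng.integers(0, 2, size=1000)      # sensitive attribute
median_group0 = generalized_inverse(q_values[S == 0], 0.5)
```

The `side="right"` convention makes the CDF right-continuous, which is what makes the infimum in (3) attained at the ceil(t·n)-th order statistic.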