
the empirical risk $f_j(\theta) = \frac{1}{n_j} \sum_{i=1}^{n_j} \ell(\theta, \xi_{ji})$ over the parameter of interest $\theta$ from data $\{\xi_{ji}\}_{i=1}^{n_j} \sim P_j$.
We propose to minimize an augmented objective
\[
F_{w,\lambda}(\Theta, \beta) = \sum_{j=1}^{m} w_j \left[ f_j(\theta_j) + \lambda_j \|\theta_j - \beta\|_2 \right]
\]
jointly over the task-specific estimators $\Theta = (\theta_1, \cdots, \theta_m) \in \mathbb{R}^{d \times m}$ and a multi-task center $\beta \in \mathbb{R}^d$ to be learned. The weighting hyperparameters $w_j$ are specified to reflect the importance of the information regarding each individual task (e.g., $w_j = n_j$). The regularization term $\lambda_j \|\theta_j - \beta\|_2$ drives the estimator of each individual task $\theta_j$ towards a common center $\beta$, with strength parameterized by $(\lambda_1, \cdots, \lambda_m)$. It is straightforward to see that our method interpolates between minimizing the risk of each individual task as $\lambda_j$ approaches zero, and a robust pooling of the individual minimizers as $\lambda_j$ increases.
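To illustrate how the augmented objective can be optimized in practice, the following is a minimal sketch of plain subgradient descent on $F_{w,\lambda}$, jointly over $(\Theta, \beta)$. The choice of the pinball loss for $f_j$, the function names, and the step-size and iteration settings are illustrative assumptions, not the exact procedure analyzed in this paper.

```python
import numpy as np

def pinball_subgrad(theta, X, y, tau):
    """A subgradient of the average pinball (check) loss of a linear quantile model."""
    r = y - X @ theta                     # residuals
    g = np.where(r > 0, -tau, 1.0 - tau)  # -tau on underprediction (r > 0), 1 - tau otherwise
    return X.T @ g / len(y)

def fit_augmented(tasks, w, lam, tau=0.5, lr=0.01, n_iter=2000):
    """Plain subgradient descent on
        F_{w,lam}(Theta, beta) = sum_j w_j * [ f_j(theta_j) + lam_j * ||theta_j - beta||_2 ],
    with f_j taken to be the pinball loss of a linear quantile model.
    `tasks` is a list of (X_j, y_j) pairs; `w` and `lam` are length-m sequences."""
    m, d = len(tasks), tasks[0][0].shape[1]
    Theta, beta = np.zeros((d, m)), np.zeros(d)
    for _ in range(n_iter):
        grad_beta = np.zeros(d)
        for j, (X, y) in enumerate(tasks):
            diff = Theta[:, j] - beta
            nrm = np.linalg.norm(diff)
            pen = diff / nrm if nrm > 1e-12 else np.zeros(d)  # subgradient of ||.||_2 (0 is valid at 0)
            Theta[:, j] -= lr * w[j] * (pinball_subgrad(Theta[:, j], X, y, tau) + lam[j] * pen)
            grad_beta -= w[j] * lam[j] * pen                  # contribution to the beta subgradient
        beta -= lr * grad_beta
    return Theta, beta

# Illustrative usage on synthetic tasks:
# rng = np.random.default_rng(0)
# tasks = [(rng.normal(size=(60, 3)), rng.normal(size=60)) for _ in range(5)]
# Theta_hat, beta_hat = fit_augmented(tasks, w=[60] * 5, lam=[0.1] * 5, tau=0.9)
```

Setting every $\lambda_j = 0$ in this sketch recovers separate single-task fits, while very large $\lambda_j$ forces $\theta_j \approx \beta$, i.e., pooling; alternating exact minimization over each $\theta_j$ and over $\beta$ is an equally natural variant.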
Our key contribution is the analysis of the aforementioned procedure in a wide range of problems where the losses $\{f_j\}_{j=1}^{m}$ are convex but allowed to be nonsmooth. We prove that the proposed estimator automatically adapts to the unknown similarity among the tasks. In our motivating example, the cost function is naturally a piecewise linear nonsmooth convex function closely related to quantile regression. Other examples include linear max-margin classifiers as well as threshold regression models. Because the objective functions in these models are not differentiable at many points, technical challenges arise in establishing uniform concentration results and convergence rates: the subgradient is now a set-valued mapping rather than a continuous function. Nonetheless, with suitable statistical modeling, a theoretical analysis in such scenarios becomes possible.
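As a concrete instance of the nonsmoothness discussed above, the check function of quantile regression and the newsvendor cost can be written down in a few lines. This is an illustrative sketch: the function names are ours, and the stated identity uses the standard quantile level $\tau = b/(b+h)$ for per-unit holding cost $h$ and backordering cost $b$.

```python
import numpy as np

def check_loss(r, tau):
    """Koenker-Bassett check function rho_tau(r) = r * (tau - 1{r < 0}):
    piecewise linear and convex, but not differentiable at r = 0,
    where its subdifferential is the whole interval [tau - 1, tau]."""
    return np.maximum(tau * r, (tau - 1.0) * r)

def newsvendor_cost(demand, order, h, b):
    """Per-period newsvendor cost with per-unit holding cost h (overage)
    and backordering cost b (underage)."""
    return h * np.maximum(order - demand, 0.0) + b * np.maximum(demand - order, 0.0)

# With tau = b / (b + h), the two losses agree up to a constant factor:
#   newsvendor_cost(D, q, h, b) == (b + h) * check_loss(D - q, b / (b + h))
```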
In addition to the theoretical guarantees, we evaluate the numerical procedure on both synthetic data and a real-world dataset for the newsvendor problem in Section 5. The experiments reveal a steady and reliable performance benefit of the proposed method over the benchmarks, with significant improvement over STL when data are scarce and over blindly pooling the data together. The proposed method thus offers a reliable procedure for practitioners to leverage the possible relatedness between tasks in inventory decision-making, financial risk management, and many other applications.
1.1 Related Work
Multi-task learning based on parameter augmentation, such as the introduction of the common center $\beta$ in our method, has achieved great empirical success (Evgeniou and Pontil, 2004; Jalali et al., 2013; Chen et al., 2011). Our estimator originates from the framework of Adaptive and Robust Multi-task Learning (ARMUL) proposed by Duan and Wang (2022), but we relax the smoothness and strong convexity conditions on the empirical risks $f_j$, which allows us to extend the analysis to many real-world applications, from statistical learning to inventory decision-making and financial risk management. The motivating inventory management example, often known as the data-driven newsvendor problem, can be expressed as a quantile regression problem with the quantile level determined by the ratio of the per-unit holding cost to the backordering cost (Levi et al., 2007, 2015; Ban and Rudin, 2019). The objective function, also known as the “check function”, is convex but not differentiable. These applications coincide with classical quantile regression in the statistics and econometrics literature, dating back to Koenker and Bassett Jr (1978), which estimates the conditional quantile of the response variable across values of predictive covariates. Besides the aforementioned newsvendor problems in inventory management, quantile regression finds a wide range of applications in survival data analysis (Koenker and Geling, 2001; Wang and Wang, 2014), financial risk management (Engle and Manganelli, 1999; Rockafellar et al., 2000), and many other