1 Introduction
Matching problems are a core area of discrete optimization. In the 90s, a seminal paper by Karp
et al. [30] introduced online bipartite maximum matching problems and showed that, in the worst-
case scenario, no deterministic algorithm can beat a simple greedy procedure, and no randomized
algorithm can beat ranking, which is a greedy procedure preceded by a random shuffling of the order
of the nodes. These elegant results and their natural application to online advertising spurred much
research, especially from the late 2000s on (see, e.g., [39] and the references therein for a survey).
While more complex algorithms have been devised for models other than worst-case analysis, greedy
techniques are often used as a competitive benchmark for comparisons, see, e.g., [17,34,50].
In the last few years, motivated by the surge of ride-sharing platforms, a second online matching
paradigm has received much attention: online (bipartite) minimum cost matching. In this class
of problems, one side of the market is composed of servers (sometimes called drivers) and is fully
known at time 0. Nodes from the other side, often called requests or customers, arrive one at a
time. When request i arrives, we must match it to one of the servers j, and incur a cost cij. Server
j is then removed from the list of available servers, and the procedure continues. The goal is to
minimize the total cost of the matching.
Given the motivating application to ride-sharing, it is natural to impose the condition that
both servers and requests belong to some metric space (e.g., [26,29,44,49]). Many algorithms in
this area involve non-trivial, in some cases computationally expensive, procedures like randomized
tree embeddings [40,7], iterative segmentation of the space [29] or primal-dual arguments based
on the computation of offline optimal matchings at each time step [45]. Other algorithms use
randomization to bypass worst-case scenarios for deterministic algorithms [21].
The predominant objective of this line of work has been to design algorithms that achieve the
strongest possible performance guarantees in terms of quality of the solution found, which is mea-
sured by an algorithm’s competitive ratio. However, there are other important considerations when
deploying systems that match individuals in real time, such as simplicity, strategyproofness, run-
ning time, and explainability. An extremely simple algorithm that is highly desirable with respect
to all these factors is the greedy algorithm, also called nearest neighbor, that matches each incoming
request to the closest available server. But does it perform well?
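The nearest-neighbor rule above is simple enough to state in a few lines of code. The following is a minimal Python sketch (function and variable names are our own, not from the paper): servers are known up front, requests arrive one at a time, and each request is irrevocably matched to the closest server that is still available, accumulating the matching cost.

```python
import math

def greedy_online_matching(servers, requests):
    """Match each arriving request to the nearest available server.

    servers, requests: lists of points (tuples of coordinates).
    Requests are processed in arrival order; each match is irrevocable.
    Returns the matching as (request_index, server_index) pairs and
    the total cost incurred.
    """
    available = set(range(len(servers)))
    matching = []
    total_cost = 0.0
    for i, r in enumerate(requests):
        # Greedy choice: closest still-available server (ties by index).
        j = min(available, key=lambda s: math.dist(r, servers[s]))
        available.remove(j)
        matching.append((i, j))
        total_cost += math.dist(r, servers[j])
    return matching, total_cost
```

Each step scans the available servers, so a request is served in time linear in the number of servers; no offline optimum or re-optimization is ever computed, which is exactly the simplicity that makes greedy attractive in deployed systems.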
Somewhat surprisingly, this algorithm often works very well in practice: experiments have shown
that greedy is more effective than other existing algorithms in most tests and has outstanding
scalability [48]. This performance substantiates the choice of many ride-sharing platforms to actu-
ally implement greedy procedures, in combination with other techniques [8,25]. However, current
theory exhibits a mismatch with such strong computational results: if we assume that n servers
and n requests are adversarially placed on a line, the greedy algorithm only achieves a 2^n − 1
competitive ratio [26,31,49]. It is therefore important to develop a theory that closes the gap with
practice and gives solid ground to the use of the greedy algorithm. This motivates the first guiding
question of the paper.
Can we find a theoretical justification for the strong practical performance of the greedy
algorithm for online minimum cost matching problems?
A standard approach to the question above is to make a distributional assumption on the input.
Obviously, stronger assumptions may lead to stronger positive results – but such assumptions may
not hold in practice. Ideally, we would like to identify the hypotheses that are necessary