Dynamically meeting performance objectives for
multiple services on a service mesh
Forough Shahab Samani† and Rolf Stadler†
†Dept. of Computer Science, KTH Royal Institute of Technology, Sweden
Email: {foro, stadler}@kth.se
October 11, 2022
Abstract—We present a framework that lets a service provider
achieve end-to-end management objectives under varying load.
Dynamic control actions are performed by a reinforcement
learning (RL) agent. Our work includes experimentation and
evaluation on a laboratory testbed where we have implemented
basic information services on a service mesh supported by the
Istio and Kubernetes platforms. We investigate different manage-
ment objectives that include end-to-end delay bounds on service
requests, throughput objectives, and service differentiation. These
objectives are mapped onto reward functions that an RL agent
learns to optimize, by executing control actions, namely, request
routing and request blocking. We compute the control policies not
on the testbed, but in a simulator, which speeds up the learning
process by orders of magnitude. In our approach, the system
model is learned on the testbed; it is then used to instantiate
the simulator, which produces near-optimal control policies for
various management objectives. The learned policies are then
evaluated on the testbed using unseen load patterns.
Index Terms—Performance management, reinforcement learn-
ing, service mesh, digital twin
I. INTRODUCTION
End-to-end performance objectives for a service are difficult
to achieve on a shared and virtualized infrastructure. This
is because the service load often changes in an operational
environment, and service platforms do not offer strict resource
isolation, so that the resource consumption of various tasks
running on a platform influences the service quality.
In order to continuously meet performance objectives for a
service, such as bounds on delays or throughput for service
requests, the management system must dynamically perform
control actions that re-allocate the resources of the infras-
tructure. Such control actions can be taken on the physical,
virtualization, or service layer, and they include horizontal and
vertical scaling of compute resources, function placement, as
well as request routing and request dropping.
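For illustration, the action types listed above can be viewed as a small discrete set; the following Python sketch uses hypothetical names and is not part of the framework's interface.

# Minimal sketch of the kinds of control actions discussed above;
# the names are illustrative only.
from enum import Enum

class ControlAction(Enum):
    SCALE_OUT = "add a service instance (horizontal scaling)"
    SCALE_UP = "add CPU/memory to an instance (vertical scaling)"
    PLACE_FUNCTION = "move a processing function to another node"
    ROUTE_REQUEST = "change routing weights for incoming requests"
    BLOCK_REQUEST = "drop a fraction of incoming requests"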
The service abstraction we consider in this paper is a
directed graph, where the nodes represent processing functions
and the links communication channels. This general abstrac-
tion covers a variety of services and applications, such as a
network slice on a network substrate, a service chain on a
softwarized network, a micro-service based application, or a
pipeline of machine-learning tasks. We choose the service-
mesh abstraction in this work and apply it to micro-service
based applications.
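To make the abstraction concrete, the following minimal Python sketch (assuming the networkx library; node names and attributes are purely illustrative, not our implementation) represents such a service graph:

# Service abstraction: a directed graph whose nodes are processing
# functions and whose edges are communication channels.
import networkx as nx

service = nx.DiGraph(name="information-service")
# processing functions (e.g., microservices deployed as Kubernetes pods)
service.add_node("frontend", cpu_millicores=500)
service.add_node("search", cpu_millicores=1000)
service.add_node("database", cpu_millicores=1000)
# communication channels between the functions
service.add_edge("frontend", "search", latency_ms=2.0)
service.add_edge("search", "database", latency_ms=1.0)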
In this paper, we propose a framework for achieving end-
to-end management objectives for multiple services that con-
currently execute on a service mesh. We apply reinforcement
learning (RL) techniques to train an agent that periodically per-
forms control actions to reallocate resources. A management
objective in this framework is expressed through the reward
function in the RL setup. We develop and evaluate the frame-
work using a laboratory testbed where we run information
services on a service mesh, supported by the Istio and Kuber-
netes platforms [1], [2]. We investigate different management
objectives that include end-to-end delay bounds on service
requests, throughput objectives, and service differentiation.
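As an illustration of how a management objective can be encoded as a reward, the following Python sketch combines per-service delay bounds with weighted throughput to express service differentiation; the exact reward functions used in this paper are defined later, so this is only a schematic form with hypothetical signatures:

# Illustrative reward shaping for a delay-bound objective with service
# differentiation; not the exact reward function used in the paper.
def reward(delays_ms, throughputs, delay_bounds_ms, weights):
    """One term per service; weights express service differentiation."""
    r = 0.0
    for svc in delays_ms:
        if delays_ms[svc] <= delay_bounds_ms[svc]:
            r += weights[svc] * throughputs[svc]  # reward carried load
        else:
            r -= weights[svc]                     # penalize bound violation
    return r

# Example: two services with different priorities
# reward({"s1": 20, "s2": 80}, {"s1": 90, "s2": 40},
#        {"s1": 50, "s2": 50}, {"s1": 2.0, "s2": 1.0})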
Training an RL agent in an operational environment (or
on a testbed in our case) is generally not feasible due to
the long training time, which can extend to weeks, unless
the state and action spaces of the agent are very limited.
We address this issue by computing the control policies in
a simulator rather than on the testbed, which speeds up the
learning process by orders of magnitude for the scenarios
we study. In our approach, the RL system model is learned
from testbed measurements; it is then used to instantiate
the simulator, which produces near-optimal control policies
for various management objectives, possibly in parallel. The
learned policies are then evaluated on the testbed using unseen
load patterns (i.e., patterns the agent has not been trained on).
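The following Python sketch outlines this workflow; all function bodies are placeholders standing in for the framework components described above, not our actual implementation:

# Sketch of the workflow: learn a system model from testbed traces,
# instantiate a simulator, train a policy in simulation.
def learn_system_model(testbed_traces):
    """Fit a model of the service mesh from measurements (placeholder)."""
    return {"traces": testbed_traces}

def instantiate_simulator(system_model):
    """Build a simulator from the learned model (placeholder)."""
    return {"model": system_model}

def train_rl_agent(simulator, management_objective):
    """Train a control policy in simulation (placeholder)."""
    return {"objective": management_objective}

def build_policy(testbed_traces, management_objective):
    system_model = learn_system_model(testbed_traces)
    simulator = instantiate_simulator(system_model)
    return train_rl_agent(simulator, management_objective)

# The resulting policy is then evaluated on the testbed under load
# patterns it has not been trained on.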
We make two contributions with this paper. First, we present
an RL-based framework that computes near-optimal control
policies for end-to-end performance objectives on a service
graph. This framework simultaneously supports several ser-
vices with different performance objectives and several types
of control operations.
Second, as part of this framework, we introduce a simulator
component that efficiently produces the policies. Through
experimentation, we study the trade-off between learning the policies in the simulator and learning them directly on the testbed. We find that, while we lose some control effectiveness due to the inaccuracy of the learned system model, we gain a significantly shorter training time, which makes the approach suitable in practice.
To the best of our knowledge, this paper is the first to
advocate a simulation phase as part of implementing a dynamic
performance management solution on a real system.
Note that, when developing and presenting our framework,
we aim at simplicity, clarity, and rigorous treatment, which
helps us focus on the main ideas. For this reason, we choose
a small service mesh for our scenarios (which still includes
key complexities of larger ones), we consider only two types