A Quantitative Theory of Bottleneck Structures for Data Networks Technical Report August 2021 New York USA Jordi Ros-Giralt1 Noah Amsel2 Sruthi Yellamraju3 James Ezick3 Richard Lethin3

2025-04-27 0 0 4.43MB 20 页 10玖币

侵权投诉

A Quantitative Theory of Bottleneck Structures for Data Networks

Technical Report, August 2021, New York, USA

Jordi Ros-Giralt1, Noah Amsel2, Sruthi Yellamraju3, James Ezick3, Richard Lethin3

Yuang Jiang4, Aosong Feng4, Leandros Tassiulas4

contact: jros@qti.qualcomm.com,

1Qualcomm Europe, Inc

2Formerly of Qualcomm Technologies, Inc

3Qualcomm Technologies, Inc

4Yale, University

Abstract—The conventional view of the congestion control

problem in data networks is based on the principle that a

ﬂow’s performance is uniquely determined by the state of

its bottleneck link, regardless of the topological properties

of the network. However, recent work has shown that the

behavior of congestion-controlled networks is better explained

by models that account for the interactions between bottleneck

links. These interactions are captured by a latent bottleneck

structure, a model describing the complex ripple effects that

changes in one part of the network exert on the other parts.

In this paper, we present a quantitative theory of bottleneck

structures (QTBS), a mathematical and engineering framework

comprising a family of polynomial-time algorithms that can be

used to reason about a wide variety of network optimization

problems, including routing, capacity planning and ﬂow con-

trol. QTBS can contribute to trafﬁc engineering by making

clear predictions about the relative performance of alternative

ﬂow routes, and by providing numerical recommendations for

the optimal rate settings of trafﬁc shapers. A particularly

novel result in the domain of capacity planning indicates

that previously established rules for the design of folded-Clos

networks are suboptimal when ﬂows are congestion controlled.

We show that QTBS can be used to derive the optimal rules for

this important class of topologies, and empirically demonstrate

the correctness and efﬁcacy of these results using the BBR and

Cubic congestion-control algorithms.

1. Introduction

Most research on the problem of congestion control for

data networks is based on the principle that the performance

of a ﬂow is solely determined by the state of its bottleneck

link. This view was presented in the original congestion

control algorithm by Jacobson [1], which helped the Internet

recover from congestion collapse in 1988, and it persisted

throughout the more than 30 years of research and develop-

ment that followed, including Google’s new BBR algorithm

[2]. While it is certainly true that a ﬂow’s performance

is limited by the state of its bottleneck link, recent work

[3] reveals a deeper view of network behavior, describing

how bottlenecks interact with each other through a latent

structure—called the bottleneck structure—that depends on

the topological, routing and ﬂow control properties of the

network. This latent structure explains how the performance

of one bottleneck can affect other bottlenecks, and provides

a framework to understand how perturbations in the capacity

of a link or the rate of a ﬂow propagate through a network,

affecting other links and ﬂows.

While [3] introduced the concept of bottleneck struc-

ture, the analysis provided was qualitative. In this paper

we present a quantitative theory of bottleneck structures

(QTBS), a mathematical framework that yields a set of

polynomial time algorithms for quantifying the ripple effects

of perturbations in a network. Perturbations can either be

unintentional (such as the effect of a link failure or the

sudden arrival of a large ﬂow in a network) or intentional

(such as the upgrade of a network link to a higher capacity

or the modiﬁcation of a route with the goal of optimizing

performance). With QTBS, a network operator can quantify

the effect of such perturbations and use this information to

optimize network performance.

The theoretical contributions of this paper are as follows:

•A new generalized bottleneck structure called gradient

graph is studied in detail. A key difference with the bottle-

neck structure introduced in [3] is that the gradient graph

allows us to not only qualify the inﬂuences that ﬂows and

bottlenecks exert on each other, but also to quantify them.

This leads to the development of a quantitative theory of

bottleneck structures (QTBS), introduced in this paper.

(Section 2.2)

•A novel, fast algorithm to compute the gradient graph

is developed. This algorithm constitutes an asymptotic

speed-up compared to those presented in [3], allowing

us to scale our methodology to large production networks

(Sections 2.2.)

•The concepts of link and ﬂow gradient are introduced.

These mathematical operators quantify the effects of in-

ﬁnitesimally small perturbations in a network, the core

building blocks of QTBS. A new, fast method to efﬁ-

arXiv:2210.03534v1 [cs.NI] 6 Oct 2022

ciently compute the gradients by leveraging the bottleneck

structure is presented. (Section 2.3.)

Applications demonstrating the practical implications of

QTBS are provided in the areas of routing, capacity planning

and ﬂow control. In each of these applications, we show

how QTBS can potentially alter some of the established

conventional best practices. Our practical contributions are

as follows:

•In the routing application, we introduce an algorithm to

ﬁnd maximal-throughput routes by anticipating the effects

of the congestion control algorithm. While in traditional

trafﬁc engineering approaches (e.g., [4]) the problems of

routing and ﬂow control are considered independently, we

show how QTBS can help resolve them jointly, allowing

operators to design routes that are efﬁcient from a con-

gestion control standpoint. (Section 3.1.)

•In the capacity planning application, we use QTBS to

optimize the bandwidth allocation between the spine and

leaf links of a fat-tree (also known as folded-Clos [5]).

We demonstrate that, due to the effects of congestion

control, the optimal design differs from the full fat-tree

conﬁguration proposed by Leiserson [6]. (Section 3.2.)

•In the ﬂow control application, we show that QTBS can

be used to precisely compute the rate reduction that a

set of trafﬁc shapers must impose on the network’s low

priority ﬂows in order to achieve a quantiﬁable positive

impact on the high-priority ﬂows. (Section 3.3.)

•To demonstrate that networks behave according to QTBS,

we carry out experiments for each application we consider

using production TCP/IP code and the widely adopted

BBR [2] and Cubic [7] congestion control algorithms.

(Section 3.)

2. Quantitative Theory of Bottleneck Struc-

tures (QTBS)

2.1. Network Model

In their simplest form, networks can be modeled using

two kinds of elements: links, which are communication

resources with a limited capacity; and ﬂows, which make

use of these communication resources. We formalize the

deﬁnition of a network as follows:

Deﬁnition 1. Network. We say that a tuple N=

hL,F,{cl,∀l∈ L}i is a network if:

•Lis a set of links of the form {l1, l2, ..., l|L|},

•Fis a set of ﬂows of the form {f1, f2, ..., f|F|}, and

•clis the capacity of link l, for all l∈ L.

Each ﬂow ftraverses a subset of links Lf⊂ L and,

similarly, each link lis traversed by a subset of ﬂows

Fl⊂ F. Finally, each ﬂow ftransmits data at a rate rf

and the capacity constraint P∀f∈Flrf≤clmust hold for

all l∈ L.

A core concept upon which our mathematical framework

rests is the notion of a bottleneck link. Intuitively, a ﬂow is

bottlenecked at a link if bypassing the link would allow its

transmission rate to increase. A link whose capacity is fully

utilized is always a bottleneck of at least one ﬂow, though

not necessarily of all the ﬂows traversing it. In this work,

we adopt the following formal deﬁnition:

Deﬁnition 2. Bottleneck link. Let N=hL,F,{cl,∀l∈ L}i

be a network where each ﬂow f∈ F transmits data at a rate

rfas determined by a congestion control algorithm (e.g.,

TCP’s algorithm [1]). We will say that ﬂow fis bottlenecked

at link l—equivalently, that link lis a bottleneck of ﬂow

f—if and only if:

•Flow ftraverses link l.

•∂rf/∂c−

l6= 0. That is, the transmission rate of ﬂow f

changes upon small reductions in link l’s capacity.1

This characterization of bottlenecks is a generalization

of some of the classic deﬁnitions found in the literature.

Unlike previous work, however, it is based on the notion of

aperturbation, mathematically expressed as a derivative of a

ﬂow rate with respect to the capacity of a link (∂rf/∂cl). As

an example to illustrate that our deﬁnition of bottleneck is

relatively loose, in Appendix A we show that it generalizes

the classic max-min deﬁnition of Bertsekas and Gallager

[8]. The generality of the deﬁnition of bottlenecks used

in this paper suggests that our framework can be applied

to a wide variety of rate allocation schemes—not only to

max-min fairness [8], proportional fairness [9] and speciﬁc

algorithms (e.g., BBR [2], Cubic [7], Reno [10], etc.), but to

other classes of congestion control solutions that meet the

conditions of Deﬁnition 2. We leave this promising direction

for future work, and focus on the classic max-min setting

considered in [3].

We complete the description of our network model by

deﬁning the concept of a link’s fair share:

Deﬁnition 3. Fair share of a link. Let N=hL,F,{cl,∀l∈

L}i be a network. The fair share slof a link l∈ L is the

rate of the ﬂows that are bottlenecked at link l.

As we will see throughout this work, the concept of link

fair share is dual to the concept of ﬂow rate, in that many

of the mathematical properties that are applicable to the rate

of a ﬂow are also applicable to the fair share of a link.

2.2. The Gradient Graph

Our objective is to derive a mathematical framework

capable not just of detecting but also of quantifying the

inﬂuences that links and ﬂows exert on each other. In [3], the

authors introduced two bottleneck structures, the bottleneck

precedence graph (BPG) and the gradient graph, and demon-

strated that data networks qualitatively operate according to

the BPG structure. The authors brieﬂy described the concept

of the gradient graph, but their work focused mostly on

the mathematical properties of the bottleneck precedence

1. We use the notation ∂y/∂x−to denote the left derivative. This

subtlety is necessary because a ﬂow can have multiple bottleneck links.

In this case, decreasing the capacity of only one bottleneck would affect

the rate of the ﬂow, while increasing its capacity would not; thus, the

(two-sided) derivative would not exist.

graph. In our paper, we instead focus on a modiﬁed version

of the gradient graph structure. Our work stems from the

insight that, as we will show, this structure enables not just

qualitative analysis, as in [3], but also quantitative analysis,

providing a framework to better understand and optimize

network performance.

We start with the deﬁnition of the gradient graph:

Deﬁnition 4. Gradient graph. Let N=hL,F,{cl,∀l∈ L}i

be a network. The gradient graph is a directed graph such

that:

1) There exists a vertex for each bottleneck link and each

ﬂow in the network.

2) For every ﬂow f∈ F:

a) If fis bottlenecked at link l∈ L, then there exists

a directed edge from lto f;

b) If ftraverses link l∈ L, then there exists a directed

edge from fto l;

For ease of exposition, in this paper we will use the terms

gradient graph and bottleneck structure interchangeably.

This deﬁnition is borrowed from [3], except for a subtle

but relevant modiﬁcation of 2b. (The rest of the theoretical

developments presented in this work are new contributions.)

Previously, edges were only included from ﬂows to links

that they traverse, but that do not bottleneck them. In this

work, we also include edges from ﬂows to their bottleneck

links. We call these “backward edges”, and we introduce

them because they are required by several of our theorems

and algorithms.

The utility of our deﬁnition of gradient graph as a data

structure for understanding network performance is captured

in the following theorem:

Theorem 1. Propagation of network perturbations. Let

x, y ∈ L∪F be a pair of links or ﬂows in the network. Then

a perturbation in the capacity cx(for x∈ L) or transmission

rate rx(for x∈ F) of xwill affect the fair share sy(for

y∈ L) or transmission rate ry(for y∈ F) of yif only

if there exists a directed path from xto yin the gradient

graph.

Proof. See Appendix B.

Intuitively, the gradient graph of a network describes

how perturbations in link capacities and ﬂow transmission

rates propagate through the network. Imagine that ﬂow f

is bottlenecked at link l. From Deﬁnition 2, this necessarily

implies that a perturbation in the capacity of link lwill cause

a change on the transmission rate of ﬂow f,∂rf/∂cl6= 0.

This is reﬂected in the gradient graph by the presence of

a directed edge from a link lto a ﬂow f(Condition 2a in

Deﬁnition 4). A change in the value of rf, in turn, affects all

the other links traversed by ﬂow f. This is reﬂected by the

directed edges from fto the links it traverses (Condition 2b).

This basic process of (1) inducing a perturbation in a vertex

(either in a link or a ﬂow vertex) followed by (2) propagating

the effects of the perturbation along the departing edges of

the vertex creates a ripple effect in the bottleneck structure

as described in Theorem 1. Leveraging this result, we can

formally introduce the concept of region of inﬂuence:

Deﬁnition 5. Regions of inﬂuence in a network. The region

of inﬂuence of a link or ﬂow x∈ L ∪ F, denoted R(x), is

the set of links and ﬂows ythat are reachable from xin the

gradient graph.

The region of inﬂuence is an important concept in

network performance analysis and optimization because it

describes what parts of a network are affected by perturba-

tions in the performance of a link or a ﬂow. In Section 2.3,

we will also see how such inﬂuences can be quantiﬁed.

We now introduce the GradientGraph() algorithm (Al-

gorithm 1), a procedure that constructs the gradient graph of

a network. The algorithm begins with crude estimates of the

fair share rates of the links, and iteratively reﬁnes them until

all the capacity in the network has been allocated and the

rate of each ﬂow reaches its ﬁnal value. In the process, the

gradient graph is constructed level by level. The algorithm

starts by initializing the available capacity of each link (line

3), estimating its fair share (line 4) and adding all links to

a min-heap by taking their fair share value as the key (line

5). At each iteration, the algorithm picks the unresolved link

with the lowest fair share value from the min-heap (line 8).

Once this link is selected, all unresolved ﬂows remaining in

the network that traverse it are resolved. That is, their rates

are set to the fair share of the link (line 12) and they are

added to the set of vertices of the gradient graph V(line

13). In addition, directed edges are added in the gradient

graph between the link and all the ﬂows bottlenecked at it

(line 10) and from each of these ﬂows to the other links that

they traverse (line 15). Lines 16-17-18 update the available

capacity of the link, its fair share, and the position of the link

in the min-heap according to the new fair share. Finally, the

link itself is also added as a vertex in the gradient graph (line

22). This iterative process is repeated until all ﬂows have

been added as vertices in the gradient graph (line 7). The

algorithm returns the gradient graph G, the fair share of each

link {sl,∀l∈ L} and the rate of each ﬂow {rf,∀f∈ F}.

We conclude this section stating the time complexity of

the GradientGraph() algorithm:

Lemma 1. Time complexity of GradientGraph(). The time

complexity of running GradientGraph() is O(|L|log |L|·H),

where His the maximum number of ﬂows that traverse a

single link.

Proof. See Appendix E.

2.3. Link and Flow Gradients

In this section, we focus on the problem of quantifying

the ripple effects created by perturbations in a network.

Because networks are composed of links and ﬂows, there

are two kinds of perturbations: (1) those originating from

changes to the capacity of a link and (2) those originating

from changes to the rate of a ﬂow. When such changes

occur, the congestion control algorithm adjusts its allocation

Algorithm 1 GradientGraph(N=hL,F,{cl,∀l∈ L}i)

1: V=∅;E=∅;rf=∞,∀f∈ F;

2: for l∈ L do

3: al=cl; # available capacity

4: sl=al/|Fl|; # fair share

5: MinHeapAdd(key = sl,value= l);

6: end for

7: while F 6⊆ V do

8: l= MinHeapPop();

9: for f∈ Flsuch that rf≥sldo

10: E=E∪ {(l, f),(f, l)};

11: if f6∈ V then

12: rf=sl;

13: V=V ∪ {f};

14: for l0∈ Lfsuch that rf< sl0do

15: E=E∪ {(f, l0)}

16: al0=al0−sl

17: sl0=al/|Fl\ V|;

18: MinHeapUpdateKey(value = l0,newKey = sl0);

19: end for

20: end if

21: end for

22: V=V ∪ {l};

23: end while

24: return hG =hV, Ei,{sl,∀l∈ L},{rf,∀f∈ F}i;

of bandwidth to the ﬂows so as to maintain two objec-

tives: (1) maximizing network utilization while (2) ensuring

fairness among competing ﬂows. The congestion control

algorithm acts like a function mapping network conditions

(including its topology, link capacities, and ﬂow paths) to

rate allocations. Large changes in any of these inputs can

have complicated ripple effects on the ﬂow rates, but for

sufﬁciently small changes, the bandwidth allocation function

is linear.2This local linearity property naturally motivates

the concept of link and ﬂow gradients:

Deﬁnition 6. Link and ﬂow gradients. Let N=

hL,F,{cl,∀l∈ L}i be a network. We deﬁne:

•The gradient of a link l∈ L with respect to another link

l∗∈ L as ∇l∗(l) = ∂sl/∂cl∗;

•The gradient of a ﬂow f∈ F with respect to some link

l∗∈ L as ∇l∗(f) = ∂rf/∂cl∗;

•The gradient of a link l∈ L with respect to a ﬂow f∗∈ F

as ∇f∗(l) = ∂sl/∂rf∗;

•The gradient of a ﬂow f∈ F with respect to another

ﬂow f∗∈ F as ∇f∗(f) = ∂rf/∂rf∗.

Intuitively, the gradient with respect to a link measures

the impact that a small perturbation in its capacity has on

another link or ﬂow. In real networks, this corresponds to the

scenario of physically upgrading a link or, in programmable

networks (e.g., [11]), logically modifying the capacity of

a virtual link. Thus, link gradients can generally be used

to resolve network design and capacity planning problems.

Similarly, the gradient with respect to a ﬂow measures the

impact that a perturbation in its rate has on a link or another

ﬂow. This scenario corresponds, for instance, to the case

of trafﬁc shaping a ﬂow to alter its transmission rate or

changing the route of a ﬂow—which can be seen as dropping

2. Technically, it is piecewise linear, like the absolute value function, so

picking a linear function that locally approximates it requires knowing the

direction of the change.

the rate of that ﬂow down to zero and adding a new ﬂow

with a different path. Thus, ﬂow gradients can generally be

used to resolve trafﬁc engineering problems. In Section 3

we will see applications in real networks that illustrate each

of these scenarios.

We now present an algorithm called ForwardGrad()

(Algorithm 2) for calculating link and ﬂow gradients. The

algorithm takes a set of links and ﬂows, the gradient graph

of the corresponding network, a link or ﬂow xwith respect

to which to compute the gradients, and a direction ∆xof the

perturbation. It outputs the gradients of all links and ﬂows

in the network with respect to x.ForwardGrad() takes inspi-

ration from forward mode automatic differentiation (“For-

ward Prop”) [12], an algorithm that uses directed acyclic

graphs to represent complicated mathematical functions as

compositions of simpler functions, whose derivatives can

be composed by repeatedly applying the chain rule. In the

case of congestion control, we do not have a closed-form

mathematical formula that relates network conditions (the

inputs) to the ﬂow rates and fair share values (the outputs),

but we can use the gradient graph to break down and

optimize this function.

The thrust of the algorithm is as follows. For all l∈ L,

let ∆lbe the change in the fair share rate of link l. For

all f∈ F, let ∆fbe the change in the rate of ﬂow f. We

call these variables the “drifts” caused by a perturbation.

Before the perturbation, ∆l= ∆f= 0 for all links and

ﬂows. To begin the algorithm, we make an inﬁnitesimally

small perturbation in the independent variable (the one in

the “denominator” of the derivative) that can be positive

or negative. If the independent variable xis a ﬂow f, we

set ∆f=δ(line 2). If it is a link l, and Slis the set

of direct successors of node lin the gradient graph, we set

∆l=δ/|Sl|(line 3). This is done since, by deﬁnition of the

gradient graph, |Sl|is the number of ﬂows bottlenecked at l

and the change in l’s capacity will be distributed evenly

among these ﬂows. To determine how this perturbation

propagates to the rest of the network, we follow all directed

paths from that vertex and update the drifts according to the

following two invariants:

•Invariant 1: Flow Equation. A ﬂow’s drift ∆fequals

the minimum drift of its bottleneck links. That is, ∆f=

minl∈Pf∆l, where Pfis the set of links visited directly

before ﬂow vertex fon a path from the starting vertex x

(the predecessors in the graph).

•Invariant 2: Link Equation. A link’s drift ∆lis the

negative of the ﬂow drifts entering its vertex, divided

by the number of ﬂow drifts leaving it. That is, ∆l=

−Pf∈Pl∆f/|Sl|, where Plis the set of ﬂow vertices

visited directly before link vertex land Slis the set of

ﬂow vertices visited directly after link vertex lon a path

from the starting vertex x.

Finally, the derivative of a given variable with respect to the

independent variable that we perturbed can be calculated by

dividing its drift by δ. In particular, assume the capacity

of link lis the independent variable that we perturbed and

let the rate of ﬂow fbe the dependent variable in which

文档加载中……请稍候！
如果长时间未打开，您也可以点击刷新试试。

下载文档到电脑，查找使用更方便

10 玖币 0人已下载

立即下载

摘要：

AQuantitativeTheoryofBottleneckStructuresforDataNetworksTechnicalReport,August2021,NewYork,USAJordiRos-Giralt1,NoahAmsel2,SruthiYellamraju3,JamesEzick3,RichardLethin3YuangJiang4,AosongFeng4,LeandrosTassiulas4contact:jros@qti.qualcomm.com,1QualcommEurope,Inc2FormerlyofQualcommTechnologies,Inc3Qualcom...

展开>> 收起<<

A Quantitative Theory of Bottleneck Structures for Data Networks Technical Report August 2021 New York USA Jordi Ros-Giralt1 Noah Amsel2 Sruthi Yellamraju3 James Ezick3 Richard Lethin3.pdf

共20页,预览4页

还剩页未读，继续阅读

声明：本站为文档C2C交易模式，即用户上传的文档直接被用户下载，本站只是中间服务平台，本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私，请立即通知玖贝云文库，我们立即给予删除！

A Quantitative Theory of Bottleneck Structures for Data Networks Technical Report August 2021 New York USA Jordi Ros-Giralt1 Noah Amsel2 Sruthi Yellamraju3 James Ezick3 Richard Lethin3

相关推荐

开通VIP享超值会员特权

作者详情

相关内容

热门标签

举报选择: