network.
Low latency. Modern networking hardware enables round-trip times of a few microseconds for short messages. The transport protocol must not add significantly to this latency, so that applications experience latencies close to the hardware limit. The transport protocol must also support low latency at the tail, even under relatively high network loads with a mix of traffic. Tail latency is particularly challenging for transport protocols; nonetheless, it should be possible to achieve tail latencies for short messages within 2–3x of the best-case latency [19].
High throughput. The transport protocol must support high throughput in two different ways. Traditionally, the term “throughput” has referred to data throughput: delivering large amounts of data in a single message or stream. This kind of throughput is still important. In addition, datacenter applications require high message throughput: the ability to send large numbers of small messages quickly for communication patterns such as broadcast and shuffle [15]. Message throughput has historically not received much attention, but it is essential in datacenters.
In order to meet the above requirements, the transport protocol must also deal with the following problems:
Congestion control. In order to provide low latency, the transport protocol must limit the buildup of packets in network queues. Packet queuing can potentially occur both at the edge (the links connecting hosts to top-of-rack switches) and in the network core; each of these forms of congestion creates distinct problems.
Efficient load balancing across server cores. For more than a decade, network speeds have been increasing rapidly while processor clock rates have remained nearly constant. Thus it is no longer possible for a single core to keep up with a single network link; both incoming and outgoing load must be distributed across multiple cores. This is true at multiple levels. At the application level, high-throughput services must run on many cores and divide their work among the cores. At the transport layer, a single core cannot keep up with a high-speed link, especially with short messages. Load balancing impacts transport protocols in two ways. First, it can introduce overheads (e.g., the use of multiple cores causes additional cache misses for coherence). Second, load balancing can lead to hot spots, where load is unevenly distributed across cores; this is a form of congestion at the software level. Load balancing overheads are now one of the primary sources of tail latency [21], and they are impacted by the design of the transport protocol.
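One common software technique for spreading incoming network load across cores is to give each core its own socket bound to a shared port with SO_REUSEPORT, letting the kernel steer packets among them. The sketch below assumes a Linux-style kernel that supports SO_REUSEPORT for UDP; the function name and socket count are illustrative choices, not something the text prescribes:

```python
import socket

def make_socket_group(n: int) -> list:
    """Bind n UDP sockets to one shared port via SO_REUSEPORT.

    The kernel hashes each incoming packet's addresses and ports and
    steers it to one of the sockets in the group, so each core can run
    its own receive loop without a shared queue. A skewed hash is one
    way the hot spots described above can arise.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    first.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    first.bind(("127.0.0.1", 0))  # let the kernel pick a free port
    port = first.getsockname()[1]
    group = [first]
    for _ in range(n - 1):
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        s.bind(("127.0.0.1", port))  # same port as the first socket
        group.append(s)
    return group

# One socket per core; a real server would poll each from a thread
# pinned to its own core.
sockets = make_socket_group(4)
```

Note that this only balances at packet or connection granularity; it does nothing about the coherence-miss overheads the kernel and application still pay.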
NIC offload. There is increasing evidence that software-based transport protocols no longer make sense; they simply cannot provide high performance at an acceptable cost. For example:
• The best software protocol implementations have end-to-end latency more than 3x as high as implementations where applications communicate directly with the NIC via kernel bypass.
• Software implementations give up a factor of 5–10 in small message throughput, compared with NIC-offloaded implementations.
• Driving a 100 Gbps network at 80% utilization in both directions consumes 10–20 cores just in the networking stack [16, 21]. This is not a cost-effective use of resources.
Thus, in the future, transport protocols will need to move to special-purpose NIC hardware. The transport protocol must not have features that preclude hardware implementation. Note that NIC-based transports will not eliminate software load balancing as an issue: even if the transport is in hardware, application software will still be spread across multiple cores.
3 Everything about TCP is wrong
This section discusses five key properties of TCP, which cover
almost all of its design:
• Stream orientation
• Connection orientation
• Bandwidth sharing (“fair” scheduling)
• Sender-driven congestion control
• In-order packet delivery
Each of these properties represents the wrong decision for a
datacenter transport, and each of these decisions has serious
negative consequences.
3.1 Stream orientation
The data model for TCP is a stream of bytes. However, this is not the right data model for most datacenter applications. Datacenter applications typically exchange discrete messages to implement remote procedure calls. When messages are serialized in a TCP stream, TCP has no knowledge of message boundaries. This means that when an application reads from a stream, there is no guarantee that it will receive a complete message; it may receive less than a full message, or parts of several messages. TCP-based applications must mark message boundaries when they serialize messages (e.g., by prefixing each message with its length), and they must use this information to reassemble messages on receipt. This introduces extra complexity and overhead, such as maintaining state for partially received messages.
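The length-prefix scheme just described can be sketched in a few lines; the receive helper must loop because a byte stream is free to return partial data. The function names and 4-byte header are our illustrative choices, not something the text prescribes:

```python
import struct

HEADER = struct.Struct("!I")  # 4-byte big-endian message length

def frame(msg: bytes) -> bytes:
    """Prefix a message with its length so boundaries survive the stream."""
    return HEADER.pack(len(msg)) + msg

def recv_exact(sock, n: int) -> bytes:
    """Loop until exactly n bytes arrive; recv() may return fewer.

    This is the per-message state the text mentions: until the loop
    finishes, the application holds a partially received message.
    """
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("stream closed mid-message")
        buf += chunk
    return bytes(buf)

def recv_message(sock) -> bytes:
    """Reassemble one complete message from the byte stream."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)
```

Every TCP-based RPC system ends up carrying some variant of this code, along with the buffering it implies.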
The streaming model is disastrous for software load balancing. Consider an application that uses a collection of threads to serve requests arriving across a collection of streams. Ideally, all of the threads would wait for incoming messages on any of the streams, with messages distributed across the threads. However, with a byte-stream model there is no guarantee that a read operation returns an entire message. If multiple threads read from the same stream, parts of a single message might be received by different threads. In principle the threads could coordinate to reassemble the entire message in one of them, but this is too expensive to be practical.
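By contrast, with a message-oriented transport the ideal pattern described above (any thread handles any incoming message) is trivial to express. A minimal sketch under that assumption, using an in-process queue of complete messages as a stand-in for the transport:

```python
import queue
import threading

incoming: queue.Queue = queue.Queue()  # pool of complete messages
results: list = []
lock = threading.Lock()

def worker() -> None:
    """Any worker may take any message: no thread owns a stream."""
    while True:
        msg = incoming.get()
        if msg is None:            # shutdown sentinel
            break
        with lock:
            results.append(msg)    # stand-in for executing an RPC

workers = [threading.Thread(target=worker) for _ in range(4)]
for w in workers:
    w.start()

for i in range(20):                # messages arriving from many peers
    incoming.put(b"rpc-%d" % i)
for _ in workers:                  # one sentinel per worker
    incoming.put(None)
for w in workers:
    w.join()
```

Because the unit handed to software is a whole message, load spreads across workers with no per-stream ownership and no reassembly coordination.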
Instead, TCP applications must use one of two inferior
forms of load balancing, in which each stream is owned by a