It’s Time to Replace TCP in the Datacenter
John Ousterhout
Stanford University
January 18, 2023
This position paper has been updated since its original publication in October of 2022 in order to correct errors and add
clarification. Updates are in italics; none of the original text has been modified. The paper has triggered discussion and dissent;
for pointers to comments on the paper, see the Homa Wiki:
https://homa-transport.atlassian.net/wiki/spaces/HOMA/overview#replaceTcp.
Abstract
In spite of its long and successful history, TCP is a poor trans-
port protocol for modern datacenters. Every significant el-
ement of TCP, from its stream orientation to its expectation
of in-order packet delivery, is wrong for the datacenter. It
is time to recognize that TCP’s problems are too fundamen-
tal and interrelated to be fixed; the only way to harness the
full performance potential of modern networks is to introduce
a new transport protocol into the datacenter. Homa demon-
strates that it is possible to create a transport protocol that
avoids all of TCP’s problems. Although Homa is not API-
compatible with TCP, it should be possible to bring it into
widespread usage by integrating it with RPC frameworks.
1 Introduction
The TCP transport protocol [9] has proven to be phenome-
nally successful and adaptable. At the time of TCP’s design
in the late 1970s, there were only about 100 hosts attached
to the existing ARPANET, and network links had speeds of
tens of kilobits/second. Over the decades since then, the In-
ternet has grown to billions of hosts and link speeds of 100
Gbit/second or more are commonplace, yet TCP continues to
serve as the workhorse transport protocol for almost all ap-
plications. It is an extraordinary engineering achievement to
have designed a mechanism that could survive such radical
changes in underlying technology.
However, datacenter computing creates unprecedented
challenges for TCP. The datacenter environment, with mil-
lions of cores in close proximity and individual applications
harnessing thousands of machines that interact on microsec-
ond timescales, could not have been envisioned by the de-
signers of TCP, and TCP does not perform well in this envi-
ronment. TCP is still the protocol of choice for most datacen-
ter applications, but it introduces overheads on many levels,
which limit application-level performance. For example, it is
well-known that TCP suffers from high tail latency for short
messages under mixed workloads [2]. TCP is a major contrib-
utor to the “datacenter tax” [3, 12], a collection of low-level
overheads that consume a significant fraction of all processor
cycles in datacenters.
This position paper argues that TCP’s challenges in the dat-
acenter are insurmountable. Section 3 discusses each of the
major design decisions in TCP and demonstrates that every
one of them is wrong for the datacenter, with significant neg-
ative consequences. Some of these problems have been dis-
cussed in the past, but it is instructive to see them all together
in one place. TCP’s problems impact systems at multiple lev-
els, including the network, kernel software, and applications.
One example is load balancing, which is essential in datacen-
ters in order to process high loads concurrently. Load bal-
ancing did not exist at the time TCP was designed, and TCP
interferes with load balancing both in the network and in soft-
ware.
Section 4 argues that TCP cannot be fixed in an evolution-
ary fashion; there are too many problems and too many in-
terlocking design decisions. Instead, we must find a way to
introduce a radically different transport protocol into the dat-
acenter. Section 5 discusses what a good transport protocol
for datacenters should look like, using Homa [19, 21] as an
example. Homa was designed in a clean-slate fashion to meet
the needs of datacenter computing, and virtually every one of
its major design decisions was made differently than for TCP.
As a result, some problems, such as congestion in the network
core fabric, are eliminated entirely. Other problems, such as
congestion control and load balancing, become much easier
to address. Homa demonstrates that it is possible to solve all
of TCP’s problems.
Complete replacement of TCP is unlikely anytime soon,
due to its deeply entrenched status, but TCP can be displaced
for many applications by integrating Homa into a small num-
ber of existing RPC frameworks such as gRPC [6]. With
this approach, Homa’s incompatible API will be visible only
to framework developers and applications should be able to
switch to Homa relatively easily.
2 Requirements
Before discussing the problems with TCP, let us first review
the challenges that must be addressed by any transport proto-
col for datacenters.
Reliable delivery. The protocol must deliver data reliably
from one host to another, in spite of transient failures in the
network.
arXiv:2210.00714v2 [cs.NI] 19 Jan 2023
Low latency. Modern networking hardware enables round-
trip times of a few microseconds for short messages. The
transport protocol must not add significantly to this latency,
so that applications experience latencies close to the hardware
limit. The transport protocol must also support low latency at
the tail, even under relatively high network loads with a mix
of traffic. Tail latency is particularly challenging for trans-
port protocols; nonetheless, it should be possible to achieve
tail latencies for short messages within a factor of 2–3x of the
best-case latency [19].
High throughput. The transport protocol must support high
throughput in two different ways. Traditionally, the term
“throughput” has referred to data throughput: delivering large
amounts of data in a single message or stream. This kind of
throughput is still important. In addition, datacenter appli-
cations require high message throughput: the ability to send
large numbers of small messages quickly for communication
patterns such as broadcast and shuffle [15]. Message through-
put has historically not received much attention, but it is es-
sential in datacenters.
In order to meet the above requirements, the transport pro-
tocol must also deal with the following problems:
Congestion control. In order to provide low latency, the
transport protocol must limit the buildup of packets in net-
work queues. Packet queuing can potentially occur both at
the edge (the links connecting hosts to top-of-rack switches)
and in the network core; each of these forms of congestion
creates distinct problems.
Efficient load balancing across server cores. For more than
a decade, network speeds have been increasing rapidly while
processor clock rates have remained nearly constant. Thus it
is no longer possible for a single core to keep up with a single
network link; both incoming and outgoing load must be dis-
tributed across multiple cores. This is true at multiple levels.
At the application level, high-throughput services must run
on many cores and divide their work among the cores. At the
transport layer, a single core cannot keep up with a high speed
link, especially with short messages. Load balancing impacts
transport protocols in two ways. First, it can introduce over-
heads (e.g. the use of multiple cores causes additional cache
misses for coherence). Second, load balancing can lead to hot
spots, where load is unevenly distributed across cores; this is a
form of congestion at the software level. Load balancing over-
heads are now one of the primary sources of tail latency [21],
and they are impacted by the design of the transport protocol.
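The hot-spot effect described above can be illustrated with a toy simulation (the numbers are hypothetical, not measurements from the paper): when each stream is statically owned by one core, a skewed workload overloads that core, while per-message dispatch spreads the same load evenly.

```python
from collections import Counter
from itertools import cycle

CORES = 4
# Skewed workload: stream 0 carries most of the messages.
messages = [0] * 800 + [s for s in range(1, 8) for _ in range(40)]
ideal = len(messages) / CORES  # perfectly even load per core

# Strategy 1: static per-stream ownership (each stream pinned to one core).
static = Counter(stream % CORES for stream in messages)

# Strategy 2: per-message dispatch (any core may handle any message).
rr = cycle(range(CORES))
dynamic = Counter(next(rr) for _ in messages)

print(max(static.values()) / ideal)   # well above 1: a hot spot
print(max(dynamic.values()) / ideal)  # close to 1: evenly balanced
```

Per-message dispatch is only possible if the transport exposes whole messages; with a byte stream, a stream must stay pinned to a thread, which is exactly what creates the hot spot.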
NIC offload. There is increasing evidence that software-
based transport protocols no longer make sense; they simply
cannot provide high performance at an acceptable cost. For
example:
• The best software protocol implementations have end-
to-end latency more than 3x as high as implementations
where applications communicate directly with the NIC
via kernel bypass.
• Software implementations give up a factor of 5–10x
in small message throughput, compared with NIC-
offloaded implementations.
• Driving a 100 Gbps network at 80% utilization in both
directions consumes 10–20 cores just in the networking
stack [16, 21]. This is not a cost-effective use of re-
sources.
Thus, in the future, transport protocols will need to move
to special-purpose NIC hardware. The transport protocol
must not have features that preclude hardware implementa-
tion. Note that NIC-based transports will not eliminate soft-
ware load balancing as an issue: even if the transport is in
hardware, application software will still be spread across mul-
tiple cores.
3 Everything about TCP is wrong
This section discusses five key properties of TCP, which cover
almost all of its design:
• Stream orientation
• Connection orientation
• Bandwidth sharing (“fair” scheduling)
• Sender-driven congestion control
• In-order packet delivery
Each of these properties represents the wrong decision for a
datacenter transport, and each of these decisions has serious
negative consequences.
3.1 Stream orientation
The data model for TCP is a stream of bytes. However, this is
not the right data model for most datacenter applications. Dat-
acenter applications typically exchange discrete messages to
implement remote procedure calls. When messages are serial-
ized in a TCP stream, TCP has no knowledge about message
boundaries. This means that when an application reads from
a stream, there is no guarantee that it will receive a complete
message; it could receive less than a full message, or parts of
several messages. TCP-based applications must mark mes-
sage boundaries when they serialize messages (e.g., by pre-
fixing each message with its length), and they must use this
information to reassemble messages on receipt. This intro-
duces extra complexity and overheads, such as maintaining
state for partially-received messages.
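The length-prefix framing described above can be sketched in a few lines. This is an illustrative sketch, not code from the paper; `send_msg`, `recv_exact`, and `recv_msg` are hypothetical helper names, and a real implementation would also need to bound message sizes and handle errors.

```python
import socket
import struct

def send_msg(sock, payload: bytes) -> None:
    """Prefix each message with a 4-byte big-endian length."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_exact(sock, n: int) -> bytes:
    """Loop until exactly n bytes arrive; one recv() may return less."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("stream closed mid-message")
        buf += chunk
    return buf

def recv_msg(sock) -> bytes:
    """Read one length prefix, then exactly that many payload bytes."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)

# Two messages written back-to-back arrive as one undifferentiated byte
# sequence; only the length prefixes let the receiver split them apart.
a, b = socket.socketpair()
send_msg(a, b"hello")
send_msg(a, b"world!")
print(recv_msg(b), recv_msg(b))  # b'hello' b'world!'
```

The reassembly loop in `recv_exact` is the per-message state that every TCP-based application must maintain itself, and that a message-oriented transport would provide for free.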
The streaming model is disastrous for software load balanc-
ing. Consider an application that uses a collection of threads
to serve requests arriving across a collection of streams. Ide-
ally, all of the threads would wait for incoming messages
on any of the streams, with messages distributed across the
threads. However, with a byte stream model there is no guar-
antee that a read operation returns an entire message. If mul-
tiple threads read from the same stream, it is possible that
parts of a single message might be received by different threads. In
principle it might be possible for the threads to coordinate and
reassemble the entire message in one of the threads, but this
is too expensive to be practical.
Instead, TCP applications must use one of two inferior
forms of load balancing, in which each stream is owned by a