Kadabra Adapting Kademlia for the Decentralized Web Yunqi Zhang and Shaileshh Bojja Venkatakrishnan

2025-05-06 0 0 5.59MB 27 页 10玖币
侵权投诉
Kadabra: Adapting Kademlia for the
Decentralized Web
Yunqi Zhang and Shaileshh Bojja Venkatakrishnan
The Ohio State University
{zhang.8678, bojjavenkatakrishnan.2}@osu.edu
Abstract. Blockchains have become the catalyst for a growing move-
ment to create a more decentralized Internet. A fundamental operation
of applications in a decentralized Internet is data storage and retrieval.
As today’s blockchains are limited in their storage functionalities, in re-
cent years a number of peer-to-peer data storage networks have emerged
based on the Kademlia distributed hash table protocol. However, existing
Kademlia implementations are not efficient enough to support fast data
storage and retrieval operations necessary for (decentralized) Web appli-
cations. In this paper, we present Kadabra, a decentralized protocol for
computing the routing table entries in Kademlia to accelerate lookups.
Kadabra is motivated by the multi-armed bandit problem, and can auto-
matically adapt to heterogeneity and dynamism in the network. Exper-
imental results show Kadabra achieving between 15–50% lower lookup
latencies compared to state-of-the-art baselines.
Keywords: Multi-armed bandit ·Decentralized protocol ·Kademlia
p2p routing.
1 Introduction
Decentralized peer-to-peer applications (dapps) fueled by successes in blockchain
technology are rapidly emerging as secure, transparent and open alternatives to
conventional centralized applications. Today dapps have been developed for a
wide gamut of application areas spanning payments, decentralized finance, so-
cial networking, healthcare, gaming etc., and have millions of users and generate
billions on dollars in trade [5]. These developments are part of a growing move-
ment to create a more “decentralized Web”, in which no single administrative
entity (e.g., a corporation or government) has complete control over important
web functionalities (e.g., name resolution, content hosting, etc.) thereby provid-
ing greater power to application end users [46,11].
A fundamental operation in dapps is secure, reliable data storage and re-
trieval. Over the past two decades, the cloud (e.g., Google, Facebook, Amazon)
together with content delivery networks (CDNs; e.g., Akamai, CloudFlare) have
been largely responsible for storing and serving data for Internet applications.
Infrastructure in the cloud or a CDN is typically owned by a single provider,
making these storage methods unsuitable for dapps. Instead dapps—especially
arXiv:2210.12858v2 [cs.NI] 14 Feb 2023
2 Yunqi Zhang and Shaileshh Bojja Venkatakrishnan
those built over a blockchain (e.g., ERC 721 tokens in Ethereum)—directly re-
sort to using the blockchain for storing application data. However, mainstream
blockchains are notorious for their poor scalability which limits the range of ap-
plications that can be deployed on them. In particular, realizing a decentralized
Web that supports sub-second HTTP lookups at scale is infeasible with today’s
blockchain technology.
To fill this void, a number of recent efforts have designed decentralized peer-
to-peer (p2p) data storage networks—such as IPFS [15], Swarm [45,47,7], Hy-
percore protocol [3], Safe network [6] and Storj [43]—which are seeing rapid
mainstream adoption. E.g., the IPFS network has more than 3 million client re-
quests per week with hundreds of thousands of storage nodes worldwide as part
of the network [46]. In these networks, each unique piece of data is stored over
a vast network of servers (nodes) with each server responsible for storing only a
small portion of the overall stored data unlike blockchains. The networks are also
characterized by their permissionless and open nature, wherein any individual
server may join and participate in the network freely. By providing appropriate
monetary incentives (e.g., persistent storage in IPFS can be incentivized using
Filecoin [8,2]) for storing and serving data, the networks encourage new servers
to join which in turn increases the net storage capacities of these systems.
A key challenge in the p2p storage networks outlined above is how to effi-
ciently locate where a desired piece of data is stored in the network. Unlike cloud
storage, there is no central database that maintains information on the set of
files hosted by each server at any moment. Instead, p2p storage networks rely
on a distributed hash table (DHT) protocol for storage and retrieval by content
addressing data. While tens of DHT constructions have been proposed in the
past, in recent years the Kademlia DHT [33] has emerged as the de facto proto-
col and has been widely adopted by practitioners. For instance, IPFS, Swarm,
Hypercore protocol, Safe network and Storj are all based on Kademlia. To push
or pull a data block from the network, the hash of the data block (i.e., its con-
tent address) is used to either recursively or iteratively route a query through
the DHT nodes until a node responsible for storing the data block is found.
For latency-sensitive content lookup applications, such as the Web where a
delay of even a few milliseconds in downloading webpage objects can lead to
users abandoning the website [9], it is imperative that the latency of routing a
query through Kademlia is as low as possible. Each Kademlia node maintains
a routing table, which contains IP address references to other Kademlia nodes
in the network. The sequence of nodes queried while performing a lookup is
dictated by the choice of routing tables at the nodes. Today’s Kademlia imple-
mentations choose the routing tables completely agnostic of where the nodes are
located in the network. As a result, a query in Kademlia may take a route that
criss-crosses continents before arriving at a target node costing significant delay.
Moreover, the open and permissionless aspects makes the network inherently
heterogeneous: nodes can differ considerably in their compute, memory and net-
work capabilities which creates differences in how fast nodes respond to queries;
data blocks published over the network vary in their popularity, with demand
Kadabra: Adapting Kademlia for the Decentralized Web 3
for some data far exceeding others; the network is also highly dynamic due to
peer churn and potentially evolving user demand for data (e.g., a news webpage
that is popular today may not be popular tomorrow). Designing routing tables
in Kademlia that are tuned to the various heterogeneities and dynamism in the
network to minimize content lookup delays is therefore a highly nontrivial task.
Prior works have extensively investigated how to design location-aware rout-
ing tables in Kademlia. For example, the proximity neighbor selection (PNS) [17]
advocates choosing routing table peers that are geographically close to a node
(more precisely, peers having a low round-trip-time (RTT) ping delay to the
node), and proximity routing (PR) [17] favors relaying a query to a matching
peer with the lowest RTT in the routing table. While these location-aware vari-
ants have been shown to exhibit latency performance strictly superior to the
original Kademlia protocol [33], they are not adaptive to the heterogeneities in
the network. PNS is also prone to Sybil attacks which diminishes its practical
utility [36]—an adversary controlling a large number of fake Kademlia nodes at
a location can cause a nearby node’s routing table to be completely filled with
adversarial IP addresses. Real world Kademlia implementations in libp2p [4],
IPFS and other file sharing networks therefore have resorted to maintaining the
peer routing tables largely per the original Kademlia protocol. S/Kademlia [14]
is a particularly popular implementation which uses public-key cryptography for
authentication and proof-of-work puzzles to avoid Sybil attacks.
We propose Kadabra, a decentralized, adaptive algorithm for selecting rout-
ing table entries in Kademlia to minimize object lookup times (to push or get
content) while being robust against Sybil attacks. Kadabra is motivated by the
(combinatorial) multi-armed bandit (MAB) problem [40,16], with each Kadem-
lia node acting as an independent MAB player and the node’s routing table
configurations being the arms of the bandit problem. By balancing exploring
new routing table configurations with exploiting known configurations that have
resulted in fast lookup speeds in the past, a node is able to adaptively discover
an efficient routing table that provides fast lookups. Importantly, the discovered
routing table configuration at a node is optimized precisely to the pattern of
lookups specific to the node. Our proposed algorithm is fully decentralized, rely-
ing only on local timestamp measurements for feedback at each node (time be-
tween when a query was sent and its corresponding response received) and does
not require any cooperation between nodes. To protect against Sybil attacks,
Kadabra relies on a novel exploration strategy that explicitly avoids including
nodes that have a low RTT to a node within the node’s routing table with the
RTT threshold specified as a security parameter. At the same time, Kadabra’s
exploration strategy also avoids selecting nodes very far from a node. To accel-
erate discovery of an efficient routing table configuration, Kadabra decomposes
the problem into parallel independent MAB instances at each node, with each
instance responsible for optimizing peer entries of a single k-bucket. In summary,
the contributions of this paper are:
1. We consider the problem of efficient routing table design in Kademlia and
formulate it as an instance of the multi-armed bandit problem. Using data-
4 Yunqi Zhang and Shaileshh Bojja Venkatakrishnan
Node ID 1101
k
-bucket 1
0001
0110
0011
k
-bucket 2
1001
1011
1010
k
-bucket 3
1110
1111
k
-bucket 4
1100
(a)
Node 1101 Routing Table
k
-bucket 1
0001,
0110
, 0011
k
-bucket 2
1001, 1011, 1010
k
-bucket 3
1110, 1111
k
-bucket 4
1100
Node 0110 Routing Table
k
-bucket 1
1111, 1011, 1100
k
-bucket 2
0001, 0011, 0010
k
-bucket 3
0101
, 0100
k
-bucket 4
0111
Node 0101 Routing Table
-bucket 1
1101, 1011, 1001
-bucket 2
0000, 0010, 0001
-bucket 3
0110, 0111
-bucket 4
0100
1101 0110 0101
FINDNODE(0101)
(b)
Fig. 1: (a) Example of k-buckets at a node in a network with 4-bit node IDs.
(b) Example of a recursive routing path taken to lookup key 0101. A yellow-
highlighted node ID is a peer to which the query is forwarded for the next hop.
driven techniques for optimizing lookup speeds in structured p2p networks
has not been proposed before, to our best knowledge.
2. We propose Kadabra, a fully decentralized and non-cooperative algorithm for
learning the routing table entries to accelerate lookups. Kadabra is adaptive
to both the traffic demand patterns of the users and the heterogeneities in
the network.
3. We validate Kadabra through simulations under various network and traf-
fic settings. In each case, we observe Kadabra to consistently outperform
baselines by between 15–50% in latency.
2 Background
2.1 Kademlia
Overview. Kademlia is arguably the most popular protocol for realizing a struc-
tured p2p system on the Internet today. In a Kademlia network, each node is
assigned a unique binary node ID from a high-dimensional space (e.g., 20 byte
node IDs are common). When the network size is large, it is difficult for a node
to know the node ID of every single node in the network. A node may have
knowledge of node IDs of only a small number (such as logarithmic in network
size) of other nodes. The most basic operation supported by Kademlia is key-
based routing (KBR) wherein given a key from the node ID space as input to
a node, the protocol determines a routing path from the node to a target node
whose ID is the ‘closest’ to the input key. Closeness between a key and a node
ID in Kademlia is measured by taking the bitwise-XOR between the two binary
strings, and converting the resultant string as a base-10 integer. The basic KBR
primitive can be used to realize higher-order functions such as a distributed hash
table (DHT). In a DHT, a (key, value) store is distributed across nodes in the
network. A (key, value) pair is stored at a node whose node ID is the closest to
the key according to the XOR distance metric. To protect against node failures,
in practice a copy of the (key, value) is also stored at a small number (e.g., 20)
of sibling nodes that are nodes whose IDs are closest to the initial storing node.
Kadabra: Adapting Kademlia for the Decentralized Web 5
To store a (key, value) in the network a node invokes a Store(key, value) re-
mote procedure call (RPC), and to fetch a value corresponding to a key the node
calls a FindValue(key) RPC [33,14]. KBR is implemented as a FindNode(key)
RPC, which returns the Kademlia node having the closest ID to key.
Routing. Each Kademlia node maintains a routing table containing node ID,
IP address and port information of other peers using which Store,FindValue
or FindNode queries are routed to their appropriate target nodes. For node IDs
that are nbits long, the routing table at each node comprises of n k-buckets,
where each k-bucket contains information about kpeers. The IDs of peers in the
i-th k-bucket of a node’s routing table share the first i1 bits with the node’s
ID, while differing in the i-th bit (Fig. 1a). For a network with mnodes, it can
be shown that on average only the first log(m)k-buckets can be filled with peer
entries while the remaining k-buckets are empty due to lack of peers satisfying
the prefix constraints.
Queries in Kademlia are routed either recursively or iteratively across nodes.
In a recursive lookup, a query is relayed sequentially from one node to the next
until a target node is found. The response from the target node is then relayed
back on the reverse path to the query initiator. In an iterative lookup, a query
initiating node itself sequentially contacts nodes until a target node is found,
and receives a response directly from the target node. We focus primarily on
recursive routing in this work (Fig. 1b).
When a query for key xis received at a node v, the node searches for a peer
v0within its routing table with an ID that is closest to x. If the distance between
the IDs xand v0is less than the distance between xand v, then vforwards the
query to node v0. Later when vreceives a response to the query from v0,vrelays
the response back to the node from whom it received the query. If the distance
between xand vis less than than distance between xand v0, the node vissues an
appropriate response for the query to the node from whom vreceived the query.
To avoid lookup failures, a query initiator issues its query along α(e.g., α= 3)
independent paths. This basic lookup process described above is fundamental to
implementing the Store,FindValue and FindNode functions. We point the
reader to prior papers [33,14] for more details on the lookup process.
2.2 Lookup Latency and Node Geography
A Kademlia node may include any peer it has knowledge of within its k-buckets,
provided the peer satisfies the required ID prefix conditions for the k-bucket.
Nodes get to know of new peers over the course of receiving queries and responses
from other nodes in the network. As node IDs are assigned to nodes typically
in a way that is completely independent of where the nodes are located in the
world, in today’s Kademlia it is likely that the peers within a k-bucket belong to
diverse geographical regions around the world without any useful structure. E.g.,
a recent study [46] measuring performance on the IPFS network reports a 90-
th percentile content storing latency of 112s with 88% of it attributed to DHT
routing latency. For retrieving content, the reported 90-th percentile delay is
摘要:

Kadabra:AdaptingKademliafortheDecentralizedWebYunqiZhangandShaileshhBojjaVenkatakrishnanTheOhioStateUniversityfzhang.8678,bojjavenkatakrishnan.2g@osu.eduAbstract.Blockchainshavebecomethecatalystforagrowingmove-menttocreateamoredecentralizedInternet.Afundamentaloperationofapplicationsinadecentralized...

展开>> 收起<<
Kadabra Adapting Kademlia for the Decentralized Web Yunqi Zhang and Shaileshh Bojja Venkatakrishnan.pdf

共27页,预览5页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!

相关推荐

分类:图书资源 价格:10玖币 属性:27 页 大小:5.59MB 格式:PDF 时间:2025-05-06

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 27
客服
关注