Kadabra Adapting Kademlia for the Decentralized Web Yunqi Zhang and Shaileshh Bojja Venkatakrishnan

2025-05-06 0 0 5.59MB 27 页 10玖币

侵权投诉

Kadabra: Adapting Kademlia for the

Decentralized Web

Yunqi Zhang and Shaileshh Bojja Venkatakrishnan

The Ohio State University

{zhang.8678, bojjavenkatakrishnan.2}@osu.edu

Abstract. Blockchains have become the catalyst for a growing move-

ment to create a more decentralized Internet. A fundamental operation

of applications in a decentralized Internet is data storage and retrieval.

As today’s blockchains are limited in their storage functionalities, in re-

cent years a number of peer-to-peer data storage networks have emerged

based on the Kademlia distributed hash table protocol. However, existing

Kademlia implementations are not eﬃcient enough to support fast data

storage and retrieval operations necessary for (decentralized) Web appli-

cations. In this paper, we present Kadabra, a decentralized protocol for

computing the routing table entries in Kademlia to accelerate lookups.

Kadabra is motivated by the multi-armed bandit problem, and can auto-

matically adapt to heterogeneity and dynamism in the network. Exper-

imental results show Kadabra achieving between 15–50% lower lookup

latencies compared to state-of-the-art baselines.

Keywords: Multi-armed bandit ·Decentralized protocol ·Kademlia

p2p routing.

1 Introduction

Decentralized peer-to-peer applications (dapps) fueled by successes in blockchain

technology are rapidly emerging as secure, transparent and open alternatives to

conventional centralized applications. Today dapps have been developed for a

wide gamut of application areas spanning payments, decentralized ﬁnance, so-

cial networking, healthcare, gaming etc., and have millions of users and generate

billions on dollars in trade [5]. These developments are part of a growing move-

ment to create a more “decentralized Web”, in which no single administrative

entity (e.g., a corporation or government) has complete control over important

web functionalities (e.g., name resolution, content hosting, etc.) thereby provid-

ing greater power to application end users [46,11].

A fundamental operation in dapps is secure, reliable data storage and re-

trieval. Over the past two decades, the cloud (e.g., Google, Facebook, Amazon)

together with content delivery networks (CDNs; e.g., Akamai, CloudFlare) have

been largely responsible for storing and serving data for Internet applications.

Infrastructure in the cloud or a CDN is typically owned by a single provider,

making these storage methods unsuitable for dapps. Instead dapps—especially

arXiv:2210.12858v2 [cs.NI] 14 Feb 2023

2 Yunqi Zhang and Shaileshh Bojja Venkatakrishnan

those built over a blockchain (e.g., ERC 721 tokens in Ethereum)—directly re-

sort to using the blockchain for storing application data. However, mainstream

blockchains are notorious for their poor scalability which limits the range of ap-

plications that can be deployed on them. In particular, realizing a decentralized

Web that supports sub-second HTTP lookups at scale is infeasible with today’s

blockchain technology.

To ﬁll this void, a number of recent eﬀorts have designed decentralized peer-

to-peer (p2p) data storage networks—such as IPFS [15], Swarm [45,47,7], Hy-

percore protocol [3], Safe network [6] and Storj [43]—which are seeing rapid

mainstream adoption. E.g., the IPFS network has more than 3 million client re-

quests per week with hundreds of thousands of storage nodes worldwide as part

of the network [46]. In these networks, each unique piece of data is stored over

a vast network of servers (nodes) with each server responsible for storing only a

small portion of the overall stored data unlike blockchains. The networks are also

characterized by their permissionless and open nature, wherein any individual

server may join and participate in the network freely. By providing appropriate

monetary incentives (e.g., persistent storage in IPFS can be incentivized using

Filecoin [8,2]) for storing and serving data, the networks encourage new servers

to join which in turn increases the net storage capacities of these systems.

A key challenge in the p2p storage networks outlined above is how to eﬃ-

ciently locate where a desired piece of data is stored in the network. Unlike cloud

storage, there is no central database that maintains information on the set of

ﬁles hosted by each server at any moment. Instead, p2p storage networks rely

on a distributed hash table (DHT) protocol for storage and retrieval by content

addressing data. While tens of DHT constructions have been proposed in the

past, in recent years the Kademlia DHT [33] has emerged as the de facto proto-

col and has been widely adopted by practitioners. For instance, IPFS, Swarm,

Hypercore protocol, Safe network and Storj are all based on Kademlia. To push

or pull a data block from the network, the hash of the data block (i.e., its con-

tent address) is used to either recursively or iteratively route a query through

the DHT nodes until a node responsible for storing the data block is found.

For latency-sensitive content lookup applications, such as the Web where a

delay of even a few milliseconds in downloading webpage objects can lead to

users abandoning the website [9], it is imperative that the latency of routing a

query through Kademlia is as low as possible. Each Kademlia node maintains

a routing table, which contains IP address references to other Kademlia nodes

in the network. The sequence of nodes queried while performing a lookup is

dictated by the choice of routing tables at the nodes. Today’s Kademlia imple-

mentations choose the routing tables completely agnostic of where the nodes are

located in the network. As a result, a query in Kademlia may take a route that

criss-crosses continents before arriving at a target node costing signiﬁcant delay.

Moreover, the open and permissionless aspects makes the network inherently

heterogeneous: nodes can diﬀer considerably in their compute, memory and net-

work capabilities which creates diﬀerences in how fast nodes respond to queries;

data blocks published over the network vary in their popularity, with demand

Kadabra: Adapting Kademlia for the Decentralized Web 3

for some data far exceeding others; the network is also highly dynamic due to

peer churn and potentially evolving user demand for data (e.g., a news webpage

that is popular today may not be popular tomorrow). Designing routing tables

in Kademlia that are tuned to the various heterogeneities and dynamism in the

network to minimize content lookup delays is therefore a highly nontrivial task.

Prior works have extensively investigated how to design location-aware rout-

ing tables in Kademlia. For example, the proximity neighbor selection (PNS) [17]

advocates choosing routing table peers that are geographically close to a node

(more precisely, peers having a low round-trip-time (RTT) ping delay to the

node), and proximity routing (PR) [17] favors relaying a query to a matching

peer with the lowest RTT in the routing table. While these location-aware vari-

ants have been shown to exhibit latency performance strictly superior to the

original Kademlia protocol [33], they are not adaptive to the heterogeneities in

the network. PNS is also prone to Sybil attacks which diminishes its practical

utility [36]—an adversary controlling a large number of fake Kademlia nodes at

a location can cause a nearby node’s routing table to be completely ﬁlled with

adversarial IP addresses. Real world Kademlia implementations in libp2p [4],

IPFS and other ﬁle sharing networks therefore have resorted to maintaining the

peer routing tables largely per the original Kademlia protocol. S/Kademlia [14]

is a particularly popular implementation which uses public-key cryptography for

authentication and proof-of-work puzzles to avoid Sybil attacks.

We propose Kadabra, a decentralized, adaptive algorithm for selecting rout-

ing table entries in Kademlia to minimize object lookup times (to push or get

content) while being robust against Sybil attacks. Kadabra is motivated by the

(combinatorial) multi-armed bandit (MAB) problem [40,16], with each Kadem-

lia node acting as an independent MAB player and the node’s routing table

conﬁgurations being the arms of the bandit problem. By balancing exploring

new routing table conﬁgurations with exploiting known conﬁgurations that have

resulted in fast lookup speeds in the past, a node is able to adaptively discover

an eﬃcient routing table that provides fast lookups. Importantly, the discovered

routing table conﬁguration at a node is optimized precisely to the pattern of

lookups speciﬁc to the node. Our proposed algorithm is fully decentralized, rely-

ing only on local timestamp measurements for feedback at each node (time be-

tween when a query was sent and its corresponding response received) and does

not require any cooperation between nodes. To protect against Sybil attacks,

Kadabra relies on a novel exploration strategy that explicitly avoids including

nodes that have a low RTT to a node within the node’s routing table with the

RTT threshold speciﬁed as a security parameter. At the same time, Kadabra’s

exploration strategy also avoids selecting nodes very far from a node. To accel-

erate discovery of an eﬃcient routing table conﬁguration, Kadabra decomposes

the problem into parallel independent MAB instances at each node, with each

instance responsible for optimizing peer entries of a single k-bucket. In summary,

the contributions of this paper are:

1. We consider the problem of eﬃcient routing table design in Kademlia and

formulate it as an instance of the multi-armed bandit problem. Using data-

4 Yunqi Zhang and Shaileshh Bojja Venkatakrishnan

Node ID 1101

-bucket 1

0001

0110

0011

-bucket 2

1001

1011

1010

-bucket 3

1110

1111

-bucket 4

1100

(a)

Node 1101 Routing Table

-bucket 1

0001,

0110

, 0011

-bucket 2

1001, 1011, 1010

-bucket 3

1110, 1111

-bucket 4

1100

Node 0110 Routing Table

-bucket 1

1111, 1011, 1100

-bucket 2

0001, 0011, 0010

-bucket 3

0101

, 0100

-bucket 4

0111

Node 0101 Routing Table

-bucket 1

1101, 1011, 1001

-bucket 2

0000, 0010, 0001

-bucket 3

0110, 0111

-bucket 4

0100

1101 0110 0101

FINDNODE(0101)

(b)

Fig. 1: (a) Example of k-buckets at a node in a network with 4-bit node IDs.

(b) Example of a recursive routing path taken to lookup key 0101. A yellow-

highlighted node ID is a peer to which the query is forwarded for the next hop.

driven techniques for optimizing lookup speeds in structured p2p networks

has not been proposed before, to our best knowledge.

2. We propose Kadabra, a fully decentralized and non-cooperative algorithm for

learning the routing table entries to accelerate lookups. Kadabra is adaptive

to both the traﬃc demand patterns of the users and the heterogeneities in

the network.

3. We validate Kadabra through simulations under various network and traf-

ﬁc settings. In each case, we observe Kadabra to consistently outperform

baselines by between 15–50% in latency.

2 Background

2.1 Kademlia

Overview. Kademlia is arguably the most popular protocol for realizing a struc-

tured p2p system on the Internet today. In a Kademlia network, each node is

assigned a unique binary node ID from a high-dimensional space (e.g., 20 byte

node IDs are common). When the network size is large, it is diﬃcult for a node

to know the node ID of every single node in the network. A node may have

knowledge of node IDs of only a small number (such as logarithmic in network

size) of other nodes. The most basic operation supported by Kademlia is key-

based routing (KBR) wherein given a key from the node ID space as input to

a node, the protocol determines a routing path from the node to a target node

whose ID is the ‘closest’ to the input key. Closeness between a key and a node

ID in Kademlia is measured by taking the bitwise-XOR between the two binary

strings, and converting the resultant string as a base-10 integer. The basic KBR

primitive can be used to realize higher-order functions such as a distributed hash

table (DHT). In a DHT, a (key, value) store is distributed across nodes in the

network. A (key, value) pair is stored at a node whose node ID is the closest to

the key according to the XOR distance metric. To protect against node failures,

in practice a copy of the (key, value) is also stored at a small number (e.g., 20)

of sibling nodes that are nodes whose IDs are closest to the initial storing node.

Kadabra: Adapting Kademlia for the Decentralized Web 5

To store a (key, value) in the network a node invokes a Store(key, value) re-

mote procedure call (RPC), and to fetch a value corresponding to a key the node

calls a FindValue(key) RPC [33,14]. KBR is implemented as a FindNode(key)

RPC, which returns the Kademlia node having the closest ID to key.

Routing. Each Kademlia node maintains a routing table containing node ID,

IP address and port information of other peers using which Store,FindValue

or FindNode queries are routed to their appropriate target nodes. For node IDs

that are nbits long, the routing table at each node comprises of n k-buckets,

where each k-bucket contains information about kpeers. The IDs of peers in the

i-th k-bucket of a node’s routing table share the ﬁrst i−1 bits with the node’s

ID, while diﬀering in the i-th bit (Fig. 1a). For a network with mnodes, it can

be shown that on average only the ﬁrst log(m)k-buckets can be ﬁlled with peer

entries while the remaining k-buckets are empty due to lack of peers satisfying

the preﬁx constraints.

Queries in Kademlia are routed either recursively or iteratively across nodes.

In a recursive lookup, a query is relayed sequentially from one node to the next

until a target node is found. The response from the target node is then relayed

back on the reverse path to the query initiator. In an iterative lookup, a query

initiating node itself sequentially contacts nodes until a target node is found,

and receives a response directly from the target node. We focus primarily on

recursive routing in this work (Fig. 1b).

When a query for key xis received at a node v, the node searches for a peer

v0within its routing table with an ID that is closest to x. If the distance between

the IDs xand v0is less than the distance between xand v, then vforwards the

query to node v0. Later when vreceives a response to the query from v0,vrelays

the response back to the node from whom it received the query. If the distance

between xand vis less than than distance between xand v0, the node vissues an

appropriate response for the query to the node from whom vreceived the query.

To avoid lookup failures, a query initiator issues its query along α(e.g., α= 3)

independent paths. This basic lookup process described above is fundamental to

implementing the Store,FindValue and FindNode functions. We point the

reader to prior papers [33,14] for more details on the lookup process.

2.2 Lookup Latency and Node Geography

A Kademlia node may include any peer it has knowledge of within its k-buckets,

provided the peer satisﬁes the required ID preﬁx conditions for the k-bucket.

Nodes get to know of new peers over the course of receiving queries and responses

from other nodes in the network. As node IDs are assigned to nodes typically

in a way that is completely independent of where the nodes are located in the

world, in today’s Kademlia it is likely that the peers within a k-bucket belong to

diverse geographical regions around the world without any useful structure. E.g.,

a recent study [46] measuring performance on the IPFS network reports a 90-

th percentile content storing latency of 112s with 88% of it attributed to DHT

routing latency. For retrieving content, the reported 90-th percentile delay is

文档加载中……请稍候！
如果长时间未打开，您也可以点击刷新试试。

下载文档到电脑，查找使用更方便

10 玖币 0人已下载

立即下载

摘要：

Kadabra:AdaptingKademliafortheDecentralizedWebYunqiZhangandShaileshhBojjaVenkatakrishnanTheOhioStateUniversityfzhang.8678,bojjavenkatakrishnan.2g@osu.eduAbstract.Blockchainshavebecomethecatalystforagrowingmove-menttocreateamoredecentralizedInternet.Afundamentaloperationofapplicationsinadecentralized...

展开>> 收起<<

Kadabra Adapting Kademlia for the Decentralized Web Yunqi Zhang and Shaileshh Bojja Venkatakrishnan.pdf

共27页,预览5页

还剩页未读，继续阅读

声明：本站为文档C2C交易模式，即用户上传的文档直接被用户下载，本站只是中间服务平台，本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私，请立即通知玖贝云文库，我们立即给予删除！

Kadabra Adapting Kademlia for the Decentralized Web Yunqi Zhang and Shaileshh Bojja Venkatakrishnan

相关推荐

开通VIP享超值会员特权

作者详情

相关内容

热门标签

举报选择: