2 Background
2.1 Named Data Networking
Today, networking at layer 3 of the OSI model [13] relies on the Internet Protocol (IP), which implements the
host-based networking (TCP/IP) paradigm: data is transferred from one host to another using a
location-based address (the IP address). In contrast, Named Data Networking (NDN) [32] is one of the
most developed and widely implemented designs following the Information-Centric Networking (ICN) paradigm.
ICN removes the location requirement that was introduced with the TCP/IP paradigm and shifts the focus
from where data resides to the data itself.
Transferring data via NDN from one host to another is accomplished via two types of packets: Interest
packets and Data packets. Retrieving data is as simple as expressing (sending) an Interest packet and receiving
a matching Data packet. The applications decide which name(s) they are going to request. Once an Interest
is expressed, NDN utilizes name-based forwarding to forward the packet toward the data source. Data is
also signed and can be verified by the recipient; therefore, it can come from anywhere: a publisher, a proxy,
or an in-network cache.
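As a minimal sketch of the Interest/Data exchange described above, the Python below models a consumer that requests a name and accepts a matching Data packet from any source, provided its signature verifies. The `Data` class, the `express_interest` helper, the `/genomics/sample/1` name, and the shared HMAC key are assumptions made for illustration. NDN deployments typically rely on public-key signatures, and this is neither the NDN packet format nor any library's API.

```python
import hashlib
import hmac
from dataclasses import dataclass

# Hypothetical shared key, used only to keep the sketch self-contained.
KEY = b"demo-key"


@dataclass(frozen=True)
class Data:
    """Toy Data packet: a name, a payload, and a signature over both."""
    name: str
    content: bytes
    signature: bytes


def sign(name: str, content: bytes, key: bytes = KEY) -> bytes:
    """Bind the content to its name with an HMAC-SHA256 tag."""
    return hmac.new(key, name.encode() + b"\0" + content, hashlib.sha256).digest()


def make_data(name: str, content: bytes) -> Data:
    """Producer side: create a signed Data packet."""
    return Data(name, content, sign(name, content))


def verify(data: Data, key: bytes = KEY) -> bool:
    """Consumer side: check that name and content match the signature."""
    return hmac.compare_digest(data.signature, sign(data.name, data.content, key))


def express_interest(name: str, sources):
    """Return the first verified Data packet named `name`, or None.

    `sources` is an ordered list of name -> Data mappings standing in for
    a cache, a proxy, or the publisher itself.
    """
    for source in sources:
        data = source.get(name)
        if data is not None and verify(data):
            return data
    return None


NAME = "/genomics/sample/1"
producer = {NAME: make_data(NAME, b"ACGT")}
cache = dict(producer)  # an in-network cache holding a copy of the packet
# A source that altered the payload but reused the original signature.
tampered = {NAME: Data(NAME, b"TTTT", producer[NAME].signature)}
```

Calling `express_interest(NAME, [tampered, cache])` skips the altered copy and returns the cached packet, which illustrates why the consumer does not need to trust the node that delivered the data.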
NDN offers consumers and producers several benefits, such as continued data availability after
server failures, a significant decrease in server traffic, and faster data retrieval. Instead of securing a
connection, NDN secures the data itself, which mitigates most connection-oriented attacks, including
Man-in-the-Middle attacks. In addition, serving and replicating data across nodes is built into NDN. NDN names are
hierarchical, similar to HTTP Uniform Resource Locators (URLs). However, NDN names are Uniform
Resource Identifiers (URIs): unlike URLs, they identify a piece of content rather than the location of the
content. Hierarchical names reduce in-network state and make discovery easier.
All these properties make NDN an excellent mechanism for constructing a named genomics Content Delivery
Network (CDN), the Genomics Data Lake.
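To illustrate how hierarchical names can reduce in-network state and ease discovery, the sketch below matches names component by component. A single prefix entry covers every name beneath it, and listing the names under a prefix acts as a simple form of discovery. The function names, the example prefixes, and the face labels are hypothetical and are not taken from any NDN implementation.

```python
def components(name):
    """Split an NDN-style URI such as '/a/b/c' into its name components."""
    return tuple(part for part in name.split("/") if part)


def is_prefix(prefix, name):
    """True if `prefix` covers `name`, compared per component, not per character."""
    p, n = components(prefix), components(name)
    return n[:len(p)] == p


def longest_prefix_match(table, name):
    """Return the value stored under the longest prefix covering `name`, or None."""
    best_value, best_len = None, -1
    for prefix, value in table.items():
        plen = len(components(prefix))
        if plen > best_len and is_prefix(prefix, name):
            best_value, best_len = value, plen
    return best_value


def discover(names, prefix):
    """List all known names that fall under `prefix`, in sorted order."""
    return sorted(n for n in names if is_prefix(prefix, n))


# Two entries are enough to route every name under /genomics (assumed layout).
ROUTES = {"/genomics": "face-1", "/genomics/human": "face-2"}
```

Because matching is per component, `/genomics/human` covers `/genomics/human/chr1` but not `/genomics/humanoid/x`, which falls back to the shorter `/genomics` entry.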
2.1.1 Forwarders
To use the NDN architecture, a forwarder must be present to route and satisfy NDN Interests.
NDN-DPDK [31] and the NDN Forwarding Daemon (NFD) [26] are network forwarders for NDN that
support Interest and Data forwarding as well as content caching in the network. They accomplish this
by abstracting lower-level network transport technologies into NDN Faces, maintaining fundamental data
structures such as the Content Store (CS), Pending Interest Table (PIT), and Forwarding Information Base
(FIB), and implementing packet processing logic [25].
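The interplay of the CS, PIT, and FIB can be sketched as a toy forwarder. This is a simplified illustration under stated assumptions, not the packet processing logic of NFD or NDN-DPDK: the `Forwarder` class, its method names, and the tuple return values are hypothetical, the CS and PIT use exact name matching, and timeouts, nonces, and eviction are omitted.

```python
class Forwarder:
    """Toy NDN forwarder with the three core tables."""

    def __init__(self):
        self.cs = {}   # Content Store: name -> cached content
        self.pit = {}  # Pending Interest Table: name -> set of downstream faces
        self.fib = {}  # Forwarding Information Base: prefix components -> upstream face

    @staticmethod
    def _split(name):
        return tuple(part for part in name.split("/") if part)

    def add_route(self, prefix, face):
        self.fib[self._split(prefix)] = face

    def _fib_lookup(self, name):
        """Longest-prefix match over name components."""
        parts = self._split(name)
        for end in range(len(parts), -1, -1):
            face = self.fib.get(parts[:end])
            if face is not None:
                return face
        return None

    def on_interest(self, name, in_face):
        """Return (action, face, content) for an incoming Interest."""
        if name in self.cs:                      # 1. satisfy from the cache
            return ("data", in_face, self.cs[name])
        if name in self.pit:                     # 2. aggregate a duplicate request
            self.pit[name].add(in_face)
            return ("aggregated", None, None)
        out_face = self._fib_lookup(name)        # 3. forward toward the data source
        if out_face is None:
            return ("no-route", None, None)
        self.pit[name] = {in_face}
        return ("forward", out_face, None)

    def on_data(self, name, content):
        """Return the downstream faces that should receive the Data packet."""
        faces = self.pit.pop(name, None)
        if faces is None:
            return []                            # unsolicited Data is dropped
        self.cs[name] = content                  # cache for later Interests
        return sorted(faces)
```

In this sketch, two consumers asking for the same name cause only one Interest to be forwarded upstream, and a third consumer arriving after the Data has returned is served directly from the Content Store.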
NDN-DPDK [31] is a high-speed NDN forwarder developed with the Data Plane Development Kit
(DPDK) [3]. DPDK includes data plane libraries and polling-mode network interface controller drivers for
offloading packet processing from the operating system kernel to user-space programs. This offloading
enables better computational efficiency and packet throughput than is attainable with the kernel's
interrupt-driven processing.
With NFD, high-speed forwarding is still a challenge due to variable-length name-based lookups as
well as packet state updates. In this project, we choose the NDN-DPDK forwarder due to its performance
advantages over NFD. While running on commodity hardware, the NDN-DPDK forwarder can reach a
forwarding speed of more than 100 Gbps [29]. This throughput is useful for transferring data between NDN data
lakes and cloud deployments (such as the Pacific Research Platform (PRP) Kubernetes cluster [30]).
2.2 Kubernetes
In recent years, container technology has gained increasing traction. A container is a unit of software
that bundles code and all dependencies required for the application to run. Containers also lend themselves well
to the architectural approach where an application is separated into multiple services that rely on each
other to perform the full desired function [22]. Kubernetes [17] is an open-source container orchestration
framework that provides an ideal platform for automating containerized applications in different deployment
environments. Kubernetes typically deploys containers in environments with high-bandwidth, low-latency
network connections. The applications should spread across the service nodes with high availability,