Glowing in the Dark

Uncovering IPv6 Address Discovery and Scanning Strategies in the Wild

Hammas Bin Tanveer1, Rachee Singh2,3, Paul Pearce4, Rishab Nithyanand1

1University of Iowa, 2Microsoft, 3Cornell University, 4Georgia Tech

Abstract

In this work we identify scanning strategies of IPv6 scanners on the Internet. We offer a unique perspective on the behavior of IPv6 scanners by conducting controlled experiments leveraging a large and unused /56 IPv6 subnet. We selectively make parts of the subnet visible to scanners by hosting applications that make direct or indirect contact with IPv6-capable servers on the Internet. By careful experiment design, we mitigate the effects of hidden variables on scans sent to our /56 subnet and establish causal relationships between IPv6 host activity types and the scanner attention they evoke. We show that IPv6 host activities, e.g., Web browsing and membership in the NTP pool and Tor network, cause scanners to send an order of magnitude more unsolicited IP scans and reverse DNS queries to our subnet than before. DNS scanners focus their scans on narrow regions of the address space where our applications are hosted, whereas IP scanners broadly scan the entire subnet. Even after the host activity from our subnet subsides, we observe persistent residual scanning to portions of the address space that previously hosted applications.

1 Introduction

Scanning the IP address space has exposed security vulnerabilities, enabling researchers and practitioners to develop effective defenses. Tools for scanning the IP address space grapple with the fundamental challenge of efficient scanner target discovery, a task made more challenging by the increasing adoption of IPv6 on the Internet. The IPv6 address space consists of $2^{128}$ possible addresses, rendering brute-force generation of scanning targets infeasible. Recent work has developed tools [9, 17] to make Internet-scale IPv6 scanning practical by analyzing IPv6 address assignment patterns [40, 53, 54] and developing efficient scanner target generation algorithms [40, 53, 54].
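To make the scale concrete, the short Python calculation below estimates how long an exhaustive sweep would take. The probe rate of 10 million packets per second is a hypothetical assumption chosen for illustration, not a figure from this study.

```python
# Back-of-the-envelope estimate of exhaustive IPv6 scanning time.
# The probe rate used below is a hypothetical assumption.

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year


def years_to_scan(prefix_len: int, probes_per_second: float) -> float:
    """Years needed to probe every address in a prefix of the given length once."""
    if not 0 <= prefix_len <= 128:
        raise ValueError("prefix length must be between 0 and 128")
    addresses = 2 ** (128 - prefix_len)
    return addresses / probes_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    rate = 10_000_000  # hypothetical: 10 million probes per second
    print(f"one /64:     {years_to_scan(64, rate):,.0f} years")
    print(f"whole space: {years_to_scan(0, rate):.2e} years")
```

At that assumed rate, a single /64 subnet alone takes roughly 58,000 years to sweep once, which is why scanners must generate targets selectively.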

Despite the recent work on effectively scanning the IPv6 address space, little is known about the scanning strategies deployed in the wild. We address this gap by analyzing IPv6 scanning from the perspective of IPv6 hosts on the Internet. Our goal is to reveal the target generation strategies of IPv6 scanners and inform address assignment policies that mitigate the impact of Internet-scale IPv6 scanners. Previous work with similar goals performed observational studies using passive measurements of unsolicited traffic. While observational studies provide useful insights, they (1) do not identify the address discovery strategies leveraged by scanners and (2) may not be representative of the real scanning activity observed by actively in-use IPv6 networks [26,34,36,46] (§2). In contrast, we take an active approach by conducting controlled experiments to evaluate the impact of IPv6 host activity on scanner behavior. We begin the study by acquiring a previously unused /56 IPv6 subnet owned by a university. This address space did not originate any traffic prior to the start of the study, allowing us to conduct clean-slate controlled experiments that make parts of the address space visible on the Internet for the first time during our study. A subset of our subnets, the treatment subnets, host applications that make direct or indirect contact with potential IPv6 scanners. We measure the effect of the treatment by capturing the unsolicited IPv6 scanning activity received by our address space. By comparing the impact on scanning activity between the treatment and control subnets, we establish causal relationships between types of host activities and increased scanner attention (§3).

Accurately associating a measurable increase in scanner attention with specific IPv6 host activities is challenging. First, because our IPv6 address space for experimentation is large yet limited, discerning the effect of one host activity (Web browsing) from another (Tor relay) on increased scanner attention is hard, as the observable scanning activity can result from a combination of host activities. Second, scanning activity can often persist after the experimental host activity has subsided, confounding the measured effects of subsequent experiments. Finally, Internet scanners can coincidentally scan our treatment subnets, risking false conclusions about the effect of host activity on scanner attention. We carefully design our controlled experiments to mitigate the effects of these hidden variables and improve the accuracy of our conclusions (§4).

Our study spans over a year from the time we acquired the /56 subnet and began the controlled experiments. We analyze the IPv6 scans and reverse DNS queries destined for our /56 address space to make the following key findings:

Host activity has a sizable impact on scanner attention. While our subnet attracted moderate background radiation scans before we began our controlled experiments to simulate host activity, the host activity evoked sizable scanner attention, with IP and DNS scanning increasing to 225× and 1.6× the pre-experiment rates, respectively. On conclusion of our experiments, the rates remained high at 426× and 3.7×, respectively. Moreover, host activity that makes direct contact with IPv6-capable servers on the Internet (e.g., Web browsing, querying open DNS resolvers) evokes 50× and 13× more activity during and after experimentation, in comparison with host activities which make indirect contact with potential scanners (§3).

arXiv:2210.02522v1 [cs.NI] 5 Oct 2022

DNS scanners have a narrow attention span. Reverse-DNS scanners focus their attention on the treatment subnets, i.e., the subnets hosting the IPv6 applications, and send very few scans to subnets outside them. In contrast, IP scanners have a broader attention span, often scanning both inside and outside the treatment subnets (§4).

Residual scanning after host activity subsides. DNS scanning activity persists long after the host activity that evoked the scans has subsided. Specifically, we observe high-volume DNS scans for nearly six weeks in the treatment subnets used for browsing the Web after the Web browsing activity has ended. The scans restart after a break of 2-3 weeks (§4).

Random and low-byte scanning are dominant strategies. We fingerprint the IPv6 scanners that sent unsolicited traffic to our subnet during the study and find that IPv6 scanners follow one of two main scanning strategies: they either have equal interest in the entire address space (random scanners) or focus more on low-byte addresses (§5).

Finally, we discuss the implications of our findings for network operators (§6) and place our contributions in the context of previous work (§7).

2 Background

In this section, we provide a high-level overview of IPv6 addressing and IPv6 scanning strategies.

IPv6 addressing. An IPv6 address consists of 128 bits, which may be broken down into three parts: an $n$-bit routing prefix, an $m$-bit subnet ID, and a $(128 - n - m)$-bit interface ID (IID). The routing prefix and subnet ID are used to route traffic to a local network where hosts are identified by unique IIDs [44]. Network operators face two questions while allocating IPv6 addresses to networks and hosts: (1) how many bits of an IPv6 address should be allocated to host IIDs (i.e., the value of $128 - n - m$) and (2) how IIDs should be allocated to hosts in a subnet.
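As an illustration of this three-way split, the sketch below extracts the routing prefix, subnet ID, and IID from an address using Python's standard ipaddress module. The values n = 48 and m = 16 in the example are assumptions (a common site-allocation layout), and the address comes from the 2001:db8::/32 documentation prefix.

```python
import ipaddress


def split_address(addr: str, n: int, m: int) -> tuple[int, int, int]:
    """Split an IPv6 address into (routing prefix, subnet ID, interface ID).

    n is the routing-prefix length and m the subnet-ID length in bits;
    the interface ID takes the remaining 128 - n - m bits.
    """
    if n < 0 or m < 0 or n + m > 128:
        raise ValueError("need n >= 0, m >= 0 and n + m <= 128")
    value = int(ipaddress.IPv6Address(addr))
    iid_bits = 128 - n - m
    iid = value & ((1 << iid_bits) - 1)                  # lowest bits
    subnet_id = (value >> iid_bits) & ((1 << m) - 1)     # middle m bits
    routing_prefix = value >> (iid_bits + m)             # top n bits
    return routing_prefix, subnet_id, iid


if __name__ == "__main__":
    prefix, subnet, iid = split_address("2001:db8:1234:5678::1", n=48, m=16)
    print(hex(prefix), hex(subnet), hex(iid))
```

With the recommended 64-bit IID, n + m is 64 and the IID is simply the lower half of the address.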

Determining IID lengths. It is considered best practice for network operators to leave 64 bits for IIDs, according to RFC 4291 [44]. There are many compelling reasons for this practice. First, a large number of existing IPv6 configuration options and RFC recommendations assume a 64-bit IID. Therefore, not following this recommendation may result in operational failures when using IPv6-specific features and technologies. For example, as pointed out in RFC 4291, a 64-bit IID is required to use privacy-enhancing IPv6 features which allow for cryptographically generated addresses (RFC 3972 [23]), to use IPv4-to-IPv6 transition protocols such as 6to4 (RFC 3056 [52]), or to leverage neighbor discovery protocols implemented in accordance with RFC 4861 [51].

Assigning IIDs. IIDs may be allocated to hosts through (1) manual configuration, (2) stateless address autoconfiguration (SLAAC, defined in RFC 4862 [58]), or (3) the DHCPv6 IP leasing protocol (defined in RFC 8415 [49, 55]). Each of these three methods may result in host addresses with different IID characteristics. For example, network operators using manual configuration may assign host IPs in a predictable manner, e.g., using sequential addressing, which results in only the lower bytes of the IID being populated and all leading bytes set to 0 (as per RFC 7707 [40]), or assigning host IP addresses based on their 48-bit MAC addresses (the modified EUI-64 format defined in RFC 4291 [44]). On the other hand, operators using SLAAC or other approaches (e.g., from RFC 7943 [41], RFC 7217 [39], and RFC 3972 [23]) generate pseudo-random IIDs.
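The modified EUI-64 derivation mentioned above is mechanical enough to sketch. Assuming a /64 prefix and a colon-separated MAC address, the hypothetical helpers below insert 0xFFFE between the two halves of the MAC and invert the universal/local bit of the first octet, as RFC 4291 describes.

```python
import ipaddress


def modified_eui64_iid(mac: str) -> int:
    """Derive a modified EUI-64 interface ID from a 48-bit MAC address."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC such as 00:11:22:33:44:55")
    # Insert 0xFFFE between the first three and the last three octets ...
    eui64 = bytearray(octets[:3] + b"\xff\xfe" + octets[3:])
    # ... then invert the universal/local bit (0x02) of the first octet.
    eui64[0] ^= 0x02
    return int.from_bytes(eui64, "big")


def eui64_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Combine a /64 prefix with the modified EUI-64 IID of a MAC address."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("expected a /64 prefix")
    return net.network_address + modified_eui64_iid(mac)


if __name__ == "__main__":
    print(eui64_address("2001:db8::/64", "00:11:22:33:44:55"))
```

The embedded ff:fe marker and the MAC-derived bytes make such IIDs easy to recognize, which is one reason they are considered predictable.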

Our address allocation approach. Given the importance of a 64-bit IID length, it is reasonable to expect that scanners assume 64-bit IIDs when generating scanning targets. Therefore, in all our subsequent experiments detailed in §3 and §4, we assign 64-bit IIDs to all our hosts and subnets. Further, we use both IID generation approaches (lower-byte and pseudo-random address generation) to generate IPs for hosts whose addresses are leaked to scanners. This allows us to study the impact of address allocation methods on the subsequent address generation strategies leveraged by IPv6 scanners, e.g., does the discovery of a host with a pseudo-random (or lower-byte) IID result in scanners only sending probes to other addresses with pseudo-random (or lower-byte) IIDs?
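A minimal sketch of the two IID styles follows. The function names, the use of Python's secrets module as the randomness source, and the two-byte threshold in looks_low_byte are illustrative assumptions, not the exact tooling used in the study.

```python
import ipaddress
import secrets

IID_BITS = 64


def _slash64(prefix: str) -> ipaddress.IPv6Network:
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("expected a /64 prefix")
    return net


def low_byte_address(prefix: str, index: int = 1) -> ipaddress.IPv6Address:
    """Lower-byte IID: all leading IID bytes are zero, e.g. ::1, ::2, ..."""
    if not 0 < index < (1 << 16):
        raise ValueError("index should fit in the two lowest bytes")
    return _slash64(prefix).network_address + index


def pseudo_random_address(prefix: str) -> ipaddress.IPv6Address:
    """Pseudo-random IID: 64 bits drawn from the operating system's CSPRNG."""
    return _slash64(prefix).network_address + secrets.randbits(IID_BITS)


def looks_low_byte(addr: ipaddress.IPv6Address, nonzero_bytes: int = 2) -> bool:
    """Heuristic: treat an IID whose leading bytes are all zero as low-byte."""
    iid = int(addr) & ((1 << IID_BITS) - 1)
    return iid < (1 << (8 * nonzero_bytes))


if __name__ == "__main__":
    subnet = "2001:db8:0:1::/64"
    for addr in (low_byte_address(subnet), pseudo_random_address(subnet)):
        print(addr, looks_low_byte(addr))
```

A heuristic of this kind is also what a scanner could use to decide whether a discovered IID is worth extending with neighboring low-byte guesses.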

IPv6 address representation. A common method for representing IPv6 addresses uses 32 nybbles, each denoted in hexadecimal and representing four contiguous bits. Every four nybbles (two bytes) are separated by a ‘:’, and any leading zeros in these two-byte sections may be dropped.
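Python's ipaddress module exposes both forms of this representation, which is a convenient way to check an address by hand. The exploded form shows all 32 nybbles; the compressed form drops leading zeros and, as RFC 4291 also allows, replaces one run of all-zero fields with ‘::’. The describe helper is a hypothetical name used only for this illustration.

```python
import ipaddress


def describe(addr: str) -> tuple[str, str, int]:
    """Return the exploded form, compressed form, and nybble count of an address."""
    ip = ipaddress.IPv6Address(addr)
    nybbles = ip.exploded.replace(":", "")
    return ip.exploded, ip.compressed, len(nybbles)


if __name__ == "__main__":
    exploded, compressed, count = describe("2601::dead:1")
    print(exploded)    # 2601:0000:0000:0000:0000:0000:dead:0001
    print(compressed)  # 2601::dead:1
    print(count)       # 32
```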

Representation in DNS reverse zones. DNS reverse zones map addresses to domains, i.e., the opposite of what forward DNS zones do. Sending a DNS PTR query to the corresponding IPv6 reverse zone returns the domain names associated with the corresponding IP address. IPv6 PTR records are organized under ip6.arpa in 32 levels, where each nybble of an exploded IPv6 address is a level, in reverse order. Therefore, the PTR record associated with the address 2601::dead:1 is available at 1.0.0.0.d.a.e.d.<20 repeating .0s>.1.0.6.2.ip6.arpa.
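The nybble reversal can be sketched in a few lines. The helper ptr_name below is hypothetical; the assertion cross-checks it against the reverse_pointer attribute of Python's ipaddress.IPv6Address, which returns the same name without the trailing root dot.

```python
import ipaddress


def ptr_name(addr: str) -> str:
    """Owner name of the PTR record for addr: 32 reversed nybbles under ip6.arpa."""
    nybbles = ipaddress.IPv6Address(addr).exploded.replace(":", "")
    return ".".join(reversed(nybbles)) + ".ip6.arpa."


if __name__ == "__main__":
    name = ptr_name("2601::dead:1")
    print(name)
    # Cross-check against the standard library (which omits the trailing dot).
    assert name[:-1] == ipaddress.IPv6Address("2601::dead:1").reverse_pointer
```

Note that reversing nybbles (not whole fields) is what turns the field "dead" into the labels d.a.e.d.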

Scanning strategies: address discovery. Unlike with IPv4, scanning the entire IPv6 address space for live addresses is infeasible. Instead, scanners focus their attention on regions of the address space near addresses where some activity has been observed. Prior work has used techniques like monitoring public lists of IPv6 addresses (e.g., TLD zone files which list the IPv6 addresses of domains) or hosting public services to receive contact from previously unseen addresses (e.g., hosting a web service).

Our address leaking strategies. In our controlled experiments described in §3 and §4, we leak the ‘liveness’ of specific regions of an IPv6 network by the following means: (1) direct contact with IPv6-capable web services, (2) sending DNS queries to public DNS resolvers, (3) participation in the NTP pool protocol [11], where IP addresses of participants can be enumerated [19], (4) participation in the NTP public server protocol [18], where our addresses are distributed via a public listing of NTP servers, (5) participation in the Tor network as a middle relay, where our addresses are distributed via the Tor consensus [13], and (6) registering domains with specific addresses so their presence is known via the TLD zone files.

Scanning strategies: probing for liveness. Once scanners have a specific region of the address space to focus on, they may use different strategies to probe for active hosts.

Traditional IP scanning. A common approach is to solicit responses from live hosts by sending probes (packets associated with a common Internet protocol such as ICMPv6) to a set of candidate IP addresses using tools such as zmapv6 [3] and nmap [7]. These candidates are generated by off-the-shelf tools like ipv666 [2] and the IPv6 toolkit [10], which observe patterns in already identified live addresses (e.g., known live addresses being sequentially allocated).
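To illustrate pattern-based candidate generation, here is a toy heuristic that probes IIDs adjacent to known live, sequentially allocated addresses. It is a hypothetical sketch for exposition only and is not the algorithm implemented by ipv666 or the IPv6 toolkit.

```python
import ipaddress

IID_MASK = (1 << 64) - 1


def neighbor_candidates(seeds: list[str], radius: int = 16) -> list[ipaddress.IPv6Address]:
    """Candidates whose IIDs lie within +/- radius of each seed's IID.

    Candidates stay inside the seed's /64; the seeds themselves and
    duplicates are skipped.
    """
    live = [ipaddress.IPv6Address(s) for s in seeds]
    seen = set(live)
    out = []
    for addr in live:
        prefix = int(addr) & ~IID_MASK  # upper 64 bits (the /64)
        iid = int(addr) & IID_MASK      # lower 64 bits
        for delta in range(-radius, radius + 1):
            new_iid = iid + delta
            if 0 <= new_iid <= IID_MASK:  # do not leave the /64
                cand = ipaddress.IPv6Address(prefix | new_iid)
                if cand not in seen:
                    seen.add(cand)
                    out.append(cand)
    return out


if __name__ == "__main__":
    for cand in neighbor_candidates(["2001:db8::1", "2001:db8::2"], radius=2):
        print(cand)
```

A generator like this pays off only when allocation is sequential; against pseudo-random IIDs its neighbors are almost never live.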

NXDOMAIN scanning. An emerging and increasingly popular approach for scanning IPv6 spaces for liveness leverages a feature of the reverse-DNS lookup process detailed in RFC 8020 [25]. RFC 8020 states that reverse-DNS lookups within subnets that contain no domains should receive an NXDOMAIN response code; e.g., if the /80 prefix containing 2601::dead:1 (i.e., 2601::/80) contains no domains, reverse-DNS queries for any more-specific prefixes should also return an NXDOMAIN response code. This allows scanners to significantly reduce the address search space.
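The pruning effect can be seen in a small simulation. The sketch below assumes a mock authoritative server (MockReverseZone, a hypothetical stand-in) that answers NXDOMAIN whenever no PTR record exists at or below a name, and it works with nybbles in forward order for readability; a real scanner would issue PTR queries for the reversed ip6.arpa names. All addresses are from the 2001:db8::/32 documentation prefix.

```python
import ipaddress

HEX_DIGITS = "0123456789abcdef"


def nybbles(addr: str) -> str:
    """All 32 nybbles of an IPv6 address, most significant first."""
    return ipaddress.IPv6Address(addr).exploded.replace(":", "")


class MockReverseZone:
    """Hypothetical stand-in for an ip6.arpa server that follows RFC 8020:
    a name is NXDOMAIN if no PTR record exists at or below it."""

    def __init__(self, populated: list[str]):
        self.records = {nybbles(a) for a in populated}
        self.queries = 0

    def exists(self, prefix: str) -> bool:
        """One simulated PTR query; False stands for an NXDOMAIN answer."""
        self.queries += 1
        return any(r.startswith(prefix) for r in self.records)


def nxdomain_walk(zone: MockReverseZone, start: str) -> list[ipaddress.IPv6Address]:
    """Depth-first walk below `start` that prunes every NXDOMAIN subtree."""
    found = []
    stack = [start]
    while stack:
        prefix = stack.pop()
        if len(prefix) == 32:  # a full address that has a PTR record
            found.append(ipaddress.IPv6Address(int(prefix, 16)))
            continue
        for digit in HEX_DIGITS:
            child = prefix + digit
            if zone.exists(child):  # only descend where names exist
                stack.append(child)
    return sorted(found)


if __name__ == "__main__":
    zone = MockReverseZone(["2001:db8:0:1a5::1", "2001:db8:0:1ff::dead:1"])
    start = nybbles("2001:db8:0:100::")[:14]  # a /56 spans 14 nybbles
    print(nxdomain_walk(zone, start), zone.queries)
```

With two populated addresses in a /56, the walk issues 560 queries (16 at the /56 level, then 16 per level along each of the two 17-node branches), compared with $2^{72}$ addresses under the /56.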

Our data logging approach. In our experiments, we are interested in identifying and detailing the behavior of traditional IP scanners and NXDOMAIN scanners. Therefore, we set up our infrastructure to capture all packets and DNS PTR lookups for our address space.

3 Prevalence of IPv6 Scans

In this section, we answer the question: How prevalent is IPv6 scanning in the wild?

3.1 Data collection methodology

Overview. Our study is based on scanner behavior observed on a previously unused and unannounced /56 IPv6 address space (§3.1.1). We conduct controlled experiments by creating a treatment group of /64 subnets which contain publicly advertised and visible services and a control group of subnets with no services (§3.1.2). To log scanning behavior, we use a logging infrastructure that captures all packets and DNS queries sent to our /56 IPv6 address space (§3.1.3).

3.1.1 Characteristics of our IPv6 address space

Our /56 subnet is part of an autonomous system (AS)'s /48 allocation. Since the /56 subnet was previously unused and unannounced by BGP, it should not receive any legitimate traffic [16]. We were granted access to the /56 in 11/2020, after which the parent AS announced it to the Internet via BGP and we set up our data logging infrastructure (§3.1.3). We left the address space idle for the next three months to benchmark base levels of Internet background radiation received by our subnet. In 03/2021, we began a series of controlled experiments to understand IPv6 scanner behavior. Figure 1 summarizes the timeline of our experiments.

Separating our /56 into treatment and control groups. We run six different controlled experiments on our address space, each requiring four unique end-host IPs. We first subdivided the /56 address space into four /58 address spaces. Then, due to the importance of using 64-bit IIDs for each host (cf. §2), we allocated each of our 24 end-hosts to a unique /64 subnet and assigned it an address from this subnet. Therefore, each of our six experiments was conducted on four unique end-hosts contained in four unique /64 subnets. For two of the four end-hosts associated with each experiment, we allocated a pseudo-random IID; the remaining two received lower-byte IIDs. Thus, we conduct each experiment on end-hosts which reflect the two most common forms of address assignment in IPv6 networks. The 24 /64 subnets associated with our experiments are our treatment subnets, and every other subnet is a control subnet. All treatment subnets were randomly chosen from the same /58 subnet such that no two were adjacent to each other. This allows for proper analysis of treatment effects (§4) and facilitates retries on the remaining three /58 subnets if an experiment has to be repeated due to failures. Fortunately, the latter was not required.
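The allocation procedure above can be sketched as follows. Using the 2001:db8::/56 documentation prefix as a stand-in for our real /56, and assuming the first /58 is the one that hosts the treatment subnets, it draws 24 non-adjacent /64s uniformly at random and tags two hosts per experiment with each IID style. The function name and output layout are hypothetical.

```python
import ipaddress
import random
from typing import Optional


def plan_treatment_subnets(
    block: str,
    n_experiments: int = 6,
    hosts_per_experiment: int = 4,
    seed: Optional[int] = None,
) -> dict[int, list[tuple[ipaddress.IPv6Network, str]]]:
    """Map each experiment number to its (treatment /64, IID style) pairs."""
    net = ipaddress.IPv6Network(block)
    if net.prefixlen != 56:
        raise ValueError("expected a /56")
    rng = random.Random(seed)
    # Four /58s; this sketch assumes the first one hosts all treatment subnets.
    treatment_58 = next(net.subnets(new_prefix=58))
    slash64s = list(treatment_58.subnets(new_prefix=64))  # 64 candidates
    k = n_experiments * hosts_per_experiment
    # Draw k sorted picks from a compressed range, then shift each by its rank
    # so consecutive indices differ by at least 2 (no two /64s are adjacent).
    picks = sorted(rng.sample(range(len(slash64s) - k + 1), k))
    indices = [p + rank for rank, p in enumerate(picks)]
    rng.shuffle(indices)  # random assignment of subnets to experiments
    half = hosts_per_experiment // 2
    plan = {}
    for e in range(n_experiments):
        chunk = indices[e * hosts_per_experiment:(e + 1) * hosts_per_experiment]
        plan[e + 1] = [
            (slash64s[i], "low-byte" if j < half else "pseudo-random")
            for j, i in enumerate(chunk)
        ]
    return plan


if __name__ == "__main__":
    for exp, hosts in plan_treatment_subnets("2001:db8::/56", seed=1).items():
        print(exp, [(str(n), style) for n, style in hosts])
```

Sampling from a compressed range and then adding each pick's rank is a standard way to draw non-adjacent positions uniformly without rejection sampling; the remaining three /58s stay untouched, matching the retry reserve described above.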

3.1.2 Attracting scanner attention

Following the initial three-month period of inactivity from 11/2020 to 03/2021, we simulated host activity from our treatment subnets by launching services (i.e., experiments¹) that mimic specific types of host behaviors on the Internet. We ran one service at a time on the four corresponding treatment subnets for at least 2 weeks to measure the impact on scanning behavior caused by the specific host activity mimicked by the service. More details on the methods for measuring service effects are in §4.

Experiments (services) deployed. The goal of our experiments is to identify the effect of IPv6 host activity on scanning behaviors. We achieve this goal by simulating six types of host activity from the 24 /64 treatment subnets. Each experiment uses a different method to leak the liveness of the treatment /64 subnets to scanners. These methods are based on findings from prior work which highlight the sources of IPv6 addresses leveraged for efficient IPv6 scanning (§7). We leak our addresses using a combination of direct and indirect scanner contact approaches. Direct contact approaches send packets directly to IPv6 addresses with the expectation of receiving scanner attention in return. In comparison, indirect contact approaches enlist our services in public lists that may be monitored by scanners seeking to discover new IPv6 addresses. Our six experiment deployments are described below.

¹ In the remainder of this paper, experiments and services are used interchangeably.

Experiment 1: Web crawls to popular websites. With this direct contact experiment, we mimic web browsing from a standard home network, where users make connections to web servers that scanners may operate or tap into as sources of IPv6 addresses [56]. We first identified all IPv6-capable websites in the Alexa Top 10K websites obtained in 02/2021 by collecting their AAAA DNS records and checking them for validity. In total, we found 2.6K IPv6-capable websites, which were the subject of our crawls. Following the recommendations of Ahmad et al. [22], we conducted crawls using both a simple CLI crawler that did not load third-party or dynamic content (wget) and a full-fledged browser using OpenWPM [31]. Our wget crawls therefore only established direct connections with the web servers of each website, while the OpenWPM crawls used Firefox to also load dynamic content and make connections with all third-party web servers associated with a website. Therefore, each crawl leaked the same four host addresses to a different (but overlapping) set of web servers. The crawls were conducted 2 weeks apart.
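A sketch of the IPv6-capability filter follows. The paper does not specify how validity was checked, so treating a domain as IPv6-capable when at least one of its AAAA addresses is a global unicast address is an assumption, as are the function names and the port 443 passed to getaddrinfo. The resolver is injectable so the filter can be exercised offline.

```python
import ipaddress
import socket
from typing import Callable, Iterable


def resolve_aaaa(domain: str) -> list[str]:
    """IPv6 addresses the system resolver returns for domain (empty if none)."""
    try:
        infos = socket.getaddrinfo(domain, 443, socket.AF_INET6, socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # Each entry's sockaddr is (host, port, flowinfo, scope_id).
    return sorted({info[4][0] for info in infos})


def ipv6_capable(
    domains: Iterable[str],
    resolve: Callable[[str], list[str]] = resolve_aaaa,
) -> list[str]:
    """Keep domains with at least one global unicast AAAA address.

    'Global unicast' is this sketch's assumed notion of a valid record.
    """
    keep = []
    for domain in domains:
        addrs = [ipaddress.ip_address(a) for a in resolve(domain)]
        if any(a.version == 6 and a.is_global for a in addrs):
            keep.append(domain)
    return keep
```

In practice, calling ipv6_capable on the Top 10K list with the default resolver performs live DNS lookups; passing a dictionary-backed resolve function, as in a unit test, keeps the logic checkable without network access.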

Experiment 2: Querying DNS open resolvers. In this direct contact experiment, we leak our treatment subnet liveness to open IPv6-capable DNS resolvers. Since no public list of open resolvers exists for IPv6, we used the approach of Hendriks et al. [43] to identify IPv6-capable resolvers from IPv4 open resolver lists. In total, we obtained 9K IPv6-capable open resolvers and queried each of them for the AAAA record of www.google.com. These queries were repeated every day for a two-week period. Therefore, this experiment leaked the liveness of four treatment subnets to over 9K IPv6 open resolvers.
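For illustration, the sketch below encodes the kind of AAAA query used in this experiment as a raw RFC 1035 DNS message and sends it over UDP. Building the packet by hand avoids third-party dependencies; the function names, the 2-second timeout, and the 4096-byte receive buffer are assumptions rather than details from the study.

```python
import secrets
import socket
import struct
from typing import Optional

QTYPE_AAAA = 28  # RFC 3596
QCLASS_IN = 1


def build_aaaa_query(name: str, query_id: Optional[int] = None) -> bytes:
    """Encode a minimal recursive DNS query for the AAAA record of name."""
    if query_id is None:
        query_id = secrets.randbelow(1 << 16)  # random 16-bit transaction ID
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) < 64:
            raise ValueError(f"invalid DNS label: {label!r}")
        qname += bytes([len(raw)]) + raw  # length-prefixed label
    qname += b"\x00"  # the empty root label terminates the name
    return header + qname + struct.pack("!HH", QTYPE_AAAA, QCLASS_IN)


def send_query(resolver: str, query: bytes, timeout: float = 2.0) -> bytes:
    """Send the query to an IPv6 resolver on UDP port 53 and return the raw reply."""
    with socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (resolver, 53))
        reply, _ = sock.recvfrom(4096)
    return reply
```

Calling send_query(resolver, build_aaaa_query("www.google.com")) once per resolver per day would approximate the schedule described above; the reply is not parsed here because the experiment only needs each query to reach the resolver from a treatment address.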

Experiment 3: NTP pool servers. In this indirect contact experiment, we hosted four instances of NTP pool servers in four treatment subnets. To ensure that each NTP pool instance used a different egress IP address, we created four network namespaces on our Linux machine to isolate the NTP servers we hosted. Each network namespace has its own interfaces, ports, and IP addresses, allowing each of our NTP pool servers to use a different IP address associated with one of the four treatment subnets allocated to this experiment. Each server was initially configured with the NTP default parameters, which rate-limited our responses to liveness probes from other NTP servers for the first two weeks. These rate limits prevented the servers from achieving the maximum pool score of 20 during this period. During this time, the servers were usable by clients (and therefore discoverable by scanners) but not recommended due to a low pool score. In §4 we refer to this part of the experiment as ‘NTP pool’. We removed the rate limit after two weeks, and our servers immediately achieved the maximum pool score of 20 and were recommended for client use. We carried out this phase of the experiment for another two weeks and refer to it as ‘NTP pool−20’. Note that NTP pool servers are not publicly listed on a website, but it is possible to enumerate them [19]. Therefore, this indirect experiment leaked the liveness of its four treatment subnets to scanners that enumerate NTP pool server lists for scanning destinations.

Figure 1: Timeline of our experiments that simulate host activity from a /56 IPv6 subnet we own.
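The per-instance isolation in Experiment 3 can be sketched with iproute2 commands generated from Python. The namespace and interface names and the ::1 IID are hypothetical, the commands are only built and not executed, and connecting the host-side veth ends to the uplink (for example via a bridge) plus default routes inside each namespace are omitted.

```python
import ipaddress


def netns_commands(treatment_subnets: list[str], iid: int = 1) -> list[list[str]]:
    """Build, without running, iproute2 commands that give each NTP instance
    its own network namespace and an address from its own treatment /64."""
    commands = []
    for i, subnet in enumerate(treatment_subnets):
        net = ipaddress.IPv6Network(subnet)
        if net.prefixlen != 64:
            raise ValueError(f"{subnet} is not a /64")
        address = net.network_address + iid
        ns, host_if, ns_if = f"ntp{i}", f"veth-ntp{i}", f"veth-ntp{i}-ns"
        commands += [
            ["ip", "netns", "add", ns],
            # veth pair: one end stays on the host, the other moves into ns
            ["ip", "link", "add", host_if, "type", "veth", "peer", "name", ns_if],
            ["ip", "link", "set", ns_if, "netns", ns],
            # the namespace-side interface owns this instance's treatment address
            ["ip", "-n", ns, "-6", "addr", "add", f"{address}/64", "dev", ns_if],
            ["ip", "-n", ns, "link", "set", "lo", "up"],
            ["ip", "-n", ns, "link", "set", ns_if, "up"],
            ["ip", "link", "set", host_if, "up"],
        ]
    return commands


if __name__ == "__main__":
    for cmd in netns_commands(["2001:db8:0:4::/64", "2001:db8:0:9::/64"]):
        print(" ".join(cmd))
```

Run as root, each list could be passed to subprocess.run(cmd, check=True); the NTP daemon for instance i would then be launched under ip netns exec ntp{i}, so that its egress traffic carries that namespace's treatment address.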

Experiment 4: NTP public servers. While our NTP pool servers were used by clients for synchronizing time, they were not publicly listed and required additional effort to enumerate. In this indirect contact experiment, we launched four NTP Stratum 2 public servers that remained active for two weeks. Unlike pool servers, these are published on an archived list, making them more visible to scanners [12]. This indirect contact experiment leaked the liveness of its four treatment subnets to scanners monitoring NTP server lists.

Experiment 5: Tor relays. For this indirect contact experiment, we launched four Tor [28] middle relays with unrestricted bandwidth in our subnet. These relays remained operational for a two-week period. Since our main purpose was to be listed in the public Tor consensus, we chose middle relays as opposed to entry or exit relays. We made this decision because entry relays receive information about the clients connecting to Tor and exit relays receive information about the destinations of Tor traffic. We did not find such information appropriate to gather and analyze. For the additional safety of Tor users, we discarded any non-scanner traffic (defined in §3.1.3) destined for the deployed relays. Therefore, this indirect contact experiment leaked the liveness of the four treatment subnets to any IPv6 scanners monitoring the Tor consensus.

Experiment 6: DNS zone files. Finally, we registered four domains, two each under the .com and .net TLDs. The AAAA records of all four domains pointed to addresses from four of our treatment subnets. Registering these domains under the com and net TLDs results in them being added to two of the largest TLD zone files. Since prior work has leveraged these lists to identify web services with IPv6 addresses, we expect to make indirect contact with scanners monitoring these zone files.

Limitations of our experimental setup. We selected the above-mentioned services for two reasons: (1) they provide a range of common host activities that expose their IP addresses to potential scanners, and (2) they have been used for IPv6 target generation and address extraction in prior work [15, 37, 40]. These services allow us to “leak” our address space to scanners and achieve two goals. First, they help us identify how
