Glowing in the Dark
Uncovering IPv6 Address Discovery and Scanning Strategies in the Wild
Hammas Bin Tanveer1, Rachee Singh2,3, Paul Pearce4, Rishab Nithyanand1
1University of Iowa, 2Microsoft, 3Cornell University, 4Georgia Tech
Abstract
In this work we identify scanning strategies of IPv6 scanners
on the Internet. We offer a unique perspective on the behav-
ior of IPv6 scanners by conducting controlled experiments
leveraging a large and unused /56 IPv6 subnet. We selectively
make parts of the subnet visible to scanners by hosting
applications that make direct or indirect contact with IPv6-
capable servers on the Internet. By careful experiment design,
we mitigate the effects of hidden variables on scans sent to
our /56 subnet and establish causal relationships between
IPv6 host activity types and the scanner attention they evoke.
We show that IPv6 host activities, e.g., Web browsing and
membership in the NTP pool and Tor network, cause scanners to
send an order of magnitude more unsolicited IP scans and
reverse DNS queries to our subnet than before. DNS scanners
focus their scans in narrow regions of the address space where
our applications are hosted whereas IP scanners broadly scan
the entire subnet. Even after the host activity from our subnet
subsides, we observe persistent residual scanning to portions
of the address space that previously hosted applications.
1 Introduction
Scanning the IP address space has exposed security vulner-
abilities, enabling researchers and practitioners to develop
effective defenses. Tools for scanning the IP address space
grapple with the fundamental challenge of efficient scanner
target discovery — a task made more challenging by the in-
creasing adoption of IPv6 on the Internet. The IPv6 address
space consists of 2^128 possible addresses, rendering brute-force
generation of scanning targets infeasible. Recent work
has developed tools [9, 17] to make Internet-scale IPv6 scan-
ning practical by analyzing IPv6 address assignment patterns
[40, 53, 54] and developing efficient scanner target generation
algorithms [40, 53, 54].
Despite the recent work on effectively scanning the IPv6
address space, little is known about the scanning strategies
deployed in the wild. We address this gap by analyzing IPv6
scanning from the perspective of IPv6 hosts on the Internet.
Our goal is to reveal target generation strategies of IPv6 scan-
ners and inform address assignment policies to mitigate the
impact of Internet-scale IPv6 scanners. Previous work with
similar goals performed observational studies using passive
measurements of unsolicited traffic. While observational stud-
ies provide useful insights they (1) do not identify address
discovery strategies leveraged by scanners and (2) may not
be representative of the real scanning activity observed by
actively in-use IPv6 networks [26,34,36,46] (§2). In contrast,
we take an active approach by conducting controlled experiments
to evaluate the impact of IPv6 host activity on scanner
behavior. We begin the study by acquiring a previously unused
/56 IPv6 subnet owned by a university. This address
space did not originate any traffic prior to the start of the study,
allowing us to conduct clean-slate controlled experiments that
make parts of the address space visible on the Internet for the
first time during our study. A subgroup of treatment subnets
in our address space host applications that make direct or
indirect contact with potential IPv6 scanners. We measure the
effect of the treatment by capturing unsolicited IPv6 scanning
activity received by our address space. By comparing the im-
pact on scanning activity between the treatment and control
subnets, we establish causal relationships between types of
host activities and increased scanner attention (§3).
Accurately associating a measurable increase in scanner
attention to specific IPv6 host activities is challenging. First,
due to our large yet limited IPv6 address space for exper-
imentation, discerning the effect of one host activity (Web
browsing) from another (Tor relay) on increased scanner atten-
tion is hard as the observable scanning activity can result from
a combination of host activities. Second, scanning activity can
often persist after the experimental host activity has subsided,
confounding the measured effects of subsequent experiments.
Finally, Internet scanners can coincidentally scan our treatment
subnets, leading to false conclusions about the effect
of host activity on scanner attention. We carefully design our
controlled experiments to mitigate the effects of these hidden
variables to improve the accuracy of our conclusions (§4).
Our study spans over a year from the time we acquired the /56
subnet and began the controlled experiments. We analyze the IPv6
scans and reverse DNS queries destined for our /56 address space
to make the following key findings:
Host activity has a sizable impact on scanner attention.
While our subnet attracted moderate background radiation scans
before we began our controlled experiments to simulate host
activity, the host activity evoked sizable scanner attention,
with IP and DNS scanning increasing to 225× and 1.6×
pre-experiment rates, respectively. On conclusion of our
experiments, the rates continued to stay high at 426× and 3.7×,
respectively. Moreover, host activity that makes direct contact
with IPv6 capable servers on the Internet (e.g., Web browsing,
querying open DNS resolvers) evokes 50× and 13× more activity
during and after experimentation, in comparison with host
activities which make indirect contact with potential scanners
(§3).
arXiv:2210.02522v1 [cs.NI] 5 Oct 2022
DNS scanners have a narrow attention space.
Reverse-DNS scanners focus their attention on the treatment
subnets, i.e., the subnets hosting the IPv6 application, and send
very few scans to subnets outside the treatment subnets. In
contrast, IP scanners have a broader attention span, often
scanning both inside and outside the treatment subnets (§4).
Residual scanning after host activity subsides.
DNS scanning activity persists long after the host activity that
evoked the scans has subsided. Specifically, we observe
high-volume DNS scans for nearly six weeks in the treatment
subnets used for browsing the Web after the Web browsing activity
has ended. The scans restart after a break of 2-3 weeks (§4).
Random and low-byte scanning are dominant strategies.
We fingerprint the IPv6 scanners that sent unsolicited traffic
to our subnet during the study and find that IPv6 scanners
have one of two main scanning strategies: they either have
equal interest in the entire address space (random scanners)
or focus more on low-byte addresses (§5).
Finally, we discuss the implications of our findings for
network operators (§6) and place our contributions in the
context of previous work (§7).
2 Background
In this section, we provide a high-level overview of IPv6
addressing and IPv6 scanning strategies.
IPv6 addressing.
An IPv6 address consists of 128 bits which may be broken down
into three parts: an m-bit routing prefix, an n-bit subnet ID,
and a k-bit interface ID (IID), where k = 128 - m - n.
The routing prefix and subnet ID are used to route traffic to a
local network where hosts are identified by unique IIDs [44].
Network operators face two questions while allocating IPv6
addresses to networks and hosts: (1) how many bits of an IPv6
address should be allocated to host IIDs (i.e., the value of k)
and (2) how should IIDs be allocated to hosts in a subnet.
Determining IID lengths. It is considered best practice
for network operators to leave 64 bits for IIDs, according
to RFC 4291 [44]. There are many compelling reasons for
this practice. First, a large number of existing IPv6 config-
uration options and RFC recommendations assume a 64-bit
IID. Therefore, not following this recommendation may result
in operational failure when using IPv6-specific features and
technologies. For example, as pointed out in RFC 4291, a
64-bit IID is required to use privacy-enhancing IPv6 features
which allow for cryptographically generated addresses (RFC
3972 [23]), or to use IPv4-to-IPv6 transition protocols such
as 6to4 (RFC 3056 [52]), or to leverage neighbor discovery
protocols implemented in accordance with RFC 4861 [51].
Assigning IIDs. IIDs may be allocated to hosts through (1)
manual configuration, (2) stateless address autoconfiguration
(SLAAC, defined in RFC 4862 [58]), or (3) the DHCPv6
IP leasing protocol (defined in RFC 8415 [49, 55]). Each of
these three methods may result in host addresses having dif-
ferent IID characteristics. For example, network operators
using manual configurations may assign host IPs in a pre-
dictable manner — e.g., using sequential addressing which
results in the lower bytes of the IID being populated with
all leading bytes set to 0 (as per RFC7707 [40]) or assign
host IP addresses based on their 48-bit MAC addresses (the
modified EUI-64 protocol defined in RFC4291 [44]). On the
other hand, operators using SLAAC or other approaches (e.g.,
from RFC 7943 [41], RFC 7217 [39], and RFC 3972 [23])
generate pseudo-random IIDs.
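
The address structure and IID styles described above can be illustrated with a short Python sketch. It is not the authors' tooling: the function names, the 48-bit routing prefix default, and the 16-bit threshold used to call an IID "low-byte" are assumptions made here for illustration, and the example addresses come from the 2001:db8::/32 documentation range.

```python
import ipaddress

IID_MASK = (1 << 64) - 1  # assumes the 64-bit IID recommended by RFC 4291


def split_address(address: str, routing_bits: int = 48):
    """Split an address into (routing prefix, subnet ID, IID) as integers.

    routing_bits (m) is an assumed parameter; the subnet ID gets n = 64 - m bits.
    """
    value = int(ipaddress.IPv6Address(address))
    subnet_bits = 64 - routing_bits
    routing_prefix = value >> (128 - routing_bits)
    subnet_id = (value >> 64) & ((1 << subnet_bits) - 1)
    return routing_prefix, subnet_id, value & IID_MASK


def classify_iid(address: str, low_byte_bits: int = 16) -> str:
    """Label an IID as 'low-byte', 'eui-64', or 'other' (e.g., pseudo-random).

    low_byte_bits is a hypothetical threshold: the IID counts as low-byte
    when every bit above the lowest low_byte_bits bits is zero.
    """
    iid = int(ipaddress.IPv6Address(address)) & IID_MASK
    if iid >> low_byte_bits == 0:
        return "low-byte"
    # Modified EUI-64 inserts the bytes ff:fe between the two MAC halves.
    if (iid >> 24) & 0xFFFF == 0xFFFE:
        return "eui-64"
    return "other"


if __name__ == "__main__":
    for example in ("2001:db8::1",
                    "2001:db8::211:22ff:fe33:4455",
                    "2001:db8::8a2e:370:7334:abcd"):
        print(example, classify_iid(example))
```

A heuristic like this cannot prove that an IID was generated randomly; it only rules out the two predictable patterns named in the text.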
Our address allocation approach. Given the importance of
a 64-bit IID length, it is reasonable to assume that scanners
assume 64-bit IIDs in their scanning targets. Therefore, in
all our subsequent experiments detailed in §3 and §4, we
assign 64-bit IIDs to all our hosts and subnets. Further, we
use both IID generation approaches (lower-byte and pseudo-
random addresses generation) to generate IPs for hosts whose
addresses are leaked to scanners. This allows us to study
the impact of address allocation methods on the subsequent
address generation strategies leveraged by IPv6 scanners.
For example, does the discovery of a host with a pseudo-random
(or lower-byte) IID result in scanners only sending probes to
other addresses with pseudo-random (or lower-byte) IIDs?
IPv6 address representation.
A common method for rep-
resenting IPv6 addresses uses 32 nybbles, each denoted in
hexadecimal and representing four contiguous bits. Every four
nybbles (two bytes) are separated by a ‘:’ and any leading
zeros in these two byte sections may be dropped.
Representation in DNS reverse zones. DNS reverse zones map
addresses to domains, i.e., the opposite of what forward DNS
zones do. Sending a DNS PTR query to the corresponding IPv6
reverse zone returns the domains hosted at the corresponding IP
address. IPv6 PTR records are organized under ip6.arpa in 32
levels where each nybble of an exploded IPv6 address is a level,
with the nybbles listed in reverse order. Therefore, the PTR
record associated with the address 2601::dead:1 is available at
1.0.0.0.d.a.e.d.<20 nybbles of 0, dot-separated>.1.0.6.2.ip6.arpa.
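
As a quick check of the reverse-zone layout, the following Python sketch derives the PTR name by hand and compares it with the standard library's ipaddress module. The helper name ptr_name is introduced here for illustration only.

```python
import ipaddress


def ptr_name(address: str) -> str:
    """Build the ip6.arpa name: 32 nybbles in reverse order, dot-separated."""
    exploded = ipaddress.IPv6Address(address).exploded  # all 32 nybbles, no '::'
    nybbles = exploded.replace(":", "")
    return ".".join(reversed(nybbles)) + ".ip6.arpa"


if __name__ == "__main__":
    name = ptr_name("2601::dead:1")
    # The standard library computes the same name via reverse_pointer.
    assert name == ipaddress.IPv6Address("2601::dead:1").reverse_pointer
    print(name)
```

Because the nybbles are reversed, the least significant nybble of the IID becomes the leftmost label and the routing prefix ends up next to ip6.arpa.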
Scanning strategies: address discovery.
Unlike IPv4, scan-
ning the entire IPv6 address space for live addresses is infea-
sible. Instead, scanners focus their attention on regions of the
address space near addresses where some activity has been
observed. Prior work has used techniques like monitoring
public lists of IPv6 addresses (e.g., TLD zone files which list
the IPv6 addresses of domains) or hosting public services
to receive contact from previously unseen addresses (e.g.,
hosting a web service).
Our address leaking strategies. In our controlled experiments
described in §3 and §4, we leak the ‘liveness’ of specific
regions of an IPv6 network by the following means: (1) direct
contact with IPv6 capable web services, (2) sending DNS
queries to public DNS resolvers, (3) participation in the NTP
pool protocol [11] where IP addresses of participants can be
enumerated [19], (4) participation in the NTP public server
protocol [18] where our addresses are distributed via a public
listing of NTP servers, (5) participation in the Tor network
as a middle-relay where our addresses are distributed via the
Tor consensus [13], and (6) registering domains with specific
addresses so their presence is known via the TLD zone files.
Scanning strategies: probing for liveness.
Once scanners have a specific region of the address space to
focus on, they may use a separate strategy to probe for active
hosts within it.
Traditional IP scanning. A common approach is to solicit re-
sponses from live hosts by sending probes (packets associated
with a common Internet protocol such as ICMPv6) to a set
of candidate IP addresses using tools such as zmapv6 [3] and
nmap [7]. These candidates are generated by observing pat-
terns in the already identified live addresses (e.g., known live
addresses are sequentially allocated) by off-the-shelf tools
like ipv666 [2] and the IPv6 toolkit [10].
NXDOMAIN scanning. An emerging and increasingly popular
approach for scanning IPv6 spaces for liveness leverages a
feature of the IPv6 reverse-DNS lookup process detailed in
RFC 8020 [25]. The RFC states that reverse-DNS lookups within
subnets that contain no domains should receive an NXDOMAIN
response code. For example, if the prefix 2601::dead:1/80
contains no domains, reverse-DNS queries for any more-specific
prefixes should return an NXDOMAIN response code. Conversely, a
response other than NXDOMAIN for a partial ip6.arpa name
indicates that records exist beneath it. This allows scanners to
significantly reduce the address search space.
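
The pruning idea behind NXDOMAIN scanning can be sketched in Python against a simulated reverse zone. This is an illustrative model, not the scanners' or the authors' code: make_rcode_oracle stands in for a real authoritative server, all function names are hypothetical, and prefixes are written as forward-order nybble strings rather than ip6.arpa names.

```python
import ipaddress

HEX_DIGITS = "0123456789abcdef"


def to_nybbles(address: str) -> str:
    """Return the 32 hexadecimal nybbles of an address, most significant first."""
    return ipaddress.IPv6Address(address).exploded.replace(":", "")


def make_rcode_oracle(ptr_addresses):
    """Simulated zone: NOERROR if a PTR record exists at or below the prefix."""
    names = {to_nybbles(a) for a in ptr_addresses}

    def rcode(prefix: str) -> str:
        if any(name.startswith(prefix) for name in names):
            return "NOERROR"
        return "NXDOMAIN"

    return rcode


def enumerate_ptr_addresses(rcode, start_prefix: str):
    """Walk the nybble tree depth-first, pruning every NXDOMAIN branch."""
    found, queries, stack = [], 0, [start_prefix]
    while stack:
        prefix = stack.pop()
        queries += 1
        if rcode(prefix) == "NXDOMAIN":
            continue  # nothing exists beneath this name, so skip the subtree
        if len(prefix) == 32:
            found.append(str(ipaddress.IPv6Address(int(prefix, 16))))
        else:
            stack.extend(prefix + digit for digit in HEX_DIGITS)
    return sorted(found), queries


if __name__ == "__main__":
    zone = ["2001:db8:0:1::1", "2001:db8:0:1::2",
            "2001:db8:0:7:aaaa:bbbb:cccc:dddd"]
    start = to_nybbles("2001:db8::")[:14]  # the 14 fixed nybbles of a /56
    addresses, query_count = enumerate_ptr_addresses(make_rcode_oracle(zone), start)
    print(addresses, query_count)
```

In this model a /56 holds 2^72 addresses, yet the walk needs only 16 queries per populated tree level, which is why the technique is attractive to scanners.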
Our data logging approach. In our experiments, we are inter-
ested in identifying and detailing the behavior of traditional
IP scanners and NXDOMAIN scanners. Therefore, we set
up our infrastructure to capture all packets and DNS PTR
lookups for our address space.
3 Prevalence of IPv6 Scans
In this section, we answer the question: How prevalent is IPv6
scanning in the wild?
3.1 Data collection methodology
Overview.
Our study is based on scanner behavior observed on a previously
unused and unannounced /56 IPv6 address space (§3.1.1). We
conduct controlled experiments by creating a treatment group of
/64 subnets which contain publicly
advertised and visible services and a control group of subnets
with no services (§3.1.2). To log scanning behavior, we use
a logging infrastructure that captures all packets and DNS
queries sent to our /56 IPv6 address space (§3.1.3).
3.1.1 Characteristics of our IPv6 address space
Our /56 subnet is part of the /48 allocation of an autonomous
system (AS). Since the /56 subnet was previously unused and
unannounced by BGP, it should not receive any legitimate
traffic [16]. We were granted access to the /56 in 11/2020,
after which the parent AS announced it to the Internet via
BGP and we set up data logging infrastructure (§3.1.3). We
left the address space idle for the next three months to
benchmark base levels of Internet background radiation received
by our subnet. In 03/2021, we began a series of controlled
experiments to understand IPv6 scanner behavior. Figure 1
summarizes the timeline of our experiments.
Separating our /56 into treatment and control groups.
We run six different controlled experiments on our address
space, each requiring four unique end-host IPs. We first
subdivided the /56 address space into four /58 address spaces.
Then, due to the importance of using 64-bit IIDs for each host
(cf. §2), we allocated each of our 24 end-hosts to a unique /64
subnet and assigned them addresses from this subnet. Therefore,
each of our six experiments was conducted on four unique
end-hosts contained in four unique /64 subnets. For two of the
four end-hosts associated with each experiment, we allocated a
pseudo-random IID. The remaining two received lower-byte IIDs.
Thus, each experiment covers end-hosts that reflect the two most
common forms of address assignment in IPv6 networks. The 24 /64
subnets associated with our experiments are our treatment
subnets and every other subnet is a control subnet. All
treatment subnets were randomly chosen from the same /58 subnet
such that no two were adjacent to each other. This allows for
proper analysis of treatment effects (§4) and facilitates
retries on the remaining three /58 subnets if an experiment had
to be repeated due to failures. Fortunately, the latter was not
required.
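
The subnet layout described above can be sketched with Python's ipaddress module. The sketch makes several assumptions that the text does not state: it uses the documentation prefix 2001:db8::/56 in place of the real allocation, picks the first /58, uses a fixed seed, and enforces non-adjacency with a sample-then-spread trick. The function name is hypothetical.

```python
import ipaddress
import random


def pick_treatment_subnets(block, count=24, seed=0):
    """Pick `count` random, pairwise non-adjacent /64s from the first /58."""
    rng = random.Random(seed)
    first_58 = next(block.subnets(new_prefix=58))    # other /58s stay free for retries
    candidates = list(first_58.subnets(new_prefix=64))  # a /58 holds 64 /64s
    # Sample from a shortened range, then shift the i-th smallest pick by i.
    # Consecutive picks then differ by at least 2, so no two are adjacent.
    base = sorted(rng.sample(range(len(candidates) - count + 1), count))
    return [candidates[b + i] for i, b in enumerate(base)]


if __name__ == "__main__":
    block = ipaddress.ip_network("2001:db8::/56")  # placeholder for the real /56
    treatment = pick_treatment_subnets(block)
    low_byte_host = treatment[0].network_address + 1  # lower-byte IID
    random_host = treatment[1].network_address + random.getrandbits(64)  # pseudo-random IID
    print(len(treatment), low_byte_host, random_host)
```

Every /64 in the /56 that is not returned by the function would serve as a control subnet in this model.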
3.1.2 Attracting scanner attention
Following the initial three month period of inactivity from
11/2020 to 03/2021, we simulated host activity from our
treatment subnets by launching services (i.e., experiments; see
footnote 1) that mimic specific types of host behaviors on the
Internet. We ran
one service at a time on the four corresponding treatment sub-
nets for at least 2 weeks to measure the impact on scanning
behavior caused by the specific host activity that is mimicked
by the service. More details on the methods for measuring
service effects are in §4.
Experiments (services) deployed.
The goal of our experiments is to identify the effect of IPv6
host activity on scanning behaviors. We achieve this goal by
simulating six types of host activity from the 24 /64 treatment
subnets. Each experiment uses a different method to leak the
liveness of the treatment /64 subnets to scanners. These methods
are based on findings from prior work which highlight the
sources of IPv6 addresses leveraged for efficient IPv6 scanning
(§7). We leak our addresses using a combination of direct and
indirect scanner contact approaches. Direct contact approaches
send packets directly to IPv6 addresses with the expectation of
receiving scanner attention in return. In comparison, indirect
contact approaches enlist our services in public lists that may
be monitored by scanners seeking to discover new IPv6
addresses. Our six experiment deployments are described below.
Footnote 1: In the remainder of this paper, experiments and
services are used interchangeably.
Experiment 1: Web crawls to popular websites. With this
direct contact experiment, we mimic web browsing from a
standard home network where users make connections to web
servers that scanners may be operating or may have tapped into
as sources of IPv6 addresses [56]. We first identified all
IPv6-capable websites in the Alexa Top 10K websites obtained
in 02/2021 by collecting their AAAA DNS records and checking
them for validity. In total, we found 2.6K IPv6-capable
websites which were the subject of our crawls. Following the
recommendations of Ahmad et al. [22], we conducted crawls
using both a simple CLI crawler (wget) and a full-fledged
browser driven by OpenWPM [31]. Our wget crawls did not load
third-party or dynamic content and therefore only established
direct connections with the web servers of each website, while
the OpenWPM crawls used Firefox to also load dynamic content
and make connections with all third-party web servers
associated with a website. Therefore, each crawl leaked the
same four host addresses to a different (but overlapping) set
of web servers. The crawls were conducted 2 weeks apart.
Experiment 2: Querying DNS open resolvers. In this direct
contact experiment, we leak our treatment subnet liveness
to open IPv6-capable DNS resolvers. Since no such list of
resolvers exists for IPv6, we used the approach of Hendriks
et al. [43] to identify IPv6-capable resolvers from IPv4 open
resolver lists. In total, we obtained 9K IPv6-capable open
resolvers and queried each of them for the AAAA record of
www.google.com. These queries were repeated every day for
a two week period. Therefore, this experiment leaked the
liveness of four treatment subnets to over 9K IPv6 open
resolvers.
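
To make the leak mechanism of Experiment 2 concrete, here is a minimal Python sketch of an AAAA query sent from a chosen source address. It is an assumption-laden illustration rather than the authors' tooling: the function names, the fixed query ID, and the timeout are hypothetical, and send_from needs a host that actually owns the source address, so it is not exercised in the demo.

```python
import socket
import struct

QTYPE_AAAA = 28  # DNS record type for IPv6 addresses
QCLASS_IN = 1    # Internet class


def build_aaaa_query(name: str, query_id: int = 0x1234) -> bytes:
    """Encode a DNS query for the AAAA record of `name` in wire format."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", QTYPE_AAAA, QCLASS_IN)


def send_from(source: str, resolver: str, payload: bytes, timeout: float = 3.0) -> bytes:
    """Send the query over UDP from `source`, exposing that address to the resolver."""
    with socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.bind((source, 0))  # choose the treatment-subnet address as egress
        sock.sendto(payload, (resolver, 53))
        response, _ = sock.recvfrom(4096)
        return response


if __name__ == "__main__":
    packet = build_aaaa_query("www.google.com")
    print(len(packet), packet.hex())
```

Binding the socket to a specific source address is what ties each query to one treatment subnet, so any scanning that follows can be attributed to that subnet.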
Experiment 3: NTP pool servers. In this indirect contact
experiment, we hosted four instances of NTP pool servers in
four treatment subnets. To ensure that each NTP pool instance
used a different egress IP address, we created four network
namespaces on our Linux machine to isolate the NTP servers
we hosted. Each network namespace has its own ports and IP
addresses, allowing each of our NTP pool servers to use a
different IP address associated with one of the four treatment
subnets allocated to this experiment. Each server was initially
configured with the NTP default parameters, which rate-limited
our responses to liveness probes from other NTP servers for the
first two weeks. These rate limits prevented the servers from
achieving the maximum pool score of 20 during this period.
During this time, the servers were usable by clients (and
therefore discoverable by scanners) but not recommended due to
a low pool score. In §4 we refer to this part of the experiment
as ‘NTP_pool’. We removed the rate limit after two weeks and
consequently our servers immediately achieved the maximum pool
score of 20 and were recommended for client use. We ran this
phase of the experiment for another two weeks and refer to it
as ‘NTP_pool20’. Note that NTP pool servers are not publicly
listed on a website, but are possible to enumerate [19].
Therefore, this indirect experiment leaked the liveness of its
four treatment subnets to scanners that enumerate NTP pool
server lists for scanning destinations.
Figure 1: Timeline of our experiments that simulate host
activity from a /56 IPv6 subnet we own.
Experiment 4: NTP public servers. While our NTP pool
servers were used by clients for synchronizing time, they were
not publicly listed and required additional effort to enumerate.
In this indirect contact experiment, we launched four NTP
Stratum 2 public servers that remained active for two weeks.
Unlike pool servers, these are published on an archived list
making them more visible to scanners [12]. This indirect
contact experiment leaked the liveness of its four treatment
subnets to scanners monitoring NTP server lists.
Experiment 5: Tor relays. For this indirect contact experiment,
we launched four Tor [28] middle relays with unrestricted
bandwidth in our subnet. These relays remained operational
for a two week period. Since our main purpose was to enlist on
the public Tor consensus, we chose middle relays as opposed
to entry or exit relays. We made this decision because entry
relays receive information about the clients connecting to Tor
and exit relays receive information about the destinations of
Tor traffic. We did not find such information appropriate to
gather and analyze. For the additional safety of Tor users, we
discarded any non-scanner traffic (defined in §3.1.3) destined
for the deployed relays. Therefore, this indirect contact exper-
iment leaked the liveness of the four treatment subnets to any
IPv6 scanners monitoring the Tor consensus.
Experiment 6: DNS zone files. Finally, we registered four
domains, two each with a .com and .net TLD. The AAAA records of
all four domains pointed to addresses from four of our
treatment subnets. Registering these domains with the com and
net TLDs results in them getting added to the largest TLD zone
files. Since prior work has leveraged these lists to identify
web services with IPv6 addresses, we expect to make indirect
contact with scanners monitoring these zone files.
Limitations of our experimental setup.
We selected the above-mentioned services for two reasons:
(1) they provide a range of common host activities that expose
their IP addresses to potential scanners and (2) prior work
uses them for IPv6 target generation and address extraction
[15, 37, 40]. These services allow us to "leak" our address
space to scanners and achieve two goals. First, they help us
identify how