Lost in Space: Improving Inference of
IPv4 Address Space Utilization
One challenge in understanding the evolution of Internet infrastructure is the lack of systematic mechanisms for monitoring the extent to which allocated IP addresses are actually used. In this paper we try to advance the science of inferring IPv4 address space utilization by analyzing and correlating results obtained through different types of measurements. We have previously studied an approach based on passive measurements that can reveal used portions of the address space unseen by active approaches. In this paper, we study such passive approaches in detail, extending our methodology to four different types of vantage points, identifying traffic components that most significantly contribute to discovering used IPv4 network blocks. We then combine the results we obtained through passive measurements together with data from active measurement studies, as well as measurements from BGP and additional datasets available to researchers. Through the analysis of this large collection of heterogeneous datasets, we substantially improve the state of the art in terms of: (i) understanding the challenges and opportunities in using passive and active techniques to study address utilization; and (ii) knowledge of the utilization of the IPv4 space.
Lost in Space: Improving Inference of
IPv4 Address Space Utilization
|Alberto Dainotti, Karyn Benson, Alistair King, kc claffy, Eduard Glatz,|
|Xenofontas Dimitropoulos, Philipp Richter, Alessandro Finamore, Alex C. Snoeren|
|CAIDA, UC San Diego, ETH Zurich, FORTH, TU Berlin, Politecnico di Torino|
In 2012, APNIC and RIPE exhausted their IP address pools; the other Regional Internet Registries (RIR) will likely exhaust their pools soon [?]. Running out of the Internet address space has been anticipated for decades, accompanied by intense debates over address management policy, IPv6 transition, and IPv4 address markets [?, ?, ?]. However, only one project has invested considerable effort in attempting to measure how many allocated addresses are actually being visibily used [?], where used was defined as “directly responding to an ICMP echo request”. (Answer: less than 4% of routed addresses.) In this study our objective is to build on this landmark work, by refining the definitions of used, and extending the measurement methods to include other types of data, including those that can reveal address usage not visible to ICMP.
Figure Lost in Space: Improving Inference of IPv4 Address Space Utilization (explained in more detail in Lost in Space: Improving Inference of IPv4 Address Space Utilization) taxonomizes addresses as IETF reserved [?] and usable, the latter category we further classify according to whether they are assigned to an organization, visible to the global BGP interdomain routing system, and/or observably sending traffic. We have developed a methodology to measure and characterize IP address blocks per this taxonomy, and we use active probing and passive traffic measurement to distinguish between the lower left leaves of the tree: used and unused address blocks.
Internet-wide active probing poses at least four challenges: 1) measurement overhead, 2) potential violation of acceptable usage policies triggering complaints or blacklisting of the measurement infrastructure, 3) measurement bias due to operational filtering of scanning; and 4) inability to scale for use in a future IPv6 census. We recently showed [?] that passive measurements from darknets and an academic network reveal an additional active /24 address blocks not visibly active according to ICMP-based Internet census measurement. However, passive measurements introduce their own challenges, most notably the presence of traffic using spoofed source IP addresses, which can badly pollute estimates if not removed. In [?], we validated our methodology on two sources of traffic data available to us in 2012.
In this study, we first analyze the general applicability of passive measurements to survey Internet address usage. We extend the methodology introduced in [?], the most important aspect of which is removing spoofed traffic, to work with four different types of networks and measurement data: (i) full packet traces from a large darknet; (ii) netflow logs from a national academic network; (iii) sampled packet traces from a large IXP; (iv) traffic classification logs from residential customers of a European ISP. We analyze how inferences of active address blocks can be influenced by characteristics specific to traffic observation vantage points, such as traffic composition, size of the monitored address space, and duration and time of the measurement. We found that our VPs were reasonably robust to variations in these characteristics: we observe a substantial fraction of address space at all VPs or when observing from using smaller fractions of address spaces (where we could test that); and each VP saw a consistent number of /24 blocks over a two-year period.
After gaining confidence in our methodology, we used seven passive and active measurement datasets collected from July 2013 through Oct 2013 to perform the first extended IPv4 Census using the taxonomy in Figure Lost in Space: Improving Inference of IPv4 Address Space Utilization. We compared our results to the state of the art represented by the ISI census [?] and found 718k previously undiscovered used /24 blocks, an increase of 15.6% over ISI. We also inferred used space from 98.9% of the ASes announcing in BGP vs. 94.9% discovered by ISI, and obtained a visible increase in intra-AS coverage (per-AS percentage of BGP-announced /24 blocks inferred as used).
Our results show that only 5.3M /24 blocks are used (37% of usable IPv4 space), and that 3.4M assigned /24 blocks are not even visible in the global BGP routing system. We analyze how unused space is distributed across RIRs, countries, continents, and ASes. We inferred that only 9.5% of the legacy /24 blocks are used and that most unused address blocks are in the U.S.
Finally, we discuss how previous scientific studies of Internet-related phenomena might change if they used this extended dataset instead of other related data sets to estimate the address space of ASes or countries.
Section Lost in Space: Improving Inference of IPv4 Address Space Utilization and Section Lost in Space: Improving Inference of IPv4 Address Space Utilization describe related work and the datasets we use in our study. Section Lost in Space: Improving Inference of IPv4 Address Space Utilization explains how we extract routed /24 address blocks from BGP and registry data. Section Lost in Space: Improving Inference of IPv4 Address Space Utilization provides a detailed evaluation of our passive traffic methodology. Section Lost in Space: Improving Inference of IPv4 Address Space Utilization combines passive and active measurement approaches to achieve unprecedented coverage in an IPv4 address space survey. Section Lost in Space: Improving Inference of IPv4 Address Space Utilization characterizes the utilization of the address space and Section Lost in Space: Improving Inference of IPv4 Address Space Utilization offers promising directions for applicability and extension of this work. We will share all datasets we are allowed to release per the AUP of their owners.
Huston [?, ?, ?, ?] has provided a wealth of statistics and projections related to IPv4 address space allocation and announcement in global routing tables, although he does not attempt to discern which routed addresses are actually used (for any defintion). Meng et al. [?] found that 90% of IPv4 prefixes allocated from 1997-2004 appeared in the global routing system within 75 days.
With respect to measurement to evaluate actual address usage, the landmark work is USC’s long-standing effort [?], based on comprehensive ICMP probing of the IPv4 space. Probing every routed IPv4 address over 30 days, repeated multiple times between 2005 and 2007, they observed only 3.6% of allocated addresses responding [?]. In developing their methodology, they compared ICMP and TCP probing to passive traffic observation of USC addresses on USC’s own campus network, finding 14% more USC IP addresses visible to ICMP than to TCP, and 28% more USC IP addresses visible to passive traffic observation than to either ICMP or TCP active probing. But each method observed some IP addresses missed by other methods. We also found active and passive methods are able to observe different subsets of addresses (Section Lost in Space: Improving Inference of IPv4 Address Space Utilization), but unlike [?], we use our passive monitors to infer usage about the entire Internet instead of only hosts internal to a network we monitor.
In 2013 Durumeric et al. [?] explored the system challenges of active Internet-wide scanning in developing Zmap, a scanner that probes the entire IPv4 address space in under 45 minutes from a single machine. Accelerated scanning was also a goal of an Internet Census illegally (and anonymously) performed in 2012 from a botnet [?], although their methods were neither well-documented nor validated [?].
Others have also explored the use of passive data to estimate specific usage characteristics of IPv4 addresses. Zander et al. [?] estimated the number of used IPv4 addresses by applying a capture-recapture method for estimating population sizes on active and passive measurement logs of IP addresses collected from sources such as web servers and spam blacklists. In contrast, we also infer which IP addresses are unused. Bartlett et al. [?] found that passive traffic observation and active probing complemented each other for the purpose of discovering active network services on campus.
Cai et al. [?] explores (and undertake several) potential applications of clustering active probes to infer address usage, including understanding how efficiently individual address blocks are used, assessing the prevalence of dynamic address management, and distinguishing low-bitrate from broadband edge links.
|Dataset||Source type||Data format||Period|
|UCSD-NT [?]||Traffic: Darknet||full pkt traces||July 23 to August 25, 2013|
|SWITCH [?]||Traffic: Live Academic Net.||Netflow logs||July 23 to August 25, 2013|
|IXP [?]||Traffic: IXP||sFlow packet samples||July 8 to July 28, August 12 to September 8, 2013|
|R-ISP [?]||Traffic: Residential ISP||Tstat[?] logs||from July 1 to September 31, 2013|
|ISI [?]||Active Probing: ICMP ping||logs||July 23 to August 25, 2013|
|HTTP [?]||Active Probing: HTTP GET||logs||October 29, 2013|
|ARK-TTL [?]||Active Probing: traceroute||logs||July to September, 2013|
|BGP [?, ?]||BGP announcements||RIBs||July to September, 2013|
|Available Blocks [?]||IANA/RIRs||IP ranges||October 1, 2013|
|NetAcuity Edge [?]||IP Geolocation||IP ranges||July 2013|
|prefix2AS [?]||BGP announcements||prefix to ASN||July 2013|
Table Lost in Space: Improving Inference of IPv4 Address Space Utilization lists characteristics of the datasets – collected between July and October 2013 – from which we extracted /24 blocks and inferred attributes.111We did not use reverse DNS PTR scans of the IPv4 space for the same reasons articulated in [?], namely that many active IP addresses lack DNS mappings, and many unused IP addresses still have (obsolete) DNS mappings.
We analyzed BGP announcements captured by all collectors (24 collectors peering with 184 peers) of the Routeviews [?] and RIPE RIS [?] projects. For each collector we took all routing tables (dumped every 2 hours by Routeviews and 8 hours by RIPE RIS) and built per-day statistics for each peer. For each /24 block, we computed the maximum number of peers that saw it reachable at any time within the full observation period of 92 days.
To determine which address blocks are available for assignment, we used a dataset compiled by Geoff Huston [?], which merges the extended delegation files from the 5 RIRs [?, ?, ?, ?, ?] with IANA’s published registries [?, ?, ?, ?, ?, ?]. We classified as available any /24 blocks falling in address ranges in this data set that were marked as either “available” (i.e., allocated to an RIR but not yet assigned to an LIR or organization) or “ianapool” (i.e., IANA has not allocated it to an RIR). This data does not have LIR granularity, thus we considered any block allocated to an LIR as assigned (i.e., not available).
We labeled as rfc5735 all /24 blocks within network ranges reserved by IETF (private networks, multicast, etc.) [?].
We apply our passive methodology for inferring used /24 blocks to the following four vantage points (VP), each of which retains traffic data in different formats (Figure Lost in Space: Improving Inference of IPv4 Address Space Utilization) and thus requires different approaches to filtering for use in a census (Section Lost in Space: Improving Inference of IPv4 Address Space Utilization).
SWITCH. We collected unsampled NetFlow records from all the border routers of SWITCH, a national academic backbone network serving 46 single-homed universities and research institutes in Switzerland [?]. The monitored address range of SWITCH contains 2.2 million IP addresses, which correspond to a continuous block slightly larger than a /11.
R-ISP. We collected per-flow logs from a vantage point monitoring traffic of about 25,000 residential ADSL customers of a major European ISP [?]. The VP is instrumented to run Tstat, an open source passive traffic flow analyser [?] that stores transport-level statistics of bidirectional flows, and uses internal network knowledge to label flows as inbound or outbound.
UCSD-NT. We collected full packet traces from the /8 network telescope operated at the University of California San Diego [?].
IXP. Our fourth VP is a large European IXP interconnecting more than 490 networks, exchanging more than 400 PB monthly [?]. We have access to randomly sampled (1 out of 16K) packets, capturing the first 128 bytes of each sampled Ethernet frame exchanged via the public switching infrastructure of this IXP. A sample includes full Ethernet, network- and transport-layer headers, along with a few payload bytes.
ISI. We used the ISI Internet Census dataset it55w-20130723 [?], obtained by probing the routed IPv4 address space with ICMP echo requests and retaining only those probes that received an ICMP echo reply from an address that matched the one probed (as recommended [?]). Note that the ISI Census experiment was designed to report at a /32 (host) rather than /24 (subnet) granularity, but we apply the resulting data set to a /24 granularity analysis.
HTTP. We extracted IP addresses from logs of Project Sonar’s HTTP (TCP port 80) scan of the entire IPv4 address space on October 29, 2013 [?]. For each /24 block, we stored how many IP addresses responded to an HTTP GET query from the scan.
Ark-TTL. We processed ICMP traceroutes performed by CAIDA’s Archipelago to each /24 in the routed IPv4 address space between July and September 2013 [?]. Specifically, in order to add a third type of active probing data, we extracted the ICMP Time Exceeded replies sent by hops along the traceroute path.
To establish a mapping from /24 block to ASN, we merged all CAIDA’s Routeviews Prefix to AS [?] mappings files for July 2013. For each /24 in the IPv4 address space, we identified the set of overlapping prefixes and chose the most specific. We found 116k /24s (out of more than 10M) that mapped to multiple ASNs (due to multi-origin ASes and AS sets), which we omitted from our per-AS computations (Sections Lost in Space: Improving Inference of IPv4 Address Space Utilization and Lost in Space: Improving Inference of IPv4 Address Space Utilization).
We geolocated each /24 block using Digital Element’s NetAcuity Edge [?] database from 6 July 2013. For each /24, we identified the unique set of country codes to which overlapping blocks map. We found 27k /24s (out of more than 14M) that map to multiple countries, which we excluded from the geographic visualization in Section Lost in Space: Improving Inference of IPv4 Address Space Utilization.
Which address blocks can we consider globally routed? Of the unrouted space, which is assigned vs. available?
To distinguish legitimately routed address blocks from those that appear routed due to misconfigurations or hijacking, we consider a /24 block as routed only if covered by a prefix visible by at least 10 BGP peers. RIPE recommends this threshold [?], which we believe is reasonable since it removed 99.93% of the /24 blocks we previously determined are available or reserved by IETF (Section Lost in Space: Improving Inference of IPv4 Address Space Utilization and Figure Lost in Space: Improving Inference of IPv4 Address Space Utilization) and thus could not be legitimately routed via BGP. Applying this threshold, also excludes another 4.1M /24 blocks that we would have otherwise labeled (likely incorrectly) as routed. We also filter out any other /24 blocks known to be in the available category (i.e., not assigned), even if observed by more than 10 peers. Our filtering yields 10.4M routed /24 blocks that we must further classify as used or unused.
Of the 4.1M unrouted /24 blocks (those we cannot observe in BGP), we know that .7M are available and thus unassigned, which leaves 3.4M /24 blocks that are assigned to organizations (many of whom announce other IPv4 address space) and yet not routed. In other words, 53 /8’s worth of address space are not used for the purpose of global BGP reachability.
How can we effectively extend our previous methodology [?] to different types of traffic?
We build on our initial methodology [?] that we used to analyze traffic captured in 2012 at two of the four VPs described in Section Lost in Space: Improving Inference of IPv4 Address Space Utilization and Figure Lost in Space: Improving Inference of IPv4 Address Space Utilization (SWITCH and UCSD-NT). We extended this method to work with the fundamentally different types of traffic collected at the other two VPs (R-ISP and IXP), specially how to filter out spoofed traffic (Section Lost in Space: Improving Inference of IPv4 Address Space Utilization). We then evaluated the impact on our inferences of varying aspects of the vantage points: traffic composition size of monitored address space, and duration and times of measurement (Section Lost in Space: Improving Inference of IPv4 Address Space Utilization).
The main challenge in curating traffic data for use in a census is to remove spoofed traffic from the data sets, since it can severely distort estimates of address utilization. Since the R-ISP data retains bidirectional flow information and is guaranteed to see both directions of every flow, filtering out spoofed traffic is easy. For the IXP, the sampled data collection and the frequently asymmetric traffic flow (i.e., only one direction of a flow may traverse the IXP) mean that we cannot use the obvious and most reliable technique to infer spoofed traffic (i.e., failed TCP flow completion, variants of which we use for R-ISP and SWITCH data). Indeed, the IXP data sees only one packet for the vast majority of flows. The IXP traffic data also introduces a new challenge: filtering out packets with potentially unused destination addresses (e.g., scanning packets).
Although each VP’s data set requires its own technique, we tune and validate each technique using the same assumption: packets appearing to originate from [or destined to] unrouted blocks are potentially spoofed [or scanning] packets. As an additional source of validation, we compare our results at the SWITCH, R-ISP, and IXP VPs against network blocks that we know to be unused, i.e., the dark /24 blocks in the UCSD-NT address space 333Some addresses within this “darknet” are actually used and their traffic is not collected. (62,838 /24 blocks). To validate our estimates of spoofed traffic at the UCSD-NT VP, we use the /24 blocks from SWITCH that we infer to be dark because they did not generate a single bidirectional flow in the whole observation period (5,003 /24 blocks). We use these data only with UCSD-NT because their observation periods exactly match.
To filter spoofed traffic, we use the same heuristic we introduced in [?], which extracts from Netflow records bidirectional TCP flows with at least 5 packets and 80 bytes per packet on average. We performed a sensitivity analysis on these thresholds in [?], and found that they diminish the probability that the remote IP address is spoofed. Using this heuristic leads us to infer as used only 0.004% and 0% of the unrouted and the UCSD-NT /24 blocks, respectively (Table Lost in Space: Improving Inference of IPv4 Address Space Utilization).
Unlike the other traffic data sources, the R-ISP’s use of Tstat automatically removes essentially all spoofed traffic, since to be logged a TCP flow must complete the 3-way handshake. For UDP traffic, we extracted only bidirectional flows initiated locally with at least 1 packet with payload transmitted in both directions.
In [?] we looked deeply into several spoofing events to derive filters that would allow us to filter such events from darknet traffic in general. Two phenomena that we found to be indicators of a spoofing event were: (i) spikes in the numbers of both unrouted and overall /24 blocks per hour, and (ii) traffic using the same ports and protocols with a high fraction of unrouted source /24 blocks. We derived heuristics to filter the most recurring spoofing behavior (Table Lost in Space: Improving Inference of IPv4 Address Space Utilization), plus some specific large spoofing events (grouped in the table as “All Specific Filters”). Many types of spoofing observed in our 2012 study [?] were also present in 2013. In addition, we added two filters: TCP packets with no flags set and UDP packets without payload.
|Filter Type||Total /24s||Unrouted|
|TTL 200 and not ICMP||10,588,879||1,278,027|
|Least signif. byte|
|src addr 0||45,382||7|
|Least signif. byte|
|src addr 255||444,346||6,691|
|Same Src. and Dst. Addr.||96||0|
|No TCP Flags||3,449||638|
|UDP Without Payload||545||114|
|All Specific Filters||10,587,049||1,280,826|
After applying our filters, we observe more than 3 million /24 blocks. Table Lost in Space: Improving Inference of IPv4 Address Space Utilization shows that our filtering heuristics reduce traffic appearing to originate from unrouted or dark networks to around 0.05% (compared to 31.6% and 90.9% unrouted and dark blocks, respectively, before filtering).
For the IXP, we considered only TCP traffic and discarded TCP packets with the SYN flag set, which reduced the number of observably used /24s from 14.4M to 5.7M /24s. We then used a heuristic that tries to filter out /24s observed as used only due to spoofing noise. This heuristics is based on the number of packets and average packet size from and to a given /24 block. The first heuristic imposes a trade-off between false positives and false negatives: if we set the threshold high enough, we are more likely to filter out /24s that contain only IP addresses being used in spoofed source address packets. But we will also lose many legitimately used /24 blocks, especially since we only have 1:16K sampled packet data in to begin with. The average packet size threshold complements the packet count threshold by increasing the likelihood of retaining /24s that are actually exchanging TCP payload.
The left plot in Figure (a) shows the number of unrouted blocks that we inferred as used based on source addresses of the sampled packets (the darker the color the higher the number of unrouted blocks, log scale).
Interestingly, we see almost no packets in the IXP data set destined to unrouted /24 blocks, perhaps because there are no default routes advertised across BGP peering (vs. transit) sessions at the IXP, so only explicitly routed addresses will be observed as destinations. We can still use dark but routed destination addresses as indicators of scanning traffic; the right plot in Figure (a) shows the number of dark /24 blocks inferred as used when considering the destination addresses of packets. The average packet size is highly efficient at removing canning traffic.
To find an appropriate combination of thresholds, Figure (b) plots an ROC-like curve that shows the number of /24 blocks inferred as used (y axis) against the percentage of unrouted (for source addresses) and number of dark (destination addresses) /24 blocks. For a given requirement (e.g., “less than 0.1% unrouted”), we find the combination of thresholds (minimum number of packets and minimum average packet size) that results in the largest set of used /24 blocks.
To keep the error in our inference low, we consider a very conservative threshold (as shown as the dashed vertical lines in Figure (b)) and select used /24s from the SRC and DST addresses independently. The resulting numbers are depicted in Table Lost in Space: Improving Inference of IPv4 Address Space Utilization.444Note that blocks inferred by the source heuristic may also introduce dark /24s. Our antispoofing approach is efficient, reducing the number of unrouted and dark /24s dramatically, even for sampled traffic. Nevertheless, we point out that the number of used /24s directly depends on the thresholds applied and that the false-negative rate increases with more conservative thresholds. Hence, we likely miss used /24s with our current threshold selection.
We found similar behavior with UDP (as TCP) but we needed to set thresholds more conservatively, particularly for average packet size. We did not include UDP-based inferences in our final dataset, since the additional gain in terms of /24s was not significant compared given our lower confidence in the inferences.
After filtering spoofed traffic to the best of our ability, we analyzed the impact of four characteristics specific to a given vantage point on the number of /24s observed: traffic characteristics, size of address space monitored, and duration or specific time of monitoring. We found that our VPs were reasonably robust to variations in these characteristics, i.e, we observe a substantial fraction of address space at all VPs or when observing from smaller fractions of the address spaces (where we could test that), and each VP saw a consistent number of /24 blocks over a two-year period.
How do traffic characteristics specific to a VP influence its contribution to the inferences?
Characterizing traffic at our VPs assists with two objectives: (i) highlighting how the VP contributes to the census; and (ii) ensuring that traffic components specific to a VP do not skew our findings or make them not generally applicable. That is, to legitimately use passive traffic data for a census, we need to convince ourselves that a given VP is not observing a special set of /24 blocks. Fortunately for our purpose, we found that the number of /24s we inferred did not vary dramatically across VPs with substantially different traffic compositions. (We could not analyze traffic composition from the IXP due to the sampled packet capture.)
SWITCH. SWITCH hosts many popular services that attract end users to the monitored address space, including: a website hosting medical information, a SourceForge mirror, PlanetLab nodes, university web pages, and mail servers. The most popular service at SWITCH, the medical information website, exchanged traffic with hosts in 1.8M /24 blocks; however, all other IP addresses in SWITCH combined to capture 96.7% of all 3.6M blocks observed by SWITCH. The top 100 services in SWITCH each observe over 70K /24 blocks, and collectively contribute 91.2% of the /24 blocks observed at this VP. Compared to the UCSD-NT and R-ISP vantage points, SWITCH’s value as a VP depends more on these popular IP addresses. If SWITCH did not host its top 1000 most popular IP addresses, (i.e., the top services), it would observe only 69.9% of the /24 blocks it otherwise observes, compared to 89.7% and 97.5% at R-ISP and UCSD-NT respectively.
R-ISP. Table Lost in Space: Improving Inference of IPv4 Address Space Utilization aggregates the Tstat-identified traffic categories observed at R-ISP into five traffic components accounting for 97% of /24 blocks observed at the ISP. While HTTP and HTTPS account for 57.7% of the traffic volume, they contribute only 6.8% of the /24 blocks observed at the VP. Instead, the largest source of /24 blocks comes from client-to-client communication (e.g., P2P and VoIP). P2P is a key contributor, as 610k /24 blocks are only observable through P2P traffic.
UCSD-NT. The non-uniform nature of darknet traffic is well-known [?, ?], but our 2012 study of UCSD-NT and a darknet of comparable size observed a similar number of /24 blocks in a 34 day study (3.14M vs 2.98M) [?]. Surprisingly, P2P also plays a key role at the UCSD-NT VP, where we observe 2.2M /24 blocks (357k unique) from traffic with a BitTorrent payload (see Table Lost in Space: Improving Inference of IPv4 Address Space Utilization), probably caused by index poisoning attacks [?]. To a lesser extent, networks with end users are exposed through malware-infected hosts (e.g., Conficker and scanning). Alternatively, the backscatter traffic (a result of a spoofed DoS attack) reveals networks likely hosting services. Two classes (UDP 0x31101010A 58-byte with payload mostly 0x00 and ninth byte set to 0x31. The most common destination port (39455) is observed but unclassified in [?, ?]. and Encrypted) are of unknown purpose.
What is the effect of vantage point size?
Analyzing the effect of vantage point size on the number of /24 blocks observed is not straightforward due to the non-uniform nature of the monitored address space. Notwithstanding the extraordinary popularity of some IP addresses, as well as non-uniform assignment of hosts within an address subnet, we found an interesting correlation: the median number of /24 blocks observed is roughly proportional to the log of the number of monitored IP addresses. Consistent with this observation, the marginal utility of monitoring an additional IP address declines as the size of the vantage point increases, as expected.
How does the duration or time of collection affect the inference of which /24s are used?
Figure Lost in Space: Improving Inference of IPv4 Address Space Utilization shows the logarithmic but varied growth of the number of /24 blocks collected over time for our four VPs. SWITCH, which initially captures the fewest /24 blocks has the fastest growth rate; while the R-ISP and IXP VPs capture more /24 blocks initially but they grow more slowly. Other factors that can influence inferences are strong changes in traffic composition, e.g., flash events. Our traffic data sets all had low (max 2%) standard deviation in the number of /24 blocks observed per week, with no abnormal events observed.
However, when observing measurements from a broader time frame, we found evidence of flash events and changes in traffic. For example, in August 2012 (the year before our study), SWITCH web sites hosting content about shark protection experienced a sharp increase in visits (and thus observed /24 blocks); the Discovery Channel’s Shark Week aired that month. Figure Lost in Space: Improving Inference of IPv4 Address Space Utilization shows per-month sample measurements using our methodology over a period of two years. All vantage points except UCSD-NT observed a similar number of /24 blocks per month. At UCSD-NT, changes in IBR composition resulted in a corresponding increase in visible /24 blocks. Specifically, (i) in July 2012, there was an increase in BitTorrent traffic; (ii) in March 2013, there was a large increase in the darknet’s backscatter category, possibly related to the DDoS attacks on Spamhaus [?]. Such events may bring additional data but note that for the last two years each VP consistently observes over 3M /24s.
In this section, we first combine our seven datasets obtained from active and passive measurements to break down the routed node in Figure Lost in Space: Improving Inference of IPv4 Address Space Utilization into used and routed unused categories (we filtered all the datasets used in this section to include only /24 blocks marked as routed according to Section Lost in Space: Improving Inference of IPv4 Address Space Utilization). We then compare our results to the state of the art represented by the ISI census (Section Lost in Space: Improving Inference of IPv4 Address Space Utilization).
What are the respective contributions and limitations of active and passive measurements? Are passive measurements from multiple VPs useful?
The top half of Table Lost in Space: Improving Inference of IPv4 Address Space Utilization shows the number of /24 blocks discovered by each active approach and their unique contribution. The large number of /24 blocks found by ISI and HTTP, and their distinct contributions within the set of active measurements, are unsurprising because we know that ICMP and TCP port 80 probing are among the most effective active probing methods that capture different but overlapping populations [?, ?]. More interesting is the 40k additional /24 blocks that we obtain from the Ark dataset; we speculate that routers may be sending TTL exceeded packets using a source address from what they use in ICMP echo responses.
|Dataset||# /24s||# Unique /24s||# Unique /24s|
|same family||+ passive|
The bottom half of Table Lost in Space: Improving Inference of IPv4 Address Space Utilization) compares the contribution of our passive measurements. The merged results from our four passive VPs do not entirely cover the set observed by active measurements, missing about 840k /24 blocks. However, the same data includes 470k /24 blocks not observed through active measurements, demonstrating the value of combining active and passive datasets.
Each passive vantage point offers a unique contribution, shown in the third and fourth columns of Table Lost in Space: Improving Inference of IPv4 Address Space Utilization, suggesting that these measurements are not exhaustive and that using more vantage points would improve the coverage. In particular, when we examine the portion of the address space observed exclusively by passive approaches (470k /24 blocks, not shown in the table), we find that only 17% of it was visible by all four vantage points, while 41% came from the sum of each unique contribution (4th column in Table Lost in Space: Improving Inference of IPv4 Address Space Utilization).
Since 3 out of 4 vantage points are in Europe, we test for the possibility of geographical bias in the passive measurements. Table Lost in Space: Improving Inference of IPv4 Address Space Utilization shows the percent increase of /24 blocks discovered by merged passive+active data vs. active measurements alone. The larger increase in European coverage vs. other continents (middle column) is consistent with a slight bias from to the European vantage points, but on a per-continent basis the marginal increase spreads more easily across continents (right column, noting that the lower three continents have so much less address space that any increase will be relatively large in percentage terms.)
|% of newly||per-continent|
|/24 blocks||of /24 blocks|
We also explored why so much space is discovered by the active but not the passive measurement in our data sets. Perhaps our heuristics to remove spoofed traffic are too conservative and remove much legitimate traffic. Also, for IXP and SWITCH, we included only TCP traffic which could have limited our view; curating UDP and other traffic (removing spoofed and scanning traffic) is future work.
|# VPs||# ISI-special /24s||# single-IP /24s|
Our results also reveal large fractions of IPv4 space visible only by active measurements that do not generate traffic on the public Internet, shedding doubt on whether they are really “used” for the purpose of global BGP reachability. In particular, we found most /24 blocks from the ISI dataset with a single responding address whose last octet was 0, 1, or 255 (the “isi_special” column) were not observed in our passive measurements. Table Lost in Space: Improving Inference of IPv4 Address Space Utilization shows the distribution of the number of passive vantage points that saw such /24 blocks (2nd column), as well as all /24s in the ISI data that had only a single non-special responding IP address (3rd column). Many of these blocks were not visible to any of our vantage points during the observation periods (Table Lost in Space: Improving Inference of IPv4 Address Space Utilization, 1st row), including the vast majority of the “isi_special” /24 blocks. The progression from /24 blocks observed by 1 to 4 VPs shows a rapid decay for “isi_special” blocks (middle column), while there is almost no trend for /24s in the right column. We conclude that /24s represented in the middle column likely do not send traffic to the public Internet.
We manually investigated other cases of network blocks only visible to active probing, identifying special cases that suggest absence of traffic on the public Internet, including clusters of /24 blocks apparently used as internal CDNs by large service providers. We plan a more thorough investigation of this behavior as future work.
The last row of Table Lost in Space: Improving Inference of IPv4 Address Space Utilization shows the final number (5.3M) of /24 blocks we infer as used combining our 7 active and passive datasets (leftmost leaf in Figure Lost in Space: Improving Inference of IPv4 Address Space Utilization). We subtract this from the total amount of BGP-routed space (10.4M) to arrive at an estimate of 5.1M routed unused /24 blocks, an impressive quantity of unused but BGP-reachable IPv4 space.
What is the improvement of our combined approach to infer utilization in the routed space with respect to the state of the art (ISI census)?
We consider the ISI Census [?] to be the state of the art in inferring address space utilization within the routed space. Since there is no ground truth available about which routed space is actually utilized, we present our results in terms of additional IPv4 space coverage we obtain when combining our 7 datasets (which include ISI). We consider coverage at three different levels: (i) the percentage of routed /24 blocks inferred as used (global coverage); (ii) the percentage of ASes announcing the /24 blocks inferred as used out of the ASes that announce at least one BGP prefix (44628 ASes) (AS-level coverage); (iii) for each AS, the percentage of routed /24 blocks inferred as used (intra-AS coverage). AS-level coverage is the only case in which we expect the upper bound to approximate ground truth (i.e., it is reasonable to assume that an AS announcing prefixes on BGP uses at least one /24 block).
We found 718k previously undiscovered used /24 blocks (difference between last and 1st row of Table Lost in Space: Improving Inference of IPv4 Address Space Utilization), bringing global coverage from 44% to 51%. Figure Lost in Space: Improving Inference of IPv4 Address Space Utilization shows that adding just a single dataset can greatly improve the global coverage. As we include our additional datasets, there is considerable amount of overlap. If we were to include additional measurements of used address space, the actual number of /24 blocks would be highly dependent on the quality and diversity of the datasets. However, if we consider the logarithmic trend suggested by our observations, increasing the number of additional datasets from 6 to 12 would result in approximately 200k more /24 blocks.
Our AS-level coverage is 98.9% versus 94.9% found by ISI. We manually analyzed whois and BGP data for the 489 ASes for which we did not infer a single used /24 block. We found that 37 ASes associated with U.S. military organizations accounted for 79% of the (17080) /24 blocks advertised by these 489 unobserved ASes. We suspect such networks do not transmit ICMP, TCP or UDP traffic over the public Internet (but they may be tunneling traffic using, e.g., IPSEC, which we did not capture in our passive measurements.) The vast majority of the remaining ASes (399 out of 452) announce 10 or fewer /24 blocks.
Figure Lost in Space: Improving Inference of IPv4 Address Space Utilization shows the intra-AS coverage obtained with our combined approach as a function of results obtained by ISI (the graph is sorted by increasing ISI intra-AS coverage, with bins of 2%). The bottom graph shows the number of ASes per bin. In the top graph, the bottom grey bar represents the minimum intra-AS coverage obtained by ISI for the ASes in the bin, whereas the remaining 4 (colored) bars refer to the intra-AS coverage obtained by our combined approach (which includes ISI). Each of these 4 bars represents a quartile of the ASes in the bin. For a given bar, its bottom and top show on the axis, respectively, the lower and upper bound of the coverage we obtain for such ASes. For example, in the first bin, the bar from the median to the upper quartile shows intra-AS coverage between 23% and 100%. The graph shows visible increments across the whole axis (decreasing as ISI intra-AS coverage approaches 100%). This result shows that even for ASes which responded to ISI’s pings (), our additional datasets reveal new /24 blocks (i.e., ASes do not exhibit a uniform behavior across their used subnets with respect to ICMP echo requests). In most of the bins, for half of the ASes (i.e., two bottom quartiles) we obtain a few percentage point increase. The two upper quartiles show more significant increments, e.g., up to , for ASes in the upper quartile we see about 20% more /24 blocks (at least). The first bin shows different behavior, with at least 25% of ASes covered entirely by our method (although most of these ASes announce only one /24).
SWITCH is the only AS for which we can derive better reference data (rather than simply using the 100% upper bound): from 23 July to 25 August 2013, all 9,271 /24 blocks within SWITCH were announced in BGP, but only 49% of these blocks generated bidirectional flows. Assuming these are the only used /24 blocks in SWITCH, we should not infer an intra-AS coverage above 49% for this AS (instead of considering 100% of the routed /24 blocks according to our definition of upper bound). ISI’s inferred 20.9% intra-AS coverage for this AS; our combined approach (without data from the SWITCH VP) reached 33.1%. Still almost 16% of the used blocks of the AS were not discovered by our approach, showing space for further improvement. However, for all other ASes we would include the SWITCH VP in our analysis, potentially resulting in a higher intra-AS coverage.
How is used/unused and available space distributed across RIRs, ASes, countries and continents? Which ASes or countries make the worst use of the space they have been assigned? Would previous scientific studies of Internet-related phenomena change if they used this dataset instead of other related data sets?
Finally, we examine IPv4 address space utilization from the perspective of our inferences. We emphasize that our inferences do not provide complete coverage of the used IPv4 address space, but it is the first dataset which includes ASes and network blocks that do not reply to ICMP probing. All our data is from approximately the same period (from July 2013 through Oct 2013). We assume that usage of the address space does not change significantly within a period of 4 months.
Figure Lost in Space: Improving Inference of IPv4 Address Space Utilization illustrates a Hilbert map of IPv4 address space utilization based on our results, taxonomized in Figure Lost in Space: Improving Inference of IPv4 Address Space Utilization. The IETF reserved space accounts for 2.3M address blocks, or 13.7% of the entire IPv4 address space (blue). The remaining usable 14.5M address blocks consist of 5.3M (37%) used (red), 5.1M (35%) routed unused (grey), 3.4M (23%) unrouted assigned (black), and 0.7M (5%) available (green). Out of the 4.1M unrouted /24 blocks (Figure Lost in Space: Improving Inference of IPv4 Address Space Utilization), 3.4M /24 blocks are assigned to organizations and yet not routed. These numbers are striking and suggest revisiting the topic of IPv4 address depletion. First, an enormous amount of IPv4 address space is assigned to organizations that do not even announce it on the BGP plane (i.e., there is no need to perform inference through additional active/passive measurements to sketch this phenomenon). In addition, since we verified that several of these organizations announce on BGP other address blocks they have been assigned, such number also suggests that our inference of large unutilization of routed space is realistic.
Figure Lost in Space: Improving Inference of IPv4 Address Space Utilization classifed IPv4 addresses by their RIR region, or as legacy addresses if they were allocated before the RIR system began. Legacy addresses were allocated by the central Internet Registry prior to the RIRs primarily to military organizations and large corporations such as IBM, AT&T, Apple. Some of this space is now administered by individual RIRs. We use the IANA IPv4 address space registry [?], which marks legacy space and its designation at a /8 granularity. In Figure Lost in Space: Improving Inference of IPv4 Address Space Utilization, we observe that the set of legacy routed unused and unrouted assigned addresses are similar in size (5.1M /24s) to the entire used address space (5.3M /24s). 42% of the usable address blocks are legacy; these blocks are more lightly utilized (9.5% of the legacy) and include more unrouted assigned (45% of the legacy) addresses than the RIRs (56% and 7.7% of the RIR address blocks, respectively). ARIN, RIPE, APNIC, and LACNIC have 50%, 65%, 54% and 68% of their address blocks used, respectively, in contrast to AFRINIC which has fewer of their blocks used (31%) and many more available (38%) address blocks than other RIRs (6.7% of other RIR addresses are available).
Table Lost in Space: Improving Inference of IPv4 Address Space Utilization lists the top-5 continents and countries in unused and unrouted assigned /24s. 52.2% of the unused space and 72% of unrouted assigned space is in North America, primarily in the U.S., where most legacy allocations (which are only 9.5% used, per Figure Lost in Space: Improving Inference of IPv4 Address Space Utilization) were made. Asia follows, with China owning 8.79% and 5.7% of the global routed unused and unrouted assigned space, respectively, and then Europe. Other continents (South America, Oceania, and Africa) have between 0.93% and 2.13% of the global unused and unrouted assigned space. We can also observe that the distribution of unrouted assigned space is more skewed than routed unused space, because 45% of legacy space is unrouted assigned, while only 7.7% of the non-legacy space is unrouted assigned. Figure Lost in Space: Improving Inference of IPv4 Address Space Utilization visually illustrates the per-country ratio of assigned unused over assigned space, suggesting which regions using space most and least efficiently. The U.S. is red in this map due to a few very large allocations, while some African countries are red because they use a very small fraction of their (also small) assigned space.
|By Routed Unused /24s||By Unrouted Assigned /24s|
|North America||52.2%||North America||72.0%|
|By Routed Unused /24s||By Unrouted Assigned /24s|
Table Lost in Space: Improving Inference of IPv4 Address Space Utilization lists the top ASes by routed unused /24s (we do not have per-AS data for unrouted assigned space). The top ASes are the Department of Defense (DoD) Network Information Center (NIC), followed by Level 3, HP, China Telekom, and finally UUNET.
|Top ASes in unused /24s|
|AS Name & Number||Routed Unused /24s (%)|
|DoD NIC (721)||190k (3.82%)|
|Level 3 (3356)||157k (3.16%)|
|HP (71)||126k (2.54%)|
|China Telecom (4134)||106k (2.13%)|
|UUNET (701)||105k (2.12%)|
Figure Lost in Space: Improving Inference of IPv4 Address Space Utilization compares address space assigned to countries to per-country population [?] and Gross Domestic Product (GDP - we used “purchasing power parity” from CIA’s World Factbook [?]). We observe notable disparities between used /24s and population. For example, USA, Australia, UK, Canada, and Germany have 25%, 1.45%, 3.52%, 2.06% and 4.11% of the used /24s, but only 4.44%, 0.31%, 0.89%, 0.49% and 1.13%, respectively, of the population. In contrast, African and Asian countries have 16% and 59% of the population, but only 1.8% and 32%, respectively, of the used /24s. Nevertheless, the per-country used /24s correlate much better with the distribution of GDP (0.960 correlation), than with population (0.517 correlation), suggesting that economic inequalities could explain the differences in the used /24s.
This type of census dataset also has implications for a range of scientific research of the Internet, most notably projects that incorporate routed address space metrics into estimates of the size, degree, type, or maliciousness of ASes [?, ?, ?, ?, ?]. More accurate metrics of address space usage could also potentially improve the accuracy of analysis of (or prediction of likely future) address blocks transfers in the grey market [?]. Figure Lost in Space: Improving Inference of IPv4 Address Space Utilization shows the overestimation error one would make by using a canonical BGP-routed address space metric to reflect how much address space an AS is actually observably using, for five types of network providers of various sizes. Figure Lost in Space: Improving Inference of IPv4 Address Space Utilization shows the overestimation error when using the same (BGP-routed address space) to reflect each country’s Internet footprint. Both figures show that there is no simple formula to translate between routed address space and actually used address space – the difference varies widely by AS.
We presented a new methodology for performing an Internet-wide census of IPv4 address space utilization. Among the many results presented, we find that only 5.3M /24s address blocks are visibly used (37% of usable /24 blocks), and that 3.4M assigned /24 blocks are not even visible in the global BGP routing system.
In addition to the applications of census measurements that have been well articulated by [?] (and summarized in Section Lost in Space: Improving Inference of IPv4 Address Space Utilization), there are many possible future directions for this work. To improve the methodology, we would like to further improve our ability to infer spoofed traffic, and validate our inferences, perhaps by responding to darknet traffic. We would also like to investigate the use of UDP or other protocol traffic at at R-ISP and IXP vantage points, and analyze in more detail what addresses are less visible to traffic measurement e.g., internal CDNs or quiet networks. As always, additional vantage points and ground truth information from operators would help improve the integrity of the method.
For a periodic global Internet census that tracks changes over time, we imagine a hybrid approach that first infers active IP address blocks based on passive measurements from one or more (live or dark) traffic vantage points, then probes only those address blocks that cannot be confidently inferred as active. This approach could dramatically improve coverage over state of the art methods, while minimizing measurement overhead and potential irritation of network operators with aggressive firewalls. Finally, the unscalability of active scanning to the IPv6 address space was one motivation to explore our hybrid apporach, but we do not know how well distributed passive traffic observation alone could effectively support a future IPv6 census.
- 1 UCSD Network Telescope. http://www.caida.org/projects/network_telescope/.
- 2 A. Dainotti, A. King. CAIDA Blog: Carna botnet scans confirmed. http://blog.caida.org/best_available_data/2013/05/13/carna-botnet-scans/.
- 3 Advanced Network Technology Center, University of Oregon. Route Views Project. http://www.routeviews.org/.
- 4 AFRINIC. AFRINIC delegation file. ftp://ftp.afrinic.net/pub/stats/afrinic/delegated-afrinic-extended-latest.
- 5 B. Ager, N. Chatzis, A. Feldmann, N. Sarrar, S. Uhlig, and W. Willinger. Anatomy of a large european ixp. In Proceedings of ACM SIGCOMM 2012, SIGCOMM ’12, pages 163–174, New York, NY, USA, 2012. ACM.
- 6 APNIC. APNIC delegation file. http://ftp.apnic.net/stats/apnic/delegated-apnic-latest.
- 7 ARIN. ARIN delegation file. http://ftp.arin.net/pub/stats/arin/delegated-arin-extended-latest.
- 8 G. Bartlett, J. Heidemann, and C. Papadopoulos. Understanding passive and active service discovery. In Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement, IMC ’07, pages 57–70, New York, NY, USA, 2007. ACM.
- 9 X. Cai and J. Heidemann. Understanding block-level address usage in the visible internet. In Proceedings of the ACM SIGCOMM 2010 Conference, SIGCOMM ’10, pages 99–110, New York, NY, USA, 2010. ACM.
- 10 CAIDA. Routeviews prefix to AS mappings dataset for IPv4 and IPv6. http://www.caida.org/data/routing/routeviews-prefix2as.xml.
- 11 CAIDA. Supplemental data: Lost in Space: Improving Inference of IPv4 Address Space Utilization. http://www.caida.org/publications/papers/2014/lost_in_space/supplemental/, 2014.
- 12 CIA. The world factbook: GDP (purchasing power parity). https://www.cia.gov/library/publications/the-world-factbook/rankorder/2001rank.html.
- 13 CIA. The world factbook: Population. https://www.cia.gov/library/publications/the-world-factbook/rankorder/2119rank.html.
- 14 E. Cooke, Z. M. Mao, and F. Jahanian. Hotspots: The Root Causes of Non-Uniformity in Self-Propagating Malware. In International Conference on Dependable Systems and Networks DSN, 2006.
- 15 J. Czyz, K. Lady, S. Miller, M. Bailey, M. Kallitsis, and M. Karir. Understanding IPv6 Internet Background Radiation. In 2013 ACM SIGCOMM conference on Internet measurement, IMC ’13, 2013.
- 16 A. Dainotti, K. Benson, A. King, k. claffy, M. Kallitsis, E. Glatz, and X. Dimitropoulos. Estimating Internet address space usage through passive measurements. In ACM SIGCOMM Computer Communication Review (CCR), 2014.
- 17 A. Dhamdhere and C. Dovrolis. Twelve Years in the Evolution of the Internet Ecosystem. IEEE/ACM Transactions on Networking, Oct. 2011.
- 18 Digital Element. Netacuity. http://www.digital-element.net/ip_intelligence/ip_intelligence.html.
- 19 Z. Durumeric, E. Wustrow, and J. A. Halderman. ZMap: Fast Internet-wide scanning and its security applications. In Proceedings of the 22nd USENIX Security Symposium, 2013.
- 20 A. Finamore, M. Mellia, M. Meo, M. M. Munafø’, and D. Rossi. Experiences of internet traffic monitoring with tstat. IEEE Network, 25(3), 2011.
- 21 V. Gehlen, A. Finamore, M. Mellia, and M. M. Munafò. Uncovering the big players of the web. In TMA, TMA’12, pages 15–28, Berlin, Heidelberg, 2012. Springer-Verlag.
- 22 Geoff Huston. IPv4 Address Report. http://www.potaroo.net/tools/ipv4/.
- 23 Geoff Huston. Delegated address space: extended report, 2013. http://bgp.potaroo.net/stats/nro/archive/delegated-nro-extended-20131001.
- 24 H.D. Moore. Project Sonar), 2008. https://community.rapid7.com/community/infosec/sonar/blog.
- 25 J. Heidemann, Y. Pradkin, R. Govindan, C. Papadopoulos, G. Bartlett, and J. Bannister. Census and survey of the visible Internet. In 8th ACM SIGCOMM conference on Internet measurement, IMC ’08, 2008.
- 26 J. Horchert and C. Stöcker. Mapping the internet: A hacker’s secret internet census. Spiegel Online, March 2013.
- 27 G. Huston. Ipv4: How long do we have? The Internet Protocol Journal, 6(4):2008–2010, 2003.
- 28 G. Huston. Ipv4 address depletion and transition to ipv6. Internet Protocol Journal, 9(10):18–28, 2007.
- 29 G. Huston. The changing foundation of the internet: confronting ipv4 address exhaustion. The Internet Protocol Journal, 11(3):19–36, 2008.
- 30 Y. Hyun, B. Huffaker, D. Andersen, M. Luckie, and K. C. Claffy. The IPv4 Routed /24 Topology Dataset, 2014. http://www.caida.org/data/active/ipv4_routed_24_topology_dataset.xml.
- 31 IANA. Autonomous system (AS) numbers. http://www.iana.org/assignments/as-numbers/as-numbers.xml.
- 32 IANA. Ipv4 address space registry. https://www.iana.org/assignments/ipv4-address-space/.
- 33 IANA. IPv4 recovered address space. http://www.iana.org/assignments/ipv4-recovered-address-space/ipv4-recovered-address-space.xml.
- 34 IANA. IPv4 special-purpose address registry. http://www.iana.org/assignments/iana-ipv4-special-registry/iana-ipv4-special-registry.xml.
- 35 IANA. IPv6 global unicast address assignments. http://www.iana.org/assignments/ipv6-unicast-address-assignments/ipv6-unicast-address-assignments.xhtml.
- 36 IANA. IPv6 special-purpose address registry. http://www.iana.org/assignments/iana-ipv6-special-registry/iana-ipv6-special-registry.xhtml.
- 37 Information of Sciences Institute, University of Southern California. LANDER project:Internet address census it49c-20120731. http://www.isi.edu/ant/traces/internet_address_census_it55w-20130723.README.txt, 2012.
- 38 Information of Sciences Institute, USC. Internet Address Survey Binary Format. http://www.isi.edu/ant/traces/topology/address_surveys/binformat_description.html, 2012.
- 39 Internet Assigned Numbers Authority (IANA). IPv4 Address Space Registry. http://www.iana.org/assignments/ipv4-address-space/ipv4-address-space.xhtml.
- 40 M. Konte and N. Feamster. Re-wiring activity of malicious networks. In Proceedings of the 13th International Conference on Passive and Active Measurement, PAM’12, pages 116–125, Berlin, Heidelberg, 2012. Springer-Verlag.
- 41 LACNIC. LACNIC delegation file. ftp://ftp.lacnic.net/pub/stats/lacnic/delegated-lacnic-latest.
- 42 J. Liang, N. Naoumov, and K. W. Ross. The index poisoning attack in p2p file sharing systems. In In INFOCOM. IEEE, 2006.
- 43 I. Livadariu, A. Elmokashfi, A. Dhamdhere, and K. Claffy. A first look at ipv4 transfer markets. In D. Papagiannaki and V. Misra, editors, CoNEXT 2013, pages 7–12. ACM SIGCOMM, December 2013.
- 44 A. Lodhi, N. Larson, A. Dhamdhere, C. Dovrolis, and k. claffy. Using PeeringDB to Understand the Peering Ecosystem. ACM SIGCOMM Computer Communication Review (CCR), 44(2):21–27, Apr 2014.
- 45 M. Luckie, B. Huffaker, A. Dhamdhere, V. Giotsas, and k claffy. AS relationships, customer cones, and validation. In IMC, Oct. 2013.
- 46 M. Luckie, Y. Hyun, and B. Huffaker. Traceroute Probe Method and Forward IP Path Inference. In Internet Measurement Conference (IMC), pages 311–324, Vouliagmeni, Greece, Oct 2008. Internet Measurement Conference (IMC).
- 47 M. Cotton and Leo Vegoda. RFC 5735: Special Use IPv4 Addresses, 2010. https://tools.ietf.org/html/rfc5735.
- 48 X. Meng, Z. Xu, B. Zhang, G. Huston, S. Lu, and L. Zhang. Ipv4 address allocation and the bgp routing table evolution. SIGCOMM Comput. Commun. Rev., 35(1):71–80, Jan. 2005.
- 49 R. Munroe. xkcd: MAP of the INTERNET 2006. http://blog.xkcd.com/2006/12/11/the-map-of-the-internet/.
- 50 M. Prince. The DDoS That Almost Broke the Internet. blog.cloudflare.com/the-ddos-that-almost-broke-the-internet, March 2013.
- 51 RIPE NCC. RIPE NCC delegation file. ftp://ftp.ripe.net/ripe/stats/delegated-ripencc-latest.
- 52 RIPE NCC. Routing Information Service (RIS), 2008. http://www.ripe.net/ris/.
- 53 C. Rossow. Ampliï¬cation Hell: Revisiting Network Protocols for DDoS Abuse. In Network and Distributed System Security Symposium (NDSS), NDSS ’14, 2014.
- 54 A. N. Shannon V. Spires. Exhaustive search system and method using space-filling curves. Patent, 10 2003. US 6636847.
- 55 SWITCH. Swiss Tele Communication System for Higher Education. http://www.switch.ch/.
- 56 H. Tangmunarunkit, J. Doyle, R. Govindan, W. Willinger, S. Jamin, and S. Shenker. Does as size determine degree in as topology? SIGCOMM Comput. Commun. Rev., 31(5):7–8, Oct. 2001.
- 57 The Bro Project. TCP Scan detection. bro.icir.org/sphinx/scripts/policy/misc/scan.html, 2014.
- 58 R. Wilhelm. RIPE Labs Blog: How to define address space as ‘routed’? https://labs.ripe.net/Members/wilhelm/content-how-define-address-space-routed.
- 59 E. Wustrow, M. Karir, M. Bailey, F. Jahanian, and G. Huston. Internet background radiation revisited. In 10th ACM SIGCOMM conference on Internet measurement, IMC ’10, 2010.
- 60 S. Zander, L. L. H. Andrew, G. Armitagei, and G. Huston. Estimating IPv4 Address Space Usage with Capture-recapture. In IEEE Workshop on Network Measurements (WNM 2013).