这是用户在 2024-10-28 24:45 为 https://app.immersivetranslate.com/pdf-pro/f41e5a2f-c526-4503-be09-0ca895dd5e59 保存的双语快照页面,由 沉浸式翻译 提供双语支持。了解如何保存?
case arises when m = n m = n m=nm=n ). 2 2 ^(2){ }^{2} A related direction of future interest lies in the characterization and understanding of those regions of the Internet topology which are relatively difficult to uncover using traceroute. Such a study could conceivably lead to a better understanding on the connection between topology and routing behavior or provide further insight into relationships between topology and peering agreements.
案例出现时 m = n m = n m=nm=n )。 2 2 ^(2){ }^{2} 未来研究的一个相关方向在于对互联网拓扑中那些相对难以通过 traceroute 揭示的区域进行特征描述和理解。这项研究可能会导致对拓扑与路由行为之间联系的更好理解,或提供对拓扑与对等协议之间关系的进一步洞察。
We focus specifically on marginal utility, i.e. the incremental benefit obtained by conducting one or more additional measurements. For edge coverage, we define the marginal utility of adding targets as follows (related definitions are similar):
我们特别关注边际效用,即通过进行一个或多个额外测量所获得的增量收益。对于边缘覆盖,我们将添加目标的边际效用定义如下(相关定义类似):
Definition 4: The marginal utility of conducting edge coverage measurement i + 1 i + 1 i+1i+1 on graph G’ from a set of k k kk sources is e G ( k , i + 1 ) e G ( k , i ) e G ( k , i + 1 ) e G ( k , i ) e_(G)^(')(k,i+1)-e_(G)^(')(k,i)e_{G}^{\prime}(k, i+1)-e_{G}^{\prime}(k, i).
定义 4:从一组 k k kk 源对图 G’进行边缘覆盖测量 i + 1 i + 1 i+1i+1 的边际效用为 e G ( k , i + 1 ) e G ( k , i ) e G ( k , i + 1 ) e G ( k , i ) e_(G)^(')(k,i+1)-e_(G)^(')(k,i)e_{G}^{\prime}(k, i+1)-e_{G}^{\prime}(k, i)
This and related quantities will be the primary focus of the rest of the paper. We first study marginal utility from a purely empirical perspective, focusing on the distinction between the core of the network and feeder networks. We then return to the problem from a theoretical perspective, developing and studying an information-theoretic formalism of marginal utility in this context.
本文的其余部分将主要关注这一及相关量。我们首先从纯经验的角度研究边际效用,重点关注网络核心与馈线网络之间的区别。然后,我们从理论的角度回到这个问题,开发并研究在此背景下边际效用的信息论形式。

IV. Experimental Methodology
IV. 实验方法论

We now present the experimental methodology we used to investigate scaling behavior in the Internet. The traceroute datasets we use in this section deviate in several ways from the ideal theoretical framework we prescribed in Section III, and a significant portion of this section is devoted to a discussion of additional assumptions which we made and a description of mechanisms for post-processing of actual datasets.
我们现在介绍用于研究互联网规模行为的实验方法。我们在本节中使用的 traceroute 数据集在多个方面偏离了我们在第三节中规定的理想理论框架,本节的很大一部分将讨论我们所做的额外假设以及对实际数据集进行后处理的机制的描述。

A. Internet Trace Data A. 互联网追踪数据

The topology data used in this work was supplied by the Skitter project at CAIDA. The Skitter project has a number of goals including Internet mapping, route characteristic analysis and performance analysis. At the time the primary dataset for this study was collected (May 2000), the Skitter infrastructure consisted of 16 source nodes deployed around the world; we received data from 8 of those nodes. Each source node sends traceroute-like probes to destination nodes located world-wide. All of the destination nodes are Web servers. Our primary data set contains results from traces run to 1277 destinations; The source nodes and the corresponding upstream providers (listed in
本研究中使用的拓扑数据由 CAIDA 的 Skitter 项目提供。Skitter 项目有多个目标,包括互联网映射、路由特征分析和性能分析。在本研究的主要数据集收集时(2000 年 5 月),Skitter 基础设施由部署在全球的 16 个源节点组成;我们从其中 8 个节点接收数据。每个源节点向位于全球的目标节点发送类似 traceroute 的探测。所有目标节点都是 Web 服务器。我们的主要数据集包含对 1277 个目标的跟踪结果;源节点及其相应的上游提供商(列在
parentheses) were located in Hamilton, NZ (University of Waikato); Tokyo, Japan (APAN), Singapore, SG (provider unknown); San Jose, CA, USA (Worldcorn); San Jose, CA, USA (Qwest); Ottawa, CA (CANET); London, UK (RIPE); and Washington DC, USA (AboveNet). On average, probes are sent to each destination once every 30 minutes. While it is not clear precisely how destinations for destinations are selected in Skitter, the Skitter web site states that destinations are randomly sampled from a “crawl of IP address space” [ 11]. We also include results from a larger dataset with 12 sources and over 300,000 destinations. This dataset includes the eight sites listed above, with the exception of Singapore, plus Marina Del Rey, CA, USA (ISI); Moffett Field, CA, USA (NASA), Palo Alto, CA, USA; San Diego, CA, USA (CAIDA) and London, UK (AboveNet).
括号内的地点位于新西兰汉密尔顿(怀卡托大学);日本东京(APAN);新加坡(提供者未知);美国加利福尼亚州圣荷西(Worldcorn);美国加利福尼亚州圣荷西(Qwest);加拿大渥太华(CANET);英国伦敦(RIPE);以及美国华盛顿特区(AboveNet)。平均而言,每个目的地每 30 分钟发送一次探测。虽然不清楚 Skitter 是如何选择目的地的,但 Skitter 网站表示,目的地是从“IP 地址空间的爬虫”中随机抽样的[11]。我们还包括来自一个更大数据集的结果,该数据集包含 12 个来源和超过 300,000 个目的地。该数据集包括上述八个站点,除了新加坡外,还有美国加利福尼亚州马里纳德尔雷(ISI);美国加利福尼亚州莫菲特场(NASA);美国加利福尼亚州帕洛阿尔托;美国加利福尼亚州圣地亚哥(CAIDA)和英国伦敦(AboveNet)。

B. Node and Edge Classification
B. 节点和边分类

Using the experimental results we gathered, it was immediately apparent that the network graph under observation was not a random network, but consisted of two constituent components: 1) a central routing core, and 2) a set of “feeder” links which feed into the backbone. We then focused on how successfully traceroute could be used when applied to identifying these two constituent components, which had evidently different properties. A central challenge to doing so is to develop an automated procedure which classifies nodes (and edges) into these categories. Using the terminology of Zegura et al [33] to describe their GT-ITM topology generator, we assume that there is a natural and identifiable separation between transit domains, which comprise the Internet backbone, and stub domains, which only transit traffic either originating or terminating in their domain. In this model, the set of transit domains typically forms a highly connected backbone, with a number (at least two and often many more) of node-disjoint paths between any two transit domains, while stub domains typically consist of trees with a single connection to the transit domain backbone.
根据我们收集的实验结果,观察到的网络图显然不是随机网络,而是由两个组成部分构成的:1)一个中央路由核心,2)一组“馈线”链接,连接到主干。我们随后专注于 traceroute 在识别这两个明显具有不同属性的组成部分时的成功程度。实现这一目标的一个主要挑战是开发一种自动化程序,将节点(和边)分类到这些类别中。使用 Zegura 等人 [33] 描述其 GT-ITM 拓扑生成器的术语,我们假设在传输域(构成互联网主干)和存根域(仅转发其域内起源或终止的流量)之间存在自然且可识别的分离。在这个模型中,传输域的集合通常形成一个高度连接的主干,任何两个传输域之间通常有多个(至少两个,通常更多)节点不相交的路径,而存根域通常由与传输域主干单一连接的树状结构组成。
The objective of our classification algorithm is to take our observations of a topology and determine the boundary between where the backbone ends and stub domains begin based on the available evidence. There are a number of reasons why our classification procedure may fail to classify nodes correctly - in future work, we intend to conduct validation trials to measure the effectiveness of our classification methods from traceroute measurements. Routes to destinations which did not respond to the traceroute requests were discarded, but routes in which intermediate hosts failed to respond to ICMP requests were included. Even using a relatively small number of measure-
我们分类算法的目标是根据可用证据,利用对拓扑的观察来确定主干结束和存根域开始之间的边界。我们的分类程序可能无法正确分类节点的原因有很多——在未来的工作中,我们打算进行验证试验,以测量我们基于 traceroute 测量的分类方法的有效性。未对 traceroute 请求作出响应的目的地的路由被丢弃,但未对 ICMP 请求作出响应的中间主机的路由被纳入考虑。即使使用相对较少的测量数量,

ment sites, a clear distinction between backbone links and stub links in this subgraph G’ emerged (we will demonstrate this and quantify how much error was removed from our classification process as the number of measurements increased).
在这个子图 G’ 中,主干链接和支链链接之间出现了明显的区别(我们将展示这一点,并量化随着测量次数的增加,我们的分类过程去除了多少错误)。
Given this subgraph, our classification procedure now amounts to a labelling of the nodes and edges of G’. To this end, nodes which correspond to routers and Internet hosts are classified as core muters, border routers, stub routers and leaf nodes. O O O\mathbf{O} ur node classification procedure is performed as follows. First, leaf nodes are identified and labelled as such, and edges adjoining leaf nodes are classified as stub links. Then, in a bottom-up fashion, internal nodes which adjoin a set of edges all but one of which are stub links, are classified as stub routers.
鉴于这个子图,我们的分类程序现在相当于对 G'的节点和边进行标记。为此,与路由器和互联网主机对应的节点被分类为核心路由器、边界路由器、存根路由器和叶节点。我们的节点分类程序如下进行。首先,识别并标记叶节点,然后与叶节点相邻的边被分类为存根链接。接着,以自下而上的方式,对所有相邻边中只有一条不是存根链接的内部节点进行分类,标记为存根路由器。
Upon completion of this procedure, the logical trees forming the visible portion of stub domains in G’ are established. The remaining unclassified nodes all satisfy the property that at least two of their incident edges are unlabeled - that entire unlabeled portion of the graph G’ is the network backbone, and we classify it as such. Within the network backbone, unlabelled nodes which adjoin at least one stub link are classified as border routers, all remaining nodes are classified as core routers, and those links which are not yet classified are backbone links. Figure 1 provides a simple diagram of the results of a classification procedure.
完成此过程后,形成 G' 中可见部分的存根域的逻辑树已建立。其余未分类的节点都满足至少有两个相邻边未标记的属性——整个未标记的图 G' 部分是网络骨干,我们将其分类为网络骨干。在网络骨干中,至少与一个存根链接相邻的未标记节点被分类为边界路由器,所有其余节点被分类为核心路由器,而那些尚未分类的链接则被视为骨干链接。图 1 提供了分类过程结果的简单示意图。

C. Coverage vs. Marginal Coverage
C. 覆盖率与边际覆盖率

In the examples we have described so far, our classification procedure labels the subset of Internet nodes and links visible in one or more of the end-to-end measurements in our study. Since we are primarily interested in characterizing the Internet backbone, and since we have no expectation of completely mapping stub domains, we would ideally like to measure the coverage of the Internet backbone achieved by our experiments, using the definitions presented in Section III. However, this approach is infeasible, as the exact topology of the graph which comprises this backbone is not known a posteriori. While we cannot measure total coverage directly, we can measure the marginal improvement in coverage as we conduct additional measurements. To quantify this approach, we take the aggregated information from all of the collected traces as the baseline graph for our study, and measure how well small subsets of the measurements manage to cover that baseline graph. This point highlights an important distinction between marginal coverage and overall coverage - the fact that additional measurements may provide low marginal coverage does not necessarily imply that the overall coverage obtained is high - it may be the case that the cov-
在我们迄今为止描述的例子中,我们的分类程序标记了在我们研究中一个或多个端到端测量中可见的互联网节点和链接的子集。由于我们主要关注对互联网骨干的特征描述,并且我们并不期望完全映射存根域,因此我们理想情况下希望使用第 III 节中提出的定义来测量我们的实验所实现的互联网骨干覆盖率。然而,这种方法是不可行的,因为构成该骨干的图的确切拓扑在事后并不知道。虽然我们无法直接测量总覆盖率,但我们可以在进行额外测量时测量覆盖率的边际改善。为了量化这种方法,我们将所有收集到的跟踪信息汇总为我们研究的基线图,并测量小子集的测量在多大程度上能够覆盖该基线图。这一点强调了边际覆盖和整体覆盖之间的重要区别——额外的测量可能提供低边际覆盖,但这并不一定意味着获得的整体覆盖率很高——可能情况是覆盖率..

erage is poor, but the additional measurements chosen are not productive. 3 3 ^(3){ }^{3}
平均水平较差,但所选择的额外测量并不具有生产力。 3 3 ^(3){ }^{3}

D. Interface Disambiguation
D. 接口消歧义

One of the unfortunate issues about building network maps based on traceroute is the existence of routers with multiple interfaces, each with different network addresses. This issue is pervasive - in our study we found that nearly twenty percent of all the nodes we classified as backbone nodes used multiple interfaces with distinct IP addresses to transmit packets. Clearly, studies which disregard this issue, by treating each distinct Internet address as if it were a distinct node, generate inaccurate maps.
基于 traceroute 构建网络地图的一个不幸问题是存在具有多个接口的路由器,每个接口都有不同的网络地址。这个问题普遍存在——在我们的研究中,我们发现近百分之二十的所有被我们分类为骨干节点的节点使用多个具有不同 IP 地址的接口来传输数据包。显然,忽视这个问题的研究,通过将每个不同的互联网地址视为一个独立节点,会生成不准确的地图。
The technique we employed to disambiguate multiple interfaces at a single node uses the same basic principle as the one originally used by Pansiot and Grad [19]. The key to this technique is that when transmitting an ICMP message, a router will typically transmit that packet with a source address equal to that of the outbound interface on which the packet is sent. Therefore, if one suspects that a router has two interfaces I 1 I 1 I_(1)I_{1} and I 2 I 2 I_(2)I_{2}, one can transmit a UDP packet to an unused port at each of those interfaces from a common source. If the interfaces are in fact on the same router, the router will respond with two ICMP Port Unreachable messages, both of which will have the same source address I 3 I 3 I_(3)I_{3}, possibly equal to I 1 I 1 I_(1)I_{1} or I 2 I 2 I_(2)I_{2}. By performing post-hoc probes of this form from a common source (Boston University) to all potentially distinct interfaces, we are able to detect and collapse hosts with duplicate interfaces. Unfortunately, this technique is not infallible. First, approximately 10 % 10 % 10%10 \% of the core routers never responded to UDP messages transmitted to unknown ports; others respond extremely sporadically - we conjecture that the likelihood of response may be correlated with the load on the router. For those routers, disambiguation appears to be impossible with this current technique. Second, our technique relies upon routers responding with a source address equal to the outbound interface. If routers instead respond with a source address equal to the UDP destination address, our technique would be rendered useless. We have no way of estimating the likelihood of this event; however, the fact that we frequently observe routers which respond with addresses which differ from the target address gives us some informal level of confidence that routers do in fact behave according to specification.
我们采用的在单个节点上消歧义多个接口的技术与 Pansiot 和 Grad 最初使用的基本原理相同。该技术的关键在于,当路由器传输 ICMP 消息时,通常会以发送该数据包的出接口的源地址来传输该数据包。因此,如果怀疑某个路由器有两个接口 I 1 I 1 I_(1)I_{1} I 2 I 2 I_(2)I_{2} ,可以从一个公共源向这两个接口的未使用端口发送 UDP 数据包。如果这些接口确实在同一个路由器上,路由器将会回应两个 ICMP 端口不可达消息,这两个消息的源地址都将相同 I 3 I 3 I_(3)I_{3} ,可能等于 I 1 I 1 I_(1)I_{1} I 2 I 2 I_(2)I_{2} 。通过从一个公共源(波士顿大学)对所有可能不同的接口进行这种形式的事后探测,我们能够检测并合并具有重复接口的主机。不幸的是,这种技术并不是万无一失的。 首先,大约 10 % 10 % 10%10 \% 的核心路由器从未对发送到未知端口的 UDP 消息作出响应;其他路由器的响应极为偶发——我们推测响应的可能性可能与路由器的负载相关。对于这些路由器,使用当前技术进行消歧义似乎是不可能的。其次,我们的技术依赖于路由器以等于出接口的源地址进行响应。如果路由器反而以等于 UDP 目标地址的源地址进行响应,我们的技术将变得无效。我们无法估计这种情况发生的可能性;然而,我们经常观察到路由器以与目标地址不同的地址进行响应,这在某种程度上给了我们一些非正式的信心,表明路由器确实按照规范行为。
Fig. 2. Class of nodes and interfaces discovered as sources are added (greedily) when classification is not known a priori.
图 2. 当分类未知时,随着源的添加(贪婪地)发现的节点和接口类别。

E. Accuracy of Classification
E. 分类的准确性

One central aspect of node classification is the accuracy with which we perform classification. With a small number of sources (less than five), many backbone nodes are misclassified as either stub nodes or border nodes by virtue of the fact that the observable Internet is the union of a small number of trees. Figure 2 depicts the relative classification of nodes and links as sources are increased. In some plots in this paper, the order in which sources are added has a significant impact on the overall results. A greedy ordering adds the sources in the order which maximizes at each step the total number of distinct nodes observed. A random ordering averages over a set of trials in which sources are added purely at random (without replacement) for each trial. In the context of accuracy of classification, behavior of greedy and random orderings were similar; the greedy ordering is depicted.
节点分类的一个核心方面是我们进行分类的准确性。当源数量较少(少于五个)时,许多主干节点被错误分类为存根节点或边界节点,因为可观察的互联网是少数树的并集。图 2 描绘了随着源数量增加,节点和链接的相对分类。在本文的一些图中,添加源的顺序对整体结果有显著影响。贪婪排序按照每一步最大化观察到的不同节点总数的顺序添加源。随机排序则是在每次试验中随机(不放回)添加源的试验集的平均值。在分类准确性的背景下,贪婪和随机排序的行为相似;贪婪排序被描绘出来。
As we increase the number of sources, our classification procedure increases in accuracy. For example, once we have amassed sufficient evidence to classify a node as a backbone or border router, no set of additional measurements will reverse that classification decision. On the other hand, nodes which we initally classify as part of a stub domain may in fact be backbone nodes, and we may uncover evidence to that effect with additional measurements. In general, we expect to underestimate the fraction of backbone nodes and overestimate the fraction of stub nodes in our classification. The diagram in Figure 2 quantifies that intuition when the number of measurement sites is small, but it is also interesting to note that for this dataset, classification stabilizes after only about five measurement sites (vantage points) are used.
随着我们增加源的数量,我们的分类程序的准确性也会提高。例如,一旦我们积累了足够的证据将一个节点分类为骨干路由器或边界路由器,任何额外的测量都不会改变这一分类决定。另一方面,我们最初将某些节点分类为存根域的一部分,但实际上它们可能是骨干节点,我们可能会通过额外的测量发现相关证据。一般来说,我们预计在分类中会低估骨干节点的比例,并高估存根节点的比例。图 2 中的图示量化了这一直觉,当测量站点数量较少时,但有趣的是,对于这个数据集,分类在仅使用大约五个测量站点(观察点)后就会稳定下来。

F. Limitations of the Approach
F. 方法的局限性

The metrics we propose are difficult to use directly, first because the graph which comprises the Internet is neither fixed nor given in advance. Moreover, even if the graph
我们提出的度量难以直接使用,首先是因为构成互联网的图既不是固定的,也不是事先给定的。此外,即使图形
#of Interfac| 接口数量 1 2 3 4 5 6 7 10
# of Routers 路由器数量 4892 602 169 54 29 13 3 1
#of Interfac| 1 2 3 4 5 6 7 10 # of Routers 4892 602 169 54 29 13 3 1 | #of Interfac\| | 1 | 2 | 3 | 4 | 5 | 6 | | 7 | 10 | | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | | # of Routers | 4892 | 602 | 169 | 54 | 29 | 13 | 3 | 1 | |
Fig. 3. Distribution of observed interface density across routers.
图 3. 路由器之间观察到的接口密度分布。

comprising the Internet were known in advance, our measures of coverage may fluctuate, since the behavior of the routing algorithms in the Internet is non-deterministic, due to the effect of routing policies [28], [32]. Also, while one might hope to quantify topology scaling laws on certain classes of graphs (such as on power-law random graphs) when shortest-path routing is in effect, policy-based routing at the level of AS’s skews (or “inflates”) routes, making the problem of accurately modeling these scaling laws much more difficult. We note that factors ranging from a wide variety of routing metrics and protocols, variability in network load, and policy-based economic agreements between autonomous systems cause the routes chosen to be quite different than an observer with access only to topology information might expect.
由于互联网的组成在事先已知,我们的覆盖度测量可能会波动,因为互联网中路由算法的行为是非确定性的,这受到路由策略的影响。此外,尽管人们可能希望在某些类别的图(例如幂律随机图)上量化拓扑缩放法则,但在最短路径路由生效时,基于策略的路由在自治系统(AS)层面上扭曲(或“膨胀”)了路由,使得准确建模这些缩放法则的问题变得更加困难。我们注意到,从各种路由指标和协议、网络负载的变化,到自治系统之间基于政策的经济协议等因素,导致所选择的路由与仅能访问拓扑信息的观察者所期望的有很大不同。

V. Results V. 结果

The results in this section are divided into five parts: (1) the results of interface disambiguaton run on all nodes in the primary data set, (2) a quantitative evaluation of the number of nodes and links discovered in the backbone as the number of sources and destinations vary, (3) an evaluation of the estimated distribution of node degree in the backbone as the number of sources and destinations vary, (4) fitting the evidence of these evaluations to statistical models and (5) assessing the accuracy of the node classification procedure itself.
本节的结果分为五个部分:(1)在主要数据集中的所有节点上运行的接口消歧结果,(2)随着源和目的地数量的变化,在主干中发现的节点和链接数量的定量评估,(3)随着源和目的地数量的变化,对主干中节点度数估计分布的评估,(4)将这些评估的证据拟合到统计模型中,以及(5)评估节点分类过程本身的准确性。

A. Results of running the disambiguation procedure
A. 运行消歧义程序的结果

Approximately three weeks after the traceroute data was collected by CAIDA, we ran our interface disambiguation tool to all network interfaces which we had classified as part of the network backbone. An early lesson we learned in our preliminary experiments with the disambiguation software was that a substantial fraction of routers responded to our probes with very low frequency. In an effort to elicit responses from as many responding interfaces as possible, we transmitted five ICMP messages to each interface every twenty minutes for 12 successive hours.
大约在 CAIDA 收集跟踪路由数据后三周,我们对所有被我们分类为网络骨干的一部分的网络接口运行了我们的接口消歧义工具。我们在初步实验中学到的一个早期教训是,很多路由器对我们的探测响应频率非常低。为了尽可能多地引发响应,我们每 20 分钟向每个接口发送五个 ICMP 消息,持续 12 个小时。
Of the 7451 interfaces on our list, 6510 responded to one or more of our probes and the remaining 941 (12.6%) never responded. We recorded pairs of the form [Target Address, Response Address] and recorded 6709 distinct pairs from the 6510 targeted interfaces which responded. We suspect that this slight (3%) discrepancy is due to route fluctuation affecting the first hop of the return path to B.U.
在我们列表中的 7451 个接口中,6510 个对我们的一次或多次探测做出了响应,其余 941 个(12.6%)从未响应。我们记录了形式为[目标地址,响应地址]的对,并从 6510 个响应的目标接口中记录了 6709 个不同的对。我们怀疑这个轻微的(3%)差异是由于路由波动影响了返回路径的第一跳到 B.U.

Fig. 4. Number of nodes discovered as sources are added (greedily)
图 4. 随着源的增加发现的节点数量(贪婪地)

and does not represent anomalous behavior. The next step we took was to represent the set of addresses present in our list of pairs as nodes in a graph. We drew a correspondence between each connected component of this graph and a single router, where the nodes of the component correspond to distinct addresses for interfaces of the router. Using this strategy, the 6510 targeted interfaces mapped to 5763 distinct routers. The distribution of multiple interfaces we observed is depicted in Figure 3. Using the results in this table, we observed an incidence rate of multiple interfaces of 871 5763 = 15.1 % 871 5763 = 15.1 % (871)/(5763)=15.1%\frac{871}{5763}=15.1 \%.
并不代表异常行为。我们采取的下一步是将我们对地址对的列表中的地址集合表示为图中的节点。我们将图的每个连通分量与单个路由器建立了对应关系,其中分量的节点对应于路由器接口的不同地址。使用这种策略,6510 个目标接口映射到 5763 个不同的路由器。我们观察到的多个接口的分布如图 3 所示。根据此表中的结果,我们观察到多个接口的发生率为 871 5763 = 15.1 % 871 5763 = 15.1 % (871)/(5763)=15.1%\frac{871}{5763}=15.1 \%
In the results below, we have the goal of taking measurements over a set of paths which cover at least n distinct nodes (resp. links) in the Internet. Our first set of experiments demonstrates rapidly diminishing marginal returns as sources are added to trace routes to the full set of 1277 destinations, while our second set demonstrates nearly constant marginal returns as destinations are incrementally added to a destination set targeted by the full set of 8 sources.
在下面的结果中,我们的目标是对一组路径进行测量,这些路径覆盖至少 n 个不同的节点(或链接)在互联网中。我们的第一组实验表明,随着源节点的增加,追踪到 1277 个目的地的边际收益迅速递减,而我们的第二组实验则表明,随着目的地逐步添加到由 8 个源节点组成的目标目的地集合中,边际收益几乎保持不变。
In Figures 4 and 5, we demonstrate how the node coverage and link coverage in the Internet improve as sources are added. In both of these plots, there is pronounced evidence of diminishing returns as sources are added, which is highly evident even when running traceroute between a small number of sources (8) and a much larger number of destinations (1277). In each figure we also demonstrate the effect of node and link discovery before and after interface disambiguation.
在图 4 和图 5 中,我们展示了随着源的增加,互联网中的节点覆盖率和链路覆盖率是如何提高的。在这两个图中,随着源的增加,收益递减的明显证据非常明显,即使在少量源(8 个)和大量目的地(1277 个)之间运行 traceroute 时也是如此。在每个图中,我们还展示了在接口消歧之前和之后节点和链路发现的效果。
In Figures 6 and 7, we demonstrate how the node coverage and link coverage in the Internet improve as destinations are added. In both of these plots, there is a relatively constant addition as destinations are added. A simple slope calculation shows that after 200 destinations, approximately 3 new nodes are discovered and 4 new links are discovered when a new destination is added. Each
在图 6 和图 7 中,我们展示了随着目标的增加,互联网中的节点覆盖率和链接覆盖率是如何提高的。在这两个图中,随着目标的增加,增加的幅度相对恒定。简单的斜率计算表明,在添加 200 个目标后,每增加一个新目标,大约会发现 3 个新节点和 4 个新链接。每个

Fig. 5. Number of links discovered as sources are added (greedily)
图 5. 随着源的添加(贪婪地)发现的链接数量

Fig. 6. Number of nodes discovered as destinations are added (randomly). Each line is for a single source
图 6. 随着目标节点(随机)添加,发现的节点数量。每条线代表一个单独的源。

of these figures shows effects after interface disambiguation. Results for interface discovery are approximately the same.
这些数字显示了接口消歧后的效果。接口发现的结果大致相同。
Next, we break down node discovery by node classification. In Figure 8 we show how nodes and interfaces are discovered as sources are added when the node classification is known a priori. This result shows that we primarily discover new backbone nodes and interfaces as additional sources are added, but backbone discovery show diminishing marginal utility.
接下来,我们按节点分类分解节点发现。在图 8 中,我们展示了当节点分类事先已知时,随着源的添加,节点和接口是如何被发现的。这个结果表明,随着额外源的添加,我们主要发现新的骨干节点和接口,但骨干发现的边际效用递减。

Fig. 7. Number of links discovered as destinations are added (randomly). Each line is for a single source
图 7. 随着目标的随机添加,发现的链接数量。每条线代表一个单一来源。

Fig. 8. Class of nodes and interfaces discovered as sources are added (greedily) when classification is known a priori.
图 8. 当已知分类时,随着源的添加(贪婪地)发现的节点和接口类别。

C. Contour Plots C. 等高线图

The following diagrams plot the scaling behavior of the subgraphs induced by IP routing for the topologies observed via the CAIDA trace data, assuming that each of the CAIDA sources and destinations reflects a randomly chosen vertex in the graph. In particular, we study the behavior of the function v G ( k , m ) v G ( k , m ) v_(G)(k,m)v_{G}(k, m) as k k kk and m m mm vary. The values of k k kk and m m mm are plotted along the x x xx and y -axes, respectively. Each labelled contour, or isoline, represents the discovery of a fixed constant number of nodes, such that all sets of measurements corresponding to a point ( x , y ) ( x , y ) (x,y)(x, y) along a contour have an equal value of v G v G v_(G)v_{G}. Our experiments were constrained by the fact that we have a limited number of sources, and a much larger set of destinations, so we are unable to plot a full square’s worth of data. Another point regarding symmetry: if both sources and destinations are chosen uniformly at random from all locations in the Internet, then the labels of source and destination are arbitrary, which implies that i , j i , j AA i,j\forall i, j, points (i, j j jj ) and ( j , i j , i j,ij, i ) lie along the same contour.
以下图表绘制了由 IP 路由引发的子图的缩放行为,基于通过 CAIDA 追踪数据观察到的拓扑,假设每个 CAIDA 源和目标反映图中的一个随机选择的顶点。特别地,我们研究函数 v G ( k , m ) v G ( k , m ) v_(G)(k,m)v_{G}(k, m) k k kk m m mm 变化时的行为。 k k kk m m mm 的值分别沿 x x xx 和 y 轴绘制。每个标记的轮廓或等值线表示发现固定常数数量的节点,使得与轮廓上某一点 ( x , y ) ( x , y ) (x,y)(x, y) 对应的所有测量集具有相等的 v G v G v_(G)v_{G} 值。我们的实验受到限制,因为我们只有有限数量的源,而目标数量要大得多,因此无法绘制完整的方形数据。关于对称性的另一个观点:如果源和目标都是从互联网的所有位置均匀随机选择的,那么源和目标的标签是任意的,这意味着 i , j i , j AA i,j\forall i, j ,点(i, j j jj )和( j , i j , i j,ij, i )位于同一轮廓上。
The depiction shown in Figure 9 4 9 4 9^(4)9{ }^{4} gives preliminary evidence that the isolines do not follow hyperbolas of the form x x xx x y = k y = k y=ky=k, which would hold in the event that each pair of measurements is equivalently useful. Instead, and pending further study on larger trace data, these contour plots indicate that striking a balance between sources and destinations is relatively less important than making use of a large number of sites overall, which can be done relatively easily by employing more passive targets, rather than requiring deployment of more active measurement infrastructure.
9 4 9 4 9^(4)9{ }^{4} 所示的描绘提供了初步证据,表明等值线并不遵循形式为 x x xx x y = k y = k y=ky=k 的双曲线,这在每对测量值同样有用的情况下是成立的。相反,待对更大追踪数据进行进一步研究,这些等高线图表明,在源和目的地之间取得平衡相对不如充分利用大量站点重要,这可以通过使用更多的被动目标相对容易地实现,而不需要部署更多的主动测量基础设施。
Destinations600 目的地 600
Fig. 9. Backbone node discovery as both sources and destinations arc varied
图 9. 随着源和目的地的变化,骨干节点发现情况

D. Estimating the distribution of node degree in the backbone
D. 估计主干中节点度的分布

As the number of measurement sources increases, the distribution of node degree in the discovered portion of the backbone shown in Figures 10 and 11 (especially in the tail) changes. We calculated the root mean square difference to measure the differences in the distributions as we add nodes, which is shown in Figure 12. Surprisingly, the distribution on node degree in the backbone which we observe after taking measurements from a single site (forming a tree to the sources) is both visibly similar and similar with respect to the RMSE metric to the more refined distribution we identify with subsequent measurements. Quantifying the refinement in our measured distribution over time, in Figure 11, it appears that the weight in the tail may actually diminish somewhat as the number of measurements increase. 5 5 ^(5){ }^{5} Another interesting point is that in the RMSE plot in Figure 12, the error actually increases after source 6 is added. Unlike node and edge coverage, which never decrease as additional measurements are conducted, the estimated node degree distribution may in fact become less accurate.
随着测量源数量的增加,图 10 和图 11 中显示的主干部分的节点度分布(特别是在尾部)发生了变化。我们计算了均方根差异,以测量在添加节点时分布的差异,如图 12 所示。令人惊讶的是,在从单个站点进行测量后(形成到源的树)观察到的主干节点度分布在可见上与我们通过后续测量识别的更精细分布在 RMSE 指标上都非常相似。量化我们测量分布随时间的细化,在图 11 中,似乎随着测量数量的增加,尾部的权重实际上可能会有所减少。另一个有趣的点是,在图 12 的 RMSE 图中,添加源 6 后,误差实际上增加。与节点和边覆盖不同,后者在进行额外测量时从不减少,估计的节点度分布实际上可能变得不那么准确。
We conducted a similar analysis considering how the addition of destination nodes affects backbone node degree distributions. In Figures 13 and 14 we see the distribution of backbone node degree when all sources are used to trace to increasing numbers of destinations in groups of 100 . The figures show that while the body of the dis-
我们进行了类似的分析,考虑到目标节点的增加如何影响骨干节点的度分布。在图 13 和图 14 中,我们可以看到当所有源节点用于追踪到以 100 为一组的增加数量的目标节点时,骨干节点度的分布。这些图表显示,尽管分布的主体部分...
Fig. 10. CDF of backbone node degree as sources are added (randomly)
图 10. 随着源的随机添加,骨干节点度的 CDF

Fig. 11. Tail of CDF of backbone node degree as sources are added (randomly)
图 11. 随着源的添加(随机),骨干节点度的 CDF 尾部

tribution stays relatively constant as destination nodes are added, the tail weight increases as destination nodes are added.
随着目标节点的增加,分布保持相对稳定,而尾部权重随着目标节点的增加而增加。

E. Comparison to Larger Dataset
E. 与更大数据集的比较

The results so far provide considerable insight into IP routing patterns but the limited size of the node set covered makes it hard to extend our conclusions to the Internet as a whole. To address this we examine a much larger dataset to see whether it shows similar patterns of diminishing returns when adding measurement sites.
到目前为止的结果为 IP 路由模式提供了相当大的洞察,但所覆盖的节点集的有限规模使得我们很难将结论扩展到整个互联网。为了解决这个问题,我们检查了一个更大的数据集,以查看在添加测量站点时是否显示出类似的收益递减模式。
The second data sets consists of 12 sources and 313,709
第二个数据集包含 12 个来源和 313,709 条数据

Fig. 12. Root mean squared error difference in backbone node degree distributions
图 12. 主干节点度分布的均方根误差差异

Fig. 13. CDF of backbone node degree as destinations are added (randomly)
图 13. 随着目标节点的随机添加,骨干节点度数的累积分布函数(CDF)

Fig. 14. Tail of CDF of backbone node degree as destinations are added (randomly)
图 14. 随着目标节点的随机添加,骨干节点度数的 CDF 尾部

destinations; thus it is more than 10 times the size of the first data set. This dataset was gathered in October, 2000. Source locations for this dataset were Hamilton, NZ; San Jose, CA, USA; London, UK; Marina del Rey, CA, USA; Palo Alto, CA, USA; Tokyo, JP; Ottawa, CA; London, UK; Moffett Field, CA, USA; Washington, DC, USA; San Jose, CA, USA; and San Diego, CA, USA.
目的地;因此它的大小是第一个数据集的 10 倍以上。该数据集是在 2000 年 10 月收集的。该数据集的来源地点包括新西兰汉密尔顿、美国加利福尼亚州圣荷西、英国伦敦、美国加利福尼亚州马里纳德雷、美国加利福尼亚州帕洛阿尔托、日本东京、加拿大渥太华、英国伦敦、美国加利福尼亚州莫菲特场、美国华盛顿特区、美国加利福尼亚州圣荷西和美国加利福尼亚州圣地亚哥。
Unlike the first data set, in this case sources did not trace routes to a common set of destinations. In fact, no destination in this set is common to all sources. Furthermore, the considerable size of this node set makes it much more difficult to disambiguate interfaces, so our results are in terms of interfaces rather than nodes (routers).
与第一个数据集不同,在这种情况下,源并没有追踪到一组共同的目的地。实际上,这个数据集中没有任何目的地是所有源共有的。此外,这个节点集的相当大规模使得区分接口变得更加困难,因此我们的结果是以接口而不是节点(路由器)为单位。
In Figure 15 we show how the number of interfaces discovered grows as we add sources greedily. In this case, adding a source means that we add all the measurement paths originating from that source. The line labelled “interfaces” denotes the number of interfaces that would have been discovered had each source been used independently from the others. In Figure 16 we show how the number of interface-interface links discovered grows as we add sources. Presumably each individual interface-interface link corresponds to a router-router link, so for this plot the distinction between nodes and interfaces is less important.
在图 15 中,我们展示了随着贪婪地添加源,发现的接口数量是如何增长的。在这种情况下,添加一个源意味着我们添加所有来自该源的测量路径。标记为“接口”的线表示如果每个源独立于其他源使用,将会发现的接口数量。在图 16 中,我们展示了随着添加源,发现的接口-接口链接数量是如何增长的。可以推测,每个单独的接口-接口链接对应于一个路由器-路由器链接,因此在这个图中,节点和接口之间的区别不那么重要。
These figures show a declining slope as sources are
这些数字显示出随着来源的减少而呈下降趋势

  1. 'While our work is primarily experimental in nature, we believe that the theoretical study of these properties on graphs of interest (such as power-law graphs vs. random graphs) with idealized routing algorithms (such as use of shortest-path routes) may help provide deeper insight.
    虽然我们的工作主要是实验性的,但我们相信对这些属性在感兴趣的图(例如幂律图与随机图)上进行理论研究,结合理想化的路由算法(例如使用最短路径路由),可能有助于提供更深入的见解。
  2. 3 3 ^(3){ }^{3} An analogous situation arises when choosing black-box test cases to provide coverage of code paths in a software module.
    在选择黑箱测试用例以覆盖软件模块中的代码路径时,会出现类似的情况。
  3. 4 4 ^(4){ }^{4} We excluded the one anomalous source which only reached 184 destinations since its inclusion would dramatically alter the results displayed in this figure.
    我们排除了那个仅达到 184 个目的地的异常来源,因为其纳入会显著改变此图中显示的结果。
  4. 'There are several explanations for why this may arise in our datasets which we are currently investigating, including a statistically insignificant sample size, effects from hosts with multiple interfaces, or issues inherent in the measurement set-up. We plan on re-running this experiment with other orderings of the sources as part of our further investigation.
    我们正在调查为什么在我们的数据集中可能出现这种情况的几种解释,包括样本量统计上不显著、具有多个接口的主机的影响,或测量设置固有的问题。我们计划在进一步调查中,以其他源的排序重新进行此实验。