Educational Objectives: Experiencing text processing in Python, experiencing using data structures (list, set, dict) in Python, experiencing Python file IO.
Statement of Work: Implement a Python tool to analyze the properties of the data in a real-world data file.
Description:
In this assignment you will write a Python program for extracting and analyzing Internet connectivity information from the Border Gateway Protocol (BGP) routing table trace files dumped from an Internet router. It is not necessary for you to understand how BGP or the Internet works (of course, you are more than welcome to spend time and effort to learn them). In this assignment, the only thing you need to understand is the format of the trace file. Some background on BGP and its routing table is provided below.
Background on BGP and Its Routing Table
The Internet is a large collection of networks or autonomous systems (ASes). Each AS owns a number of network prefixes (range of network addresses). BGP is the inter-domain routing protocol used on the Internet for ASes to exchange the reachability information among ASes (or rather, their network prefixes). Each AS has a unique AS number. For example the AS number of the FSU campus network is 2553 and the network prefix owned by the FSU campus network is 128.186.0.0/16. Each organization typically owns one AS number (some own more than one). You can see the mapping between AS numbers and organizations at https://bgp.potaroo.net/cidr/autnums.html.
By treating each AS as a node, and the connection between neighboring ASes as a link/edge, we can represent the Internet as a graph (an AS-level graph that captures the connectivity between ASes). The AS-level graph can be derived from the BGP routing table. The following are examples of the BGP routing table data traces used in this assignment (compressed using bzip2).
The data format of the files is as follows. Each line contains a record of the BGP routing information with a fixed format. The fields in a line are separated by a vertical bar |. The 6th and 7th fields are of particular interests to the Internet inter-domain routing. The 6th field is the destination network prefix associated with this record. The 7th field is the so called ASPATH, which indicates the sequence of ASes that packets need to traverse to reach the corresponding destination network prefix. In this assignment, you only need to work on the 7th field (the ASPATH information). For example, the following line is taken from the data trace file.
TABLE_DUMP|1130191746|B|144.228.241.81|1239|128.186.0.0/16|1239 2914 174 11096 2553|IGP|144.228.241.81|0|-2|1239:321 1239:1000 1239:1011|NAG||
The 6th field is 128.186.0.0/16 (FSU campus network prefix), and the 7th field is 1239 2914 174 11096 2553 (ASPATH). The ASPATH field states that, the rightmost AS (2553, FSU campus network) originates (i.e., owns) the corresponding network prefixes (128.186.0.0/16), and on the way from the leftmost AS (1239) to the rightmost AS (2553), a packet needs to traverse the immediate ASes (2914 174 11096). The ASPATH exposes the neighboring relationship between ASes, i.e., the adjacent ASes in the ASPATH are neighbors on the Internet. For example, ASes 1239 and 2914 are neighbors on the Internet, ASes 2914 and 174 are neighbors, and so are ASes 174 and 11096. In summary, based on the 7th field of this line, AS 1239 has one neighbor (2914); AS 2914 has two neighbors (1239 and 174); AS 174 has two neighbors (2914 and 11096); AS 11096 has two neighbors (174 and 2553), and AS 2553 has one neighbor (11096).
The ASPATH in the file may have two special cases that you need to handle. First, some of the ASPATH's contain the so called ASSET's that we do not know their exact relationship. The ASSET in an ASPATH is indicated by a pair of square bracket ([]). For example, the following is an ASPATH that contains an ASSET:
1239 1668 10796 [11060 12262]
which was taken from the following line in the data trace:
TABLE_DUMP|1130191716|B|144.228.241.81|1239|24.223.128.0/17|1239 1668 10796 [11060 12262]|IGP|144.228.241.81|0|-2|1239:321 1239:1000 1239:1006|AG|24.95.80.203|
In this example, 11060 and 12262 are in an ASSET. Fortunately, ASSET always occurs at the end (right side) of the ASPATH. In this assignment, when you analyze the neighboring ASes, you should ignore the ASSET part of an ASPATH, but you still need to use the rest of the ASPATH. For example, for the above example, you still need to use the portion of the 7th field after removing the ASSET, that is, 1239 1668 10796 should be included in your analysis of AS-level graph.
Second, some AS numbers may appear multiple times in an ASPATH, for example, in the following ASPATH, AS number 7911 appears three times, and 30033 appears twice.
1239 7911 7911 7911 30033 30033
which is taken from the following line in the data trace:
TABLE_DUMP|1130191714|B|144.228.241.81|1239|8.3.43.0/24|1239 7911 7911 7911 30033 30033|IGP|144.228.241.81|0|-2|1239:123 1239:1000 1239:1011|NAG||
In the data file, if an AS number appears multiple times, they always appear consecutively (next to each other). When you analyze the data file, you should note that an AS is not the neighbor of its own. That is, you should ignore duplicated AS numbers in an ASPATH in a line.
You need to take special care of these two special cases in the data trace.
Requirement:
In this assignment, you will develop a Python tool to analyze BGP routing tables, and obtain the Internet connectivity information. The tool takes the name of the BGP dump file as a command line argument, determines neighboring ASes for each AS, and produces two output files. The first output file, top10.txt, contains the information of the top 10 ASes in terms of the number of neighbors. The top 10 ASes should be listed in the descending (non-increasing) order based on the number of neighbors they have. If two ASes have the same number of neighbors, the AS with a smaller AS number (numerically) is considered to have more neighbors (listed first). Let k denote that the total number of ASes in the data file, if k < 10, the program should just list the number of neighbors for these k ASes.
Each line in top10.txt contains the AS number of a top 10 AS, followed by a colon, a white space, and then the number of neighbors of this AS, followed by the list of neighboring AS numbers that are sorted in ascending order based on the AS number and separated by a '|' between AS numbers. Example top10.txt files are provided for you to compare.
The second output file is neighbor_count.txt, which lists the neighbor count for all ASes in the descending order based on the number of neighbors they have (same definition as that for top10.txt). Each line in neighbor_count.txt contains an AS number, followed by a colon, a white space, and then the number of neighbors of this AS. See the sample output files for example. For the corresponding dump file, your tool needs to produce exactly the same output files as the sample output files. (use 'diff' command to check).
Example run:
< linprog2:869 > python3 lastname_firstinitial_asconnectivity.py aspath.rib.20051024.2208_144.228.241.81Provided files
BGP routing information base (RIB) data trace (rib.20051024.2208_144.228.241.81.bz2)
A tar file contains four small testing BGP trace files: test_1line_a, test_1line_b, test_1line_c, test_3line_a. The tar file also includes top10 and neighbor_count files for the test files as well as the whole rib file and the rib10000 file.
Due time: September 25, 2024, 11:59pm
Submission: Name your program lastname_firstinitial_asconnectivity.py and submit on Canvas.
Grading (50 points total): Miscellaneous information If you are interested, some brief introduction of BGP is available at
here
or here. Or you can search online. Information regarding how BGP data trace is collected is
available at the route view project
site. However, as aforementioned, it is not necessary for you to really
understand any of these in order to complete the assignment. As
long as you make sure to understand the format of the data text
file, you are good to start the assignment.