DC Metro Network Analysis

Network analysis is often applied to the social and online systems around us, for example, Facebook friend networks. A more concrete, physical example of a network, however, is a metro system.

The DC metro is a straightforward example of a connected network: every station can be reached from every other station. The network can be described as nodes (stations) and edges (the track segments between adjacent stations). The more connections a station has to other stations, the more important it becomes to the network.

Given this information, how can we understand the importance of each metro station? Metro nodes can be described with three main metrics:

  1. Degree: How many other stations are directly connected to a given station

  2. Betweenness Centrality: How often a station lies on the shortest path between two other stations, and therefore how much control it has over a rider's ability to reach the rest of the network

  3. Closeness Centrality: How close a given station is, on average, to every other station
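For readers who prefer notation, one common way to write these three metrics is shown below. This is a sketch under stated assumptions: the network is undirected and unweighted with n stations, A is its adjacency matrix, d(v, u) is the number of hops on a shortest path between v and u, sigma_st is the number of shortest paths between s and t, and sigma_st(v) is the number of those that pass through v. The normalizations are a widely used convention, not necessarily the exact scaling behind the numbers reported later in this post.

```latex
\begin{align}
\deg(v) &= \sum_{u} A_{vu} \\
C_B(v)  &= \frac{2}{(n-1)(n-2)}
           \sum_{\substack{\{s,t\} \\ s \neq v,\; t \neq v}}
           \frac{\sigma_{st}(v)}{\sigma_{st}} \\
C_C(v)  &= \frac{n-1}{\sum_{u \neq v} d(v,u)}
\end{align}
```

With this scaling, betweenness and closeness both fall between 0 and 1, which makes stations easy to compare.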

To understand the degree, betweenness, and closeness of a given station, we first need to represent the metro system as a dataset that the computer can understand. A convenient way to do this is an adjacency matrix of stations: a square table with one row and one column per station, where a 1 marks two stations that are directly joined by track. Luckily, the Washington Metropolitan Area Transit Authority (WMATA) provides a series of API products that make fetching station and line data easy. Code for building the adjacency matrix is in my GitHub repo. I was surprised this data was not available anywhere I looked online, so I hope it is useful for anyone else who wants to explore metro analytics.
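To make the adjacency matrix idea concrete, here is a minimal sketch in plain Python. It is not the code from the repo, and it does not call the WMATA API: the `LINES` dictionary, the single-letter station names, and the `build_adjacency` helper are all hypothetical placeholders that stand in for the ordered station lists you would fetch for each line.

```python
# Hypothetical toy data: each line is an ordered list of stops.
# Real orderings would come from WMATA; these letters are placeholders.
LINES = {
    "red": ["A", "B", "C", "D"],
    "blue": ["E", "C", "F"],  # shares the transfer station "C" with red
}


def build_adjacency(lines):
    """Return (stations, matrix).

    matrix[i][j] is 1 when stations i and j are consecutive stops
    on at least one line, and 0 otherwise.
    """
    stations = sorted({name for stops in lines.values() for name in stops})
    index = {name: i for i, name in enumerate(stations)}
    size = len(stations)
    matrix = [[0] * size for _ in range(size)]

    for stops in lines.values():
        # Pair each stop with the next stop on the same line.
        for a, b in zip(stops, stops[1:]):
            i, j = index[a], index[b]
            matrix[i][j] = 1
            matrix[j][i] = 1  # trains run both ways, so the matrix is symmetric
    return stations, matrix


stations, matrix = build_adjacency(LINES)
for name, row in zip(stations, matrix):
    print(name, row)
```

Stations shared by several lines only appear once in the matrix, which is what lets transfer stations pick up extra connections.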

Once the network is expressed in an adjacency matrix, Python can help us calculate the network statistics discussed above. Finally, we can overlay those stats on top of network line and station shapefiles provided by WMATA and visualized via geopandas. Hover over points to view node statistics and press the magnifying glass to zoom.
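The calculation step described above can be sketched without any third-party packages. This is an illustration on a made-up six-station network, not the analysis code behind this post: `GRAPH` and all function names are hypothetical, the graph is assumed to be connected, undirected, and unweighted, and betweenness is computed with Brandes' algorithm using the common (n-1)(n-2) normalization. Libraries such as NetworkX offer equivalent functions if you would rather not write these by hand.

```python
from collections import deque

# Hypothetical toy network (not real WMATA data): "C" is a transfer hub.
GRAPH = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D", "E", "F"],
    "D": ["C"],
    "E": ["C"],
    "F": ["C"],
}


def bfs(graph, source):
    """Breadth-first search from source.

    Returns hop distances, shortest-path counts, predecessor lists,
    and the order in which nodes were visited.
    """
    dist = {source: 0}
    sigma = {v: 0 for v in graph}
    sigma[source] = 1
    preds = {v: [] for v in graph}
    order = []
    queue = deque([source])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:
            if w not in dist:  # first time we reach w
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:  # v is on a shortest path to w
                sigma[w] += sigma[v]
                preds[w].append(v)
    return dist, sigma, preds, order


def degree(graph):
    """Number of directly connected neighbors per node."""
    return {v: len(neighbors) for v, neighbors in graph.items()}


def closeness(graph):
    """(n - 1) divided by the total hop distance to all other nodes."""
    n = len(graph)
    return {v: (n - 1) / sum(bfs(graph, v)[0].values()) for v in graph}


def betweenness(graph):
    """Normalized betweenness via Brandes' algorithm (needs n > 2)."""
    n = len(graph)
    score = {v: 0.0 for v in graph}
    for s in graph:
        _, sigma, preds, order = bfs(graph, s)
        delta = {v: 0.0 for v in graph}
        # Walk back from the farthest nodes, accumulating dependencies.
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every unordered pair was counted from both ends, so dividing by
    # (n - 1)(n - 2) gives a value between 0 and 1.
    return {v: score[v] / ((n - 1) * (n - 2)) for v in graph}


for name in GRAPH:
    print(name, degree(GRAPH)[name],
          round(betweenness(GRAPH)[name], 3),
          round(closeness(GRAPH)[name], 3))
```

In this toy network the hub "C" comes out on top for all three metrics, which is the same kind of pattern the real analysis looks for among transfer stations.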

Before I started this project, I assumed Metro Center would be the most important station based on my ridership patterns while living in DC. However, based on the findings of this analysis, it looks like L’Enfant Plaza is the most important! L’Enfant has a degree of 5, a betweenness of 0.571, and a closeness of 0.144, not to mention an underground food court. Honorable mention goes to the Pentagon, the most important station outside of DC in terms of betweenness and closeness centrality.

Instead of using set shapefiles to draw the network, what if we let the computer decide how it should look? Force-directed algorithms let us visualize a network as nodes that repel each other based on a given charge, while the edges between them pull connected nodes back together. The D3 JavaScript library is a popular tool for building force-directed networks. I used D3 to create a free-flowing DC metro system below. Pull a node with your mouse and watch the network update positions.
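To show what a force-directed layout is doing under the hood, here is a simplified Python sketch. It is not D3 and does not reproduce D3's simulation: it is a bare-bones version of the same idea, where every pair of nodes repels like equal charges and every edge acts like a spring. The `EDGES` list, the `force_layout` function, and all of its parameter values are hypothetical choices made for illustration.

```python
import math
import random

# Hypothetical toy edge list (not real WMATA data).
EDGES = [("A", "B"), ("B", "C"), ("C", "D"), ("C", "E"), ("C", "F")]


def force_layout(edges, iterations=200, charge=0.05, spring=0.1,
                 rest_length=1.0, max_move=0.5, seed=0):
    """Return {node: (x, y)} after a simple force simulation."""
    rng = random.Random(seed)  # seeded so the layout is repeatable
    nodes = sorted({node for edge in edges for node in edge})
    pos = {node: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for node in nodes}

    for _ in range(iterations):
        force = {node: [0.0, 0.0] for node in nodes}

        # Repulsion: every pair of nodes pushes apart, like equal charges.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = max(math.hypot(dx, dy), 1e-9)  # avoid dividing by zero
                push = charge / dist ** 2
                force[a][0] += push * dx / dist
                force[a][1] += push * dy / dist
                force[b][0] -= push * dx / dist
                force[b][1] -= push * dy / dist

        # Attraction: each edge is a spring with a preferred length.
        for a, b in edges:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            pull = spring * (dist - rest_length)
            force[a][0] += pull * dx / dist
            force[a][1] += pull * dy / dist
            force[b][0] -= pull * dx / dist
            force[b][1] -= pull * dy / dist

        # Move each node along its net force, capping the step size
        # so a single large force cannot fling a node away.
        for node in nodes:
            fx, fy = force[node]
            size = math.hypot(fx, fy)
            if size > max_move:
                fx, fy = fx * max_move / size, fy * max_move / size
            pos[node][0] += fx
            pos[node][1] += fy

    return {node: (x, y) for node, (x, y) in pos.items()}


layout = force_layout(EDGES)
for node, (x, y) in layout.items():
    print(f"{node}: ({x:.2f}, {y:.2f})")
```

Raising `charge` spreads the nodes out, while raising `spring` pulls connected stations closer to `rest_length`. Dragging a node in an interactive version amounts to overriding one position and letting the loop keep running.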

Through network analysis, we can understand which stations are most important in terms of degree and centrality, then visualize the network with Python Bokeh or D3. A subsequent step would be to weight the nodes and edges based on how many people flow through a station each day and where they are traveling. This would give us a better idea of node centrality and could be interesting to view through time, especially during the workday.
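As a rough idea of what the weighting step might look like, the sketch below sums rider counts over the track segments that touch each station, a quantity sometimes called node strength. Everything here is a placeholder: the `SEGMENT_FLOWS` numbers are invented purely for illustration and are not ridership data, and `station_strength` is a hypothetical helper, not part of this project's code.

```python
from collections import defaultdict

# Placeholder rider counts per track segment. These values are invented
# for illustration only and do not come from WMATA or any real dataset.
SEGMENT_FLOWS = {
    ("A", "B"): 1200,
    ("B", "C"): 3400,
    ("C", "D"): 2100,
    ("C", "E"): 800,
}


def station_strength(segment_flows):
    """Sum the rider counts on every segment that touches each station."""
    strength = defaultdict(int)
    for (a, b), riders in segment_flows.items():
        strength[a] += riders
        strength[b] += riders
    return dict(strength)


strength = station_strength(SEGMENT_FLOWS)
busiest = max(strength, key=strength.get)
print(busiest, strength[busiest])
```

Running the same calculation on counts bucketed by hour would be one simple way to see how a station's weight shifts over the workday.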

For anyone hoping to visualize networks in D3, I found that the library has a steep learning curve for someone without previous JavaScript experience, but it produces some really unique and customizable results once you get the hang of it. Using D3 with a physical network like the DC metro is kind of trivial, since there is already a real physical representation, but it would be very useful for visualizing social networks. A common example of D3 with social networks is the force-directed graph of Les Mis characters.