In this article, I explore the public transport systems of four selected cities, relying on the General Transit Feed Specification and various spatial data science tools.
For this notebook, I picked four cities, Budapest, Berlin, Stockholm, and Toronto, and give an overview of their public transport systems using publicly available GTFS (General Transit Feed Specification) data. The notebook aims to serve as an introductory tutorial on accessing, manipulating, aggregating, and visualising public transport data with Pandas, GeoPandas, and other standard data science tools to derive insights about public transport. Later on, such an understanding can be helpful in various use cases, such as transport, urban planning, and location intelligence.
Additionally, while the GTFS format is meant to be general and universal, I will point out situations throughout the following analytical steps that still require city-by-city insight and manual validation.
For this article, I downloaded public transport data from Transitfeeds.com, an online aggregator of public transport data. In particular, I used the feeds with the following latest update dates for each city:
In the following code blocks, I will explore each of these cities multiple times, create comparative plots, and highlight the universality of the GTFS format. Also, to ensure that my analysis is easy to update with newer data dumps, I store each city’s GTFS data in a folder corresponding to the update date:
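import os

# Root folder holding one subfolder per city
root = 'data'
cities = ['Budapest', 'Toronto', 'Berlin', 'Stockholm']
# For each city, take the first subfolder whose name contains '20' (the update date)
updated = {city: [f for f in os.listdir(os.path.join(root, city)) if '20' in f][0] for city in cities}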
updated
The output of this cell:
Now, let’s take a closer look at the different files stored in these folders:
for city in cities…
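Because feeds can differ in which files they ship, a quick sanity check of each folder's contents can complement the listing above. The following is a minimal sketch rather than part of the original analysis: the helper name `missing_gtfs_files` is hypothetical, and the set of expected files is an assumption based on what a conventional fixed-route GTFS feed usually contains, so it should be checked against the current GTFS reference.

```python
import os

# Assumption: core files expected in a conventional fixed-route GTFS feed.
# Verify this list against the current GTFS reference before relying on it.
EXPECTED = {'agency.txt', 'stops.txt', 'routes.txt', 'trips.txt', 'stop_times.txt'}

# Service days are defined in calendar.txt, calendar_dates.txt, or both,
# so a feed needs at least one of the two.
CALENDAR_ANY = {'calendar.txt', 'calendar_dates.txt'}


def missing_gtfs_files(feed_dir):
    """Return the expected GTFS files that are absent from feed_dir (hypothetical helper)."""
    present = set(os.listdir(feed_dir))
    # Expected files that do not appear in the folder, in alphabetical order
    missing = sorted(EXPECTED - present)
    # Flag the feed if neither calendar file is available
    if not (CALENDAR_ANY & present):
        missing.append('calendar.txt or calendar_dates.txt')
    return missing
```

With the folder layout used in this notebook, the helper could be called as `missing_gtfs_files(os.path.join(root, city, updated[city]))` for each city. An empty list means all expected files are there, while a non-empty one points to a feed that needs a closer manual look.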