Daniel Heppner

When transit agencies publish a new service change, they often give us a list of the affected routes, along with specific details on each trip that is being added or removed. But in the back of my mind, I’ve always wanted to know if traffic is making certain routes slower, or if something that looks like a service increase is actually just a reallocation of the budget.

The Transit Diff project is my attempt to reverse engineer a service change. When planners draw up a schedule, they start with a budgeted number of hours, which they have to allocate by assigning trips to routes. The result that we get to see is that list of trip-level changes. But if we add up the scheduled hours of every trip on a route for a whole day, we get the number of hours allocated to that route. Adding those numbers up across routes gives the total service hours budgeted for a day of service. We can do the same thing for the trip count, and then divide the hours by the trip count to get the average duration of each trip.
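To make the arithmetic concrete, here is a minimal sketch of that roll-up in TypeScript. It is not the project's actual code: the `Trip` shape and the function names are made up for illustration, and it assumes each trip has already been reduced to its first departure and last arrival on the chosen service day.

```typescript
// Hypothetical, simplified input: one record per scheduled trip on the service day.
interface Trip {
  routeId: string;
  startTime: string; // departure from the first stop, GTFS "HH:MM:SS"
  endTime: string; // arrival at the last stop, GTFS "HH:MM:SS"
}

interface RouteSummary {
  trips: number;
  serviceHours: number;
  avgTripMinutes: number;
}

// GTFS times can run past 24:00:00 for trips that cross midnight,
// so treat them as offsets in seconds rather than clock times.
function gtfsTimeToSeconds(time: string): number {
  const parts = time.split(":");
  const hours = Number(parts[0] ?? "0");
  const minutes = Number(parts[1] ?? "0");
  const seconds = Number(parts[2] ?? "0");
  return hours * 3600 + minutes * 60 + seconds;
}

function summarizeByRoute(trips: Trip[]): Map<string, RouteSummary> {
  // First pass: accumulate trip counts and scheduled seconds per route.
  const totals = new Map<string, { trips: number; seconds: number }>();
  for (const trip of trips) {
    const duration = gtfsTimeToSeconds(trip.endTime) - gtfsTimeToSeconds(trip.startTime);
    const current = totals.get(trip.routeId) ?? { trips: 0, seconds: 0 };
    current.trips += 1;
    current.seconds += duration;
    totals.set(trip.routeId, current);
  }

  // Second pass: convert to hours and derive the average trip duration.
  const summary = new Map<string, RouteSummary>();
  totals.forEach((t, routeId) => {
    summary.set(routeId, {
      trips: t.trips,
      serviceHours: t.seconds / 3600,
      avgTripMinutes: t.seconds / t.trips / 60,
    });
  });
  return summary;
}
```

Summing `serviceHours` over every route then gives the total budget for the day.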

Diffing these values simply requires choosing two different service days, ideally on the same day of the week on either side of a service change. For example, if a service change occurs on a Sunday, you might want to compare the Monday immediately after and the Monday immediately before the Sunday of the service change.
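Picking the two days is simple date arithmetic. The sketch below is an illustration rather than how the project does it (the presets there are a hand-maintained list): given a service change date and a weekday, it returns the last matching weekday strictly before the change and the first one on or after it. The function name and the example date are hypothetical.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// weekday follows JavaScript's convention: 0 = Sunday, 1 = Monday, ... 6 = Saturday.
// Dates are "YYYY-MM-DD" strings and are handled in UTC to avoid time zone drift.
function comparisonDates(
  changeDate: string,
  weekday: number
): { before: string; after: string } {
  const change = new Date(`${changeDate}T00:00:00Z`);
  const changeWeekday = change.getUTCDay();

  // Days back to the matching weekday strictly before the change (1 to 7).
  const daysBack = (changeWeekday - weekday + 7) % 7 || 7;
  // Days forward to the matching weekday on or after the change (0 to 6).
  const daysForward = (weekday - changeWeekday + 7) % 7;

  const toIsoDate = (ms: number): string => new Date(ms).toISOString().slice(0, 10);
  return {
    before: toIsoDate(change.getTime() - daysBack * DAY_MS),
    after: toIsoDate(change.getTime() + daysForward * DAY_MS),
  };
}

// A change on Sunday 2024-03-10 (an arbitrary example date), compared on Mondays.
const mondays = comparisonDates("2024-03-10", 1);
// mondays.before is "2024-03-04" and mondays.after is "2024-03-11"
```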

When Sound Transit opened its light rail extension to Lynnwood, a bunch of bus service was restructured to provide more local service instead of duplicating freeway service into downtown Seattle, which the train now does in parallel. Initially, I felt that the bus service was an overall reduction, because I was expecting more increases in local routes’ frequency. But when I put it into the analyzer, I found that in fact it was a 5% service increase!

It’s also possible to see when a route has hours added because of increased traffic congestion or because it was lengthened. At the above link, you can see the Swift Blue Line has 13% more service hours with only a 2% increase in trips. That means most of those hours went into the longer trip duration, confirmed by the last column showing an 11% increase in trip duration!
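The three percentages are tied together because service hours equal trips times average trip duration, so the growth factors multiply: (1 + hours change) = (1 + trips change) × (1 + duration change). A small sketch of that check, with hypothetical function names:

```typescript
// Percent change from a "before" value to an "after" value.
function percentChange(before: number, after: number): number {
  return ((after - before) / before) * 100;
}

// Since hours = trips * average duration, the duration change implied by
// a given change in hours and trips follows from dividing the growth factors.
function impliedDurationChange(hoursChangePct: number, tripsChangePct: number): number {
  const hoursFactor = 1 + hoursChangePct / 100;
  const tripsFactor = 1 + tripsChangePct / 100;
  return (hoursFactor / tripsFactor - 1) * 100;
}

// Swift Blue Line figures from above: +13% hours, +2% trips.
const durationChange = impliedDurationChange(13, 2);
// durationChange is roughly 10.8, which rounds to the 11% shown in the last column
```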

DuckDB: Made to Analyze GTFS

I was inspired to start this project because of a podcast episode I listened to where the DuckDB co-creator Hannes Mühleisen talked about the super fast CSV parser built into DuckDB. Basically, DuckDB is an in-process database that lets you read a CSV in and execute SQL queries on it without having to spin up a separate database or load the data into some other data structure.

Since GTFS transit data is published as a bunch of CSV files, I realized that this would be the perfect way to test out DuckDB. This type of analysis was something that I always wanted to do, and it felt like DuckDB was literally made just for it!

Indeed, the project is very simple. The entire thing lives in a Next.js app, and DuckDB is run in a React Server Component. It reads the CSVs directly, which I provide by unzipping them in a specific folder, and executes the queries against them. When there’s a new GTFS feed I want to analyze, all I have to do is drop the zip file in a specific folder and update the list of presets with the service change dates I want to look at.
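To give a feel for what querying the CSVs directly looks like, here is a rough sketch of the kind of SQL involved, wrapped in a TypeScript helper that only builds the query string. This is not the project's actual query. The function names and folder layout are assumptions, and working out which `service_id` values run on a given date (from `calendar.txt` and `calendar_dates.txt`) is left out, so the caller passes them in. Running the query would still need a DuckDB client, which is not shown.

```typescript
// Quote a value as a SQL string literal by doubling embedded single quotes.
function sqlString(value: string): string {
  return "'" + value.replace(/'/g, "''") + "'";
}

// SQL expression turning a GTFS "HH:MM:SS" text column into seconds.
// TRY_CAST yields NULL for blank times, which max() and min() then skip.
function secondsExpr(column: string): string {
  return (
    `TRY_CAST(split_part(${column}, ':', 1) AS INTEGER) * 3600 + ` +
    `TRY_CAST(split_part(${column}, ':', 2) AS INTEGER) * 60 + ` +
    `TRY_CAST(split_part(${column}, ':', 3) AS INTEGER)`
  );
}

// feedDir is the (assumed) folder holding one unzipped GTFS feed.
function buildServiceHoursQuery(feedDir: string, serviceIds: string[]): string {
  if (serviceIds.length === 0) {
    throw new Error("at least one service_id is required");
  }
  const stopTimes = sqlString(`${feedDir}/stop_times.txt`);
  const trips = sqlString(`${feedDir}/trips.txt`);
  const services = serviceIds.map((id) => sqlString(id)).join(", ");

  // all_varchar keeps times such as 25:10:00 as text instead of guessing a type.
  return `
WITH trip_spans AS (
  SELECT trip_id,
         max(${secondsExpr("arrival_time")}) - min(${secondsExpr("departure_time")}) AS seconds
  FROM read_csv(${stopTimes}, all_varchar = true)
  GROUP BY trip_id
)
SELECT t.route_id,
       count(*) AS trips,
       sum(s.seconds) / 3600.0 AS service_hours
FROM read_csv(${trips}, all_varchar = true) AS t
JOIN trip_spans AS s USING (trip_id)
WHERE t.service_id IN (${services})
GROUP BY t.route_id
ORDER BY t.route_id`;
}
```

Running the same query against two feeds, or against two sets of service IDs, gives the two sides of the diff.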

Future Ambitions

The most pressing thing is to improve the caching performance. I haven’t looked into it much, but there’s probably some more aggressive caching that I could have Next.js use, since the service hour calculations only need to happen once. As of now, it seems like Next.js likes to re-render the server component more than necessary.
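Since the result for a given feed and date never changes, even plain in-memory memoization would avoid repeating the work. The sketch below is generic TypeScript, not a Next.js caching feature, and the names are hypothetical. Whether a module-level cache like this survives between requests depends on how the app is deployed.

```typescript
// Wrap an expensive function so each distinct key is computed only once.
function memoize<T>(compute: (key: string) => T): (key: string) => T {
  const cache = new Map<string, T>();
  return (key: string): T => {
    if (!cache.has(key)) {
      cache.set(key, compute(key));
    }
    return cache.get(key) as T;
  };
}

// Example: key the cache on feed name plus service date.
let computations = 0;
const serviceHoursFor = memoize((key: string) => {
  computations += 1; // stands in for the real DuckDB query
  return `summary for ${key}`;
});

serviceHoursFor("feed-a:2024-03-11");
serviceHoursFor("feed-a:2024-03-11"); // served from the cache, no recomputation
```

If `compute` returns a promise, the promise itself is cached, so concurrent requests for the same key share one query.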

Weeklong Comparisons

Right now I’m only able to compare one service day to another. But since generally service patterns repeat on a weekly basis (Monday-Sunday), it would be useful to aggregate all the service hours in a week. This would allow us to see if a service change simply reallocates weekday hours to the weekend or if those new weekend trips are actually new hours.
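As a rough sketch of what that aggregation could look like, assume we already have one route-to-hours table per day of the week. The types and names below are hypothetical, and the numbers in the example are invented purely to show a reallocation.

```typescript
// Hypothetical input: service hours per route for one service day.
type DailyHours = Record<string, number>;

interface WeekTotals {
  weekday: number;
  weekend: number;
  total: number;
}

// week must hold seven entries, index 0 = Monday through index 6 = Sunday.
function weeklyTotals(week: DailyHours[]): WeekTotals {
  if (week.length !== 7) {
    throw new Error("expected seven days, Monday through Sunday");
  }
  let weekday = 0;
  let weekend = 0;
  week.forEach((day, index) => {
    let dayTotal = 0;
    for (const routeId of Object.keys(day)) {
      dayTotal += day[routeId] ?? 0;
    }
    if (index < 5) {
      weekday += dayTotal;
    } else {
      weekend += dayTotal;
    }
  });
  return { weekday, weekend, total: weekday + weekend };
}

// Invented numbers: weekend hours go up, but the weekly total stays flat,
// so the new weekend trips were paid for with weekday hours.
const beforeWeek = weeklyTotals([
  { "1": 100 }, { "1": 100 }, { "1": 100 }, { "1": 100 }, { "1": 100 },
  { "1": 50 }, { "1": 50 },
]);
const afterWeek = weeklyTotals([
  { "1": 96 }, { "1": 96 }, { "1": 96 }, { "1": 96 }, { "1": 96 },
  { "1": 60 }, { "1": 60 },
]);
const isPureReallocation =
  afterWeek.weekend > beforeWeek.weekend && afterWeek.total === beforeWeek.total;
```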

Mapping the Service Levels

My most ambitious idea for this project is a map that displays where bus service is allocated. Your first thought might be that I could just map the routes with the service information I’ve calculated above, but I actually want to look at the data for trips between any two stops, ignoring what specific routes are serving them. This would allow me to identify corridors of bus service, even if the corridor is serviced by several overlapping lower frequency routes.
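At small scale, the core idea can be sketched straight from stop times: walk each trip's stops in order and count how many trips travel between each consecutive pair, without ever looking at the route. The row shape and function name below are assumptions for illustration. This only covers adjacent stops, and counting trips between any two stops is the part that needs a proper graph.

```typescript
// Hypothetical, trimmed-down stop_times row.
interface StopTime {
  tripId: string;
  stopId: string;
  stopSequence: number;
}

// Count trips per directed pair of consecutive stops, keyed as "from->to".
function countTripsPerSegment(stopTimes: StopTime[]): Map<string, number> {
  // Group rows by trip.
  const byTrip = new Map<string, StopTime[]>();
  for (const row of stopTimes) {
    const rows = byTrip.get(row.tripId) ?? [];
    rows.push(row);
    byTrip.set(row.tripId, rows);
  }

  const counts = new Map<string, number>();
  byTrip.forEach((rows) => {
    // Order each trip's stops by sequence, then step through adjacent pairs.
    const ordered = rows.slice().sort((a, b) => a.stopSequence - b.stopSequence);
    for (let i = 0; i + 1 < ordered.length; i++) {
      const from = ordered[i];
      const to = ordered[i + 1];
      if (!from || !to) continue;
      const key = `${from.stopId}->${to.stopId}`;
      counts.set(key, (counts.get(key) ?? 0) + 1);
    }
  });
  return counts;
}

// Two trips on different routes overlap between stops B and C.
const segments = countTripsPerSegment([
  { tripId: "route1-trip", stopId: "A", stopSequence: 1 },
  { tripId: "route1-trip", stopId: "B", stopSequence: 2 },
  { tripId: "route1-trip", stopId: "C", stopSequence: 3 },
  { tripId: "route2-trip", stopId: "B", stopSequence: 1 },
  { tripId: "route2-trip", stopId: "C", stopSequence: 2 },
  { tripId: "route2-trip", stopId: "D", stopSequence: 3 },
]);
// segments.get("B->C") is 2, even though each route serves it only once
```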

From a technical perspective, this is much more complicated than just analyzing some CSVs, and will definitely require a proper backend. My plan is to leverage OpenTripPlanner’s transit data graph to find all the stops (nodes) connected by trips (edges) in an efficient manner.