GTFS Realtime: How to get realtime transit information in your app
App developers in Ireland with an interest in building apps which use realtime public transport information will be familiar with the National Transport Authority’s RTPI REST API.
The RTPI REST API provided various endpoints where a GET request could retrieve realtime bus stop information, route information and more. A dream for app developers!
This API has now been deprecated and replaced by the GTFS Real-Time API.
The newer API requires a little more work to get realtime transport information into your app; however, rest assured this article will help you get going. TL;DR at the bottom 😀
The GTFS Real-Time API
In August 2020, the National Transport Authority (NTA) announced the launch of the new GTFS-Real-Time API:
In line with the NTA’s commitment to providing open data, a new GTFS Real-time (GTFS-R) API has been launched. GTFS-R is a data feed specification that allows the NTA to provide real-time public transport updates to application developers. It is an extension to GTFS (General Transit Feed Specification), an open data format for public transportation schedules and associated geographic information.
The new API uses the GTFS Realtime specification developed by Google. This is a feed specification which has become a standard for open data feeds from transit providers around the world, being used by over 2500 operators in over 55 countries.
It is designed with interoperability in mind: transit apps can provide realtime data for many different cities and countries as long as their transport operators use the same specification for their realtime data.
We can see the API spec here and try it out using a key for the API, obtained by signing up here.
A successful response to a GET request on the API will yield a large dataset of results for every trip which has a live update on the Dublin Bus, Bus Éireann and Go-Ahead schedules. Let’s take a look at a snippet grabbed from this response.
header contains two properties:
- gtfsRealtimeVersion: The version of the GTFS Realtime specification being used. Here it's version 1.0.
- timestamp: The moment the dataset was created on the server, in POSIX time (seconds since 1 January 1970 UTC).
entity is the set of realtime updates provided by the system. If this set is empty, it should be assumed that no realtime updates are currently being provided.
Here, each entity contains two properties:
- id: The unique identifier for the entity.
- tripUpdate: Information about the trip, along with the realtime arrival and departure delays for that trip.
Looking into tripUpdate, we can see it contains two properties:
- trip: The trip that this update applies to.
- stopTimeUpdate: The set of updates to stop times along the trip. Here we can find departure delays along the trip's stop sequence.
Here, trip contains five properties:
- tripId: The unique identifier of the trip that this update applies to.
- startTime: The scheduled start time of this trip instance.
- startDate: The start date of this trip instance in YYYYMMDD format.
- scheduleRelationship: The relationship between this trip and the static schedule. Possible values include SCHEDULED, ADDED, UNSCHEDULED and CANCELED.
- routeId: The ID of the route that this trip refers to.
And finally, let's look at the list of updates in stopTimeUpdate. The first update in the list has the following properties:
- stopSequence: Here, stopSequence has the value 1. This means the update applies to the first stop in the trip's stop sequence, and it carries forward to each following stop up until the next stop update in the list.
- departure: Contains the delay property, which gives the number of seconds the departure at this stop is delayed relative to the schedule. A delay of 0 means departure is on time according to the schedule for every stop in the stop sequence up to the next stop update. A delay of 60, for example, means a 60 second delay on top of the scheduled time for each of those stops. A negative delay means the vehicle is running ahead of schedule.
- stopId: The identifier of this stop, matching stop_id in the static GTFS stops.txt.
- scheduleRelationship: The relationship between the stop time and the schedule. The default is SCHEDULED; other possible values include SKIPPED and NO_DATA.
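To make the structure above concrete, here is a hand-written object with the same shape (every ID and value in it is hypothetical, not taken from the NTA feed) and a small function that indexes the trip updates by tripId so they can later be joined with the static schedule. It assumes the JSON-style field names described above and that each stop update carries a stopSequence; if you decode the protobuf with the generated bindings instead, enum fields such as scheduleRelationship may arrive as numeric codes rather than strings.

```javascript
// Illustrative only: a hand-written object shaped like the TripUpdate entities
// described above. All IDs and values are hypothetical.
const sampleFeed = {
  header: { gtfsRealtimeVersion: "1.0", timestamp: 1606478400 },
  entity: [
    {
      id: "1",
      tripUpdate: {
        trip: {
          tripId: "example-trip-1",
          startTime: "08:30:00",
          startDate: "20201127",
          scheduleRelationship: "SCHEDULED",
          routeId: "example-route-1",
        },
        stopTimeUpdate: [
          { stopSequence: 1, departure: { delay: 0 }, stopId: "stop-a", scheduleRelationship: "SCHEDULED" },
          { stopSequence: 5, departure: { delay: 60 }, stopId: "stop-e", scheduleRelationship: "SCHEDULED" },
        ],
      },
    },
  ],
};

// Index trip updates by tripId so they can be joined with static GTFS trips.
function indexTripUpdates(feed) {
  const byTripId = new Map();
  for (const entity of feed.entity ?? []) {
    const update = entity.tripUpdate;
    if (!update || !update.trip) continue; // skip entities without a trip update
    const stopUpdates = (update.stopTimeUpdate ?? [])
      .map((stu) => ({
        stopSequence: stu.stopSequence,
        stopId: stu.stopId,
        // Prefer the departure delay, fall back to the arrival delay if only that is given.
        delay: stu.departure?.delay ?? stu.arrival?.delay ?? null,
        scheduleRelationship: stu.scheduleRelationship ?? "SCHEDULED",
      }))
      .sort((a, b) => a.stopSequence - b.stopSequence);
    byTripId.set(update.trip.tripId, {
      routeId: update.trip.routeId,
      tripRelationship: update.trip.scheduleRelationship ?? "SCHEDULED",
      stopUpdates,
    });
  }
  return byTripId;
}
```

Indexing by tripId matters because the static schedule and the realtime feed only meet through that identifier.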
Now that we've covered what an update looks like in the API response, how can we use these updates to calculate the ETA of a bus at a bus stop?
We use this realtime data in conjunction with static schedule data provided by the transit operator. This static dataset follows the General Transit Feed Specification and is known as GTFS static data.
The General Transit Feed Specification
The General Transit Feed Specification (GTFS) defines a common format for public transportation schedules and associated geographic information.
Transit operators can publish their schedule and geographic data in this format as a GTFS dataset. Developers can then write tools to consume GTFS datasets so they can incorporate public transportation information into their apps in an interoperable way.
From the Google reference we can see what a GTFS dataset consists of:
A GTFS feed is composed of a series of text files collected in a ZIP file. Each file models a particular aspect of transit information: stops, routes, trips, and other schedule data.
In Ireland, the NTA provides all available GTFS datasets here, including the GTFS feed to be used in conjunction with the GTFS-R API.
Downloading the feed to be used with the GTFS-R API and extracting the downloaded ZIP file will show a folder containing nine text files. What these files are for can be seen in detail in the Google reference here, but in short, they contain the latest entire dataset of schedule and geographic data for Bus Éireann, Dublin Bus and Go-Ahead services. Individual datasets for each service can also be downloaded.
As of 27th November 2020, stop_times.txt is a whopping 437.3 MB in size! This is because the file contains the scheduled stop times for every single stop, on every single trip, on every single route across the entire Dublin Bus, Bus Éireann and Go-Ahead schedules.
We also have stops.txt, trips.txt and routes.txt, which detail each individual stop, trip and route respectively, along with files containing rules for transfers between routes (transfers.txt), service dates (calendar.txt), exceptions to service dates (calendar_dates.txt), rules for mapping travel paths (shapes.txt), and the transit agencies represented in the dataset (agency.txt).
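Deciding whether a trip is in service on a given day means combining calendar.txt (a weekly pattern between a start and end date) with calendar_dates.txt (exceptions, where exception_type 1 adds service and 2 removes it). A minimal sketch, assuming rows have already been parsed into objects keyed by GTFS column names with string values; isServiceActive is a hypothetical helper name, and the linear find() calls keep it short where a real server would index by service_id or do this in SQL.

```javascript
// JavaScript's getUTCDay() returns 0 for Sunday, so index the calendar.txt columns accordingly.
const WEEKDAYS = ["sunday", "monday", "tuesday", "wednesday", "thursday", "friday", "saturday"];

// date is a GTFS service date string in YYYYMMDD format.
function isServiceActive(serviceId, date, calendarRows, calendarDateRows) {
  // calendar_dates.txt exceptions take precedence over the weekly pattern.
  const exception = calendarDateRows.find(
    (row) => row.service_id === serviceId && row.date === date
  );
  if (exception) return exception.exception_type === "1"; // 1 = added, 2 = removed

  const cal = calendarRows.find((row) => row.service_id === serviceId);
  if (!cal) return false;
  // Zero-padded YYYYMMDD strings compare correctly as plain strings.
  if (date < cal.start_date || date > cal.end_date) return false;

  const year = Number(date.slice(0, 4));
  const month = Number(date.slice(4, 6));
  const day = Number(date.slice(6, 8));
  const weekday = WEEKDAYS[new Date(Date.UTC(year, month - 1, day)).getUTCDay()];
  return cal[weekday] === "1";
}
```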
In order to get realtime information, the static schedule data in a GTFS dataset must be used in conjunction with the realtime updates provided via GTFS Realtime.
For example, if we want the next live departure time for a particular route at a particular stop, we search the static GTFS dataset for the next trip calling at that stop whose service is running that day according to the calendar. If GTFS Realtime contains an update for that trip, we add its delay to the scheduled departure time, and the result is the realtime departure time.
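A minimal sketch of that calculation, assuming the stop updates for one trip have already been extracted into { stopSequence, delay, scheduleRelationship } objects sorted by stopSequence. It applies the propagation rule described earlier (a delay carries forward until the next stop update), treats NO_DATA as switching off realtime data for later stops and SKIPPED as meaning the bus will not stop there. This is a simplified reading of the GTFS Realtime spec, not the NTA's or the author's implementation, and trip-level cancellations would need checking separately. Times stay in GTFS's HH:MM:SS form, where hours can exceed 24 for trips running past midnight, which sidesteps time zone handling. The function names are hypothetical.

```javascript
// Parse a GTFS time "HH:MM:SS" (hours may exceed 24) into seconds after the start of the service day.
function parseGtfsTime(text) {
  const [h, m, s] = text.trim().split(":").map(Number);
  return h * 3600 + m * 60 + s;
}

// Format seconds after the start of the service day back into GTFS "HH:MM:SS".
function formatGtfsTime(seconds) {
  const pad = (n) => String(n).padStart(2, "0");
  const h = Math.floor(seconds / 3600);
  const m = Math.floor((seconds % 3600) / 60);
  const s = seconds % 60;
  return `${pad(h)}:${pad(m)}:${pad(s)}`;
}

// Returns { departure, delay } (delay is null when there is no realtime data),
// or null if the realtime feed says this stop is skipped.
function realtimeDeparture(scheduledDeparture, stopSequence, stopUpdates) {
  let delay = null;
  for (const update of stopUpdates) {
    if (update.stopSequence > stopSequence) break; // later updates don't affect this stop
    if (update.scheduleRelationship === "SKIPPED") {
      if (update.stopSequence === stopSequence) return null;
      continue; // a skipped earlier stop does not reset the propagated delay
    }
    if (update.scheduleRelationship === "NO_DATA") {
      delay = null; // NO_DATA propagates to subsequent stops
    } else if (update.delay !== null && update.delay !== undefined) {
      delay = update.delay;
    }
  }
  const scheduled = parseGtfsTime(scheduledDeparture);
  return { departure: formatGtfsTime(scheduled + (delay ?? 0)), delay };
}
```

Returning delay alongside the time lets the app distinguish "on time" (0) from "no live data" (null).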
Great! How can I use this in my app?
For those developing a mobile app that uses transit feed data, the tricky part is that you shouldn't consume a GTFS Realtime feed directly from the app.
GTFS Realtime feeds typically provide a huge dataset of updates across an operator's entire fleet, and the NTA's GTFS Real-time API covers the entire networks of three agencies. Consuming it directly would mean downloading large amounts of data at very frequent intervals.
Besides the impact on a user's data allowance over a mobile network connection, processing such large datasets of updates every few minutes would cause significant battery drain.
Instead, you will need a server which consumes the GTFS Realtime feed, so that your mobile app can query the server for only the updates it needs. This server will also need to consume data from a static GTFS feed. We will look at how to consume both types of feed on a server below. The diagram below shows an example of how this might work:
Consuming a GTFS Realtime feed
GTFS Realtime data is encoded using Protocol Buffers, a data serialization format developed by Google.
From Google Transit APIs:
GTFS-realtime data is encoded and decoded using Protocol Buffers, a compact binary representation designed for fast and efficient processing.
To work with GTFS realtime data, a developer would typically use the gtfs-realtime.proto schema to generate classes in the programming language of their choice. These classes can then be used for constructing GTFS realtime data model objects and serializing them as binary data or, in the reverse direction, parsing binary data into data model objects.
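To give a feel for what the "compact binary representation" looks like, here is a tiny reader for the protobuf wire format. It only lists the top-level fields of one message; the field numbers used in the example come from gtfs-realtime.proto (FeedMessage.header is field 1, and FeedHeader's gtfs_realtime_version and timestamp are fields 1 and 3). In practice the generated bindings do all of this for you, so treat this as an illustration rather than something to use in place of them.

```javascript
// Decode a base-128 varint starting at `offset`. Arithmetic is used instead of
// bitwise operators so values above 2^31 (e.g. 64-bit timestamps) stay exact up to 2^53.
function readVarint(bytes, offset) {
  let value = 0;
  let multiplier = 1;
  let pos = offset;
  for (;;) {
    if (pos >= bytes.length) throw new Error("truncated varint");
    const byte = bytes[pos++];
    value += (byte & 0x7f) * multiplier; // low 7 bits carry data
    if ((byte & 0x80) === 0) return { value, next: pos }; // high bit clear: last byte
    multiplier *= 128;
  }
}

// List the top-level fields of one protobuf message.
// Varints (wire type 0) are returned as numbers; length-delimited fields
// (wire type 2: strings, bytes, nested messages) as Uint8Array slices.
function readFields(bytes) {
  const fields = [];
  let pos = 0;
  while (pos < bytes.length) {
    const key = readVarint(bytes, pos);
    pos = key.next;
    const fieldNumber = Math.floor(key.value / 8); // key = (field_number << 3) | wire_type
    const wireType = key.value % 8;
    if (wireType === 0) {
      const v = readVarint(bytes, pos);
      fields.push({ fieldNumber, wireType, value: v.value });
      pos = v.next;
    } else if (wireType === 2) {
      const len = readVarint(bytes, pos);
      pos = len.next;
      if (pos + len.value > bytes.length) throw new Error("truncated field");
      fields.push({ fieldNumber, wireType, value: bytes.subarray(pos, pos + len.value) });
      pos += len.value;
    } else if (wireType === 1 || wireType === 5) {
      pos += wireType === 1 ? 8 : 4; // fixed64 / fixed32 (e.g. float coordinates): skipped here
    } else {
      throw new Error(`unsupported wire type ${wireType}`);
    }
  }
  return fields;
}
```

A FeedHeader with version "2.0" and timestamp 300 is just eight bytes on the wire, which hints at why the binary feed is much smaller than its JSON equivalent.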
Google provides GTFS Realtime language bindings, generated from the protocol buffer schema, for some of the most popular programming languages here.
Below is a code sample provided by Google in JavaScript/Node.js which demonstrates using the GTFS Realtime bindings to download a GTFS Realtime feed from a provided URL, parse the response as a FeedMessage, and iterate over the results.
To try it out with the NTA's GTFS Real-time API, we can paste in the URL found here and add a headers object to requestSettings with an x-api-key property containing an API key obtained by signing up here.
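For reference, a sketch of the same request using the fetch API built into Node.js 18 and later, rather than the requestSettings object used in Google's sample. fetchFeedBytes is a hypothetical helper; the fetch implementation is passed in so it can be swapped for a stub in tests. The key point is that the API key travels in the x-api-key header and the body is read as raw bytes, because it is binary protobuf rather than text.

```javascript
// Hypothetical helper: fetch the raw GTFS-R protobuf bytes using an API key header.
// `fetchImpl` defaults to the global fetch available in Node.js 18+.
async function fetchFeedBytes(url, apiKey, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(url, {
    method: "GET",
    headers: { "x-api-key": apiKey },
  });
  if (!response.ok) {
    throw new Error(`GTFS-R request failed: ${response.status} ${response.statusText}`);
  }
  // Binary protobuf body: read it as bytes, not as text or JSON.
  return new Uint8Array(await response.arrayBuffer());
}
```

The returned bytes can then be decoded with GtfsRealtimeBindings.transit_realtime.FeedMessage.decode(bytes) from the gtfs-realtime-bindings package, as in Google's sample. Keep the API key in an environment variable rather than in source code.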
Consuming a GTFS dataset
Since the specification requires files to be provided in a comma-separated (CSV) format, there are many ways a server can consume a GTFS dataset. While it is possible to read and parse the CSV files directly in Node.js, let's look at two ways to consume a GTFS dataset by importing the data into a database.
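For the read-it-directly route, a GTFS file is simple enough to parse by hand. The following is a minimal sketch (parseGtfsCsv is a hypothetical name) that handles a header row, quoted fields containing commas, doubled quotes and a leading byte order mark. It loads a whole file into memory as one string, so for something the size of stop_times.txt a streaming CSV parser would be a better fit.

```javascript
// Minimal parser for a GTFS .txt file: comma-separated, first line is the header.
// Returns an array of objects keyed by column name, with all values as strings.
function parseGtfsCsv(text) {
  const rows = [];
  let row = [];
  let field = "";
  let inQuotes = false;
  const src = text.replace(/^\uFEFF/, ""); // strip a UTF-8 byte order mark

  for (let i = 0; i < src.length; i++) {
    const ch = src[i];
    if (inQuotes) {
      if (ch === '"' && src[i + 1] === '"') { field += '"'; i++; } // escaped quote
      else if (ch === '"') inQuotes = false;
      else field += ch;
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      row.push(field); field = "";
    } else if (ch === "\n" || ch === "\r") {
      if (ch === "\r" && src[i + 1] === "\n") i++; // treat CRLF as one line break
      row.push(field); field = "";
      rows.push(row); row = [];
    } else {
      field += ch;
    }
  }
  if (field !== "" || row.length > 0) { row.push(field); rows.push(row); }

  // Drop blank lines, then zip each record with the header columns.
  const [header, ...records] = rows.filter((r) => !(r.length === 1 && r[0] === ""));
  if (!header) return [];
  const columns = header.map((h) => h.trim());
  return records.map((r) => Object.fromEntries(columns.map((c, j) => [c, r[j] ?? ""])));
}
```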
Using an Azure SQL database
I converted the files into tables in an SQL database in order to make it easier to query the different files for data when needed. To do this I used the Node-gtfs tool developed by BlinkTagInc to load a GTFS feed into an SQLite database.
I used DB Browser for SQLite to edit the database, as it provides a high-quality visual interface for editing and searching large amounts of data.
In my case, I removed empty tables and empty columns within tables, and stripped out any GTFS data I knew I would not be working with. For example, my app does not currently aim to display Bus Éireann transit data, so this agency's schedule data was removed from the database.
In addition, replacing string identifiers such as trip_id with indexed integer identifiers in the database table corresponding to stop_times.txt reduces the size of the database and improves query times, as we no longer need to perform string comparisons on long string identifiers across the ~4 million records in the table for stop_times.txt.
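A sketch of that replacement, assuming stop_times rows are plain objects keyed by GTFS column names (internTripIds and the trip_key column are hypothetical names). The lookup table it returns matters: realtime updates still refer to trips by their original string tripId, so the server needs a way to map back.

```javascript
// Replace long string trip_ids in stop_times rows with compact integer keys.
// Returns the rewritten rows plus a lookup table to store alongside them.
function internTripIds(stopTimes) {
  const idByTripId = new Map();
  const rows = stopTimes.map((row) => {
    let id = idByTripId.get(row.trip_id);
    if (id === undefined) {
      id = idByTripId.size + 1; // keys start at 1, in order of first appearance
      idByTripId.set(row.trip_id, id);
    }
    const { trip_id: _dropped, ...rest } = row; // drop the string id from the row
    return { trip_key: id, ...rest };
  });
  // Map iteration follows insertion order, so the lookup is sorted by trip_key.
  const lookup = Array.from(idByTripId, ([trip_id, trip_key]) => ({ trip_key, trip_id }));
  return { rows, lookup };
}
```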
To host the database I set up an Azure SQL database following this guide. I also followed this guide to migrate the SQLite database to Azure.
If you are consuming the SQL database in a Node.js application, you can use the mssql Node package, a Microsoft SQL Server client for Node.js, to connect to the database and run queries. See the following code snippet from the mssql package, which demonstrates connecting to a database and running a sample SQL query:
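Building on that, here is a sketch of a parameterised query for upcoming departures at a stop. The table and column names assume the GTFS files were imported under their standard names, and comparing departure_time as text only works if times are stored as zero-padded HH:MM:SS strings; both are assumptions about your schema. pool is a connection pool from sql.connect(config) and sql is the mssql module itself, both passed in so the function can be exercised with a stub. The rows returned still need filtering by service_id against the calendar.

```javascript
// Hypothetical query: upcoming departures at one stop between two GTFS times.
// Assumes standard GTFS table/column names and zero-padded HH:MM:SS text times.
const NEXT_DEPARTURES_SQL = `
  SELECT st.trip_id, st.stop_sequence, st.departure_time,
         t.service_id, r.route_short_name
  FROM stop_times AS st
  JOIN trips AS t ON t.trip_id = st.trip_id
  JOIN routes AS r ON r.route_id = t.route_id
  WHERE st.stop_id = @stopId
    AND st.departure_time >= @fromTime
    AND st.departure_time < @toTime
  ORDER BY st.departure_time;`;

// `pool` is a ConnectionPool from sql.connect(config); `sql` is require("mssql").
// Named inputs are sent as parameters, so stopId is never concatenated into the SQL text.
async function queryNextDepartures(pool, sql, stopId, fromTime, toTime) {
  const result = await pool
    .request()
    .input("stopId", sql.VarChar, stopId)
    .input("fromTime", sql.VarChar, fromTime)
    .input("toTime", sql.VarChar, toTime)
    .query(NEXT_DEPARTURES_SQL);
  return result.recordset; // array of row objects keyed by column name
}
```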
Using an MS SQL Docker container
Microsoft offers Docker images for Microsoft SQL Server. Provided you have Docker installed, with the docker-compose.yml file below a Node.js application can start an instance of the latest SQL Server 2019 image running in a Docker container named docker-gtfs-db.
Once the container is running, access the interactive bash shell of the container with the following command:
docker exec -it docker-gtfs-db bash
You can then connect to SQL Server using the sqlcmd tool inside the container with the command below:
/opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P YOUR_PASSWORD_HERE
You are now connected to the SQL server. Create a database with the following query:
CREATE DATABASE gtfsdb;
GO
Here we have created a database named gtfsdb. Press Ctrl+C to exit sqlcmd, followed by Ctrl+D to exit the bash shell. We can now use the mssql Node package mentioned in the previous section to connect to this database.
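The mssql package connects using a config object. A minimal sketch for the container above, assuming SQL Server's default port 1433 is exposed on localhost and using environment variable names that are purely hypothetical:

```javascript
// Hypothetical config for connecting mssql to the gtfsdb database in the
// docker-gtfs-db container. Environment variable names are assumptions.
function buildDbConfig(env = process.env) {
  if (!env.DB_PASSWORD) throw new Error("DB_PASSWORD is not set"); // never hard-code the SA password
  return {
    server: env.DB_HOST ?? "localhost",
    port: Number(env.DB_PORT ?? 1433), // SQL Server's default port
    user: env.DB_USER ?? "sa",
    password: env.DB_PASSWORD,
    database: env.DB_NAME ?? "gtfsdb",
    options: {
      encrypt: true,
      trustServerCertificate: true, // local container uses a self-signed certificate
    },
  };
}
```

Usage would be along the lines of const pool = await sql.connect(buildDbConfig()); with sql = require("mssql"). For the Azure SQL setup described earlier you would point server at the Azure host and not trust unverified certificates.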
The tool provided in the Node app here downloads a static GTFS feed from a provided URL and uses the mssql package to write the GTFS feed to the MS SQL database we created in the Docker container.
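With millions of rows in stop_times.txt, writing one row per statement is slow, while a single giant statement runs into SQL Server's limits of 1,000 rows per INSERT ... VALUES statement and 2,100 parameters per request. A hedged sketch of batching under both limits (not necessarily how the linked tool does it); batches and buildInsert are hypothetical names:

```javascript
// SQL Server limits: at most 1000 rows per INSERT ... VALUES statement and
// 2100 parameters per request. PARAM_BUDGET leaves some headroom under the latter.
const MAX_ROWS_PER_INSERT = 1000;
const PARAM_BUDGET = 2000;

// Yield slices of `rows` small enough for one parameterised INSERT.
function* batches(rows, columnCount) {
  const size = Math.max(1, Math.min(MAX_ROWS_PER_INSERT, Math.floor(PARAM_BUDGET / columnCount)));
  for (let i = 0; i < rows.length; i += size) {
    yield rows.slice(i, i + size);
  }
}

// Build "INSERT INTO t (a, b) VALUES (@p0_0, @p0_1), (@p1_0, @p1_1);" for one batch.
// Table and column names are interpolated, so they must come from the fixed GTFS
// column list, never from user input; the values themselves are bound as parameters.
function buildInsert(table, columns, batch) {
  const tuples = batch.map((_, r) => `(${columns.map((_, c) => `@p${r}_${c}`).join(", ")})`);
  return `INSERT INTO ${table} (${columns.join(", ")}) VALUES ${tuples.join(", ")};`;
}
```

Each value would then be bound on an mssql request with request.input(`p${r}_${c}`, value) before calling request.query() with the generated statement.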
Our server can now consume the GTFS dataset by querying the database.
Bringing it all together
Now that we know how to consume both a GTFS Realtime feed and a GTFS feed in a Node.js app, we need to create an endpoint that a mobile app can hit with a GET request to retrieve a realtime update for a given bus stop.
I have created a Node.js app that can do all of this for you; it can be found here.
The above app creates a basic REST API using the hapi framework, a Node.js framework for building web services. The hapi framework uses a plugin architecture which allows an application to be easily broken up into isolated components of logic.
We can create plugins which we register on our hapi server. In this case, I have a realtime plugin and a data plugin, which contain the logic for requests to the GTFS Real-time API and for the data access layer respectively, where we use node mssql to query the SQL database.
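The plugin shape itself is small. A hedged sketch of what a realtime plugin could look like (the route path and the getDepartures option are hypothetical, not the linked app's actual code): a hapi plugin is an object with a name and an async register(server, options) function, and here the data access function is injected through options so the plugin stays isolated from the data layer.

```javascript
// Hypothetical hapi plugin exposing realtime departures for a stop.
const realtimePlugin = {
  name: "realtime",
  version: "1.0.0",
  // hapi calls register(server, options) when the plugin is registered.
  register: async (server, options) => {
    server.route({
      method: "GET",
      path: "/stops/{stopId}/departures", // {stopId} becomes request.params.stopId
      handler: async (request, h) => {
        const { stopId } = request.params;
        const departures = await options.getDepartures(stopId);
        return { stopId, departures }; // hapi serialises plain objects as JSON
      },
    });
  },
};
```

It would be registered with await server.register({ plugin: realtimePlugin, options: { getDepartures } }), where getDepartures might wrap the data plugin's queries.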
By querying the API on the server, we can see the realtime information for ourselves. See below for an example response to a query for the next trips arriving at bus stop 297 within the next hour.
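Producing a response like that involves trimming and sorting the departures. A sketch, assuming each departure already has its realtime delay applied and carries a GTFS-style departure time, and that "now" is expressed in seconds since the start of the service day (a simplification that avoids time zone handling). upcoming and routeShortName are hypothetical names.

```javascript
// Parse a GTFS time "HH:MM:SS" (hours may exceed 24) into seconds after the start of the service day.
function parseGtfsTime(text) {
  const [h, m, s] = text.trim().split(":").map(Number);
  return h * 3600 + m * 60 + s;
}

// Keep departures in [nowSeconds, nowSeconds + windowSeconds), soonest first,
// each with a whole-minute ETA suitable for display in the app.
function upcoming(departures, nowSeconds, windowSeconds = 3600) {
  return departures
    .map((d) => ({ ...d, seconds: parseGtfsTime(d.departure) }))
    .filter((d) => d.seconds >= nowSeconds && d.seconds < nowSeconds + windowSeconds)
    .sort((a, b) => a.seconds - b.seconds)
    .map(({ routeShortName, departure, seconds }) => ({
      route: routeShortName,
      departure,
      etaMinutes: Math.floor((seconds - nowSeconds) / 60),
    }));
}
```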
Now that we know we are successfully getting realtime results from our server, let's make the same request in a sample iOS app and display the route and ETA in a table view:
Woohoo! We are now getting realtime transit data in a mobile app! 🎉
TL;DR
- GTFS Realtime feeds are provided by transit companies as a way of providing frequent realtime updates to their schedule with updates such as delays or cancelled trips.
- We use a GTFS Realtime feed in conjunction with a static GTFS dataset.
- A GTFS dataset is a static dataset of the entire schedule and geographic data of a transit agency which is updated less frequently.
- By retrieving trips from the static GTFS dataset we can update scheduled stop times with the delays in their corresponding updates in a GTFS Realtime feed.
- You can build a server to retrieve both a GTFS Realtime feed and GTFS feed and expose REST APIs so parts of that data can be queried.
- A mobile application can then query the server to get realtime data.