GTFS Realtime: How to get realtime transit information in your app

App developers in Ireland with an interest in building apps which use realtime public transport information will be familiar with the National Transport Authority’s RTPI REST API.

The RTPI REST API provided various endpoints where a GET request could retrieve realtime bus stop information, route information and more. A dream for app developers!

This API has now been deprecated and replaced by the GTFS Real-Time API.

The newer API requires a little more work to get realtime transport information into your app, but rest assured, this article will help you get going. TL;DR at the bottom 😀

The GTFS Real-Time API

In line with the NTA’s commitment to providing open data, a new GTFS Real-time (GTFS-R) API has been launched. GTFS-R is a data feed specification that allows the NTA to provide real-time public transport updates to application developers. It is an extension to GTFS (General Transit Feed Specification), an open data format for public transportation schedules and associated geographic information.

The new API uses the GTFS Realtime specification developed by Google. This is a feed specification which has become a standard for open data feeds from transit providers around the world, being used by over 2500 operators in over 55 countries.

It is designed with ease of interoperability in mind: transit apps can provide realtime data for many different cities and countries, provided their transport operators use the same specification for their realtime data.

We can see the API spec here and try it out using a key for the API, obtained by signing up here.

A successful response to a GET request on the API will yield a large dataset of results for every trip which has a live update on the Dublin Bus, Bus Éireann and Go-Ahead schedules. Let’s take a look at a snippet grabbed from this response.

Response edited to show only one entity in the feed. Typically the response includes hundreds of entities across Dublin Bus, Bus Éireann and Go-Ahead schedules

header contains two properties:

  • gtfsRealtimeVersion — The version of the GTFS Realtime specification being used. Here it’s version 1.0.
  • timestamp — The moment the dataset was created on the server in POSIX time
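As a small sketch of how the header might be used, the snippet below converts timestamp into a JavaScript Date and works out how old the feed is. It assumes the feed has already been decoded into a plain object with the two header properties described above. The helper names and sample values are made up for illustration and are not part of the API.

```javascript
// Hypothetical header values, for illustration only.
const header = { gtfsRealtimeVersion: '1.0', timestamp: '1606478400' };

// POSIX time is in seconds, while JavaScript dates use milliseconds.
// The timestamp may arrive as a string or a number, so normalise it first.
function feedCreatedAt(feedHeader) {
  return new Date(Number(feedHeader.timestamp) * 1000);
}

// Age of the feed in seconds at `nowMs` (defaults to the current time).
function feedAgeSeconds(feedHeader, nowMs = Date.now()) {
  return Math.floor(nowMs / 1000) - Number(feedHeader.timestamp);
}

console.log(feedCreatedAt(header).toISOString());
console.log(feedAgeSeconds(header));
```

A server could use a check like this to decide whether a cached copy of the feed is still fresh enough to serve.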

entity is the set of realtime updates provided by the system. If this set is empty it should be assumed there are no realtime updates currently being provided.

Here, entity contains two properties:

  • id — The unique identifier for the entity.
  • tripUpdate — Here is where information about the trip along with realtime departure and arrival delays of a trip are found.

Looking into tripUpdate we can see it contains two properties:

  • trip — This is the trip that this update applies to.
  • stopTimeUpdate — This is the set of updates to stop times along the trip. Here we can find departure delays along the trip’s stop sequence.

Here, trip contains five properties:

  • tripId — This is the unique identifier of the trip that this update applies to.
  • startTime — The scheduled start time of this trip instance.
  • startDate — The start date of this trip instance in YYYYMMDD format.
  • scheduleRelationship — The relation between this trip and the static schedule. This can be one of SCHEDULED, ADDED, UNSCHEDULED or CANCELED.
  • routeId — The id of the route that this trip refers to.

And finally, we will take a look at the list of updates on stopTimeUpdate . The first update in the list has the properties:

  • stopSequence — Here, stopSequence has the value 1. This means that this update in the list of stop updates applies to the first stop in the sequence of stops on the entire trip, up until the next stop update in the list.
  • departure — This contains the delay property which shows the number of seconds delay to the departure time at this stop in the trip. A delay of 0 here means departure is on time according to the schedule for every stop in the stop sequence up to the next stop update. A delay of 60 for example, would mean there is a 60 second delay on top of the scheduled time for each stop in the stop sequence up to the next update.
  • stopId — This is the identifier for this stop in the trip.
  • scheduleRelationship — The relationship between the stop time and the schedule, the default is SCHEDULED but other possible values are SKIPPED and NO_DATA.
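To make the structure above concrete, here is a minimal sketch of one decoded entity and a helper that finds the delay applying at a given stop. The ids and times in the sample object are invented for illustration and are not real NTA data, and delayAtStopSequence is a hypothetical helper name. It follows the rule described above, where an update applies to its own stop and to each later stop until the next update in the list.

```javascript
// Hypothetical entity: the ids and times are made up for illustration.
const entity = {
  id: 'example-entity-1',
  tripUpdate: {
    trip: {
      tripId: 'example-trip-1',
      startTime: '08:00:00',
      startDate: '20201127',
      scheduleRelationship: 'SCHEDULED',
      routeId: 'example-route-1',
    },
    stopTimeUpdate: [
      { stopSequence: 1, departure: { delay: 0 }, stopId: 'example-stop-a', scheduleRelationship: 'SCHEDULED' },
      { stopSequence: 5, departure: { delay: 60 }, stopId: 'example-stop-e', scheduleRelationship: 'SCHEDULED' },
    ],
  },
};

// Returns the departure delay (in seconds) that applies at `stopSequence`:
// the delay of the latest update at or before that stop. Returns null when
// no update covers the stop, or the covering update carries no delay.
function delayAtStopSequence(tripUpdate, stopSequence) {
  const updates = [...(tripUpdate.stopTimeUpdate || [])]
    .sort((a, b) => a.stopSequence - b.stopSequence);
  let applicable = null;
  for (const update of updates) {
    if (update.stopSequence > stopSequence) break;
    applicable = update;
  }
  if (!applicable || !applicable.departure) return null;
  const delay = applicable.departure.delay;
  return typeof delay === 'number' ? delay : null;
}

console.log(delayAtStopSequence(entity.tripUpdate, 3)); // covered by the update at stop 1
console.log(delayAtStopSequence(entity.tripUpdate, 7)); // covered by the update at stop 5
```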

Now we’ve covered what an update looks like in the API response, how can we use these updates to calculate the ETA for a bus arriving at a bus stop?

We use this realtime data in conjunction with static schedule data provided by the transit operator. This dataset is called General Transit Feed Specification or GTFS static data.

The General Transit Feed Specification

Transit operators can publish their schedule and geographic data in this format as a GTFS dataset. Developers can then write tools to consume GTFS datasets so they can incorporate public transportation information into their apps in an interoperable way.

From the Google reference we can see what a GTFS dataset consists of:

A GTFS feed is composed of a series of text files collected in a ZIP file. Each file models a particular aspect of transit information: stops, routes, trips, and other schedule data.

In Ireland, the NTA provides all available GTFS datasets here, including the GTFS feed to be used in conjunction with the GTFS-R API.

Downloading the feed to be used with the GTFS-R API and extracting the downloaded ZIP file will show a folder containing nine text files. What these files are for can be seen in detail in the Google reference here, but in short, they contain the latest entire dataset of schedule and geographic data for Bus Éireann, Dublin Bus and Go-Ahead services. Individual datasets for each service can also be downloaded.

An unzipped GTFS dataset released by Transport For Ireland

As of 27th November 2020, stop_times.txt is a whopping 437.3 MB in size! This is because the file contains the scheduled stop times for every single stop, on every single trip, on every single route across the entire Dublin Bus, Bus Éireann and Go-Ahead schedules.

We also have stops.txt, trips.txt, and routes.txt, which detail each individual stop, trip, and route respectively along with files containing rules for transfers between routes in transfers.txt, service dates in calendar.txt, exceptions to service dates in calendar_dates.txt, rules for mapping travel paths in shapes.txt, and the transit agencies represented in the dataset in agency.txt.

An example of the data found in stops.txt of a GTFS dataset
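Since each GTFS file is comma-separated text with a header row, reading one into objects takes very little code. The sketch below parses a few invented rows in the shape of stops.txt. The field names come from the GTFS specification, but the values are made up, and splitCsvLine and parseGtfsText are hypothetical helper names. It does not handle line breaks inside quoted fields.

```javascript
// Made-up rows in the shape of stops.txt; not real NTA data.
const stopsTxt = [
  'stop_id,stop_name,stop_lat,stop_lon',
  'example-stop-a,"Main Street, Stop A",53.3500,-6.2600',
  'example-stop-b,Station Road,53.3400,-6.2500',
].join('\n');

// Splits one CSV line into fields, honouring double-quoted fields
// (which may contain commas, and "" as an escaped quote).
function splitCsvLine(line) {
  const fields = [];
  let field = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') { field += '"'; i++; }
      else if (ch === '"') inQuotes = false;
      else field += ch;
    } else if (ch === '"') inQuotes = true;
    else if (ch === ',') { fields.push(field); field = ''; }
    else field += ch;
  }
  fields.push(field);
  return fields;
}

// Parses a whole GTFS text file into objects keyed by the header row.
function parseGtfsText(text) {
  const lines = text.replace(/^\uFEFF/, '').split(/\r?\n/).filter((l) => l.length > 0);
  const headers = splitCsvLine(lines[0]);
  return lines.slice(1).map((line) => {
    const values = splitCsvLine(line);
    return Object.fromEntries(headers.map((h, i) => [h, values[i] !== undefined ? values[i] : '']));
  });
}

console.log(parseGtfsText(stopsTxt));
```

Reading a whole file into memory like this suits small files such as stops.txt. For something the size of stop_times.txt, processing the file line by line as a stream would be the safer choice.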

In order to get realtime information the static schedule data in a GTFS dataset must be used in conjunction with the realtime updates provided via GTFS Realtime.

For example, if we want the next live departure time for a particular route at a particular stop, we would search the static GTFS dataset for the next trip arriving at that stop that is in service according to the calendar. If a realtime update for that trip is present in the GTFS Realtime feed, we add its delay to the scheduled departure time, giving us the scheduled time adjusted by the realtime update.
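The lookup just described can be sketched in a few functions. This is a simplified illustration rather than a full implementation: it assumes the rows have already been parsed from stop_times.txt, trips.txt and calendar.txt and filtered to one stop, it ignores the exceptions in calendar_dates.txt, and delaysByTripId is a hypothetical map from trip id to delay in seconds built from the realtime feed.

```javascript
// Converts a GTFS "HH:MM:SS" time to seconds since the start of the service day.
// GTFS allows hours past 24 for trips that run after midnight.
function toSeconds(hms) {
  const [h, m, s] = hms.split(':').map(Number);
  return h * 3600 + m * 60 + s;
}

// Converts seconds since the start of the service day back to "HH:MM:SS".
function toHms(totalSeconds) {
  const pad = (n) => String(n).padStart(2, '0');
  const h = Math.floor(totalSeconds / 3600);
  const m = Math.floor((totalSeconds % 3600) / 60);
  return `${pad(h)}:${pad(m)}:${pad(totalSeconds % 60)}`;
}

// calendar.txt flags each weekday with 1 or 0 and bounds the service
// with start_date and end_date, both in YYYYMMDD format.
const WEEKDAYS = ['sunday', 'monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday'];

function isInService(calendarRow, yyyymmdd) {
  if (yyyymmdd < calendarRow.start_date || yyyymmdd > calendarRow.end_date) return false;
  const date = new Date(Date.UTC(
    Number(yyyymmdd.slice(0, 4)), Number(yyyymmdd.slice(4, 6)) - 1, Number(yyyymmdd.slice(6, 8))));
  return String(calendarRow[WEEKDAYS[date.getUTCDay()]]) === '1';
}

// Picks the next scheduled departure at or after `nowSeconds` that is in
// service on `date`, then adds the realtime delay (if any) for that trip.
function nextLiveDeparture({ stopTimes, trips, calendar, delaysByTripId, date, nowSeconds }) {
  const candidates = stopTimes
    .filter((st) => {
      const trip = trips.find((t) => t.trip_id === st.trip_id);
      const service = trip && calendar.find((c) => c.service_id === trip.service_id);
      return Boolean(service) && isInService(service, date)
        && toSeconds(st.departure_time) >= nowSeconds;
    })
    .sort((a, b) => toSeconds(a.departure_time) - toSeconds(b.departure_time));
  if (candidates.length === 0) return null;
  const next = candidates[0];
  const delay = delaysByTripId[next.trip_id] || 0;
  return {
    tripId: next.trip_id,
    scheduled: next.departure_time,
    live: toHms(toSeconds(next.departure_time) + delay),
  };
}
```

One simplification worth noting: trips are filtered on their scheduled time, so a delayed bus whose scheduled departure has already passed would be missed. A fuller version would apply the delay before comparing against the current time.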

Great! How can I use this in my app?

GTFS realtime feeds typically provide a huge dataset of all updates across their entire fleet, and when you look at the NTA’s GTFS Real-time API, you will typically see updates across the entire networks of three agencies. This requires downloading large amounts of data at very frequent intervals.

Along with the effects on a data cap when using a mobile network connection, the additional processing power required to process large datasets of updates every few minutes would cause significant battery drain.

You will instead need a server which consumes a GTFS realtime feed, where your mobile application can query the server for only the updates it needs. This server will also need to consume data from a GTFS feed. We will look at how we can consume both types of feed in a server below. The below diagram is an example of how it might work:

How a mobile app could receive updates from a server consuming both GTFS Realtime data and GTFS data
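As a rough sketch of the filtering such a server might do, the function below takes a decoded feed and returns only the updates that mention one stop. The name updatesForStop and the shape of the returned objects are assumptions made for illustration. It also only matches updates listed explicitly for the stop, so it would need to be combined with the static stop sequence to pick up delays carried forward from earlier stops.

```javascript
// Returns, for one stop, only the parts of the feed a client needs:
// the trip, the route and the stop-time update at that stop.
function updatesForStop(feed, stopId) {
  const results = [];
  for (const entity of feed.entity || []) {
    const tripUpdate = entity.tripUpdate;
    if (!tripUpdate) continue;
    for (const update of tripUpdate.stopTimeUpdate || []) {
      if (update.stopId !== stopId) continue;
      const hasDelay = update.departure && typeof update.departure.delay === 'number';
      results.push({
        tripId: tripUpdate.trip.tripId,
        routeId: tripUpdate.trip.routeId,
        stopSequence: update.stopSequence,
        delay: hasDelay ? update.departure.delay : null,
      });
    }
  }
  return results;
}

// Hypothetical feed with made-up ids, for illustration only.
const exampleFeed = {
  entity: [
    {
      id: 'e1',
      tripUpdate: {
        trip: { tripId: 'trip-1', routeId: 'route-1' },
        stopTimeUpdate: [
          { stopSequence: 1, stopId: 'stop-a', departure: { delay: 0 } },
          { stopSequence: 4, stopId: 'stop-b', departure: { delay: 120 } },
        ],
      },
    },
    {
      id: 'e2',
      tripUpdate: {
        trip: { tripId: 'trip-2', routeId: 'route-2' },
        stopTimeUpdate: [{ stopSequence: 2, stopId: 'stop-c', departure: { delay: 30 } }],
      },
    },
  ],
};

console.log(updatesForStop(exampleFeed, 'stop-b'));
```

The mobile app then receives a handful of small objects rather than the whole feed.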

Consuming a GTFS Realtime feed

From Google Transit APIs:

GTFS-realtime data is encoded and decoded using Protocol Buffers, a compact binary representation designed for fast and efficient processing.

To work with GTFS realtime data, a developer would typically use the gtfs-realtime.proto schema to generate classes in the programming language of their choice. These classes can then be used for constructing GTFS realtime data model objects and serializing them as binary data or, in the reverse direction, parsing binary data into data model objects.

Google has provided GTFS Realtime language bindings for its protocol buffer in some of the most popular programming languages here.

Below is a code sample provided by Google in JavaScript/Node.js which demonstrates using the GTFS Realtime bindings to download a GTFS realtime feed from a provided URL, parse the response as a FeedMessage, and iterate over the results.

To try it out with the NTA’s GTFS Real-time API, we can paste in the URL found here and add a headers object to requestSettings with an x-api-key property containing an API key obtained by signing up here.
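A requestSettings object with the extra header could look something like the sketch below. The URL is a placeholder, the GTFSR_API_KEY environment variable and the buildRequestSettings helper are hypothetical names, and the method and encoding properties assume the settings are passed to the request library, where encoding: null returns the response body as raw bytes instead of a string.

```javascript
// Builds the settings for the feed download. The x-api-key header carries
// the API key; `encoding: null` keeps the protocol buffer payload as bytes.
function buildRequestSettings(feedUrl, apiKey) {
  return {
    method: 'GET',
    url: feedUrl,
    encoding: null,
    headers: { 'x-api-key': apiKey },
  };
}

// Placeholder URL and a hypothetical environment variable name.
const requestSettings = buildRequestSettings(
  'https://example.com/gtfs-realtime-feed',
  process.env.GTFSR_API_KEY || 'YOUR_API_KEY_HERE'
);

console.log(requestSettings.headers);
```

Reading the key from an environment variable keeps it out of source control.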

Consuming a GTFS dataset

Using an Azure SQL database

I used DB Browser for SQLite to edit the database, as it provides a high-quality visual interface for editing and searching the large amount of data.

In my case I removed empty tables and empty columns within tables, and stripped out any GTFS data I knew I would not be working with. For example, my app does not currently aim to display Bus Éireann transit data, so this agency’s schedule data was removed from the database.

In addition, replacing string identifiers such as trip_id with indexed integer identifiers in the table corresponding to stop_times.txt reduces the size of the database and improves query times, as we no longer need to perform string comparisons on long string identifiers across the ~4 million records in that table.
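The identifier swap can be sketched as follows. This is an illustration of the idea rather than the exact process I used, and buildIdMap and replaceIds are hypothetical helper names.

```javascript
// Assigns each distinct string id a compact integer, in first-seen order.
function buildIdMap(stringIds) {
  const idMap = new Map();
  for (const id of stringIds) {
    if (!idMap.has(id)) idMap.set(id, idMap.size + 1);
  }
  return idMap;
}

// Returns copies of the rows with `field` holding the integer id
// instead of the original string id.
function replaceIds(rows, field, idMap) {
  return rows.map((row) => ({ ...row, [field]: idMap.get(row[field]) }));
}

// Made-up stop_times rows with long string trip ids.
const stopTimeRows = [
  { trip_id: 'example-long-trip-id-0001', stop_sequence: 1 },
  { trip_id: 'example-long-trip-id-0001', stop_sequence: 2 },
  { trip_id: 'example-long-trip-id-0002', stop_sequence: 1 },
];

const tripIdMap = buildIdMap(stopTimeRows.map((row) => row.trip_id));
console.log(replaceIds(stopTimeRows, 'trip_id', tripIdMap));
```

The mapping itself needs to be stored too, for example as an extra column in the trips table, because the realtime feed still identifies trips by their original string tripId.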

To host the database I set up an Azure SQL database following this guide. I also followed this guide to migrate the SQLite database to Azure.

If you are consuming the SQL database in a Node.js application, you can use the mssql node package, a Microsoft SQL Server client, to connect to the database and run queries. See the following code snippet from the mssql package which demonstrates connecting to a database and running a sample SQL query:

Using an MS SQL Docker container

Starts an instance of the docker mssql server image in the container docker-gtfs-db

Once the container is running, access the interactive bash shell of the container with the following command:

docker exec -it docker-gtfs-db "bash"

You can then connect to the SQL Server using the sqlcmd tool inside of the container using the below command:

/opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P YOUR_PASSWORD_HERE

You are now connected to the SQL server. Create a database with the following query:

CREATE DATABASE gtfsdb;
GO

Here we have created a database named gtfsdb. Press Ctrl + C to exit sqlcmd, followed by Ctrl + D to exit the bash shell. We can now use the mssql node package mentioned in the previous section to connect to this database.
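A connection config for this container might look like the sketch below. It reuses the SA user and gtfsdb database from the commands above, and assumes the container's SQL Server port is published on localhost at the default port and uses a self-signed certificate. The SA_PASSWORD environment variable and buildDbConfig are hypothetical names.

```javascript
// Connection settings for the SQL Server instance in the Docker container.
function buildDbConfig(password) {
  return {
    user: 'SA',
    password,
    server: 'localhost',
    database: 'gtfsdb',
    options: {
      // Assumption: a local development container with a self-signed certificate.
      trustServerCertificate: true,
    },
  };
}

// Hypothetical environment variable holding the SA password.
const dbConfig = buildDbConfig(process.env.SA_PASSWORD || 'YOUR_PASSWORD_HERE');

// With the mssql package installed, the config would be used along these lines
// (inside an async function):
//   const sql = require('mssql');
//   const pool = await sql.connect(dbConfig);
//   const result = await pool.request().query('SELECT 1 AS ok');
console.log(dbConfig.database);
```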

The tool provided in the node app here downloads a static GTFS feed from a provided URL and uses the package to write the GTFS feed to the MS SQL database we created in the docker container.

We can now consume a GTFS feed using queries to the database, returning the response on our server.

Bringing it all together

I have created a Node.js app that can do all of this for you. It can be found here.

The above app creates a basic REST API using the hapi framework, a Node.js framework for building web services. The hapi framework uses a plugin architecture which allows an application to be easily broken up into isolated components of logic.

We can create plugins which we register on our hapi server. In this case I have a realtime plugin and a data plugin which represent the logic for requests to the GTFS Real-time API and logic for the data access layer respectively, where we will use node mssql to query the SQL database.

For brevity I won’t walk through the code here; an in-depth look is available in the repo.

By querying the API on the server we can see the realtime information for ourselves. Below is an example response to a query for the next trips arriving at bus stop 297 within the next hour.

A JSON response to a query for incoming trips on different bus routes arriving at a given bus stop within the next hour
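To give a feel for how a client might turn such a response into ETAs, here is a minimal sketch. The field names route and liveSeconds are assumptions for illustration and are not the actual format returned by the app in the repo. Times are expressed as seconds since the start of the service day.

```javascript
// Whole minutes until a live departure, never negative.
function etaMinutes(liveSeconds, nowSeconds) {
  return Math.max(0, Math.floor((liveSeconds - nowSeconds) / 60));
}

// Keeps arrivals due within the next hour, soonest first,
// and attaches an ETA in minutes to each.
function arrivalsWithinNextHour(arrivals, nowSeconds) {
  return arrivals
    .filter((a) => a.liveSeconds >= nowSeconds && a.liveSeconds <= nowSeconds + 3600)
    .sort((a, b) => a.liveSeconds - b.liveSeconds)
    .map((a) => ({ route: a.route, etaMinutes: etaMinutes(a.liveSeconds, nowSeconds) }));
}

// Made-up arrivals for illustration; 30000 seconds is 08:20:00.
const sampleArrivals = [
  { route: 'example-route-2', liveSeconds: 30000 + 25 * 60 },
  { route: 'example-route-1', liveSeconds: 30000 + 4 * 60 + 30 },
  { route: 'example-route-3', liveSeconds: 30000 + 2 * 3600 },
];

console.log(arrivalsWithinNextHour(sampleArrivals, 30000));
```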

So now we know we are successfully getting realtime results from our server, let’s make the same request in a sample iOS app and display the route and ETA in a table view:

Woohoo! We are now getting realtime transit data in a mobile app! 🎉

TL;DR

  • We use a GTFS Realtime feed in conjunction with a GTFS dataset
  • A GTFS dataset is a static dataset of the entire schedule and geographic data of a transit agency which is updated less frequently.
  • By retrieving trips from the static GTFS dataset we can update scheduled stop times with the delays in their corresponding updates in a GTFS Realtime feed.
  • You can build a server to retrieve both a GTFS Realtime feed and GTFS feed and expose REST APIs so parts of that data can be queried.
  • A mobile application can then query the server to get realtime data.

iOS Developer📱https://github.com/matthewoleary