Analyzing 1.1 Billion NYC Taxi and Uber Trips, with a Vengeance

Over at Todd W. Schneider's website:

NYC Taxi Data

The official TLC trip record dataset contains data for over 1.1 billion taxi trips from January 2009 through June 2015, covering both yellow and green taxis. Each individual trip record contains precise location coordinates for where the trip started and ended, timestamps for when the trip started and ended, plus a few other variables including fare amount, payment method, and distance traveled.

I used PostgreSQL to store the data and PostGIS to perform geographic calculations, including the heavy lifting of mapping latitude/longitude coordinates to NYC census tracts and neighborhoods. The full dataset takes up 267 GB on disk, before adding any indexes. For more detailed information on the database schema and geographic calculations, take a look at the GitHub repository.

Uber Data

Thanks to the folks at FiveThirtyEight, there is also some publicly available data covering nearly 19 million Uber rides in NYC from April–September 2014 and January–June 2015, which I’ve incorporated into the dataset. The Uber data is not as detailed as the taxi data, in particular Uber provides time and location for pickups only, not drop offs, but I wanted to provide a unified dataset including all available taxi and Uber data. Each trip in the dataset has a cab_type_id, which indicates whether the trip was in a yellow taxi, green taxi, or Uber car.

The introduction of the green boro taxi program in August 2013 dramatically increased the amount of taxi activity in the outer boroughs. Here’s a graph of taxi pickups in Brooklyn, the most populous borough, split by cab type:

brooklyn pickups

From 2009–2013, a period during which migration from Manhattan to Brooklyn generally increased, yellow taxis nearly doubled the number of pickups they made in Brooklyn.

Once boro taxis appeared on the scene, though, the green taxis quickly overtook yellow taxis so that as of June 2015, green taxis accounted for 70% of Brooklyn’s 850,000 monthly taxi pickups, while yellow taxis have decreased Brooklyn pickups back to their 2009 rate. Yellow taxis still account for more drop offs in Brooklyn, since many people continue to take taxis from Manhattan to Brooklyn, but even in drop offs, the green taxis are closing the gap.

More here.