The release of the NYC MTA taxi data, covering all taxi rides in 2013 and obtained through the Freedom of Information Act by Chris Whong, has been making the rounds on the internet. Here is my first take on the data. The following analysis is based on 173 million transactions. The total amount of money spent is $2,561,345,362, or roughly 2.6 billion dollars. The data can be divided into two main parts: transaction data and trip data. For the first set of analyses we will focus on the transaction data. Here is a high-level summary of the transaction data for the taxis.

| Payment Type | Credit Card | Cash | DIS | NOC | Unknown |
|---|---|---|---|---|---|
| Number of Transactions | 93,334,004 | 79,110,096 | 127,309 | 401,483 | 206,867 |
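As a quick check on the table, the counts can be summed and converted into shares. This is only a minimal sketch: the numbers are copied from the table, and the variable names are illustrative.

```python
# Transaction counts copied from the table above.
counts = {
    "Credit Card": 93_334_004,
    "Cash": 79_110_096,
    "DIS": 127_309,
    "NOC": 401_483,
    "Unknown": 206_867,
}

total = sum(counts.values())  # roughly 173 million transactions

# Share of all transactions for each payment type, in percent.
shares = {kind: 100 * n / total for kind, n in counts.items()}

for kind, pct in shares.items():
    print(f"{kind:<12}{counts[kind]:>12,}  {pct:6.2f}%")
```

Credit card comes out a little above half of all transactions, with cash accounting for most of the remainder.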

From the table it is clear that the majority of transactions are credit card based. Now let's look at the breakdown of fare, surcharge, tips, etc. (Figure 1). The distributions of the MTA tax, surcharge, and toll amounts appear to be more or less the same across payment types. The differences, however, show up in the other charges. The interesting thing is that for the other payment types, cash in particular, the tip is almost non-existent. Since it is unlikely that people do not tip when they pay cash for a taxi trip, there are two possibilities: either the tip is already included in the reported amount, or tips are scarcely being reported.

[Figure 1: breakdown of fare, surcharge, tip, and other charges by payment type]

We can get a better idea of under-reporting or over-reporting by looking at the average transaction cost, that is, the average of each type of charge for each form of payment (Figure 2). The main thing to notice is that the tip amount is minuscule for every payment type other than credit card and unknown. For those two categories the average tip is approximately $2.50. If we make the reasonable assumption that tips are under-reported in the other cases and in fact also average $2.50, then we get the astonishing figure of $199,097,220 in tips that are missing from the taxi data. This brings the adjusted total amount of money spent to $2,760,442,582, or 2.76 billion dollars. I think even this number is an underestimate, which becomes clear if we compare it to aggregate statistics on tips from previous years.
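The adjustment can be reproduced with a few lines of arithmetic. The sketch below assumes the $2.50 average tip is applied to every Cash, DIS, and NOC transaction; that reading is inferred from the quoted totals rather than stated explicitly, and the variable names are illustrative.

```python
# Figures quoted in the post.
reported_total = 2_561_345_362   # total dollars reported for 2013
assumed_tip = 2.50               # average tip on credit card / unknown rides

# Payment types whose recorded tips are close to zero (counts from the table).
untipped_counts = {
    "Cash": 79_110_096,
    "DIS": 127_309,
    "NOC": 401_483,
}

untipped_rides = sum(untipped_counts.values())
missing_tips = round(untipped_rides * assumed_tip)
adjusted_total = reported_total + missing_tips

print(f"Rides with near-zero tips: {untipped_rides:,}")
print(f"Estimated unreported tips: ${missing_tips:,}")
print(f"Adjusted total spending:   ${adjusted_total:,}")
```

Under that assumption the estimate matches the $199,097,220 and $2,760,442,582 figures quoted above.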

[Figure 2: average of each type of charge by payment type]

Since we are mainly interested in general patterns, the rest of this summary is based on the unadjusted raw tip data unless otherwise noted. Now let us look at spending by day of the week, summed over the whole year (Figure 3). The main trend is that the amount of money spent goes up as the week progresses, peaking on Friday and then declining again. No real surprises here, although I was expecting Saturday to be a little higher, since people do travel more often on the weekend. It would be interesting to compare the volume of taxi rides to the volume of subway rides in NYC to get a fuller picture of the transportation behavior of New Yorkers.

[Figure 3: total money spent by day of the week]

The next step is of course to ask how spending patterns vary over the course of the day (Figure 4). Activity is much lower after midnight and in the early hours of the day. The pace picks up after 5 am, with a notable dip around 4 pm, right before rush hour in the city.

[Figure 4: total money spent by hour of the day]

No surprise here. Now let us look at the total money spent in each month (Figure 5). Nothing exciting here either, except for the dip in summer, where the total declines from $230 million in May, at the beginning of summer, to $190 million in August, a drop of $40 million over the course of the summer, before picking up pace again. Most likely New Yorkers are taking taxis less often in summer.

[Figure 5: total money spent by month]

Last but not least, we can also look at the money spent over the course of the year broken down by week (Figure 6). It is clear that spending goes down close to major holidays (New Year, Memorial Day, Independence Day, Thanksgiving) and drops dramatically during the Christmas holidays. This is not at all surprising, but we do see a few unexpected things: there is a spike in spending when daylight saving time starts, as well as a sharp decline right before spring break in NYC. The most unexpected thing is the sharp decline coinciding with the end of the Islamic month of Ramadan, just a few days before the beginning of the Islamic festival of Eid. Not coincidentally, NYC is also the city in the US with the largest number of mosques. Could this be an indirect indicator of the presence of a large Muslim population in the city? Additional analysis and data may reveal more insights into this phenomenon.
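The weekday, hourly, monthly, and weekly totals discussed above all follow the same pattern: bucket each ride by some property of its timestamp and sum the amounts. Here is a minimal sketch of that idea using only the Python standard library. The sample rows, the `total_by` helper, and the timestamp format are assumptions for illustration, not the actual code or data layout behind the figures.

```python
from collections import defaultdict
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical sample rows: (pickup timestamp, total amount in dollars).
# The real files are far larger and their layout may differ.
sample_rides = [
    ("2013-03-08 08:15:00", 12.50),   # a Friday
    ("2013-03-08 18:40:00", 23.00),   # the same Friday
    ("2013-03-09 01:05:00", 9.75),    # a Saturday
    ("2013-08-14 16:10:00", 15.25),   # a Wednesday
]

def total_by(rides, key):
    """Sum ride amounts into buckets chosen by key(timestamp)."""
    totals = defaultdict(float)
    for stamp, amount in rides:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        totals[key(when)] += amount
    return dict(totals)

by_weekday = total_by(sample_rides, lambda t: WEEKDAYS[t.weekday()])
by_hour = total_by(sample_rides, lambda t: t.hour)
by_month = total_by(sample_rides, lambda t: t.month)
by_week = total_by(sample_rides, lambda t: t.isocalendar()[1])  # ISO week number

print(by_weekday)
print(by_month)
```

Passing the bucketing rule in as a function keeps one loop for all four views. For the full 173 million rows, a streaming reader or a dataframe library would be a more practical choice.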


In the next post I shall analyze the trip data and see how all of this fits together.


  1. I don’t have access to the data, but if it records the times at which any given taxi is hired during the day, then we can see whether less money is spent on holidays because of a lack of demand or a lack of supply.

    1. Reading back, my post must be confusing for anyone! What I meant: check the average hired time of taxis during holidays. If they are busier than in non-holiday times, taxi drivers left the city; if not, taxi riders left the city.

      1. Yes location data is also available and that will be the subject of my next post.

        I have considered that hypothesis and will address it in the next post. Thanks for pointing it out.

  2. Great Analysis. I am looking forward to the next post.

    What an interesting insight that the drop in transactions could be an indirect indicator of the presence of a large Muslim population in the city.

    I am curious about the supply and demand sides of this insight. On the supply side, I would argue that it is likely that a considerable percentage of taxi drivers are Muslim and take time off at the end of Ramadan to celebrate Eid, and that the reduction in the number of taxis available could explain the dip in transactions.

    Additionally, on the demand side, I am curious as to whether Muslims are more likely to use public transportation or walk to visit relatives and friends during Eid. It is possible that the data is revealing that Muslims are more likely to use environmentally friendly modes of transportation.
    On the other hand, it could just mean that Muslims are more likely to live in close proximity to friends and relatives, and are therefore less likely to use taxis, well, at least MTA taxis, to get around.

    Either way, it remains a very interesting insight.

    1. Yes good point, that is why the breakdown of the traffic for that day and other days should be helpful in answering that question.

      It would be ideal to complement this data with data from the subway and the buses.

    2. I think it is definitely true that a majority of NYC taxi drivers are Muslim, based on my own experience riding in taxis, as well as on an interview I recently heard on NY public radio (WNYC) with a teacher at a taxi driver school.

  3. Good stuff!

    Think you can use other variables to predict tip % (within credit card transactions only, cuz those are the only reliable data)?

    Late night rides = higher tips because drunk ppl are more generous?

    Title for paper: the daily cycle of stinginess.

    1. That is a great suggestion, using social variables to predict tipping. In other words, one could even build a tip predictor by adding such variables to a model.

      (Paper title duly noted)

    1. No I have not been able to figure that out. My best guess is that it is a non-traditional form of payment.

  4. It is impossible to get a taxi on Eid because a large number of the drivers are Muslim. In years where Eid is around the same time as Diwali, it is even more difficult. It’s not an issue of demand but rather supply.
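The supply-versus-demand question raised in the comments could be probed by comparing, day by day, how many distinct taxis were active and how many rides each active taxi gave. The sketch below uses rides per active taxi as a rough stand-in for the "average hired time" suggested above. The sample rows, the taxi identifiers, and the `daily_supply_and_load` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows: (day label, taxi identifier), one row per ride.
sample_rides = [
    ("ordinary_day", "taxi_a"), ("ordinary_day", "taxi_a"),
    ("ordinary_day", "taxi_b"), ("ordinary_day", "taxi_b"),
    ("holiday", "taxi_a"), ("holiday", "taxi_a"),
]

def daily_supply_and_load(rides):
    """Return {day: (active_taxis, rides_per_active_taxi)}."""
    taxis = defaultdict(set)
    ride_counts = defaultdict(int)
    for day, taxi in rides:
        taxis[day].add(taxi)
        ride_counts[day] += 1
    return {
        day: (len(taxis[day]), ride_counts[day] / len(taxis[day]))
        for day in taxis
    }

stats = daily_supply_and_load(sample_rides)
for day, (active, load) in stats.items():
    print(f"{day}: {active} active taxis, {load:.1f} rides per taxi")
```

In this toy sample the holiday has half as many active taxis but the same number of rides per taxi, which is the pattern a supply-side drop would produce. Fewer rides per taxi with a steady number of active taxis would point to demand instead.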
