The release of the NYC MTA taxi data, which covers all taxi rides in 2013 and was obtained by Chris Whong through the Freedom of Information Act, has been making the rounds on the internet. Here is my first take on the data. The following analysis is based on 173 million transactions. The total amount of money spent is $2,561,345,362, or roughly 2.6 billion dollars. The data can be divided into two main parts: transaction data and trip data. For the first set of analyses we will focus on the transaction data. Here is a high-level summary of the transaction data for the taxis.
[Table: Number of Transactions, by payment type]
From the table it is clear that the majority of transactions are credit card based. Now let's look at the breakdown of fare, surcharge, tips, etc. (Figure 1). The distribution of the MTA tax, surcharge, and toll amount appears to be more or less the same. The differences show up in the tips: for the other transaction types, the tip is almost nonexistent. Since it is unlikely that people do not tip when they pay cash for a taxi trip, there are two possibilities: either the tip is already included in the reported amount, or tips are scarcely being reported.
We can get a better idea of under-reporting or over-reporting by looking at the average transaction cost, that is, the average of each type of charge for each form of payment (Figure 2). The most noticeable thing here is that the tip amount is minuscule for every payment type other than credit card and unknown. For those two categories the average tip is approximately $2.50. If we make the reasonable assumption that tips are under-reported in the other cases and in fact also average $2.50, we get the astonishing figure of $199,097,220 in tips missing from the taxi data. This brings the adjusted total amount of money spent to $2,760,442,582, or 2.76 billion dollars. I think even this number is an underestimate, which becomes clear if we compare these numbers to aggregate tip statistics for previous years.
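The adjustment above is simple arithmetic, and a short sketch makes it easy to check. The dollar figures are the ones quoted in the post. The transaction count is not taken from the data: it is back-computed under the assumption that the shortfall equals the $2.50 average tip multiplied by the number of affected transactions. The variable names are just for illustration.

```python
# Figures quoted in the post, in dollars.
reported_total = 2_561_345_362   # total reported spending
tip_shortfall = 199_097_220      # estimated unreported tips
assumed_avg_tip = 2.50           # average tip for credit card / unknown payments

# Assumption: shortfall = average tip x number of affected transactions,
# so this count is implied by the quoted figures, not read from the data.
implied_transactions = tip_shortfall / assumed_avg_tip

# Adjusted total = reported spending plus the estimated missing tips.
adjusted_total = reported_total + tip_shortfall

print(f"Implied transactions with missing tips: {implied_transactions:,.0f}")
print(f"Adjusted total: ${adjusted_total:,}")
```

Changing `assumed_avg_tip` shows how sensitive the adjusted total is to the assumed tip size.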
Since we are mainly interested in the general patterns, the rest of this summary is based on the unadjusted raw tip data unless otherwise noted. Now let us look at spending by day of the week, summed over the whole year (Figure 3). The main trend is that the amount of money spent goes up as the week progresses, peaking on Friday and then declining again. No real surprises here, although I was expecting Saturday to be a little higher since people do travel more often on the weekend. It would be interesting to compare the volume of taxi rides to the volume of subway rides in NYC to get a fuller picture of the transportation behavior of New Yorkers.
The next step is of course to ask how spending patterns vary over the course of the day (Figure 4). Activity is much lower after midnight and in the early hours of the day. The pace picks up after 5 am, with a notable dip around 4 pm, right before rush hour in the city.
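The day-of-week and hour-of-day breakdowns both come down to grouping each transaction by some part of its timestamp and summing the amounts. Here is a minimal sketch of that idea using only the Python standard library. The sample rows, the timestamp format, and the `total_by` helper are hypothetical and are not the actual schema of the released files or the code used for the figures.

```python
from collections import defaultdict
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical sample rows: (pickup timestamp, total amount in dollars).
rides = [
    ("2013-01-04 08:15:00", 12.50),  # a Friday
    ("2013-01-04 18:40:00", 20.00),  # a Friday
    ("2013-01-05 01:05:00", 9.75),   # a Saturday
    ("2013-01-07 08:30:00", 14.25),  # a Monday
]

def total_by(rides, key):
    """Sum amounts after grouping rides by key(timestamp)."""
    totals = defaultdict(float)
    for stamp, amount in rides:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        totals[key(when)] += amount
    return dict(totals)

# Total spending per day of the week and per hour of the day.
by_weekday = total_by(rides, lambda t: WEEKDAYS[t.weekday()])
by_hour = total_by(rides, lambda t: t.hour)

print(by_weekday)
print(by_hour)
```

The same helper works for monthly totals by passing `lambda t: t.month` as the key.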
No surprise here. Now let us look at the total money spent in each month (Figure 5). Nothing exciting here either, except the dip in summer, where the monthly total declines from $230 million in May, at the beginning of summer, to $190 million in August, a decline of $40 million over the course of the summer, before picking up pace again. Most likely New Yorkers take taxis less often in summer.
Last but not least, we can look at the money spent over the course of the year broken down by week (Figure 6). It is clear that spending goes down close to major holidays (New Year, Memorial Day, Independence Day, Thanksgiving) and dramatically during the Christmas holidays. This is not at all surprising, but we do see a few unexpected things: there is a spike in spending when daylight saving time starts, as well as a sharp decline right before spring break in NYC. Another unexpected thing is the sharp decline coinciding with the end of the Islamic month of Ramadan, just a few days before the beginning of the Islamic festival of Eid. Not coincidentally, NYC is also the city in the US with the largest number of mosques. Could this be an indirect indicator of the presence of a large Muslim population in the city? Additional analysis and data may reveal more insights into this phenomenon.
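One way to spot dips like these without eyeballing a chart is to total spending per week and then look at the week-over-week change. The sketch below does that with ISO week numbers. The daily totals are made-up placeholder values (in millions of dollars), not numbers from the taxi data, and ISO weeks are my assumption about how one might bucket the year.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily totals in millions of dollars (placeholder values).
daily_totals = {
    date(2013, 12, 9): 7.0,    # ISO week 50
    date(2013, 12, 12): 7.5,   # ISO week 50
    date(2013, 12, 16): 7.25,  # ISO week 51
    date(2013, 12, 19): 6.75,  # ISO week 51
    date(2013, 12, 23): 4.0,   # ISO week 52
    date(2013, 12, 25): 3.0,   # ISO week 52
}

# Sum the daily totals into ISO weeks (isocalendar()[1] is the week number).
weekly = defaultdict(float)
for day, amount in daily_totals.items():
    weekly[day.isocalendar()[1]] += amount
weekly = dict(weekly)

# Week-over-week change, keyed by the later week of each pair.
weeks = sorted(weekly)
changes = {cur: weekly[cur] - weekly[prev] for prev, cur in zip(weeks, weeks[1:])}

# The week with the most negative change is the sharpest decline.
biggest_drop = min(changes.items(), key=lambda kv: kv[1])

print(weekly)
print(biggest_drop)
```

With real data, the weeks flagged this way could then be compared against a holiday calendar.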
In the next post I shall analyze the trip data and see how all of this fits together.