If you live or work in Montreal, you've probably passed by one of the 700 Bixi stations. And you've undoubtedly come across "Bixists," i.e., cyclists riding one of the grey bikes (or perhaps a blue bike). That's Bixi. The name is a blend of "BIcycle" and "taXI," underlining the idea of using a bicycle just like a taxi.
Because Bixi has been growing in popularity, its ridership data is an excellent opportunity for analysis, especially in Montreal. The goal is to find results that can help improve the quality of the service and benefit customers: for example, helping them find their closest Bixi station and the most and least crowded times to pick up a bike. For this case study, we will use open datasets.
The datasets we will be using in this analysis are:
The Bixi ridership datasets (OD.csv) have over four million records. Each record includes the start and end station codes, the start and end dates and times, the duration in seconds, and membership details. The station file (Stations.csv) lists each station's details against its station code: code, name, address, coordinates, and active status. The weather file (noaa-daily-weather-data.csv) contains daily weather information throughout the year: date, precipitation, snow, maximum and minimum temperature, elevation, coordinates, and country code. We also have the list of holidays and their dates for each year, so we can see what the traffic is like on those days.
After procuring all the data, we will follow these steps to complete our analysis. Here's what we will be finding out:
The first task is to find the most and least popular bike stations. Popularity is measured by the number of bikes taken and returned at a station. To keep things simple, we will find four stations: the most and least popular starting stations, and the most and least popular ending stations.
To complete this task, we will take all the data for 2016 and 2017. Next, we will sort the stations by popularity, using a simple rule: the more times a bike is rented from or returned to a station, the more popular that station is. Following this step, we will find the most and least popular stations by starting point and by ending point. They are listed below, along with their names and codes.
For 2016
For 2017
Note: Because the least popular starting station and the least popular ending station were the same, a single point represents it on the map.
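The counting rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the study: the column names `start_station_code` and `end_station_code` are assumptions about the OD.csv layout, and the rows are made-up sample data.

```python
from collections import Counter

# Made-up sample rows; real rows would be read from the OD.csv trip files.
# The column names are assumptions about that file's layout.
trips = [
    {"start_station_code": "6100", "end_station_code": "6200"},
    {"start_station_code": "6100", "end_station_code": "6300"},
    {"start_station_code": "6200", "end_station_code": "6100"},
    {"start_station_code": "6300", "end_station_code": "6100"},
    {"start_station_code": "6100", "end_station_code": "6200"},
    {"start_station_code": "6200", "end_station_code": "6200"},
]


def rank_stations(rows, column):
    """Return (station_code, trip_count) pairs, most used first."""
    counts = Counter(row[column] for row in rows)
    return counts.most_common()


start_ranking = rank_stations(trips, "start_station_code")
end_ranking = rank_stations(trips, "end_station_code")

# First entry is the most popular station, last entry the least popular.
most_popular_start, least_popular_start = start_ranking[0], start_ranking[-1]
most_popular_end, least_popular_end = end_ranking[0], end_ranking[-1]

print("start:", most_popular_start, least_popular_start)
print("end:  ", most_popular_end, least_popular_end)
```

Note that a station with no trips at all never appears in the counts, so a real analysis would also need the full station list from Stations.csv to catch stations with zero activity.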
Now that we have the most and least popular stations in Montreal, we will next determine the most and least popular days of the week for ridership. For this analysis, we will use data from the sources mentioned above (dataset OD.csv) for 2016 and 2017.
As for the method, we will check every entry, that is, every date on which a Bixi was rented. We will then map each date to its day of the week and count the rentals made on each day. As a result, we will know the most popular day and the number of bikes rented on that day.
For 2016
For 2017
What do these numbers show? The most rentals are made on Wednesdays, with 636,580 in 2016 and 632,176 in 2017. Surprisingly, Sundays see the fewest rentals in 2016. In 2017, however, Mondays see the fewest, which makes sense as Monday is the beginning of the workweek.
We now know the most and least popular stations and days of the week for ridership. The next task is to find the most and least popular times of day for renting a bike.
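The day-of-week tally can be sketched as follows. The timestamp format and the sample values are assumptions made for illustration; they are not taken from the actual dataset.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Made-up rental start times; the "YYYY-MM-DD HH:MM" format is an assumption.
start_dates = [
    "2017-05-01 08:15",  # Monday
    "2017-05-03 17:05",  # Wednesday
    "2017-05-03 17:40",  # Wednesday
    "2017-05-06 11:00",  # Saturday
    "2017-05-10 09:30",  # Wednesday
    "2017-05-13 14:20",  # Saturday
]


def rentals_per_weekday(timestamps):
    """Count rentals by the weekday name of their start date."""
    counts = Counter()
    for stamp in timestamps:
        day_index = datetime.strptime(stamp, "%Y-%m-%d %H:%M").weekday()
        counts[DAY_NAMES[day_index]] += 1
    return counts


weekday_counts = rentals_per_weekday(start_dates)
busiest_day = max(weekday_counts, key=weekday_counts.get)
quietest_day = min(weekday_counts, key=weekday_counts.get)

print(busiest_day, weekday_counts[busiest_day])
print(quietest_day, weekday_counts[quietest_day])
```

Only days that appear in the data are compared here; with a full year of trips every weekday would be present.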
For this analysis, we will use the same dataset (OD.csv), which gives us a timestamp for each rental, and look at all 24 hours of the day across the year. The aim is to find the hour with the most rentals and the hour with the fewest for both 2016 and 2017.
For 2016
For 2017
After looking at the analysis, it is clear that the most rentals were made at 5:00 PM throughout the year, which is the perfect time to go for a ride. The fewest rentals were made at 4:00 AM, which is not surprising. This analysis shows that the stations are most crowded in the evenings.
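A minimal sketch of the hourly tally is shown below. As before, the timestamp format is an assumption and the rows are made-up sample data, so the output says nothing about the real results.

```python
from collections import Counter
from datetime import datetime

# Made-up rental start times; the "YYYY-MM-DD HH:MM" format is an assumption.
start_dates = [
    "2016-06-01 17:05",
    "2016-06-01 17:45",
    "2016-06-02 04:10",
    "2016-06-02 08:30",
    "2016-06-02 08:55",
    "2016-06-03 17:20",
]


def rentals_per_hour(timestamps):
    """Return a 24-item list: rentals started in hour 0, hour 1, ..., hour 23."""
    counts = Counter(
        datetime.strptime(stamp, "%Y-%m-%d %H:%M").hour for stamp in timestamps
    )
    # Hours with no rentals are kept as zeros so all 24 hours can be compared.
    return [counts.get(hour, 0) for hour in range(24)]


per_hour = rentals_per_hour(start_dates)
peak_hour = max(range(24), key=lambda hour: per_hour[hour])
quietest_hour = min(range(24), key=lambda hour: per_hour[hour])

print(f"peak hour: {peak_hour}:00 with {per_hour[peak_hour]} rentals")
print(f"quietest hour: {quietest_hour}:00 with {per_hour[quietest_hour]} rentals")
```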
Beyond the factors above, weather is one of the primary parameters that can affect ridership. For this case, we will use two datasets: one with weather details (noaa-daily-weather-data.csv) and one with ridership details (OD.csv) for the respective years. We will examine and compare them to find any visible differences in ridership as the weather changes.
We first take the number of rides on a particular day and their duration, then check the weather conditions for that day and plot them on a graph to spot any visible changes. The results are displayed below.
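As a rough sketch, the two files can be lined up by calendar date before plotting. The column names used here (`start_date`, `duration_sec`, `DATE`, `PRCP`, `TMAX`) are assumptions about the file layouts, and every value is made up for illustration.

```python
from collections import defaultdict

# Made-up trip rows; column names are assumptions about OD.csv.
trips = [
    {"start_date": "2016-07-01 08:00", "duration_sec": 600},
    {"start_date": "2016-07-01 17:30", "duration_sec": 900},
    {"start_date": "2016-07-01 18:10", "duration_sec": 300},
    {"start_date": "2016-07-02 09:15", "duration_sec": 1200},
]

# Made-up weather rows; column names are assumptions about the weather file.
weather = [
    {"DATE": "2016-07-01", "PRCP": 0.0, "TMAX": 27.0},
    {"DATE": "2016-07-02", "PRCP": 12.5, "TMAX": 19.0},
]


def daily_ridership(rows):
    """Aggregate trips into ride count and mean duration per calendar day."""
    rides = defaultdict(int)
    seconds = defaultdict(int)
    for row in rows:
        day = row["start_date"][:10]  # keep only the YYYY-MM-DD part
        rides[day] += 1
        seconds[day] += row["duration_sec"]
    return {
        day: {"rides": rides[day], "mean_duration_sec": seconds[day] / rides[day]}
        for day in rides
    }


def join_with_weather(daily, weather_rows):
    """Attach ridership figures to each weather row, matching on the date."""
    joined = []
    for record in weather_rows:
        stats = daily.get(record["DATE"], {"rides": 0, "mean_duration_sec": None})
        joined.append({**record, **stats})
    return joined


joined = join_with_weather(daily_ridership(trips), weather)
for row in joined:
    print(row)
```

Each joined row then holds one day's ride count, mean duration, and weather values, which is the shape needed to plot ridership against precipitation, temperature, or snow.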
2016 and 2017:
We have four parameters shown in the graphs (for 2016 and 2017) - ridership popularity, precipitation analysis, temperature analysis, and snow analysis. When you compare all the parameters, the results are as follows:
We don't want you to head to a station and find no bikes available. One would assume that the most popular stations run out of bikes the fastest and the least popular ones the slowest. While this is true in most cases, the popularity analysis is annual, so the results of this Bixi case study may vary.
To solve this puzzle, we will first check the number of rentals and returns in a day at a particular station, then compare them with the station's maximum capacity. If the rentals exceed the maximum capacity, it is safe to say that the station exhausts its bikes the fastest. If it is the other way round, the station exhausts its bikes the slowest.
Using the above data, the analysis leads us to the following:
For 2016
For 2017
If a station's exhaustion rate is 2, the number of rentals is twice the number of returns: more bikes are taken than returned, and the station is likely to exhaust its bikes fast. If a station's exhaustion rate is 0.5, the number of returns is twice the number of rentals: more bikes are returned than rented, so the station tends to fill up and is unlikely to run out of bikes.
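Read this way, the exhaustion rate is simply rentals divided by returns at a station. The sketch below computes it under that reading; the column names, the station codes "A" and "B", and the sample rows are all hypothetical.

```python
from collections import Counter

# Made-up trips between two hypothetical stations "A" and "B".
# Column names are assumptions about OD.csv.
trips = (
    [{"start_station_code": "A", "end_station_code": "B"}] * 4
    + [{"start_station_code": "B", "end_station_code": "A"}] * 2
)


def exhaustion_rates(rows):
    """Return rentals divided by returns for every station seen in the rows."""
    rentals = Counter(row["start_station_code"] for row in rows)
    returns = Counter(row["end_station_code"] for row in rows)
    rates = {}
    for station in set(rentals) | set(returns):
        if returns[station] == 0:
            # Bikes leave but none come back: treat the rate as unbounded.
            rates[station] = float("inf")
        else:
            rates[station] = rentals[station] / returns[station]
    return rates


rates = exhaustion_rates(trips)
for station, rate in sorted(rates.items()):
    print(station, rate)
```

Station "A" has four rentals and two returns (rate 2.0), and station "B" has two rentals and four returns (rate 0.5), matching the two cases described above.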
For this analysis, we will use the holiday data from 2016 and 2017, restricted to the holidays mentioned here. First, we will check whether any holidays fall on a weekend and remove those. Second, we will match the remaining dates against the primary ridership data for the respective years and find the overall ridership on each of those days.
Once these steps are completed, we will compare the holidays against the primary data to determine which holiday has the most ridership and which has the least. We will look at holidays such as Victoria Day, St. Jean Baptiste Day, Canada Day, Labour Day, and Thanksgiving Day. Here are the corresponding dates.
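The weekend filter and the date lookup can be sketched as follows. The holiday dates are the 2017 calendar dates for a few of these holidays; the ride counts are hypothetical placeholders, not figures from the study.

```python
from datetime import date

# 2017 calendar dates for some of the holidays considered.
holidays_2017 = {
    "Victoria Day": date(2017, 5, 22),
    "St. Jean Baptiste Day": date(2017, 6, 24),
    "Canada Day": date(2017, 7, 1),
    "Labour Day": date(2017, 9, 4),
}

# Hypothetical daily ride counts; real values would be aggregated from OD.csv.
rides_per_day = {
    date(2017, 5, 22): 1200,
    date(2017, 9, 4): 1500,
}


def drop_weekend_holidays(holidays):
    """Keep only holidays that fall on Monday to Friday."""
    return {name: day for name, day in holidays.items() if day.weekday() < 5}


def rank_holidays(holidays, daily_rides):
    """Return (holiday, rides) pairs sorted from busiest to quietest."""
    ranked = [(name, daily_rides.get(day, 0)) for name, day in holidays.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


weekday_holidays = drop_weekend_holidays(holidays_2017)
ranking = rank_holidays(weekday_holidays, rides_per_day)
print(ranking)
```

In 2017 both St. Jean Baptiste Day and Canada Day fell on a Saturday, so the filter removes them and only the weekday holidays are ranked.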
For 2016
For 2017
Does being a member or a non-member have any influence on the analysis thus far? Let's find out. We again use one of the datasets mentioned in the Data Source section. We separate the members' and non-members' data and compare the number of rentals for each group. The results for both years are as follows.
83% members
17% non-members
In conclusion, members accounted for 83% of rentals, so they played an essential part in bike rentals and shaped the overall Bixi traffic analysis.
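The split between members and non-members is a simple share of total rentals. Here is a minimal sketch, assuming a hypothetical `is_member` column that holds 1 for members and 0 for non-members; the rows are made-up sample data.

```python
# Made-up rows; the "is_member" column name and its 1/0 encoding are assumptions.
trips = [
    {"is_member": 1},
    {"is_member": 1},
    {"is_member": 0},
    {"is_member": 1},
]


def membership_share(rows):
    """Return the fraction of rentals made by members and by non-members."""
    total = len(rows)
    members = sum(1 for row in rows if row["is_member"] == 1)
    return {
        "members": members / total,
        "non_members": (total - members) / total,
    }


share = membership_share(trips)
print(f"{share['members']:.0%} members, {share['non_members']:.0%} non-members")
```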
https://github.com/Mindtrades-Consulting/Analyzing-BIXI-Ridership-With-Data
How can MindTrades help?
This case study is only a starting point for this kind of in-depth analysis, insight, and solutions. For more case studies like this one, contact https://www.mindtrades.com.