By Kieran McHugh
Here at Skyscanner, we’re always looking out for ways to improve our travellers’ experience through novel applications of machine learning.
Every day, millions of people start planning their next trip on Skyscanner. Each search can generate thousands of prices for travellers to browse and compare. All of these prices pass through Skyscanner’s data platform, where they are ingested, filtered, processed, and ultimately stored. At our scale, the sheer volume of pricing data we’re bombarded with makes it a real challenge to extract actionable insights.
We’ve all experienced the frustration and disappointment of having to fork out more than expected for a product or service: be it a nasty utility price hike, a heated online auction, or just missing out on a flash sale. Afterwards, we feel ripped off — and we might have to make significant and inconvenient adjustments to financial plans in order to gather the necessary cash.
It’s an unfortunate and regrettable fact that this happens all the time when people are booking flights for upcoming trips. The travel industry is a dynamic and highly competitive marketplace, where prices set by airlines and travel agents are extremely volatile and subject to modification without notice. Sometimes, prices can increase several times per hour.
In fact, the airline industry as a whole is seen as “one of the most sophisticated in its use of dynamic pricing strategies in an attempt to maximise its revenue” [1].
The price of the cheapest flight for any given route can vary wildly over time. The above chart illustrates price changes for a one-way short-haul flight over a 14-day window.
This situation is hugely frustrating for us, and for our travellers. Every day, we work extremely hard to maximise the recency and accuracy of prices we display on our website. We aren’t notified when prices are going to change, nor do we know the amount by which they will increase or decrease. It’s up to us to second-guess the market and minimise the impact of these fluctuations on our users.
Having worked with Skyscanner throughout my university studies, I’d been on the lookout for ways to apply my academic experience to Skyscanner’s biggest business challenges. So, when the time came to select a topic for my thesis, I knew that I wanted to partner with Skyscanner to tackle a difficult problem. I had studied the principles of machine learning in depth, and I decided to specialise in Artificial Neural Networks. I developed a specific interest in how neural networks can be applied to regression problems unrelated to computer vision and image recognition.
I wondered whether we could leverage Skyscanner’s large historical datasets to design, build, train, and evaluate some simple neural networks which, given contextual information about a flight, could tell us how the ticket price is likely to rise or fall between now and the departure date.
Why would we want to do this? There are already several websites which advise travellers whether they should ‘buy now’ or ‘wait for a while’. In the latter case, it is hoped that the ticket price will decrease, allowing the traveller to save some cash. However, in the majority of cases, flight ticket prices never go down: they increase monotonically as departure approaches. I therefore reasoned that machine learning models focused on ‘buy/wait’ classification are not very useful for customers.
The best advice we can give to travellers is to purchase tickets as early as possible. However, life is rarely this simple: it’s often inconvenient to book a flight right away, and it might be necessary to defer a booking. For instance, we often have to wait for payday to come around, or for a (less organised) friend to confirm their attendance so that all the tickets can be booked together. I believe it would be much more valuable if we could inform our travellers…
So, instead of a simple ‘buy’ or ‘wait’ classification, it would be far preferable to develop a machine learning model whose output is an indicative price. In other words, we give the model the details of our flight (such as the origin, destination, and date of travel) along with a parameter representing the number of days until departure. In theory, as we vary this parameter between 365 days (a year in advance) and 0 days (the day of departure), the model should produce different prices.
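As a concrete (and entirely hypothetical) sketch of how such a model might be queried, the snippet below sweeps the days-to-departure parameter while holding the other flight details fixed. The model file name, feature layout, and example values are my own assumptions, not the project’s actual interface.

```python
import numpy as np
from tensorflow import keras

# Hypothetical: a previously trained pricing model. This sketch assumes it
# takes [duration_minutes, days_to_departure] and returns an indicative price.
model = keras.models.load_model("pricing_model.keras")  # assumed artefact

duration_minutes = 95.0          # example flight duration, held fixed
days = np.arange(365, -1, -1)    # sweep from a year out down to departure day

# Hold the flight details fixed and vary only the days-to-departure input.
features = np.column_stack([
    np.full(days.shape, duration_minutes),
    days,
]).astype("float32")

prices = model.predict(features, verbose=0).ravel()
for d, p in zip(days[::73], prices[::73]):
    print(f"{d:3d} days before departure: ~£{p:.2f}")
```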
At the point that I came up with this proposal, I had no idea how difficult such a model would be to construct, or even whether it was remotely feasible. My whole project was clouded in uncertainty, and there were an overwhelming number of unknowns which made it difficult to know where I should begin.
Before attempting to break the problem down any further, I concentrated on getting access to as much high-quality training data as possible. I worked with the Skyscanner legal team to set up an academic data-sharing agreement. This agreement granted me unprecedented access to a snapshot of global flight pricing over a 60-day period from September to November 2016.
Specifically, for every search performed on Skyscanner in this period, we recorded the minimum direct price (MDP) that we showed to the traveller. The MDP is the cheapest direct fare (no connections) across all flight times and providers. This amounted to more than 200,000,000 candidate training patterns.
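To make the MDP concrete, here is a toy illustration of the aggregation it represents; the table and column names are invented for the example and are not Skyscanner’s actual schema.

```python
import pandas as pd

# Toy quotes table: several fares per search, across providers and
# flight times. Column names are invented for illustration.
quotes = pd.DataFrame({
    "search_id": [1, 1, 1, 2, 2],
    "is_direct": [True, True, False, True, True],
    "price":     [79.0, 65.0, 49.0, 120.0, 110.0],
})

# Minimum direct price (MDP): the cheapest *direct* fare per search.
mdp = (
    quotes[quotes["is_direct"]]
    .groupby("search_id")["price"]
    .min()
    .rename("minimum_direct_price")
)
print(mdp)  # search 1 -> 65.0 (the 49.0 fare has a connection), search 2 -> 110.0
```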
That’s a lot of data!
With such a huge amount of training data, I felt that I would be setting myself up for failure if I didn’t establish a clear focus for my investigation.
With this in mind, I decided to consider only direct, one-way flights. I thought that this presented a sensible starting point, and that it would provide a basis for future work investigating a more complete model including return flights and connecting flights.
In addition, I made the assumption that users would be willing to travel at any time of day in order to minimise the amount they pay, even if the cheapest ticket was for a flight at a particularly unsocial time.
I found that establishing a focused scope for my research and making some assumptions about traveller behaviour removed a lot of complexity. That being said, there were quite a few outstanding challenges to think about.
While I knew that I wanted to apply neural networks to this problem, I still had to choose from an overwhelming array of possible neural network paradigms and architectures. Broadly, there were two approaches I could have taken: start with a simple network and gradually scale up its complexity, or start with a complex architecture and pare it back if it failed.
I decided to take the first approach. My project was the first documented attempt to apply neural networks to this problem, so it made much more sense to start simple and scale the complexity later than to start complex and strip complexity away in the event of failure.
To keep my research efficient, I wanted to minimise the time I spent implementing the networks and managing the data pipeline. I looked into several machine learning frameworks, and eventually settled on TensorFlow. I decided to take advantage of Keras: a machine learning library built on top of TensorFlow that offers powerful, expressive, and production-ready neural network components right out of the box.
I took advantage of Amazon’s Elastic MapReduce to cleanse, preprocess, and reformat the training data to extract the variables I needed. This data was loaded into Pandas dataframes and fed directly to the training utilities provided by Keras. Keras and TensorFlow support handling data in mini-batches: a group of training patterns can be delegated to the GPU and processed in parallel, dramatically speeding up the training process.
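The sketch below shows the rough shape of this pipeline using randomly generated stand-in data. The column names, network size, batch size, and epoch count are illustrative assumptions rather than the project’s real configuration.

```python
import numpy as np
import pandas as pd
from tensorflow import keras

rng = np.random.default_rng(0)

# Stand-in for the preprocessed output of the EMR job; columns are invented.
df = pd.DataFrame({
    "duration_minutes":  rng.uniform(60, 600, 10_000),
    "days_to_departure": rng.integers(0, 365, 10_000),
    "price":             rng.uniform(30, 400, 10_000),
})

x = df[["duration_minutes", "days_to_departure"]].to_numpy("float32")
y = df["price"].to_numpy("float32")

# A small feed-forward regressor: ReLU activations, Adam optimiser, MAE loss.
model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(2,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mae")

# batch_size sets the mini-batch handed to the GPU on each training step.
model.fit(x, y, batch_size=256, epochs=5, verbose=0)
```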
In order to evaluate our model’s performance reliably, not all of the data can be used for training: part must be reserved to assess performance on unseen situations. Intuitively, we might imagine that an appropriate split could be achieved by sorting the data chronologically, using the ‘early bird’ flight prices for training, and reserving the last-minute prices for evaluation. There are two reasons why I think that this is not necessarily the best approach.
Firstly, this approach only measures the ability of the network to extrapolate (project trends from historical data points into the future), and not to interpolate (fill in missing information between known data points). In the context of modelling flight prices, good performance on both extrapolation and interpolation is critical, since data for less popular routes is often sparse. Secondly, a temporal split could result in a situation where the evaluation data is unrepresentative of general trends. In other words, I would be evaluating each model’s performance solely on its ability to predict what happens in the final few days before departure, rather than across the whole booking window. This is bad.
Splitting the data randomly is often a more reliable method. However, because each price in the data set belongs to a wider time series corresponding to a specific flight, we cannot simply split the individual prices at random. I proposed that a more effective approach would be first to group the training points by the flight they refer to, reserve 10% of these flights for evaluation, and use the remaining 90% to train the network. This way, there is no possibility that the training process will divulge any sneak hints about the flights that we intend to use for evaluation.
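A grouped split along these lines can be expressed with scikit-learn’s GroupShuffleSplit, as in the sketch below; the flight_id column is an assumed stand-in for whatever key ties each price observation to its flight.

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Toy data: each price observation belongs to one flight's time series.
prices = pd.DataFrame({
    "flight_id":         [1, 1, 1, 2, 2, 3, 3, 3, 4, 4],
    "days_to_departure": [30, 20, 10, 15, 5, 60, 40, 20, 8, 2],
    "price":             [55, 60, 90, 40, 95, 120, 130, 150, 70, 110],
})

# Hold out 10% of *flights* (not rows) for evaluation.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.10, random_state=42)
train_idx, eval_idx = next(splitter.split(prices, groups=prices["flight_id"]))
train_df, eval_df = prices.iloc[train_idx], prices.iloc[eval_idx]

# No flight's prices appear on both sides, so training can't leak
# hints about the flights reserved for evaluation.
assert set(train_df["flight_id"]).isdisjoint(eval_df["flight_id"])
```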
Having conducted extensive research on the economics of airline pricing, I developed an understanding of what factors are taken into account by revenue management systems when deciding on flight pricing. I identified a subset of these factors to provide a basic set of inputs to the model.
There’s a problem here. As input, neural networks accept continuous data, but four of our six proposed inputs (the origin and destination airports, and the day and week of departure) are not continuous. I had a few possible solutions to this: assign each category an arbitrary integer index, expand each category into a sparse one-hot vector, or learn a dense entity embedding for each category.
I decided to implement the third approach: entity embeddings. Under my proposed mapping, each airport would be represented by a 12-dimensional vector. I refer to this mapping as Airport2Vec.
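One natural way to realise such a mapping is a Keras Embedding layer trained jointly with the rest of the network. The sketch below is my reading of the idea rather than the exact PricingNet code: the vocabulary size is invented, and sharing a single embedding table between origin and destination is an assumption.

```python
from tensorflow import keras

NUM_AIRPORTS = 4000   # assumed vocabulary size for airport codes
EMBED_DIM = 12        # each airport becomes a 12-dimensional vector

origin_in = keras.Input(shape=(1,), dtype="int32", name="origin")
dest_in = keras.Input(shape=(1,), dtype="int32", name="destination")

# "Airport2Vec": a learned lookup table from airport index to a dense
# 12-d vector, trained alongside the rest of the network's weights.
airport2vec = keras.layers.Embedding(NUM_AIRPORTS, EMBED_DIM, name="airport2vec")

origin_vec = keras.layers.Flatten()(airport2vec(origin_in))
dest_vec = keras.layers.Flatten()(airport2vec(dest_in))

# `route` can be concatenated with the continuous inputs (duration,
# days to departure) and fed into the network's hidden layers.
route = keras.layers.Concatenate()([origin_vec, dest_vec])
```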
I decided to implement, train, and evaluate four feed-forward neural network designs of increasing complexity: LR-1, NN-1, NN-2, and finally PricingNet.
By gradually ramping up the complexity of the models, I was able to measure the impact of adding new inputs and adjusting the topology. It also allowed me to perform sanity checks and validate some of my ‘leap of faith’ assumptions early on in the process. All of the networks except LR-1 used Adam optimisation and ReLU activation to keep them ‘future-proof’.
It was reassuring to see that the performance of my models improved substantially with each iteration. As expected, LR-1 exhibited the poorest performance. The chart below illustrates the linear fit learned by the network for the two input variables I provided: flight duration and days remaining until departure.
The addition of hidden layers in NN-1 meant that the network was able to fit a non-linear curve to the training data. This resulted in a 40% reduction in Mean Absolute Error (MAE) on the unseen evaluation samples. The relationship that the model learned between duration, days to departure, and price also looked much cooler when it was plotted.
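For reference, MAE is simply the average absolute gap between predicted and actual prices, which a few lines make precise:

```python
import numpy as np

def mean_absolute_error(actual: np.ndarray, predicted: np.ndarray) -> float:
    """Average absolute gap between actual and predicted prices."""
    return float(np.mean(np.abs(actual - predicted)))

# Made-up example: predictions off by £10 and £10 give an MAE of £10.
print(mean_absolute_error(np.array([100.0, 200.0]), np.array([90.0, 210.0])))  # 10.0
```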
At this point, it was clear that the network didn’t have enough information to account for all of the variation in the data. The next logical step for NN-2 was to include information about the origin and destination airports for each flight.
The addition of airport information, compressed using Airport2Vec, led to a further 48% reduction in MAE on unseen evaluation samples versus NN-1. Curiously, NN-2 was able to identify asymmetrical pricing strategies on certain routes. For instance, the plot below shows that it’s generally considerably cheaper to fly one-way from London to New York than the other way around. Different taxes across national jurisdictions could be one of the reasons that this occurs.
When I constructed and trained PricingNet (the final iteration) and incorporated information about the day and week of departure, there was a further 13.5% improvement in MAE versus NN-2. The regression plot below shows PricingNet’s predicted price against the actual price for 10,000 randomly selected unseen evaluation samples.
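A plot of this kind can be produced along the following lines; the data here is synthetic stand-in data, since in the real evaluation `predicted` would come from PricingNet’s predictions on the held-out samples.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Stand-ins: `actual` plays the held-out prices, `predicted` the model output.
actual = rng.uniform(30, 400, 10_000)
predicted = actual + rng.normal(0, 25, actual.shape)  # residual scatter

lo, hi = actual.min(), actual.max()
plt.scatter(actual, predicted, s=2, alpha=0.3)
plt.plot([lo, hi], [lo, hi], color="black")  # the ideal y = x line
plt.xlabel("Actual price")
plt.ylabel("Predicted price")
plt.title("Predicted vs actual on unseen evaluation samples")
plt.show()
```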
The general y=x shape of this plot was reassuring to see, though it’s clear that there’s still some work to be done to account for the residual variation.
The results I obtained suggest that it would not only be feasible to develop a universal neural pricing model, but that doing so would also be straightforward in comparison with some of the previous work in this area. A simple network was able to account for most of the variance in pricing with just six input variables. I suspect that, had the network been trained on at least two years’ worth of pricing data, the performance improvements displayed by PricingNet would have been even more significant. This is because the network needs to observe at least two full annual cycles before it can identify recurring seasonal trends.
Although the network performed well, there’s clearly still work to be done to improve its overall accuracy. The remaining residual variation means that it might be misleading to present the network’s output to travellers without the caveat that it represents a general trend, and not necessarily the exact prices the user can expect to encounter. Providing specific numeric values might give customers the impression that the network, in its current form, is more accurate than it actually is. Until further work can be carried out, it might be wiser to present a trend graph with an unlabelled price axis instead.
I learned throughout this process that it’s very important to apply ‘lean’ principles even when developing machine learning models. This means starting simple, restricting the input variables, and constraining the network topology. It’s then possible to iterate, measuring the performance of each model, and using this to inform the development of subsequent designs.
[1] O. Etzioni, R. Tuchinda, C. A. Knoblock and A. Yates, “To buy or not to buy: Mining airfare data to minimize ticket purchase price”, in Proceedings of the ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2003, pp. 119–128.
My name is Kieran, and I’m a Software Engineer at Skyscanner London and a recent graduate of the University of York. My team manages Skippy, the service responsible for redirecting millions of travellers every day to airline websites to purchase their tickets. Outside work, I’m a keen pianist and love learning more about technology, business, finance, aviation, and the French language.