
Flighty App Can Now Predict Flight Delays: Here’s How You Can Also Do It Using Machine Learning

by Can Kisi, August 13th, 2024

Too Long; Didn't Read

Flighty can now pinpoint the reason behind a flight delay. It identifies the two largest causes of delay, late aircraft and airspace issues, and gives users advance warning of delays before the airlines do. Prediction models like this could help airlines optimize operations, improve passenger satisfaction, and lower operational costs.

Flighty, a leading flight tracking app, now uses machine learning together with data from aviation authorities to give early warning of delays and pinpoint the reason behind them. With this latest release, the app can identify the two largest causes of delay (late aircraft and airspace issues) and give users advance warning of delays before the airlines do. The idea behind the update is to help you make more informed decisions about your travel plans by giving you information that airlines typically won't.


For example, an airline might push your departure back by half an hour, then an hour, and so on. Flighty can alert you ahead of time that your flight is likely to be delayed for at least five hours because of something like an official ground stop at your airport or weather issues. With that information, you can take steps such as rebooking or waiting a bit longer before heading to the airport.


We'll look at the mechanics behind Flighty's new feature and walk you through adding a similar capability to your own project: predicting flight delays using an LSTM.

Introduction to Flight Delay Prediction

Flight delay prediction is challenging because many factors can cause a delay, most of them related to weather, air traffic, and technical problems. Prediction models could help airlines optimize operations, improve passenger satisfaction, and lower operational costs.

Key Features in Flight Delay Prediction

To build an effective flight delay prediction model, it is essential to use a variety of features that can influence delays. In this article, we will use the following features:


  • FL_DATE: The date of the flight.
  • DEP_DELAY: The departure delay in minutes.
  • ORIGIN_CITY_NAME: The city from which the flight departs.
  • DEST_CITY_NAME: The destination city.
  • CRS_DEP_TIME: The scheduled departure time.
  • DISTANCE: The distance of the flight in miles.

Data Preparation

Data preparation is one of the major steps in building a machine learning model. We will load historical flight data and apply a few preprocessing steps: handling missing values, encoding categorical features, and normalizing the data.


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

# Load the flight data
flights_df = pd.read_csv('path_to_your_flight_data.csv')

# Convert FL_DATE to datetime, set it as the index, and sort chronologically
# so that the sequences built later follow the order of the flights
flights_df['FL_DATE'] = pd.to_datetime(flights_df['FL_DATE'])
flights_df = flights_df.set_index('FL_DATE').sort_index()

# Select relevant columns and drop rows with NaN values
# DEP_DELAY stays first: it is the prediction target (column 0 after scaling)
features = ['DEP_DELAY', 'ORIGIN_CITY_NAME', 'DEST_CITY_NAME', 'CRS_DEP_TIME', 'DISTANCE']
flights_df = flights_df[features].dropna()

# Convert categorical features to dummy variables (as floats, ready for scaling)
flights_df = pd.get_dummies(flights_df, columns=['ORIGIN_CITY_NAME', 'DEST_CITY_NAME'], dtype=float)

# Normalize every column to the [0, 1] range
scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(flights_df)


Sequence Creation for LSTM

The long short-term memory (LSTM) network is a kind of recurrent neural network designed to learn long-term dependencies in time series data. An LSTM expects its input as fixed-length sequences, so the first step is to slice the scaled data into windows of consecutive data points, each paired with the departure delay of the row that follows the window.

# Create sequences
def create_sequences(data, seq_length):
    sequences = []
    for i in range(len(data) - seq_length):
        seq = data[i:i+seq_length]
        target = data[i+seq_length][0]  # DEP_DELAY is the target
        sequences.append((seq, target))
    return sequences

seq_length = 30
sequences = create_sequences(scaled_data, seq_length)
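
To make the windowing concrete, here is a minimal sketch that runs the same function on a small made-up array. The array `toy_data` and the window length of 4 are illustrative assumptions, not values from the flight dataset.

```python
import numpy as np

def create_sequences(data, seq_length):
    """Pair each window of `seq_length` rows with column 0 of the row that follows it."""
    sequences = []
    for i in range(len(data) - seq_length):
        seq = data[i:i + seq_length]
        target = data[i + seq_length][0]
        sequences.append((seq, target))
    return sequences

# Toy stand-in for scaled_data: 10 rows, 3 feature columns (values are arbitrary)
toy_data = np.arange(30, dtype=float).reshape(10, 3)

toy_sequences = create_sequences(toy_data, seq_length=4)
X, y = zip(*toy_sequences)
X, y = np.array(X), np.array(y)

print(X.shape)  # (6, 4, 3): samples, time steps, features
print(y.shape)  # (6,)
print(y[0])     # 12.0: column 0 of row 4, the row right after the first window
```

Ten rows with a window of 4 yield 6 samples, because the last window must still leave one row to serve as the target. The three-dimensional shape (samples, time steps, features) is what the LSTM layer consumes.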

Train-Test Split

Next, we split the sequences into training and testing sets to evaluate the model’s performance.

# Split into train and test sets
train_size = int(len(sequences) * 0.8)
train_sequences = sequences[:train_size]
test_sequences = sequences[train_size:]

# Prepare the input and output
X_train, y_train = zip(*train_sequences)
X_train, y_train = np.array(X_train), np.array(y_train)

X_test, y_test = zip(*test_sequences)
X_test, y_test = np.array(X_test), np.array(y_test)
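
One point worth considering: the preparation code fits the scaler on the full dataset before the split, so the minimum and maximum of the test rows influence how the training rows are scaled. A common alternative is to compute the scaling statistics from the training portion only. The sketch below illustrates the idea in plain numpy on made-up numbers; `fit_min_max` and `apply_min_max` are hypothetical helper names, and with scikit-learn the equivalent would be calling `fit` on the training rows and `transform` on both parts.

```python
import numpy as np

def fit_min_max(train):
    """Return per-column minimum and range computed from the training rows only."""
    col_min = train.min(axis=0)
    col_range = train.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # avoid division by zero for constant columns
    return col_min, col_range

def apply_min_max(data, col_min, col_range):
    """Scale data with previously fitted statistics."""
    return (data - col_min) / col_range

# Toy stand-in for the unscaled feature matrix, already in chronological order
raw = np.array([[0.0, 100.0],
                [10.0, 200.0],
                [20.0, 300.0],
                [30.0, 400.0],
                [40.0, 500.0]])

train_size = int(len(raw) * 0.8)  # first 80% of the rows
train_raw, test_raw = raw[:train_size], raw[train_size:]

col_min, col_range = fit_min_max(train_raw)
train_scaled = apply_min_max(train_raw, col_min, col_range)
test_scaled = apply_min_max(test_raw, col_min, col_range)

print(train_scaled.max(axis=0))  # [1. 1.]
print(test_scaled)               # values above 1 are possible for unseen data
```

Test values can fall outside the [0, 1] range with this approach, which is expected: the model sees them exactly as it would see genuinely new flights.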

Building the LSTM Model

We then define and train an LSTM model. The model includes two LSTM layers with dropout layers to prevent overfitting and a dense output layer.

# Build the LSTM model
model = Sequential()
model.add(LSTM(50, return_sequences=True, input_shape=(seq_length, X_train.shape[2])))
model.add(Dropout(0.2))
model.add(LSTM(50))
model.add(Dropout(0.2))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mean_squared_error')

# Train the model
model.fit(X_train, y_train, epochs=20, batch_size=32, validation_data=(X_test, y_test))


Making Predictions

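After training the model, we can use it to make predictions on the test data and convert them from the scaled range back to minutes. The same fitted scaler and model can then be applied to a single future flight.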

# Make predictions
predictions = model.predict(X_test)
predictions = scaler.inverse_transform(np.concatenate((predictions, np.zeros((predictions.shape[0], scaled_data.shape[1] - 1))), axis=1))[:, 0]

# Describe the flight we want a prediction for
future_flight_data = {
    'DEP_DELAY': 0,  # placeholder; the delay is what we want to predict
    'ORIGIN_CITY_NAME': 'San Francisco, CA',
    'DEST_CITY_NAME': 'New York, NY',
    'CRS_DEP_TIME': 1230,
    'DISTANCE': 2904
}
future_flight_df = pd.DataFrame([future_flight_data])

# One-hot encode the cities the same way as the training data
future_flight_df = pd.get_dummies(future_flight_df, columns=['ORIGIN_CITY_NAME', 'DEST_CITY_NAME'])

# Align with the columns the scaler was fitted on;
# dummy columns missing from this single row are filled with 0
scaler_columns = list(scaler.feature_names_in_)
future_flight_df = future_flight_df.reindex(columns=scaler_columns, fill_value=0)

# Normalize the data using the fitted scaler
scaled_future_flight = scaler.transform(future_flight_df)

# Repeat the future flight data to create a sequence
# of shape (1, seq_length, n_features)
future_sequence = np.repeat(scaled_future_flight[np.newaxis, :, :], seq_length, axis=1)



predicted_delay = model.predict(future_sequence)
predicted_delay = scaler.inverse_transform(
    np.concatenate(
        (predicted_delay, np.zeros((predicted_delay.shape[0], scaled_future_flight.shape[1] - 1))),
        axis=1
    )
)[:, 0]

print(f"Predicted delay for the specific future flight: {predicted_delay[0]:.2f} minutes")


Predicted delay for the specific future flight: 4.10 minutes
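
The zero-padding before `scaler.inverse_transform` in the code above is needed because the scaler was fitted on every column, while the model outputs only the delay. Min-max scaling works on each column independently, so the padded columns do not affect the result, and the same numbers can be obtained by inverting the target column directly. The sketch below checks this in plain numpy on made-up values; with a fitted `MinMaxScaler`, the per-column minimum and maximum are available as `data_min_` and `data_max_`.

```python
import numpy as np

# Toy stand-in for the unscaled training matrix; column 0 plays the role of DEP_DELAY
raw = np.array([[-5.0, 900.0],
                [15.0, 1230.0],
                [55.0, 2100.0]])

col_min = raw.min(axis=0)
col_max = raw.max(axis=0)

# Pretend these are model outputs in the scaled [0, 1] space, shape (n, 1)
scaled_predictions = np.array([[0.0], [0.25], [1.0]])

# Approach used in the article: pad to full width, invert every column, keep column 0
padded = np.concatenate(
    (scaled_predictions, np.zeros((scaled_predictions.shape[0], raw.shape[1] - 1))),
    axis=1,
)
via_padding = (padded * (col_max - col_min) + col_min)[:, 0]

# Direct approach: invert only the target column
direct = scaled_predictions[:, 0] * (col_max[0] - col_min[0]) + col_min[0]

print(via_padding)  # [-5. 10. 55.]
print(direct)       # [-5. 10. 55.]
```

Both routes give the same delays in minutes; the direct form simply avoids building the padded array.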

Interpreting the Results

Plotting the actual delays against the predicted delays for the test set gives a quick visual check: the less the two lines deviate from each other, the better the model is at predicting delays. There is always room to improve the model by fine-tuning hyperparameters, adding more features, or using more advanced architectures.
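
A visual comparison can be backed up with numbers. The sketch below computes the mean absolute error (MAE) and root mean squared error (RMSE) and compares them with a naive baseline that always predicts the mean delay. The arrays are made-up delays in minutes standing in for the test targets and the predictions; in the article's pipeline, `y_test` would first need to be converted back to minutes so that both arrays are on the same scale.

```python
import numpy as np

def mae(actual, predicted):
    """Mean absolute error."""
    return float(np.mean(np.abs(actual - predicted)))

def rmse(actual, predicted):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Made-up delays in minutes, standing in for unscaled test targets and predictions
actual = np.array([0.0, 10.0, 20.0, 30.0])
predicted = np.array([2.0, 8.0, 25.0, 29.0])

# Naive baseline: always predict the mean delay (taken from `actual` here for
# brevity; in practice it would come from the training set)
baseline = np.full_like(actual, actual.mean())

print(mae(actual, predicted))   # 2.5
print(rmse(actual, predicted))
print(mae(actual, baseline))    # 10.0
```

If the model's error is not clearly below the baseline's, the extra complexity of the LSTM is not yet paying off.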

Challenges and Considerations

Despite the benefits, there are several challenges and considerations:

  • Data Quality: The quality and completeness of the data heavily influence how good the predictions can be.
  • Feature Selection: Choosing the right features is essential for building an effective model.
  • Model Complexity: The more complex a model is, the more computationally intensive and the harder to interpret it becomes.

Conclusion

Machine learning is a powerful tool for flight delay prediction: it can make airline operations considerably more efficient and give passengers a better travel experience. Work through the examples in this article to implement your own flight delay predictor and get a feel for what machine learning can do in this domain.


This is just one of the newer features from Flighty, and it highlights what is possible when machine learning is applied to real-world problems. As technology and data science advance, models of this kind will keep improving in accuracy and in the range of problems they can be applied to, paving the way toward smarter and more efficient air travel.



Next Steps

For those interested in further enhancing their model, consider the following:

  • Hyperparameter Tuning: Use grid search or random search to zero in on the optimal hyperparameters of your model.
  • Feature Engineering: Explore other features which influence flight delays, like weather conditions, air traffic data, and aircraft type.
  • Advanced Architectures: Experiment with deeper architectures, such as bidirectional LSTMs or GRUs with attention mechanisms, to capture more complex patterns in the data.
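
As a small illustration of feature engineering, the sketch below encodes the scheduled departure time cyclically, so that 23:59 and 00:00 end up close together instead of at opposite ends of the scale. It assumes `CRS_DEP_TIME` is stored as an HHMM integer (for example 1230 for 12:30), which matches the value used for the future flight earlier; the helper names are hypothetical.

```python
import numpy as np

def hhmm_to_minutes(hhmm):
    """Convert an HHMM integer such as 1230 into minutes since midnight (750)."""
    hhmm = np.asarray(hhmm)
    return (hhmm // 100) * 60 + (hhmm % 100)

def cyclical_time_features(hhmm):
    """Return the sine and cosine of the time of day as two columns."""
    angle = 2 * np.pi * hhmm_to_minutes(hhmm) / (24 * 60)
    return np.column_stack((np.sin(angle), np.cos(angle)))

times = np.array([0, 600, 1230, 2359])
print(hhmm_to_minutes(times))  # minutes since midnight: 0, 360, 750, 1439
features = cyclical_time_features(times)
print(features.shape)          # (4, 2)
```

The two resulting columns could replace the raw `CRS_DEP_TIME` column before scaling. Whether this helps the model is something to verify on your own data.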


With continuous iteration and improvement, the model can reach higher accuracy and produce more reliable predictions, contributing to smoother and more efficient air travel.