
Stock Market Prediction - the Holy Grail of Time Series Data

by Sagar Sidana, July 24th, 2024

Too Long; Didn't Read

Predicting stock market prices, particularly for volatile stocks like Nvidia (NVDA), remains challenging due to numerous influencing factors such as correlated stocks, seasonal variations, and news impacts. Despite this complexity, the article explores various predictive methods including ARIMA, LSTM, and Prophet. Among these, Prophet, a tool developed by Facebook, stands out for its ability to handle non-linear trends and missing data automatically. Combined with moving averages, Prophet offers robust forecasts, albeit with some limitations in long-term accuracy and handling multiple correlated time series.


All images unless explicitly stated are generated using DALL·E 3 from here: https://www.bing.com/images/create

United States Stock Market Prediction

Stock prediction has been the Holy Grail of time series data because of its potentially lucrative applications.


However, a huge number of variables influence the price of even a single stock.


The market prices of correlated stocks, holidays, overall stock market valuations, seasonal differences, international finance news, company financial news, company hype: a complete list would be extremely long.


This makes predicting the stock price of a single company extremely challenging. Not to mention unreliable.


So what do we do?

I decided to take one of the most difficult stocks to predict: Nvidia (because of its high volatility and non-linear behavior over the last few months).


I tried the following methods:

  1. ARIMA
  2. Exponential smoothing
  3. LSTM
  4. Neural Networks
  5. Prophet
  6. XGBoost
  7. Moving Averages
  8. Random Forests
  9. Support Vector Regression
  10. Logistic Regression


Out of all these, I ensembled the two most promising ones.


I made several predictions of stock prices till the end of 2024.


Forecasting further than six months out is risky because of the huge number of variables that need to be accounted for when making predictions.


But out of all of the above, I ended up with a script that can make reasonably accurate predictions about the stock market.


I’m sharing the code below.


Insert your favorite stock ticker to get a state-of-the-art prediction through the end of 2024.


Seriously!


Editor’s note: This article should be relied upon for informational purposes only. Stocks can be speculative, complex, and involve high risks. This can mean high price volatility and potential loss of your initial investment. You should consider your financial situation and investment purposes, and consult with a financial advisor before making any investment decisions. HackerNoon and its distribution partners disclaim any liability for losses or damages resulting from the use of this information and do not endorse or guarantee the accuracy, reliability, or completeness of the information within. #DYOR

The Methods I Chose and Why I Chose Them


The two methods I chose after trying all of the above were Prophet and moving averages.


There’s not much to discuss about moving averages - you simply predict the next price as the average of the prices over a chosen trailing window.
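As a minimal sketch of that idea (not the script used later in this article), the next value can be estimated as the mean of the last few prices. The function name `moving_average_forecast` and the sample prices are made up for illustration.

```python
def moving_average_forecast(prices, window):
    """Predict the next price as the mean of the last `window` prices."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(prices) < window:
        raise ValueError("not enough prices to fill the window")
    recent = prices[-window:]
    return sum(recent) / window


# Illustrative prices only, not real market data
sample_prices = [100.0, 102.0, 101.0, 105.0, 107.0]
next_price = moving_average_forecast(sample_prices, window=3)
print(next_price)
```

A longer window gives a smoother estimate that reacts more slowly to recent moves; the script further down uses a 200-day window.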


A moving average works great for smoothing data you already have but, on its own, it is a weak forecaster because it can only lag behind the most recent prices. Combined with Prophet, however, it can yield powerful results. What is Prophet, you ask? See below!


Prophet - Meta’s All-in-One Swiss Army Knife for Time Series Prediction

The research paper published about Prophet is summarized below.


Prophet is an open-source time series forecasting tool developed by Facebook's Core Data Science team. According to the sources, the key points about Prophet are:


Additive Model with Non-Linear Trends: Prophet is based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects.


Robust to Outliers and Missing Data: Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.


Fully Automatic Forecasting: Prophet can provide reasonable forecasts on messy data with no manual effort. It is designed to be robust to outliers, missing data, and dramatic changes in the time series.


Tunable Forecasts: The Prophet procedure includes many possibilities for users to tweak and adjust forecasts by using human-interpretable parameters to incorporate domain knowledge.


Accurate and Fast: Prophet is used extensively across Facebook for producing reliable forecasts, outperforming other approaches in most cases. The models are fit in Stan, allowing forecasts to be generated quickly.


More details available in the References.
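For reference, the additive decomposition described in the Prophet paper can be written as follows, where g(t) is the trend, s(t) the seasonal component, h(t) the holiday effects, and ε_t the error term. The script later in this article switches the yearly, weekly, and daily seasonality terms off and adds no holidays, so in that configuration the fit is driven mainly by the trend.

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
```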

Advantages of Prophet


Can handle non-linear trends and complex patterns in the data using an additive model with piecewise linear trends.


Robust to outliers and missing data, making it suitable for messy real-world time series.


Fully automatic forecasting with reasonable results even without much manual effort.


Tunable forecasts that allow incorporating domain knowledge through human-interpretable parameters.


Available in both R and Python, sharing the same underlying Stan code for fitting models.


Relatively computationally efficient compared to other time series methods.


Provides an interpretable decomposition of the forecast into trend, seasonality, holiday and extra regressors components.
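To make the "piecewise linear trend" idea concrete, here is a small sketch in plain Python of a trend whose slope changes at a set of changepoints, following the form given in the Prophet paper. It is an illustration, not Prophet's implementation; the function name and the numbers are hypothetical.

```python
def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """Evaluate a continuous piecewise linear trend at time t.

    k: base growth rate, m: base offset,
    changepoints: times where the slope changes,
    deltas: slope adjustment applied from each changepoint onward.
    """
    rate = k
    offset = m
    for s, delta in zip(changepoints, deltas):
        if t >= s:
            rate += delta
            # Shift the offset so the line stays continuous at s
            offset -= s * delta
    return rate * t + offset


# Hypothetical example: slope 1.0 until t=5, then slope 2.0
values = [
    piecewise_linear_trend(t, k=1.0, m=0.0, changepoints=[5], deltas=[1.0])
    for t in range(8)
]
print(values)
```

In Prophet, the changepoint_prior_scale parameter controls how large these slope adjustments are allowed to be: larger values give a more flexible trend, smaller values a stiffer one.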

Disadvantages of Prophet

Subpar predictive performance compared to classical time series models in some cases.


Only appropriate for univariate time series, not designed for forecasting multiple correlated time series jointly.


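Supports holidays and extra regressors, but the future values of any extra regressor must be supplied for the forecast period, so they have to be known in advance or forecast separately.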


Trend component tends to explain the majority of the prediction (around 90% in one case study), making it essential to get the trend right.


Confidence intervals can be quite large, especially for long-term forecasts.


Requires some tuning of parameters like changepoint_prior_scale to get the best results.


The Source Code for the Stock Estimator That You Can Run on Your Local System

You will need an Alpha Vantage API key. The instructions to get that are available here:


https://www.alphavantage.co/support/#api-key


This is the requirements.txt file:

yfinance==0.2.18
pandas==1.5.3
numpy==1.23.5
matplotlib==3.7.1
prophet==1.1.2
statsmodels==0.13.5
scikit-learn==1.2.2
tensorflow==2.12.0
xgboost==1.7.5


This is the source code. In this section, you can replace the ticker symbol with the one you want:

# Change this symbol variable to get predictions for any stock ticker you need
symbol = "NVDA"
end_date = "2024-04-30"

# You may want to experiment with the forecast date. But normally,
# the longer the forecast horizon, the less accurate the prediction.
forecast_end = "2025-01-31"
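The script later turns these two dates into a number of daily forecast steps by counting calendar days. The sketch below repeats that arithmetic with the standard library and, as an extra that is not in the original script, counts how many of those days are weekdays, since US markets are closed on weekends. The variable names are chosen for this example.

```python
from datetime import date, timedelta

end_date = date(2024, 4, 30)
forecast_end = date(2025, 1, 31)

# Number of calendar days between the last training day and the forecast end
calendar_days = (forecast_end - end_date).days

# Of those days, count Monday-Friday only (weekday() is 0-4 for Mon-Fri)
weekdays = sum(
    1
    for offset in range(1, calendar_days + 1)
    if (end_date + timedelta(days=offset)).weekday() < 5
)

print(calendar_days, weekdays)
```

Prophet's make_future_dataframe uses a daily frequency by default, so a horizon counted in calendar days also produces forecast rows for weekends, when no trading takes place.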


Replace the Alpha Vantage API Key with your own in this code block below:

https://www.alphavantage.co/support/#api-key

# Replace with your Alpha Vantage API key
ALPHA_VANTAGE_API_KEY = "XXXXXXXXXXXXXXXXXXXX"
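If you would rather not hard-code the key in the script, one common option is to read it from an environment variable. This is only a sketch: the environment variable name and the helper `load_api_key` are assumptions made for this example, not something the Alpha Vantage API requires.

```python
import os


def load_api_key(env_var="ALPHA_VANTAGE_API_KEY"):
    """Return the API key stored in the given environment variable."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


# Usage in the script (assumes the variable was exported beforehand):
# ALPHA_VANTAGE_API_KEY = load_api_key()
```

This keeps the key out of the source file, which helps if you share the script or commit it to a repository.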


import requests
import pandas as pd
import numpy as np
from datetime import datetime
from prophet import Prophet
import matplotlib.pyplot as plt
from sklearn.metrics import classification_report



# Replace with your Alpha Vantage API key
ALPHA_VANTAGE_API_KEY = "XXXXXXXXXXXXXXXXXXXX"





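def get_stock_data(symbol, end_date):
    """Download daily closing prices for `symbol` and keep rows up to `end_date`."""
    # Note: TIME_SERIES_DAILY returns raw (as-traded) prices, so stock splits
    # show up as sudden jumps in the series.
    url = f"https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol={symbol}&outputsize=full&apikey={ALPHA_VANTAGE_API_KEY}"
    response = requests.get(url, timeout=30)
    data = response.json()

    if "Time Series (Daily)" not in data:
        # On failure the API returns a JSON object describing the problem
        raise ValueError(f"Error fetching data from Alpha Vantage API: {data}")

    # Rows are dates, columns are the price fields ("1. open", ..., "4. close")
    df = pd.DataFrame(data["Time Series (Daily)"]).T
    df.index = pd.to_datetime(df.index)
    df = df.sort_index()
    df = df.astype(float)

    # Filter data up to end_date
    df = df[df.index <= end_date]

    return df[["4. close"]].rename(columns={"4. close": "y"})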


def prepare_data_for_prophet(df):
    df_prophet = df.reset_index()
    df_prophet.columns = ['ds', 'y']
    return df_prophet


def moving_average_model(data, window_size):
    return data.rolling(window=window_size).mean()


def train_and_predict(df, periods):
    # Prophet model
    model_prophet = Prophet(
        yearly_seasonality=False,
        weekly_seasonality=False,
        daily_seasonality=False,
        changepoint_prior_scale=0.05
    )
    model_prophet.fit(df)
    future_prophet = model_prophet.make_future_dataframe(periods=periods)
    forecast_prophet = model_prophet.predict(future_prophet)

    # Moving average model: 200-day rolling mean of the closing price
    # (the first 199 values are NaN until the window fills)
    ma_long = moving_average_model(df['y'], window_size=200)

    # Combine forecasts: the moving average has no future values, so its
    # last value is repeated flat across the forecast horizon
    forecast = forecast_prophet.copy()
    forecast['yhat_ma'] = np.concatenate([ma_long.values, np.full(periods, ma_long.iloc[-1])])
    forecast['yhat_ensemble'] = (forecast['yhat'] + forecast['yhat_ma']) / 2

    return model_prophet, forecast


def plot_predictions(historical_data, forecast, stock_name):
    plt.figure(figsize=(15, 8))

    # Plot historical data
    plt.plot(historical_data.index, historical_data['y'], label='Historical', color='blue')

    # Plot Prophet forecast
    forecast_start = forecast[forecast['ds'] >= '2020-01-01']
    plt.plot(forecast_start['ds'], forecast_start['yhat'], label='Prophet Forecast', color='red')

    # Plot Moving Average forecast
    plt.plot(forecast_start['ds'], forecast_start['yhat_ma'], label='MA Forecast', color='green')

    # Plot Ensemble forecast
    plt.plot(forecast_start['ds'], forecast_start['yhat_ensemble'], label='Ensemble Forecast', color='purple')

    plt.fill_between(forecast_start['ds'], forecast_start['yhat_lower'], forecast_start['yhat_upper'], color='red',
                     alpha=0.2)

    plt.title(f'{stock_name} Stock Price: Historical and Forecast (2020-2025)')
    plt.xlabel('Date')
    plt.ylabel('Close Price')
    plt.legend()
    plt.grid(True)
    plt.tight_layout()
    plt.show()


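def calculate_direction(series):
    # 1 if the value rose compared with the previous row, otherwise 0
    return (series.diff() > 0).astype(int)


def evaluate_model(actual, predicted):
    # Compare day-over-day direction (up vs. not up) rather than price levels;
    # the first row is skipped because it has no previous value to diff against
    actual_direction = calculate_direction(actual)
    predicted_direction = calculate_direction(predicted)
    return classification_report(actual_direction[1:], predicted_direction[1:])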


def main():
    # Change this symbol to get predictions of any stock ticker you may need
    symbol = "NVDA"
    end_date = "2024-04-30"
    # You may want to experiment with the forecast date. But normally, the
    # greater the time length, the less accurate the prediction.
    forecast_end = "2025-01-31"

    try:
        # Fetch historical data
        stock_data = get_stock_data(symbol, end_date)
        df_prophet = prepare_data_for_prophet(stock_data)

        # Train model and make predictions
        periods = (datetime.strptime(forecast_end, "%Y-%m-%d") - datetime.strptime(end_date, "%Y-%m-%d")).days
        model, forecast = train_and_predict(df_prophet, periods)

        # Print data info
        print(stock_data.head())
        print(stock_data.tail())
        print(f"\nTotal data points: {len(stock_data)}")
        print(f"Date range: {stock_data.index.min()} to {stock_data.index.max()}")

        # Plot predictions from 2020 to 2025
        plot_predictions(stock_data['2020':], forecast, symbol)

        # Print predictions for 2025
        predictions_2025 = forecast[forecast['ds'].dt.year == 2025]
        print(f"\nPredicted prices for 2025:")
        print(predictions_2025.groupby(predictions_2025['ds'].dt.to_period('Y')).agg(
            {'yhat_ensemble': ['mean', 'min', 'max']}))

        # Evaluate model
        actual = stock_data['y']
        predicted = forecast.set_index('ds')['yhat_ensemble'].loc[actual.index]
        print("\nClassification Report:")
        print(evaluate_model(actual, predicted))

    except Exception as e:
        print(f"Error: {e}")


if __name__ == "__main__":
    main()
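The ensemble step inside train_and_predict can be hard to picture, so here is a dependency-free sketch of the same arithmetic: the moving average is extended flat over the forecast horizon and then averaged point by point with the model forecast. The lists below are made-up numbers, and `ensemble_with_flat_ma` is a hypothetical helper name.

```python
def ensemble_with_flat_ma(model_forecast, ma_history, horizon):
    """Average a model forecast with a moving average held flat in the future.

    model_forecast: values covering the history plus `horizon` future steps
    ma_history: moving-average values covering the history only
    """
    if len(model_forecast) != len(ma_history) + horizon:
        raise ValueError("forecast must cover history plus horizon")
    # The moving average has no future values, so repeat its last value
    ma_extended = list(ma_history) + [ma_history[-1]] * horizon
    return [(f + m) / 2 for f, m in zip(model_forecast, ma_extended)]


# Made-up numbers: 3 historical points and a 2-step horizon
forecast = [10.0, 11.0, 12.0, 13.0, 14.0]
ma = [9.0, 10.0, 11.0]
combined = ensemble_with_flat_ma(forecast, ma, horizon=2)
print(combined)
```

Because the moving-average half is constant over the horizon, all movement in the future part of the ensemble comes from the Prophet half, at half its size.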


Program Console Output (NVDA):

…. head() and tail() output omitted for brevity
Total data points: 6163
Date range: 1999-11-01 00:00:00 to 2024-04-30 00:00:00
Predicted prices for 2025:
     yhat_ensemble                        
              mean         min         max
ds                                        
2025    534.933425  534.059932  535.806918

Classification Report:
              precision    recall  f1-score   support
           0       0.51      0.46      0.48      2977
           1       0.54      0.59      0.56      3185
    accuracy                           0.53      6162
   macro avg       0.52      0.52      0.52      6162
weighted avg       0.52      0.53      0.52      6162
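To put the accuracy figure in context, it can help to compare it with a naive rule that always predicts "up". The sketch below derives that baseline from the support counts in the report above (3185 up days out of 6162) and shows on a tiny made-up series how the direction labels are built. The helper names are hypothetical, and this comparison is an addition for illustration, not part of the original script.

```python
def direction_labels(prices):
    """1 when the price rose versus the previous value, else 0."""
    return [1 if curr > prev else 0 for prev, curr in zip(prices, prices[1:])]


def accuracy(actual, predicted):
    """Share of positions where the two label lists agree."""
    matches = sum(a == p for a, p in zip(actual, predicted))
    return matches / len(actual)


# Support counts taken from the NVDA classification report
up_days, total_days = 3185, 6162
always_up_baseline = up_days / total_days
print(f"Always-up baseline accuracy: {always_up_baseline:.2f}")

# Tiny made-up series to show how the labels are built and compared
actual = direction_labels([10, 11, 10, 12])
predicted = direction_labels([10, 12, 13, 14])
print(actual, predicted, accuracy(actual, predicted))
```

The baseline comes out at roughly 0.52, so the reported 0.53 is only slightly above what always guessing "up" would achieve. The report is also computed on the same dates the model was trained on.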

The Graphical Plot

Image output by running the code


This is what you get if you change the symbol to AAPL (with your own Alpha Vantage API key, of course):

Program Console Output (AAPL):

Total data points: 6163
Date range: 1999-11-01 00:00:00 to 2024-04-30 00:00:00

Predicted prices for 2025:
     yhat_ensemble                        
              mean         min         max
ds                                       
2025    159.003104  158.695001  159.311208

Classification Report:
              precision    recall  f1-score   support
           0       0.52      0.39      0.45      2965
           1       0.54      0.66      0.59      3197
    accuracy                           0.53      6162
   macro avg       0.53      0.53      0.52      6162
weighted avg       0.53      0.53      0.52      6162

The Graphical Plot

Image output by running the code


Have fun playing with the script.


Install the packages listed in requirements.txt with the following command and you are good to go, along with a free Alpha Vantage API key of course.


pip install -r requirements.txt


Future Directions

These are really exciting times. Facebook (Meta) uses this method throughout its ecosystem, and it works well, especially where there is seasonality.


Now Meta has come up with a new time series predictor called NeuralProphet, which is a huge improvement on the Prophet algorithm.


The program has been released, but it is still in beta. A fully functional release is expected soon.


Expect another blog when that happens!


The world is changing fast.


This was not a super-complex prediction system – it just used a super powerful prediction tool.


You can insert any stock ticker symbol you want at the point marked in the source code and you will get a prediction of the price for Jan 2025.

MSFT, anyone?


References:

  1. Forecasting at scale (PeerJ Preprints)
  2. Prophet | Forecasting at scale. (facebook.github.io)
  3. Time Series Analysis using Facebook Prophet - GeeksforGeeks
  4. Hyndman, R. J., & Athanasopoulos, G. (2021). Forecasting: Principles and Practice. OTexts. ISBN-10: 0987507133, ISBN-13: 978-0987507136. Available online (fully) at Forecasting: Principles and Practice (3rd ed) (otexts.com)
  5. When to use Facebook Prophet - Crunching the Data
  6. Is Facebook Prophet suited for doing good predictions in a real-world project? - Artefact
  7. NeuralProphet: A Time-Series Modeling Library based on Neural-Networks | by Essi Alizadeh | Towards Data Science
  8. NeuralProphet
  9. https://ai.meta.com/blog/neuralprophet-the-neural-evolution-of-facebooks-prophet