
Optimizing Deep Reinforcement Learning for American Put Option Hedging: Hyperparameter Experiments

EScholar: Electronic Academic Papers for Scholars

Table of Links

Abstract and 1. Introduction

2. Deep Reinforcement Learning

3. Similar Work

    3.1 Option Hedging with Deep Reinforcement Learning

    3.2 Hyperparameter Analysis

4. Methodology

    4.1 General DRL Agent Setup

    4.2 Hyperparameter Experiments

    4.3 Optimization of Market Calibrated DRL Agents

5. Results

    5.1 Hyperparameter Analysis

    5.2 Market Calibrated DRL with Weekly Re-Training

6. Conclusions

Appendix

References

4.2 Hyperparameter Experiments

The first round of analysis in this article examines the impact of hyperparameters on DRL American option hedging performance. For all hyperparameter experiments, the training data consists of simulated paths of the GBM process. The time-to-maturity is one year, the option is at the money with a strike of $100, the volatility is 20%, and the risk-free rate is equal to the mean expected return of 5%. All tests also use GBM data, and each test consists of 10,000 episodes, consistent with Pickard et al. (2024). Note that to assess the robustness and consistency of DRL agents, testing will be performed with transaction cost rates of both 1% and 3%. Table 1 summarizes the hyperparameters used in Pickard et al. (2024); for each experiment, the hyperparameters not being analysed are held fixed at these values. This set of hyperparameters is referred to as the base case.


Table 1: Base Case Hyperparameter Summary
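As a rough illustration of the training-data setup described above, the following is a minimal sketch (not the authors' code) of GBM path simulation with the stated parameters: one-year maturity, an initial price equal to the $100 strike, 20% volatility, and a 5% drift equal to the risk-free rate. The path and step counts in the function signature are illustrative choices, not values taken from the paper.

```python
import numpy as np

def simulate_gbm_paths(n_paths=10_000, n_steps=52, S0=100.0, mu=0.05,
                       sigma=0.20, T=1.0, seed=0):
    """Return an (n_paths, n_steps + 1) array of simulated GBM price paths."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    # Standard normal shocks for each path and time step.
    z = rng.standard_normal((n_paths, n_steps))
    # Exact GBM discretization:
    # S_{t+dt} = S_t * exp((mu - 0.5*sigma^2)*dt + sigma*sqrt(dt)*Z)
    increments = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    log_paths = np.cumsum(increments, axis=1)
    paths = S0 * np.exp(np.hstack([np.zeros((n_paths, 1)), log_paths]))
    return paths

paths = simulate_gbm_paths()
print(paths.shape)  # (10000, 53); column 0 is the initial price S0 = 100
```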


A first experiment will examine how DRL agent performance is impacted by actor and critic learning rates as a function of training episodes. Specifically, a grid search is performed using actor learning rates of 1e-6, 5e-6, and 10e-6, critic learning rates of 1e-4, 5e-4, and 10e-4, and training-episode counts of 2500, 5000, and 7500. A second experiment will assess the impact of neural-network architectures. Similar to the first experiment, tests will be conducted across actor learning rates of 1e-6, 5e-6, and 10e-6, critic learning rates of 1e-4, 5e-4, and 10e-4, and NN architectures of 32², 64², and 64³ for both the actor and critic networks.
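The two grids above can be enumerated with a short sketch. Here the architecture notation 32², 64², and 64³ is read as two hidden layers of 32 nodes, two of 64, and three of 64 (an assumption about the notation), and train_agent is a hypothetical placeholder for the training routine, which is not part of this excerpt.

```python
from itertools import product

# Experiment 1: actor/critic learning rates x training-episode counts.
actor_lrs = [1e-6, 5e-6, 10e-6]
critic_lrs = [1e-4, 5e-4, 10e-4]
episode_counts = [2500, 5000, 7500]
lr_episode_grid = list(product(actor_lrs, critic_lrs, episode_counts))      # 27 configurations

# Experiment 2: actor/critic learning rates x hidden-layer architectures.
architectures = [(32, 32), (64, 64), (64, 64, 64)]
lr_architecture_grid = list(product(actor_lrs, critic_lrs, architectures))  # 27 configurations

for actor_lr, critic_lr, n_episodes in lr_episode_grid:
    # Hypothetical call; the actual actor-critic training loop is not shown here.
    # agent = train_agent(actor_lr=actor_lr, critic_lr=critic_lr, n_episodes=n_episodes)
    pass
```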


Next, an experiment is conducted to evaluate the impact of training steps. Here, DRL agents are trained using 10, 25, and 50 steps over the year of hedging. Moreover, to examine whether performance is affected by a mismatch between training steps and testing re-balance periods, testing will be conducted in environments with 52 (weekly), 104 (twice-weekly), and 252 (daily) re-balancing times. To be clear, in this study the number of steps per episode is synonymous with the number of re-balance periods. As such, for this particular hyperparameter experiment, training comprises 10, 25, or 50 re-balance periods (steps), spaced equally throughout the one-year episode, and testing comprises 52, 104, or 252 re-balance periods (steps).
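A minimal sketch of the implied time discretization is shown below; the rebalance_grid helper is an illustrative name, not from the paper. It maps each step count, for training and testing, to equally spaced re-balance times over the one-year episode.

```python
import numpy as np

T = 1.0  # episode length in years (time-to-maturity)

def rebalance_grid(n_steps, T=1.0):
    """Return n_steps + 1 equally spaced time points from 0 to T (the re-balance grid)."""
    return np.linspace(0.0, T, n_steps + 1)

# Training grids: 10, 25, or 50 equally spaced re-balance periods over the year.
train_grids = {n: rebalance_grid(n, T) for n in (10, 25, 50)}

# Testing grids: 52 (weekly), 104 (twice-weekly), and 252 (daily) re-balance periods.
test_grids = {n: rebalance_grid(n, T) for n in (52, 104, 252)}

print(train_grids[10][:3])  # [0.  0.1 0.2]
print(test_grids[252][:3])  # [0.         0.00396825 0.00793651]
```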



Authors:

(1) Reilly Pickard, Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, Canada ([email protected]);

(2) F. Wredenhagen, Ernst & Young LLP, Toronto, ON, M5H 0B3, Canada;

(3) Y. Lawryshyn, Department of Chemical Engineering, University of Toronto, Toronto, Canada.


This paper is available on arxiv under CC BY-NC-ND 4.0 Deed (Attribution-Noncommercial-Noderivs 4.0 International) license.

