paint-brush
How to Design a Deep Reinforcement Learning Agent for American Put Option Hedgingby@hedging

How to Design a Deep Reinforcement Learning Agent for American Put Option Hedging

by Economic Hedging TechnologyOctober 29th, 2024
Read on Terminal Reader
Read this story w/o Javascript
tldt arrow

Too Long; Didn't Read

This section details the methodology for creating a deep reinforcement learning (DRL) agent to hedge American put options, focusing on the design of the DDPG agent, the state and action space formulation, and the training and testing procedures. The study emphasizes avoiding unnecessary complexity, such as excluding the Black-Scholes Delta from the state space.
featured image - How to Design a Deep Reinforcement Learning Agent for American Put Option Hedging
Economic Hedging Technology HackerNoon profile picture
  1. Abstract

  2. Introduction

    Background

    Reinforcement Learning

    Similar Work

  3. Methodology

    DRLAgent Design

  4. Training Procedures

  5. Testing Procedures

  6. Results

  7. SABR Experiments

  8. Conclusions

  9. Appendix A

  10. References

Methodology

Successfully implementing a DRL agent to achieve the American put option hedging task requires three main steps: the design of the DRL agent, the setup of the training procedure, and the setup of a testing procedure. The design of the DRL agent features the construction of the neural network, the choice of hyperparameters, the formulation of state- and action-spaces, and a reward. The training procedure involves the data generation processes required to provide the agent with adequate state and reward information. Finally, forming testing scenarios requires the acquisition of data, and the development of a benchmark comparator for the DRL agent, such as the Delta hedging method. This section will first detail the DRL agent design before detailing the training and testing procedures for all experiments.

DRL AGENT DESIGN

As policy based DRL methods allow for continuous action-spaces, the DRL method employed in this study is DDPG. In this study, the actor and critic networks are both fully connected NNs with two hidden layers consisting of 64 nodes each. In both the actor and critic networks, the hidden layers use the rectified linear unit as the non-linear activation function. The actor network uses a sigmoidal output function to map the actions to the range [0, 1], while the critic network uses a linear output. Note that the actor output is multiplied by −1, as this study hedges American put options, which require shares to be shorted.





The state-space for the DRL agent in this study includes the current asset price, the time-to-maturity, and the current holding (previous action). This is aligned with the state-spaces used in Kolm and Ritter (2019), Cao et al. (2021), Xiao, Yao, and Zhou (2021), and Assa, Kenyon, and Zhang (2021). This study does not include the BS Delta in the state-space, agreeing with Cao et al. (2021), Kolm and Ritter (2019), and Du et al. (2020) in that the addition of the BS Delta is an unnecessary augmentation of the state. Moreover, the inclusion of the BS Delta in the state may hinder the effectiveness of an American DRL hedger, as the BS model is derived using European options.


The reward formulation used in this study is given by



Authors:

(1) Reilly Pickard, Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, ON M5S 3G8, Canada ([email protected]);

(2) Finn Wredenhagen, Ernst & Young LLP, Toronto, ON, M5H 0B3, Canada;

(3) Julio DeJesus, Ernst & Young LLP, Toronto, ON, M5H 0B3, Canada;

(4) Mario Schlener, Ernst & Young LLP, Toronto, ON, M5H 0B3, Canada;

(5) Yuri Lawryshyn, Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON M5S 3E5, Canada.


This paper is available on arxiv under CC BY-NC-SA 4.0 Deed (Attribution-Noncommercial-Sharelike 4.0 International) license.