Introduction
Following the sale of a financial option, traders seek to mitigate the associated risk through hedging strategies. For example, a trader who has just sold a European put option may hedge against adverse price drops by selling shares of the underlying asset. Traditionally, option hedgers attempt to maintain a Delta-neutral portfolio, where Delta is the first partial derivative of the option price with respect to the underlying asset price (Hull 2012). Therefore, the Delta-neutral hedge for the sold European put requires a short position in a number of shares equal to the magnitude of the put's Delta. For European options, analytic solutions for the option price and Delta exist via the Black and Scholes (BS) option pricing model (Black and Scholes 1973). While the BS model shows that the underlying risk of an option position is eliminated by a continuously rebalanced Delta-neutral portfolio, financial markets operate discretely in practice. Further, the BS model assumes constant volatility and no trading costs, which is not reflective of reality. As such, dynamic option hedging under market frictions is a sequential decision-making process under uncertainty. One field that has garnered significant attention for sequential decision-making problems is reinforcement learning (RL), a subfield of artificial intelligence (AI). In complex environments, RL is aided by neural network (NN) function approximation. The field combining NNs and RL is called deep RL (DRL), and DRL has been used to achieve superhuman performance in video games (Mnih et al. 2013), board games (Silver et al. 2016), and robot control (Lillicrap et al. 2015).
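For concreteness, the sketch below computes the closed-form BS price and Delta of a European put; the function name `bs_put_price_and_delta` and the parameter values are illustrative choices, not quantities taken from this study.

```python
# Minimal sketch: Black-Scholes European put price and Delta.
import numpy as np
from scipy.stats import norm

def bs_put_price_and_delta(spot, strike, r, sigma, tau):
    """Return the BS European put price and its Delta for time-to-maturity tau."""
    d1 = (np.log(spot / strike) + (r + 0.5 * sigma**2) * tau) / (sigma * np.sqrt(tau))
    d2 = d1 - sigma * np.sqrt(tau)
    price = strike * np.exp(-r * tau) * norm.cdf(-d2) - spot * norm.cdf(-d1)
    delta = norm.cdf(d1) - 1.0  # put Delta lies in (-1, 0)
    return price, delta

price, delta = bs_put_price_and_delta(spot=100.0, strike=100.0, r=0.02, sigma=0.2, tau=0.5)
# A writer of this put is Delta-neutral after shorting |delta| shares of the underlying.
```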
As the dynamic hedging problem requires decision-making under uncertainty, several recent studies have used DRL to effectively hedge option positions. A review of 17 studies that use DRL for dynamic stock option hedging is given by Pickard and Lawryshyn (2023), who note that while many studies show that DRL outperforms a Delta-neutral strategy when hedging European options under transaction costs and stochastic volatility, no current studies consider the hedging of American options. For American put options specifically, no analytical pricing or hedging formula exists due to the possibility of early exercise, and numerical methods are required for pricing and hedging (Hull 2012). To address this gap in the literature, this article details the design of DRL agents that are trained to hedge American put options under transaction costs. The DRL agents in this study are designed using the deep deterministic policy gradient (DDPG) method. In addition to training an American put DRL hedger when the underlying asset price follows a geometric Brownian motion (GBM), stochastic volatility is considered by calibrating stochastic volatility models to empirical option data on several stock symbols. After these DRL agents are tested on simulated paths generated by the calibrated models, the hedging performance of each agent is evaluated on the empirical asset price path for the respective symbol between the sale date and the maturity date.
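To illustrate the baseline price dynamics, the sketch below simulates GBM paths for the underlying asset; the drift, volatility, and discretization values are illustrative assumptions rather than the settings used in this study's experiments.

```python
# Minimal sketch: simulate underlying price paths under geometric Brownian motion.
import numpy as np

def simulate_gbm_paths(s0, mu, sigma, maturity, n_steps, n_paths, seed=0):
    """Simulate GBM paths on an evenly spaced time grid using the exact log-Euler scheme."""
    rng = np.random.default_rng(seed)
    dt = maturity / n_steps
    z = rng.standard_normal((n_paths, n_steps))
    increments = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    log_paths = np.cumsum(increments, axis=1)
    paths = s0 * np.exp(np.concatenate([np.zeros((n_paths, 1)), log_paths], axis=1))
    return paths  # shape: (n_paths, n_steps + 1), column 0 is the initial price

paths = simulate_gbm_paths(s0=100.0, mu=0.05, sigma=0.2, maturity=0.25, n_steps=63, n_paths=10_000)
```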
Finally, note that the DRL agent's reward function requires the option price at each time step. While an interpolation of a binomial American option tree is used in the GBM cases, this study employs a Chebyshev interpolation method first proposed by Glau, Mahlstedt, and Potz (2018) to determine the option price in the stochastic volatility experiments. This Chebyshev method is model-agnostic, and this work thereby provides a framework that extends readily to more intricate price processes. Moreover, the Chebyshev method allows the American option price to be computed more efficiently in stochastic volatility settings, as it eliminates the need to average the payoffs of several thousand Monte Carlo (MC) simulations from the current price level to exercise or expiry. This Chebyshev pricing method is described in detail in the methodology section of this work.
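For reference, the sketch below prices an American put by backward induction on a Cox-Ross-Rubinstein binomial tree; this is the standard textbook construction (Hull 2012) rather than the exact interpolated pricer used in this study, and all parameter values are illustrative.

```python
# Minimal sketch: American put price on a Cox-Ross-Rubinstein binomial tree.
import numpy as np

def crr_american_put(spot, strike, r, sigma, maturity, n_steps):
    """Price an American put via backward induction with early-exercise checks."""
    dt = maturity / n_steps
    u = np.exp(sigma * np.sqrt(dt))
    d = 1.0 / u
    p = (np.exp(r * dt) - d) / (u - d)  # risk-neutral up-move probability
    disc = np.exp(-r * dt)

    # Terminal asset prices and payoffs (index j counts up-moves).
    j = np.arange(n_steps + 1)
    prices = spot * u**j * d**(n_steps - j)
    values = np.maximum(strike - prices, 0.0)

    # Step backwards, comparing continuation value with immediate exercise.
    for step in range(n_steps - 1, -1, -1):
        j = np.arange(step + 1)
        prices = spot * u**j * d**(step - j)
        continuation = disc * (p * values[1:step + 2] + (1.0 - p) * values[:step + 1])
        values = np.maximum(continuation, strike - prices)
    return values[0]

print(crr_american_put(spot=36.0, strike=40.0, r=0.06, sigma=0.2, maturity=1.0, n_steps=500))
```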
The remainder of this section is dedicated to an introduction to DRL and a detailed account of related work in the DRL hedging space. This article then details the methodology used to train the DRL agents before presenting and discussing the results of all numerical experiments.
Authors:
(1) Reilly Pickard, Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, ON M5S 3G8, Canada ([email protected]);
(2) Finn Wredenhagen, Ernst & Young LLP, Toronto, ON, M5H 0B3, Canada;
(3) Julio DeJesus, Ernst & Young LLP, Toronto, ON, M5H 0B3, Canada;
(4) Mario Schlener, Ernst & Young LLP, Toronto, ON, M5H 0B3, Canada;
(5) Yuri Lawryshyn, Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON M5S 3E5, Canada.