Table of Links
- Similar Work
- Methodology
- Results
6 Conclusions
In conclusion, this paper extends the literature on hedging American options with deep reinforcement learning (DRL). First, a hyperparameter sensitivity analysis is performed to shed light on hyperparameter impact and to provide a guide for practitioners aiming to implement DRL hedging. The results not only indicate the optimal hyperparameter sets across learning rates and episode counts, learning rates and neural network (NN) architectures, training steps, and transaction cost penalty functions, but also provide key insight into which combinations should be avoided. For example, high learning rates should not be paired with large numbers of training episodes, and low learning rates should not be used when training episodes are few. Instead, the best results stem from intermediate values, and the results also show that if the learning rates are low enough, increasing the number of training episodes improves performance. Furthermore, the analysis shows that two hidden layers of 32 nodes each are too shallow for the actor and critic networks, and, as in the first experiment, lowering the learning rates makes deeper NNs perform better. A third experiment shows that caution is warranted with large numbers of training steps to avoid instability. A final hyperparameter test indicates that a quadratic transaction cost penalty function yields better results than a linear one.
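To make the final comparison concrete, the following is a minimal sketch of the two transaction cost penalty shapes discussed above. The exact functional forms and scaling used in the paper are not reproduced here; `cost_rate`, `price`, and the specific formulas below are illustrative assumptions only.

```python
def linear_penalty(trade_size: float, cost_rate: float, price: float) -> float:
    """Penalty proportional to the absolute amount traded (assumed form)."""
    return cost_rate * price * abs(trade_size)


def quadratic_penalty(trade_size: float, cost_rate: float, price: float) -> float:
    """Penalty that grows quadratically in trade size (assumed form),
    punishing large rebalances relatively harder than small ones."""
    return cost_rate * price * trade_size ** 2


# Compare how each penalty scales with the size of a rebalancing trade.
for dq in (0.1, 0.5, 1.0):
    lin = linear_penalty(dq, cost_rate=0.01, price=100.0)
    quad = quadratic_penalty(dq, cost_rate=0.01, price=100.0)
    print(f"trade={dq:.1f}  linear={lin:.3f}  quadratic={quad:.3f}")
```

Under the quadratic form, small trades are penalized lightly while large trades are penalized disproportionately, which plausibly encourages the smoother rebalancing behavior the experiment rewards.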
This paper then builds on the key results of Pickard et al. (2024) by training DRL agents on asset paths from a market-calibrated stochastic volatility model, with new DRL agents trained each week on models re-calibrated to current market data. Results show that DRL agents re-trained each week outperform those trained solely on the sale date. Moreover, both the single-train and weekly-train DRL agents outperform the BS Delta method at transaction cost levels of 1% and 3%. This paper therefore has significant practical relevance, as it shows that practitioners may use readily available market data to train DRL agents to hedge any option on their portfolio.
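The weekly re-training workflow described above can be sketched as a simple loop. The helper functions here (`calibrate_sv_model`, `simulate_paths`, `train_drl_agent`) are hypothetical placeholders standing in for the paper's calibration, simulation, and DRL training steps, not the authors' actual code.

```python
import random


def calibrate_sv_model(market_data):
    """Placeholder: fit stochastic-volatility parameters to market quotes."""
    return {"v0": 0.04, "kappa": 1.5, "theta": 0.04, "sigma": 0.3, "rho": -0.7}


def simulate_paths(params, n_paths, n_steps=5, s0=100.0):
    """Placeholder: simulate asset price paths under the calibrated model."""
    return [[s0 + random.gauss(0, 1) for _ in range(n_steps)]
            for _ in range(n_paths)]


def train_drl_agent(paths):
    """Placeholder: train a DRL hedging agent on the simulated paths."""
    return {"trained_on": len(paths)}


def weekly_hedging_workflow(market_data_by_week, n_paths=1000):
    """Train a fresh agent each week on paths from a re-calibrated model."""
    agents = []
    for market_data in market_data_by_week:
        params = calibrate_sv_model(market_data)  # 1. re-calibrate weekly
        paths = simulate_paths(params, n_paths)   # 2. simulate fresh paths
        agents.append(train_drl_agent(paths))     # 3. train a new agent
    return agents


agents = weekly_hedging_workflow([{}, {}, {}])
print(len(agents))  # one agent per week
```

The design point is that each week's agent sees paths consistent with current market conditions, rather than relying on a single agent trained once on the sale date.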
Authors:
(1) Reilly Pickard, Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, Canada ([email protected]);
(2) F. Wredenhagen, Ernst & Young LLP, Toronto, ON, M5H 0B3, Canada;
(3) Y. Lawryshyn, Department of Chemical Engineering, University of Toronto, Toronto, Canada.