3.2 Hyperparameter Analysis
Islam et al. (2017) follow up on the work of Henderson et al. (2017) by examining the performance of DDPG and TRPO in the HalfCheetah and Hopper Gym environments. They first note that it is difficult to reproduce the results of Henderson et al. (2017), even with similar hyperparameter configurations. Islam et al. (2017) then extend the literature with an analysis of the DDPG actor and critic learning rates. They find that the optimal learning rates differ between the Hopper and HalfCheetah environments, and note that it is difficult to gain a true understanding of optimal learning-rate choices when all other parameters are held fixed. Andrychowicz et al. (2020) perform a thorough hyperparameter sensitivity analysis for multiple on-policy DRL methods, but do not consider off-policy methods such as DDPG. Overall, Andrychowicz et al. (2020) conclude that performance is highly dependent on hyperparameter tuning, and that this limits the pace of research advances. Several other studies report similar findings when testing various DRL methods in different environments, whether hyperparameters are chosen through a manual search or a predefined hyperparameter optimization algorithm (Ashraf et al., 2021; Kiran and Ozyildirim, 2022; Eimer et al., 2022).
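To make the shape of a learning-rate sensitivity study like the one Islam et al. (2017) describe more concrete, the following is a minimal Python sketch of a grid sweep over DDPG actor and critic learning rates. Every other hyperparameter is assumed to be held fixed, and each configuration is averaged over several random seeds. The function `toy_train_and_evaluate` is hypothetical: it returns a synthetic score instead of training an agent. The learning-rate grids are illustrative assumptions, not values taken from any of the cited studies.

```python
import itertools
import math
import random
import statistics


def toy_train_and_evaluate(actor_lr: float, critic_lr: float, seed: int) -> float:
    """Hypothetical stand-in for a full DDPG training run.

    Returns a synthetic "final return" that peaks at an arbitrary
    (actor_lr, critic_lr) pair, plus seed-dependent Gaussian noise.
    All other hyperparameters (batch size, tau, etc.) are assumed fixed.
    """
    rng = random.Random(seed)
    # Distance from an arbitrary optimum, measured on a log10 scale because
    # learning rates are usually swept over orders of magnitude.
    penalty = (math.log10(actor_lr) + 4) ** 2 + (math.log10(critic_lr) + 3) ** 2
    return 100.0 - 10.0 * penalty + rng.gauss(0.0, 1.0)


def sweep(actor_lrs, critic_lrs, seeds):
    """Evaluate every (actor_lr, critic_lr) pair over all seeds.

    Returns a dict mapping (actor_lr, critic_lr) -> (mean return, stdev).
    """
    results = {}
    for actor_lr, critic_lr in itertools.product(actor_lrs, critic_lrs):
        returns = [toy_train_and_evaluate(actor_lr, critic_lr, s) for s in seeds]
        results[(actor_lr, critic_lr)] = (
            statistics.mean(returns),
            statistics.stdev(returns),  # needs at least two seeds
        )
    return results


if __name__ == "__main__":
    results = sweep([1e-5, 1e-4, 1e-3], [1e-4, 1e-3, 1e-2], seeds=range(5))
    # Print configurations from best to worst mean return.
    for (a, c), (mean, std) in sorted(results.items(), key=lambda kv: -kv[1][0]):
        print(f"actor_lr={a:.0e} critic_lr={c:.0e} mean={mean:.2f} std={std:.2f}")
```

Because every configuration reuses the same seeds, the seed-to-seed noise is shared across configurations, which makes differences between them easier to compare. With a real training loop, the spread across seeds is itself a useful signal of sensitivity. The sketch also inherits the limitation Islam et al. (2017) raise: a grid over two parameters with the rest fixed cannot show how the optimum moves when those fixed parameters change.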
As such, it is evident that hyperparameter configurations can drastically affect the results of DRL agents. However, much of the literature on DRL hyperparameter choices relies on generic, pre-constructed environments for analysis rather than addressing real-world problems. This study therefore aims to contribute to the DRL hedging literature, and to the DRL field as a whole, by conducting a thorough investigation of how hyperparameter choices affect the realistic problem of option hedging in a highly uncertain financial landscape.
Authors:
(1) Reilly Pickard, Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, Canada;
(2) F. Wredenhagen, Ernst & Young LLP, Toronto, ON, M5H 0B3, Canada;
(3) Y. Lawryshyn, Department of Chemical Engineering, University of Toronto, Toronto, Canada.
This paper is