This is Part 7 of an 11-part series based on the research paper “Reinforcement Learning In Agent-based Market Simulation: Unveiling Realistic Stylized Facts And Behavior”. Use the table of links below to navigate to the next part.
Part 1: Abstract & Introduction
Part 4: Agents & Simulation Details
Part 8: Market and Agent Responsiveness to External Events
Part 9: Conclusion & References
Part 10: Additional Simulation Results
Part 11: Simulation Configuration
We use Google, Apple, and Amazon’s tick-level limit order book data from 2012-06-21, obtained from LOBSTER [28]. Due to page limitations, we present graphs for Group A only; the continual learning Group A is the most interesting one to analyze.
Heavy tails and kurtosis decay. We measure leptokurtic behavior using excess kurtosis, which should be significantly larger than 0. Decay means that kurtosis decreases as the sampling frequency of returns decreases (seconds, minutes, hours, etc.). Figure 3 shows the Quantile-Quantile plots of simulations for the continual training RL agents (Group A) and ZI agents respectively, compared with real data. The real data and the data generated by both RL and ZI agents all exhibit strongly fat-tailed return and price distributions. The ZI-agent simulations show a milder tail, whereas the results from RL agents closely align with the real data. The average kurtosis from the 10 simulations is presented in Table 1. The table shows that kurtosis decays as the data is sampled less frequently.
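To make the kurtosis-decay measurement concrete, the following is a minimal sketch, not the authors' code. It computes sample excess kurtosis of log returns after summing them over non-overlapping windows of increasing length. The input is synthetic Student-t noise standing in for 1-second returns (an assumption for illustration only), and the function names and horizons are hypothetical.

```python
import numpy as np

def excess_kurtosis(x):
    """Sample excess kurtosis: fourth standardized moment minus 3 (0 for a Gaussian)."""
    d = np.asarray(x, dtype=float)
    d = d - d.mean()
    return np.mean(d**4) / np.mean(d**2) ** 2 - 3.0

def aggregate_returns(log_returns, horizon):
    """Sum consecutive log returns in non-overlapping windows of `horizon` steps."""
    r = np.asarray(log_returns, dtype=float)
    usable = (len(r) // horizon) * horizon  # drop the incomplete last window
    return r[:usable].reshape(-1, horizon).sum(axis=1)

def kurtosis_by_horizon(log_returns, horizons):
    """Map each sampling horizon to the excess kurtosis of the aggregated returns."""
    return {h: excess_kurtosis(aggregate_returns(log_returns, h)) for h in horizons}

# Synthetic stand-ins for 1-second log returns (NOT the paper's data).
rng = np.random.default_rng(0)
synthetic_returns = 1e-4 * rng.standard_t(df=5, size=200_000)  # heavy-tailed
gaussian_returns = 1e-4 * rng.standard_normal(200_000)         # thin-tailed baseline

kurtosis_table = kurtosis_by_horizon(synthetic_returns, horizons=(1, 10, 60))
for horizon, value in kurtosis_table.items():
    print(f"horizon={horizon:>3}  excess kurtosis={value:.2f}")
print(f"Gaussian baseline: {excess_kurtosis(gaussian_returns):.2f}")
```

Summing log returns over a window is equivalent to sampling the log price less frequently, so the dictionary plays the role of one row of a kurtosis table. The Gaussian baseline should sit near 0, which is the reference point for "significantly larger than 0".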
Absence of auto-correlations. Reference [1] describes this property as small, insignificant auto-correlation values of the return time series, except over very short time intervals. Figure 3 shows a strong negative correlation in the first lag and that the auto-correlation function decays to 0 for increasing lags. The tighter boxplots in Figure 3 imply that the ZI agents generate a very consistent market, while the real data and RL agents exhibit more variability in this analysis. The author of [1] cites the bounce of market orders between bid and ask prices as the reason why the first lag shows stronger negative auto-correlation.
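The bid-ask bounce explanation can be illustrated with a small sketch under simplifying assumptions that are not taken from the paper: the mid price is a Gaussian random walk, each trade hits the bid or the ask with equal probability, and the half spread is constant. All parameter values are hypothetical, and price changes are used in place of log returns for simplicity.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Sample auto-correlation of x for lags 0..max_lag."""
    d = np.asarray(x, dtype=float)
    d = d - d.mean()
    denom = np.dot(d, d)
    return np.array([np.dot(d[: len(d) - k], d[k:]) / denom for k in range(max_lag + 1)])

rng = np.random.default_rng(1)
n = 50_000
half_spread = 0.01  # hypothetical constant half spread

# Mid price: random walk. Trade price: mid plus or minus the half spread.
mid = 100.0 + np.cumsum(0.01 * rng.standard_normal(n))
side = rng.choice([-1.0, 1.0], size=n)  # -1 = trade at bid, +1 = trade at ask
trade_price = mid + half_spread * side

mid_acf = autocorrelation(np.diff(mid), max_lag=10)
trade_acf = autocorrelation(np.diff(trade_price), max_lag=10)

print("lag-1 ACF of mid-price changes:  ", round(mid_acf[1], 3))
print("lag-1 ACF of trade-price changes:", round(trade_acf[1], 3))
print("lag-5 ACF of trade-price changes:", round(trade_acf[5], 3))
```

In this toy setup, changes in the mid price are essentially uncorrelated, while changes in the trade price show a clearly negative first lag that vanishes at longer lags, which is the qualitative pattern described above.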
Slow decay of auto-correlation for absolute returns. The auto-correlation function of (1-second) absolute returns decays slowly as a function of the time lag, which implies a long-range time dependency of the return magnitude. Figure 4 shows the auto-correlation of absolute returns from the simulations using RL agents (Figure 4a) and the ones using ZI agents (Figure 4b). The blue lines in both plots indicate the auto-correlation of the absolute returns of the real data. The auto-correlations from all three sources decrease slowly with increasing lags. Figure 4b again shows tighter boxes, indicating that the simulation using ZI agents is more consistent. Comparing both plots, we can see that RL agents simulate a market in which the return magnitude has a larger auto-correlation than in the market simulated by ZI agents.
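As a rough illustration of this statistic, the sketch below compares the auto-correlation of raw returns with that of absolute returns. Because neither the LOBSTER data nor the simulator output is available here, it uses a synthetic GARCH(1,1) series as a stand-in; the GARCH model and its parameters are assumptions chosen only to produce persistent return magnitudes, not something the paper uses.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Sample auto-correlation of x for lags 0..max_lag."""
    d = np.asarray(x, dtype=float)
    d = d - d.mean()
    denom = np.dot(d, d)
    return np.array([np.dot(d[: len(d) - k], d[k:]) / denom for k in range(max_lag + 1)])

def simulate_garch(n, omega, alpha, beta, rng):
    """Simulate n returns from a GARCH(1,1) process with Gaussian innovations."""
    shocks = rng.standard_normal(n)
    returns = np.empty(n)
    variance = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    for t in range(n):
        returns[t] = np.sqrt(variance) * shocks[t]
        variance = omega + alpha * returns[t] ** 2 + beta * variance
    return returns

rng = np.random.default_rng(2)
# Hypothetical parameters: alpha + beta close to 1 gives persistent volatility.
returns = simulate_garch(100_000, omega=2e-10, alpha=0.10, beta=0.88, rng=rng)

raw_acf = autocorrelation(returns, max_lag=100)
abs_acf = autocorrelation(np.abs(returns), max_lag=100)

for lag in (1, 10, 50, 100):
    print(f"lag={lag:>3}  raw={raw_acf[lag]:+.3f}  abs={abs_acf[lag]:+.3f}")
```

The raw-return auto-correlation stays near zero at every lag, while the absolute-return auto-correlation starts positive and fades only gradually. Applied to each simulation run, the same `autocorrelation` call on `np.abs(returns)` would yield one sample per lag for a boxplot of the kind described for Figure 4.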
Volatility clustering. Cont et al. [1] use the auto-correlation of the squared return time series to measure the volatility clustering effect. Figure 5 shows decreasing auto-correlations in both types of simulation, which indicates that they all exhibit volatility clustering effects. Similar to the auto-correlations of absolute returns, the auto-correlation in the ZI-agent simulated market shown in Figure 5b is significantly lower than that of the real data, indicating a less pronounced clustering effect in the ZI-agent simulated market than in the RL-agent simulated market.
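One way to see what the squared-return auto-correlation captures is to compare a series with clustered volatility against a shuffled copy of itself. The sketch below does this on synthetic data; the regime-switching volatility model, the block length, and the shuffle baseline are illustrative assumptions and are not part of the paper's methodology.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Sample auto-correlation of x for lags 0..max_lag."""
    d = np.asarray(x, dtype=float)
    d = d - d.mean()
    denom = np.dot(d, d)
    return np.array([np.dot(d[: len(d) - k], d[k:]) / denom for k in range(max_lag + 1)])

rng = np.random.default_rng(3)

# Hypothetical regimes: volatility is constant within a block of 200 steps
# and is drawn as either calm (1.0) or turbulent (3.0) for each block.
block_length, n_blocks = 200, 500
block_volatility = rng.choice([1.0, 3.0], size=n_blocks)
volatility = np.repeat(block_volatility, block_length)
returns = 1e-4 * volatility * rng.standard_normal(volatility.size)

# Shuffling keeps the return distribution but destroys the time ordering,
# so any clustering in the squared returns should disappear.
shuffled = rng.permutation(returns)

squared_acf = autocorrelation(returns**2, max_lag=50)
shuffled_acf = autocorrelation(shuffled**2, max_lag=50)

for lag in (1, 10, 50):
    print(f"lag={lag:>2}  clustered={squared_acf[lag]:+.3f}  shuffled={shuffled_acf[lag]:+.3f}")
```

A series with volatility clustering keeps a positive squared-return auto-correlation over many lags, whereas the shuffled copy drops to roughly zero. A lower curve therefore corresponds to a weaker clustering effect, which is how the comparison between the ZI-agent and RL-agent markets is read above.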
For their behavior to be realistic, the MM agents should control their inventory and earn their Profit and Loss (PnL) from the spread (providing liquidity) rather than from inventory (long-term investments). Figure 6 shows the average inventory evolution of all MM agents in the three groups over time. The continual training group has the smallest variation in inventory, while the untrained group has the largest. We further separate the MMs’ PnL into profit from spread and profit from holding inventory. The MMs in the continual training group profit on average $1.26 million from spread and lose $0.22 million from inventory. The results for MM agents from other groups are similar.
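The text does not spell out how the PnL is split, so the following sketch shows one common decomposition as an assumption rather than the authors' method. Each fill earns spread PnL relative to the mid price at the time of the fill, and the resulting position earns inventory PnL from the subsequent move of the mid price. The `Fill` record and all numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fill:
    side: int       # +1 when the market maker buys, -1 when it sells
    quantity: int   # number of shares filled
    price: float    # execution price of the fill
    mid: float      # mid price at the moment of the fill

def decompose_pnl(fills, final_mid):
    """Split mark-to-market PnL into a spread part and an inventory part."""
    # Spread PnL: buying below the mid or selling above it earns the difference.
    spread_pnl = sum(f.side * f.quantity * (f.mid - f.price) for f in fills)
    # Inventory PnL: each filled position is marked from its fill-time mid to the final mid.
    inventory_pnl = sum(f.side * f.quantity * (final_mid - f.mid) for f in fills)
    return spread_pnl, inventory_pnl

def mark_to_market_pnl(fills, final_mid):
    """Total PnL: cash flow from fills plus remaining inventory valued at the final mid."""
    cash = -sum(f.side * f.quantity * f.price for f in fills)
    inventory = sum(f.side * f.quantity for f in fills)
    return cash + inventory * final_mid

# Toy fills, purely illustrative.
fills = [
    Fill(side=+1, quantity=100, price=9.99, mid=10.00),
    Fill(side=-1, quantity=100, price=10.02, mid=10.01),
    Fill(side=+1, quantity=50, price=10.00, mid=10.01),
]
final_mid = 9.97

spread_pnl, inventory_pnl = decompose_pnl(fills, final_mid)
total_pnl = mark_to_market_pnl(fills, final_mid)
print(f"spread PnL={spread_pnl:.2f}  inventory PnL={inventory_pnl:.2f}  total={total_pnl:.2f}")
```

In this toy example the spread component is 2.50 and the inventory component is -1.00, and the two add up to the mark-to-market total of 1.50. A market maker behaving as described above would show the same signs: a positive spread component and a small or negative inventory component.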
Authors:
(1) Zhiyuan Yao, Stevens Institute of Technology, Hoboken, New Jersey, USA;
(2) Zheng Li, Stevens Institute of Technology, Hoboken, New Jersey, USA;
(3) Matthew Thomas, Stevens Institute of Technology, Hoboken, New Jersey, USA;
(4) Ionut Florescu, Stevens Institute of Technology, Hoboken, New Jersey, USA.
This paper is available on arXiv under the CC BY-NC-SA 4.0 DEED license.