This is Part 5 of an 11-part series based on the research paper “Reinforcement Learning In Agent-based Market Simulation: Unveiling Realistic Stylized Facts And Behavior”. Use the table of links below to navigate to the next part.
Part 1: Abstract & Introduction
Part 4: Agents & Simulation Details
Part 8: Market and Agent Responsiveness to External Events
Part 9: Conclusion & References
Part 10: Additional Simulation Results
Part 11: Simulation Configuration
In order to provide realistic answers to questions about RL behavior, it is crucial to ensure that the simulation is similar to a real market. We list a set of properties inherent in real-world financial markets. We then analyze the results of the simulations to see whether they exhibit these properties.
In general, RL agents assume that the dynamics of the system do not change. However, in real-world scenarios, where agents interact with other traders in a dynamic market, the system dynamics evolve. Assuming that the system is realistic, we investigate whether training RL agents during the simulation improves their behavior and enhances their adaptability to changing market conditions.
The following two sub-sections discuss in detail the design of the experiment to answer these two questions:
Can RL agents simulate a realistic market?
Should agent training persist during simulation?
A realistic simulation should look and behave like a real-world market. To analyze and quantify realistic behavior, we examine two aspects: statistical characteristics and market responsiveness. Statistical characteristics include distributional properties, such as the price/return distribution, as well as time dynamics, such as auto-correlations of the return series and volatility clustering. Market responsiveness measures changes in several variables in response to a large sell order hitting the market.
4.1.1 Statistical Characteristics
Prior research using empirical financial time series data found statistical properties that are shared by a wide range of assets and across various time periods. For instance, stock returns have a distribution with a sharper peak and fatter tails than a normal distribution. These statistical characteristics are known as stylized facts [1]. We examine whether the results of our simulation show the stylized facts documented in the literature [1, 23]. We focus on the following well-known single-asset stylized facts:
Heavy tails and kurtosis decay with increasing ∆t,
Absence of auto-correlations,
Slow decay of auto-correlation in absolute returns,
Volatility clustering.
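The four stylized facts above can each be checked with a simple statistic on a simulated price series. The following is a minimal sketch, not the authors' evaluation code: the function names, the default lags and horizons, and the use of overlapping log returns are all assumptions made for illustration. Volatility clustering is proxied here by positive auto-correlation in absolute returns, which is a common convention rather than something this section specifies.

```python
import numpy as np

def log_returns(prices, lag=1):
    """Log returns over a horizon of `lag` steps (overlapping windows)."""
    p = np.asarray(prices, dtype=float)
    return np.log(p[lag:]) - np.log(p[:-lag])

def excess_kurtosis(x):
    """Sample excess kurtosis; approximately 0 for Gaussian data."""
    z = np.asarray(x, dtype=float)
    z = z - z.mean()
    return float(np.mean(z**4) / np.mean(z**2) ** 2 - 3.0)

def autocorr(x, lag):
    """Sample auto-correlation of x at a positive integer lag."""
    z = np.asarray(x, dtype=float)
    z = z - z.mean()
    return float(np.dot(z[:-lag], z[lag:]) / np.dot(z, z))

def stylized_fact_summary(prices, lags=(1, 5, 20), horizons=(1, 5, 20)):
    """Hypothetical helper collecting the statistics for the listed facts."""
    r = log_returns(prices)
    return {
        # Heavy tails: positive at short horizons, decaying as the horizon grows.
        "kurtosis_by_horizon": {h: excess_kurtosis(log_returns(prices, h)) for h in horizons},
        # Absence of auto-correlation: values near 0 at every lag.
        "acf_returns": {k: autocorr(r, k) for k in lags},
        # Slow decay / volatility clustering: positive, slowly shrinking values.
        "acf_abs_returns": {k: autocorr(np.abs(r), k) for k in lags},
    }
```

Calling `stylized_fact_summary(mid_prices)` on a simulated mid-price series returns three dictionaries that can be compared against the patterns reported for empirical data.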
In addition to these market-wide variables, which result from the activity of the entire market, we examine individual market makers’ performance records and the evolution of their inventory and PnL components. MM agents should have a stable inventory evolution, and the absolute value of the inventory should be close to 0. MM agents should make most of their profit from the spread rather than from the inventory PnL.
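One way to separate the two PnL components is to mark the market maker's position to the mid price at every step. The sketch below assumes a particular decomposition that this section does not spell out: spread PnL is the edge earned on each fill relative to the mid, and inventory PnL is the previous inventory times the mid-price change. The function name and input layout are hypothetical.

```python
import numpy as np

def decompose_pnl(mids, fill_sides, fill_qtys, fill_prices):
    """Split mark-to-mid PnL into spread and inventory components.

    Inputs are per-step arrays of equal length (an assumed layout):
      mids        : mid price at each step
      fill_sides  : +1 if the MM bought, -1 if it sold, 0 if no fill
      fill_qtys   : filled quantity (0 when there is no fill)
      fill_prices : fill price (use any finite value, e.g. the mid, when no fill)
    """
    mids = np.asarray(mids, dtype=float)
    signed_qty = np.asarray(fill_sides, dtype=float) * np.asarray(fill_qtys, dtype=float)
    prices = np.asarray(fill_prices, dtype=float)

    inventory = np.cumsum(signed_qty)
    prev_inventory = np.concatenate(([0.0], inventory[:-1]))
    mid_change = np.diff(mids, prepend=mids[0])

    # Edge captured at the moment of the fill, relative to the mid.
    spread_pnl = signed_qty * (mids - prices)
    # Gain or loss from carrying the previous position through a mid move.
    inventory_pnl = prev_inventory * mid_change
    return inventory, np.cumsum(spread_pnl), np.cumsum(inventory_pnl)
```

Under this decomposition the two cumulative series sum to the mark-to-mid wealth (cash plus inventory valued at the mid), so a market maker behaving as described would show a growing spread series and an inventory series that stays near 0.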
4.1.2 Market Responsiveness
A simulated market within a closed environment, where trading is driven by traders without any outside information, should be stable. However, we are interested in how the simulation behaves when an additional external impact is introduced. We want to see if changes in behavior align with documented literature on real-world cases. In a real-world market, a large liquidation typically causes price impact. Almgren and Chriss [24] assert that price impact manifests in two forms: temporary impact and permanent impact. Therefore, following a substantial sell order, the price is observed to undergo a transient decline, followed by a recovery to a level that remains below the pre-sale price. Additionally, a large directional order flow carries information, such as the potential for substantial liquidation. Market makers can capture this information from their order flows and adjust their strategies accordingly. In practical scenarios, when more orders come from one side than the other, market makers tend to widen the spread and shift the center of the spread toward the side with fewer orders [25, 26, 27]. We examine the behavior of the MM agents to assess whether similar patterns are observed. To do this, we design two experiments with different types of informed traders. The first experiment introduces a flash-sale agent who places large sell orders in the simulated market. The second experiment involves dynamically altering the buy/sell preferences of LT agents throughout the simulation, prompting them to trade in the same direction within short time periods, thereby influencing price movements.
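The temporary and permanent components described above can be approximated empirically from a simulated price path around the sell event. The following is a rough sketch under stated assumptions: the function name, the window lengths, and the use of the post-event minimum and the late-window average as proxies are illustrative choices, not the Almgren and Chriss model itself nor the measurement used in the paper.

```python
import numpy as np

def measure_price_impact(mid_prices, event_idx, pre_window=50, recovery_window=50):
    """Empirical proxies for price impact around a large sell order.

    mid_prices      : mid-price series from one simulation run
    event_idx       : index at which the large sell order arrives (must be >= 1)
    pre_window      : steps before the event used for the reference level
    recovery_window : final steps of the series used for the settled level
    """
    p = np.asarray(mid_prices, dtype=float)
    if not 1 <= event_idx < len(p):
        raise ValueError("event_idx must lie strictly inside the series")

    pre_level = p[max(0, event_idx - pre_window):event_idx].mean()
    post = p[event_idx:]
    trough = post.min()                       # deepest point after the sale
    settled = post[-recovery_window:].mean()  # level after the recovery

    return {
        "peak_impact": pre_level - trough,
        "permanent_impact": pre_level - settled,  # part that does not revert
        "temporary_impact": settled - trough,     # part that reverts
    }
```

For a path consistent with the description above, all three values are positive and `permanent_impact` is smaller than `peak_impact`.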
Authors:
(1) Zhiyuan Yao, Stevens Institute of Technology, Hoboken, New Jersey, USA;
(2) Zheng Li, Stevens Institute of Technology, Hoboken, New Jersey, USA;
(3) Matthew Thomas, Stevens Institute of Technology, Hoboken, New Jersey, USA;
(4) Ionut Florescu, Stevens Institute of Technology, Hoboken, New Jersey, USA.
This paper is available on arXiv under the CC BY-NC-SA 4.0 DEED license.