This is Part 2 of an 11-part series based on the research paper “Reinforcement Learning In Agent-based Market Simulation: Unveiling Realistic Stylized Facts And Behavior”. Use the table of links below to navigate to the next part.
Part 1: Abstract & Introduction
Part 4: Agents & Simulation Details
Part 8: Market and Agent Responsiveness to External Events
Part 9: Conclusion & References
Part 10: Additional Simulation Results
Part 11: Simulation Configuration
Mathematically, each RL agent solves a problem associated with a Markov Decision Process (MDP) [1]. An MDP is defined as a tuple (S, A, R, P, γ) with the following components:
• S is the state space: in our case, a set of vectors describing the market limit order book and the agent’s account information.
• A is the action space, which defines the specific orders an agent can place.
• R is the reward function, which specifies the immediate reward for taking an action in a particular state.
• P is the transition probability function, which gives the probability of moving from one state to another when a given action is executed.
• γ ∈ (0, 1) is the discount factor; a smaller discount factor makes the agent weigh immediate rewards more heavily relative to future ones.
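To make the tuple concrete, here is a minimal Python sketch of a finite MDP container that holds S, A, R, P and γ and samples one transition. The two-state example (a `flat` or `long` position, `buy` or `hold` actions) and all of its rewards and probabilities are purely hypothetical and far smaller than the order-book and account vectors the agents actually observe; `FiniteMDP` and `toy_mdp` are illustrative names, not part of the authors' code.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = str
Action = str


@dataclass
class FiniteMDP:
    """Toy container for the tuple (S, A, R, P, gamma) of a finite MDP."""
    states: List[State]                                          # S
    actions: List[Action]                                        # A
    reward: Callable[[State, Action], float]                     # R(s, a)
    transition: Dict[Tuple[State, Action], Dict[State, float]]   # P(s' | s, a)
    gamma: float                                                 # discount factor

    def __post_init__(self) -> None:
        # gamma must lie strictly between 0 and 1, as in the definition above.
        if not 0.0 < self.gamma < 1.0:
            raise ValueError("gamma must lie in the open interval (0, 1)")
        # Each conditional distribution P(. | s, a) must sum to one.
        for (s, a), dist in self.transition.items():
            if abs(sum(dist.values()) - 1.0) > 1e-9:
                raise ValueError(f"P(. | {s}, {a}) does not sum to 1")

    def step(self, s: State, a: Action, rng: random.Random) -> Tuple[State, float]:
        """Sample s' ~ P(. | s, a) and return it with the immediate reward R(s, a)."""
        dist = self.transition[(s, a)]
        s_next = rng.choices(list(dist), weights=list(dist.values()), k=1)[0]
        return s_next, self.reward(s, a)


# Hypothetical example: the agent is either flat or holding one unit;
# holding pays 1.0 per step and buying costs 0.1.
toy_mdp = FiniteMDP(
    states=["flat", "long"],
    actions=["buy", "hold"],
    reward=lambda s, a: (1.0 if s == "long" else 0.0) - (0.1 if a == "buy" else 0.0),
    transition={
        ("flat", "buy"): {"long": 0.9, "flat": 0.1},
        ("flat", "hold"): {"flat": 1.0},
        ("long", "buy"): {"long": 1.0},
        ("long", "hold"): {"long": 0.8, "flat": 0.2},
    },
    gamma=0.95,
)
```

For example, `toy_mdp.step("flat", "hold", random.Random(0))` always returns `("flat", 0.0)`, because that transition is deterministic in the toy table.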
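When using model-free RL methods such as those in [19, 20], the dynamics of the system (i.e., the transition probability function P) need not be known. If we denote the policy function of the RL agent as π, the agent solves the following problem:

$$\max_{\pi}\; \mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t)\right], \qquad a_t \sim \pi(\cdot \mid s_t), \quad s_{t+1} \sim P(\cdot \mid s_t, a_t).$$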
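Because the method is model-free, the agent's objective (the expected discounted sum of rewards under its policy) can be estimated from sampled rewards alone, without ever evaluating P. The sketch below assumes this standard discounted-return objective; `discounted_return`, `estimate_objective`, and the `sample_episode` callback are hypothetical helpers, not the authors' implementation.

```python
import random
from typing import Callable, List, Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """G = sum_t gamma^t * r_t, accumulated backwards (G_t = r_t + gamma * G_{t+1})."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def estimate_objective(sample_episode: Callable[[random.Random], List[float]],
                       gamma: float, n_episodes: int, seed: int = 0) -> float:
    """Monte Carlo estimate of E_pi[sum_t gamma^t R(s_t, a_t)].

    `sample_episode` (hypothetical) rolls out the policy in the environment and
    returns the observed reward sequence; P is only sampled, never evaluated.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_episodes):
        total += discounted_return(sample_episode(rng), gamma)
    return total / n_episodes
```

For instance, `discounted_return([1.0, 1.0, 1.0], 0.5)` gives 1 + 0.5 + 0.25 = 1.75. A reward of 10 arriving two steps ahead is worth 2.5 today at γ = 0.5 but roughly 8.1 at γ = 0.9, which is the sense in which a smaller γ shortens the agent's effective horizon.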
We use Proximal Policy Optimization (PPO) [20] to train our RL agents.
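For readers unfamiliar with PPO, its core idea is a clipped surrogate objective that keeps each policy update close to the policy that collected the data. The pure-Python sketch below assumes the common clipped variant with ε = 0.2; the text here does not state which PPO variant or hyperparameters the authors use, and a full PPO implementation also adds a value-function loss and usually an entropy bonus, both omitted. `ppo_clip_loss` is a hypothetical name.

```python
import math
from typing import Sequence


def ppo_clip_loss(logp_new: Sequence[float], logp_old: Sequence[float],
                  advantages: Sequence[float], clip_eps: float = 0.2) -> float:
    """Negative PPO clipped surrogate objective, averaged over a batch.

    ratio_t = pi_new(a_t | s_t) / pi_old(a_t | s_t)
    L_clip  = mean_t[min(ratio_t * A_t, clip(ratio_t, 1 - eps, 1 + eps) * A_t)]
    The sign is flipped so that a gradient-descent optimizer can minimize it.
    """
    if not advantages or not (len(logp_new) == len(logp_old) == len(advantages)):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)                    # probability ratio
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped_ratio * adv)       # pessimistic bound
    return -total / len(advantages)
```

Taking the minimum of the clipped and unclipped terms means the objective gains nothing from pushing the ratio above 1 + ε when the advantage is positive, or below 1 − ε when it is negative, which is what discourages overly large policy steps.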
Almost all traditional financial exchanges today use a continuous double auction (CDA) market model, which allows traders to place buy and sell orders continuously at any time [21]. The CDA market maintains two limit order books (LOBs), one for buy orders and one for sell orders. Each order is an instruction placed by a trader who wants to buy or sell an asset at a specified price or better. Because such an instruction sets a limit price (a ceiling for a buy, a floor for a sell) and therefore admits a range of execution prices, it is called a limit order. A market order, by contrast, is an instruction to buy or sell an asset immediately at the best price currently available. Generally, limit orders stay in the LOB until they are matched with an incoming market order (or an opposite-side limit order that crosses their price) or are cancelled.
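The matching logic described above can be sketched as a toy order book with one priority queue per side. This is a minimal Python illustration under stated assumptions: price-time priority (best price first, then earliest arrival), trades executed at the resting order's price, integer tick prices, no order IDs or cancellations, and market orders whose unfilled remainder is discarded rather than rested. `LimitOrderBook` and its methods are hypothetical and do not describe the exchange model in the paper's simulator.

```python
import heapq
import itertools
from typing import List, Optional, Tuple

Trade = Tuple[str, str, int, int]  # (buyer, seller, price in ticks, quantity)


class LimitOrderBook:
    """Toy CDA book: one heap per side, price-time priority (assumed)."""

    def __init__(self) -> None:
        self.bids: List[Tuple[int, int, str, int]] = []  # (-price, seq, trader, qty)
        self.asks: List[Tuple[int, int, str, int]] = []  # (price, seq, trader, qty)
        self._seq = itertools.count()  # arrival counter; earlier orders win ties

    def _match(self, trader: str, side: str, qty: int,
               limit: Optional[int] = None) -> Tuple[List[Trade], int]:
        """Fill an incoming order against the opposite side; return trades and unfilled qty."""
        if side not in ("buy", "sell"):
            raise ValueError("side must be 'buy' or 'sell'")
        book = self.asks if side == "buy" else self.bids
        trades: List[Trade] = []
        while qty > 0 and book:
            key, seq, maker, avail = book[0]
            price = key if side == "buy" else -key
            # Stop once the best resting price is worse than the incoming limit.
            if limit is not None and (price > limit if side == "buy" else price < limit):
                break
            fill = min(qty, avail)
            buyer, seller = (trader, maker) if side == "buy" else (maker, trader)
            trades.append((buyer, seller, price, fill))  # trade at the resting price
            qty -= fill
            if fill == avail:
                heapq.heappop(book)
            else:
                # Same (key, seq) prefix, so the heap ordering is unchanged.
                book[0] = (key, seq, maker, avail - fill)
        return trades, qty

    def limit_order(self, trader: str, side: str, price: int, qty: int) -> List[Trade]:
        """Match whatever crosses the book, then rest the remainder in the LOB."""
        trades, remaining = self._match(trader, side, qty, limit=price)
        if remaining > 0:
            seq = next(self._seq)
            if side == "buy":
                heapq.heappush(self.bids, (-price, seq, trader, remaining))
            else:
                heapq.heappush(self.asks, (price, seq, trader, remaining))
        return trades

    def market_order(self, trader: str, side: str, qty: int) -> List[Trade]:
        """Execute immediately at the best available prices; any unfilled part is dropped."""
        trades, _unfilled = self._match(trader, side, qty)
        return trades

    def best_bid(self) -> Optional[int]:
        return -self.bids[0][0] if self.bids else None

    def best_ask(self) -> Optional[int]:
        return self.asks[0][0] if self.asks else None
```

As a usage example: after sell limits from A (5 at 101) and B (5 at 102) and a buy limit from C (3 at 99) rest in the book, a market buy of 7 from D returns `[("D", "A", 101, 5), ("D", "B", 102, 2)]`, leaving the best ask at 102 and the best bid at 99.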
Authors:
(1) Zhiyuan Yao, Stevens Institute of Technology, Hoboken, New Jersey, USA;
(2) Zheng Li, Stevens Institute of Technology, Hoboken, New Jersey, USA;
(3) Matthew Thomas, Stevens Institute of Technology, Hoboken, New Jersey, USA;
(4) Ionut Florescu, Stevens Institute of Technology, Hoboken, New Jersey, USA.
This paper is available on arXiv under the CC BY-NC-SA 4.0 DEED license.