This is Part 3 of an 11-part series based on the research paper “Reinforcement Learning In Agent-based Market Simulation: Unveiling Realistic Stylized Facts And Behavior”. Use the table of links below to navigate to the other parts.
Part 1: Abstract & Introduction
Part 4: Agents & Simulation Details
Part 8: Market and Agent Responsiveness to External Events
Part 9: Conclusion & References
Part 10: Additional Simulation Results
Part 11: Simulation Configuration
The system contains a matching engine that organizes limit order books (LOBs) and settles trades, as well as a brokerage center that keeps track of each agent’s account, including the agent’s buying power and assets. All agents place market and limit orders to the matching engine through their brokerage accounts. The matching engine runs a continuous double auction (CDA) market model. After each update, the engine streams the latest LOB state to every trading agent in real time.
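To make the matching step concrete, here is a minimal sketch of a CDA limit order book with price-time priority. It is an illustration only, not the authors' engine: the class name `LimitOrderBook`, the `submit` signature, and the tuple layout are all assumptions, and brokerage checks (buying power, asset balances) and state streaming are left out.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count

_seq = count()  # global arrival counter, used as the time-priority tiebreaker


@dataclass
class LimitOrderBook:
    """Toy continuous double auction book (hypothetical, for illustration)."""
    bids: list = field(default_factory=list)  # min-heap of (-price, seq, agent, qty)
    asks: list = field(default_factory=list)  # min-heap of (price, seq, agent, qty)

    def submit(self, agent, side, qty, price=None):
        """Match an incoming order; price=None means a market order.

        Returns the resulting trades as (buyer, seller, price, qty) tuples.
        """
        trades = []
        opposite = self.asks if side == "buy" else self.bids
        while qty > 0 and opposite:
            key, seq, resting_agent, resting_qty = opposite[0]
            resting_price = key if side == "buy" else -key
            # A limit order trades only while it crosses the best opposite quote.
            if price is not None:
                crosses = price >= resting_price if side == "buy" else price <= resting_price
                if not crosses:
                    break
            fill = min(qty, resting_qty)
            buyer, seller = (agent, resting_agent) if side == "buy" else (resting_agent, agent)
            trades.append((buyer, seller, resting_price, fill))  # trade at the resting price
            qty -= fill
            if fill == resting_qty:
                heapq.heappop(opposite)
            else:
                # Partially filled resting order keeps its original time priority.
                heapq.heapreplace(opposite, (key, seq, resting_agent, resting_qty - fill))
        if qty > 0 and price is not None:
            # The unfilled remainder of a limit order rests in the book.
            own = self.bids if side == "buy" else self.asks
            heapq.heappush(own, (-price if side == "buy" else price, next(_seq), agent, qty))
        # In this sketch, any unfilled remainder of a market order is simply dropped.
        return trades

    def best_bid_ask(self):
        bid = -self.bids[0][0] if self.bids else None
        ask = self.asks[0][0] if self.asks else None
        return bid, ask
```

In this sketch a limit order first trades against any crossing quotes and then rests in the book, while a market order consumes the opposite side level by level. A fuller version would route each trade through the brokerage center to update both accounts.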
The agents in this system are of two types: liquidity-taking (LT) agents and market-making (MM) agents. Each agent instance is formulated as an RL agent with its own parameters and reward function. Each agent independently observes the system, selects actions (the orders it submits), receives feedback (its reward), and adapts its own strategy accordingly. The reward formulation differs between agents; we provide details in the next section.
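The act, feedback, and update cycle described above can be sketched as follows. This is a deliberately simplified stand-in: it uses a stateless epsilon-greedy action-value learner rather than whatever RL algorithm the paper uses, and the class names, action labels, feedback keys, and both reward formulas are hypothetical placeholders (the paper's actual rewards are defined in the next section).

```python
import random


class TradingAgent:
    """Hypothetical base class: each instance is an independent learner."""

    def __init__(self, name, actions, epsilon=0.1, lr=0.5, seed=None):
        self.name = name
        self.actions = list(actions)
        self.epsilon = epsilon  # exploration rate, set per agent
        self.lr = lr            # learning step size, set per agent
        self.rng = random.Random(seed)
        self.value = {a: 0.0 for a in self.actions}  # running value estimate per action

    def reward(self, feedback):
        """Agent-specific reward; each agent type overrides this."""
        raise NotImplementedError

    def act(self):
        # Explore with probability epsilon, otherwise pick the best-valued action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=self.value.get)

    def learn(self, action, feedback):
        # Move the action's value estimate toward the observed reward.
        r = self.reward(feedback)
        self.value[action] += self.lr * (r - self.value[action])
        return r


class LiquidityTaker(TradingAgent):
    def reward(self, feedback):
        return feedback["pnl"]  # placeholder reward, not the paper's


class MarketMaker(TradingAgent):
    def reward(self, feedback):
        # Placeholder: profit minus a penalty on inventory held.
        return feedback["pnl"] - 0.1 * abs(feedback["inventory"])
```

The point of the design is that the learning loop is shared while the reward is overridden per agent type, so two agents receiving the same market feedback can still be pushed toward different strategies.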
We highlight two aspects of our work that we think helped improve the realism of the simulation compared to prior work. First, every agent runs in its own thread, so all agents run in parallel and, once launched, never wait for one another. Second, all agents are heterogeneous: even agents that belong to the same category use different sets of hyperparameters, which results in significantly different behavior for each agent.
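A minimal sketch of this thread-per-agent arrangement, assuming Python's standard `threading` module, might look like the following. The `SharedMarket` class, the `aggressiveness` hyperparameter, and the random order rule are all hypothetical stand-ins for the matching engine and the learned policies. Note also that CPython's global interpreter lock limits true CPU parallelism, so this shows the asynchronous structure rather than the performance of the authors' system.

```python
import random
import threading


class SharedMarket:
    """Stand-in for the matching engine: a thread-safe order log."""

    def __init__(self):
        self._lock = threading.Lock()
        self.orders = []

    def submit(self, agent_name, side, price):
        with self._lock:  # serialize writes so concurrent submissions stay consistent
            self.orders.append((agent_name, side, price))


def run_agent(name, market, n_orders, aggressiveness, seed):
    """Loop executed in the agent's own thread; it never waits on other agents."""
    rng = random.Random(seed)
    for _ in range(n_orders):
        side = "buy" if rng.random() < 0.5 else "sell"
        offset = aggressiveness * rng.random()  # per-agent hyperparameter shapes behavior
        price = 100.0 + offset if side == "buy" else 100.0 - offset
        market.submit(name, side, round(price, 2))


def launch(configs, n_orders=50):
    """Start one thread per agent; configs maps agent name -> aggressiveness."""
    market = SharedMarket()
    threads = [
        threading.Thread(target=run_agent, args=(name, market, n_orders, aggr, i))
        for i, (name, aggr) in enumerate(configs.items())
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # only the launcher waits; agents do not synchronize with each other
    return market
```

Calling `launch({"lt_1": 0.5, "lt_2": 2.0, "mm_1": 0.1})` starts three agents with different hyperparameters. The order in which their submissions interleave in `market.orders` is not deterministic, which is the property the asynchronous design is meant to capture.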
Authors:
(1) Zhiyuan Yao, Stevens Institute of Technology, Hoboken, New Jersey, USA;
(2) Zheng Li, Stevens Institute of Technology, Hoboken, New Jersey, USA;
(3) Matthew Thomas, Stevens Institute of Technology, Hoboken, New Jersey, USA;
(4) Ionut Florescu, Stevens Institute of Technology, Hoboken, New Jersey, USA.
This paper is available on arXiv under the CC BY-NC-SA 4.0 DEED license.