
Reinforcement Learning Agents Optimize Trading in CDA Markets


Too Long; Didn't Read

This section delves into the mechanics of reinforcement learning (RL) agents in financial markets, focusing on Markov Decision Processes and the Limit Order Book in Continuous Double Auction (CDA) markets. It highlights the role of Proximal Policy Optimization (PPO) in enhancing trading decisions and market simulations.


This is Part 2 of an 11-part series based on the research paper “Reinforcement Learning In Agent-based Market Simulation: Unveiling Realistic Stylized Facts And Behavior”. Use the table of links below to navigate to the next part.

Part 1: Abstract & Introduction

Part 2: Important Concepts

Part 3: System Description

Part 4: Agents & Simulation Details

Part 5: Experiment Design

Part 6: Continual Learning

Part 7: Experiment Results

Part 8: Market and Agent Responsiveness to External Events

Part 9: Conclusion & References

Part 10: Additional Simulation Results

Part 11: Simulation Configuration

2. Important Concepts

2.1 Reinforcement Learning Agents

Mathematically, each RL agent solves a problem associated with a Markov Decision Process (MDP) [1]. An MDP is defined as a tuple (S, A, R, P, γ) with the following key components (a minimal code sketch follows the list):


• S is the state space; in our case, a set of vectors describing the market limit order book and the agent’s account information.

• A is the action space, which defines the specific orders agents can place.

• R denotes the reward function, which specifies the immediate reward for taking an action in response to a particular state.

• P denotes the transition probability function, which gives the probability of transitioning from one state to another when a given action is executed.

• γ ∈ (0, 1) is the discount factor; a smaller discount factor makes the agent focus more on immediate rewards.
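As a concrete illustration of the tuple above, here is a minimal, hypothetical Python sketch; the MDP class, its field names, and the step method are our own inventions for illustration (the paper does not publish code).

```python
from dataclasses import dataclass
from typing import Callable

import numpy as np


@dataclass
class MDP:
    """The (S, A, R, P, γ) tuple from Section 2.1, with states and
    actions indexed 0..n-1 for simplicity. In the paper's setting, a
    state would be a vector of LOB features plus account information,
    and an action a specific order to place."""
    n_states: int
    n_actions: int
    reward: Callable[[int, int], float]  # R(s, a): immediate reward
    transition: np.ndarray               # P[s, a, s']: transition probabilities
    gamma: float = 0.99                  # discount factor γ ∈ (0, 1)

    def step(self, s: int, a: int) -> tuple[int, float]:
        """Sample the next state from P(· | s, a) and return it with R(s, a)."""
        s_next = int(np.random.choice(self.n_states, p=self.transition[s, a]))
        return s_next, self.reward(s, a)
```

A model-free agent, as discussed next, never evaluates `transition` directly; it only observes the sampled next state and the reward.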


When using model-free RL methods such as those in [19, 20], the dynamics of the system (i.e., the transition probability function P) can be unknown. If we denote the policy function of the RL agent as π, the agent solves the following problem:

max_π E_π [ Σ_{t≥0} γ^t R(s_t, a_t) ]

that is, it searches for a policy that maximizes the expected discounted cumulative reward.
We choose the Proximal Policy Optimization (PPO [20]) method to optimize our RL agents.
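For reference, below is a minimal NumPy sketch of the clipped surrogate objective at the heart of PPO [20]; the function name is ours, and the per-step probability ratios and advantage estimates are assumed to be precomputed from rollout data.

```python
import numpy as np


def ppo_clip_objective(ratios: np.ndarray, advantages: np.ndarray,
                       eps: float = 0.2) -> float:
    """Clipped surrogate objective L^CLIP from the PPO paper [20].

    ratios:     π_new(a_t | s_t) / π_old(a_t | s_t), one entry per time step
    advantages: advantage estimates Â_t, one entry per time step
    eps:        clipping range (0.2 in the original paper)
    """
    unclipped = ratios * advantages
    clipped = np.clip(ratios, 1.0 - eps, 1.0 + eps) * advantages
    # Taking the elementwise minimum bounds how far a single update
    # can move the new policy away from the old one.
    return float(np.mean(np.minimum(unclipped, clipped)))
```

PPO ascends this objective (in practice alongside a value loss and an entropy bonus), which keeps policy updates stable without requiring knowledge of the transition function P.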


2.2 Limit Order Book (LOB) in a Continuous Double Auction (CDA) Market

Almost all traditional financial exchanges today use a continuous double auction (CDA) market model, in which traders can place buy and sell orders continuously, at any time [21]. The CDA market maintains two limit order books (LOBs), one for buy orders and one for sell orders. Each order is an instruction placed by a trader who wants to buy or sell an asset at a specific price or better. Because the instruction bounds the range of acceptable execution prices, this type of order is called a limit order. A market order, by contrast, is an instruction to buy or sell an asset immediately at the current market price. Generally, limit orders stay in the LOB until they are matched with an incoming market order.
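To make the matching mechanics concrete, here is a minimal, hypothetical Python sketch of a CDA book with price-time priority; the class and method names are our own and do not come from the paper. Limit orders rest in the book, and an incoming market order consumes the best-priced resting orders first.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count

_arrival = count()  # global arrival counter enforces time priority


@dataclass(order=True)
class _Resting:
    key: tuple               # (price priority, arrival order)
    qty: int = field(compare=False)


class LimitOrderBook:
    """Minimal CDA book: one heap per side, price-time priority."""

    def __init__(self):
        self.bids = []  # max-heap on price (prices stored negated)
        self.asks = []  # min-heap on price

    def add_limit(self, side: str, price: float, qty: int) -> None:
        """Rest a limit order in the book on the given side."""
        book = self.bids if side == "buy" else self.asks
        p = -price if side == "buy" else price
        heapq.heappush(book, _Resting((p, next(_arrival)), qty))

    def market(self, side: str, qty: int) -> list[tuple[float, int]]:
        """Fill a market order against the opposite side; return (price, qty) fills."""
        book = self.asks if side == "buy" else self.bids
        fills = []
        while qty > 0 and book:
            best = book[0]
            price = best.key[0] if side == "buy" else -best.key[0]
            traded = min(qty, best.qty)
            fills.append((price, traded))
            qty -= traded
            best.qty -= traded
            if best.qty == 0:
                heapq.heappop(book)
        return fills


# Example: two resting sell orders, then an incoming buy market order.
lob = LimitOrderBook()
lob.add_limit("sell", 100.5, 10)
lob.add_limit("sell", 100.0, 5)
print(lob.market("buy", 8))  # [(100.0, 5), (100.5, 3)]
```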


Authors:

(1) Zhiyuan Yao, Stevens Institute of Technology, Hoboken, New Jersey, USA ([email protected]);

(2) Zheng Li, Stevens Institute of Technology, Hoboken, New Jersey, USA ([email protected]);

(3) Matthew Thomas, Stevens Institute of Technology, Hoboken, New Jersey, USA ([email protected]);

(4) Ionut Florescu, Stevens Institute of Technology, Hoboken, New Jersey, USA ([email protected]).


This paper is available on arXiv under the CC BY-NC-SA 4.0 DEED license.