
Reinforcement Learning Agents Optimize Trading in CDA Markets


Too Long; Didn't Read

This section delves into the mechanics of reinforcement learning (RL) agents in financial markets, focusing on Markov Decision Processes and the Limit Order Book in Continuous Double Auction (CDA) markets. It highlights the role of Proximal Policy Optimization (PPO) in enhancing trading decisions and market simulations.


This is Part 2 of an 11-part series based on the research paper “Reinforcement Learning In Agent-based Market Simulation: Unveiling Realistic Stylized Facts And Behavior”. Use the table of links below to navigate to the next part.

Part 1: Abstract & Introduction

Part 2: Important Concepts

Part 3: System Description

Part 4: Agents & Simulation Details

Part 5: Experiment Design

Part 6: Continual Learning

Part 7: Experiment Results

Part 8: Market and Agent Responsiveness to External Events

Part 9: Conclusion & References

Part 10: Additional Simulation Results

Part 11: Simulation Configuration

2. Important Concepts

2.1 Reinforcement Learning Agents

Mathematically, each RL agent solves a problem associated with a Markov Decision Process (MDP) [1]. An MDP is defined as a tuple (S, A, R, P, γ) with the following key components (a minimal code sketch follows the list):


• S is the state space; in our case, a set of vectors describing the market limit order book and the agent’s account information.

• A is the action space, which defines the specific orders agents can place.

• R denotes the reward function, which specifies the immediate reward for taking an action in response to a particular state.

• P denotes the transition probability function, which gives the probability of transitioning from one state to another when a given action is executed.

• γ ∈ (0, 1) is the discount factor; a smaller discount factor makes the agent focus more on immediate rewards.
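As a concrete illustration of the tuple above, here is a minimal, hypothetical Python sketch; the MDP class, its field names, and the step method are our own inventions for illustration (the paper does not publish code).

```python
from dataclasses import dataclass
from typing import Callable

import numpy as np


@dataclass
class MDP:
    """The (S, A, R, P, γ) tuple from Section 2.1, with states and
    actions indexed 0..n-1 for simplicity. In the paper's setting, a
    state would be a vector of LOB features plus account information,
    and an action a specific order to place."""
    n_states: int
    n_actions: int
    reward: Callable[[int, int], float]  # R(s, a): immediate reward
    transition: np.ndarray               # P[s, a, s']: transition probabilities
    gamma: float = 0.99                  # discount factor γ ∈ (0, 1)

    def step(self, s: int, a: int) -> tuple[int, float]:
        """Sample the next state from P(· | s, a) and return it with R(s, a)."""
        s_next = int(np.random.choice(self.n_states, p=self.transition[s, a]))
        return s_next, self.reward(s, a)
```

A model-free agent, as discussed next, never evaluates `transition` directly; it only observes the sampled next state and the reward.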


When using model-free RL methods such as those in [19, 20], the dynamics of the system (i.e., the transition probability function P) can be unknown. If we denote the policy function of the RL agent as π, the agent solves the following problem:

max_π E_π [ Σ_{t≥0} γ^t R(s_t, a_t) ]

that is, it searches for a policy that maximizes the expected discounted cumulative reward.
We choose the Proximal Policy Optimization (PPO [20]) method to optimize our RL agents.
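For reference, below is a minimal NumPy sketch of the clipped surrogate objective at the heart of PPO [20]; the function name is ours, and the per-step probability ratios and advantage estimates are assumed to be precomputed from rollout data.

```python
import numpy as np


def ppo_clip_objective(ratios: np.ndarray, advantages: np.ndarray,
                       eps: float = 0.2) -> float:
    """Clipped surrogate objective L^CLIP from the PPO paper [20].

    ratios:     π_new(a_t | s_t) / π_old(a_t | s_t), one entry per time step
    advantages: advantage estimates Â_t, one entry per time step
    eps:        clipping range (0.2 in the original paper)
    """
    unclipped = ratios * advantages
    clipped = np.clip(ratios, 1.0 - eps, 1.0 + eps) * advantages
    # Taking the elementwise minimum bounds how far a single update
    # can move the new policy away from the old one.
    return float(np.mean(np.minimum(unclipped, clipped)))
```

PPO ascends this objective (in practice alongside a value loss and an entropy bonus), which keeps policy updates stable without requiring knowledge of the transition function P.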


2.2 Limit Order Book (LOB) in a Continuous Double Auction (CDA) Market

Almost all traditional financial exchanges today use a continuous double auction (CDA) market model, in which traders can place buy and sell orders continuously, at any time [21]. The CDA market maintains two limit order books (LOBs), one for buy orders and one for sell orders. Each order is an instruction placed by a trader who wants to buy or sell an asset at a specific price or better. Because the instruction bounds the range of acceptable execution prices, this type of order is called a limit order. A market order, by contrast, is an instruction to buy or sell an asset immediately at the current market price. Generally, limit orders stay in the LOB until they are matched with an incoming market order.
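To make the matching mechanics concrete, here is a minimal, hypothetical Python sketch of a CDA book with price-time priority; the class and method names are our own and do not come from the paper. Limit orders rest in the book, and an incoming market order consumes the best-priced resting orders first.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count

_arrival = count()  # global arrival counter enforces time priority


@dataclass(order=True)
class _Resting:
    key: tuple               # (price priority, arrival order)
    qty: int = field(compare=False)


class LimitOrderBook:
    """Minimal CDA book: one heap per side, price-time priority."""

    def __init__(self):
        self.bids = []  # max-heap on price (prices stored negated)
        self.asks = []  # min-heap on price

    def add_limit(self, side: str, price: float, qty: int) -> None:
        """Rest a limit order in the book on the given side."""
        book = self.bids if side == "buy" else self.asks
        p = -price if side == "buy" else price
        heapq.heappush(book, _Resting((p, next(_arrival)), qty))

    def market(self, side: str, qty: int) -> list[tuple[float, int]]:
        """Fill a market order against the opposite side; return (price, qty) fills."""
        book = self.asks if side == "buy" else self.bids
        fills = []
        while qty > 0 and book:
            best = book[0]
            price = best.key[0] if side == "buy" else -best.key[0]
            traded = min(qty, best.qty)
            fills.append((price, traded))
            qty -= traded
            best.qty -= traded
            if best.qty == 0:
                heapq.heappop(book)
        return fills


# Example: two resting sell orders, then an incoming buy market order.
lob = LimitOrderBook()
lob.add_limit("sell", 100.5, 10)
lob.add_limit("sell", 100.0, 5)
print(lob.market("buy", 8))  # [(100.0, 5), (100.5, 3)]
```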


Authors:

(1) Zhiyuan Yao, Stevens Institute of Technology, Hoboken, New Jersey, USA ([email protected]);

(2) Zheng Li, Stevens Institute of Technology, Hoboken, New Jersey, USA ([email protected]);

(3) Matthew Thomas, Stevens Institute of Technology, Hoboken, New Jersey, USA ([email protected]);

(4) Ionut Florescu, Stevens Institute of Technology, Hoboken, New Jersey, USA ([email protected]).


This paper is available on arXiv under the CC BY-NC-SA 4.0 DEED license.