Table of Links

Abstract and 1. Introduction
2. Relevant Work
3. Methods
3.1 Models
3.2 Summarising Features
3.3 Calibration of Market Model Parameters
4. Experiments
4.1 Zero Intelligence Trader
4.2 Extended Chiarella
4.3 Historical Data
5. Discussion & Future Work
6. Significance, Acknowledgments, and References

3.2 Summarising Features

To evaluate the fidelity of a market simulator, a set of stylised facts is used; these denote market features typically observed in historical data. In this work, we use these stylised facts as evaluation metrics to check that our simulator faithfully reproduces realistic market behaviours. These stylised facts are typically extracted from summary statistics of the data, such as the auto-correlation of returns or the distribution of specific order types. As such, they represent biased metrics for calibrating a simulator to historical data. To make our calibration approach unbiased, we use the simulation and historical data directly, without constructing hand-crafted features. To do so, we pass the time-series data through an embedding network, which transforms the high-dimensional data into low-dimensional summary features. We describe both stylised facts and embedding network features below.

3.2.1 Stylised Facts. Stylised facts represent features of markets that are thought to be universally true, regardless of the exchange or asset being traded. These include intuitively true features such as the inability to predict whether a price will go up or down using previous price trends alone.
However, further analysis has shown that a number of these stylised facts do not hold under all circumstances, hence the reference to them as ‘stylised’. Despite this, they have become broadly recognised as suitable metrics for assessing the fidelity of a market simulator. The stylised facts that we use for our assessment are described in [4, 13, 37, 41]. These are intermittency, absence of auto-correlations in return series, concavity of price impact, gain/loss asymmetry in returns, heavy tails and normality of log returns, long-range memory of absolute returns, long-range dependence of absolute returns, positive correlation between volume and volatility, negative correlation between returns and volatility, volatility clustering, and Gamma distribution in order book volumes.

3.2.2 Embedding Network Features. The stylised facts above are a subset of a larger set of known metrics for quantifying the realism of a market simulator. However, their relevance to reproducing the features of a specific trading day is unclear. In this work, we explore a new method for summarising market data by training an embedding network to transform high-dimensional data into low-dimensional features that are then used for calibration. Here we use a multi-layer perceptron to perform the embedding, with further architecture details given in section 4.
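The embedding step described above can be illustrated with a minimal sketch of a multi-layer perceptron forward pass. This is not the architecture used in this work (those details are given in section 4): the layer sizes, ReLU activation, random untrained weights, window length, and the helper names `init_mlp` and `embed` are all assumptions made for illustration. The four input channels are a stand-in for price and volume at the best bid and ask, filled here with random numbers.

```python
import numpy as np


def init_mlp(layer_sizes, seed=0):
    """Create random (untrained) weights for an MLP; sizes are illustrative."""
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        # He-style scaling keeps activations at a reasonable magnitude.
        W = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
        b = np.zeros(n_out)
        params.append((W, b))
    return params


def embed(params, series):
    """Map a batch of time series to low-dimensional summary features.

    series has shape (batch, timesteps, channels) and is flattened so that
    each sample becomes a single high-dimensional input vector.
    """
    x = series.reshape(series.shape[0], -1)
    for W, b in params[:-1]:
        x = np.maximum(x @ W + b, 0.0)  # hidden layers with ReLU (assumed)
    W, b = params[-1]
    return x @ W + b  # linear output layer gives the summary features


# Hypothetical input: 8 samples, 600 one-second steps, 4 channels.
rng = np.random.default_rng(1)
batch, timesteps, channels = 8, 600, 4
series = rng.normal(size=(batch, timesteps, channels))

# 2400 inputs per sample are reduced to 8 summary features.
params = init_mlp([timesteps * channels, 64, 32, 8])
summary = embed(params, series)
print(summary.shape)  # (8, 8)
```

In a calibration setting the weights would be trained rather than drawn at random; the sketch only shows how a long time-series window is reduced to a handful of summary features.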
We use two different data features: the combination of price and total volume at the best bid and ask at one-second intervals, and the volume-weighted average price (VWAP) of bid and ask orders. Future work will consider using larger models and more simulation output, such as the entire LOB, as we discuss in section 5.

Authors:

(1) Namid R. Stillman, Simudyne Limited, United Kingdom (namid@simudyne.com);
(2) Rory Baggott, Simudyne Limited, United Kingdom (rory@simudyne.com);
(3) Justin Lyon, Simudyne Limited, United Kingdom (justin@simudyne.com);
(4) Jianfei Zhang, Hong Kong Exchanges and Clearing Limited, Hong Kong (jianfeizhang@hkex.com.hk);
(5) Dingqiu Zhu, Hong Kong Exchanges and Clearing Limited, Hong Kong (dingqiuzhu@hkex.com.hk);
(6) Tao Chen, Hong Kong Exchanges and Clearing Limited, Hong Kong (taochen@hkex.com.hk);
(7) Perukrishnen Vytelingum, Simudyne Limited, United Kingdom (krishnen@simudyne.com).

This paper is available on arXiv under the CC BY 4.0 DEED license.