In our last post, "How to Setup Bitcoin Historical Price Data for Algo Trading in Five Minutes", we introduced how to set up bitcoin price data in five minutes, and we got a lot of good feedback and contributions to the open source MarketStore.

Today, I want to show you how fast MarketStore is, using the same data, so that you can see the performance benefit of this awesome open source financial timeseries database.

Faster data means more backtesting and more training in machine learning for our trading algorithms. We are seeing a number of successful machine learning-based trading algos in the space, but one of the key lessons we have learned is that data speed is really important. It matters not just for backtesting, but also for training AI-style algorithms, which by nature require an iterative process.

This is another post that walks you through step by step. But TL;DR: it is really fast.

Setup

Last time, we showed how to set up historical daily bitcoin price data with MarketStore. This time, we store all of the minute-level historical prices using the same mechanism, a background worker, but with a slightly different configuration.

    root_directory: /project/data/mktsdb
    listen_port: 5993
    # timezone: "America/New_York"
    log_level: info
    queryable: true
    stop_grace_period: 0
    wal_rotate_interval: 5
    enable_add: true
    enable_remove: false
    enable_last_known: false
    triggers:
      - module: ondiskagg.so
        on: "*/1Min/OHLCV"
        config:
          destinations:
            - 5Min
            - 15Min
            - 1H
            - 1D
    bgworkers:
      - module: gdaxfeeder.so
        name: GdaxFetcher
        config:
          query_start: "2016-01-01 00:00"
          base_timeframe: "1Min"
          symbols:
            - BTC

The difference from last time is that the background worker is configured to fetch 1-minute bars instead of 1-day bars, starting from 2016-01-01. That is almost 2.5 years of data, with more than 1 million bars. You will need to keep the server up and running for a day or so to backfill all of it, since GDAX's historical price API does not allow you to fetch that many data points quickly. Again, the data fetch worker carefully throttles its request rate in case the API returns a "Rate Limit" error, so you can just sleep on it.

The additional configuration here is the "on-disk aggregate" trigger. What it does is aggregate the 1-minute bars into lower resolutions (here 5 minutes, 15 minutes, 1 hour, and 1 day). In a typical trading strategy, you need to check a longer time horizon to verify entry/exit signals even if you are working at the minute level, so this is a pretty important feature. You would need a pretty complicated LEFT JOIN query to achieve the same time-windowed aggregation in SQL, but with MarketStore, all you need is this small section of the configuration file, as the two sketches below illustrate.
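To make that claim concrete, here is roughly what just the 5-minute rollup would look like on the SQL side. This is only a sketch, assuming the "prices" table defined in the benchmark section below and a psycopg2 connection; a production version would additionally need a LEFT JOIN against a generated time series to emit empty buckets, which is the complication mentioned above.

    import psycopg2

    # Sketch: emulating MarketStore's 5Min on-disk aggregate in PostgreSQL.
    # The bucket is the bar's start time, floored to a 300-second boundary.
    FIVE_MIN_ROLLUP = """
    SELECT to_timestamp(floor(extract(epoch FROM t) / 300) * 300) AS bucket,
           (array_agg(open  ORDER BY t ASC))[1]  AS open,
           max(high)                             AS high,
           min(low)                              AS low,
           (array_agg(close ORDER BY t DESC))[1] AS close,
           sum(volume)                           AS volume
      FROM prices
     GROUP BY bucket
     ORDER BY bucket
    """

    conn = psycopg2.connect(dbname='btc')  # connection details are an assumption
    cur = conn.cursor()
    cur.execute(FIVE_MIN_ROLLUP)
    bars_5min = cur.fetchall()

And this query recomputes everything from scratch on every read, for one resolution only.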
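On the MarketStore side, by contrast, the rollups are maintained on disk as the 1Min data arrives, so reading any resolution is just another query. A minimal sanity check, assuming the official pymarketstore client (pip install pymarketstore) and the listen_port configured above:

    import pymarketstore as pymkts

    # Point the client at the listen_port from the configuration above.
    client = pymkts.Client('http://localhost:5993/rpc')

    # The gdaxfeeder worker writes BTC/1Min/OHLCV, and the ondiskagg
    # trigger keeps each derived timeframe up to date on disk.
    for timeframe in ('1Min', '5Min', '15Min', '1H', '1D'):
        params = pymkts.Params('BTC', timeframe, 'OHLCV')
        df = client.query(params).first().df()
        print(timeframe, df.shape)

If the backfill is still running, the row counts will simply grow each time you re-run this.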
The machine we are using for this test is a typical Ubuntu virtual machine with 8 cores of Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz, 32GB of RAM, and an SSD.

The Benchmark

We are going to pull a python DataFrame object holding all of the minute-level historical bitcoin prices since January 2016 from the server, and compare MarketStore against PostgreSQL. PostgreSQL is not really meant to be the data store for this type of data, but unfortunately lots of people in this space are using some sort of SQL database for this purpose, since there has been no other alternative. That's why we built MarketStore.

The table definition of the bitcoin data on the PostgreSQL side looks like this.

    btc=# \d prices
                Table "public.prices"
     Column |            Type             | Modifiers
    --------+-----------------------------+-----------
     t      | timestamp without time zone |
     open   | double precision            |
     high   | double precision            |
     low    | double precision            |
     close  | double precision            |
     volume | double precision            |

The code looks like this.

    import pandas as pd
    import pymarketstore as pymkts

    # For postgres
    def get_df_from_pg_one(conn, symbol):
        tbl = f'"{symbol}"'
        cur = conn.cursor()
        # order by timestamp, so the client doesn't have to do it
        cur.execute(f"SELECT t, open, high, low, close, volume FROM {tbl} ORDER BY t")
        times = []
        opens = []
        highs = []
        lows = []
        closes = []
        volumes = []
        for t, open, high, low, close, volume in cur.fetchall():
            times.append(t)
            opens.append(open)
            highs.append(high)
            lows.append(low)
            closes.append(close)
            volumes.append(volume)
        return pd.DataFrame(dict(
            open=opens,
            high=highs,
            low=lows,
            close=closes,
            volume=volumes,
        ), index=times)

    # For MarketStore
    def get_df_from_mkts_one(symbol):
        params = pymkts.Params(symbol, '1Min', 'OHLCV')
        return pymkts.Client('http://localhost:6000/rpc').query(params).first().df()

The input and output are basically the same: one symbol name goes in, the remote server is queried over the network, and one DataFrame comes back. One strong benefit of MarketStore is that you don't need much client code to get the DataFrame object, since the wire protocol is designed to deliver arrays of numbers efficiently.

The Result

First, PostgreSQL:

    %time df = example.get_df_from_pg_one(conn, 'prices')
    CPU times: user 8.11 s, sys: 414 ms, total: 8.53 s
    Wall time: 15.3 s

And MarketStore:

    %time df = example.get_df_from_mkts_one('BTC')
    CPU times: user 109 ms, sys: 69.5 ms, total: 192 ms
    Wall time: 291 ms

Both results of course look the same, like below.

    In [21]: df.head()
    Out[21]:
                           open    high     low   close   volume
    2016-01-01 00:00:00  430.35  430.39  430.35  430.39   0.0727
    2016-01-01 00:01:00  430.38  430.40  430.38  430.40   0.9478
    2016-01-01 00:02:00  430.40  430.40  430.40  430.40   1.6334
    2016-01-01 00:03:00  430.39  430.39  430.36  430.36  12.5663
    2016-01-01 00:04:00  430.39  430.39  430.39  430.39   1.9530

    In [22]: df.shape
    Out[22]: (1198274, 5)

A bitcoin was about $430 back then… Anyway, you can see the difference: 0.3 seconds versus 15 seconds, which is about a 50x difference. Remember, you may need to fetch the same data again and again for different kinds of backtesting and optimization, as well as for ML training.

Also, you may want to query not just bitcoin but other coins, stocks, and fiat currencies, since the entire database usually won't fit into your main memory. MarketStore can serve multiple symbols and timeframes in one query pretty efficiently, whereas with PostgreSQL and other relational databases you have to query one table at a time, so MarketStore also has a scalability advantage when you need multiple instruments.
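The timings below use two multi-symbol helpers, get_dfs_from_pg and get_dfs_from_mkts, whose bodies are not shown in this post. Here is a plausible minimal sketch of them, reusing the per-symbol functions above. It assumes that pymkts.Params accepts a list of symbols and that the query reply exposes by_symbols(), as recent pymarketstore releases do; check your client version.

    # Hypothetical reconstructions: the original helpers are not shown in this post.

    def get_dfs_from_pg(conn, symbols):
        # One table per symbol means one round trip per instrument.
        return {symbol: get_df_from_pg_one(conn, symbol) for symbol in symbols}

    def get_dfs_from_mkts(symbols):
        # MarketStore accepts the whole symbol list in a single query and
        # returns the result set grouped per symbol.
        params = pymkts.Params(symbols, '1Min', 'OHLCV')
        reply = pymkts.Client('http://localhost:6000/rpc').query(params)
        return reply.by_symbols()  # assumed: dict mapping symbol to DataFrame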
To give some sense of this power, here is the result of querying 7.7K symbols of US stocks, run as an internal test.

    %time dfs = example.get_dfs_from_pg(symbols)
    CPU times: user 52.9 s, sys: 2.33 s, total: 55.3 s
    Wall time: 1min 26s

    %time dfs = example.get_dfs_from_mkts(symbols)
    CPU times: user 814 ms, sys: 313 ms, total: 1.13 s
    Wall time: 6.24 s

Again, the amount of data is the same, and in this case each DataFrame is not as large as in the bitcoin case, yet the difference when expanding to a large number of instruments is significant (more than 10x). You can imagine how, in real life, these two factors (per-instrument and multi-instrument) multiply the data cost.

Alpaca has been using MarketStore in production for algo trading use cases, both for our proprietary customers and for our own purposes. It is actually amazing that this software is available to everyone for free, and we leverage this technology to help our algo trading customers (signup for early access is here).

Please follow Alpaca and Automation Generation for fresh posts on Financial Markets, Algorithmic Trading, and Technology.

You can find us @AlpacaHQ, if you use twitter.

If you're a hacker and can create something cool that works in the financial market, please check out our project "Commission Free Stock Trading API" where we provide simple REST Trading API and real-time market data for free.

Brokerage services are provided by Alpaca Securities LLC (alpaca.markets), member FINRA/SIPC. Alpaca Securities LLC is a wholly-owned subsidiary of AlpacaDB, Inc.