Developer-first API stock broker
The Alpaca platform is now accepting waitlist signups for algo trading, and we believe trading automation is the future.
One of the great benefits of algorithmic trading is that you can test your trading strategy against historical data. Especially for a new strategy you developed on your own, you don’t really know how it will perform until you test it against reliable data. Algo trading is not that different from software development. Today, most good software is continuously tested: as the code evolves, developers keep testing it against real use cases to make sure that changes won’t cause failures later. You want to test your algo when you release the very first version, and you want to test it again whenever you make changes or adjustments while it is running.
While it is always good to test as many angles as possible, running many backtests is a data-intensive workload that needs enough data to cover a long history. This is why you cannot iterate by calling the GDAX API directly for every run. You need to store the data somewhere for your own use.
So the first question that comes to mind is how to get the data and prepare it for backtesting. Today I am going to show you how to use MarketStore to acquire a long history of Bitcoin price data for this purpose. This setup tutorial is going to be quick, with no time wasted setting up this and that. All you need is:
Got’em? Alright, let’s get started.
Here is the high-level picture of today’s system. We will start a MarketStore instance in a Docker container and run a background worker that calls the GDAX price API, so that we pull the historical bitcoin price from their endpoint and make it available for backtest clients to query over HTTP.
We will start another container for the client using the Anaconda Python 3 image (miniconda3). We use the official client package, pymarketstore, which returns query results from MarketStore as a DataFrame.
An official build of the MarketStore Docker image is publicly available on DockerHub, but first, let’s write a config file for the server.
In the GitHub repository you can find an example config file in YAML format: https://github.com/alpacahq/marketstore/blob/master/mkts.yml but I’m putting our example here.
- module: gdaxfeeder.so
query_start: "2018-01-01 00:00"
This configures the server to fetch 1-day bars from the GDAX historical price API, starting from 2018-01-01. Save this config as $PWD/mkts.yml. The server listens on port 5993 by default. Now let’s bring up the server.
$ docker run -v $PWD/mktsdb:/project/data/mktsdb -v $PWD/mkts.yml:/tmp/mkts.yml --net host alpacamarkets/marketstore:v2.1.1 marketstore -config /tmp/mkts.yml
Docker should automatically download the image from DockerHub if you don’t have it yet, and start the server process with the config. Hopefully, you will see something like this.
I0430 05:54:56.091770 1 log.go:14] Disabling "enable_last_known" feature until it is fixed...
I0430 05:54:56.092200 1 log.go:14] Initializing MarketStore...
I0430 05:54:56.092236 1 log.go:14] WAL Setup: initCatalog true, initWALCache true, backgroundSync true, WALBypass false:
I0430 05:54:56.092340 1 log.go:14] Root Directory: /project/data/mktsdb
I0430 05:54:56.097066 1 log.go:14] My WALFILE: WALFile.1525067696092950500.walfile
I0430 05:54:56.097104 1 log.go:14] Found a WALFILE: WALFile.1525067686432055600.walfile, entering replay...
I0430 05:54:56.100352 1 log.go:14] Beginning WAL Replay
I0430 05:54:56.100725 1 log.go:14] Partial Read
I0430 05:54:56.100746 1 log.go:14] Entering replay of TGData
I0430 05:54:56.100762 1 log.go:14] Replay of WAL file /project/data/mktsdb/WALFile.1525067686432055600.walfile finished
I0430 05:54:56.101506 1 log.go:14] Finished replay of TGData
I0430 05:54:56.109380 1 plugins.go:14] InitializeTriggers
I0430 05:54:56.110664 1 plugins.go:42] InitializeBgWorkers
I0430 05:54:56.110742 1 log.go:14] Launching rpc data server...
I0430 05:54:56.110800 1 log.go:14] Launching heartbeat service...
I0430 05:54:56.110822 1 log.go:14] Enabling Query Access...
I0430 05:54:56.110844 1 log.go:14] Launching tcp listener for all services...
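Once the log reports that the TCP listener has launched, you may want to confirm from another terminal that the port is actually reachable before pointing a client at it. The following is a minimal sketch using only the Python standard library. It assumes the default port 5993 on localhost mentioned above; the helper name `wait_for_port` is hypothetical and not part of MarketStore or pymarketstore.

```python
import socket
import time


def wait_for_port(host="localhost", port=5993, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    host/port default to the MarketStore defaults used in this article
    (an assumption; adjust them if your config uses a different port).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

Calling `wait_for_port()` with no arguments polls localhost:5993 for up to 30 seconds. It only checks that something accepts TCP connections on that port, not that the data has finished downloading.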
If you see something like “Response error: Rate limit exceeded”, that’s a good sign, not a bad one: it means the background worker fetched price data successfully until it hit the rate limit. The fetch worker will pause for a while and then resume automatically to catch up to the current price. You just need to keep it running.
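To make the pause-and-resume behavior concrete, here is a small sketch of the general pattern: retry the same request after a growing pause whenever the upstream API reports a rate limit. This is an illustration only, not the actual gdaxfeeder code; `RateLimitError`, `fetch_all`, and the `fetch_page` callback are hypothetical names introduced for the example.

```python
import time


class RateLimitError(Exception):
    """Hypothetical error raised when the upstream API rejects a request."""


def fetch_all(fetch_page, pages, pause=1.0, max_pause=60.0, sleep=time.sleep):
    """Fetch every page in order, pausing and retrying whenever rate-limited.

    fetch_page: callable taking one page key and returning its data
                (hypothetical; stands in for one historical-price request).
    sleep:      injectable so the pause can be stubbed out in tests.
    """
    results = []
    for page in pages:
        delay = pause
        while True:
            try:
                results.append(fetch_page(page))
                break
            except RateLimitError:
                # Back off, then retry the same page so no data is skipped.
                sleep(delay)
                delay = min(delay * 2, max_pause)
    return results
```

Doubling the pause up to a cap is one common choice; a fixed pause would work as well. The important part is that the same page is retried, so the stored history has no gaps once the worker catches up.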
MarketStore implements JSON-RPC and MessagePack-RPC for queries. MessagePack-RPC is particularly important for query performance on a large dataset. Thankfully, there are already Python and Go client libraries, so you don’t have to implement the protocol yourself. In this article, we use Python. We start from the miniconda3 image in another terminal.
$ docker run -it --rm -v $PWD/client.py:/tmp/client.py --net host continuumio/miniconda3 bash
# pip install ipython pymarketstore
We have installed ipython and pymarketstore, including their dependencies. From this terminal, let’s start an ipython shell and query MarketStore data.
(base) root@hq-dev-01:/# ipython
Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 18:10:19)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.3.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import pymarketstore as pymkts
In [2]: param = pymkts.Params('BTC', '1D', 'OHLCV', limit=100)
In [3]: df = pymkts.Client('http://localhost:5993/rpc').query(param).first().df()
In [4]: df[-10:]
Open High Low Close Volume
2018-04-14 00:00:00+00:00 7893.19 8150.00 7830.00 8003.11 9209.196953
2018-04-15 00:00:00+00:00 8003.12 8392.56 8003.11 8355.25 9739.103514
2018-04-16 00:00:00+00:00 8355.24 8398.98 7905.99 8048.93 13137.432715
2018-04-17 00:00:00+00:00 8048.92 8162.50 7822.00 7892.10 10537.460361
2018-04-18 00:00:00+00:00 7892.11 8243.99 7879.80 8152.05 10673.642535
2018-04-19 00:00:00+00:00 8152.05 8300.00 8101.47 8274.00 11788.032811
2018-04-20 00:00:00+00:00 8274.00 8932.57 8216.21 8866.27 16076.648797
2018-04-21 00:00:00+00:00 8866.27 9038.87 8610.70 8915.42 11944.464063
2018-04-22 00:00:00+00:00 8915.42 9015.00 8754.01 8795.01 7684.827002
2018-04-23 00:00:00+00:00 8795.00 8991.00 8775.10 8940.00 3685.109169
Voilà! You now have the daily bitcoin price in hand as a DataFrame. Note that the second line (param = …) determines which symbol and timeframe to query, along with query predicates such as the number of rows or a date range. From here, you can do a number of things, such as calculating indicators like moving averages and Bollinger Bands, or finding statistical volume anomalies with a scipy package.
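As a rough illustration of those next steps, the sketch below computes a moving average with Bollinger Bands and a simple volume z-score. To keep it runnable without a server, it uses only the standard library and hard-codes the ten Close and Volume values from the output above; with a real DataFrame you would more likely reach for pandas' `df['Close'].rolling(window)`. The window of 5, the band width of 2 standard deviations, and the use of the population standard deviation are assumptions chosen for the example, and the function names are hypothetical.

```python
from statistics import mean, pstdev

# Close and Volume values copied from the ten rows of query output above.
closes = [8003.11, 8355.25, 8048.93, 7892.10, 8152.05,
          8274.00, 8866.27, 8915.42, 8795.01, 8940.00]
volumes = [9209.196953, 9739.103514, 13137.432715, 10537.460361, 10673.642535,
           11788.032811, 16076.648797, 11944.464063, 7684.827002, 3685.109169]


def rolling(values, window):
    """Yield each full window of `window` consecutive values."""
    for i in range(window, len(values) + 1):
        yield values[i - window:i]


def bollinger(values, window=5, k=2.0):
    """Return (middle, upper, lower) band lists, one entry per full window."""
    middle, upper, lower = [], [], []
    for w in rolling(values, window):
        m, s = mean(w), pstdev(w)  # moving average and population std dev
        middle.append(m)
        upper.append(m + k * s)
        lower.append(m - k * s)
    return middle, upper, lower


def zscores(values):
    """Return how many standard deviations each value lies from the mean."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


middle, upper, lower = bollinger(closes)
volume_z = zscores(volumes)
```

With only ten rows the numbers are not meaningful for trading; the point is the shape of the calculation. A volume entry with a large absolute z-score is a candidate anomaly worth a closer look.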
I want to emphasize that a performant historical dataset is essential for studying and developing a trading algorithm, and you can build one quickly with MarketStore, as we have just walked through. This article demonstrated how to work with bitcoin prices from GDAX, but you can hook up other data sources just as easily using pymarketstore’s write method. You can also write your own custom background data fetcher.
Again, query performance is critical when it comes to backtesting, since you want to iterate quickly to get results. Now you may wonder how fast MarketStore can be. I will show its query speed on a huge dataset in the next post.
In the meantime, please leave any questions in the comments or ask @AlpacaHQ regarding this tutorial.
If you’re a hacker and can create something cool that works in the financial market, please check out our project “Commission Free Stock Trading API”, where we provide a simple REST trading API and real-time market data for free.
Brokerage services are provided by Alpaca Securities LLC (alpaca.markets), member FINRA/SIPC. Alpaca Securities LLC is a wholly-owned subsidiary of AlpacaDB, Inc.