tl;dr: Numerai offers the numerai-cli, a tool to help Data Scientists automate their weekly submission pipeline for the Numerai tournament. This CLI configures a Numerai Prediction Node in Amazon Web Services (AWS) and automatically deploys your model to it.
This guide describes how I set up my own weekly submission pipeline from scratch, for free, using Microsoft Azure and Python. 🚀
🤖 Just give me the code: https://github.com/papaemman/azure-functions-with-python
Numerai is a quant hedge fund founded in San Francisco, CA, in 2015, built on thousands of crowdsourced machine learning models. Every week, Numerai gives away hedge-fund-quality data for free, so that Data Scientists worldwide can develop their Machine Learning (ML) / Deep Learning (DL) models on data they would otherwise not have access to.
At the beginning of each round, competitors can stake NMR tokens (a cryptocurrency issued by Numerai) on the predictions they submit if they want to get paid (or "burnt") based on the predictive accuracy of their models. Staking on a model is fundamentally a "bet" that competitors place on the performance of their own model's predictions.
Competing in Numerai is an ongoing, never-ending challenge because competitors need to download the new data every week and submit predictions based on it.
Numerai Compute is a framework that aims to help participants automate their weekly submission workflow. Using the numerai-cli, participants can "provision" their cloud infrastructure in Amazon Web Services (AWS) and deploy their pre-trained models as a Prediction Node.
The numerai-cli configures a Numerai Prediction Node in Amazon Web Services (AWS). This solution is architected to cost less than $5/month on average, but actual costs may vary.
You need 4 things to use Numerai CLI: Docker, Python, Numerai API Keys, and AWS API Keys.
Last week was tough for the crypto market: the massive crash of LUNA dragged the whole market down with it. Currently (May 2022), the NMR token trades at around $13. Therefore, saving $5 per month and investing it in more NMR to stake on your models is an appealing idea.
When building something from scratch, you have the opportunity to understand how it works under the hood. Last year, I deep-dived into Microsoft Azure cloud technologies, and I wanted to test if my skills were enough to build a real-world solution.
Many professionals prefer cloud providers other than AWS, such as Microsoft Azure or Google Cloud Platform. Setting up a new AWS account, adding billing information, and finding your way around the AWS console might seem like a lot of work. If you prefer to use Azure, this guide is for you.
Automate my weekly submission pipeline for the Numerai Tournament, using Azure and Python, at zero cost.
Azure offers many services adequate to solve this task, but I decided to use Azure Functions after some research.
Azure Functions is a cloud service available on-demand that provides all the continually updated infrastructure and resources needed to run your applications. You focus on the pieces of code that matter most to you, and Functions handles the rest. Functions provides serverless compute for Azure.
Going for a serverless architecture is a great way to save money and time because you don't need to pay for the infrastructure you don't use. And if you already have the code you want to run, setting up an Azure function is easy. Also, Azure offers a free tier, which is more than enough for this project's scope.
Thus my goal was to create an Azure Function that will:
MS Azure Architecture Design
1. Set up all prerequisites
Download and install the required software
Create your accounts in Azure and Numerai
2. Open VSCode and create a new local Azure function project
While no workspace is open in VSCode, go to the Azure extension, find FUNCTIONS, and select Create New Project…
- Select a directory, the language (Python), and a Python interpreter to create a virtual environment.
- Select the Timer trigger template, give the function a name, and set the timer to run every Sunday at midnight with the expression 0 0 0 * * SUN. Azure Functions timer expressions have six fields, starting with seconds, so a * in the first position would fire the function on every second of that minute.
You don't need to schedule the function at short intervals for testing: the Azure extension offers "Execute Function now…" to trigger it manually from within VSCode.
3. VSCode then automatically creates a local project containing sample code for a timer-triggered function.
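To sanity-check the weekly schedule before deploying, here is a minimal sketch in plain Python (standard library only) that computes the next Sunday at 00:00:00 UTC, which is when a weekly Sunday-midnight timer is expected to fire. The helper name `next_sunday_midnight_utc` is hypothetical and is not part of Azure Functions; it only mirrors the schedule described in this step.

```python
from datetime import datetime, timedelta, timezone


def next_sunday_midnight_utc(now: datetime) -> datetime:
    """Return the first Sunday 00:00:00 UTC strictly after `now`.

    `now` must be a timezone-aware datetime.
    """
    now_utc = now.astimezone(timezone.utc)
    midnight = now_utc.replace(hour=0, minute=0, second=0, microsecond=0)
    # datetime.weekday(): Monday is 0, Sunday is 6.
    days_ahead = (6 - midnight.weekday()) % 7
    candidate = midnight + timedelta(days=days_ahead)
    if candidate <= now_utc:
        # We are already on (or exactly at) Sunday midnight: jump one week.
        candidate += timedelta(days=7)
    return candidate


if __name__ == "__main__":
    print(next_sunday_midnight_utc(datetime.now(timezone.utc)).isoformat())
```

Comparing this value with the first execution time shown in the function's logs is a quick way to confirm that the timer expression does what you intended.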
## Azure function project codebase structure
.
├── .venv # Python Virtual environment
├── .vscode # Configuration options for VSCode
├── host.json # Configuration options that affect all functions in a function app instance
├── local.settings.json # Maintains settings used when running functions locally. These settings aren't used when running in Azure
│
├── AzureFunction_1
│ ├── function.json # Azure Function Settings
│ ├── __init__.py # Python Code
│ └── readme.md # Documentation
│
├── AzureFunction_2
│ ├── function.json
│ ├── __init__.py
│ └── readme.md
│
└── requirements.txt # Package dependencies
4. Run the Function locally (open VSCode and press F5, or select Run → Start Debugging)
To debug, you must select a storage account for internal use by the Azure Functions runtime: Select a subscription, create a new storage account, create a new resource group, and select a location for resources. Keep in mind that this is a temporary resource group.
Go to Azure extension, find FUNCTIONS and go to Local Project. Find the function, right-click, and select Execute Function now…
Examine the function's logs
Stop the function (Ctrl+C)
Delete the resource group
Important note ⚠️
If you get the error "connect ECONNREFUSED 127.0.0.1:9091", update the extensionBundle entry in the host.json file so that it contains "version": "[2.*, 4.0.0)".
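The fix above is a one-line edit, but if you prefer to script it, the following is a minimal sketch using only the standard library. It assumes the usual host.json layout, in which `extensionBundle` is a top-level object with `id` and `version` keys; the function name `set_extension_bundle_version` is hypothetical.

```python
import json
from pathlib import Path


def set_extension_bundle_version(host_json: Path, version_range: str) -> dict:
    """Set extensionBundle.version in host.json and return the updated config."""
    config = json.loads(host_json.read_text(encoding="utf-8"))
    bundle = config.setdefault("extensionBundle", {})
    # Keep an existing bundle id; otherwise fall back to the standard one.
    bundle.setdefault("id", "Microsoft.Azure.Functions.ExtensionBundle")
    bundle["version"] = version_range
    host_json.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


if __name__ == "__main__":
    path = Path("host.json")  # run from the project root
    if path.exists():
        print(set_extension_bundle_version(path, "[2.*, 4.0.0)"))
```

All other keys in host.json are left untouched, so the script is safe to rerun.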
5. Publish the project to Azure
This will create a new resource group with an App Service Plan, a Function App, a Storage Account, and an Application Insights service.
6. Run the Function in Azure
7. Download the function app settings
Open VSCode and download the Function App's remote settings into the local environment (press F1 and select Azure Functions: Download Remote Settings).
This will update the local.settings.json file with info about the App Service Plan, Function App, Storage Account, and Application Insights.
8. Set up the Azure Storage account.
9. Purchase the Twilio SendGrid SaaS offering from Microsoft Azure Marketplace
To send a notification email every time the function is triggered and submits predictions, I decided to use Twilio SendGrid's Email API.
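As an illustration of what such a notification involves, here is a minimal sketch that posts a plain-text email to SendGrid's v3 Mail Send endpoint using only the standard library. This is not necessarily how the linked repository does it (the official `sendgrid` Python package is another option); the function names, addresses, and message text are placeholders.

```python
import json
import urllib.request

SENDGRID_URL = "https://api.sendgrid.com/v3/mail/send"


def build_mail_payload(sender: str, recipient: str, subject: str, body: str) -> dict:
    """Build the JSON body expected by the v3 Mail Send endpoint."""
    return {
        "personalizations": [{"to": [{"email": recipient}]}],
        "from": {"email": sender},
        "subject": subject,
        "content": [{"type": "text/plain", "value": body}],
    }


def send_mail(api_key: str, payload: dict) -> int:
    """POST the payload to SendGrid and return the HTTP status code."""
    request = urllib.request.Request(
        SENDGRID_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request) as response:
        return response.status  # SendGrid replies 202 Accepted on success
```

A call would look like `send_mail(api_key, build_mail_payload("me@example.com", "me@example.com", "Numerai submission", "Predictions submitted."))`, where the API key comes from the Function App configuration rather than from the source code.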
10. Configure Function App
Hardcoding the Numerai API keys and the Twilio SendGrid API key in the source code is a terrible practice. Azure Functions offers a dedicated Function App configuration to store environment variables and secrets.
NumeraiAPIKeys → secret_key=***;public_id=***
SendGridKeys → ***
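Function App application settings are exposed to your code as environment variables, so the function can read them with `os.environ`. The sketch below parses a value in the `secret_key=***;public_id=***` format shown above; the helper names are hypothetical, and you should pass `load_setting` the exact setting name you configured.

```python
import os


def parse_key_value_setting(raw: str) -> dict:
    """Parse 'key1=value1;key2=value2' into a dict."""
    pairs = {}
    for part in raw.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing ';'
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"Expected 'key=value', got {part!r}")
        pairs[key.strip()] = value.strip()
    return pairs


def load_setting(name: str) -> dict:
    """Read an application setting from the environment and parse it."""
    return parse_key_value_setting(os.environ[name])
```

The resulting dict gives you `keys["public_id"]` and `keys["secret_key"]` without either value ever appearing in the repository.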
11. Write the Python code
Now it's time to write (or copy-paste) the Python code responsible for reading the Numerai live data and the pre-trained ML/DL models, and for submitting predictions using the Numerai API.
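For the submission part, one option is the `numerapi` Python package. The sketch below assumes that package, which is my assumption rather than something this guide prescribes, and uses its `get_models` and `upload_predictions` methods. The helper names, the model name, and the CSV path are placeholders.

```python
def make_numerapi_client(public_id: str, secret_key: str):
    """Create a NumerAPI client (assumes the numerapi package is installed)."""
    # Imported lazily so the helper below can be exercised without numerapi.
    from numerapi import NumerAPI

    return NumerAPI(public_id=public_id, secret_key=secret_key)


def submit_predictions(napi, model_name: str, predictions_csv: str) -> str:
    """Upload a predictions CSV for the given model and return the submission id."""
    models = napi.get_models()  # maps model name -> model id
    if model_name not in models:
        raise KeyError(f"Model {model_name!r} not found in this account")
    return napi.upload_predictions(predictions_csv, model_id=models[model_name])
```

Inside the function you would call `submit_predictions(make_numerapi_client(public_id, secret_key), "<your model name>", "<path to predictions csv>")` once the predictions file has been written.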
The input and output configuration (the paths used to load and store files) is the only thing you have to modify in your existing Python code (which runs locally) to run it as an Azure Function. Instead of local paths, you should use the newly created Storage Account. For this, you need the Azure Storage Python SDK.
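To make the switch from local paths to Blob Storage concrete, here is a minimal sketch built on the `azure-storage-blob` package. The helper names, the container name, and the blob names are hypothetical; the connection string is assumed to come from the Function App settings (for example the `AzureWebJobsStorage` setting).

```python
def get_container_client(connection_string: str, container: str):
    """Return a ContainerClient (assumes azure-storage-blob is installed)."""
    # Imported lazily so the helpers below can be exercised without the SDK.
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(connection_string)
    return service.get_container_client(container)


def read_blob(container_client, blob_name: str) -> bytes:
    """Download a blob and return its content as bytes."""
    return container_client.download_blob(blob_name).readall()


def write_blob(container_client, blob_name: str, data: bytes) -> None:
    """Upload bytes to a blob, replacing any existing blob with that name."""
    container_client.upload_blob(name=blob_name, data=data, overwrite=True)
```

With these helpers, code that used to open a local file reads bytes with `read_blob(client, "<blob name>")` instead, and the predictions file is stored with `write_blob` before or after it is submitted.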
12. Run the Function locally and verify that it is working
13. Redeploy and verify the updated App
Last Sunday, I woke up to this fun email from myself, letting me know that everything went as planned. The Azure Function was triggered on Sunday at 00:00:00 UTC, the Numerai submissions were successful, and I was able to enjoy my day! 🌞
Email Notification for successful submission
PS. Check the awesome quote at the end. I added a random quote generator to the code that builds the email notification, just to spice things up ✨
I hope you liked this guide. If you have any questions or just want to introduce yourself, don't hesitate to reach out!
🔥 Check my Numerai Models under the name PAPAEMMAN: https://numer.ai/papaemman1