Automate Submissions for the Numerai Tournament Using Azure Functions and Python

Written by papaemman | Published 2022/06/28
Tech Story Tags: data-science | azure-functions | @numerai | python | programming | big-data | data-analysis | data-analytics

tl;dr: Numerai offers the numerai-cli, a tool to help Data Scientists automate their weekly submission pipeline for the Numerai tournament. This CLI configures a Numerai Prediction Node in Amazon Web Services (AWS) and automatically deploys your model to it.

This guide describes how I set up my own weekly submission pipeline from scratch, using Microsoft Azure and python for free. 🚀

🤖 Just give me the code: https://github.com/papaemman/azure-functions-with-python

What is Numerai?

Numerai is a quant hedge fund founded in San Francisco, CA, in 2015, built on thousands of crowdsourced machine learning models. Every week, Numerai gives away hedge-fund-quality data for free so that Data Scientists worldwide can develop their Machine Learning (ML) / Deep Learning (DL) models on data they would otherwise not have access to.

At the beginning of each round, competitors can stake NMR tokens (a cryptocurrency issued by Numerai) on the predictions they submit if they want to get paid (or "burnt") based on the predictive accuracy of their models. Staking on a model is fundamentally a bet competitors place on the performance of their own model's predictions.

Numerai Compute CLI

Competing in Numerai is an ongoing, never-ending challenge because competitors need to download the new data every week and submit predictions based on it.

Numerai Compute is a framework aiming to help participants automate their weekly submission workflow. Using the numerai-cli, participants can "provision" their cloud infrastructure in Amazon Web Services (AWS) and deploy their pre-trained models as a Prediction Node that can

  • be triggered by Numerai every weekend
  • download new tournament data
  • run the ML models
  • and upload predictions to Numerai

The numerai-cli configures a Numerai Prediction Node in Amazon Web Services (AWS). This solution is architected to cost less than $5/month on average, but actual costs may vary.

You need 4 things to use Numerai CLI: Docker, Python, Numerai API Keys, and AWS API Keys.

Why not just use Numerai Compute?

1. Cost

Last week was tough for the crypto market. The massive crash of LUNA dragged down the whole crypto market. Currently (May 2022), the NMR token trades at around $13. Therefore, saving $5 per month and investing it in more NMR to stake on your models is an appealing idea.

2. Learn something new

When building something from scratch, you have the opportunity to understand how it works under the hood. Last year, I deep-dived into Microsoft Azure cloud technologies, and I wanted to test if my skills were enough to build a real-world solution.

3. You don't have an AWS account

Many professionals prefer to work with Cloud providers other than AWS, such as Microsoft Azure or Google Cloud Platform. Setting up a new AWS account, adding billing information, and finding your way around the AWS console might seem like a lot of work. If you prefer to use Azure, this guide is for you.

Goal

Automate my weekly submission pipeline for the Numerai Tournament using Azure and Python, at zero cost.

Azure offers many services that could solve this task, but after some research I decided to use Azure Functions.

Azure Functions is a cloud service available on-demand that provides all the continually updated infrastructure and resources needed to run your applications. You focus on the pieces of code that matter most to you, and Functions handles the rest. Functions provides serverless compute for Azure.

Going for a serverless architecture is a great way to save money and time because you don't need to pay for the infrastructure you don't use. And if you already have the code you want to run, setting up an Azure function is easy. Also, Azure offers a free tier, which is more than enough for this project's scope.

Thus my goal was to create an Azure Function that will:

  • ⏲️ be triggered every Sunday
  • 🔋interact with the Numerai API to download numerai_live_data and upload my predictions
  • 💾 interact with Azure Storage to read pre-trained ML/DL models and store numerai_live_data and numerai_predictions data
  • 📥 send a fun email notification after the submission
  • 😊 allow me to enjoy my weekend

MS Azure Architecture Design

Technical Implementation

Prerequisites

  1. Azure Account
  2. VSCode and Azure Extension
  3. Python 3.8
  4. Azure Functions Core Tools
  5. Azure Storage Explorer
  6. Numerai API Keys
  7. Trained ML/DL models and inference python code

Setup

1. Set up all prerequisites

  • Download and install the required software

  • Create your accounts in Azure and Numerai

2. Open VSCode and create a new local Azure function project

While you don't have any workspace open in VSCode, go to the Azure extension, find FUNCTIONS and select Create New Project…

  • Select a directory, the language (Python), and a Python interpreter to create a virtual environment
  • Select the Timer trigger template, give the function a name, and set the timer to run every Sunday [0 0 0 * * SUN]

You don't need to set the function to run at short time intervals for testing: Azure Functions supports Execute Function now… to trigger it manually from within VSCode.

3. VSCode will then automatically create a local project with sample code for a timer-triggered function.

## Azure function project codebase structure
.
├── .venv                     # Python Virtual environment
├── .vscode                   # Configuration options for VSCode
├── host.json                 # Configuration options that affect all functions in a function app instance
├── local.settings.json       # Maintains settings used when running functions locally. These settings aren't used when running in Azure
|
├── AzureFunction_1
│ ├── function.json           # Azure Function Settings
│ ├── __init__.py             # Python Code
│ └── readme.md               # Documentation
|
├── AzureFunction_2
│ ├── function.json           
│ ├── __init__.py             
│ └── readme.md               
|
└── requirements.txt          # Package dependencies
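
For reference, the generated files look roughly like this (the exact boilerplate may differ slightly between versions of the Azure Functions tooling). The schedule in function.json is the NCRONTAB expression you set during project creation:

## function.json (timer binding)
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "name": "mytimer",
      "type": "timerTrigger",
      "direction": "in",
      "schedule": "0 0 0 * * SUN"
    }
  ]
}

## __init__.py (sample code)
import datetime
import logging

import azure.functions as func


def main(mytimer: func.TimerRequest) -> None:
    utc_timestamp = datetime.datetime.utcnow().replace(
        tzinfo=datetime.timezone.utc).isoformat()

    if mytimer.past_due:
        logging.info('The timer is past due!')

    logging.info('Python timer trigger function ran at %s', utc_timestamp)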

4. Run the Function locally (open VSCode and press F5, or Run → Start Debugging)

To debug, you must select a storage account for internal use by the Azure Functions runtime: Select a subscription, create a new storage account, create a new resource group, and select a location for resources. Keep in mind that this is a temporary resource group.

  • Go to Azure extension, find FUNCTIONS and go to Local Project. Find the function, right-click, and select Execute Function now…

  • Examine functions' logs

  • Stop the function (ctrl+c)

  • Delete the resource group

Important note ⚠️

If you get the error "connect ECONNREFUSED 127.0.0.1:9091", update the extensionBundle version to "[2.*, 4.0.0)" in the host.json file.
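
For reference, after the update the relevant part of host.json should look roughly like this:

{
  "version": "2.0",
  "extensionBundle": {
    "id": "Microsoft.Azure.Functions.ExtensionBundle",
    "version": "[2.*, 4.0.0)"
  }
}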

5. Publish the project to Azure

  • Go to the Azure extension, find FUNCTIONS, and select Deploy to Function App…
  • Select a subscription and create a new function App
  • Select a runtime stack and location for the new resource

This will create a new resource group with an App Service Plan, a Function App, a Storage Account, and an Application Insights service.

6. Run the Function in Azure

  • Go to Azure extension, find FUNCTIONS, open the related subscription and find the function app.
  • Open the function app, find the specific function and select Execute Function now… This will run the function in the Azure environment.
  • You can see the function logs in the Live metrics tab using the Application Insights resource.

7. Download the function app settings

  • Open VSCode and download the remote settings to the local environment for the Function App (F1 and select Azure Functions: Download Remote Settings)

  • This will update the local.settings.json file with info about the App Service Plan, Function App, Storage Account, and Application Insights.
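
After the download, local.settings.json should contain (among others) entries roughly like the following, with the actual values redacted here:

{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "<storage account connection string>",
    "FUNCTIONS_WORKER_RUNTIME": "python",
    "APPINSIGHTS_INSTRUMENTATIONKEY": "<application insights key>"
  }
}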

8. Set up the Azure Storage account

  • Open the Azure Storage Explorer, connect with your Azure account and find the related storage account inside the newly created resource group.
  • Create a new Blob container called production-models and upload all your pre-trained ML models there.
  • Create a new Blob container called live-data to store numerai_live_data for every round.
  • Create a new Blob container called predictions-files to store numerai_predictions for every round.
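
If you prefer to do this step programmatically instead of through Storage Explorer, here is a minimal sketch using the Azure Storage Python SDK (azure-storage-blob). It assumes the storage connection string is available as the AzureWebJobsStorage environment variable (as in local.settings.json), and the model file name is just an example:

## upload_models.py (one-off script, run locally)
import os

from azure.core.exceptions import ResourceExistsError
from azure.storage.blob import BlobServiceClient

# Connect to the storage account created during deployment
blob_service = BlobServiceClient.from_connection_string(os.environ["AzureWebJobsStorage"])

# Create the three containers used by the function
for name in ["production-models", "live-data", "predictions-files"]:
    try:
        blob_service.create_container(name)
    except ResourceExistsError:
        pass  # the container is already there

# Upload a pre-trained model to production-models
with open("models/my_model.pkl", "rb") as f:
    blob_service.get_container_client("production-models").upload_blob(
        "my_model.pkl", f, overwrite=True)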

9. Purchase the Twilio SendGrid SaaS offering from Microsoft Azure Marketplace

To send a notification email every time the function is triggered and submits predictions, I decided to use Twilio SendGrid's Email API.

  • Purchase the Twilio SendGrid offer (free tier) from Microsoft Azure Marketplace and configure your account on their platform.
  • Get API keys from Twilio SendGrid
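
Sending the email from Python is then straightforward with the sendgrid package. A minimal sketch (the addresses are placeholders, and the API key is read from the application setting created in the next step):

import logging
import os

from sendgrid import SendGridAPIClient
from sendgrid.helpers.mail import Mail


def send_notification_email(body_html: str) -> None:
    # Build the message; sender/recipient addresses are placeholders
    message = Mail(
        from_email="numerai-bot@example.com",
        to_emails="me@example.com",
        subject="Numerai weekly submission done ✔️",
        html_content=body_html)

    # API key stored in the SendGridKeys application setting (see step 10)
    sg = SendGridAPIClient(os.environ["SendGridKeys"])
    response = sg.send(message)
    logging.info("SendGrid response status: %s", response.status_code)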

10. Configure Function App

Hardcoding the Numerai API keys and the Twilio SendGrid API keys in the source code is a terrible practice. Azure Functions offers a dedicated Function App configuration to store environment variables and secrets.

  • Open the Azure portal, find the Function App service and go to Configuration. The complete set of Environment Variables is stored in the Application settings by default. This helps developers define input/output bindings with other resources in Azure.
  • Create 2 new application settings to store the Numerai API keys and the SendGrid key

NumeraiAPIKeys → secret_key=***;public_id=***

SendGridKeys → ***

  • Open VSCode and re-download remote settings to the local environment (F1 and Azure Functions: Download Remote Settings) to get the new Application Settings into the local.settings.json file.
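
Inside the function, these application settings are exposed as plain environment variables. A minimal sketch of reading them, assuming the key=value;key=value format shown above:

import os

# Application settings are available as environment variables at runtime
raw_keys = os.environ["NumeraiAPIKeys"]          # "secret_key=***;public_id=***"
creds = dict(pair.split("=", 1) for pair in raw_keys.split(";"))
numerai_public_id = creds["public_id"]
numerai_secret_key = creds["secret_key"]

sendgrid_api_key = os.environ["SendGridKeys"]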

11. Write the Python code

Now it's time to write (or copy-paste) the Python code responsible for reading the Numerai live data & pre-trained ML/DL models and submitting predictions using the Numerai API.

The input and output configuration (paths to load and store files) is the only thing you have to modify in your existing Python code (which runs locally) to make it run as an Azure Function. Instead of local paths, you should use the newly created Storage Account. For this, you need the Azure Storage Python SDK.

  • Go to the __init__.py file and replace the sample code with your Python code.
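
To give an idea of what ends up in __init__.py, here is a condensed sketch of the workflow using numerapi and azure-storage-blob (both must be listed in requirements.txt, along with pandas and joblib). It is not a drop-in implementation: the dataset path ("v4/live.parquet"), the model blob name, and the Numerai model name are placeholders you need to adapt, and any post-processing of the raw predictions is omitted:

import io
import logging
import os
import tempfile

import azure.functions as func
import joblib
import pandas as pd
from azure.storage.blob import BlobServiceClient
from numerapi import NumerAPI


def main(mytimer: func.TimerRequest) -> None:
    # 1. Numerai credentials from the NumeraiAPIKeys application setting (step 10)
    creds = dict(p.split("=", 1) for p in os.environ["NumeraiAPIKeys"].split(";"))
    napi = NumerAPI(public_id=creds["public_id"], secret_key=creds["secret_key"])

    # 2. Download the live data for the current round
    #    (the dataset path depends on the data version you trained on)
    tmp = tempfile.gettempdir()
    live_path = os.path.join(tmp, "numerai_live_data.parquet")
    napi.download_dataset("v4/live.parquet", live_path)
    live_data = pd.read_parquet(live_path)  # ids are the index in the parquet file
    feature_cols = [c for c in live_data.columns if c.startswith("feature")]

    # 3. Load a pre-trained model from the production-models container
    blob_service = BlobServiceClient.from_connection_string(os.environ["AzureWebJobsStorage"])
    model_bytes = (blob_service.get_container_client("production-models")
                               .download_blob("my_model.pkl").readall())
    model = joblib.load(io.BytesIO(model_bytes))

    # 4. Predict and write the submission file
    predictions = pd.DataFrame({"id": live_data.index,
                                "prediction": model.predict(live_data[feature_cols])})
    pred_path = os.path.join(tmp, "numerai_predictions.csv")
    predictions.to_csv(pred_path, index=False)

    # 5. Archive live data and predictions to their Blob containers
    for container, path in [("live-data", live_path), ("predictions-files", pred_path)]:
        with open(path, "rb") as f:
            blob_service.get_container_client(container).upload_blob(
                os.path.basename(path), f, overwrite=True)

    # 6. Submit the predictions to Numerai ("my_model_name" is a placeholder)
    model_id = napi.get_models()["my_model_name"]
    napi.upload_predictions(pred_path, model_id=model_id)
    logging.info("Submission uploaded for model id %s", model_id)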

12. Run the Function locally and verify that it is working

  • Run the Function locally (F5 or Run → Start Debugging), go to the Azure extension, find the Function under Local Project, and select Execute Function now…
  • Check Function's logs and the connection with the Azure Storage Account
  • Stop the Function (ctrl+c)

13. Redeploy and verify the updated App

  • Go to the Azure extension, find FUNCTIONS, and select Deploy to Function App…
  • After the deployment completes, find the Function under the respective subscription in VSCode, right-click the function, and select Execute Function now…
  • Check the Function's logs in the Live metrics tab for this Function.

Conclusions

Last Sunday, I woke up to this fun email from myself, letting me know that everything went as planned. The Azure Function was triggered on Sunday at 00:00:00 UTC, the Numerai submissions were successful, and I was able to enjoy my day! 🌞

Email Notification for successful submission

PS. Check the awesome quote at the end. I added a random quote generator inside the function that creates the email notification, just to spice things up ✨
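
The quote generator itself is nothing fancy; conceptually it is just a random choice from a hard-coded list (the quotes below are illustrative, not my actual list):

import random

QUOTES = [
    "All models are wrong, but some are useful.",
    "The market can remain irrational longer than you can remain solvent.",
]


def random_quote() -> str:
    # Appended to the body of the notification email
    return random.choice(QUOTES)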

Get in touch

I hope you liked this guide. If you have any questions or just want to introduce yourself, don't hesitate to reach out!

Panagiotis Papaemmanouil


Written by papaemman | Mathematician turned Data Scientist turned Technology Entrepreneur. Passionate about data-driven real-world problems.