Jupyter, TensorFlow/Keras, and TensorBoard are widely used in many Data Science & Machine Learning projects. Unfortunately, there are two main pain points often faced by scientists and engineers alike:

1. local machines rarely have the hardware (particularly GPU and RAM) needed to train non-trivial models, and
2. provisioning and maintaining a remote, ML-ready server by hand is tedious and error-prone.
In some cases, workarounds for these problems include the free Binder service as well as the notebook-based Google Colab, but both have their own strict limits on resources (CPU, GPU, RAM, disk storage, and even uptime).
Instead, this short guide covers how to deploy an ML-ready Jupyter server and sync results with your preferred cloud compute provider. There are only 3 requirements:

1. Terraform (free to download and use),
2. an ngrok account (also free), and
3. credentials for your preferred cloud provider (e.g. AWS).
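Before diving in, it's worth a quick sanity check that the Terraform CLI is installed and on your PATH:

terraform -version # prints the installed Terraform version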
To get a snazzy Jupyter workspace just like in the image above, download the accompanying code from this GitHub repository:
git clone https://github.com/iterative/blog-tpi-jupyter
cd blog-tpi-jupyter
terraform init # Setup local dependencies
If you like, have a look at the main.tf file: it contains all the config options you could possibly want, such as custom hardware specs and spot price bidding. The default is an AWS EC2 G4 instance (with an NVIDIA Tesla T4 16GB GPU, for a total cost of around $0.15/hour as of writing).
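Curious what those options look like? Here's a minimal sketch of a TPI task resource — not the repository's actual main.tf. The field names follow the TPI documentation, but the values (and the referenced setup.sh script) are placeholders to adapt:

resource "iterative_task" "jupyter" {
  cloud     = "aws"      # or: azure, gcp, k8s
  region    = "us-east"
  machine   = "m+t4"     # generic size + GPU type; exact instance names also work
  disk_size = 30         # GB
  spot      = 0.15       # max spot bid in USD/hour (-1 means on-demand, no bidding)

  storage {
    workdir = "shared"   # local directory uploaded on apply
    output  = "shared"   # synced back to your machine on destroy
  }

  script = file("setup.sh")  # placeholder startup script (would launch Jupyter & TensorBoard)
}

Next, we need some environment variables: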
export NGROK_TOKEN="..." # Sign up for free at https://ngrok.com
export TF_LOG_PROVIDER=INFO # (optional) Increase verbosity
export AWS_ACCESS_KEY_ID="..." # assuming AWS cloud provider
export AWS_SECRET_ACCESS_KEY="..."
See the authentication docs for other cloud providers.
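The pattern is the same elsewhere; for example, roughly (variable names as listed in the TPI authentication docs, values are placeholders):

export GOOGLE_APPLICATION_CREDENTIALS="$HOME/gcp-key.json" # GCP service account key file
export AZURE_CLIENT_ID="..."                               # Azure service principal
export AZURE_CLIENT_SECRET="..."
export AZURE_SUBSCRIPTION_ID="..."
export AZURE_TENANT_ID="..."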
Now time for magic! 🎩
terraform apply
In just a few minutes ⏱ this simple command:

- creates a cloud storage bucket and a GPU compute instance,
- uploads the shared working directory to the bucket (and from there onto the instance), and
- starts Jupyter and TensorBoard servers, exposed via public ngrok URLs.
To see the logs (including the server URLs) at any point, simply run:
terraform refresh
⏳ If the URLs at the bottom of the output are blank (urls = []), the instance isn't ready yet. Wait a few minutes before running terraform refresh again. Eventually you'll see:
Outputs:
urls = [
"Jupyter Lab: https://8c62-54-173-120-3.ngrok.io/lab?token=...",
"Jupyter Notebook: https://8c62-54-173-120-3.ngrok.io/tree?token=...",
"TensorBoard: https://6d52-54-173-120-3.ngrok.io",
]
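The URLs are ordinary Terraform outputs (named urls above), so you can also print them directly from the state at any time:

terraform output urls # re-print the server URLs without a full refresh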
Finally, when done experimenting, download the shared working directory, delete the cloud storage, and terminate the cloud instance with one simple command:
terraform destroy
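Once destroy finishes, anything the remote servers wrote to the working directory (notebooks, TensorBoard logs, model checkpoints) should be back on your machine:

ls shared/ # results from the cloud session, now stored locally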
Behind the scenes, this workflow uses Terraform Provider Iterative (TPI). There are a few distinct advantages to TPI:
💰 Lower cost: use your preferred cloud provider's existing pricing, including on-demand per-second billing and bulk discounts.

🔄 Auto-recovery: spot/preemptible instances are cheap but unreliable. TPI reliably and automatically respawns such interrupted instances, caching & restoring the working directory in the cloud even when you are offline.

👓 Custom spec: full control over hardware & software requirements via a single main.tf config file, including machine types (CPU, GPU, RAM, storage) & images.
You can self-provision DS & ML hardware & software environments in the cloud with TPI.
⚡️ It's easier than you think:
git clone https://github.com/iterative/blog-tpi-jupyter
cd blog-tpi-jupyter
export NGROK_TOKEN="..." # Sign up for free at https://ngrok.com
export TF_LOG_PROVIDER=INFO # (optional) Increase verbosity
terraform init # Setup local dependencies
terraform apply # Create cloud resources & upload "shared" workdir
terraform refresh # Get Jupyter & TensorBoard URLs (rerun if blank)
# *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
# Click on the printed URLs and have fun!
# *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
# When done, download "shared" workdir & terminate cloud resources:
terraform destroy