Things move fast in the world of infrastructure technology. Not long ago, running a database on Kubernetes was considered too tricky to be worth it. But that was yesterday's problem. Builders of cloud-native applications have gotten good at running stateful workloads, because Kubernetes is a powerful way to create virtual data centers quickly and efficiently.
The last time I wrote about this, I widened the aperture a bit to consider other parts of the application stack in the virtual data center – in particular, streaming workloads and analytics.
With these two moving into the mainstream in Kubernetes, the discussion about use cases gets more interesting.
What will we do with these foundational data tools now that we have access to them? Thankfully, we don't have to investigate too deeply, because the industry has already picked the direction: AI/ML workloads.
What's driving this is the need for faster and more agile MLOps, as building and maintaining machine learning (ML) models moves out of the back office and closer to users in production. A feature store acts as a bridge between data and machine learning models, giving models a consistent way to access data in both the offline and online phases. It manages the data processing required during model training and provides low-latency, real-time access to features during the online phase, ensuring data consistency across both sets of requirements.
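To make the offline/online split concrete, here is a toy sketch (not any real feature store's API; the class and feature names are invented for illustration): the same append-only feature log serves point-in-time lookups for training and latest-value lookups for serving.

```python
from datetime import datetime

class ToyFeatureStore:
    """Illustrative only: one feature log, two read paths."""

    def __init__(self):
        # feature name -> sorted list of (timestamp, value)
        self.log = {}

    def write(self, feature, ts, value):
        self.log.setdefault(feature, []).append((ts, value))
        self.log[feature].sort()

    def get_offline(self, feature, as_of):
        """Training path: the value as it existed at a point in time,
        which avoids leaking future data into training sets."""
        vals = [v for t, v in self.log.get(feature, []) if t <= as_of]
        return vals[-1] if vals else None

    def get_online(self, feature):
        """Serving path: the most recent value, for low-latency inference."""
        vals = self.log.get(feature, [])
        return vals[-1][1] if vals else None

store = ToyFeatureStore()
store.write("avg_txn_amount", datetime(2022, 1, 1), 42.0)
store.write("avg_txn_amount", datetime(2022, 2, 1), 57.5)

print(store.get_offline("avg_txn_amount", datetime(2022, 1, 15)))  # 42.0
print(store.get_online("avg_txn_amount"))                           # 57.5
```

Because both paths read from the same definitions, a model sees consistent feature values whether it is being trained or serving live traffic.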
On the model-serving side, an explainer provides insight into why a decision was made for each prediction, offering feature importance and highlighting the factors in the model that led to a particular outcome. This can be used to detect model drift and bias, which are some of the "important but hard" parts of machine learning. These features reduce the effort involved in MLOps and build trust in the application. KServe, which recently split away from the Google-originated Kubeflow project, has been gaining momentum as a standalone model-serving project.
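A hypothetical sketch of what an explainer computes, per prediction: an attribution score for each input feature. Real explainers (such as those attached to served models) use more robust methods; this leave-one-out version, with an invented linear stand-in model, just shows the idea.

```python
def predict(features, weights):
    # Stand-in model: a weighted sum of feature values.
    return sum(weights[name] * value for name, value in features.items())

def explain(features, weights, baseline=0.0):
    """Attribute a prediction to each feature by replacing that
    feature with a baseline value and measuring the change."""
    full = predict(features, weights)
    attributions = {}
    for name in features:
        perturbed = dict(features, **{name: baseline})
        attributions[name] = full - predict(perturbed, weights)
    return attributions

weights = {"income": 0.5, "age": 0.25, "balance": 0.125}
features = {"income": 80.0, "age": 40.0, "balance": 16.0}
print(explain(features, weights))
# {'income': 40.0, 'age': 10.0, 'balance': 2.0}
```

Tracking these attributions over time is one way to notice drift: if a feature's importance shifts sharply between training and production, something in the data has changed.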
Augmenting the traditional ways we find data, vector similarity search (VSS) is a machine learning technique that uses vector mathematics to measure how "close" two things are to one another. This is done with the K-nearest neighbor (KNN) algorithm.
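A minimal brute-force KNN over embedding vectors, using cosine similarity, looks like this (the document ids and vectors are made up; production VSS engines use approximate indexes rather than scanning every vector):

```python
import math

def cosine(a, b):
    """Cosine similarity: 1.0 for identical directions, 0.0 for orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def knn(query, vectors, k):
    """Return the ids of the k vectors most similar to the query."""
    ranked = sorted(vectors.items(),
                    key=lambda item: cosine(query, item[1]),
                    reverse=True)
    return [vec_id for vec_id, _ in ranked[:k]]

embeddings = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.9, 0.1, 0.0],
    "doc-c": [0.0, 1.0, 0.0],
}
print(knn([1.0, 0.05, 0.0], embeddings, k=2))  # ['doc-a', 'doc-b']
```

The "closeness" here is geometric: two items whose embedding vectors point in nearly the same direction are treated as similar, which is what lets VSS find related items that keyword search would miss.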
Combine my
AI/ML workloads may be something you’re just starting to explore, so now might be the best time to start on the right foot. The three areas mentioned — feature serving, model serving, and vector similarity search — are all covered in the book I co-authored with Jeff Carpenter, “Managing Cloud Native Data on Kubernetes.”