Introduction

Teams that work with machine learning (ML) workloads in production know that added complexity can bring projects to a grinding halt. While deploying simple ML workloads might seem like an easy task, the process becomes a lot more involved when you begin to scale and distribute these loads and implement tools like Kubernetes. Although Kubernetes allows teams to rapidly scale their organization's infrastructure, it also adds a layer of complexity that can become a major burden without the right tools.

Today I'm going to introduce you to an OSS project known as Kubeflow that seeks to assist engineering teams with deploying ML workloads into production in Kubernetes. The Kubeflow project is dedicated to making deployments of ML workflows on Kubernetes simple, portable, and scalable.

What is Kubeflow?

Kubeflow is the machine learning toolkit for Kubernetes. Learn about Kubeflow use cases here.

To use Kubeflow, the basic workflow is:

1. Download and run the Kubeflow deployment binary.
2. Customize the resulting configuration files.
3. Run the specified script to deploy your containers to your specific environment.

You can adapt the configuration to choose the platforms and services that you want to use for each stage of the ML workflow: data preparation, model training, prediction serving, and service management. You can choose to deploy your Kubernetes workloads locally, on-premises, or to a cloud environment.

Deploying Kubeflow to Linode Kubernetes Engine

This guide describes how to use the kfctl CLI to deploy Kubeflow on Linode Kubernetes Engine (LKE).

Prerequisites

1. Install kubectl.
2. Create an LKE cluster.
3. Modify the .kube config file to point to the LKE cluster.

We are going to use the Kubeflow Operator to help deploy, monitor, and manage the lifecycle of Kubeflow. It is built using the Operator Framework, which offers an open source toolkit to build, test, and package operators and to manage their lifecycle.
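Before deploying the operator, it may help to confirm that the prerequisites listed above are actually in place. The following is only a minimal sketch and not part of the original guide: the function name check_prereqs and its messages are hypothetical, and it assumes nothing beyond kubectl being on your PATH and your kubeconfig already pointing at the cluster.

```shell
#!/bin/sh
# Hypothetical helper (not from the Kubeflow docs): sanity-check the
# prerequisites before deploying the operator.
check_prereqs() {
  # 1. kubectl must be installed and available on PATH.
  if ! command -v kubectl >/dev/null 2>&1; then
    echo "kubectl not found on PATH" >&2
    return 1
  fi
  # 2. The current kubeconfig context must reach a cluster;
  #    listing nodes fails if it does not.
  if ! kubectl get nodes >/dev/null 2>&1; then
    echo "kubectl cannot reach the cluster; check your kubeconfig" >&2
    return 2
  fi
  echo "prerequisites OK"
}
```

If you source this snippet and run check_prereqs before starting the deployment, a non-zero exit status tells you which of the two checks failed.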
The Kubeflow Operator is currently in the incubation phase and is based on this design doc. It is built on top of the kfdef CR and uses kfctl as the nucleus for the controller.

Deployment Instructions

1. Clone the kfctl repository and deploy the CRD and controller.

    # Clone the kfctl repository and work from its root directory
    git clone https://github.com/kubeflow/kfctl.git && cd kfctl

    # Create a namespace for the operator
    OPERATOR_NAMESPACE=operators
    kubectl create ns ${OPERATOR_NAMESPACE}

    # Install the kfdef CRD, then the operator's service account, RBAC binding, and controller
    kubectl create -f deploy/crds/kfdef.apps.kubeflow.org_kfdefs_crd.yaml
    kubectl create -f deploy/service_account.yaml -n ${OPERATOR_NAMESPACE}
    kubectl create clusterrolebinding kubeflow-operator --clusterrole cluster-admin --serviceaccount=${OPERATOR_NAMESPACE}:kubeflow-operator
    kubectl create -f deploy/operator.yaml -n ${OPERATOR_NAMESPACE}

2. Deploy kfdef. If your Kubernetes version is 1.15+, you can optionally apply a ResourceQuota, which allows only one instance (one deployment) of Kubeflow on this cluster, following the singleton model. The ResourceQuota enforces the constraint that only one instance of kfdef is allowed within the Kubeflow namespace.

    KUBEFLOW_NAMESPACE=kubeflow
    kubectl create ns ${KUBEFLOW_NAMESPACE}

    # Optional: only deploy this if the k8s cluster is 1.15+ and has resource quota support
    # kubectl create -f deploy/crds/kfdef_quota.yaml -n ${KUBEFLOW_NAMESPACE}

    kubectl create -f <kfdef> -n ${KUBEFLOW_NAMESPACE}

The <kfdef> above can point to a remote URL or to a local kfdef file. For example, the command for IBM Cloud would be:

    kubectl create -f https://raw.githubusercontent.com/kubeflow/manifests/master/kfdef/kfctl_ibm.yaml -n ${KUBEFLOW_NAMESPACE}

Since we are using Linode, you will replace the IBM Cloud kfdef with one that fits your Linode cluster!

Testing Watcher and Reconciler

One of the major benefits of using kfctl as an Operator is the ability to watch and reconcile your Kubeflow deployments. The Operator watches all the resources with the kfctl label.
If one of these kfctl-labeled resources is deleted, the reconciler is triggered and re-applies the kfdef to the Kubernetes cluster.

1. Check that the tf-job-operator deployment is running.

    kubectl get deploy -n ${KUBEFLOW_NAMESPACE} tf-job-operator
    # NAME              READY   UP-TO-DATE   AVAILABLE   AGE
    # tf-job-operator   1/1     1            1           7m15s

2. Delete the tf-job-operator deployment.

    kubectl delete deploy -n ${KUBEFLOW_NAMESPACE} tf-job-operator
    # deployment.extensions "tf-job-operator" deleted

3. Wait for 10 to 15 seconds, then check the tf-job-operator deployment again. You will see that the deployment is being recreated by the Operator's reconciliation logic.

    kubectl get deploy -n ${KUBEFLOW_NAMESPACE} tf-job-operator
    # NAME              READY   UP-TO-DATE   AVAILABLE   AGE
    # tf-job-operator   0/1     0            0           10s

Delete Kubeflow

Delete the Kubeflow deployment:

    kubectl delete kfdef -n ${KUBEFLOW_NAMESPACE} --all

Delete the Kubeflow Operator:

    kubectl delete -f deploy/operator.yaml -n ${OPERATOR_NAMESPACE}
    kubectl delete clusterrolebinding kubeflow-operator
    kubectl delete -f deploy/service_account.yaml -n ${OPERATOR_NAMESPACE}
    kubectl delete -f deploy/crds/kfdef.apps.kubeflow.org_kfdefs_crd.yaml
    kubectl delete ns ${OPERATOR_NAMESPACE}

Deploying a basic Kubeflow Pipeline

Now that you have Kubeflow running, let's port-forward to the Istio gateway so that we can access the central UI. Learn more about Istio and its capabilities here.

Access the UI

Use the following command to set up port forwarding to the Istio gateway.

    export NAMESPACE=istio-system
    kubectl port-forward -n ${NAMESPACE} svc/istio-ingressgateway 8080:80

Access the central navigation dashboard at:

    http://localhost:8080/

Depending on how you’ve configured Kubeflow, not all UIs work behind port-forwarding to the reverse proxy. For some web applications, you need to configure the base URL on which the app is serving.

Open the Pipelines UI

When Kubeflow is running, access the Kubeflow UI at:
    http://localhost:8080/

The Kubeflow UI looks like this:

[Screenshot: the Kubeflow UI]

Click Pipelines to access the pipelines UI. The pipelines UI looks like this:

[Screenshot: the pipelines UI]

Run a Basic Pipeline

The pipelines UI offers a few samples that you can use to try out pipelines quickly. The steps below show you how to run a basic sample that includes some Python operations but doesn’t include a machine learning (ML) workload:

1. Click the name of the sample, [Sample] Basic - Parallel Execution, on the pipelines UI.
2. Click Create an experiment.
3. Follow the prompts to create an experiment and then create a run. The sample supplies default values for all the parameters you need. The following screenshot assumes you’ve already created an experiment named My experiment and are now creating a run named My first run:

   [Screenshot: creating the run My first run in the experiment My experiment]

4. Click Start to create the run.
5. Click the name of the run on the Experiments dashboard.
6. Explore the graph and other aspects of your run by clicking on the components of the graph and other UI elements.

Finishing up

And that's about it! Now you should be ready to start taking the complexity out of running your ML workloads in your own Kubernetes clusters with Kubeflow.

I hope you liked this post! More to come soon. Till then, here are some more Kubernetes and Docker best practices for managing and deploying containers.