Installing and Configuring Kubeflow with MinIO Operator

by MinIO, August 24th, 2023

Too Long; Didn't Read

In this blog post we are going to configure Kubeflow to use a large MinIO Tenant on the same Kubernetes cluster.

Kubeflow is a modern solution for designing, building and orchestrating Machine Learning pipelines using the latest and most popular frameworks. Out of the box, Kubeflow ships with MinIO inside to store all of its pipelines, artifacts and logs. However, that MinIO instance is limited to a single PVC, so it cannot benefit from the features a distributed MinIO brings to the table, such as Active-Active Replication, unlimited storage via Tiering and much more.


In this blog post we are going to configure Kubeflow to use a large MinIO Tenant on the same Kubernetes cluster, but of course, this configuration applies to Kubeflow and MinIO being on different clusters as well. For your reference, please see our earlier blog post, Machine Learning Pipelines with Kubeflow and MinIO on Azure, and the Kubeflow site.


While we go from soup to nuts in this blog post, if you already have a Kubeflow setup and a MinIO setup, you can skip straight to the Configure Kubeflow section of this blog post to see what needs to be configured.

Setting up the MinIO Operator

Let's start by installing the MinIO Operator and creating a tenant that Kubeflow will use. My favorite way to install MinIO Operator is via kubectl apply -k, but we also have Helm Charts available, and we are also available on the AWS Marketplace, Google Cloud Marketplace and Azure Marketplace.



kubectl apply -k github.com/minio/operator/


Setting up the MinIO Operator


This will install the latest and greatest MinIO Operator. Now we just need to log into the Operator UI and create a tenant. For this step we'll get a service account JWT token to log in, but this UI can also be secured with AD/LDAP or OIDC.



kubectl -n minio-operator get secret $(kubectl -n minio-operator get serviceaccount console-sa -o jsonpath="{.secrets[0].name}") -o jsonpath="{.data.token}" | base64 --decode && echo ""


Logging into the Operator


Now let's port forward the UI and login.



kubectl -n minio-operator port-forward svc/console 9090


Port forwarding the UI and login


Now open a browser, go to http://localhost:9090 and login with the JWT token we got on the previous step.


Logging in with the JWT token


After logging in, click on Create Tenant and set up a 1TiB tenant.


Creating Tenant


Enter the name of the new tenant and the namespace for it.


Entering the new tenant


If the namespace doesn't exist you have the option to create the namespace.


Creating a new namespace


Now let's size the tenant. I'll be setting up a 4-node cluster that has 4 drives on each node. Because we're on Kubernetes, node or server translates to pods, and drives per server translates to PVCs per pod.


I'm also starting with 1TiB of capacity but you can always expand the capacity of the tenant.


Sizing the tenant
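For reference, the Console ultimately creates a Tenant custom resource from these choices. The sketch below shows roughly what this sizing translates to; the tenant name and the 64Gi-per-volume split (4 x 4 x 64Gi = 1TiB raw) are illustrative assumptions, with field names following the minio.min.io/v2 CRD:

```yaml
apiVersion: minio.min.io/v2
kind: Tenant
metadata:
  name: kubeflow-tenant      # illustrative name
  namespace: ns-1
spec:
  pools:
    - servers: 4             # "servers" become pods on Kubernetes
      volumesPerServer: 4    # "drives per server" become PVCs per pod
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 64Gi  # 4 pods x 4 PVCs x 64Gi = 1TiB raw capacity
```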


Let's go to Identity Provider and create a basic user that will be used by Kubeflow. If you choose to configure an external identity provider that uses OpenID or Active Directory/LDAP, you can just go ahead and create a service account after you log in to the tenant.


Identity Provider


Lastly, we'll disable TLS just to keep this blog post from getting too long, but if you want to have TLS enabled on your tenant, you'll need a certificate configured on the tenant that Kubeflow trusts.


Disabling TLS


And that's it, just hit Create and the tenant will be created in a few minutes.


Creating the tenant


New Tenant Created


That's it. Now you have distributed, high-performance, hyper-scale object storage that can be expanded endlessly. From here, let's configure Kubeflow to use this MinIO deployment.

Setting up Kubeflow

In this section, we'll set up Kubeflow from scratch on Kubernetes. This works for on-premises deployments, development environments or any public cloud, although cloud providers frequently offer a pre-configured version of Kubeflow.


We'll be using the kubeflow/manifests repository. Bear in mind there are some strict requirements for this to work. For example, the highest version of Kubernetes supported by Kubeflow 1.5.0 (at the time of writing) is 1.21, so make sure you're using a Kubernetes cluster that meets this requirement.


One additional requirement is to have Kustomize version 3.2.0, and that's it.


Let's start by cloning the kubeflow/manifests repository:



git clone https://github.com/kubeflow/manifests


Cloning the kubeflow/manifests repository


Then change into the manifests folder and run the following command:



cd manifests
while ! kustomize build example | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 10; done


This command will take a few minutes to install all the resources needed by Kubeflow. If anything fails to apply, the loop will keep retrying every 10 seconds until everything applies successfully.


Installing Kubeflow


After a few minutes, you can confirm all the pods in the kubeflow namespace are up and running:



kubectl -n kubeflow get pods


Pods in kubeflow


Now we will configure Kubeflow to use our new MinIO.

Configure Kubeflow

The following section is the core of connecting Kubeflow and MinIO. Please note that the resources that need to be modified in this section are also what you'd tweak if you were starting with an existing Kubeflow deployment.


We are going to edit a variety of Config Maps, Secrets and Deployments in the kubeflow namespace first, and then in any existing user namespaces.


All of these steps assume MinIO is running in the ns-1 namespace and listening on port 80. If you were running the tenant with TLS, you'd use port 443.


Tenant URL: minio.ns-1.svc.cluster.local


Tenant Port: 80

Edit Configmaps

pipeline-install-config


Edit the pipeline-install-config config map and add the following fields to .data:

minioServiceHost: minio.ns-1.svc.cluster.local
minioServicePort: "80"


Edit command:


kubectl -n kubeflow edit cm pipeline-install-config

workflow-controller-configmap

Edit the workflow-controller-configmap configmap and configure the endpoint field inside the s3 section to point to your tenant:


s3:
      endpoint: "minio.ns-1.svc.cluster.local:80"


Use this command to edit the configmap:

kubectl -n kubeflow edit cm workflow-controller-configmap
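For context, the endpoint field lives inside the artifact repository block of that configmap. In a default Kubeflow install the surrounding section looks roughly like the sketch below; this is a hedged example whose exact keys may differ in your version, with insecure: true reflecting the fact that we disabled TLS on the tenant:

```yaml
artifactRepository:
  s3:
    bucket: mlpipeline                          # default pipelines bucket
    endpoint: "minio.ns-1.svc.cluster.local:80" # the new tenant
    insecure: true                              # plain HTTP since TLS is disabled
    accessKeySecret:
      name: mlpipeline-minio-artifact
      key: accesskey
    secretKeySecret:
      name: mlpipeline-minio-artifact
      key: secretkey
```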

ml-pipeline-ui-configmap

Edit the ml-pipeline-ui-configmap configmap and replace the json content of viewer-pod-template.json with the following json:


{
  "spec": {
    "containers": [
      {
        "env": [
          {
            "name": "AWS_ACCESS_KEY_ID",
            "valueFrom": {
              "secretKeyRef": {
                "name": "mlpipeline-minio-artifact",
                "key": "accesskey"
              }
            }
          },
          {
            "name": "AWS_SECRET_ACCESS_KEY",
            "valueFrom": {
              "secretKeyRef": {
                "name": "mlpipeline-minio-artifact",
                "key": "secretkey"
              }
            }
          },
          {
            "name": "AWS_REGION",
            "valueFrom": {
              "configMapKeyRef": {
                "name": "pipeline-install-config",
                "key": "minioServiceRegion"
              }
            }
          }
        ]
      }
    ]
  }
}


Use this command to edit the configmap:


kubectl -n kubeflow edit cm ml-pipeline-ui-configmap


Make sure the indentation structure of the json matches the existing format.


Making sure the indentation structure matches the existing format

Edit Secrets

We will update the secret that holds the credentials to MinIO. These values are meant to be base64 encoded, so you can encode them with shell:



echo -n "kubeflow" | base64 
echo -n "kubeflow123" | base64 


Updating the secret that holds the credentials to MinIO

mlpipeline-minio-artifact

Edit the secret mlpipeline-minio-artifact and set these values in the .data field:

data:
  accesskey: a3ViZWZsb3c=
  secretkey: a3ViZWZsb3cxMjM=


Use this command to edit the secret:


kubectl -n kubeflow edit secret mlpipeline-minio-artifact
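As a quick sanity check, you can decode those values locally and confirm they match the credentials we created for the MinIO user:

```shell
# Decode the base64 values from the secret; they should print the
# plaintext MinIO credentials: kubeflow and kubeflow123.
echo -n "a3ViZWZsb3c=" | base64 --decode && echo ""
echo -n "a3ViZWZsb3cxMjM=" | base64 --decode && echo ""
```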

Edit Deployments

We will now edit the deployments. We do this last so that the resulting pod restarts pick up all the configuration changes made above.

ml-pipeline-ui

Edit the ml-pipeline-ui deployment and add the following environment variables:


- name: AWS_ACCESS_KEY_ID
  valueFrom:
    secretKeyRef:
      name: mlpipeline-minio-artifact
      key: accesskey
- name: AWS_SECRET_ACCESS_KEY
  valueFrom:
    secretKeyRef:
      name: mlpipeline-minio-artifact
      key: secretkey
- name: MINIO_NAMESPACE

- name: MINIO_HOST
  value: minio.ns-1.svc.cluster.local
- name: MINIO_PORT
  value: "80"


Note: make sure the MINIO_NAMESPACE environment variable is set to be empty. This is critical, as that environment variable is already present in the deployment.
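If you prefer to be explicit, an env entry with an empty value field behaves the same as one with no value at all (Kubernetes defaults value to an empty string), so the variable can also be written as:

```yaml
- name: MINIO_NAMESPACE
  value: ""   # must be empty so the host/port settings are used as-is
```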


Use the following command to edit the deployment:


kubectl -n kubeflow edit deployment ml-pipeline-ui

ml-pipeline

Edit the ml-pipeline deployment and add the following environment variables:


- name: OBJECTSTORECONFIG_HOST
  valueFrom:
    configMapKeyRef:
      name: pipeline-install-config
      key: minioServiceHost
- name: OBJECTSTORECONFIG_PORT
  value: "80"


Use the following command to edit the deployment:

kubectl -n kubeflow edit deployment ml-pipeline

Configure Every User Namespace

This step is also very important: for every user namespace, patch the ml-pipeline-ui-artifact deployment and the artifact secret in that namespace. For example, in my case the namespace is kubeflow-user-example-com since we used the example manifest.


Edit the secret mlpipeline-minio-artifact and set these values in the .data field:


data:
  accesskey: a3ViZWZsb3c=
  secretkey: a3ViZWZsb3cxMjM=


Edit the ml-pipeline-ui-artifact deployment and add the following environment variables:


- name: MINIO_NAMESPACE
- name: MINIO_HOST
  value: minio.ns-1.svc.cluster.local
- name: MINIO_PORT
  value: "80"


Use the following commands to edit the secret and the deployment:


kubectl -n kubeflow-user-example-com edit secret mlpipeline-minio-artifact

kubectl -n kubeflow-user-example-com edit deployment ml-pipeline-ui-artifact


At this point Kubeflow is properly configured to use your tenant. There’s one last step and then we are good to test our deployment.

Migrate All Data from Kubeflow's Internal MinIO to the New Tenant

Now that we have configured everything, we just need to make sure the data Kubeflow expects to find in its buckets is actually there. Let's copy that data over and then shut down the internal MinIO that we're replacing.


To achieve this we will use MinIO Client (mc), a CLI tool for managing MinIO. We'll do all these operations from a pod running inside Kubernetes, but you can also port-forward the services and run mc from your own machine if you prefer.


Let's run a pod with an Ubuntu shell:


kubectl -n kubeflow run my-shell -i --tty --image ubuntu -- bash


Running a pod with an Ubuntu shell


This shell runs on a pod running inside our Kubernetes cluster in the Kubeflow namespace.


Now we will:


  1. Install wget

  2. Download mc

  3. Make mc executable

  4. Add an alias to the current MinIO

  5. Add an alias to the new MinIO

  6. Copy all the data


To accomplish this we run the following commands:


apt update && apt install -y wget
wget https://dl.min.io/client/mc/release/linux-amd64/mc
chmod +x mc
mv mc /usr/local/bin/
mc config host add kubeflow http://minio-service.kubeflow.svc.cluster.local:9000 minio minio123
mc config host add tenant http://minio.ns-1.svc.cluster.local kubeflow kubeflow123
mc mirror kubeflow tenant


Finally, turn off the internal MinIO as it is no longer required.

kubectl -n kubeflow scale deploy minio --replicas=0


All right! We are done moving to the full MinIO deployment.

Validate that Kubeflow is Using the new MinIO

Next we’ll validate the setup and run some pipelines.


If you go to MinIO Operator, you can see the tenant now has data:


Tenant data


Click the tenant, and then click Console in the top right of the browser window to open MinIO Console in order to browse that tenant.


Console window


From this view, you can see the mlpipeline bucket. Click browse to see its contents.


The mlpipeline bucket


You'll see the existing demo pipelines have been copied over.


Demo pipelines


Now let's go into Kubeflow and run some pipelines. You can use port forwarding to expose the Kubeflow central dashboard:



kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80


Then in your browser go to http://localhost:8080.


Logging into your account


Log in with the default credentials for this example setup:
Email Address: [email protected]

Password: 12341234


Kubeflow console


Then go to the Pipelines menu in the left menu bar. We’re going to run the most basic pipeline, "[Tutorial] DSL - Control structures":


Running the basic pipeline


Click on the pipeline’s name.


DSL - Control Structures


From here, click Create Experiment in the top right. This will create a new experiment since it's the first time the pipeline is running; on subsequent runs you can re-use this experiment.


Create Experiment


And click on Start:

Starting a run


After the run is complete, explore the pipeline to verify that it ran successfully.


Verifying pipeline



Kubeflow and MinIO for Multi-Cloud Machine Learning

This blog post taught you how to replace the MinIO that ships with Kubeflow with a full tenant managed by the MinIO Operator. You’re now prepared to take your Kubeflow use to the next level and back it with Kubernetes-native, high-performance and highly scalable MinIO object storage.


When it comes to Machine Learning pipelines and infrastructure, use MinIO's Lifecycle Management to deploy tenants backed by super-fast NVMe drives as your hot tier for fast training and model serving, and also set up a warm tier backed by SSDs or HDDs for your aging datasets. MinIO does this transparently without disrupting your applications. Tiering is configured on a per-bucket basis, or even for a single prefix within a bucket, providing granular control over which data gets moved to a slower tier.
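Tiering rules use standard S3 lifecycle configuration, where the StorageClass of a Transition names a remote tier configured on the deployment. The rule below is an illustrative sketch only; it assumes a remote tier named WARM has already been set up and that aging datasets live under a datasets/ prefix:

```json
{
  "Rules": [
    {
      "ID": "age-out-training-data",
      "Status": "Enabled",
      "Filter": { "Prefix": "datasets/" },
      "Transition": { "Days": 30, "StorageClass": "WARM" }
    }
  ]
}
```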


With MinIO's Active-Active Replication, you can configure buckets serving production machine learning models to be replicated instantly across multiple sites for disaster recovery and fast failover.


I truly hope this blog post helped you discover how easy it is to set up MinIO object storage on Kubernetes and to consume it with Kubeflow. If you have any questions, please join our Slack community and ask!

