Kubernetes has become the de facto platform for orchestrating workloads on a cluster of nodes. Its powerful built-in scheduler considers available resources such as CPU and memory and places workloads, based on their requested resource amounts, on nodes that can satisfy those requests. While CPU and memory are required by all workloads, some workloads need other resources too, such as Network Interface Cards (NICs) for connecting to a particular network, or GPUs for running inference.

Making these hardware resources available on the nodes is not enough: users need to be able to see how many of these devices exist and how many they can request. Along with visibility for workload operators, the Kubernetes scheduler also needs to see how much of each device is available and allocatable in order to make workload-scheduling decisions. One way of doing this would be to extend the Kubernetes API server components themselves. However, that approach doesn't scale well and isn't flexible if we want to keep adding new kinds of devices to nodes or change the behavior of an existing resource provider.
Hence, Kubernetes offers two frameworks for orchestrating workloads based on hardware device requests: Device Plugins and Dynamic Resource Allocation (DRA). This post focuses solely on Device Plugins; in a future article, I’ll go deeper into DRA and compare the two models.
Device Plugins
Kubernetes has a concept of kubelet plugins: extensions that allow the kubelet to discover and manage resources on a node beyond the standard container resources. A Device Plugin is one such kubelet plugin; it allows you to advertise hardware devices as resources.
How Device Plugins Work
A device plugin runs on every node, typically as a DaemonSet, identifies the devices present on that node, and advertises these node-local devices to the kubelet. The Device Plugin framework follows a specific lifecycle:
- Registration: The device plugin registers with the kubelet via a gRPC connection, sending its Unix socket path and the resource name it manages (e.g., `nvidia.com/gpu`)
- Discovery: Following successful registration, the device plugin sends the kubelet a list of devices via the `ListAndWatch()` RPC call. This is a streaming connection that continuously reports device health status
- Advertisement: The kubelet receives the device list and advertises these resources to the API server by updating the node's status fields (`capacity` and `allocatable`)
- Allocation: When a pod requesting GPU resources is scheduled to the node, the kubelet calls the device plugin's `Allocate()` method to assign specific device IDs to the container
Once the API server receives the request to update `node.status` with the device capacity, it persists those changes in etcd. Cluster users can then see the resource availability in node statuses and request the resource through their pod specs.
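To make that lifecycle concrete, here is a minimal sketch of the plugin side in Go against the kubelet's `k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1` API. This is not the NVIDIA plugin: the resource name `example.com/toy-device`, the socket name, and the single fake device are illustrative placeholders, and a real plugin adds health monitoring, kubelet-restart handling, and device-specific mounts and environment variables in `Allocate()`.

```go
package main

import (
	"context"
	"net"
	"os"
	"path/filepath"
	"strings"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	pluginapi "k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1"
)

const (
	resourceName = "example.com/toy-device" // hypothetical resource name
	socketName   = "toy-device.sock"        // served under /var/lib/kubelet/device-plugins/
)

// toyPlugin implements the DevicePlugin gRPC service for a single fake device.
type toyPlugin struct{}

// ListAndWatch streams the device inventory to the kubelet; a real plugin
// re-sends the list whenever device health changes.
func (p *toyPlugin) ListAndWatch(_ *pluginapi.Empty, s pluginapi.DevicePlugin_ListAndWatchServer) error {
	devs := []*pluginapi.Device{{ID: "toy-0", Health: pluginapi.Healthy}}
	if err := s.Send(&pluginapi.ListAndWatchResponse{Devices: devs}); err != nil {
		return err
	}
	select {} // block: this sketch never updates health
}

// Allocate tells the kubelet how to expose the requested device IDs to the container.
func (p *toyPlugin) Allocate(_ context.Context, req *pluginapi.AllocateRequest) (*pluginapi.AllocateResponse, error) {
	resp := &pluginapi.AllocateResponse{}
	for _, cr := range req.ContainerRequests {
		resp.ContainerResponses = append(resp.ContainerResponses, &pluginapi.ContainerAllocateResponse{
			// A real plugin would add device nodes, mounts, or env vars here.
			Envs: map[string]string{"TOY_DEVICE_IDS": strings.Join(cr.DevicesIDs, ",")},
		})
	}
	return resp, nil
}

// The remaining RPCs can be no-ops for this sketch.
func (p *toyPlugin) GetDevicePluginOptions(context.Context, *pluginapi.Empty) (*pluginapi.DevicePluginOptions, error) {
	return &pluginapi.DevicePluginOptions{}, nil
}
func (p *toyPlugin) GetPreferredAllocation(context.Context, *pluginapi.PreferredAllocationRequest) (*pluginapi.PreferredAllocationResponse, error) {
	return &pluginapi.PreferredAllocationResponse{}, nil
}
func (p *toyPlugin) PreStartContainer(context.Context, *pluginapi.PreStartContainerRequest) (*pluginapi.PreStartContainerResponse, error) {
	return &pluginapi.PreStartContainerResponse{}, nil
}

func main() {
	// 1. Serve the DevicePlugin gRPC service on our own Unix socket.
	sock := filepath.Join(pluginapi.DevicePluginPath, socketName)
	_ = os.Remove(sock) // clean up a stale socket from a previous run
	lis, err := net.Listen("unix", sock)
	if err != nil {
		panic(err)
	}
	srv := grpc.NewServer()
	pluginapi.RegisterDevicePluginServer(srv, &toyPlugin{})
	go srv.Serve(lis)

	// 2. Register with the kubelet by dialing its registration socket.
	conn, err := grpc.Dial("unix://"+pluginapi.KubeletSocket, grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		panic(err)
	}
	defer conn.Close()
	_, err = pluginapi.NewRegistrationClient(conn).Register(context.Background(), &pluginapi.RegisterRequest{
		Version:      pluginapi.Version,
		Endpoint:     socketName, // socket name relative to the device plugin directory
		ResourceName: resourceName,
	})
	if err != nil {
		panic(err)
	}
	select {} // keep serving ListAndWatch/Allocate until killed
}
```

Running something like this as a DaemonSet on each node is what lets the kubelet advertise the resource in the node's status, which is exactly what we'll see next with NVIDIA's plugin.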
Installing NVIDIA GPU Device Plugin
Let’s install the NVIDIA GPU Device Plugin on a cluster with the following configuration:
- 1 node with an NVIDIA H100 GPU
- 2 CPU-only nodes
Kubernetes cluster details
➜ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
pool-5g6iv5y25-syrfb Ready <none> 9m3s v1.34.1 10.100.0.4 162.243.117.11 Debian GNU/Linux 13 (trixie) 6.12.48+deb13-amd64 containerd://1.7.28
pool-5g6iv5y25-syrfw Ready <none> 9m4s v1.34.1 10.100.0.5 162.243.217.82 Debian GNU/Linux 13 (trixie) 6.12.48+deb13-amd64 containerd://1.7.28
pool-x7np0z8v0-syrfr Ready <none> 8m9s v1.34.1 10.100.0.6 192.241.185.173 Debian GNU/Linux 13 (trixie) 6.12.48+deb13-amd64 containerd://1.7.28
Installing the NVIDIA device plugin
NVIDIA’s device plugin DaemonSet can be installed onto the cluster with this command:
➜ kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.17.1/deployments/static/nvidia-device-plugin.yml
daemonset.apps/nvidia-device-plugin-daemonset created
In my case, since I created the Kubernetes cluster on DigitalOcean and selected a node with an NVIDIA H100 GPU, the plugin was pre-installed.
This is the node with NVIDIA H100 GPU attached:
➜ kubectl get nodes --show-labels | grep gpu
pool-x7np0z8v0-syrfr Ready <none> 11m v1.34.1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=gpu-h100x1-80gb,beta.kubernetes.io/os=linux,doks.digitalocean.com/gpu-brand=nvidia,doks.digitalocean.com/gpu-model=h100,doks.digitalocean.com/managed=true,doks.digitalocean.com/node-id=b2b789f1-e075-46a3-82db-226a26b34ddf,doks.digitalocean.com/node-pool-id=98da6c44-0cc5-462c-bf01-3a692f7dcdf8,doks.digitalocean.com/node-pool=pool-x7np0z8v0,doks.digitalocean.com/nvidia-dcgm-enabled=true,doks.digitalocean.com/version=1.34.1-do.0,failure-domain.beta.kubernetes.io/region=nyc2,kubernetes.io/arch=amd64,kubernetes.io/hostname=pool-x7np0z8v0-syrfr,kubernetes.io/os=linux,node.kubernetes.io/instance-type=gpu-h100x1-80gb,nvidia.com/gpu=1,region=nyc2,topology.kubernetes.io/region=nyc2
If we view the node’s status after installing the device plugin, we can see that the `nvidia.com/gpu` resource appears in the `status.capacity` and `status.allocatable` fields:
➜ kubectl describe node pool-x7np0z8v0-syrfr | grep -A 10 "Capacity\|Allocatable"
Capacity:
cpu: 20
ephemeral-storage: 742911020Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 247414548Ki
nvidia.com/gpu: 1
pods: 110
Allocatable:
cpu: 19850m
ephemeral-storage: 684666794899
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 235611924Ki
nvidia.com/gpu: 1
pods: 110
System Info:
Machine ID: d3a4687b27d344d7adb268f98ee8422e
System UUID: d3a4687b-27d3-44d7-adb2-68f98ee8422e
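As a side note, the same information can be read programmatically from the Node objects rather than through kubectl describe. Below is a small client-go sketch; the kubeconfig path handling and the hard-coded `nvidia.com/gpu` resource name are assumptions for illustration:

```go
package main

import (
	"context"
	"fmt"
	"os"
	"path/filepath"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load credentials from the default kubeconfig (assumes running outside the cluster).
	kubeconfig := filepath.Join(os.Getenv("HOME"), ".kube", "config")
	cfg, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	nodes, err := clientset.CoreV1().Nodes().List(context.Background(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, node := range nodes.Items {
		// Extended resources advertised by device plugins show up in status.allocatable.
		gpus := node.Status.Allocatable[corev1.ResourceName("nvidia.com/gpu")]
		fmt.Printf("%s: allocatable nvidia.com/gpu = %s\n", node.Name, gpus.String())
	}
}
```

Extended resources advertised by any device plugin land in the same `status.capacity` and `status.allocatable` maps, so the same approach works for non-GPU devices too.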
Testing GPU access with a workload
Now that the Device Plugin has identified and advertised the GPU, let’s test access to it by creating a pod that requests access to the GPU:
➜ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test-device-plugin
spec:
  restartPolicy: OnFailure
  containers:
  - name: cuda-test
    image: nvidia/cuda:11.0.3-base-ubuntu20.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1
  tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
EOF
pod/gpu-test-device-plugin created
Note the toleration for the `nvidia.com/gpu` taint, which was applied by the DigitalOcean Kubernetes cluster provisioner. It is important to schedule this workload with the toleration; otherwise the pod cannot be scheduled on GPU nodes.
Managed Kubernetes offerings (GKE, EKS, AKS, DOKS) vary in how they apply such taints, so it is important to account for the taints, nodeSelectors, and node affinity labels applied by your provider.
Let’s verify that the pod got scheduled correctly:
➜ kubectl get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
gpu-test-device-plugin 0/1 Completed 0 10m 10.108.1.38 pool-x7np0z8v0-syrfr <none> <none>
This output shows the pod successfully getting scheduled on the GPU-attached node.
Now let’s verify that the pod ran successfully. The pod was set up to run `nvidia-smi`, the command-line utility for monitoring and managing NVIDIA GPUs.
We can check the pod's logs to see whether the command ran and reported the GPU's specifications and utilization:
➜ kubectl logs gpu-test-device-plugin
Mon Nov 24 02:09:29 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.57.08 Driver Version: 575.57.08 CUDA Version: 12.9 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA H100 80GB HBM3 On | 00000000:00:09.0 Off | 0 |
| N/A 31C P0 71W / 700W | 0MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
And there we have it! The pod successfully ran on the GPU-attached node, as we can see the H100 GPU information in the pod logs 🎉
Device Plugin limitations
While Device Plugins offer a great mechanism for managing node-local hardware devices in a Kubernetes cluster, they have limitations that keep them from covering every use case. One such limitation is the inability to share devices among pods: devices are requested and allocated as whole units. So if your use case requires sharing the same device among different pods, device plugins might not be the right choice.
Partial allocation of devices such as GPUs is possible, but only when the vendor exposes the partitions as separate, static devices (for example, NVIDIA MIG instances). In these cases, the device plugin simply advertises each MIG slice as if it were its own GPU. However, the plugin cannot create or modify these slices dynamically in response to pod requests. If your workload needs finer-grained sharing such as GPU time-slicing, dynamic MIG creation, configurable accelerator parameters, or any resource that must be provisioned per pod, Device Plugins cannot express those workflows. For such use cases, Kubernetes’ Dynamic Resource Allocation (DRA) framework is a better fit because it allows resources to be managed dynamically rather than treated as fixed, node-level hardware.
Conclusion
In this post, we explored why Kubernetes relies on kubelet extensions such as Device Plugins and DRA to expose hardware resources cleanly and flexibly. We walked through how the NVIDIA Device Plugin is installed and used in a real cluster, and reviewed scenarios where Device Plugins fall short. In a follow-up article, we’ll dive into Dynamic Resource Allocation (DRA) and discuss how it addresses many of these limitations while enabling more advanced accelerator orchestration on Kubernetes.
