Kubernetes has become the de facto platform for orchestrating workloads on a cluster of nodes. Its powerful built-in scheduler considers available resources such as CPU and memory and places workloads, based on their requested resource amounts, on nodes that can satisfy those requests. While CPU and memory are required by all workloads, some workloads need other resources too, such as Network Interface Cards (NICs) for connecting to a particular network, or GPUs for running inference. Making these hardware resources available on the nodes is not enough: users need to be able to see how many of these devices exist and how much of them they can request. Along with this visibility for workload operators, the Kubernetes scheduler also needs to see capacity versus allocatable amounts for these devices in order to make workload-scheduling decisions. One way of doing this is to extend the Kubernetes API server components. However, that approach doesn't scale well and isn't flexible if we want to keep adding new kinds of devices to nodes or modify the behavior of an existing resource provider. Hence, Kubernetes offers two frameworks for orchestrating workloads based on hardware device requests: Device Plugins and Dynamic Resource Allocation (DRA). This post focuses solely on Device Plugins. In a future article, I'll go deeper into Dynamic Resource Allocation (DRA) and compare both models.

## Device Plugins

Kubernetes has a concept of kubelet plugins: extensions that allow the kubelet to discover and manage resources on a node beyond the standard container resources. A Device Plugin is a kubelet plugin that lets you advertise hardware devices as resources.

## How Device Plugins Work

A device plugin runs on every node, typically as a DaemonSet, identifies the devices present on the node, and advertises these node-local devices. The Device Plugin framework follows a specific lifecycle (a minimal code sketch follows the list):

- **Registration**: The device plugin registers with the kubelet over a gRPC connection, sending its Unix socket path and the resource name it manages (e.g., `nvidia.com/gpu`)
- **Discovery**: Following successful registration, the device plugin sends the kubelet a list of devices via the `ListAndWatch()` RPC call. This is a streaming connection that continuously reports device health status
- **Advertisement**: The kubelet receives the device list and advertises these resources to the API server by updating the node's status fields (`capacity` and `allocatable`)
- **Allocation**: When a pod requesting GPU resources is scheduled to the node, the kubelet calls the device plugin's `Allocate()` method to assign specific device IDs to the container
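To make these steps concrete, here is a minimal, illustrative sketch of a device plugin written in Go against the kubelet's `v1beta1` device plugin gRPC API (`k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1`). The resource name `example.com/toy-device`, the socket name, and the device IDs are made-up placeholders, and this is not how the NVIDIA plugin is implemented; it only shows the shape of the Registration, ListAndWatch, and Allocate calls. A production plugin also handles real device discovery, health monitoring, kubelet restarts, and device-specific mounts and environment variables.

```go
package main

import (
	"context"
	"log"
	"net"
	"os"
	"path/filepath"
	"strings"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	pluginapi "k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1"
)

const (
	resourceName = "example.com/toy-device" // hypothetical resource name
	socketName   = "toy-device.sock"        // plugin's own socket under the kubelet plugin dir
)

// toyPlugin implements the kubelet's DevicePluginServer interface for two fake devices.
type toyPlugin struct{}

func (p *toyPlugin) GetDevicePluginOptions(context.Context, *pluginapi.Empty) (*pluginapi.DevicePluginOptions, error) {
	return &pluginapi.DevicePluginOptions{}, nil
}

// ListAndWatch streams the device list to the kubelet (Discovery step).
func (p *toyPlugin) ListAndWatch(_ *pluginapi.Empty, stream pluginapi.DevicePlugin_ListAndWatchServer) error {
	devs := []*pluginapi.Device{
		{ID: "toy-0", Health: pluginapi.Healthy},
		{ID: "toy-1", Health: pluginapi.Healthy},
	}
	if err := stream.Send(&pluginapi.ListAndWatchResponse{Devices: devs}); err != nil {
		return err
	}
	// Block until the kubelet closes the stream; a real plugin would re-send the
	// device list here whenever device health changes.
	<-stream.Context().Done()
	return nil
}

// Allocate tells the kubelet how to expose the assigned device IDs to the container (Allocation step).
func (p *toyPlugin) Allocate(_ context.Context, req *pluginapi.AllocateRequest) (*pluginapi.AllocateResponse, error) {
	resp := &pluginapi.AllocateResponse{}
	for _, creq := range req.ContainerRequests {
		resp.ContainerResponses = append(resp.ContainerResponses, &pluginapi.ContainerAllocateResponse{
			// A real plugin would add device nodes, mounts, or env vars
			// (e.g. NVIDIA_VISIBLE_DEVICES) for the assigned devices here.
			Envs: map[string]string{"TOY_DEVICE_IDS": strings.Join(creq.DevicesIDs, ",")},
		})
	}
	return resp, nil
}

func (p *toyPlugin) GetPreferredAllocation(context.Context, *pluginapi.PreferredAllocationRequest) (*pluginapi.PreferredAllocationResponse, error) {
	return &pluginapi.PreferredAllocationResponse{}, nil
}

func (p *toyPlugin) PreStartContainer(context.Context, *pluginapi.PreStartContainerRequest) (*pluginapi.PreStartContainerResponse, error) {
	return &pluginapi.PreStartContainerResponse{}, nil
}

func main() {
	// Serve the plugin's gRPC API on its own Unix socket under /var/lib/kubelet/device-plugins/.
	socketPath := filepath.Join(pluginapi.DevicePluginPath, socketName)
	_ = os.Remove(socketPath)
	lis, err := net.Listen("unix", socketPath)
	if err != nil {
		log.Fatal(err)
	}
	srv := grpc.NewServer()
	pluginapi.RegisterDevicePluginServer(srv, &toyPlugin{})
	go func() { log.Fatal(srv.Serve(lis)) }()

	// Registration step: tell the kubelet our socket name and the resource name we manage.
	conn, err := grpc.Dial("unix://"+pluginapi.KubeletSocket, grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	if _, err := pluginapi.NewRegistrationClient(conn).Register(context.Background(), &pluginapi.RegisterRequest{
		Version:      pluginapi.Version,
		Endpoint:     socketName,
		ResourceName: resourceName,
	}); err != nil {
		log.Fatal(err)
	}
	log.Printf("registered %s with the kubelet", resourceName)
	select {} // keep serving ListAndWatch/Allocate until the process is stopped
}
```

A plugin like this is packaged into a container image and deployed as a DaemonSet, so an instance runs on every node that has the hardware, which is exactly how the NVIDIA plugin below is shipped.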
Once the API server receives the request to update `node.status` with the device capacity, it persists those changes in etcd. Cluster users can then view the resource availability in node statuses and request the resource through their pod specs.

## Installing the NVIDIA GPU Device Plugin

Let's install the NVIDIA GPU device plugin on a cluster with the following configuration:

- 1 node with an NVIDIA H100 GPU
- 2 CPU-only nodes

Kubernetes cluster details:

```
➜ kubectl get nodes -o wide
NAME                   STATUS   ROLES    AGE    VERSION   INTERNAL-IP   EXTERNAL-IP       OS-IMAGE                       KERNEL-VERSION        CONTAINER-RUNTIME
pool-5g6iv5y25-syrfb   Ready    <none>   9m3s   v1.34.1   10.100.0.4    162.243.117.11    Debian GNU/Linux 13 (trixie)   6.12.48+deb13-amd64   containerd://1.7.28
pool-5g6iv5y25-syrfw   Ready    <none>   9m4s   v1.34.1   10.100.0.5    162.243.217.82    Debian GNU/Linux 13 (trixie)   6.12.48+deb13-amd64   containerd://1.7.28
pool-x7np0z8v0-syrfr   Ready    <none>   8m9s   v1.34.1   10.100.0.6    192.241.185.173   Debian GNU/Linux 13 (trixie)   6.12.48+deb13-amd64   containerd://1.7.28
```

### Installing the NVIDIA device plugin

NVIDIA's device plugin DaemonSet can be installed onto the cluster with this command:

```
➜ kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.17.1/deployments/static/nvidia-device-plugin.yml
daemonset.apps/nvidia-device-plugin-daemonset created
```

Since I created the Kubernetes cluster on DigitalOcean and selected a node with an NVIDIA H100 GPU, the plugin was pre-installed.
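If you install the DaemonSet yourself, it is worth confirming that it has rolled out and that a plugin pod is running on the GPU node before checking the node status. Assuming the static manifest's default `kube-system` namespace and the DaemonSet name shown in the output above, something like this does the job:

```
kubectl -n kube-system rollout status daemonset/nvidia-device-plugin-daemonset
kubectl -n kube-system get pods -o wide | grep nvidia-device-plugin
```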
This is the node with the NVIDIA H100 GPU attached:

```
➜ kubectl get nodes --show-labels | grep gpu
pool-x7np0z8v0-syrfr   Ready   <none>   11m   v1.34.1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=gpu-h100x1-80gb,beta.kubernetes.io/os=linux,doks.digitalocean.com/gpu-brand=nvidia,doks.digitalocean.com/gpu-model=h100,doks.digitalocean.com/managed=true,doks.digitalocean.com/node-id=b2b789f1-e075-46a3-82db-226a26b34ddf,doks.digitalocean.com/node-pool-id=98da6c44-0cc5-462c-bf01-3a692f7dcdf8,doks.digitalocean.com/node-pool=pool-x7np0z8v0,doks.digitalocean.com/nvidia-dcgm-enabled=true,doks.digitalocean.com/version=1.34.1-do.0,failure-domain.beta.kubernetes.io/region=nyc2,kubernetes.io/arch=amd64,kubernetes.io/hostname=pool-x7np0z8v0-syrfr,kubernetes.io/os=linux,node.kubernetes.io/instance-type=gpu-h100x1-80gb,nvidia.com/gpu=1,region=nyc2,topology.kubernetes.io/region=nyc2
```

If we view the node's status after installing the device plugin, we can see that the `nvidia.com/gpu` resource appears in the `status.Capacity` and `status.Allocatable` fields:

```
➜ kubectl describe node pool-x7np0z8v0-syrfr | grep -A 10 "Capacity\|Allocatable"
Capacity:
  cpu:                20
  ephemeral-storage:  742911020Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             247414548Ki
  nvidia.com/gpu:     1
  pods:               110
Allocatable:
  cpu:                19850m
  ephemeral-storage:  684666794899
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             235611924Ki
  nvidia.com/gpu:     1
  pods:               110
System Info:
  Machine ID:     d3a4687b27d344d7adb268f98ee8422e
  System UUID:    d3a4687b-27d3-44d7-adb2-68f98ee8422e
```
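If you only want the GPU counts rather than the full `describe` output, a JSONPath query against the node status also works. A couple of hedged examples, using the GPU node from this cluster (note that dots inside the resource name must be escaped):

```
# Print the whole allocatable map for the GPU node
kubectl get node pool-x7np0z8v0-syrfr -o jsonpath='{.status.allocatable}'

# Print just the allocatable GPU count
kubectl get node pool-x7np0z8v0-syrfr -o jsonpath='{.status.allocatable.nvidia\.com/gpu}'
```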
## Testing GPU access with a workload

Now that the device plugin has identified and advertised the GPU, let's test access to it by creating a pod that requests the GPU:

```
➜ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test-device-plugin
spec:
  restartPolicy: OnFailure
  containers:
  - name: cuda-test
    image: nvidia/cuda:11.0.3-base-ubuntu20.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1
  tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
EOF
pod/gpu-test-device-plugin created
```

Note the toleration for the `nvidia.com/gpu` taint, which the DigitalOcean Kubernetes provisioner applies to GPU nodes. Without this toleration, the pod cannot be scheduled onto the GPU node. Managed offerings (GKE, EKS, AKS, DOKS) vary in how they taint and label GPU nodes, so always check the taints, nodeSelectors, and node affinity labels your provider applies.
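If you want to pin the pod to GPU nodes explicitly rather than relying only on the `nvidia.com/gpu` resource request, you can add a nodeSelector on one of the labels visible in the `--show-labels` output earlier. This is an illustrative fragment for this DOKS cluster; other providers use different labels (GKE, for example, labels GPU nodes with `cloud.google.com/gke-accelerator`):

```yaml
# Fragment of the pod spec above: restrict scheduling to NVIDIA GPU nodes
# using a label DigitalOcean already applies to the node.
spec:
  nodeSelector:
    doks.digitalocean.com/gpu-brand: nvidia
  tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
```

Strictly speaking, the `nvidia.com/gpu` limit already restricts the pod to nodes that advertise that extended resource; a nodeSelector or node affinity becomes useful when you need a specific GPU model or node pool.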
Let's verify that the pod got scheduled correctly:

```
➜ device-plugin kubectl get po -o wide
NAME                     READY   STATUS      RESTARTS   AGE   IP            NODE                   NOMINATED NODE   READINESS GATES
gpu-test-device-plugin   0/1     Completed   0          10m   10.108.1.38   pool-x7np0z8v0-syrfr   <none>           <none>
```

This output shows that the pod was scheduled on the GPU-attached node. Now let's verify that it ran successfully. The pod runs `nvidia-smi`, a command-line utility for monitoring and managing NVIDIA GPUs, so we can check the pod logs to see whether the command ran and reported the GPU's specs and utilization:

```
➜ kubectl logs gpu-test-device-plugin
Mon Nov 24 02:09:29 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.57.08              Driver Version: 575.57.08      CUDA Version: 12.9     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H100 80GB HBM3          On  |   00000000:00:09.0 Off |                    0 |
| N/A   31C    P0             71W /  700W |       0MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
```

And there we have it! The pod successfully ran on the GPU-attached node, and the H100's details show up in the pod logs 🎉

## Device Plugin limitations

While Device Plugins offer a great mechanism for managing node-local hardware devices in a Kubernetes cluster, they have limitations that make them unsuitable for some use cases. One such limitation is the inability to share devices among pods: devices are requested and allocated as whole units. So if your use case requires sharing the same device among different pods, device plugins might not be the right choice.

Partial allocation of devices such as GPUs is possible, but only when the vendor exposes the partitions as separate, static devices (for example, NVIDIA MIG instances). In these cases, the device plugin simply advertises each MIG slice as if it were its own GPU. However, the plugin cannot create or modify these slices dynamically in response to pod requests. If your workload needs finer-grained sharing such as GPU time-slicing, dynamic MIG creation, configurable accelerator parameters, or any resource that must be provisioned per pod, Device Plugins cannot express those workflows. For such use cases, Kubernetes' Dynamic Resource Allocation (DRA) framework is a better fit because it allows resources to be managed dynamically rather than treated as fixed, node-level hardware.

## Conclusion

In this post, we explored why Kubernetes relies on kubelet extensions such as Device Plugins and DRA to expose hardware resources cleanly and flexibly. We walked through how the NVIDIA Device Plugin is installed and used in a real cluster, and reviewed scenarios where Device Plugins fall short. In a follow-up article, we'll dive into Dynamic Resource Allocation (DRA) and discuss how it addresses many of these limitations while enabling more advanced accelerator orchestration on Kubernetes.