27.06.2026

Kubernetes DRA for GPU Scheduling

AI workloads are pushing more SRE teams into GPU and accelerator scheduling. The older device plugin model works, but it is rigid when clusters have mixed hardware or jobs need a specific class of accelerator. Kubernetes Dynamic Resource Allocation reached general availability in v1.34 and remains stable in v1.35, giving platform teams a more expressive API for these cases.

What Is DRA?

Dynamic Resource Allocation, or DRA, lets workloads request attached devices through dedicated Kubernetes API objects instead of opaque counted resources in a container's limits (for example, nvidia.com/gpu: 1). Drivers publish the devices available on each node, cluster admins define the categories workloads may use, and workloads claim the class of hardware they need.

The model feels closer to dynamic storage provisioning than classic GPU requests. A Pod references an existing ResourceClaim or a ResourceClaimTemplate; the scheduler finds matching devices in ResourceSlice data, allocates them to the claim, and places the Pod on a node that can access the allocated device.
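
To make that flow concrete, here is a minimal sketch of a ResourceClaimTemplate and a Pod that consumes it, written to a local file for review. It assumes the GA resource.k8s.io/v1 API. The names single-gpu, gpu-job, and high-memory-gpu, the file name, and the container image are hypothetical placeholders, not values from any real driver.

```shell
# Write a hypothetical ResourceClaimTemplate and Pod to a local file.
cat > dra-pod-example.yaml <<'EOF'
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: single-gpu                       # hypothetical name
spec:
  spec:
    devices:
      requests:
      - name: gpu
        exactly:
          deviceClassName: high-memory-gpu   # hypothetical DeviceClass
---
apiVersion: v1
kind: Pod
metadata:
  name: gpu-job                          # hypothetical name
spec:
  restartPolicy: Never
  resourceClaims:
  - name: gpu
    resourceClaimTemplateName: single-gpu
  containers:
  - name: main
    image: registry.example.com/trainer:latest   # placeholder image
    resources:
      claims:
      - name: gpu                        # must match spec.resourceClaims[].name
EOF

# Validate against the API server without creating anything (needs cluster access):
# kubectl apply --dry-run=server -f dra-pod-example.yaml
```

Kubernetes creates one ResourceClaim per Pod from the template, so each replica of a workload gets its own device rather than sharing one claim.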

Key Concepts

DRA adds a few objects worth knowing:

  • DeviceClass: a category of devices that workloads can request, such as cost-optimized GPU, high-memory GPU, FPGA, NIC, or inference accelerator.
  • ResourceClaim: a concrete request for access to a device.
  • ResourceClaimTemplate: a template Kubernetes can use to create per-Pod claims for replicated workloads.
  • ResourceSlice: driver-published inventory that tells Kubernetes where devices exist and what attributes they expose.

The practical win is filtering. DRA supports CEL-based selectors, so teams can match vendor, model, memory profile, topology, or driver-specific capabilities.
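
As an illustration of CEL-based filtering, the sketch below writes a DeviceClass that only matches devices from one driver with at least 40Gi of memory. The driver name gpu.example.com, the class name high-memory-gpu, and the memory capacity key are assumptions for the example; real attribute and capacity names depend on what your driver publishes in its ResourceSlices.

```shell
# Write a hypothetical DeviceClass with a CEL selector to a local file.
cat > deviceclass-example.yaml <<'EOF'
apiVersion: resource.k8s.io/v1
kind: DeviceClass
metadata:
  name: high-memory-gpu        # hypothetical class name
spec:
  selectors:
  - cel:
      # Both the driver name and the "memory" capacity key are assumed here.
      expression: |-
        device.driver == "gpu.example.com" &&
        device.capacity["gpu.example.com"].memory.compareTo(quantity("40Gi")) >= 0
EOF

# Validate against the API server without creating anything (needs cluster access):
# kubectl apply --dry-run=server -f deviceclass-example.yaml
```

Keeping the selector in the DeviceClass means application manifests only reference the class name, while the platform team owns the hardware criteria.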

Why SRE Teams Should Care

DRA helps teams:

  • separate hardware policy from application manifests
  • expose safer device classes to app teams
  • improve scheduling for mixed GPU pools
  • debug allocations through Kubernetes API objects

For AI platforms, that means fewer custom schedulers and fewer admission hacks. A batch controller, inference platform, or agent runtime can request hardware while Kubernetes keeps allocation state visible.

Quick Checks

Check whether the API is available:

kubectl get deviceclasses
kubectl get resourceslices

If deviceclasses does not exist, confirm that your control plane exposes the resource.k8s.io API group.
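
One way to run that check is to list the resources served under resource.k8s.io and report any core DRA kind that is absent. The helper name missing_dra_resources is hypothetical; the sketch assumes kubectl prints names in the form <resource>.resource.k8s.io with -o name.

```shell
# Hypothetical helper: reads resource names (one per line) on stdin, as printed by
# `kubectl api-resources --api-group=resource.k8s.io -o name`, and prints each
# core DRA resource that is not served.
missing_dra_resources() {
  _served="$(cat)"
  for r in deviceclasses resourceclaims resourceclaimtemplates resourceslices; do
    printf '%s\n' "$_served" | grep -q "^${r}\.resource\.k8s\.io$" || echo "$r"
  done
}

# Only query the cluster when kubectl is available.
if command -v kubectl >/dev/null 2>&1; then
  kubectl api-resources --api-group=resource.k8s.io -o name | missing_dra_resources
fi
```

Empty output means all four resources are served; any name printed points at an API that is disabled or not yet available on that control plane.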

A minimal rollout usually looks like this:

# 1. Install a DRA-compatible device driver.
# 2. Confirm the driver publishes ResourceSlices.
kubectl get resourceslices

# 3. Create or review DeviceClasses.
kubectl get deviceclasses -o wide

# 4. Let workloads reference ResourceClaims or templates.
kubectl get resourceclaims -A

Operational Tips

Treat DeviceClass definitions as platform policy. Keep names stable, document what each class means, and avoid exposing raw vendor details to application teams.

Alert on missing ResourceSlices, unhealthy drivers, and pending Pods that reference unbound claims. Those are the signals on-call engineers need when GPU jobs stop scheduling.
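
A rough starting point for the unbound-claim signal is to list ResourceClaims that have no allocated device. The helper name unallocated_claims is hypothetical, and the sketch assumes kubectl renders a missing status.allocation as <none> in custom-columns output; verify that against your kubectl version before wiring it into alerting.

```shell
# Hypothetical helper: reads "NAMESPACE NAME DEVICES" rows on stdin and prints
# namespace/name for claims whose DEVICES column is empty (<none>).
unallocated_claims() {
  awk '$3 == "<none>" { print $1 "/" $2 }'
}

# Only query the cluster when kubectl is available.
if command -v kubectl >/dev/null 2>&1; then
  kubectl get resourceclaims -A --no-headers \
    -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,DEVICES:.status.allocation.devices.results[*].device' \
    | unallocated_claims
fi
```

A claim that stays in this list while its Pod is Pending usually means no published device satisfies the request, which is the point to check driver health and ResourceSlices.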

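Be careful with version skew. The stable core API is served as resource.k8s.io/v1, but earlier releases served v1beta1 and v1beta2, and some DRA extensions have moved through alpha and beta stages behind feature gates. Drivers, workload manifests, and cluster versions must line up.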
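
A simple skew check compares the resource.k8s.io apiVersion values used in your manifests with the versions the cluster actually serves. The helper name unserved_dra_versions and the file name dra-manifests.yaml are hypothetical; this is a sketch, not a substitute for the driver's own compatibility matrix.

```shell
# Hypothetical helper: $1 is a manifest file, stdin is `kubectl api-versions` output.
# Prints each resource.k8s.io apiVersion used in the manifest that is not served.
unserved_dra_versions() {
  _served="$(cat)"
  sed -n 's|^apiVersion: *\(resource\.k8s\.io/[a-z0-9]*\).*|\1|p' "$1" | sort -u |
    while read -r v; do
      printf '%s\n' "$_served" | grep -qxF "$v" || echo "$v"
    done
}

# Only query the cluster when kubectl and the manifest file are both present.
if command -v kubectl >/dev/null 2>&1 && [ -f dra-manifests.yaml ]; then
  kubectl api-versions | unserved_dra_versions dra-manifests.yaml
fi
```

Any version printed is one the API server will reject, so it is worth running before a cluster upgrade or a driver rollout.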

Conclusion

DRA is a more reliable contract between workloads, schedulers, and scarce hardware. For teams running AI, ML, packet processing, or specialized accelerator workloads, it is worth testing before the next capacity crunch.

Akmatori helps SRE teams automate incident triage, collect operational context, and coordinate reliable response workflows with AI agents. Pair Akmatori with infrastructure from Gcore when you need resilient cloud, edge, and GPU capacity for production systems.
