28.04.2026

Kubernetes v1.36 Improves Suspended Job Scheduling

The Kubernetes v1.36 release includes mutable pod resources for suspended Jobs, a beta feature that matters to any SRE team running batch, CI, or machine learning workloads. It removes a frustrating limitation in the Job API and makes queue-based scheduling much easier to automate.

What Is Mutable Pod Resources for Suspended Jobs?

Before this change, a Job's pod resource requests and limits were effectively locked in once the Job was created. If a queue controller later decided the workload should run with fewer CPUs, less memory, or fewer GPUs, the usual workaround was deleting and recreating the Job.

Kubernetes v1.36 relaxes that rule for suspended Jobs. While spec.suspend: true is set, controllers and operators can update resource requests and limits in both containers and init containers. That means you can create a Job early, keep its metadata and status intact, then tune the pod shape right before execution.

Why SRE Teams Should Care

This is especially useful for clusters with bursty batch traffic or scarce accelerators. Tools like Kueue can admit a suspended Job, inspect actual cluster conditions, and reduce or raise resource requests before resuming it.

The result is better utilization and less operational glue. Instead of rebuilding Jobs when capacity shifts, teams can adapt the original object in place. That is cleaner for audits, safer for automation, and easier to integrate with platform workflows that track Job history.

Installation

If your cluster runs Kubernetes v1.36, the MutablePodResourcesForSuspendedJobs feature gate is enabled by default, so no extra setup is needed. On v1.35, you must explicitly enable the feature gate on the API server before testing it.
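
For clusters bootstrapped with kubeadm, one way to turn the gate on for v1.35 is through the API server's extra arguments. The fragment below is a sketch under that assumption. The kubeadm.k8s.io/v1beta4 configuration format is not covered in this article, and other installers or managed services may expose feature gates differently or not at all.

```yaml
# Sketch: kubeadm ClusterConfiguration fragment.
# Assumes a kubeadm-managed control plane; adapt for other installers.
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
apiServer:
  extraArgs:
    # Rendered as: kube-apiserver --feature-gates=MutablePodResourcesForSuspendedJobs=true
    - name: feature-gates
      value: "MutablePodResourcesForSuspendedJobs=true"
```

On a control plane you manage by hand, the equivalent is adding --feature-gates=MutablePodResourcesForSuspendedJobs=true to the kube-apiserver command line.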

Usage

A simple pattern is to create a Job in a suspended state, then adjust resources before resuming it.

apiVersion: batch/v1
kind: Job
metadata:
  name: training-job
spec:
  suspend: true
  template:
    spec:
      containers:
        - name: trainer
          image: example.com/trainer:latest
          resources:
            requests:
              cpu: "4"
              memory: "16Gi"
            limits:
              cpu: "4"
              memory: "16Gi"
      restartPolicy: Never

Once a controller or operator has updated the resource block, it can resume the Job with:

kubectl patch job training-job -p '{"spec":{"suspend":false}}'
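
The resource update itself is an ordinary patch against the Job's pod template, applied before the resume command above. As a sketch, the patch file below halves the trainer container's CPU and memory. The file name resources-patch.yaml and the new values are illustrative assumptions, not taken from any particular controller.

```yaml
# resources-patch.yaml (hypothetical file name)
# Strategic merge patch: containers are matched by name, so only
# the "trainer" container's resources change. Values are illustrative.
spec:
  template:
    spec:
      containers:
        - name: trainer
          resources:
            requests:
              cpu: "2"
              memory: "8Gi"
            limits:
              cpu: "2"
              memory: "8Gi"
```

While the Job is still suspended, this could be applied with kubectl patch job training-job --type=strategic --patch-file resources-patch.yaml, followed by the resume patch.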

Operational Tips

Keep in mind that Kubernetes only accepts these mutations while the Job is suspended. If you suspend a Job after it has already started, all active Pods must fully terminate before the API will accept resource changes. Also note that DRA resourceClaimTemplates stay immutable, so specialized hardware workflows may still need extra handling.

Conclusion

Mutable pod resources for suspended Jobs is the kind of feature SRE teams appreciate because it removes friction from real production scheduling. It helps batch controllers make smarter placement decisions without forcing a delete-and-recreate cycle every time capacity changes.

If you are building reliable, AI-assisted operations, Akmatori helps teams automate infrastructure workflows and incident response. Backed by Gcore, we are building tools for modern SRE and platform teams.
