Skip to main content
05.05.2026

Kubernetes v1.36 Cuts Controller Staleness Risk

head-image

The official Kubernetes v1.36 controller staleness announcement is one of the more important release notes for operators who run busy clusters. Controller bugs are painful enough on their own. They are worse when the root cause is a cache that has fallen behind the API server.

What Changed in Kubernetes v1.36?

Kubernetes controllers rely on informer caches so they can reconcile quickly without hammering the API server. The catch is that a controller can sometimes make decisions using an outdated cache view, especially after restarts, watch interruptions, or heavy contention.

In v1.36, client-go adds AtomicFIFO queue handling plus LastStoreSyncResourceVersion() so controllers can reason about how fresh their local cache really is. On top of that, kube-controller-manager now uses the new logic in four high-churn controllers by default:

  • DaemonSet
  • StatefulSet
  • ReplicaSet
  • Job

If a controller sees that its cache has not yet caught up to what it already wrote to the API server, it can skip acting instead of reconciling from stale state.

Why SRE Teams Should Care

This is a reliability feature, not just an implementation detail. A stale controller can create extra churn, delay the right action, or take the wrong action altogether. In production, that can look like pods being recreated too aggressively, status lag that confuses on-call engineers, or rollout behavior that feels random under pressure.

Kubernetes v1.36 also adds better observability for this path. The new stale_sync_skips_total metric shows how often a controller avoids work because its cache is stale, and store_resource_version helps expose what each informer has actually seen.

Installation

There is no separate component to install. The feature is built into Kubernetes v1.36 and enabled by default for supported controllers. After upgrading, you can verify the new metrics from kube-controller-manager:

kubectl -n kube-system port-forward deploy/kube-controller-manager 10257:10257
curl -ks https://127.0.0.1:10257/metrics | grep -E 'stale_sync_skips_total|store_resource_version'

Operational Tips

Watch these metrics during controller-manager restarts, API server disruptions, and large deployment waves. If stale_sync_skips_total rises during cluster stress, that is often a good sign. It means the controller is waiting for fresher state instead of making a risky guess.

If you build custom controllers, the new ConsistencyStore pattern in client-go is worth studying too. This release is a strong hint that read-your-own-writes semantics are becoming a more practical default in Kubernetes.

Conclusion

Kubernetes v1.36 does not eliminate controller bugs, but it does reduce a nasty class of cache-related mistakes in some of the busiest control plane paths. For SRE teams, that is exactly the kind of low-drama improvement that pays off during real incidents.

If you are building reliable, AI-assisted operations, Akmatori helps teams automate infrastructure workflows and incident response. Backed by Gcore, we are building tools for modern SRE and platform teams.

Automate incident response and prevent on-call burnout with AI-driven agents!