25.06.2026

kagent: Kubernetes-Native AI Agents

AI agents are moving from local experiments into shared operations platforms. kagent is worth watching because it treats agents as Kubernetes-native workloads instead of a separate hosted control plane. That matters for platform teams that already use kubectl, RBAC, Helm, GitOps, Prometheus, and OpenTelemetry to manage production systems.

The project describes itself as a Kubernetes-native framework for building and managing AI agents. Its current positioning is straightforward: agents, tools, and model configuration should be declarative resources that can be reviewed, rolled out, traced, and constrained.

What Is kagent?

kagent defines agents as Kubernetes custom resources. An agent combines a system prompt, a set of tools, references to other agents it can call, and an LLM configuration. Tool servers are also Kubernetes resources, which means the same tool definition can be reused across multiple agents.

The project supports multiple LLM providers, including OpenAI, Azure OpenAI, Anthropic, Google Vertex AI, Ollama, and custom providers behind AI gateways. It also includes MCP tool integrations for Kubernetes, Istio, Helm, Argo, Prometheus, Grafana, Cilium, and related cloud-native systems.

Key Features

  • Agents as CRDs: define operational agents in YAML, review them in pull requests, and roll them out with GitOps.
  • MCP tool servers: expose Kubernetes, observability, deployment, and platform tools through reusable tool resources.
  • OpenTelemetry tracing: inspect prompts, tool calls, latency, and agent behavior in the same telemetry pipeline as the rest of the stack.
  • Human approval gates: keep high-impact actions behind explicit operator review instead of letting automation jump straight to remediation.
  • Bring your own stack: run different LLM providers and agent frameworks without forcing every team into one vendor path.

Installation

For a quick local evaluation, kagent documents a bootstrap script:

curl https://raw.githubusercontent.com/kagent-dev/kagent/refs/heads/main/scripts/get-kagent | bash

The project lists kind, helm, and kubectl as prerequisites for the quick start. Production teams should translate that into a reviewed Helm release, a pinned chart version, scoped namespaces, and explicit RBAC before letting agents touch incident systems.
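As one illustration of what "scoped namespaces and explicit RBAC" can look like, here is a minimal read-only sketch using the standard Kubernetes RBAC API. The namespace `payments`, the ServiceAccount `incident-triage`, and the assumption that the agent runs under that ServiceAccount in a `kagent` namespace are all hypothetical; adjust them to however your installation actually runs agents.

```yaml
# Hypothetical names throughout: "payments" is the namespace the agent may
# inspect, "incident-triage" is the ServiceAccount the agent is assumed to use.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: agent-readonly
  namespace: payments
rules:
  # Core resources: read-only verbs, and no access to Secrets.
  - apiGroups: [""]
    resources: ["pods", "pods/log", "events", "services"]
    verbs: ["get", "list", "watch"]
  # Workload controllers: read-only as well.
  - apiGroups: ["apps"]
    resources: ["deployments", "replicasets", "statefulsets"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: agent-readonly
  namespace: payments
subjects:
  - kind: ServiceAccount
    name: incident-triage
    namespace: kagent
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: agent-readonly
```

With names like these, `kubectl auth can-i delete pods -n payments --as=system:serviceaccount:kagent:incident-triage` should answer `no`, which is a quick way to confirm the agent cannot mutate anything in that namespace.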

Operational Workflow

A practical first agent should be read-only. Give it access to Kubernetes events, Prometheus queries, Grafana links, and runbook search. Then ask it to assemble incident context, not execute fixes.

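# Simplified sketch of the idea, not a verified manifest. Field names
# (especially "approvals") are illustrative and may differ from the Agent CRD
# in your cluster; check the installed schema with: kubectl explain agent.spec
apiVersion: kagent.dev/v1alpha1
kind: Agent
metadata:
  name: incident-triage
spec:
  # Read-only tools the agent may call without review.
  tools:
    - kubernetes-readonly
    - prometheus-query
    - runbook-search
  # Actions that change state stay behind explicit operator approval.
  approvals:
    requiredFor:
      - create_pull_request
      - restart_workload
      - update_helm_release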

That pattern gives operators a low-risk way to evaluate whether the agent is useful. The agent can gather evidence, build a timeline, and suggest next steps while humans keep control over changes.

Operational Tips

Treat agent configuration like production infrastructure. Pin model providers, separate read-only and write-capable tools, and keep destructive actions behind approval. Export OpenTelemetry traces from day one so you can answer which prompt called which tool, how long it took, and what evidence supported the recommendation.
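To make the tracing advice concrete, here is a minimal OpenTelemetry Collector pipeline that receives OTLP traces and forwards them to a tracing backend. It is a sketch under assumptions: the backend address `tempo.observability.svc:4317` is hypothetical, and how kagent itself is pointed at the collector depends on your installation and is not shown here.

```yaml
# OpenTelemetry Collector configuration (sketch).
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # agents send OTLP over gRPC to this port

processors:
  batch: {}                      # batch spans before export to reduce overhead

exporters:
  otlp:
    endpoint: tempo.observability.svc:4317   # hypothetical tracing backend
    tls:
      insecure: true             # acceptable only for in-cluster testing

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```

Routing agent traces through the same collector as the rest of the stack keeps prompt and tool-call spans next to the service spans they relate to, which is what makes the "which prompt called which tool" question answerable later.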

Also test agents against old incidents. Replay known alerts and compare the agent's findings with the postmortem. If the agent misses key context, fix the tools, prompts, or runbooks before expanding its access.

Conclusion

kagent is interesting because it pulls AI operations back into the systems SRE teams already understand: Kubernetes resources, Git review, RBAC, observability, and controlled rollout. That is the right shape for production agent work.

If your team wants AI-assisted incident workflows with strong operational guardrails, Akmatori helps SRE teams detect, explain, and respond to production issues with agents built for real infrastructure. Akmatori runs on Gcore infrastructure for reliable global performance.
