21.05.2026

Docker Sandboxes for Agent Isolation


AI coding agents need to install packages, run tests, edit files, and execute unknown code. That is exactly the type of workload SRE teams should treat as untrusted, even when the agent is helping a trusted developer.

That is why Rivet's reverse-engineering writeup on the microVM API behind Docker Sandboxes is worth reading. Docker's public Sandboxes feature focuses on safe agent execution, but the underlying design points toward a broader platform pattern: familiar Docker ergonomics wrapped around stronger VM-level boundaries.

What Are Docker Sandboxes?

Docker Sandboxes provide isolated, ephemeral environments for AI agents. Instead of relying only on Linux namespaces and a shared host kernel, Docker Desktop runs sandbox workloads inside microVMs on macOS and Windows.

That distinction matters. Containers are excellent for packaging and deployment, but they are not a complete trust boundary for arbitrary code. A container escape, kernel bug, overbroad socket mount, or leaked credential can turn a local automation task into a host compromise.

MicroVMs reduce that blast radius by giving each sandbox its own kernel, while keeping startup and workflow overhead lower than that of a traditional virtual machine.

Why SRE Teams Should Care

Agent infrastructure is becoming part of the production toolchain. A coding agent may touch deployment manifests, CI scripts, incident runbooks, Terraform, database migrations, and credentials used by local developer tools.

That creates a new operational checklist:

  • isolate agent execution from the host and from other tenants
  • scope network access through policy, not trust
  • sync workspaces deliberately instead of mounting broad host paths
  • clean up sessions even when agents crash or hang
  • log enough activity to review what the agent changed and executed

Docker Sandboxes make this pattern easier to reason about because the operator can keep Docker-like workflows while changing the isolation model beneath them.
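
Two of the checklist items, policy-based network access and deliberate workspace sync, reduce to small checks that are easy to unit test. The sketch below is illustrative only: the function names, domains, and paths are assumptions made for this example and are not part of Docker's tooling.

```python
from pathlib import PurePosixPath


def domain_allowed(host: str, allowed: set[str]) -> bool:
    """Return True if host is an allowed domain or a subdomain of one.

    Entries in `allowed` are expected to be lowercase, without a trailing dot.
    """
    host = host.lower().rstrip(".")
    # Match on a label boundary so "evilpypi.org" does not pass for "pypi.org".
    return any(host == d or host.endswith("." + d) for d in allowed)


def path_within(candidate: str, root: str) -> bool:
    """Return True if candidate is root itself or lies beneath it."""
    cand = PurePosixPath(candidate)
    # Reject any ".." segment instead of trying to resolve it.
    if ".." in cand.parts:
        return False
    base = PurePosixPath(root)
    return cand == base or base in cand.parents


# Hypothetical policy values for a single agent session.
ALLOWED_DOMAINS = {"pypi.org", "pythonhosted.org"}
WORKSPACE_ROOT = "/work/repo"

print(domain_allowed("files.pythonhosted.org", ALLOWED_DOMAINS))
print(path_within("/work/repo/src/app.py", WORKSPACE_ROOT))
```

Matching on whole path components and whole domain labels avoids the common prefix bug where /work/repo-evil or evilpypi.org slips past a plain string comparison.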

The Undocumented API Angle

Rivet found that Docker's sandbox daemon listens on a local Unix socket and exposes primitives to list, create, and destroy microVMs. Each VM gets its own Docker daemon socket, so containers inside one sandbox do not share the ordinary host Docker socket.

For platform engineers, the takeaway is not that you should build on this API. It is undocumented and can change without notice, so treat any production dependency on it with caution. The useful lesson is architectural: agent runtime, container runtime, network egress, workspace sync, and cleanup need to be explicit resources.

Treat a sandbox like a short-lived compute unit with a lifecycle, not like a glorified docker run.
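
One way to make that lifecycle explicit is to tie create and destroy together so cleanup runs even when the agent fails. The following is a minimal sketch, not Docker's API: sandbox_session and FakeBackend are hypothetical names, and the in-memory backend stands in for whatever actually provisions the sandbox.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def sandbox_session(
    create: Callable[[], str],
    destroy: Callable[[str], None],
) -> Iterator[str]:
    """Create a sandbox, yield its id, and always destroy it afterwards."""
    sandbox_id = create()
    try:
        yield sandbox_id
    finally:
        # Runs on normal exit and when the body raises.
        destroy(sandbox_id)


class FakeBackend:
    """In-memory stand-in for a real sandbox provider (assumption)."""

    def __init__(self) -> None:
        self.live: set[str] = set()
        self._counter = 0

    def create(self) -> str:
        self._counter += 1
        sandbox_id = f"sbx-{self._counter}"
        self.live.add(sandbox_id)
        return sandbox_id

    def destroy(self, sandbox_id: str) -> None:
        self.live.discard(sandbox_id)


backend = FakeBackend()
try:
    with sandbox_session(backend.create, backend.destroy) as sid:
        raise RuntimeError(f"agent crashed in {sid}")
except RuntimeError:
    pass

print(backend.live)  # prints set(): nothing leaked after the crash
```

A context manager only covers failures the calling process observes. A hung agent or a killed supervisor still needs an external reaper that enforces timeouts.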

Operational Tips

Start by separating use cases. Local developer experimentation, CI execution, customer plugin hosting, and production remediation agents have different trust levels. Do not give them the same sandbox policy.

For agent workloads, define:

  • maximum runtime and idle timeout
  • allowed outbound domains
  • workspace paths that can sync into the sandbox
  • secret injection rules
  • artifact capture for logs, diffs, and test output
  • cleanup behavior for failed sessions
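
The fields above can be captured as one explicit policy object per use case. This is a sketch under stated assumptions: SandboxPolicy, its field names, and the example values are hypothetical and do not correspond to any Docker configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SandboxPolicy:
    """Hypothetical per-use-case sandbox policy."""

    name: str
    max_runtime_s: int
    idle_timeout_s: int
    allowed_domains: frozenset[str] = frozenset()
    workspace_paths: tuple[str, ...] = ()
    injectable_secrets: frozenset[str] = frozenset()  # secret names, not values
    capture_artifacts: tuple[str, ...] = ("logs", "diff", "test-output")
    destroy_on_failure: bool = True

    def __post_init__(self) -> None:
        if self.max_runtime_s <= 0 or self.idle_timeout_s <= 0:
            raise ValueError("timeouts must be positive")
        if self.idle_timeout_s > self.max_runtime_s:
            raise ValueError("idle timeout cannot exceed max runtime")

    def should_stop(self, elapsed_s: float, idle_s: float) -> bool:
        """Return True once either the runtime or the idle limit is reached."""
        return elapsed_s >= self.max_runtime_s or idle_s >= self.idle_timeout_s


# Different trust levels get different policies (values are examples only).
POLICIES = {
    "local-dev": SandboxPolicy(
        name="local-dev",
        max_runtime_s=3600,
        idle_timeout_s=600,
        allowed_domains=frozenset({"pypi.org", "github.com"}),
        workspace_paths=("/work/repo",),
    ),
    "ci": SandboxPolicy(
        name="ci",
        max_runtime_s=900,
        idle_timeout_s=120,
        allowed_domains=frozenset({"pypi.org"}),
        workspace_paths=("/work/repo",),
        injectable_secrets=frozenset({"CI_TOKEN"}),
    ),
}

print(POLICIES["ci"].should_stop(elapsed_s=30, idle_s=130))
```

Keeping the policy immutable and validated at construction means a misconfigured session fails before any sandbox is created.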

Also remember the current platform limits. Docker Sandboxes require Docker Desktop and are currently aimed at macOS and Windows workflows. Linux server-side agent fleets may still need Firecracker, Kata Containers, gVisor, or a managed sandbox provider.

Conclusion

Docker Sandboxes are part of a larger trend: AI agent execution is pushing teams to revisit isolation, network policy, and runtime lifecycle design. Containers remain the workflow layer, but untrusted code needs a stronger boundary.

If your agents can run commands, install packages, or touch infrastructure code, treat the sandbox as production infrastructure. The safer default is to assume the agent will eventually execute something surprising and build the runtime so the surprise stays contained.
