22.06.2026

AI Agent Logs Can Fill Disks Fast


AI agents are becoming normal developer and operations tools. They stream model output, call tools, watch files, write telemetry, and keep local state. That makes their logging behavior part of your reliability surface.

A trending Hacker News thread pointed to a Codex issue where local SQLite feedback logs were reported as the main continuous writer on one machine. The report estimated about 37 TB written after 21 days of uptime. The rows retained at any moment were only a small part of the picture: most of the writes came from a much larger insert-and-prune workload that never showed up as database size.
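To put those figures in perspective, here is a rough back-of-the-envelope conversion into an average write rate. It assumes decimal terabytes (1 TB = 10^12 bytes) and a constant rate over the whole period, neither of which the report confirms. The function name `avg_write_rate` is made up for this example.

```shell
# Rough estimate only: average write rate implied by "37 TB in 21 days".
# Assumes decimal terabytes and a perfectly steady write stream.
avg_write_rate() {
  # $1 = total terabytes written, $2 = uptime in days
  LC_ALL=C awk -v tb="$1" -v days="$2" 'BEGIN {
    bytes = tb * 1e12
    printf "%.2f TB/day, %.1f MB/s\n", tb / days, bytes / (days * 86400) / 1e6
  }'
}

avg_write_rate 37 21
```

This prints `1.76 TB/day, 20.4 MB/s`. In other words, the reported numbers correspond to roughly 20 MB written every second, around the clock, for three weeks.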

The specific bug belongs to one project, but the lesson is broader: agent runtimes need the same log hygiene as any production service.

Why SRE Teams Should Care

Disk exhaustion is not a glamorous incident, but it is still an incident. A background agent that fills a workstation, CI runner, bastion host, or shared operations VM can break deploys, stop monitoring scripts, corrupt local databases, and create noisy secondary failures.

The risk grows when agents run for long sessions. A CLI that looks idle may still be writing traces, OpenTelemetry mirrors, raw protocol payloads, SQLite WAL files, or tool feedback records. If the default level is too verbose, one process can create a steady write stream for days.

What Went Wrong

The public report highlights several common failure modes:

  • Trace-level persistence by default: low-value dependency logs dominated retained bytes.
  • Raw protocol payload logging: websocket and SSE traces created large records.
  • Telemetry mirroring: duplicated OpenTelemetry-style events added volume without much operator value.
  • SQLite write amplification: rows were inserted, indexed, written to WAL, then pruned.
  • Missing global caps: per-thread limits did not stop total database growth or write churn.

These are not exotic problems. They are ordinary logging problems amplified by agent workloads.
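The insert-and-prune pattern tends to leave a visible footprint in a SQLite file: a WAL file next to the database and a large number of free pages inside it. The sketch below is one possible way to inspect that, assuming the `sqlite3` command-line tool is installed. The function name `sqlite_footprint` is hypothetical, and the database path is something you supply yourself; no particular agent's file layout is assumed.

```shell
# Sketch: show on-disk size and free-page count for a SQLite database.
# Opens the database read-only so it does not modify the file.
sqlite_footprint() {
  # $1 = path to a SQLite database file (caller-supplied)
  local db=$1 f
  for f in "$db" "$db-wal"; do
    if [ -f "$f" ]; then
      printf '%s\t%s bytes\n' "$f" "$(wc -c < "$f" | tr -d ' ')"
    fi
  done
  # freelist_count = pages freed by deletes that the file still holds on to
  sqlite3 -readonly "$db" 'PRAGMA page_size; PRAGMA page_count; PRAGMA freelist_count;' |
    paste -sd' ' - |
    awk '{ printf "page_size=%s page_count=%s freelist_count=%s\n", $1, $2, $3 }'
}
```

A `freelist_count` that is large relative to `page_count` suggests heavy insert-then-delete traffic. `VACUUM` can reclaim that space, but it rewrites the whole file and needs exclusive access, so it is safer to stop the agent first.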

Practical Guardrails

Treat local agent state as managed operational data:

# Watch agent log and database growth
du -sh ~/.codex ~/.openclaw 2>/dev/null
find ~/.codex ~/.openclaw -type f -size +500M -ls 2>/dev/null

# List files modified in the last 30 minutes, largest first
# (-printf requires GNU find)
find ~/.codex ~/.openclaw -type f -mmin -30 -printf '%TY-%Tm-%Td %TH:%TM %s %p\n' 2>/dev/null | sort -k3,3nr | head
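Size on disk does not reveal churn: a database that is constantly inserted into and pruned can stay small while writing a lot. On Linux, one way to see actual write volume is the per-process `write_bytes` counter in `/proc/<pid>/io`. The following is a minimal sketch under that assumption. The function names are made up for this example, and reading another user's process usually requires elevated privileges.

```shell
# Linux-only sketch: average bytes per second a process caused to be
# written to storage over a sampling window, based on /proc/<pid>/io.
proc_write_bytes() {
  # Prints the cumulative write_bytes counter for PID $1.
  awk '$1 == "write_bytes:" { print $2 }' "/proc/$1/io"
}

proc_write_rate() {
  # $1 = PID, $2 = sampling window in seconds (default 10)
  local pid=$1 secs=${2:-10} before after
  before=$(proc_write_bytes "$pid") || return 1
  sleep "$secs"
  after=$(proc_write_bytes "$pid") || return 1
  echo $(( (after - before) / secs ))
}
```

For example, `proc_write_rate 12345 30` samples PID 12345 for 30 seconds and prints the average bytes per second (the PID here is a placeholder). A process that looks idle but reports a steady multi-megabyte rate deserves a closer look.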

For team environments, go further:

  • Put agent home directories on filesystems with quotas.
  • Rotate or vacuum local SQLite logs on a schedule.
  • Alert on disk usage, inode pressure, and high write rates.
  • Default to info or warn logging outside short debug sessions.
  • Disable raw payload persistence unless a responder explicitly enables it.
  • Store summaries for model streams: event type, duration, status, token counts, and byte length.
  • Document where agent state lives so responders know what can be cleaned.
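Where filesystem quotas are not available, a scheduled size check is a simple fallback for the alerting item above. This is only a sketch: the function name `check_dir_size` and the 2048 MiB limit in the usage note are arbitrary choices for illustration, not recommended values.

```shell
# Sketch: warn when a directory exceeds a size limit.
check_dir_size() {
  # $1 = directory, $2 = limit in MiB
  # Returns 1 and prints a warning to stderr when the limit is exceeded.
  local dir=$1 limit_mb=$2 used_kb
  [ -d "$dir" ] || return 0   # nothing to check
  used_kb=$(du -sk "$dir" 2>/dev/null | awk '{ print $1 }')
  if [ "${used_kb:-0}" -gt $(( limit_mb * 1024 )) ]; then
    echo "WARN: $dir uses $(( used_kb / 1024 )) MiB (limit ${limit_mb} MiB)" >&2
    return 1
  fi
}
```

Run from cron or a systemd timer, a call such as `check_dir_size ~/.codex 2048 || notify-somehow` can feed whatever alerting you already have; `notify-somehow` stands in for your own command.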

Safer Defaults for Agent Platforms

Agent platforms should ship with boring limits. That means global log caps, bounded WAL behavior, retention by age and size, redaction, and a simple switch to disable optional feedback logs.
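To make "retention by age and size" concrete, here is a minimal sketch of both rules applied to a directory of plain-text `*.log` files. It assumes GNU find, file names without newlines, and logs that are safe to delete; it is not meant for a live SQLite database. The function name `prune_logs` is hypothetical.

```shell
# Sketch: age-based, then size-based retention for *.log files.
prune_logs() {
  # $1 = directory, $2 = max age in days, $3 = max total size in MiB
  local dir=$1 max_days=$2 max_kb=$(( $3 * 1024 )) total_kb size_kb path
  # Age rule: delete logs older than max_days.
  find "$dir" -type f -name '*.log' -mtime +"$max_days" -delete
  # Size rule: sum disk usage (1 KiB blocks) of the logs that remain.
  total_kb=$(find "$dir" -type f -name '*.log' -printf '%k\n' |
    awk '{ s += $1 } END { print s + 0 }')
  # Delete oldest first until the total is under the cap.
  find "$dir" -type f -name '*.log' -printf '%T@ %k %p\n' | sort -n |
  while read -r _ size_kb path; do
    [ "$total_kb" -le "$max_kb" ] && break
    rm -f -- "$path"
    total_kb=$(( total_kb - size_kb ))
  done
}
```

Applying the age rule first keeps the size rule cheap, and deleting oldest-first means the most recent logs survive when the cap is hit. A platform would ideally enforce both limits internally, so that operators do not have to bolt them on afterwards.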

The best debug mode is temporary and obvious. If an SRE enables trace logs, the UI or CLI should show that state, warn about disk impact, and make it easy to return to normal. Long-lived trace logging should be treated like leaving packet capture running on a busy interface.

Conclusion

AI agents do not get a pass on operational discipline. They need quotas, retention, redaction, observability, and safe defaults before they become part of on-call workflows.

Akmatori helps SRE teams automate infrastructure operations with AI agents built for real production workflows. For reliable cloud and edge infrastructure, check out Gcore.
