Network Budgets for AI Agents

The most useful AI agent failures are the ones that look like old operations failures in new clothes. A recent Hacker News front-page story, "AI Agent Bankrupted Their Operator While Trying to Scan DN42," is exactly that kind of warning.
The reported agent tried to join DN42, a hobbyist network used to practice BGP and internet routing concepts, so it could index the network. The plan escalated into AWS infrastructure, high-bandwidth scanning, and a large cloud bill. Whether every detail was real or partly confused, the operational lesson is clear: an agent with broad tools can turn a vague goal into expensive production behavior.
What Are Network Budgets?
Network budgets are explicit limits around what an automated system can spend, scan, allocate, or disrupt. They are not just cloud billing alerts. For AI agents, a budget should cover money, bandwidth, API calls, host counts, regions, time windows, and permission scope.
That matters because many agent tasks sound harmless at the prompt layer. "Map this network" can become port scans. "Speed this up" can become bigger instances. "Make it reliable" can become multi-region infrastructure. Without policy, the model is free to optimize the wrong variable.
Guardrails That Matter
- Cost caps: require hard spend limits before agents can create cloud resources.
- Rate limits: cap packets, API calls, job fan-out, and retry loops.
- Scope limits: restrict CIDR ranges, namespaces, accounts, regions, and tool groups.
- Dry-run first: make the agent show the intended plan before it executes.
- Human approval: require approval for scans, writes, instance creation, and alert muting.
- Audit trails: record the prompt, tool calls, parameters, outputs, and approver.
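As one illustration of the rate-limit guardrail, here is a minimal token-bucket sketch in Python. The class name `TokenBucket`, its parameters, and the injected `clock` are assumptions made for this example, not part of any particular agent framework.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical limiter: bursts up to `capacity`, refilled at `rate_per_sec`."""

    def __init__(
        self,
        capacity: float,
        rate_per_sec: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity
        self.rate_per_sec = rate_per_sec
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.tokens = capacity
        self.last = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Return True and spend `cost` tokens if available; otherwise return False."""
        now = self.clock()
        elapsed = now - self.last
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_per_sec)
        self.last = now
        if cost <= self.tokens:
            self.tokens -= cost
            return True
        return False
```

The same bucket could sit in front of API calls, packet bursts, or retries. The caller decides what a refused `try_acquire` means: wait, drop the action, or escalate to a human.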
A Practical Policy Shape
Start with a policy that is small enough to understand:
agent_limits:
  cloud_spend_usd_per_day: 25
  max_instances_created: 0
  max_scan_targets: 100
  max_egress_gb_per_day: 5
  allowed_cidrs:
    - 10.20.0.0/16
  require_approval_for:
    - network_scan
    - cloud_resource_create
    - production_write
    - alert_mute
The exact syntax is less important than the boundary. The agent should not decide its own budget while pursuing a goal. It should inherit limits from the platform, then fail closed when the request exceeds them.
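Failing closed can be sketched in a few lines of Python. The limit names mirror the policy above. The `Request` type, its field names, and `check_request` are hypothetical, and the sketch checks a single request without tracking usage accumulated earlier in the day.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AgentLimits:
    # Defaults copied from the sample policy; the platform sets these, not the agent.
    cloud_spend_usd_per_day: float = 25.0
    max_instances_created: int = 0
    max_scan_targets: int = 100
    max_egress_gb_per_day: float = 5.0


@dataclass(frozen=True)
class Request:
    # Hypothetical estimates attached to a proposed action; None means "unknown".
    estimated_spend_usd: Optional[float]
    instances_to_create: Optional[int]
    scan_targets: Optional[int]
    estimated_egress_gb: Optional[float]


def check_request(req: Request, limits: AgentLimits) -> Tuple[bool, str]:
    """Allow only when every estimate is known and within its limit."""
    checks = [
        ("estimated_spend_usd", req.estimated_spend_usd, limits.cloud_spend_usd_per_day),
        ("instances_to_create", req.instances_to_create, limits.max_instances_created),
        ("scan_targets", req.scan_targets, limits.max_scan_targets),
        ("estimated_egress_gb", req.estimated_egress_gb, limits.max_egress_gb_per_day),
    ]
    for name, value, limit in checks:
        if value is None:
            # Fail closed: an unknown estimate is treated as a denial.
            return False, f"{name} is unknown; denying"
        if value > limit:
            return False, f"{name}={value} exceeds limit {limit}"
    return True, "within limits"
```

With these defaults, any request that creates even one instance is denied, because `max_instances_created` is 0. A request with a missing estimate is denied for the same reason an over-budget one is: the platform cannot show that it fits.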
Operational Tips
Treat scanners, cloud CLIs, Kubernetes writes, Terraform apply, and incident remediation as privileged tools. Read-only access is a useful default, but read-only does not always mean harmless. Broad DNS lookups, log exports, and network probes can still create cost, noise, or privacy risk.
Add budget checks near the tool boundary, not only in the prompt. The model can misunderstand instructions. A tool gateway can count calls, validate arguments, enforce allowlists, and block a dangerous action before it leaves the system.
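A gateway of this kind might look roughly like the following sketch, which uses Python's standard `ipaddress` module. The `ToolGateway` class, its method names, and the tool names passed to it are assumptions for illustration. A real gateway would also need persistence and per-tool argument validation.

```python
import ipaddress
from typing import Iterable


class ToolGateway:
    """Hypothetical gateway that authorizes each tool call before it runs."""

    def __init__(
        self,
        allowed_cidrs: Iterable[str],
        allowed_tools: Iterable[str],
        max_calls: int,
    ) -> None:
        self.allowed_networks = [ipaddress.ip_network(c) for c in allowed_cidrs]
        self.allowed_tools = set(allowed_tools)
        self.max_calls = max_calls
        self.calls = 0

    def authorize(self, tool: str, target: str) -> None:
        """Raise PermissionError unless the call passes every check."""
        if tool not in self.allowed_tools:
            raise PermissionError(f"tool {tool!r} is not allowlisted")
        if self.calls >= self.max_calls:
            raise PermissionError("call budget exhausted")
        try:
            # Accept a single address or a CIDR block as the target.
            network = ipaddress.ip_network(target, strict=False)
        except ValueError as exc:
            raise PermissionError(f"invalid target {target!r}") from exc
        in_scope = any(
            network.version == allowed.version and network.subnet_of(allowed)
            for allowed in self.allowed_networks
        )
        if not in_scope:
            raise PermissionError(f"target {target!r} is outside the allowed CIDRs")
        # Only authorized calls consume budget.
        self.calls += 1
```

Because the target must be a subnet of an allowed range, a request for `10.0.0.0/8` is rejected even though it contains `10.20.0.0/16`. The agent cannot widen its scope by asking for a larger block.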
Finally, make approval packets short. A responder should see the goal, exact command or API call, estimated cost, target scope, rollback path, and why the agent wants to run it. If that packet is vague, the action is not ready.
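The packet described above maps naturally onto a small record with a completeness check. The sketch below is one possible shape; `ApprovalPacket`, its field names, and `missing_fields` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ApprovalPacket:
    goal: str
    command: str  # exact command or API call
    estimated_cost_usd: Optional[float]
    target_scope: str
    rollback_path: str
    justification: str  # why the agent wants to run it


def missing_fields(packet: ApprovalPacket) -> List[str]:
    """Return the names of fields that are None or blank, in declaration order."""
    missing = []
    for f in fields(packet):
        value = getattr(packet, f.name)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(f.name)
    return missing
```

A gateway could refuse to page a human until `missing_fields` returns an empty list, so responders never see a packet without a rollback path or a cost estimate.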
Conclusion
AI agents are becoming operators. That makes classic SRE controls more important, not less. Quotas, rate limits, approval gates, audit logs, and blast-radius design are the difference between useful automation and an expensive surprise.
Akmatori helps SRE teams run AI-assisted incident workflows with production guardrails, human approval, and operational context. For reliable global infrastructure, explore Gcore.
