Datadog Pup: CLI for AI-Driven SRE Workflows

AI agents are becoming a real interface for operations work. That changes what good tooling looks like. A CLI built for humans alone is not enough when automation needs discoverable commands, structured output, scoped authentication, and broad API coverage. Datadog Pup is an interesting step in that direction.
Pup is an open-source Rust CLI from Datadog that exposes a large slice of the Datadog platform in a form that works well for both humans and agents. At the time of writing, the project is in preview and advertises 320+ subcommands across 56 command groups, covering monitors, logs, metrics, incidents, workflows, audit logs, cloud integrations, CI visibility, and more.
Why Pup matters for SRE teams
Most operational CLIs were designed around manual use. They are often inconsistent, hard to introspect, and frustrating to parse inside automated workflows. That becomes a problem when you want to let an AI assistant investigate a production incident, gather metrics, search recent errors, check on-call state, and trigger a workflow without hand-writing brittle API wrappers.
Pup targets exactly that gap.
According to the project README, the CLI focuses on four properties that matter for agentic operations:
- Self-discoverable commands
- Structured JSON and YAML output
- Scoped OAuth2 authentication with PKCE
- Broad coverage across Datadog product domains
Those details matter because they reduce glue code. Instead of stitching together ad hoc REST calls, an SRE automation layer can call a stable CLI surface and keep the result machine-readable.
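To make the "less glue code" point concrete, here is a minimal Python sketch of what such an automation layer might look like. It assumes the invoked subcommand prints a single JSON document to stdout; check the output options of your Pup version before relying on that. The helper name `run_pup` and the injectable `runner` parameter are illustrative and not part of Pup.

```python
import json
import subprocess
from typing import Any, Callable, Sequence


def run_pup(
    args: Sequence[str],
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> Any:
    """Run `pup <args>` and return its stdout parsed as JSON.

    Assumption: the subcommand writes one JSON document to stdout.
    `runner` defaults to subprocess.run and can be swapped for a fake in tests.
    """
    # Pass arguments as a list so no shell interpolation is involved.
    proc = runner(["pup", *args], capture_output=True, text=True, check=True)
    return json.loads(proc.stdout)
```

With Pup installed and authenticated, a call such as `run_pup(["incidents", "list"])` would hand the parsed result straight to the rest of the automation, with a non-zero exit code surfacing as `subprocess.CalledProcessError`.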
What you can do with it
The most obvious use case is investigation automation. For example, an agent or runbook can list monitors for a team, search logs for recent errors, query metrics, and then correlate that data with incidents or change events.
Typical examples look like this:
pup monitors list --tags="team:api-platform"
pup logs search --query="status:error service:payments" --from="1h"
pup metrics query --query="avg:system.cpu.user{service:payments}"
pup incidents list
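An agent or runbook would usually parameterize these commands rather than hard-code them. The sketch below builds the same four invocations as argument lists for a given service and team. It only uses the subcommands and flags shown above; the function name `investigation_commands` and its parameters are hypothetical.

```python
from typing import Dict, List


def investigation_commands(
    service: str, team: str, window: str = "1h"
) -> Dict[str, List[str]]:
    """Build the four example Pup invocations as argv lists.

    Argv lists can be passed to subprocess.run without a shell, so the
    quoting used on the command line is not needed here.
    """
    return {
        "monitors": ["pup", "monitors", "list", f"--tags=team:{team}"],
        "logs": [
            "pup", "logs", "search",
            f"--query=status:error service:{service}",
            f"--from={window}",
        ],
        # Doubled braces emit the literal {service:...} metric scope.
        "metrics": [
            "pup", "metrics", "query",
            f"--query=avg:system.cpu.user{{service:{service}}}",
        ],
        "incidents": ["pup", "incidents", "list"],
    }
```

Keeping command construction in one pure function makes the investigation steps easy to review and unit test before anything touches a live Datadog account.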
That is not revolutionary on its own. The useful part is that the commands are discoverable and designed to be consumed programmatically. If you are building AI-assisted operations, that is a much better starting point than screen-scraping a web UI or maintaining dozens of handwritten API clients.
Strong coverage where SRE work actually happens
Pup is not a toy wrapper around one or two APIs. The project already spans core observability and operational workflows, including:
- Monitors, dashboards, SLOs, and synthetics
- Logs, events, RUM, and APM service queries
- Incidents, on-call teams, investigations, and workflows
- Audit logs and security monitoring
- AWS, Azure, GCP, and OCI integrations
- CI/CD visibility, test data, and DORA metrics
That breadth matters because incidents rarely stay in one product surface. A real investigation crosses metrics, logs, deployments, ownership, and recent changes. A CLI that covers only one slice of the platform forces your automation to stop early.
Authentication is a bigger deal than it sounds
One of the better design choices in Pup is its preference for OAuth2 with PKCE, plus secure token storage in the local keychain or secret service. That is much healthier than spraying long-lived API keys across scripts, CI jobs, and agent sandboxes.
For AI-enabled workflows, scoped auth is not optional. If you are going to let agents touch observability and incident tooling, you want clear boundaries around what they can access and how credentials are stored. Pup does not solve every governance problem, but it points in the right direction.
Where it still needs maturity
The project is explicitly marked as preview software. Some Datadog domains are still missing, and coverage is uneven across the platform. For example, the README notes that traces, profiling, containers, and some other areas are not yet implemented.
That means Pup is promising, but it is not yet a universal replacement for the Datadog API or UI. Teams should treat it as a fast-moving operational interface, validate the specific commands they need, and expect some rough edges.
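One lightweight way to validate the commands you depend on is to probe each one before wiring it into automation. The sketch below assumes that `pup <group> <action> --help` exits with status 0 when the subcommand exists and non-zero when it does not. That is common behavior for CLIs, but it is an assumption here and worth confirming against your installed version. The function name `missing_commands` is illustrative.

```python
import subprocess
from typing import Callable, Iterable, List, Tuple


def missing_commands(
    required: Iterable[Tuple[str, str]],
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> List[Tuple[str, str]]:
    """Return the (group, action) pairs whose help probe fails.

    Assumption: `pup <group> <action> --help` exits 0 only for
    subcommands that exist in the installed version.
    """
    missing = []
    for group, action in required:
        proc = runner(
            ["pup", group, action, "--help"],
            capture_output=True,
            text=True,
        )
        if proc.returncode != 0:
            missing.append((group, action))
    return missing
```

Running a check like this in CI against the list of subcommands your runbooks use gives early warning when a preview release renames or drops something.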
How SRE teams can use it today
If you already centralize operations in Datadog, Pup is worth evaluating for three practical scenarios:
1. AI-assisted investigation
Use Pup as a controlled execution layer for an internal AI agent. Let the agent search logs, inspect monitors, query metrics, and pull incident context through explicit commands rather than unrestricted API access.
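A controlled execution layer can be as simple as an allowlist that sits between the agent and the CLI. The sketch below is one possible shape, not something Pup provides: it permits only the read-style subcommands used earlier in this article and rejects everything else. The names `READ_ONLY`, `is_allowed`, and `guard` are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

# Subcommands the agent may run; all appear in the examples in this article.
READ_ONLY: Set[Tuple[str, str]] = {
    ("monitors", "list"),
    ("logs", "search"),
    ("metrics", "query"),
    ("incidents", "list"),
}


def is_allowed(
    args: Sequence[str], allowed: Set[Tuple[str, str]] = READ_ONLY
) -> bool:
    """Accept only when the first two arguments are an allowed (group, action)."""
    return len(args) >= 2 and (args[0], args[1]) in allowed


def guard(args: Sequence[str]) -> List[str]:
    """Return the full argv for an allowed request, or raise PermissionError."""
    if not is_allowed(args):
        raise PermissionError(f"pup command not permitted for agent: {list(args)}")
    return ["pup", *args]
```

Requiring the group and action to be the first two arguments is deliberately strict: anything the agent phrases differently is refused rather than guessed at. This complements scoped credentials and does not replace them.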
2. Runbook automation
Replace one-off shell scripts that call raw APIs with a consistent CLI interface. This makes runbooks easier to read, test, and reuse.
3. ChatOps and operator tooling
A bot can take a Slack or Telegram command, run a small set of Pup queries, and return the result with much less custom integration code.
Installation and quick start
On macOS or Linux, installation via Homebrew is straightforward:
brew tap datadog-labs/pack
brew install datadog-labs/pack/pup
You can also build from source:
git clone https://github.com/DataDog/pup.git
cd pup
cargo build --release
# Copy the binary onto your PATH (may require sudo, depending on directory permissions)
cp target/release/pup /usr/local/bin/pup
Then authenticate and inspect the environment:
export DD_SITE="datadoghq.com"
pup auth login
pup auth status
pup monitors list
Final take
Pup is one of the more practical signs that observability vendors are adapting to AI-native operations. The real story is not just that Datadog shipped another CLI. It is that the CLI is being shaped as an interface for agents, not only for humans.
That is a meaningful shift for DevOps and SRE teams. If your future workflows include AI-assisted triage, automated postmortem prep, or guided incident response, tools like Pup will matter more than another dashboard widget.
It is still early, but this is exactly the kind of project worth tracking.
At Akmatori, we build open-source AI agents for SRE and DevOps teams. Learn more about our platform, powered by Gcore cloud infrastructure.
