pg_stat_ch: Stream Every PostgreSQL Query to ClickHouse

Quick Reference:

```bash
# Clone and build
git clone https://github.com/ClickHouse/pg_stat_ch
cd pg_stat_ch && make && make install
```

```ini
# Enable in postgresql.conf
shared_preload_libraries = 'pg_stat_ch'
pg_stat_ch.clickhouse_host = 'clickhouse.example.com'
pg_stat_ch.clickhouse_database = 'postgres_events'
```

```bash
# Restart PostgreSQL
systemctl restart postgresql
```
ClickHouse just open-sourced pg_stat_ch, a PostgreSQL extension that captures every query execution as a fixed-size event (~4.6KB) and streams it to ClickHouse. This gives you APM-level observability for PostgreSQL without the overhead of traditional monitoring.
Why This Matters
Traditional PostgreSQL monitoring with pg_stat_statements gives you aggregated stats. You see total execution time and call counts, but you lose the individual events. When something goes wrong at 2:47 PM, you can't drill into exactly which queries ran during that minute.
pg_stat_ch changes this. Every SELECT, INSERT, UPDATE, DDL statement, and even failed queries become queryable events in ClickHouse. You get:
- p50 to p99 latency over any time window
- Top queries by runtime with full context
- Error patterns by application, user, or database
- Historical comparison across days or months
How It Works
The architecture is elegantly simple:
```
PostgreSQL Backend → Ring Buffer → Background Worker → ClickHouse
     (memcpy)         (shared)        (1s flush)      (native protocol)
```
- Hot path is fast: each query completion triggers a `memcpy` into a shared-memory ring buffer. No network calls, no blocking.
- Background worker handles export: once per second, a worker drains up to 10,000 events, compresses them with LZ4, and sends them to ClickHouse via the native binary protocol.
- ClickHouse does the heavy lifting: raw events land in `events_raw`. Materialized views pre-aggregate the data into dashboards you can query immediately.
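The drop-on-full ring buffer at the core of this pipeline can be sketched in a few lines. This is an illustrative Python model, not the extension's actual shared-memory C implementation: producers never block, and a background consumer drains bounded batches.

```python
from collections import deque

class RingBuffer:
    """Illustrative model of a drop-on-full event queue (not the real C code)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.events = deque()
        self.dropped = 0  # the real extension exposes this via pg_stat_ch_stats()

    def put(self, event):
        """Hot path: never blocks. Drops the event if the buffer is full."""
        if len(self.events) >= self.capacity:
            self.dropped += 1
            return False
        self.events.append(event)
        return True

    def drain(self, batch_size):
        """Background worker: take up to batch_size events for export."""
        batch = []
        while self.events and len(batch) < batch_size:
            batch.append(self.events.popleft())
        return batch

buf = RingBuffer(capacity=4)
for i in range(6):               # 6 events into a 4-slot buffer
    buf.put(i)
print(buf.dropped)               # → 2: two events dropped, no producer ever stalled
print(buf.drain(batch_size=10))  # → [0, 1, 2, 3]: worker exports what fit
```

The key property this models is the no-back-pressure design discussed below: a full buffer costs the producer a counter increment, never a wait.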
Event Schema
Each event captures 45 fields:
| Category | Fields |
|---|---|
| Timing | execution time, planning time, lock wait time |
| I/O | shared/local buffer hits and reads, temp blocks |
| WAL | records, FPI, bytes |
| CPU | user time, system time (via getrusage) |
| JIT | functions, generation/optimization/emission time |
| Context | database, user, application_name, client_addr |
| Query | query_id fingerprint, truncated text (2KB default) |
Pre-Built Materialized Views
pg_stat_ch ships with four materialized views in ClickHouse:
```sql
-- Rolling 1-hour window for real-time monitoring
SELECT * FROM events_recent_1h
WHERE query_id = 12345678;

-- 5-minute buckets with percentiles
SELECT
    toStartOfFiveMinutes(event_time) AS bucket,
    quantilesTDigest(0.50, 0.95, 0.99)(total_exec_time) AS latency_pcts
FROM query_stats_5m
WHERE datname = 'production'
GROUP BY bucket;

-- Per-database/app/user load attribution
SELECT
    datname,
    application_name,
    sum(calls) AS total_queries,
    sum(total_exec_time) AS total_time_ms
FROM db_app_user_1m
WHERE event_time > now() - INTERVAL 1 HOUR
GROUP BY datname, application_name
ORDER BY total_time_ms DESC;

-- Error analysis with full context
SELECT
    error_message,
    count(*) AS occurrences,
    groupArray(10)(query_text) AS sample_queries
FROM errors_recent
WHERE event_time > now() - INTERVAL 1 DAY
GROUP BY error_message
ORDER BY occurrences DESC;
```
Installation
Requirements: PostgreSQL 16, 17, or 18.
```bash
# Build from source
git clone https://github.com/ClickHouse/pg_stat_ch
cd pg_stat_ch
make
sudo make install

# Or use Docker for quick testing
docker-compose up -d
```

Configure in postgresql.conf:

```ini
shared_preload_libraries = 'pg_stat_ch'

# ClickHouse connection
pg_stat_ch.clickhouse_host = 'clickhouse.example.com'
pg_stat_ch.clickhouse_port = 9000
pg_stat_ch.clickhouse_database = 'postgres_events'
pg_stat_ch.clickhouse_user = 'default'
pg_stat_ch.clickhouse_password = 'secret'

# Tuning
pg_stat_ch.queue_capacity = 65536   # events in ring buffer
pg_stat_ch.flush_interval = 1000    # ms between flushes
pg_stat_ch.batch_size = 10000       # max events per flush
```

Restart PostgreSQL:

```bash
systemctl restart postgresql
```
Memory Footprint
The ring buffer uses predictable memory:
```
queue_capacity × event_size = total memory
65,536 × 4.6 KB ≈ 294 MB (default)
```
You can tune this based on your query rate and acceptable memory overhead.
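A quick sizing helper makes that tuning concrete. The 4.6 KB event size comes from the formula above; the helper itself is ours, not part of the extension:

```python
def ring_buffer_mb(queue_capacity, event_size_kb=4.6):
    """Shared-memory ring buffer footprint in MB (capacity × per-event size)."""
    return queue_capacity * event_size_kb / 1024

print(round(ring_buffer_mb(65_536)))  # → 294 (the default configuration)
print(round(ring_buffer_mb(16_384)))  # → 74 (quarter-size buffer for small hosts)
```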
Design Decisions for SREs
No Back-Pressure
If ClickHouse slows down or goes offline, PostgreSQL keeps running at full speed. Events are dropped rather than queued indefinitely.
Why? For an OLTP system doing 50K QPS, back-pressure adding 10ms stalls would push p99 from 5ms to 15ms. That breaks SLOs. Observability should observe, not obstruct.
The dropped counter is visible via pg_stat_ch_stats() so you can monitor overflow.
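One way to turn that counter into an alertable signal is to sample it periodically and compute the drop rate between samples. A sketch of the arithmetic — the `dropped`/`captured` field names here are illustrative assumptions, not the documented output columns of pg_stat_ch_stats():

```python
def drop_rate(prev_dropped, curr_dropped, prev_captured, curr_captured):
    """Fraction of events dropped between two monitoring samples."""
    dropped = curr_dropped - prev_dropped
    total = dropped + (curr_captured - prev_captured)
    return dropped / total if total else 0.0

# e.g. 500 drops alongside 99,500 captured events since the last sample
rate = drop_rate(1_000, 1_500, 200_000, 299_500)
print(f"{rate:.2%}")  # → 0.50% — a sustained rise suggests raising queue_capacity
```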
Fixed-Size Events
Query texts longer than 2KB get truncated. You still get the query_id fingerprint and all 45 metrics. For full text, look up by query_id in pg_stat_statements.
The tradeoff is predictable memory usage. Variable-length events would require dynamic allocation in the hot path.
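The fixed-size layout can be illustrated with a tiny packing sketch: whatever the query length, the on-buffer record occupies the same number of bytes. The field sizes below are illustrative, not the extension's real 45-field layout; only the 2 KB text budget comes from the docs above:

```python
import struct

TEXT_LIMIT = 2048  # default query-text budget per event

def pack_event(query_id, query_text):
    """Pack into a fixed-size record: 8-byte id + fixed 2 KB text field."""
    text = query_text.encode()[:TEXT_LIMIT]  # truncate, never grow
    text = text.ljust(TEXT_LIMIT, b"\x00")   # pad short texts to full width
    return struct.pack(f"<q{TEXT_LIMIT}s", query_id, text)

short = pack_event(42, "SELECT 1")
long_ = pack_event(43, "SELECT " + ", ".join(f"col{i}" for i in range(1000)))
print(len(short), len(long_))  # → 2056 2056: record size never varies
```

Because every record is the same size, the producer can reserve a slot with pointer arithmetic instead of a heap allocation.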
Cache-Line Aligned Ring Buffer
The ring buffer separates producer-hot and consumer-hot fields onto different cache lines. This eliminates false sharing when many backends write concurrently.
Per-Transaction Batching
Under high concurrency, backends batch events locally and flush once per transaction. This reduces lock acquisitions from ~150K/s to ~30K/s on TPC-B workloads.
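The effect of batching on lock traffic is easy to model: taking the ring-buffer lock once per flush instead of once per event divides acquisitions by the average batch size. The batch size of 5 below is an illustrative value chosen to reproduce the ~150K/s → ~30K/s figure:

```python
def lock_acquisitions_per_sec(events_per_sec, avg_batch_size):
    """One lock acquisition per flush rather than one per event."""
    return events_per_sec // avg_batch_size

print(lock_acquisitions_per_sec(150_000, 1))  # → 150000 (no batching)
print(lock_acquisitions_per_sec(150_000, 5))  # → 30000 (~5 events per txn)
```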
Overhead
The extension adds overhead primarily from:
- `memcpy` into the ring buffer (fast)
- LWLock acquisition (minimized via per-transaction batching)
- Name resolution (`get_database_name`, `GetUserNameFromId`)
- `getrusage()` for CPU timing
On typical OLTP workloads, expect 1-3% overhead. The team is still hardening the extension for production, with deployment in ClickHouse's managed Postgres offering as the stated goal.
Comparison with Alternatives
| Feature | pg_stat_statements | pg_stat_monitor | pg_stat_ch |
|---|---|---|---|
| Individual events | No | No | Yes |
| External storage | No | No | Yes (ClickHouse) |
| Historical drill-down | Limited | Limited | Full |
| Query plans | No | Yes | Not yet |
| Overhead | Low | Medium | Low-Medium |
| Failed queries | No | Yes | Yes |
pg_stat_ch complements rather than replaces pg_stat_statements. Use both: statements for quick aggregates, pg_stat_ch for deep analysis.
Use Cases
Incident Investigation
"What changed between 2 PM and 3 PM?"
```sql
SELECT
    toStartOfMinute(event_time) AS minute,
    count(*) AS queries,
    avg(total_exec_time) AS avg_latency,
    quantile(0.99)(total_exec_time) AS p99
FROM events_raw
WHERE event_time BETWEEN '2026-03-18 14:00:00' AND '2026-03-18 15:00:00'
GROUP BY minute
ORDER BY minute;
```
Regression Detection
Compare query performance before and after a deployment:
```sql
WITH before AS (
    SELECT query_id, avg(total_exec_time) AS avg_time
    FROM events_raw
    WHERE event_time BETWEEN '2026-03-17 00:00:00' AND '2026-03-17 23:59:59'
    GROUP BY query_id
),
after AS (
    SELECT query_id, avg(total_exec_time) AS avg_time
    FROM events_raw
    WHERE event_time BETWEEN '2026-03-18 00:00:00' AND '2026-03-18 23:59:59'
    GROUP BY query_id
)
SELECT
    a.query_id,
    b.avg_time AS before_ms,
    a.avg_time AS after_ms,
    (a.avg_time - b.avg_time) / b.avg_time * 100 AS pct_change
FROM after a
JOIN before b ON a.query_id = b.query_id
WHERE a.avg_time > b.avg_time * 1.5  -- 50%+ regression
ORDER BY pct_change DESC
LIMIT 20;
```
Load Attribution
Who's hammering the database?
```sql
SELECT
    application_name,
    usename,
    client_addr,
    count(*) AS queries,
    sum(total_exec_time) AS total_time_ms
FROM events_raw
WHERE event_time > now() - INTERVAL 1 HOUR
GROUP BY application_name, usename, client_addr
ORDER BY total_time_ms DESC
LIMIT 20;
```
The Unified Stack Vision
pg_stat_ch is part of ClickHouse's broader vision: PostgreSQL for OLTP, ClickHouse for analytics and observability. Your transactional database stays focused on transactions. Your analytical database handles the heavy lifting of storing and querying billions of events.
This pattern works because:
- PostgreSQL is excellent at ACID transactions
- ClickHouse is excellent at columnar analytics
- Streaming events between them keeps both doing what they're best at
Conclusion
pg_stat_ch brings event-level observability to PostgreSQL without the overhead of traditional APM agents. If you run PostgreSQL at scale and want to understand exactly what your database is doing, this is worth trying.
The extension is Apache 2.0 licensed and actively developed. Query plans and longer query text capture are on the roadmap.
Running PostgreSQL in production? Akmatori AI agents can analyze your pg_stat_ch events to automatically detect performance regressions and suggest optimizations.
