pg_durable: Durable Execution in Postgres

Long-running jobs are where simple automation gets messy. A batch import, embedding pipeline, cleanup task, or maintenance workflow starts as a few SQL statements, then grows retry counters, status columns, queue consumers, and manual replay instructions. pg_durable moves that reliability machinery into Postgres itself.
What Is pg_durable?
pg_durable is a PostgreSQL extension from Microsoft that runs durable SQL workflows inside the database. A durable function is a graph of SQL steps. PostgreSQL executes the graph, checkpoints progress, and can resume from the last durable checkpoint after crashes, restarts, failovers, or failed work.
The project is in preview and targets PostgreSQL 17 and 18. It is built with pgrx and runs a background worker in the PostgreSQL server process. Under the hood it uses duroxide for orchestration and duroxide-pg for runtime state.
Why SREs Should Care
For operators, the appeal is a clearer failure model.
- Checkpointed progress: restart from the last completed step instead of guessing which rows were already processed
- SQL-native visibility: inspect workflow instances and status from Postgres tables such as
df.instances - Less scheduler glue: reduce bespoke worker code for jobs that mostly touch database state
- Parallel execution: model fan-out aggregation or batch work as a workflow graph
- Auditable operations: keep job state under the same backup, auth, and SQL access patterns as the data
This fits ingest pipelines, vector embedding jobs, scheduled maintenance, enrichment, and runbook steps that already live close to Postgres.
Installation
Tagged releases publish Debian packages for PostgreSQL 17 and 18. Install the matching package, preload the extension, and restart PostgreSQL.
sudo dpkg -i pg-durable-postgresql-17_<version>-1_amd64.deb
sudo systemctl restart postgresql
Create the extension in the configured database:
CREATE EXTENSION pg_durable;
Access is not granted to PUBLIC automatically. Grant usage to application roles:
SELECT df.grant_usage('app_role');
Example Workflow
The official quick example shows the shape of a durable workflow:
SELECT df.start(
'SELECT id FROM documents WHERE processed = false LIMIT 100' |=> 'batch'
~> 'UPDATE documents SET processed = true WHERE id = ANY($batch)'
);
That pattern is useful when the work is bigger than one statement, but still close enough to the database that a separate orchestrator feels heavy.
Operational Tips
Treat pg_durable like production database infrastructure. Test crash recovery, failover behavior, WAL growth, lock impact, and backup restore flows before putting important workflows on it.
Keep steps idempotent where possible. Durable execution helps with replay and recovery, but external API calls and other side effects still need careful design.
Be selective. If a job spans many external systems or needs rich application logic, Temporal, Airflow, Argo, or a normal service worker may still be the better control plane.
Conclusion
pg_durable gives Postgres-heavy teams a practical middle ground between fragile scripts and a full external orchestrator. For SREs, the useful idea is simple: background work needs durable state, visible progress, and a replay story before it becomes an incident.
If your team wants AI-assisted incident workflows with strong operational context, Akmatori helps SRE teams investigate alerts, coordinate response, and automate safe infrastructure actions. Powered by Gcore for global infrastructure reliability.
