Keeping Postgres Job Queues Healthy

Using Postgres as a job queue is still one of the most practical patterns for internal automation. It keeps application state and background work in the same transaction boundary, which makes retries and consistency much easier to reason about. But the workload profile is brutal: rows are inserted, claimed, updated or deleted, and then replaced again at very high frequency. If cleanup falls behind, performance degrades fast.
A recent PlanetScale engineering write-up on keeping a Postgres queue healthy is a good reminder that queue performance problems are often cleanup problems, not raw throughput problems.
What Makes Queue Tables Tricky
The core issue is MVCC. When workers delete completed jobs, those rows do not disappear immediately. They become dead tuples that still consume heap and index space until vacuum reclaims them.
For queue-heavy systems, this creates a dangerous feedback loop:
- workers generate dead tuples continuously
- indexes keep pointers to dead rows until vacuum catches up
- fetch queries scan more useless entries over time
- latency rises even when job volume stays flat
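One lever against this feedback loop is making autovacuum more aggressive on the queue table itself. As a sketch (the table name `jobs` and the specific values are assumptions to tune for your workload), per-table storage parameters can trigger vacuum after far less churn than the global defaults:

```sql
-- Vacuum the queue table after ~1% of rows are dead instead of the
-- default 20%, and remove the cost-based delay so vacuum runs at
-- full speed. Values here are illustrative, not a recommendation.
ALTER TABLE jobs SET (
  autovacuum_vacuum_scale_factor = 0.01,
  autovacuum_vacuum_cost_delay = 0
);
```

Both settings are standard Postgres storage parameters; the right values depend on insert/delete rates and how much I/O headroom the instance has.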
If your queue shares a database with analytics, reporting, or ad hoc queries, long-running transactions can pin the MVCC horizon and delay vacuum even more.
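To see whether that is happening, you can look for old open transactions directly. A minimal check against pg_stat_activity (the 5-minute threshold is an arbitrary assumption):

```sql
-- List transactions open longer than 5 minutes; any of these can
-- pin the MVCC horizon and prevent vacuum from reclaiming dead tuples.
SELECT pid, usename, state,
       now() - xact_start AS xact_age,
       left(query, 60) AS current_query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND now() - xact_start > interval '5 minutes'
ORDER BY xact_start;
```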
A Safe Worker Pattern
A common pattern is to claim the next pending job with FOR UPDATE SKIP LOCKED:
BEGIN;

-- Claim the next pending job. SKIP LOCKED lets concurrent
-- workers pass over rows that another worker already holds.
SELECT * FROM jobs
WHERE status = 'pending'
ORDER BY run_at
LIMIT 1
FOR UPDATE SKIP LOCKED;

-- After the work succeeds, delete the claimed row
-- ($1 is the id returned by the SELECT above).
DELETE FROM jobs WHERE id = $1;

COMMIT;
This works well, but only if the transaction stays extremely short. A worker that holds an open transaction while doing network calls, rendering templates, or waiting on external APIs will make cleanup harder for the entire table.
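When the work itself is slow, a common variant is to split claiming and finishing into two short transactions, with the slow part in between and no transaction open around it. A sketch, assuming the same jobs table plus a status value of 'running' and a claimed_at column (both assumptions):

```sql
-- Transaction 1: claim one job and commit immediately.
BEGIN;
UPDATE jobs
SET status = 'running', claimed_at = now()
WHERE id = (
  SELECT id FROM jobs
  WHERE status = 'pending'
  ORDER BY run_at
  LIMIT 1
  FOR UPDATE SKIP LOCKED
)
RETURNING *;
COMMIT;

-- ...do the slow work here, outside any transaction...

-- Transaction 2: remove (or mark done) the finished job.
DELETE FROM jobs WHERE id = $1;
```

The trade-off is that a worker can die between the two transactions, so you also need a sweep that requeues jobs stuck in 'running' past some deadline.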
Operational Tips
Keep queue workers boring and fast:
- commit immediately after claiming or finishing work
- move slow business logic outside long transactions
- monitor n_dead_tup, autovacuum activity, and index growth
- separate queue traffic from analytical traffic when possible
- use retry logic so killed or throttled workers recover cleanly
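The monitoring bullet above maps directly onto pg_stat_user_tables. A quick health check for the queue table (table name `jobs` assumed):

```sql
-- Dead-tuple and autovacuum status for the queue table. A steadily
-- growing n_dead_tup with a stale last_autovacuum is the warning sign.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_autovacuum,
       autovacuum_count
FROM pg_stat_user_tables
WHERE relname = 'jobs';
```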
It is also worth setting clear limits on long-running queries in shared Postgres clusters. A queue table can survive high churn, but it cannot stay healthy if other workloads prevent vacuum from reclaiming space.
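Those limits can be enforced at the role level so analytical sessions cannot hold the horizon back indefinitely. A sketch using two real Postgres settings (the role name `reporting_user` and the values are assumptions):

```sql
-- Cap individual statements and idle-in-transaction time for a
-- reporting role that shares the cluster with the queue.
ALTER ROLE reporting_user SET statement_timeout = '30s';
ALTER ROLE reporting_user SET idle_in_transaction_session_timeout = '60s';
```

idle_in_transaction_session_timeout is the one that matters most for vacuum: it kills sessions that opened a transaction and then went quiet.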
When to Stop Using Postgres
Postgres is a strong default for transactional background jobs, especially when correctness matters more than extreme throughput. But if you need massive fan-out, long retention, or highly bursty event processing, a dedicated queue or log system may be the better fit.
For most platform teams, the right answer is not "never use Postgres for queues." The right answer is "use it with discipline."
Conclusion
Healthy Postgres queues depend on fast workers, aggressive cleanup, and protection from mixed-workload interference. If you watch dead tuples, keep transactions short, and treat vacuum as part of your queue design, Postgres remains a solid foundation for reliable job processing.
