Understanding Linux delay accounting for performance insights

When containers miss SLOs despite acceptable CPU utilization, traditional metrics fail to tell the full story. Linux delay accounting fills this gap by measuring the time tasks spend waiting for kernel-managed resources. The feature, described in the kernel's accounting documentation (Documentation/accounting/delay-accounting.rst), transforms performance debugging from guesswork into data-driven investigation.
What is delay accounting?
Delay accounting is a Linux kernel subsystem that tracks per-task delays across eight resource categories. Tasks experience delays when waiting for:
- CPU scheduling - waiting for processor availability
- Synchronous block I/O - disk operation completion
- Page swap-in - swapping pages from disk to memory
- Memory reclaim - kernel freeing memory under pressure
- Page cache thrashing - excessive page eviction and reloading
- Memory compaction - defragmenting memory pages
- Write-protect copy - copy-on-write operations
- IRQ/SOFTIRQ - interrupt request handling
The feature exposes cumulative delay statistics through the taskstats interface, enabling both real-time monitoring during task execution and post-exit analysis. This granular visibility helps operators distinguish between benign resource usage and problematic starvation.
Unlike traditional utilization metrics that show how much of a resource is consumed, delay accounting reveals how long tasks wait for resources to become available—a critical distinction for diagnosing performance issues.
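One delay category is visible without any extra tooling: field 42 of /proc/<pid>/stat (delayacct_blkio_ticks) holds a task's cumulative block I/O delay in clock ticks. A quick peek, assuming you already know the target PID:
# Print the aggregated block I/O delay, in clock ticks, for a given task
awk '{print $42}' /proc/<pid>/stat
The full set of delay categories, however, is only exposed through taskstats, which the tools described below use.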
Why use delay accounting?
Standard utilization metrics reveal resource consumption but hide contention impacts. A container consuming 100% CPU might run smoothly or struggle with constant delays—utilization alone cannot tell the difference.
Delay accounting provides:
- Noisy Neighbor Detection: Identify resource-hungry containers degrading latency-sensitive applications.
- Throttling Diagnosis: Separate CPU contention from hard limit enforcement.
- Correlation Analysis: Link observed delays directly to service latency or error rates.
- QoS Validation: Verify that priority classes protect critical workloads as intended.
Enabling delay accounting
Delay accounting requires a kernel built with CONFIG_TASK_DELAY_ACCT=y and CONFIG_TASKSTATS=y. Most modern distributions ship kernels with both options enabled.
Check if your kernel supports delay accounting:
# Check kernel config
grep -E 'CONFIG_TASK_DELAY_ACCT|CONFIG_TASKSTATS' /boot/config-$(uname -r)
# Should return: CONFIG_TASK_DELAY_ACCT=y and CONFIG_TASKSTATS=y
On kernels 5.14 and later, delay accounting is compiled in but disabled by default. Enable it at boot by adding the delayacct kernel parameter to your bootloader configuration, or toggle it at runtime:
# Enable delay accounting
sysctl -w kernel.task_delayacct=1
# Make it persistent across reboots
echo "kernel.task_delayacct = 1" >> /etc/sysctl.conf
Verify the setting:
sysctl kernel.task_delayacct
# Output: kernel.task_delayacct = 1
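If you prefer the boot-parameter route mentioned above, a minimal sketch for GRUB-based systems (paths assume a Debian/Ubuntu layout; adjust for your distribution):
# Append delayacct to the kernel command line in /etc/default/grub:
#   GRUB_CMDLINE_LINUX="... delayacct"
sudo update-grub    # RHEL-family systems: grub2-mkconfig -o /boot/grub2/grub.cfg
sudo reboot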
Note that enabling delay accounting introduces minimal overhead—typically less than 1% CPU impact—making it suitable for production environments.
Accessing delay statistics
The kernel provides several methods to access delay accounting data:
Using getdelays
The getdelays utility ships in the kernel source under tools/accounting/getdelays.c. It queries per-process or thread group delay statistics through the Netlink interface.
To build the tool:
# Install build prerequisites and the userspace kernel headers (Debian/Ubuntu)
apt-get install build-essential linux-libc-dev
# If you have a full kernel source tree, build the tool in place
cd /usr/src/linux/tools/accounting   # adjust to wherever your kernel source lives
make getdelays
# Or download just the source file from the kernel repository and compile it directly
wget https://raw.githubusercontent.com/torvalds/linux/master/tools/accounting/getdelays.c
gcc -o getdelays getdelays.c
Basic usage examples:
# Monitor a specific process (requires root)
sudo getdelays -d -p <pid>
# Run a command (passed as separate arguments, not a quoted string) and display its delay statistics
sudo getdelays -d -c stress --cpu 2 --timeout 10s
# Monitor a thread group
sudo getdelays -d -t <tgid>
# Include I/O accounting statistics
sudo getdelays -di -p <pid>
The output shows cumulative delays across all resource categories:
CPU             count     real total  virtual total    delay total  delay average
                   15      100000000       95000000        5000000       333333ns
IO              count    delay total  delay average
                   42       84000000      2000000ns
SWAP            count    delay total  delay average
                    0              0            0ns
RECLAIM         count    delay total  delay average
                    8         120000        15000ns
THRASH          count    delay total  delay average
                    2          45000        22500ns
COMPACT         count    delay total  delay average
                    0              0            0ns
WPCOPY          count    delay total  delay average
                    5         150000        30000ns
IRQ             count    delay total  delay average
                   20         800000        40000ns
All delay values are reported in nanoseconds, providing precise measurements of resource wait times.
Container monitoring with Prometheus
Modern observability platforms expose delay accounting metrics in Prometheus format. The Coroot node-agent aggregates per-process counters into container-level metrics:
- container_resources_cpu_delay_seconds_total: Time spent waiting for CPU
- container_resources_disk_delay_seconds_total: Time spent waiting for I/O
- container_resources_cpu_throttled_seconds_total: Duration throttled by CPU limits
These metrics integrate seamlessly with existing Prometheus/Grafana dashboards, enabling correlation with application SLIs.
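As a quick sanity check that these metrics are being scraped, you can query the Prometheus HTTP API directly (the server address is an assumption; jq is optional and only pretty-prints the response):
# Per-container CPU delay rate over the last 5 minutes
curl -s 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=rate(container_resources_cpu_delay_seconds_total[5m])' | jq .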
Real-world examples
Example 1: CPU scheduling delays
Testing two stress processes with different priorities on a 2-core system reveals dramatic differences:
High priority (default niceness):
- Average CPU delay: ~1.2ms per context switch
Low priority (niceness 19):
- Average CPU delay: ~265ms per context switch
This 200x difference demonstrates how CPU contention affects lower-priority workloads. In production, this helps identify when batch jobs interfere with latency-sensitive services.
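A rough way to reproduce this comparison yourself, assuming the stress tool is installed and getdelays is built as described above (a sketch, not a benchmark):
# Saturate both cores, start a low-priority worker, then inspect its delays
stress --cpu 2 --timeout 60s &
nice -n 19 stress --cpu 1 --timeout 60s &
sleep 10
sudo getdelays -d -p $(pgrep -n -f 'stress --cpu 1')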
Example 2: I/O throttling detection
Running a dd operation without limits:
dd if=/dev/zero of=/tmp/test bs=1M count=8192
Average I/O delay: ~2ms
The same operation throttled to 1MB/s using Docker's --device-write-bps:
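A sketch of such a throttled run (the device path and write size are assumptions; oflag=direct makes dd bypass the page cache so writes actually hit the throttled device):
docker run --rm --device-write-bps /dev/sda:1mb ubuntu \
  dd if=/dev/zero of=/tmp/test bs=1M count=64 oflag=direct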
Average I/O delay: ~2677ms (1000x increase)
Delay accounting immediately reveals the I/O bottleneck, distinguishing throttling from genuine disk performance issues.
Example 3: Noisy neighbor detection
A production Kubernetes cluster experiences intermittent latency spikes in a reservations service. Traditional metrics show:
- CPU utilization: 65%
- Memory usage: Normal
- Network latency: Normal
Examining delay accounting metrics reveals:
container_resources_cpu_delay_seconds_total{container="reservations"} increasing by 2-5 seconds per minute
Further investigation shows a stats-aggregator sidecar consuming CPU bursts during aggregation windows. The CPU delay metric correlates perfectly with P99 latency increases, confirming the noisy neighbor hypothesis.
Solution: Move the aggregator to a separate pod with QoS class BestEffort, isolating it from latency-sensitive workloads.
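To confirm the isolation took effect, check the new pod's QoS class (the pod name and namespace here are placeholders):
kubectl get pod stats-aggregator -n monitoring -o jsonpath='{.status.qosClass}'
# Expected output for a pod without resource requests or limits: BestEffort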
Operational insights
Delay accounting shines during incident response. When a reservations service violates latency SLOs, CPU delay metrics might reveal that a stats-aggregator container causes runqueue saturation—even though total CPU usage appears normal. This correlation pinpoints root causes faster than utilization analysis alone.
Operators also use delay accounting to right-size container limits. By monitoring delay patterns under load, teams determine whether throttled containers need higher ceilings or if co-located workloads require isolation.
Integration with alerting
Set up Prometheus alerts based on delay thresholds:
- alert: HighCPUDelay
  expr: rate(container_resources_cpu_delay_seconds_total[5m]) > 0.1
  for: 5m
  annotations:
    summary: "Container experiencing CPU starvation"
    description: "{{ $labels.container }} has excessive CPU delay"
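Before deploying, wrap the alert in a rule group, save it to a file, and validate it with promtool (the file name here is just an example):
# Validate the rule file syntax before loading it into Prometheus
promtool check rules cpu-delay-alerts.yml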
This proactive approach catches performance degradation before users report issues.
Troubleshooting common scenarios
Scenario 1: Database query slowdowns
Symptoms: Application reports slow database queries, but database CPU/memory metrics look normal.
Investigation:
# Monitor the PostgreSQL postmaster (pgrep -o returns only the oldest matching PID)
sudo getdelays -d -p $(pgrep -o -f postgres)
Findings: High I/O delay values (>100ms average) indicate disk contention. Check:
- Other processes competing for disk I/O
- Storage backend performance (cloud disk IOPS limits)
- Whether queries trigger excessive page swaps
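To confirm the first two points, standard host-level tools complement the per-task view (a sketch; iostat ships with the sysstat package and iotop usually needs to be installed separately):
# Extended per-device statistics, 5 samples at 1-second intervals
iostat -x 1 5
# Show only processes currently generating I/O, 3 batch iterations
sudo iotop -o -b -n 3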
Resolution: Increase IOPS limits, optimize queries, or move to faster storage tier.
Scenario 2: Container restart loops
Symptoms: Container restarts frequently with OOMKilled status, but memory usage appears below limits.
Investigation: Check memory reclaim delays:
# Monitor container processes
for pid in $(pgrep -f my-app); do
    sudo getdelays -d -p "$pid"
done
Findings: Extremely high RECLAIM delays (>1 second) indicate the kernel struggles to free memory before the OOM killer intervenes.
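A complementary signal on kernels 4.20+ built with CONFIG_PSI is memory pressure stall information, which reports how long tasks were stalled waiting for memory (the numbers below are illustrative):
cat /proc/pressure/memory
# some avg10=12.34 avg60=8.90 avg300=4.56 total=123456789
# full avg10=6.10 avg60=4.20 avg300=2.00 total=61234567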
Resolution: Increase memory limits or optimize application memory footprint. Memory pressure occurs before hitting hard limits.
Scenario 3: Batch job interference
Symptoms: API latency spikes correlate with batch job execution times.
Investigation: Compare delay metrics:
# Monitor API process CPU delays
watch -n 1 'sudo getdelays -d -p $(pgrep -o -f api-server) | grep -A 1 "^CPU"'
Findings: CPU delay increases from 1ms to 50ms+ during batch job windows.
Resolution: Use CPU cgroups to limit batch job CPU shares or schedule batch jobs during off-peak hours. Consider using nice values to deprioritize batch workloads.
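One way to apply such a limit without writing cgroup files by hand is a transient systemd scope (a sketch for cgroup v2 systems; the script name and values are examples):
# Run the batch job with a low CPU weight and a hard quota of half a core
systemd-run --scope -p CPUWeight=20 -p CPUQuota=50% ./nightly-batch.sh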
Scenario 4: Slow container startup
Symptoms: Containers take 30+ seconds to start, but no obvious bottleneck in logs.
Investigation:
# Run the container and report the docker CLI's delays on exit
sudo getdelays -d -c docker run my-image
# Image pull and layer extraction happen in the Docker daemon, so also check dockerd itself
sudo getdelays -d -p $(pgrep -o -x dockerd)
Findings: High IO and COMPACT delays during image layer extraction.
Resolution: Optimize container image layers, use pre-pulled images, or increase node disk performance.
Best practices
Monitor continuously
Enable delay accounting permanently on production systems. The minimal overhead justifies the diagnostic value during incidents.
Establish baselines
Record normal delay patterns for your workloads. Deviations from baseline often indicate issues before traditional metrics trigger alerts.
Correlate with application metrics
Combine delay accounting data with application-level metrics (request latency, error rates) to build complete performance profiles.
Use delay metrics for capacity planning
Persistent CPU delays indicate undersized compute resources. Persistent I/O delays suggest storage upgrades. Let delay data drive infrastructure decisions.
Integrate with runbooks
Include delay accounting commands in incident response runbooks. Quick access to delay statistics accelerates root cause analysis.
Streamline performance monitoring with Akmatori
Delay accounting is one tool in a comprehensive observability strategy. To automate incident response, minimize downtime, and prevent alert fatigue, try Akmatori. Akmatori handles alerts seamlessly, ensuring your systems run smoothly.
Boost your DevOps and SRE workflows today with Akmatori.
Why Gcore complements your monitoring workflow
Need reliable infrastructure for running latency-sensitive workloads? Gcore offers high-performance virtual machines and bare metal servers across the globe. Whether you're testing kernel features or running production services, Gcore ensures top-notch performance and scalability.
Conclusion
Linux delay accounting transforms CPU performance analysis from utilization guesswork into objective measurement. By tracking per-task resource waits, operators identify contention issues, validate QoS policies, and correlate infrastructure metrics with application SLOs.
Enable delay accounting today and gain the visibility needed for proactive performance management. And don't forget to check out Akmatori and Gcore for comprehensive system monitoring and reliable infrastructure.
Thanks for reading. Let us know how you're using delay accounting in your production environments.
