11.11.2025

Understanding Linux delay accounting for performance insights


When containers miss SLOs despite acceptable CPU utilization, traditional metrics fail to tell the full story. Linux delay accounting fills this gap by measuring the time tasks spend waiting for kernel-managed resources. Built into the kernel and described in its accounting documentation, the feature turns performance debugging from guesswork into data-driven investigation.


What is delay accounting?

Delay accounting is a Linux kernel subsystem that tracks per-task delays across eight resource categories. Tasks experience delays when waiting for:

  1. CPU scheduling - waiting for processor availability
  2. Synchronous block I/O - disk operation completion
  3. Page swap-in - swapping pages from disk to memory
  4. Memory reclaim - kernel freeing memory under pressure
  5. Page cache thrashing - excessive page eviction and reloading
  6. Memory compaction - defragmenting memory pages
  7. Write-protect copy - copy-on-write operations
  8. IRQ/SOFTIRQ - interrupt request handling

The feature exposes cumulative delay statistics through the taskstats interface, enabling both real-time monitoring during task execution and post-exit analysis. This granular visibility helps operators distinguish between benign resource usage and problematic starvation.

Unlike traditional utilization metrics that show how much of a resource is consumed, delay accounting reveals how long tasks wait for resources to become available—a critical distinction for diagnosing performance issues.
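
Each of the eight categories above corresponds to a counter in the kernel's taskstats structure. If the kernel UAPI headers are installed, a quick look at the header shows the field names that tools and exporters read; the exact set depends on your header version, so treat this as a sketch for orientation:

# List the per-category delay counters defined in the taskstats UAPI header
# (fields such as wpcopy_* and irq_* appear only in recent kernel versions)
grep _delay_total /usr/include/linux/taskstats.h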


Why use delay accounting?

Standard utilization metrics reveal resource consumption but hide contention impacts. A container consuming 100% CPU might run smoothly or struggle with constant delays—utilization alone cannot tell the difference.

Delay accounting provides:

  • Noisy Neighbor Detection: Identify resource-hungry containers degrading latency-sensitive applications.
  • Throttling Diagnosis: Separate CPU contention from hard limit enforcement.
  • Correlation Analysis: Link observed delays directly to service latency or error rates.
  • QoS Validation: Verify that priority classes protect critical workloads as intended.

Enabling delay accounting

To activate this functionality, compile your kernel with CONFIG_TASK_DELAY_ACCT=y and CONFIG_TASKSTATS=y. Most modern distributions include these options.

Check if your kernel supports delay accounting:

# Check kernel config for both required options
grep -E 'CONFIG_TASK_DELAY_ACCT|CONFIG_TASKSTATS' /boot/config-$(uname -r)

# Both should be set: CONFIG_TASK_DELAY_ACCT=y and CONFIG_TASKSTATS=y

Enable at boot by adding the delayacct kernel parameter to your bootloader configuration, or toggle it at runtime:

# Enable delay accounting
sysctl -w kernel.task_delayacct=1

# Make it persistent across reboots
echo "kernel.task_delayacct = 1" >> /etc/sysctl.conf

Verify the setting:

sysctl kernel.task_delayacct
# Output: kernel.task_delayacct = 1
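
If you rely on the delayacct boot parameter rather than the sysctl, you can confirm the flag actually reached the kernel:

# The delayacct flag should appear on the kernel command line
grep -o delayacct /proc/cmdline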

Note that enabling delay accounting introduces minimal overhead—typically less than 1% CPU impact—making it suitable for production environments.


Accessing delay statistics

The kernel provides several methods to access delay accounting data:

Using getdelays

The getdelays utility ships in the kernel source under tools/accounting/getdelays.c. It queries per-process or thread group delay statistics through the Netlink interface.

To build the tool:

# Option 1: build from a full kernel source tree
# (distribution header packages usually do not include tools/accounting,
#  so use your distribution's kernel source package or a git checkout)
cd <kernel-source>/tools/accounting
make

# Option 2: download the single source file and compile it directly;
# getdelays speaks raw netlink, so it only needs the kernel UAPI headers
# (linux/taskstats.h), not libnl
wget https://raw.githubusercontent.com/torvalds/linux/master/tools/accounting/getdelays.c
gcc -o getdelays getdelays.c

Basic usage examples:

# Monitor a specific process (requires root)
sudo getdelays -d -p <pid>

# Run a command and display its delay statistics
sudo getdelays -d -c "stress --cpu 2 --timeout 10s"

# Monitor a thread group
sudo getdelays -d -t <tgid>

# Include I/O accounting statistics
sudo getdelays -di -p <pid>

The output shows cumulative delays across all resource categories:

CPU    count     real total  virtual total    delay total  delay average
          15      100000000       95000000        5000000        333333ns

IO     count    delay total  delay average
          42       84000000        2000000ns

SWAP   count    delay total  delay average
           0              0              0ns

RECLAIM count    delay total  delay average
           8        120000        15000ns

THRASH count    delay total  delay average
           2         45000        22500ns

COMPACT count    delay total  delay average
           0              0              0ns

WPCOPY count    delay total  delay average
           5        150000        30000ns

IRQ    count    delay total  delay average
          20        800000        40000ns

All delay values are reported in nanoseconds, providing precise measurements of resource wait times.

Container monitoring with Prometheus

Modern observability platforms expose delay accounting metrics in Prometheus format. The Coroot node-agent aggregates per-process counters into container-level metrics:

  • container_resources_cpu_delay_seconds_total: Time spent waiting for CPU
  • container_resources_disk_delay_seconds_total: Time spent waiting for I/O
  • container_resources_cpu_throttled_seconds_total: Duration throttled by CPU limits

These metrics integrate seamlessly with existing Prometheus/Grafana dashboards, enabling correlation with application SLIs.
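
Once these metrics are scraped, an ad-hoc query against the Prometheus HTTP API quickly surfaces the containers paying the largest CPU delay penalty. The Prometheus address below is a placeholder for your own endpoint:

# Top 5 containers by CPU delay rate (seconds spent waiting per second of wall time)
curl -s 'http://prometheus:9090/api/v1/query' \
  --data-urlencode 'query=topk(5, rate(container_resources_cpu_delay_seconds_total[5m]))'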


Real-world examples

Example 1: CPU scheduling delays

Testing two stress processes with different priorities on a 2-core system reveals dramatic differences:

High priority (default niceness):

  • Average CPU delay: ~1.2ms per context switch

Low priority (niceness 19):

  • Average CPU delay: ~265ms per context switch

This difference of more than 200x demonstrates how CPU contention penalizes lower-priority workloads. In production, the same signal helps identify when batch jobs interfere with latency-sensitive services.
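
A rough way to reproduce this comparison, assuming the stress utility and a compiled getdelays binary are available on the test machine (durations are illustrative):

# One CPU hog at default priority, one at the lowest priority, on a 2-core box
stress --cpu 2 --timeout 60s &
nice -n 19 stress --cpu 2 --timeout 60s &

# Compare CPU delay counts and averages across the stress workers
for pid in $(pgrep -x stress); do
  sudo getdelays -d -p "$pid"
done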

Example 2: I/O throttling detection

Running a dd operation without limits:

dd if=/dev/zero of=/tmp/test bs=1M count=8192

Average I/O delay: ~2ms

The same operation throttled to 1MB/s using Docker's --device-write-bps:

Average I/O delay: ~2677ms (an increase of more than 1000x)

Delay accounting immediately reveals the I/O bottleneck, distinguishing throttling from genuine disk performance issues.
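
For reference, the throttled run can be reproduced with something along these lines; the device path and image are assumptions you should adapt to your host:

# Limit writes to the backing device to 1 MB/s inside a container
# (oflag=direct bypasses the page cache so the limit applies to the writes themselves)
docker run --rm --device-write-bps /dev/sda:1mb ubuntu \
  dd if=/dev/zero of=/tmp/test bs=1M count=8192 oflag=direct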

Example 3: Noisy neighbor detection

A production Kubernetes cluster experiences intermittent latency spikes in a reservations service. Traditional metrics show:

  • CPU utilization: 65%
  • Memory usage: Normal
  • Network latency: Normal

Examining delay accounting metrics reveals:

container_resources_cpu_delay_seconds_total{container="reservations"}
  increasing by 2-5 seconds per minute

Further investigation shows a stats-aggregator sidecar consuming CPU bursts during aggregation windows. The CPU delay metric correlates perfectly with P99 latency increases, confirming the noisy neighbor hypothesis.

Solution: Move the aggregator into a separate pod with QoS class BestEffort (no CPU or memory requests or limits), isolating it from the latency-sensitive workloads.
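
After splitting the aggregator out, it is worth confirming the resulting QoS classes match expectations; the pod names below are placeholders:

# BestEffort means no resource requests or limits on any container in the pod
kubectl get pod stats-aggregator -o jsonpath='{.status.qosClass}{"\n"}'
kubectl get pod reservations -o jsonpath='{.status.qosClass}{"\n"}'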


Operational insights

Delay accounting shines during incident response. When a reservations service violates latency SLOs, CPU delay metrics might reveal that a stats-aggregator container causes runqueue saturation—even though total CPU usage appears normal. This correlation pinpoints root causes faster than utilization analysis alone.

Operators also use delay accounting to right-size container limits. By monitoring delay patterns under load, teams determine whether throttled containers need higher ceilings or if co-located workloads require isolation.

Integration with alerting

Set up Prometheus alerts based on delay thresholds:

- alert: HighCPUDelay
  expr: rate(container_resources_cpu_delay_seconds_total[5m]) > 0.1
  for: 5m
  annotations:
    summary: "Container experiencing CPU starvation"
    description: "{{ $labels.container }} has excessive CPU delay"
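
Before deploying, a quick syntax check keeps a broken rule from failing the Prometheus reload; the file path is an assumption:

# Validate the alerting rule file
promtool check rules /etc/prometheus/rules/delay-alerts.yml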

This proactive approach catches performance degradation before users report issues.


Troubleshooting common scenarios

Scenario 1: Database query slowdowns

Symptoms: Application reports slow database queries, but database CPU/memory metrics look normal.

Investigation:

# Monitor the main PostgreSQL process (-o picks the oldest match, i.e. the postmaster)
sudo getdelays -d -p "$(pgrep -o -f postgres)"

Findings: High I/O delay values (>100ms average) indicate disk contention. Check the following (see the commands after this list):

  • Other processes competing for disk I/O
  • Storage backend performance (cloud disk IOPS limits)
  • Whether queries trigger excessive page swaps
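
To check the first two items, per-device and per-process I/O views complement the delay numbers (iostat ships with the sysstat package; iotop is a separate utility):

# Per-device utilization, queue depth, and await times, refreshed every second
iostat -x 1

# Which processes are actually issuing the I/O
sudo iotop -o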

Resolution: Increase IOPS limits, optimize queries, or move to faster storage tier.

Scenario 2: Container restart loops

Symptoms: Container restarts frequently with OOMKilled status, but memory usage appears below limits.

Investigation: Check memory reclaim delays:

# Monitor container processes
for pid in $(pgrep -f my-app); do
  sudo getdelays -d -p $pid
done

Findings: Extremely high RECLAIM delays (>1 second) indicate the kernel struggles to free memory before the OOM killer intervenes.
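
To confirm that the kernel, rather than the runtime, is killing the workload, the container cgroup's memory event counters are worth a look; a cgroup v2 layout is assumed and the exact path depends on your runtime:

# Non-zero oom / oom_kill counters mean the kernel killed tasks in this cgroup
cat /sys/fs/cgroup/<container-cgroup>/memory.events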

Resolution: Increase memory limits or optimize application memory footprint. Memory pressure occurs before hitting hard limits.

Scenario 3: Batch job interference

Symptoms: API latency spikes correlate with batch job execution times.

Investigation: Compare delay metrics:

# Monitor API server CPU delays (print the CPU header plus the value line beneath it)
watch -n 1 'sudo getdelays -d -p $(pgrep -o -f api-server) | grep -A1 "^CPU"'

Findings: CPU delay increases from 1ms to 50ms+ during batch job windows.

Resolution: Use CPU cgroups to limit batch job CPU shares or schedule batch jobs during off-peak hours. Consider using nice values to deprioritize batch workloads.
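
A minimal sketch of both options, assuming a long-running batch process and systemd on the host; the PID and script path are placeholders:

# Lower the scheduling priority of an already-running batch job
sudo renice -n 19 -p <batch-pid>

# Or launch it with a small CPU weight so it yields under contention
sudo systemd-run --scope -p CPUWeight=20 /opt/batch/run-batch.sh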

Scenario 4: Slow container startup

Symptoms: Containers take 30+ seconds to start, but no obvious bottleneck in logs.

Investigation:

# Monitor startup process
sudo getdelays -d -c "docker run my-image"

Findings: High IO and COMPACT delays during image layer extraction.

Resolution: Optimize container image layers, use pre-pulled images, or increase node disk performance.


Best practices

Monitor continuously

Enable delay accounting permanently on production systems. The minimal overhead justifies the diagnostic value during incidents.

Establish baselines

Record normal delay patterns for your workloads. Deviations from baseline often indicate issues before traditional metrics trigger alerts.

Correlate with application metrics

Combine delay accounting data with application-level metrics (request latency, error rates) to build complete performance profiles.

Use delay metrics for capacity planning

Persistent CPU delays indicate undersized compute resources. Persistent I/O delays suggest storage upgrades. Let delay data drive infrastructure decisions.

Integrate with runbooks

Include delay accounting commands in incident response runbooks. Quick access to delay statistics accelerates root cause analysis.


Streamline performance monitoring with Akmatori

Delay accounting is one tool in a comprehensive observability strategy. To automate incident response, minimize downtime, and prevent alert fatigue, try Akmatori. Akmatori handles alerts seamlessly, ensuring your systems run smoothly.

Boost your DevOps and SRE workflows today with Akmatori.


Why Gcore complements your monitoring workflow

Need reliable infrastructure for running latency-sensitive workloads? Gcore offers high-performance virtual machines and bare metal servers across the globe. Whether you're testing kernel features or running production services, Gcore ensures top-notch performance and scalability.


Conclusion

Linux delay accounting transforms performance analysis from utilization guesswork into objective measurement. By tracking per-task resource waits, operators identify contention issues, validate QoS policies, and correlate infrastructure metrics with application SLOs.

Enable delay accounting today and gain the visibility needed for proactive performance management. And don't forget to check out Akmatori and Gcore for comprehensive system monitoring and reliable infrastructure.

Thanks for reading. Let us know how you're using delay accounting in your production environments.

Automate incident response and prevent on-call burnout with AI-driven agents!