03.02.2026

Understanding Linux OOM Killer: A Complete Guide for SREs


Every SRE has experienced that dreaded moment: a critical process suddenly dies for no apparent reason, and the only clue is a cryptic kernel message about "Out of memory." The culprit? Linux's Out-of-Memory (OOM) Killer. Understanding how it works is essential for managing production systems effectively.

What is the OOM Killer?

The OOM Killer is a Linux kernel mechanism that activates when the system runs critically low on memory. Rather than allowing the entire system to become unresponsive or crash, the kernel selects and terminates processes to free up memory. While this prevents total system failure, it can cause significant disruption if it kills the wrong process.

When Does OOM Killer Activate?

The OOM Killer triggers when:

  1. Physical memory is exhausted and no more pages can be reclaimed
  2. Swap space is full or swap is disabled
  3. Memory allocation fails despite the kernel's attempts to free memory through page cache eviction and other mechanisms

You can check current memory pressure with:

grep -E "MemFree|MemAvailable|SwapFree" /proc/meminfo

Or monitor memory pressure in real-time:

watch -n 1 'free -h'
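
To see these trigger conditions first-hand, you can deliberately exhaust memory in a disposable test VM. The sketch below assumes stress-ng is installed; never run it on a production host.

# Allocate ~95% of available memory for 60 seconds; expect the OOM Killer
# to step in if swap cannot absorb the pressure (disposable VM only)
stress-ng --vm 2 --vm-bytes 95% --vm-keep --timeout 60s
dmesg | tail -20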

How Does OOM Killer Select Its Victims?

The kernel uses a scoring system called oom_score to determine which process to kill. Each process receives a score roughly proportional to its share of usable memory, normalized to a 0-1000 range, and higher scores make a process more likely to be terminated.

The scoring algorithm considers:

  • Memory consumption: Processes using more memory (resident set, swap, and page tables) get higher scores
  • Root privileges: On older kernels, root-owned processes received a small score discount
  • oom_score_adj: Manual adjustments set by administrators, added directly to the score

Older kernels also weighed factors such as process age and nice value, but on modern kernels the score is essentially memory usage plus oom_score_adj.

View a process's OOM score:

cat /proc/<pid>/oom_score

View the adjustable component:

cat /proc/<pid>/oom_score_adj
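
To get a quick picture of which processes the kernel currently considers the most killable, you can rank them by score. A minimal sketch that only needs read access to /proc:

# Ten highest-scoring (most OOM-prone) processes: score, PID, command name
for p in /proc/[0-9]*; do
  printf '%s %s %s\n' "$(cat "$p/oom_score" 2>/dev/null)" "${p##*/}" "$(cat "$p/comm" 2>/dev/null)"
done | sort -rn | head -10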

Tuning OOM Killer Behavior

Adjusting Process Priority

You can influence which processes get killed by modifying oom_score_adj. The value ranges from -1000 to 1000:

  • -1000: Never kill this process (OOM immune)
  • 0: Default behavior
  • 1000: Always kill this process first

Protect a critical process:

echo -1000 > /proc/<pid>/oom_score_adj

Make a process a preferred target:

echo 500 > /proc/<pid>/oom_score_adj

For persistent configuration, use systemd service files:

[Service]
OOMScoreAdjust=-500
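
To apply this without editing the packaged unit file, a drop-in override works; the service name below (myapp.service) is just a placeholder:

# Add the [Service] snippet above via a drop-in, then restart and verify
systemctl edit myapp.service
systemctl restart myapp.service
cat /proc/$(systemctl show -p MainPID --value myapp.service)/oom_score_adj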

System-Wide OOM Settings

Control overall OOM behavior through sysctl:

# Panic on OOM instead of killing processes (use with caution)
sysctl -w vm.panic_on_oom=1

# Allow overcommit (0=heuristic, 1=always, 2=never)
sysctl -w vm.overcommit_memory=0

# Overcommit ratio (when overcommit_memory=2)
sysctl -w vm.overcommit_ratio=50

Make settings persistent in /etc/sysctl.conf:

vm.panic_on_oom = 0
vm.overcommit_memory = 0
vm.overcommit_ratio = 50
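
After editing the file, reload it so the running kernel picks up the values without a reboot:

# Load /etc/sysctl.conf (sysctl --system also reads /etc/sysctl.d/*.conf)
sysctl -p
sysctl vm.panic_on_oom vm.overcommit_memory vm.overcommit_ratio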

Reading OOM Killer Logs

When OOM Killer strikes, it logs detailed information to the kernel ring buffer:

dmesg | grep -i "out of memory"

Or search system logs:

journalctl -k | grep -i "oom"

A typical OOM message includes:

Out of memory: Killed process 12345 (nginx) total-vm:2048000kB, anon-rss:1500000kB, file-rss:10000kB, shmem-rss:0kB

Key fields:

  • total-vm: Total virtual memory allocated
  • anon-rss: Anonymous (heap/stack) resident memory
  • file-rss: File-backed resident memory
  • shmem-rss: Shared memory resident size
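
The kill line is only the tail end of a longer report that also includes a per-process memory table and the allocation backtrace, so pulling some surrounding context is usually more informative. A simple sketch:

# Show the surrounding OOM report with human-readable timestamps
dmesg -T | grep -i -B 5 -A 40 "killed process"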

Preventing OOM Situations

1. Monitor Memory Proactively

Set up alerts before memory becomes critical:

# Check available memory percentage
awk '/MemTotal/{total=$2} /MemAvailable/{available=$2} END{print (available/total)*100}' /proc/meminfo
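
A minimal cron-style check built on the same awk one-liner might look like the sketch below; the 10% threshold and the logger call are placeholders for whatever alerting pipeline you actually use:

# Warn when less than 10% of memory is available (threshold is illustrative)
avail_pct=$(awk '/MemTotal/{t=$2} /MemAvailable/{a=$2} END{printf "%d", a/t*100}' /proc/meminfo)
if [ "$avail_pct" -lt 10 ]; then
    logger -t memcheck "MemAvailable down to ${avail_pct}% - investigate before the OOM Killer does"
fi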

2. Set Resource Limits

Use cgroups to limit memory per application. With the legacy cgroup v1 interface (a cgroup v2 sketch follows below):

# Create a cgroup with 2GB memory limit
mkdir /sys/fs/cgroup/memory/myapp
echo 2G > /sys/fs/cgroup/memory/myapp/memory.limit_in_bytes
echo <pid> > /sys/fs/cgroup/memory/myapp/cgroup.procs
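
On a cgroup v2 (unified hierarchy) system, which most current distributions use by default, the rough equivalent is sketched below, assuming the memory controller is already enabled for the subtree (systemd normally takes care of this):

# cgroup v2: memory.max replaces memory.limit_in_bytes
mkdir /sys/fs/cgroup/myapp
echo 2G > /sys/fs/cgroup/myapp/memory.max
echo <pid> > /sys/fs/cgroup/myapp/cgroup.procs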

With systemd (MemoryMax= is the cgroup v2 directive; MemoryLimit= is its deprecated cgroup v1 predecessor):

[Service]
MemoryMax=2G
MemoryHigh=1.8G

3. Configure Swap Appropriately

While swap can prevent OOM, excessive swapping degrades performance:

# Check swap usage
swapon --show

# Adjust swappiness (0-100, lower = less swapping)
sysctl -w vm.swappiness=10
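
If a host has no swap at all, even a small swap file gives the kernel room to move cold anonymous pages aside before resorting to the OOM Killer. A quick sketch (size and path are illustrative):

# Create and enable a 2 GiB swap file
fallocate -l 2G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile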

4. Use Memory Overcommit Wisely

Strict overcommit prevents memory oversubscription:

# Disable overcommit (strict mode)
sysctl -w vm.overcommit_memory=2
sysctl -w vm.overcommit_ratio=80

This caps allocations at the commit limit (swap plus overcommit_ratio percent of RAM), so processes cannot reserve more memory than the system can actually back.
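
For example, a host with 16 GiB of RAM, 4 GiB of swap, and overcommit_ratio=80 gets a commit limit of roughly 4 + 12.8 = 16.8 GiB (numbers are illustrative). You can read both the limit and the current commitments from /proc/meminfo:

# CommitLimit = swap + RAM * overcommit_ratio / 100; Committed_AS is what's reserved now
grep -E "CommitLimit|Committed_AS" /proc/meminfo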

Kubernetes and OOM Killer

In containerized environments, understanding OOM behavior is crucial:

resources:
  requests:
    memory: "512Mi"
  limits:
    memory: "1Gi"

When a container exceeds its memory limit, the kernel OOM Killer terminates a process inside that container's cgroup (a cgroup-level OOM rather than a system-wide one), and the container runtime reports the container as OOMKilled. Check for OOM kills:

kubectl describe pod <pod-name> | grep -i oom

Or check container status:

kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'
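
The kubelet also maps each pod's QoS class to an oom_score_adj on the node (Guaranteed pods get a strongly negative value, BestEffort pods get 1000, Burstable pods fall in between), so the class a pod lands in matters when the node itself runs short on memory:

kubectl get pod <pod-name> -o jsonpath='{.status.qosClass}'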

Debugging Memory Issues

Find Memory-Hungry Processes

ps aux --sort=-%mem | head -20

Check Process Memory Details

grep -E "VmSize|VmRSS|VmSwap" /proc/<pid>/status
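
On kernels 4.14 and newer, /proc/<pid>/smaps_rollup sums all of a process's mappings and includes the proportional set size (PSS), which is often a fairer measure than RSS when memory is shared:

grep -E "^(Rss|Pss|Swap):" /proc/<pid>/smaps_rollup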

Monitor Memory Allocation Patterns

# Watch memory allocations in real-time
vmstat 1

# Detailed memory statistics
cat /proc/meminfo

Best Practices for Production

  1. Never set critical services to oom_score_adj=-1000 unless absolutely necessary. If every important process is immune, the OOM Killer cannot free memory and the system may hang or panic instead of recovering

  2. Use cgroups/containers to isolate memory usage and prevent runaway processes from affecting the entire system

  3. Monitor and alert on memory usage trends before reaching critical levels

  4. Test OOM behavior in staging environments to understand how your applications respond

  5. Document your OOM tuning so team members understand why certain processes have adjusted scores

Conclusion

The Linux OOM Killer is a critical safety mechanism that prevents complete system failure during memory exhaustion. Understanding how it selects victims and how to tune its behavior is essential for maintaining reliable production systems. By proactively monitoring memory, setting appropriate limits, and configuring OOM scores correctly, you can minimize unexpected process terminations and maintain system stability. For automated incident response and intelligent alerting when OOM events occur, consider using Akmatori, an open-source AI agent platform that helps SRE teams respond to infrastructure incidents faster.

FAQ

  • Can I completely disable OOM Killer?

    • You can set vm.panic_on_oom=1 to make the system panic instead of killing processes, but this is rarely desirable in production as it causes a full system crash.
  • Why was my process killed even with low oom_score?

    • Selection is relative: the kernel picks the process with the highest score among all killable processes, even if every score is low. A low oom_score only protects a process when other processes score higher.
  • How do I know if a process was OOM-killed?

    • Check dmesg or journalctl -k for OOM messages, or look for exit code 137 (128 + signal 9) in your process logs.
  • Does setting oom_score_adj=-1000 guarantee my process won't be killed?

    • In practice yes: a value of -1000 exempts the process from OOM Killer selection entirely. The process can still die for other reasons (for example, if its own allocations fail), and if no killable process remains the kernel may panic rather than recover.

Automate incident response and prevent on-call burnout with AI-driven agents!