Why Your Linux Server OOMs at the Worst Possible Moment

Three years ago, I watched a production service go down at 2 AM because the OOM killer terminated our database connection pooler. The system had 64GB of RAM. The Java heap was set to 32GB. No one could figure out why the system ran out of memory when "only" 40GB was being used.

I spent six hours debugging this. Six hours reading /proc/meminfo output and /var/log/messages, and wishing I had understood Linux memory management years earlier. This article is what I wish someone had explained to me before that night.

The Lie You're Believing About Memory

Most engineers think memory is simple: you have X GB of RAM, your process allocates Y GB, and when Y > X, you're out of memory. That's not how Linux works. Not even close.

Linux uses memory overcommit. The kernel will happily promise memory to processes that ask for it, even if that total exceeds physical RAM. Why? Because most programs don't actually use all the memory they allocate.

Think about malloc(). You call malloc() for a gigabyte. The allocator asks the kernel for address space, and the kernel says "sure, here's a gigabyte" without checking whether that gigabyte actually exists in hardware. Physical pages are only assigned when you first write to the memory. If you allocate a gigabyte but only touch the first page, the kernel only needs to provide one page of actual RAM (4KB on most systems).

This is fundamentally sound engineering—it lets systems run workloads that would be impossible otherwise. But it creates a situation where "free memory" calculations can be deeply misleading.
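You can see the gap between promised and touched memory for any single process. The sketch below assumes the VmSize and VmRSS fields of /proc/PID/status; the function name vm_gap is made up for this example. Treat the difference as a rough indicator only, since it also includes file mappings that were never loaded and pages that were swapped out.

```shell
#!/bin/sh
# Compare the address space a process has reserved (VmSize) with the
# memory that is actually resident in RAM (VmRSS). Both values are in kB.
# Usage: vm_gap [status-file]   (default: status of the process reading it)
vm_gap() {
    status_file=${1:-/proc/self/status}
    awk '
        /^VmSize:/ { size = $2 }
        /^VmRSS:/  { rss  = $2 }
        END {
            # Kernel threads have no Vm* lines at all
            if (size == "" || rss == "") exit 1
            printf "reserved=%d kB resident=%d kB not_resident=%d kB\n", size, rss, size - rss
        }
    ' "$status_file"
}

# Demo: only runs where /proc is available
if [ -r /proc/self/status ]; then
    vm_gap
fi
```

Point it at a real process with something like vm_gap /proc/1234/status, where 1234 is a placeholder PID.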

What `/proc/meminfo` Actually Tells You

Let me show you what I mean. Here's a typical /proc/meminfo from a production server that looks healthy:

MemTotal:       67067924 kB
MemFree:         8912340 kB
MemAvailable:   12345678 kB
Buffers:          512345 kB
Cached:          8765432 kB

"MemAvailable: 12GB" looks comfortable, right? Roughly 18% of RAM can still be handed out without swapping, and nothing here looks alarming.

But here's what you're missing:

Shmem:            2345678 kB
SReclaimable:     1234567 kB
VmallocTotal:    34359738367 kB
VmallocUsed:       567890 kB
VmallocChunk:     567890 kB

And then there's this gem:

Committed_AS:   89012345 kB

Committed_AS is the key metric nobody looks at. This shows the total amount of memory that has been promised to processes, regardless of whether it exists in physical RAM. In this case, that's about 85GB of committed memory on a 64GB system.

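Committed_AS above physical RAM + swap means the kernel has promised more than it can deliver. Nothing breaks as long as processes leave most of those promises untouched. The OOM killer gets involved when they start touching the memory and the kernel can no longer find a free or reclaimable page to back it. It doesn't matter if your Java process is "only" using 32GB on a 64GB machine. If the processes on the box have been promised 90GB in total, you're one page fault away from a kill order.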

The Overcommit Modes

Linux has three overcommit modes, controlled by /proc/sys/vm/overcommit_memory:

Mode 0 (Heuristic): The default. The kernel refuses only allocations that are obviously larger than RAM plus swap and allows everything else. The heuristic is... optimistic for most production workloads.

Mode 1 (Always): The kernel always promises memory and never fails an allocation for lack of commit space. This is a host-wide setting, so containers inherit whatever the host uses. If you run out, the OOM killer handles it.

Mode 2 (Never): Strict accounting. The kernel refuses any allocation that would push Committed_AS past CommitLimit, which is swap plus overcommit_ratio percent of RAM (50% by default). This catches runaway allocations early but breaks most software that expects overcommit behavior.

Here's how to check your current mode:

cat /proc/sys/vm/overcommit_memory

And here are the settings you should understand before changing anything:

# Percentage of RAM counted toward CommitLimit (only enforced in mode 2)
cat /proc/sys/vm/overcommit_ratio

# Absolute alternative to overcommit_ratio, in kB (0 means "use the ratio")
cat /proc/sys/vm/overcommit_kbytes
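To make the ratio concrete, here is the mode 2 limit worked out for this article's 64GB box. It is a minimal sketch that ignores huge pages, which the kernel subtracts from RAM first. The 8GB of swap is an assumed value, and commit_limit_kb is a name made up for this example.

```shell
#!/bin/sh
# CommitLimit under strict accounting, ignoring huge pages:
#   CommitLimit = MemTotal * overcommit_ratio / 100 + SwapTotal
# Usage: commit_limit_kb MEM_KB SWAP_KB RATIO
commit_limit_kb() {
    echo $(( $1 * $3 / 100 + $2 ))
}

# MemTotal from the sample above, an assumed 8GB of swap, default ratio of 50
commit_limit_kb 67067924 8388608 50
```

With the default ratio that comes to roughly 40GB of allowed commitments on a 64GB machine, which is why switching to mode 2 without touching the ratio tends to break things. If overcommit_kbytes is non-zero, it replaces the ratio term entirely.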

How the OOM Killer Decides Who Dies

When the kernel finally runs out of memory—not when your process runs out, but when the system runs out—the OOM killer springs into action. And it has to make an impossible decision: which process should die to save the system?

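The kernel uses a scoring system. Each process gets an oom_score, and on current kernels only two inputs matter:

Memory usage: The sum of resident memory, swapped-out memory, and page tables, taken as a share of RAM plus swap. Processes using more score higher (more likely to die)
OOM killer adjustments: The value in /proc/PID/oom_score_adj, from -1000 (never kill) to 1000, is added to that score

Older kernels also weighed other factors, which you will still see in a lot of documentation:

Process age and nice value: Part of the heuristic before kernel 2.6.36, ignored since
Root processes: Received a small discount until later 4.x kernels removed it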
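On current kernels the score boils down to memory share plus the manual adjustment. The function below is a rough sketch of that arithmetic on a 0 to 1000 scale, not the kernel's exact code, and newer kernels rescale the number they report in /proc/PID/oom_score. The name estimate_oom_score and the 8GB of swap in the demo are assumptions for illustration.

```shell
#!/bin/sh
# Approximate badness score:
#   1000 * (rss + swap + page tables) / (RAM + swap) + oom_score_adj
# All sizes are in kB. The result is clamped at 0 on the low end.
# Usage: estimate_oom_score RSS_KB SWAP_KB PGTABLES_KB TOTAL_KB ADJ
estimate_oom_score() {
    rss_kb=$1 swap_kb=$2 pgtables_kb=$3 total_kb=$4 adj=$5
    score=$(( 1000 * (rss_kb + swap_kb + pgtables_kb) / total_kb + adj ))
    if [ "$score" -lt 0 ]; then
        score=0
    fi
    echo "$score"
}

# A process with about 28GB resident on a 64GB box with 8GB swap (assumed),
# first without an adjustment, then with oom_score_adj set to -300
estimate_oom_score 29345678 0 0 75456532 0
estimate_oom_score 29345678 0 0 75456532 -300
```

The two calls come out at 388 and 88, which shows how much a moderate negative adjustment moves a large process down the list.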

Let me show you how to see who's at risk:

# List the 20 processes with the highest OOM scores (scores above 100 only)
for dir in /proc/[0-9]*; do
    pid=${dir#/proc/}
    score=$(cat "$dir/oom_score" 2>/dev/null)
    cmd=$(cat "$dir/cmdline" 2>/dev/null | tr '\0' ' ')
    if [ -n "$score" ] && [ "$score" -gt 100 ]; then
        # Score goes first so sort can key on the second field
        echo "Score: $score PID: $pid Cmd: $cmd"
    fi
done | sort -k2,2nr | head -20

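This will show you which processes are most likely to die first. I ran this on a production box and found that our log aggregation agent had a score of 847, the highest on the system. It was using minimal memory and had been running for 47 days. Uptime does not raise the score on current kernels; a score that high for a small process usually means its oom_score_adj has been raised, so check that file too. The OOM killer was one system-wide pressure event away from terminating it.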

The Scenario That Killed Our Service

Here's what happened that night. Our system looked like this:

Java service: 32GB heap, using 28GB
Node.js API layer: 8GB configured, using 5GB
Prometheus metrics: 2GB
PostgreSQL: 16GB shared buffers
Kernel cache: 4GB

Total "usage": 55GB out of 64GB. Looks fine.

But Committed_AS was 78GB. And here's what nobody noticed: our Node.js service was running a library that pre-allocated a 4GB buffer pool "just in case." It never used it, but the kernel had promised it.

At 11:58 PM, traffic spiked. The Java service's GC ran and created pressure. It touched more heap pages, increasing RSS. Another process started a background job and touched its reserved memory. Suddenly the kernel was being asked to back 79GB of promises with real pages, and it only had 64GB of RAM plus 8GB of swap.

The OOM killer evaluated the situation and... killed the Java checkout service.

Why? Because despite only "using" 28GB of heap, the JVM's footprint does not stop at the heap. Metaspace, the JIT code cache, garbage collection metadata, thread stacks, and direct buffers all live outside it and pushed the process's actual memory footprint higher. That made it the largest process on the box, and the largest process gets the highest score.

The killer didn't care about "available memory." It cared that the memory being touched could no longer be backed by physical capacity, and about who was holding the most of it.

How to Actually Debug OOM Events

When an OOM event happens, the kernel logs it. Here's what to look for:

# The canonical location
dmesg | grep -i "out of memory"
dmesg | grep -i "oom"
dmesg | grep -i "killed process"

You'll see something like:

[145923.456789] java invoked oom-killer: gfp_mask=0x6200cc0(GFP_KERNEL), order=0, oom_score_adj=0
[...]
[145923.456795] Out of memory: Killed process 18432 (java) total-vm:38765432kB, anon-rss:29345678kB, file-rss:0kB

That total-vm:38765432kB is the process's virtual address space, everything it has reserved. The anon-rss:29345678kB is actual physical memory (anonymous pages). The gap between them, about 9GB here, is memory the process had reserved but that was not resident in RAM when it was killed.
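If you are digging through a long kernel log, it helps to reduce each kill to the numbers that matter. This is a small sketch that assumes the "Killed process PID (name) total-vm:...kB, anon-rss:...kB" layout shown in the sample above; the function name parse_oom_kills is invented here, and kernels that word the line differently will simply produce no output.

```shell
#!/bin/sh
# Read kernel log lines on stdin and print one summary line per
# "Killed process" entry: pid, command name, total-vm and anon-rss in kB.
parse_oom_kills() {
    sed -n -E 's/.*Killed process ([0-9]+) \(([^)]*)\).*total-vm:([0-9]+)kB, anon-rss:([0-9]+)kB.*/pid=\1 cmd=\2 total_vm_kb=\3 anon_rss_kb=\4/p'
}

# Usage:
#   dmesg | parse_oom_kills
```

Lines that are not kill records are dropped, so you can pipe the whole log through it.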

Prevention Strategies That Actually Work

1. Monitor Committed Memory

This is the single most important thing you can do:

# Add to your monitoring
echo "Committed: $(awk '/^Committed_AS:/ {print $2}' /proc/meminfo) kB"
echo "Total: $(awk '/^MemTotal:/ {print $2}' /proc/meminfo) kB"
echo "Swap: $(awk '/^SwapTotal:/ {print $2}' /proc/meminfo) kB"

Set an alert when Committed_AS > (MemTotal + SwapTotal) * 0.8.
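Here is one way that alert could look as a check script. It is a sketch, not a drop-in plugin: the function name check_commit and the exit code convention (0 fine, 1 alert, 2 missing data) are my own choices, and the 80% default mirrors the threshold above.

```shell
#!/bin/sh
# Exit 0 if Committed_AS is at or below THRESHOLD_PCT percent of
# MemTotal + SwapTotal, 1 if it is above (alert), 2 if fields are missing.
# Usage: check_commit [threshold-pct] [meminfo-file]
check_commit() {
    threshold_pct=${1:-80}
    meminfo=${2:-/proc/meminfo}
    awk -v pct="$threshold_pct" '
        /^MemTotal:/     { mem = $2 }
        /^SwapTotal:/    { swap = $2 }
        /^Committed_AS:/ { committed = $2 }
        END {
            if (mem == "" || committed == "") exit 2
            limit = (mem + swap) * pct / 100
            printf "committed=%d kB limit=%d kB\n", committed, limit
            exit ((committed > limit) ? 1 : 0)
        }
    ' "$meminfo"
}

# Usage from cron or a monitoring agent:
#   check_commit 80 || echo "commit pressure high"
```

Passing the meminfo path as an argument is only there so the check can be exercised against a saved copy of the file.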

2. Limit Process Memory with cgroups

This is how serious production environments handle this:

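# Create a memory cgroup (cgroup v1 with the libcgroup tools)
sudo cgcreate -a $USER -t $USER -g memory:/limited_services

# Set the memory limit, then the combined memory+swap limit
echo 8G > /sys/fs/cgroup/memory/limited_services/memory.limit_in_bytes
echo 8G > /sys/fs/cgroup/memory/limited_services/memory.memsw.limit_in_bytes

# Start the service inside the cgroup
sudo cgexec -g memory:/limited_services /path/to/your/service

The memsw.limit_in_bytes file caps memory and swap combined, and it only exists when swap accounting is enabled. When the cgroup hits its limit, the kernel first tries to reclaim pages from it; if that fails, the OOM killer runs inside the cgroup and kills one of its processes. malloc() still succeeds, but the damage stays contained: a runaway service can only take down itself, not its neighbors. On cgroup v2 systems the corresponding files are memory.max and memory.swap.max (the latter limits swap alone).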
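Once a limit is in place, it is worth watching how close the group runs to it. The helper below is a minimal sketch for cgroup v1 that reads memory.usage_in_bytes and memory.limit_in_bytes from a cgroup directory; the function name cgroup_mem_pct is made up for this example.

```shell
#!/bin/sh
# Print how full a cgroup v1 memory cgroup is, as a whole percentage.
# Usage: cgroup_mem_pct /sys/fs/cgroup/memory/limited_services
cgroup_mem_pct() {
    cg_dir=$1
    usage=$(cat "$cg_dir/memory.usage_in_bytes") || return 1
    limit=$(cat "$cg_dir/memory.limit_in_bytes") || return 1
    echo $(( usage * 100 / limit ))
}
```

A group with no limit set reports an enormous limit value, so the result is simply 0 in that case.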

3. Set Per-Process OOM Preferences

You can't always prevent OOM conditions, but you can influence who survives:

# Make a process unlikely to be killed (negative adjustment)
echo -500 > /proc/PID/oom_score_adj

# Make a process likely to be killed (positive adjustment)  
echo 500 > /proc/PID/oom_score_adj

In our case, we set our checkout Java service to -300 and our log aggregator to +300. When pressure comes, the log aggregator dies first, which is much preferable to a critical transaction failing.
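A typo in one of those echo commands fails quietly, so a small wrapper that validates the value first can save you some confusion. This is a sketch under a few assumptions: the function name set_oom_score_adj and the PROC_ROOT override are invented for illustration (the override exists only so the function can be tried against a scratch directory), and lowering a value on a real system generally requires root.

```shell
#!/bin/sh
# Write an oom_score_adj value after checking that it is an integer in
# the range the kernel accepts (-1000..1000).
# Usage: set_oom_score_adj PID VALUE
set_oom_score_adj() {
    pid=$1 adj=$2
    proc_root=${PROC_ROOT:-/proc}   # hypothetical override, for testing only
    case $adj in
        ''|-|*[!0-9-]*|?*-*)
            echo "invalid adjustment: $adj" >&2
            return 2 ;;
    esac
    if [ "$adj" -lt -1000 ] || [ "$adj" -gt 1000 ]; then
        echo "adjustment out of range: $adj" >&2
        return 2
    fi
    echo "$adj" > "$proc_root/$pid/oom_score_adj"
}

# Usage (as root, with a real PID):
#   set_oom_score_adj 18432 -300
```

Keep in mind that the setting belongs to the running process and is lost on restart, so it needs to be reapplied by whatever starts the service.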

4. Configure Swap Correctly

I know, I know. Swap is "slow." But here's the reality: without swap, the kernel has no headroom for memory overcommit. When you run out, you immediately trigger OOM. With swap, the kernel can page out infrequently used memory and buy time.

# Size swap at 10-20% of RAM for production systems
# On a dedicated partition (the size is the partition's size)
sudo mkswap /dev/some_partition
sudo swapon /dev/some_partition

# Or use a swap file
sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

5. Use the Right Overcommit Mode for Your Workload

For services that check every allocation and fail cleanly, Mode 2 (strict) can actually be the right answer. Remember that it is a host-wide setting, and that the default overcommit_ratio of 50 caps commitments at swap plus half of RAM, so check the ratio before switching:

# Check current mode
cat /proc/sys/vm/overcommit_memory

# Set to strict accounting (mode 2); run as root
echo 2 > /proc/sys/vm/overcommit_memory

# This causes malloc() to return NULL once Committed_AS reaches CommitLimit
# Your application MUST handle NULL returns correctly!

This makes your application fail at the allocation site instead of having the OOM killer pick a victim later. For a well-written service, that's actually better behavior.
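Before flipping the switch, it is worth checking whether what is already committed would even fit under strict accounting. The sketch below reads the CommitLimit and Committed_AS lines from /proc/meminfo and prints the difference; the function name commit_headroom_kb is made up for this example.

```shell
#!/bin/sh
# Print CommitLimit - Committed_AS in kB. A negative number means the
# current commitments would not fit under the limit as configured today.
# Usage: commit_headroom_kb [meminfo-file]
commit_headroom_kb() {
    meminfo=${1:-/proc/meminfo}
    awk '
        /^CommitLimit:/  { limit = $2 }
        /^Committed_AS:/ { committed = $2 }
        END {
            if (limit == "" || committed == "") exit 1
            printf "%d\n", limit - committed
        }
    ' "$meminfo"
}

# Demo: only runs where /proc is available
if [ -r /proc/meminfo ]; then
    commit_headroom_kb
fi
```

If the number is negative, new allocations would start failing the moment you enable mode 2, so raise overcommit_ratio or trim reservations first.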

The Real Lesson

The OOM killer isn't a bug. It's a feature—the kernel's last resort when all other memory management strategies have failed. The problem is that most engineers treat it like a random event that "just happens sometimes."

It doesn't just happen. It happens because your mental model of memory doesn't match how the kernel actually works. The gap between "memory used" and "memory committed" is where production incidents hide.

Check your Committed_AS right now. If it's higher than your physical memory plus swap, you're living on borrowed time. The question isn't whether you'll hit the OOM killer—it's which process will die when you do.

Don't find out at 2 AM.

If you've survived your own OOM kill stories, you know the feeling of watching processes die for seemingly impossible reasons. The kernel is doing exactly what it was designed to do—the problem is usually that we didn't understand the design.