Memory Limits and Out-of-Heap Errors in Node.js

When a Node.js process exhausts V8’s managed heap, the runtime prints FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory and aborts immediately: no catch, no cleanup, no recovery. (The text before "Allocation failed" depends on the V8 version; recent releases print, for example, Reached heap limit or Ineffective mark-compacts near heap limit instead of CALL_AND_RETRY_LAST.) This page is part of the JavaScript Memory Fundamentals & Runtime Mechanics reference and covers the V8 architecture behind heap limits, a step-by-step diagnostic workflow for OOM crashes, safe --max-old-space-size tuning, and the code patterns needed to detect pressure before it becomes fatal. See also why Node.js processes hit the heap limit and how to fix it for a quick-reference diagnostic matrix.

Conceptual Grounding: V8 Heap Architecture and the OOM Threshold

V8 partitions managed memory into generations. The new space (young generation) holds short-lived objects in two semi-spaces of roughly 1–8 MB each; minor garbage collection (Scavenge) runs here frequently and cheaply. The old space (old generation) receives objects that survive one or more Scavenge cycles or that are allocated too large for new space directly. On 64-bit systems the old-space default ceiling is approximately 1.5 GB, though V8 scales this heuristically based on available physical RAM and Node.js version.

Two additional regions matter for OOM diagnosis:

  • Large object space: Objects larger than ~512 KB land here and are never moved, which makes fragmentation a one-way process.
  • Code space: Compiled JIT machine code lives here; it is capped separately and rarely causes user-visible OOM but can starve other spaces indirectly.

The hard limit is enforced synchronously by V8’s Heap::CollectAllAvailableGarbage path. Once heapTotal approaches the configured ceiling and a new allocation request cannot be satisfied after a forced major GC, V8 calls FatalProcessOutOfMemory, which writes the error to stderr and calls abort(). The following diagram shows the memory regions and the allocation path that triggers OOM.

[Figure: V8 heap regions and OOM allocation path. An allocation request (new Object(), Buffer, string) lands in new space (young generation, 1–8 MB per semi-space) when small or short-lived, is promoted to old space (old generation, default ceiling ~1.5 GB) when it survives, or goes straight to large object space (>512 KB, never moved); code space is a separate region. When the ceiling is hit, a forced major GC (CollectAllAvailableGarbage()) runs; if memory is still insufficient, V8 raises FATAL ERROR and calls abort().]

Understanding the V8 heap layout and memory segments in detail matters because different spaces have different GC mechanics: an OOM from old-space exhaustion calls for a different fix than one driven by large-object-space fragmentation or code-space pressure.
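The per-space breakdown described above can be inspected from inside a running process. The following is a minimal sketch using v8.getHeapSpaceStatistics(); the helper name summariseHeapSpaces and the MB rounding are illustrative choices, not part of any Node.js API.

```javascript
const v8 = require('v8');

// Convert V8's per-space statistics into a compact, MB-based summary.
// The default argument reads live data; pass an array explicitly for testing.
function summariseHeapSpaces(spaces = v8.getHeapSpaceStatistics()) {
  return spaces.map((s) => ({
    space: s.space_name,                              // e.g. new_space, old_space
    usedMB: Number((s.space_used_size / 1e6).toFixed(1)),
    committedMB: Number((s.space_size / 1e6).toFixed(1)),
  }));
}

// Print one row per heap space for the current process.
console.table(summariseHeapSpaces());
```

Comparing the old-space and large-object-space rows over time is one way to see which region is actually growing before choosing a fix.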

Diagnostic Workflow: Isolating the OOM Root Cause

Follow this step-by-step procedure before adjusting any flags. Raising --max-old-space-size without understanding why the heap grew only delays the crash.

Step 1 — Enable GC tracing at startup

# --trace-gc emits one line per minor or major collection to stdout.
# Redirect stdout to a file so you can analyse timing outside the terminal
# (anything else the app prints to stdout lands in the same file).
node --trace-gc app.js > gc.log

Expected output lines look like:

[87234:0x...] 4821 ms: Mark-sweep (reduce) 1486.3 (1538.1) -> 1421.7 (1538.1) MB, 312.4 / 0.0 ms

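Fields: PID and isolate address, time since process start (ms), collection type, heap used (heap total) before the collection, heap used (heap total) after it, both in MB, then the wall time for the collection followed by the time spent in external callbacks.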
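To analyse a GC log programmatically, each line can be split into those fields. This is a sketch that assumes the line shape shown in the sample above; the exact trace format is not a stable interface and differs between V8 versions, so treat the regular expression and the name parseGcLine as illustrative.

```javascript
// Matches lines such as:
// [87234:0x...] 4821 ms: Mark-sweep (reduce) 1486.3 (1538.1) -> 1421.7 (1538.1) MB, 312.4 / 0.0 ms
const NUM = '(\\d+(?:\\.\\d+)?)';
const GC_LINE = new RegExp(
  `^\\[(\\d+):[^\\]]*\\]\\s+(\\d+) ms: (.+?) ${NUM} \\(${NUM}\\) -> ${NUM} \\(${NUM}\\) MB,.*?${NUM} \\/ `
);

// Returns a plain object for a recognised GC line, or null for anything else.
function parseGcLine(line) {
  const m = GC_LINE.exec(line);
  if (!m) return null;
  return {
    pid: Number(m[1]),
    timestampMs: Number(m[2]),
    type: m[3],                  // e.g. "Scavenge" or "Mark-sweep (reduce)"
    usedBeforeMB: Number(m[4]),
    totalBeforeMB: Number(m[5]),
    usedAfterMB: Number(m[6]),
    totalAfterMB: Number(m[7]),
    pauseMs: Number(m[8]),       // wall time of the collection
  };
}

const sample =
  '[87234:0x...] 4821 ms: Mark-sweep (reduce) 1486.3 (1538.1) -> 1421.7 (1538.1) MB, 312.4 / 0.0 ms';
console.log(parseGcLine(sample));
```

Filtering the parsed records by type and plotting usedAfterMB over timestampMs gives a quick view of whether major collections are reclaiming memory.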

Step 2 — Capture a baseline heap snapshot before load

# --heapsnapshot-signal enables on-demand heap dumps without modifying app code.
# Send SIGUSR2 to write a .heapsnapshot file into the working directory.
node --heapsnapshot-signal=SIGUSR2 app.js &
sleep 5         # give Node.js time to install the handler; a signal sent too early could terminate the process
kill -USR2 $!   # capture the baseline

The file appears as Heap.<yyyymmdd>.<hhmmss>.<pid>.<thread_id>.<seq>.heapsnapshot (the flag is available from Node.js 12). Rename it baseline.heapsnapshot.

Step 3 — Apply realistic load

Run your load test (k6, autocannon, or Artillery) against the server. Watch process.memoryUsage().heapUsed via a monitoring endpoint or by polling heapUsed in a setInterval. Continue until heapUsed stabilises at its peak value for at least 60 seconds.
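The "stable for at least 60 seconds" check can be automated instead of eyeballed. Below is a rough sketch; the function name hasStabilised, the 5% tolerance, and the 5-second sampling interval are assumptions chosen for illustration.

```javascript
// samples: [{ t: <ms timestamp>, heapUsed: <bytes> }, ...] in chronological order.
// Returns true when every sample in the last windowMs stays within `tolerance`
// (as a fraction of the window's maximum) of that maximum.
function hasStabilised(samples, windowMs = 60_000, tolerance = 0.05) {
  if (samples.length < 2) return false;
  const latest = samples[samples.length - 1].t;
  // Not enough history yet to judge a full window.
  if (latest - samples[0].t < windowMs) return false;
  const values = samples
    .filter((s) => latest - s.t <= windowMs)
    .map((s) => s.heapUsed);
  const max = Math.max(...values);
  const min = Math.min(...values);
  return (max - min) / max <= tolerance;
}

// Collect a sample every 5 s and report once the plateau is reached.
const heapSamples = [];
const sampler = setInterval(() => {
  heapSamples.push({ t: Date.now(), heapUsed: process.memoryUsage().heapUsed });
  if (heapSamples.length > 720) heapSamples.shift(); // keep about one hour of history
  if (hasStabilised(heapSamples)) {
    console.error('[heap] heapUsed has been stable for 60 s; capture the peak snapshot now');
  }
}, 5_000);
sampler.unref(); // do not keep the process alive just for sampling
```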

Step 4 — Capture a peak heap snapshot

kill -USR2 <pid>   # capture at peak heapUsed

Rename the output peak.heapsnapshot.

Step 5 — Compare in Chrome DevTools → Memory panel

Open DevTools → Memory → Heap Snapshot → Load and load baseline.heapsnapshot. Then click Load again and load peak.heapsnapshot. With peak.heapsnapshot selected, switch the view dropdown to Comparison and pick baseline.heapsnapshot as the snapshot to compare against. Sort by Size Delta descending.

Look for:

  • (array) — unbounded cache arrays, event-listener lists growing without bound.
  • (string) — accumulated log messages, SSR output buffers never flushed.
  • Buffer — Node.js Buffer objects from HTTP body accumulation or stream misuse.
  • Constructor names from your framework: IncomingMessage, ServerResponse, ReactElement.

Expand the Retainers tree on the largest delta to find the exact closure, module-cache entry, or event emitter that is keeping the objects alive.

Step 6 — Verify GC reclamation after load ends

After the load test finishes, poll process.memoryUsage().heapUsed every 5 seconds for 90 seconds. A healthy process shows heapUsed dropping by at least 70% within 2–3 major GC cycles (typically 10–30 seconds post-load). A leaking process stays within 20% of its peak value regardless of how many major collections run.
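The 70% and 20% thresholds above translate directly into a small classifier. This is a sketch under those stated thresholds; the function name classifyReclamation and the 'inconclusive' middle band are additions for illustration.

```javascript
// peakHeapUsed: heapUsed (bytes) recorded at peak load.
// postLoadSamples: heapUsed readings (bytes) taken after the load test ended.
function classifyReclamation(peakHeapUsed, postLoadSamples) {
  if (!(peakHeapUsed > 0) || postLoadSamples.length === 0) return 'inconclusive';
  const floor = Math.min(...postLoadSamples); // lowest value reached after load
  const ratio = floor / peakHeapUsed;
  if (ratio <= 0.30) return 'healthy';        // dropped by at least 70%
  if (ratio >= 0.80) return 'leaking';        // never left the top 20% band
  return 'inconclusive';                      // partial reclamation; inspect snapshots
}

console.log(classifyReclamation(1000e6, [900e6, 400e6, 250e6]));
```

Feeding it the readings from a 90-second post-load polling loop gives a repeatable pass/fail signal for CI or soak tests.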

Code Patterns & Signatures

Pattern 1 — Proactive heap monitoring with automatic snapshot on threshold breach

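Captures a snapshot when heap usage approaches a configurable limit, enabling offline analysis after the fact. v8.writeHeapSnapshot() is synchronous and may pause the event loop for 1–5 seconds on large heaps, so call it only once heap usage has crossed the alarm threshold. Generating a snapshot also needs memory of roughly twice the heap size, so on a memory-constrained host the snapshot itself can trigger an OOM kill; choose a threshold that leaves that headroom.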

const v8 = require('v8');

// Snapshot when heapUsed exceeds 85% of the configured old-space ceiling.
// Pass the --max-old-space-size value (in MB) via env so this stays config-driven;
// without it, fall back to the limit V8 reports for this process.
const MAX_OLD_SPACE_MB = Number(process.env.MAX_OLD_SPACE_MB) ||
  v8.getHeapStatistics().heap_size_limit / (1024 * 1024);
const THRESHOLD_BYTES  = MAX_OLD_SPACE_MB * 1024 * 1024 * 0.85;

let snapshotWritten = false; // write at most one snapshot per process lifetime

function checkHeapAndSnapshot() {
  const { heapUsed } = process.memoryUsage();

  if (heapUsed > THRESHOLD_BYTES && !snapshotWritten) {
    snapshotWritten = true; // avoid a new multi-second pause on every poll
    // Writes Heap.<yyyymmdd>.<hhmmss>.<pid>.<thread_id>.<seq>.heapsnapshot
    const file = v8.writeHeapSnapshot();
    console.error(
      `[OOM-GUARD] heapUsed=${(heapUsed / 1e6).toFixed(1)} MB exceeded ` +
      `${(THRESHOLD_BYTES / 1e6).toFixed(0)} MB threshold. ` +
      `Snapshot: ${file}`
    );
  }
}

// Poll every 30 s; timer drift is irrelevant at this granularity.
setInterval(checkHeapAndSnapshot, 30_000).unref(); // .unref() prevents blocking process exit

Pattern 2 — Exposing a live memory health endpoint

Useful for health-check endpoints and alerting integrations. Returns structured JSON so monitoring tools can threshold-alert on individual fields.

const v8 = require('v8');

// Call from an Express/Fastify route: GET /healthz/memory
function memoryHealthReport() {
  const mu   = process.memoryUsage();
  const heap = v8.getHeapStatistics();
  // Return numbers rather than strings so alerting rules can compare values.
  const mb   = (bytes) => Number((bytes / 1e6).toFixed(1));

  return {
    heapUsedMB:    mb(mu.heapUsed),
    heapTotalMB:   mb(mu.heapTotal),
    externalMB:    mb(mu.external),           // C++ objects bound to JS objects (includes Buffers)
    rssMB:         mb(mu.rss),                // total process resident set
    heapLimitMB:   mb(heap.heap_size_limit),  // V8's configured ceiling
    heapUsedPct:   Number(((mu.heapUsed / heap.heap_size_limit) * 100).toFixed(1)),
    mallocedMemMB: mb(heap.malloced_memory),  // memory V8 currently holds via malloc
  };
}

Pattern 3 — Tuning the heap limit safely in the start command

# 4 GB old-space ceiling, GC tracing to a log file, on-demand snapshots via SIGUSR2.
# --trace-gc writes to stdout, so append stdout (not stderr) to the GC log.
# --expose-gc (not set here) would enable global.gc(); reserve it for tests.
node \
  --max-old-space-size=4096 \
  --trace-gc \
  --heapsnapshot-signal=SIGUSR2 \
  app.js >>logs/gc-$(date +%Y%m%d).log

Rule of thumb: set --max-old-space-size to at most 75% of available RAM on the instance (or of the container's memory limit), leaving room for the OS, libuv’s thread pool, Buffers, and native modules (sharp, bcrypt) whose allocations live outside V8’s managed heap.
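The 75% rule is simple arithmetic, sketched below. The helper name recommendedOldSpaceMb is hypothetical, and os.totalmem() reports host memory, so inside a container it is safer to pass the container's memory limit explicitly.

```javascript
const os = require('os');

// Returns a --max-old-space-size value (in MB) as a fraction of total memory.
// totalBytes defaults to the host's physical RAM; pass a container limit if one applies.
function recommendedOldSpaceMb(totalBytes = os.totalmem(), fraction = 0.75) {
  return Math.floor((totalBytes / (1024 * 1024)) * fraction);
}

console.log(`--max-old-space-size=${recommendedOldSpaceMb()}`);
```

For a 16 GB machine, recommendedOldSpaceMb(16 * 1024 ** 3) evaluates to 12288.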

Pattern 4 — Per-space GC tracing with --trace-gc-verbose for fragmentation diagnosis

When process.memoryUsage().rss is significantly higher than heapTotal, native fragmentation or external allocations are likely the culprit. Add --trace-gc-verbose to surface compaction decisions:

# --trace-gc-verbose adds per-space stats (new/old/large/code space) after each major GC.
# Useful for identifying large-object space saturation vs old-space growth.
node --trace-gc-verbose app.js 2>&1 | grep -E "(Mark-sweep|New space|Old space|Large object)"

Symptom-to-Fix Reference Table

| Symptom | Root Cause | Immediate Action | Measurable Impact |
|---|---|---|---|
| FATAL ERROR: CALL_AND_RETRY_LAST on startup | Default ceiling (about 1.5 GB) too low for the dataset loaded during initialisation | Add --max-old-space-size=<MB>, set to at most 75% of available RAM | Process starts and stabilises; heap_size_limit reported by v8.getHeapStatistics() reflects the new value |
| heapUsed grows monotonically across requests with no GC reclamation | Structural memory leak: unbounded cache, retained closure, or undrained event emitter | Capture baseline + peak heap snapshots; diff in DevTools → Memory → Comparison → sort by Size Delta | Identify and remove the retainer; post-fix heapUsed drops >70% after peak load |
| process.memoryUsage().rss is 2–4× heapTotal | Native module allocations (e.g., sharp, canvas, database drivers) consuming memory outside V8’s managed heap | Add --trace-gc-verbose; inspect external field of memoryUsage(); audit native addon versions | external value falls; rss converges toward heapTotal + external |
| Major GC pause times exceed 300 ms after raising --max-old-space-size | Larger old space means longer mark-and-sweep traversal; the algorithm scales with live-set size | Reduce --max-old-space-size; confirm incremental marking is active (--incremental-marking, on by default); profile with --trace-gc | P99 GC pause drops below 100 ms; event-loop lag decreases |
| OOM crash under Worker Threads with large data transfer | postMessage() copies the ArrayBuffer across isolates, doubling memory usage for the payload | Pass the underlying ArrayBuffer via transferList to move ownership without copying: worker.postMessage(view, [view.buffer]), where view is a typed array | Memory stays flat during transfer; no duplicate allocation spike |
| OOM in Next.js SSR under concurrent requests | Recursive component render accumulates HTML strings in old space; streaming disabled | Enable React 18 streaming SSR (renderToPipeableStream); cap concurrency with a semaphore | heapUsed at peak drops 40–60% with streaming vs buffered render |
| heapUsed spikes then recovers but P99 latency stays high | GC forced into stop-the-world mode by allocation rate exceeding incremental marking capacity | Throttle inbound request rate; use --max-semi-space-size to give the young generation more room | Stop-the-world GC events disappear from --trace-gc output; P99 latency normalises |
| Kernel OOM kill before V8 FATAL ERROR | --max-old-space-size set equal to or above total system RAM; OS has no headroom | Reduce --max-old-space-size to ≤75% of physical RAM; add a swap partition as a safety net | Process terminates with the V8 fatal error instead of SIGKILL; restartable by process manager |
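The transferList fix from the Worker Threads row can be observed without a separate worker file. The sketch below uses a MessageChannel as a stand-in for a Worker (worker.postMessage() accepts the same second argument); the helper name transferView is illustrative.

```javascript
const { MessageChannel } = require('worker_threads');

// Sends a typed array through `port`, transferring ownership of its ArrayBuffer.
// Returns the sender-side byteLength afterwards, which is 0 once the buffer is detached.
function transferView(port, view) {
  port.postMessage(view, [view.buffer]); // second argument is the transferList
  return view.byteLength;
}

const { port1 } = new MessageChannel();
const payload = new Uint8Array(1024 * 1024); // 1 MiB of data owned by this thread
const remaining = transferView(port1, payload);

console.log(`sender still holds ${remaining} bytes`);
port1.close(); // release the channel so the process can exit
```

Because the buffer is detached on the sending side, only one copy of the payload exists at any time. The sender must not touch the view after the call.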

Edge Cases & Gotchas

1. GC Starvation Masking a Structural Leak

When the event loop is saturated with synchronous work, V8’s incremental marking cannot make progress between turns. Minor collections appear in --trace-gc output but major (mark-and-sweep) collections are deferred. Heap growth looks gradual and can be misattributed to traffic volume rather than a leak. Fix: offload CPU-bound work to Worker Threads or child processes to give the event loop idle time for marking. Confirm with --trace-gc: you should see major collections completing, not just Scavenge entries.

2. rss vs heapUsed Confusion

process.memoryUsage().rss is the total resident set size including code segments, stack, and native allocations. heapUsed is only V8-managed objects. A process where rss is 3× heapUsed likely has large native module footprints (sharp, canvas, node-gyp-based drivers) or large memory-mapped files. Diagnosing this as a V8 heap leak and raising --max-old-space-size is ineffective; audit the external field instead and profile native modules directly.

3. Snapshot Write Pauses the Event Loop

v8.writeHeapSnapshot() triggers a full garbage collection pass and then serialises the entire heap to disk synchronously. On a process with 2 GB of live objects this can pause the event loop for 5–15 seconds, making the monitoring tool appear to cause the very incident it is trying to diagnose. Mitigate by writing snapshots only when heapUsed is already above the alarm threshold, and preferably on an instance that has been taken out of the load-balancer rotation. Capturing the snapshot through the inspector (node --inspect) does not avoid the pause, because the target isolate is still blocked while its heap is serialised.

4. Promotion Spikes Under Bursty Traffic

When a burst of requests arrives simultaneously, large numbers of objects are allocated in new space and survive the first Scavenge because they are still reachable mid-request. They are promoted to old space en masse — a promotion spike — saturating old space faster than steady-state traffic would predict. Standard load tests that ramp up gradually miss this pattern. Test with a step-function load profile (zero → peak instantly) to reproduce it.

5. Worker Thread Isolate Limits Are Independent

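Each Worker Thread runs its own V8 isolate with its own heap and its own old-generation limit. Do not assume that tuning the parent to 4 GB gives each worker the ceiling you intend; set the limit explicitly with the resourceLimits option of the Worker constructor (for example resourceLimits: { maxOldGenerationSizeMb: 2048 }) and verify it inside the worker with v8.getHeapStatistics().heap_size_limit. V8 options such as --max-old-space-size are not supported in a worker's execArgv, and workerData is ordinary data with no effect on heap limits. When a worker hits a limit set through resourceLimits, it is terminated with an ERR_WORKER_OUT_OF_MEMORY error delivered to the parent through the worker's 'error' event, and the parent process continues running.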

FAQ

What is the default V8 heap limit in Node.js?

On 64-bit systems V8 sets the old-space ceiling heuristically based on available physical RAM. On machines with ≥ 4 GB RAM the default is approximately 1.5 GB, although the exact figure varies with the Node.js version. Inspect it at runtime with v8.getHeapStatistics().heap_size_limit. The value is returned in bytes; divide by 1024 * 1024 to get the same units that --max-old-space-size uses. Running node --v8-options lists the V8 flags that influence it, including --max-old-space-size.

Can I catch or handle a FATAL ERROR: CALL_AND_RETRY_LAST?

No. V8 calls abort() synchronously after writing the error to stderr; no JavaScript can run after the allocation failure is confirmed. You cannot wrap heap-intensive code in try/catch or process.on('uncaughtException') to survive this error. The only resilience strategy is proactive monitoring (heap threshold checks, automatic snapshots) combined with a process manager like PM2 or systemd with Restart=always to bring the service back online automatically.

How do I distinguish a memory leak from legitimately high usage?

Compare heapUsed across multiple major GC cycles, not just at peak. Legitimate high usage shows clear heap contraction after each major collection: heapUsed drops to a baseline value proportional to the number of long-lived objects the application genuinely holds. A structural leak shows monotonically increasing heapUsed after each collection, even when the application is idle. Heap snapshot diffing (DevTools → Memory → Heap Snapshot → Comparison view, sorted by Size Delta) identifies the specific object types, and the Retainers pane shows their retainer chains.
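The "rising floor" test can be expressed as a check over the heap size measured right after each major collection (for example, the post-collection value on Mark-sweep lines in --trace-gc output). This is a rough heuristic sketch; the name floorKeepsRising and the four-cycle default are assumptions.

```javascript
// usedAfterMajorGc: heap used (any consistent unit) right after each major GC,
// in chronological order. Returns true when the last `minCycles` values are
// strictly increasing, i.e. every collection leaves more live data than the last.
function floorKeepsRising(usedAfterMajorGc, minCycles = 4) {
  if (usedAfterMajorGc.length < minCycles) return false; // not enough evidence yet
  const recent = usedAfterMajorGc.slice(-minCycles);
  return recent.every((value, i) => i === 0 || value > recent[i - 1]);
}

console.log(floorKeepsRising([120, 180, 240, 310]));      // leak-like pattern
console.log(floorKeepsRising([120, 300, 125, 290, 130])); // contracts after each GC
```

A true result is a prompt to diff snapshots, not proof of a leak: a cache that is still warming up produces the same shape for a while.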

How much headroom should I leave between --max-old-space-size and total system RAM?

A safe rule is to cap Node.js at 70–75% of physical RAM. On a 16 GB server that means --max-old-space-size=12288 at most. The remaining capacity is needed for: the OS kernel and page tables (~200–500 MB), libuv’s native thread pool, native module allocations that live outside the V8 heap (only partly visible in process.memoryUsage().external), and headroom for burst allocation before the GC can reclaim space. If you set the flag to 100% of RAM and allocation surges, the Linux OOM killer terminates the process with SIGKILL before V8 can emit its own fatal error, making the root cause harder to diagnose.