Diagnosing Node.js Memory with heapdump & Clinic.js

When a Node.js service leaks memory in production, the browser-first workflow of clicking “Take snapshot” in DevTools no longer applies — there is no page to profile, the process is headless, and it may be minutes from an out-of-memory kill. This guide covers the production toolchain: writing a .heapsnapshot on a signal, the heapdump module, attaching Chrome DevTools over --inspect, and using Clinic.js Doctor and Bubbleprof to decide whether you even have a heap leak before you start reading retainer chains. It sits within the Node.js Server-Side Memory Management area; for a head-to-head on which capture method to reach for first, see heapdump vs Clinic vs inspect.

The reading skills transfer directly from the browser: once you have two .heapsnapshot files on disk, the comparison technique in take & compare heap snapshots is identical whether the snapshot came from a tab or a server.

Conceptual Grounding

A Node.js “memory leak” is almost never a bug in V8’s garbage collector — it is your code keeping objects reachable from a GC root that never lets go. The three roots that dominate server leaks are the module cache (anything hung off a required singleton), timer and event-emitter registries (setInterval callbacks, listeners on a long-lived emitter), and per-request closures that outlive the request because they were pushed into a global array or a promise that never settles.
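As a rough illustration of the emitter-registry root described above, the sketch below registers a per-request listener on a long-lived emitter and never removes it, then shows a variant that cleans up. The names `bus`, `handleLeaky`, `handleFixed`, and `fakeReq` are hypothetical stand-ins, not part of any framework.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical long-lived singleton, e.g. exported from a config module.
const bus = new EventEmitter();

// Leaky: every request adds a listener that is never removed, so the
// closure (and the request object it captured) stays reachable from `bus`.
function handleLeaky(req) {
  bus.on('reload', () => req.log('config reloaded'));
}

// Fixed: the caller gets a `done` function that removes the listener.
function handleFixed(req) {
  const onReload = () => req.log('config reloaded');
  bus.on('reload', onReload);
  return function done() {
    bus.off('reload', onReload);
  };
}

const fakeReq = () => ({ log() {} }); // stand-in for a real request object

for (let i = 0; i < 5; i++) handleLeaky(fakeReq());
const leakedListeners = bus.listenerCount('reload');

bus.removeAllListeners('reload');
for (let i = 0; i < 5; i++) handleFixed(fakeReq())();
const remainingListeners = bus.listenerCount('reload');

console.log({ leakedListeners, remainingListeners });
```

In a snapshot, the leaky variant would typically show up as closures retained through the emitter's listener array, which is the kind of retainer chain the later steps teach you to read.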

The tooling splits cleanly into two jobs. The first job is detection: is memory actually trending up, and is it in V8’s managed heap or in external memory? process.memoryUsage() and Clinic.js Doctor answer this. Doctor samples RSS, heap, event-loop delay, and CPU, then applies heuristics to classify the fault. The second job is attribution: which objects, held by which retainer chain. That requires a .heapsnapshot — a full serialization of every live object, its shallow size, and the edges between them.

The critical distinction the diagram below makes concrete: heapUsed covers only V8’s object graph. Buffer backing stores, ArrayBuffer memory, and native addon allocations show up as external and arrayBuffers, and a .heapsnapshot barely reflects them. Choosing the wrong tool for the wrong memory region is the single most common way engineers waste a day on a Node.js leak.
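To see the split between regions on your own machine, a minimal sketch like the following allocates one large Buffer and compares the movement in heapUsed and external. The helper name `measure` and the 64 MB size are arbitrary choices for illustration; exact numbers will vary by Node.js version and GC timing.

```javascript
const MB = 1024 * 1024;

// Returns the change in heapUsed and external caused by running `fn`.
function measure(fn) {
  const before = process.memoryUsage();
  const kept = fn(); // keep a reference so the allocation stays alive
  const after = process.memoryUsage();
  return {
    kept,
    heapUsedDeltaMB: (after.heapUsed - before.heapUsed) / MB,
    externalDeltaMB: (after.external - before.external) / MB,
  };
}

// The Buffer's backing store is allocated outside the V8 heap.
const result = measure(() => Buffer.alloc(64 * MB));

console.log({
  heapUsedDeltaMB: result.heapUsedDeltaMB.toFixed(1),
  externalDeltaMB: result.externalDeltaMB.toFixed(1),
});
```

The external delta should be close to the allocation size while heapUsed barely moves, which is why a snapshot of this process would not account for those bytes.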

Clinic.js adds a third capability beyond detection and attribution: causation over time. Bubbleprof instruments the async operation graph — every timer, socket, and promise — and renders which asynchronous flows keep handles and their captured scopes alive. Where a heap snapshot answers “what is retained right now”, Bubbleprof answers “which async pipeline created and held it”, which is invaluable when the leak is a pending operation that never resolves rather than a growing collection. The HeapProfiler subcommand sits between the two: it takes a sampling profile of allocations across a run, so you can see where in your code bytes are being allocated without freezing the process for a full snapshot.

Figure: Node.js Memory Diagnostic Decision Flow. A flow starting from process.memoryUsage sampling, branching on whether growth is in heapUsed or external memory, and mapping each branch to the correct tool: heap snapshot comparison for managed heap, Buffer and addon auditing for external memory, with Clinic.js Doctor as the upstream classifier.
Entry point: Clinic.js Doctor samples RSS + heap + loop delay every ~10 ms → Growth region? (heapUsed vs external)
Branch, heapUsed climbing: managed V8 object graph → Tool: .heapsnapshot capture (SIGUSR2 or heapdump module) → Read: DevTools Comparison view
Branch, external climbing: Buffers, ArrayBuffer, addons; snapshot will NOT show it → Tool: memoryUsage sampling → Audit Buffer pools + addons

Diagnostic Workflow

Work these steps in order. Skipping the detection steps and jumping straight to snapshots is how you end up staring at a 1 GB JSON file for a leak that was actually in external memory.

Step 1 — Confirm the trend with process.memoryUsage()

Action: log the four memory regions on a fixed interval so you can distinguish a genuine upward trend from GC sawtooth. Run with node --expose-gc server.js if you want to force collections between samples.

Expected output: heapUsed should oscillate but return to a stable floor after each GC. A floor that ratchets upward every few minutes is a heap leak. A flat heapUsed with rising external is a Buffer or addon problem.
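The floor-versus-sawtooth reasoning above can be sketched as a small classifier over collected samples. This is one possible heuristic, not how Clinic.js or any APM does it: the function name `classifyTrend`, the half-and-half split, and the 10% threshold are all assumptions chosen for illustration.

```javascript
// samples: array of process.memoryUsage()-like objects, oldest first
// (at least two entries). Compares the floor (minimum) of the first and
// second half of the series, so GC sawtooth peaks do not count as growth.
function classifyTrend(samples, thresholdRatio = 0.1) {
  const half = Math.floor(samples.length / 2);
  const floor = (rows, key) => Math.min(...rows.map((r) => r[key]));
  const grew = (key) => {
    const early = floor(samples.slice(0, half), key);
    const late = floor(samples.slice(half), key);
    return late > early * (1 + thresholdRatio);
  };
  if (grew('heapUsed')) return 'heap-leak';          // floor ratchets upward
  if (grew('external')) return 'external-growth';    // Buffer or addon growth
  return 'stable';                                   // sawtooth around a flat floor
}

// Synthetic series in MB, for illustration only.
const series = (heap, ext) => heap.map((h, i) => ({ heapUsed: h, external: ext[i] }));
const flat = [10, 10, 10, 10, 10, 10];

const sawtooth = classifyTrend(series([50, 80, 52, 79, 51, 81], flat));
const ratchet = classifyTrend(series([50, 80, 60, 90, 70, 100], flat));
const buffers = classifyTrend(series([50, 50, 50, 50, 50, 50], [10, 20, 30, 40, 50, 60]));

console.log({ sawtooth, ratchet, buffers });
```

Comparing minimums rather than averages is the design choice that matters here: a peak just before a collection says nothing about retention, while a rising minimum does.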

Step 2 — Classify the fault with Clinic.js Doctor

CLI: clinic doctor --on-port 'autocannon localhost:$PORT' -- node server.js

Doctor runs your server, drives load, then opens an HTML report. Expected output: a verdict banner such as “Detected memory issue” plus four synchronized charts (memory, event-loop delay, CPU, active handles). A memory chart that rises steadily and does not drop back after garbage collection confirms a leak worth snapshotting. If Doctor instead flags event-loop delay, your problem may be main-thread blocking rather than retention; move on to interpreting heap snapshots only once memory is confirmed as the fault.

Step 3 — Arm the process for on-demand snapshots

CLI flag: node --heapsnapshot-signal=SIGUSR2 server.js

This built-in flag (Node.js 12+) makes the process write a .heapsnapshot to its working directory whenever it receives SIGUSR2. No code change, no native module. Verify by finding the process PID with pgrep -f server.js and sending kill -USR2 <pid>. Expected output: a file named like Heap.20260705.140212.12345.0.001.heapsnapshot appears on disk within a few seconds.

Step 4 — Capture baseline and post-load snapshots

Action: send SIGUSR2 once after warm-up (Snapshot 1), drive the suspect workload — the same autocannon run, a batch job, a specific route — then send SIGUSR2 again (Snapshot 2). Each capture forces a full GC first, so anything present in both is genuinely retained, not transient.

Expected metric: note the file size of each. A Snapshot 2 that is materially larger (say 180 MB vs 90 MB) quantifies the leak’s on-heap footprint before you open a single retainer chain.
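If you would rather script the size comparison than read it off `ls -l`, a minimal sketch could look like this. The function name `snapshotGrowth` and its return shape are hypothetical; it only reads file sizes and says nothing about which objects grew.

```javascript
const fs = require('node:fs');

// Compare the on-disk size of a baseline and a later snapshot file.
function snapshotGrowth(baselinePath, laterPath) {
  const baseBytes = fs.statSync(baselinePath).size;
  const laterBytes = fs.statSync(laterPath).size;
  return {
    baseBytes,
    laterBytes,
    deltaBytes: laterBytes - baseBytes,
    ratio: laterBytes / baseBytes, // 2 means the file doubled
  };
}
```

With the 90 MB and 180 MB files from the example above, the ratio would be 2.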

Step 5 — Compare offline in DevTools

DevTools path: open chrome://inspect → Open dedicated DevTools for Node → Memory tab → Load (the up-arrow icon) → select Snapshot 1, then Load Snapshot 2. Select Snapshot 2, switch the dropdown to Comparison and choose Snapshot 1 as the baseline. Sort by # Delta descending.

Expected metric: constructors with a large positive # Delta and near-zero # Deleted are your leak candidates. Click one, expand the Retainers pane, and read the chain to its GC root — a module-level Map, a listener array, or a closure. Prefer the Objects allocated between Snapshot 1 and Snapshot 2 filter in the class dropdown: it hides everything present at baseline and shows only what the workload created and failed to release, which typically cuts the candidate list from thousands of constructors to a handful.

When the leak is an array or Map that grows without bound, sort by Retained Size rather than # Delta. A single Array with 40,000 elements shows one object in the count column but dominates retained bytes — the count-based view can hide it entirely. Retained size is the memory that would be freed if that object were collected, which is the number that actually matters for an OOM budget.
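For a quick first pass without DevTools, the count comparison can be approximated in a script. The sketch below assumes the snapshot is small enough to JSON.parse in one go and reads the node layout from the file's own snapshot.meta.node_fields rather than hard-coding it. It groups by raw node name, so strings and closures are not bucketed the way DevTools buckets them, and it sums shallow (self) size only; retained size needs a dominator computation that this does not attempt. The function names are hypothetical.

```javascript
// Tally node count and summed self_size per node name.
// `snapshot` is a parsed .heapsnapshot (JSON.parse of the file contents).
function tallyByName(snapshot) {
  const fields = snapshot.snapshot.meta.node_fields;
  const stride = fields.length;              // ints per node in the flat array
  const nameAt = fields.indexOf('name');     // index into `strings`
  const sizeAt = fields.indexOf('self_size');
  const tally = new Map();
  for (let i = 0; i < snapshot.nodes.length; i += stride) {
    const name = snapshot.strings[snapshot.nodes[i + nameAt]];
    const entry = tally.get(name) || { count: 0, selfSize: 0 };
    entry.count += 1;
    entry.selfSize += snapshot.nodes[i + sizeAt];
    tally.set(name, entry);
  }
  return tally;
}

// Net change per name between two snapshots, largest count growth first.
function countDelta(baseline, later) {
  const before = tallyByName(baseline);
  const rows = [];
  for (const [name, now] of tallyByName(later)) {
    const was = before.get(name) || { count: 0, selfSize: 0 };
    rows.push({
      name,
      countDelta: now.count - was.count,
      selfSizeDelta: now.selfSize - was.selfSize,
    });
  }
  return rows.sort((a, b) => b.countDelta - a.countDelta);
}
```

A possible use, assuming both files fit in memory: `countDelta(JSON.parse(fs.readFileSync(a, 'utf8')), JSON.parse(fs.readFileSync(b, 'utf8'))).slice(0, 10)`. Treat the result as a shortlist to investigate in the Comparison view, not a replacement for reading retainers.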

Step 6 — Verify the fix

Action: apply the fix, re-run the identical load, capture Snapshot 3. Expected metric: heapUsed returns within ±5% of the Snapshot 1 baseline and the previously-leaking constructor’s # Delta collapses to near zero across two consecutive batches.
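The ±5% check is simple enough to automate in a load-test script. The following is a sketch under the assumption that you recorded the baseline heapUsed in bytes yourself; `withinTolerance` and `settledHeapUsed` are hypothetical helper names, and the forced collection only happens when the process was started with --expose-gc.

```javascript
// True when `currentBytes` is within +/- tolerance of `baselineBytes`.
function withinTolerance(baselineBytes, currentBytes, tolerance = 0.05) {
  return Math.abs(currentBytes - baselineBytes) <= baselineBytes * tolerance;
}

// Read heapUsed after a forced collection when --expose-gc is available,
// so the comparison is made against the floor rather than a sawtooth peak.
function settledHeapUsed() {
  if (typeof global.gc === 'function') global.gc();
  return process.memoryUsage().heapUsed;
}

const baseline = 100 * 1024 * 1024; // example baseline: 100 MB
console.log({
  at104MB: withinTolerance(baseline, 104 * 1024 * 1024),
  at106MB: withinTolerance(baseline, 106 * 1024 * 1024),
});
```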

Code Patterns & Signatures

Use this first pattern as a lightweight always-on sampler that writes a structured log line, so your APM can alert on a heap floor that ratchets upward before the process is OOM-killed.

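// Sample four memory regions every 15s and flag a rising heap floor.
const WINDOW = 20;         // samples per window (20 x 15s = 5 min); tunable
let windowMin = Infinity;  // lowest heapUsed seen in the current window
let lastFloor = 0;         // floor (minimum) of the previous window
let samples = 0;

function sampleMemory() {
  const m = process.memoryUsage();       // all values in bytes
  const mb = (b) => (b / 1024 / 1024).toFixed(1); // -> MB string
  windowMin = Math.min(windowMin, m.heapUsed);
  let floorRising = false;
  if (++samples % WINDOW === 0) {        // window closed: compare floors
    floorRising = lastFloor > 0 && windowMin > lastFloor;
    lastFloor = windowMin;
    windowMin = Infinity;
  }
  // external + arrayBuffers live OUTSIDE the V8 heap
  process.stdout.write(JSON.stringify({
    rss: mb(m.rss),                       // total resident set
    heapUsed: mb(m.heapUsed),             // live V8 objects
    external: mb(m.external),             // Buffers, addons
    arrayBuffers: mb(m.arrayBuffers),     // ArrayBuffer stores
    floorRising,                          // true: floor above last window
  }) + '\n');
}

setInterval(sampleMemory, 15_000).unref(); // don't block exit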

Use this second pattern to write a snapshot from inside your own signal handler when you need a filename convention or an S3 upload that the bare --heapsnapshot-signal flag cannot give you.

const v8 = require('v8');                 // built-in, no install
const path = require('path');

// Synchronous write: forces a full GC, then serializes the heap.
function writeSnapshot(tag) {
  const file = path.join(
    process.env.SNAPSHOT_DIR || '/var/tmp', // writable dir
    `heap-${tag}-${process.pid}-${Date.now()}.heapsnapshot`
  );
  v8.writeHeapSnapshot(file);            // blocks event loop!
  return file;                       // hand off to uploader here
}

// SIGUSR2 is safe to send via `kill -USR2 <pid>` from the shell.
process.on('SIGUSR2', () => {
  const file = writeSnapshot('sigusr2'); // one capture per signal
  console.error(`heap snapshot: ${file}`); // stderr, not stdout
});

Use this third pattern on legacy Node.js (pre-12) or when you specifically want the heapdump module’s callback so you can react after the write completes.

// npm i heapdump  (needs a native toolchain to compile)
const heapdump = require('heapdump');

// The callback fires once the file is flushed to disk.
function dumpAndReport() {
  const name = `/var/tmp/leak-${Date.now()}.heapsnapshot`;
  heapdump.writeSnapshot(name, (err, filename) => {
    if (err) return console.error('heapdump failed', err);
    // Safe point to upload or page an on-call engineer.
    console.error(`wrote ${filename}`); // filename echoes name
  });
}

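// Note: on UNIX, heapdump also installs its own SIGUSR2 handler by default;
// set NODE_HEAPDUMP_OPTIONS=nosignal to avoid two dumps per signal.
process.on('SIGUSR2', dumpAndReport); // result arrives in the callback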

Use these Clinic.js invocations so the whole team captures reports the same way instead of memorising flag combinations. Save them as shell aliases or npm scripts.

# Doctor: classify leak vs loop-delay vs CPU (30s load).
clinic doctor \
  --on-port 'autocannon -d 30 localhost:3000' \
  -- node server.js

# Bubbleprof: map async ops that retain memory (20s).
clinic bubbleprof \
  --on-port 'autocannon -d 20 localhost:3000' \
  -- node server.js

# HeapProfiler: sampling allocations over the run.
clinic heapprofiler -- node server.js

Symptom-to-Fix Reference

Symptom | Root Cause | Immediate Action | Measurable Impact
heapUsed floor ratchets up each minute | Objects retained by a module cache or global array | Compare two snapshots by # Delta; add eviction or WeakMap | Floor stabilises within 2–3 load batches
external climbs, heapUsed flat | Buffer or addon memory outside V8 | Audit Buffer pools; snapshot won’t help | external growth flattens; RSS drops
Snapshot capture stalls loop 2–3 s | Full GC + serialize on large heap | Capture on a drained instance only | No user-facing latency during dump
Doctor flags “memory issue” | Confirmed upward heap trend | Arm --heapsnapshot-signal=SIGUSR2 | Snapshots pinpoint leaking constructor
(closure) dominates retained size | Per-request closure in long-lived scope | Expand Retainers; pass only needed data | Closure retained size falls sharply
.heapsnapshot too big for DevTools | Heap over ~1.5 GB serialized | Snapshot smaller instance or raise ceiling | File opens; comparison completes
FATAL ERROR: Reached heap limit | Live set exceeds old-space cap | Raise --max-old-space-size then fix leak | Process stops crashing under load

Edge Cases & Gotchas

The snapshot freezes the event loop

v8.writeHeapSnapshot() and heapdump both force a full GC and then serialize synchronously. On a 1 GB heap that is a 1–3 s stall during which the process answers no requests and fails health checks. Never wire SIGUSR2 to fire on all instances at once. Pull one node out of the load balancer, snapshot it, then return it — or snapshot a canary instance that mirrors production traffic.
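Before relying on the 1–3 s figure, it may be worth measuring the stall on a staging instance with a heap of realistic size. A minimal sketch, assuming the OS temp directory is an acceptable target; the function name `timedSnapshot` and the file naming are hypothetical.

```javascript
const v8 = require('node:v8');
const os = require('node:os');
const path = require('node:path');
const { performance } = require('node:perf_hooks');

// Write one snapshot and report how long the event loop was blocked.
function timedSnapshot(dir = os.tmpdir()) {
  const target = path.join(
    dir,
    `stall-test-${process.pid}-${Date.now()}.heapsnapshot`
  );
  const start = performance.now();
  const written = v8.writeHeapSnapshot(target); // synchronous: loop blocked here
  const stallMs = performance.now() - start;
  return { written, stallMs };
}
```

Calling `timedSnapshot()` returns the path that was written and the measured stall in milliseconds. The number scales with heap size, so a measurement taken on an idle development process will understate what production sees.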

Snapshots miss external and arrayBuffers memory

A .heapsnapshot serializes the V8 object graph. Buffer backing stores, raw ArrayBuffer memory, and native addon allocations live in external/arrayBuffers and appear only as small wrapper objects in the snapshot, not their true byte cost. If Clinic.js Doctor shows external growth, stop reaching for snapshots and audit Buffer pooling and addon free() paths instead. This is the same managed-heap versus external-memory distinction that runs through the Node.js Server-Side Memory Management area as a whole.

heapdump needs a native toolchain

The heapdump npm module compiles native code with node-gyp, which requires Python and a C++ compiler in the build image. In minimal Alpine or distroless containers the install fails. On Node.js 12+ there is no reason to take that dependency — the built-in v8.writeHeapSnapshot() and --heapsnapshot-signal flag do the same job with zero install.

Comparing snapshots from different Node.js versions

Constructor names, internal object layouts, and pointer-compression behaviour differ between Node.js major versions. A baseline captured on Node.js 18 and a comparison captured on Node.js 20 will show spurious deltas that are really version differences, not leaks. Always capture both snapshots from the same binary, and record the exact node --version alongside each file.
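One way to make the "same binary" rule checkable is to record runtime details next to each snapshot. The sketch below builds such a record and compares two of them; the field names, the sidecar idea, and the helper names are assumptions for illustration, not an existing convention.

```javascript
// Describe the runtime that produced a snapshot file.
function snapshotMetadata(snapshotFile) {
  return {
    snapshotFile,
    nodeVersion: process.version,      // e.g. the output of `node --version`
    v8Version: process.versions.v8,
    pid: process.pid,
    capturedAt: new Date().toISOString(),
  };
}

// Two snapshots are only comparable when they came from the same runtime.
function sameRuntime(metaA, metaB) {
  return metaA.nodeVersion === metaB.nodeVersion;
}

const meta = snapshotMetadata('heap-example.heapsnapshot'); // hypothetical file name
console.log(JSON.stringify(meta, null, 2));
```

Writing the record to a file such as `<snapshot>.meta.json` beside the snapshot keeps the version information attached when files are copied off the host.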

Clinic.js sampling changes the timing profile

Clinic.js Doctor and HeapProfiler add sampling overhead that slightly slows the process and shifts GC timing. A leak that surfaces at 500 requests/second under normal load may need a longer autocannon duration to reproduce under instrumentation. Treat Clinic as a classifier that tells you what kind of problem you have; use raw process.memoryUsage() sampling for the precise growth-rate numbers you quote in a bug report.

The working directory must be writable

The bare --heapsnapshot-signal=SIGUSR2 flag writes to the process’s current working directory. In a read-only container filesystem or a directory the service user cannot write to, the signal fires, the event loop stalls for the GC, and then the write silently fails — you get the latency cost with no file to show for it. Set an explicit writable path by using your own handler with v8.writeHeapSnapshot('/var/tmp/...') as in the second code pattern, and confirm the mount is writable before you rely on it during an incident.
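A startup check along these lines can surface an unwritable target long before an incident. This is a minimal sketch using fs.accessSync; the function name `isWritableDir` is hypothetical, and a permission check is not a guarantee that a later multi-hundred-megabyte write will succeed (the disk can still fill up).

```javascript
const fs = require('node:fs');

// True when the current user may create files in `dir`.
function isWritableDir(dir) {
  try {
    fs.accessSync(dir, fs.constants.W_OK);
    return true;
  } catch {
    return false; // missing directory, read-only mount, or no permission
  }
}

const snapshotDir = process.env.SNAPSHOT_DIR || '/var/tmp';
if (!isWritableDir(snapshotDir)) {
  console.error(`snapshot dir not writable: ${snapshotDir}`);
}
```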

SIGUSR2 collides with nodemon and other tools

SIGUSR2 is not exclusively yours. nodemon uses it to trigger restarts, and some process managers repurpose it too. If you arm --heapsnapshot-signal=SIGUSR2 under a supervisor that already claims the signal, your kill -USR2 may restart the process instead of dumping the heap, destroying the very state you wanted to capture. In development, either stop nodemon first or choose a different signal for your handler, and document which signal each environment uses. Do not pick SIGUSR1 as the alternative: Node.js reserves it for activating the inspector.
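Making the signal configurable per environment is one way to sidestep the collision. The sketch below reads a hypothetical SNAPSHOT_SIGNAL environment variable and rejects signals that cannot work: Node.js reserves SIGUSR1 for starting the inspector, and SIGKILL and SIGSTOP cannot have listeners. It cannot detect an external supervisor such as nodemon claiming the signal, so the documentation step still matters.

```javascript
// Signals that must not be used for a snapshot handler.
const UNUSABLE = new Set(['SIGUSR1', 'SIGKILL', 'SIGSTOP']);

// Resolve the snapshot signal from the environment, defaulting to SIGUSR2.
function pickSnapshotSignal(env = process.env) {
  const signal = env.SNAPSHOT_SIGNAL || 'SIGUSR2'; // hypothetical variable name
  if (UNUSABLE.has(signal)) {
    throw new Error(`${signal} cannot be used for heap snapshots`);
  }
  return signal;
}

// Register `handler` on the chosen signal and report which one was used.
function armSnapshotHandler(handler, env = process.env) {
  const signal = pickSnapshotSignal(env);
  process.on(signal, handler);
  return signal;
}
```

For example, starting the service with SNAPSHOT_SIGNAL=SIGHUP under nodemon leaves SIGUSR2 to nodemon, provided nothing else in that environment uses SIGHUP.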

Frequently Asked Questions

Is it safe to capture a heap snapshot on a live production process?

Capturing a snapshot forces a full GC and freezes the event loop for the duration of the write — typically 100 ms to several seconds depending on heap size. On a 1 GB heap expect a 1–3 s stall. Trigger it on a drained instance or one pulled from the load balancer, never on every node simultaneously.

Should I use the heapdump module or the built-in --heapsnapshot-signal flag?

On Node.js 12+ prefer the built-in v8.writeHeapSnapshot() or the --heapsnapshot-signal=SIGUSR2 flag — they ship with the runtime and need no native compilation. The standalone heapdump npm module is only worth it on legacy Node.js versions or when you need its programmatic callback API.

Why does Clinic.js Doctor say my problem is external memory, not the heap?

Doctor reads process.memoryUsage(). If external and arrayBuffers climb while heapUsed stays flat, the growth is in Buffers, ArrayBuffer backing stores, or native addons that live outside V8’s managed heap. A .heapsnapshot will not show that memory, so switch to process.memoryUsage() sampling and Buffer pool auditing instead.

How large will a production .heapsnapshot file be and can I open it?

A .heapsnapshot is roughly 1.5–2x the live heap size on disk, so a 700 MB heap produces a 1–1.4 GB JSON file. DevTools can struggle above ~2 GB; raise the memory ceiling by launching Chrome with a larger V8 heap through --js-flags (for example --js-flags="--max-old-space-size=8192"), or load the file into a dedicated tool rather than the browser tab you are debugging in.
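The rule of thumb above is easy to turn into a pre-flight estimate, for instance to check free disk space before sending the signal. A sketch, assuming the 1.5–2x multiplier from this guide holds for your workload; the function name is hypothetical.

```javascript
// Estimate the on-disk size range of a snapshot from the live heap size.
function estimateSnapshotSizeMB(heapUsedBytes) {
  const heapMB = heapUsedBytes / (1024 * 1024);
  return {
    lowMB: Math.round(heapMB * 1.5),
    highMB: Math.round(heapMB * 2),
  };
}

// Worked example from the text: a 700 MB heap.
const estimate = estimateSnapshotSizeMB(700 * 1024 * 1024);
console.log(estimate); // { lowMB: 1050, highMB: 1400 }
```

In a live process you could pass process.memoryUsage().heapUsed and compare highMB against the free space in the snapshot directory.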