Diagnosing Node.js Memory with heapdump & Clinic.js
When a Node.js service leaks memory in production, the browser-first workflow of clicking “Take snapshot” in DevTools no longer applies — there is no page to profile, the process is headless, and it may be minutes from an out-of-memory kill. This guide covers the production toolchain: writing a .heapsnapshot on a signal, the heapdump module, attaching Chrome DevTools over --inspect, and using Clinic.js Doctor and Bubbleprof to decide whether you even have a heap leak before you start reading retainer chains. It sits within the Node.js Server-Side Memory Management area; for a head-to-head on which capture method to reach for first, see heapdump vs Clinic vs inspect.
The reading skills transfer directly from the browser: once you have two .heapsnapshot files on disk, the comparison technique in take & compare heap snapshots is identical whether the snapshot came from a tab or a server.
Conceptual Grounding
A Node.js “memory leak” is almost never a bug in V8’s garbage collector — it is your code keeping objects reachable from a GC root that never lets go. The three roots that dominate server leaks are the module cache (anything hung off a required singleton), timer and event-emitter registries (setInterval callbacks, listeners on a long-lived emitter), and per-request closures that outlive the request because they were pushed into a global array or a promise that never settles.
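The third root is the easiest to reproduce in a few lines. The sketch below is illustrative only: `recentRequests`, `handleLeaky`, `handleBounded`, and `MAX_TRACKED` are hypothetical names, and the 1 KB string simply stands in for whatever per-request data your handler captures.

```javascript
// Hypothetical illustration: a module-level array is reachable from the
// module cache, so everything pushed into it stays alive.
const recentRequests = []; // module scope: lives as long as the process

function handleLeaky(req) {
  const body = 'x'.repeat(1024); // stands in for per-request data
  // The closure captures `body`; pushing it into module state retains
  // both the closure and the captured string after the request ends.
  recentRequests.push(() => `${req.id}:${body.length}`);
}

const MAX_TRACKED = 100; // assumed cap, tune for your service

function handleBounded(req, store) {
  store.push({ id: req.id }); // keep only the data you need
  if (store.length > MAX_TRACKED) store.shift(); // evict the oldest entry
}
```

In a snapshot, the leaky variant would show up as a growing count of `(closure)` entries retained by the array, while the bounded variant plateaus at the cap.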
The tooling splits cleanly into two jobs. The first job is detection: is memory actually trending up, and is it in V8’s managed heap or in external memory? process.memoryUsage() and Clinic.js Doctor answer this. Doctor samples RSS, heap, event-loop delay, and CPU, then applies heuristics to classify the fault. The second job is attribution: which objects, held by which retainer chain. That requires a .heapsnapshot — a full serialization of every live object, its shallow size, and the edges between them.
The critical distinction the diagram below makes concrete: heapUsed covers only V8’s object graph. Buffer backing stores, ArrayBuffer memory, and native addon allocations show up as external and arrayBuffers, and a .heapsnapshot barely reflects them. Choosing the wrong tool for the wrong memory region is the single most common way engineers waste a day on a Node.js leak.
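One way to see the split for yourself is to allocate a large Buffer and compare the regions before and after. This is a rough sketch, assuming a Node.js version that reports `arrayBuffers` in `process.memoryUsage()`; the 64 MB size and the `deltaMb` helper are arbitrary choices for illustration.

```javascript
// Allocate a large Buffer and compare the memory regions before and after.
const MB = 1024 * 1024;
const before = process.memoryUsage();
const big = Buffer.alloc(64 * MB); // backing store lives outside the V8 heap
const after = process.memoryUsage();

// Growth of one region in MB between the two samples.
const deltaMb = (key) => (after[key] - before[key]) / MB;

console.log({
  heapUsed: deltaMb('heapUsed').toFixed(1),
  external: deltaMb('external').toFixed(1),
  arrayBuffers: deltaMb('arrayBuffers').toFixed(1),
  bufferLength: big.length, // keeps `big` referenced until after sampling
});
```

The off-heap figures should grow by roughly the allocation size while `heapUsed` barely moves, which is the pattern a heap snapshot cannot explain.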
Clinic.js adds a third capability beyond detection and attribution: causation over time. Bubbleprof instruments the async operation graph — every timer, socket, and promise — and renders which asynchronous flows keep handles and their captured scopes alive. Where a heap snapshot answers “what is retained right now”, Bubbleprof answers “which async pipeline created and held it”, which is invaluable when the leak is a pending operation that never resolves rather than a growing collection. The HeapProfiler subcommand sits between the two: it takes a sampling profile of allocations across a run, so you can see where in your code bytes are being allocated without freezing the process for a full snapshot.
Diagnostic Workflow
Work these steps in order. Skipping the detection steps and jumping straight to snapshots is how you end up staring at a 1 GB JSON file for a leak that was actually in external memory.
Step 1 — Confirm the trend with process.memoryUsage()
Action: log rss, heapUsed, external, and arrayBuffers on a fixed interval so you can distinguish a genuine upward trend from GC sawtooth. To force a collection between samples, run with node --expose-gc server.js and call global.gc() before each sample; the flag alone only exposes the function.
Expected output: heapUsed should oscillate but return to a stable floor after each GC. A floor that ratchets upward every few minutes is a heap leak. A flat heapUsed with rising external is a Buffer or addon problem.
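The "ratcheting floor" test can be sketched as a pure function over logged heapUsed samples. This is one possible heuristic, not a standard algorithm: `windowFloors` and `floorIsRatcheting` are hypothetical helpers, and the window size and minimum window count are assumptions you would tune to your sampling interval.

```javascript
// Split heapUsed samples into fixed windows and take each window's minimum
// (an approximation of the post-GC floor for that window).
function windowFloors(samples, windowSize) {
  const floors = [];
  for (let i = 0; i + windowSize <= samples.length; i += windowSize) {
    floors.push(Math.min(...samples.slice(i, i + windowSize)));
  }
  return floors;
}

// True when every window's floor is strictly higher than the previous one.
function floorIsRatcheting(samples, windowSize, minWindows = 3) {
  const floors = windowFloors(samples, windowSize);
  if (floors.length < minWindows) return false; // not enough data yet
  return floors.every((floor, i) => i === 0 || floor > floors[i - 1]);
}
```

A healthy sawtooth returns to the same minimum in every window, so the function returns false; a leak raises the minimum window after window.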
Step 2 — Classify the fault with Clinic.js Doctor
CLI: clinic doctor --on-port 'autocannon localhost:$PORT' -- node server.js
Doctor runs your server, drives load, then opens an HTML report. Expected output: a verdict banner such as “Detected memory issue” plus four synchronized charts (memory, event-loop delay, CPU, active handles). A memory chart that climbs steadily and does not drop back after each GC confirms a leak worth snapshotting. If Doctor instead flags event-loop delay, the problem may be main-thread blocking rather than retention; move on to interpreting heap snapshots only once memory is confirmed as the fault.
Step 3 — Arm the process for on-demand snapshots
CLI flag: node --heapsnapshot-signal=SIGUSR2 server.js
This built-in flag (Node.js 12+) makes the process write a .heapsnapshot to its working directory whenever it receives SIGUSR2. No code change, no native module. Verify by finding the process PID with pgrep -f server.js and sending kill -USR2 <pid>. Expected output: a file named like Heap.20260705.140212.12345.0.001.heapsnapshot appears on disk within a few seconds.
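When several captures accumulate in one directory, a small helper can pick out the files for a given process. The sketch below assumes the dotted naming shown in the example above (date, time, pid, thread id, sequence); `parseSnapshotName` and `listSnapshots` are hypothetical helpers, not part of Node.js.

```javascript
const fs = require('fs');
const path = require('path');

// Assumed naming, taken from the example file name above:
// Heap.<yyyymmdd>.<hhmmss>.<pid>.<thread id>.<sequence>.heapsnapshot
const SNAPSHOT_NAME =
  /^Heap\.(\d{8})\.(\d{6})\.(\d+)\.(\d+)\.(\d+)\.heapsnapshot$/;

// Returns the parsed fields, or null if the name does not match.
function parseSnapshotName(name) {
  const m = SNAPSHOT_NAME.exec(name);
  if (!m) return null;
  return {
    date: m[1],
    time: m[2],
    pid: Number(m[3]),
    threadId: Number(m[4]),
    sequence: Number(m[5]),
  };
}

// Lists signal-triggered snapshots for one pid, oldest first.
// Fixed-width date and time fields make a plain string sort chronological.
function listSnapshots(dir, pid) {
  return fs.readdirSync(dir)
    .filter((name) => {
      const parsed = parseSnapshotName(name);
      return parsed !== null && parsed.pid === pid;
    })
    .sort()
    .map((name) => path.join(dir, name));
}
```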
Step 4 — Capture baseline and post-load snapshots
Action: send SIGUSR2 once after warm-up (Snapshot 1), drive the suspect workload — the same autocannon run, a batch job, a specific route — then send SIGUSR2 again (Snapshot 2). Each capture forces a full GC first, so anything present in both is genuinely retained, not transient.
Expected metric: note the file size of each. A Snapshot 2 that is materially larger (say 180 MB vs 90 MB) quantifies the leak’s on-heap footprint before you open a single retainer chain.
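The size comparison is easy to script so the numbers land in your incident notes. A minimal sketch follows; `snapshotGrowth` is a hypothetical helper, and file size is only a coarse proxy for on-heap growth.

```javascript
const fs = require('fs');

// Compare the on-disk size of two snapshot files.
function snapshotGrowth(baselinePath, afterPath) {
  const baseline = fs.statSync(baselinePath).size; // bytes on disk
  const after = fs.statSync(afterPath).size;
  return {
    baselineMb: baseline / 1024 / 1024,
    afterMb: after / 1024 / 1024,
    ratio: after / baseline, // 2.0 means the file doubled
  };
}
```

For the 90 MB and 180 MB files in the example above, `ratio` would come out at 2.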
Step 5 — Compare offline in DevTools
DevTools path: open chrome://inspect → Open dedicated DevTools for Node → Memory tab → Load (the up-arrow icon) → select Snapshot 1, then Load Snapshot 2. Select Snapshot 2, switch the dropdown to Comparison and choose Snapshot 1 as the baseline. Sort by # Delta descending.
Expected metric: constructors with a large positive # Delta and near-zero # Deleted are your leak candidates. Click one, expand the Retainers pane, and read the chain to its GC root: a module-level Map, a listener array, or a closure. Prefer the Objects allocated between Snapshot 1 and Snapshot 2 filter. With Snapshot 2 selected, switch to the Summary view and pick it from the filter dropdown that defaults to All objects. It hides everything present at baseline and shows only what the workload created and failed to release, which typically cuts the candidate list from thousands of constructors to a handful.
When the leak is an array or Map that grows without bound, sort by Retained Size rather than # Delta. A single Array with 40,000 elements shows one object in the count column but dominates retained bytes — the count-based view can hide it entirely. Retained size is the memory that would be freed if that object were collected, which is the number that actually matters for an OOM budget.
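For a quick triage without opening DevTools, the count comparison can be approximated offline. The sketch below assumes the snapshot JSON layout of `snapshot.meta.node_fields`, a flat `nodes` array, and a `strings` table, and it assumes the file is small enough to `JSON.parse` in one go, which large production snapshots often are not. `tallyByName` and `countDeltas` are hypothetical helpers. It matches nodes by name and sums shallow size only, so it is no substitute for the Retained Size column.

```javascript
// Tally node count and shallow size per node name for one parsed snapshot.
function tallyByName(heap) {
  const fields = heap.snapshot.meta.node_fields;
  const stride = fields.length; // values per node in the flat array
  const nameAt = fields.indexOf('name');
  const sizeAt = fields.indexOf('self_size');
  const totals = new Map();
  for (let i = 0; i < heap.nodes.length; i += stride) {
    const name = heap.strings[heap.nodes[i + nameAt]];
    const entry = totals.get(name) || { count: 0, selfSize: 0 };
    entry.count += 1;
    entry.selfSize += heap.nodes[i + sizeAt];
    totals.set(name, entry);
  }
  return totals;
}

// Per-name growth between two parsed snapshots, largest count growth first.
function countDeltas(baseline, after) {
  const a = tallyByName(baseline);
  const b = tallyByName(after);
  const rows = [];
  for (const [name, entry] of b) {
    const prev = a.get(name) || { count: 0, selfSize: 0 };
    rows.push({
      name,
      countDelta: entry.count - prev.count,
      sizeDelta: entry.selfSize - prev.selfSize,
    });
  }
  return rows.sort((x, y) => y.countDelta - x.countDelta);
}
```

For small files, each argument would come from something like `JSON.parse(fs.readFileSync(file, 'utf8'))`.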
Step 6 — Verify the fix
Action: apply the fix, re-run the identical load, capture Snapshot 3. Expected metric: heapUsed returns within ±5% of the Snapshot 1 baseline and the previously-leaking constructor’s # Delta collapses to near zero across two consecutive batches.
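The ±5% check can be written down so it is applied the same way on every run. This is a small sketch with hypothetical helper names; `settledHeapUsed` only forces a collection when the process was started with --expose-gc.

```javascript
// Read heapUsed after a forced collection when --expose-gc is available,
// so the comparison is between post-GC floors rather than sawtooth peaks.
function settledHeapUsed() {
  if (typeof global.gc === 'function') global.gc();
  return process.memoryUsage().heapUsed;
}

// True when `current` is within `tolerance` (0.05 means ±5%) of `baseline`.
function withinTolerance(baseline, current, tolerance = 0.05) {
  return Math.abs(current - baseline) <= baseline * tolerance;
}
```

Record `settledHeapUsed()` at the Snapshot 1 point and again after the re-run, then pass both values to `withinTolerance`.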
Code Patterns & Signatures
Use this first pattern as a lightweight always-on sampler that writes a structured log line, so your APM can alert on a heap floor that ratchets upward before the process is OOM-killed.
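// Sample four memory regions every 15s and report the heap floor per window.
const WINDOW = 20; // samples per window: 20 x 15s = 5 min
let windowMin = Infinity; // lowest heapUsed seen in the current window
let lastFloor = 0; // floor of the previous window, 0 until one completes
let seen = 0; // samples taken so far
function sampleMemory() {
  const m = process.memoryUsage(); // all values in bytes
  const mb = (b) => (b / 1024 / 1024).toFixed(1); // -> MB string
  windowMin = Math.min(windowMin, m.heapUsed);
  // external + arrayBuffers live OUTSIDE the V8 heap
  const line = {
    rss: mb(m.rss), // total resident set
    heapUsed: mb(m.heapUsed), // live V8 objects
    external: mb(m.external), // Buffers, addons
    arrayBuffers: mb(m.arrayBuffers), // ArrayBuffer stores
  };
  if (++seen % WINDOW === 0) { // window complete: compare floors
    line.heapFloor = mb(windowMin);
    line.floorRose = lastFloor > 0 && windowMin > lastFloor; // ratchet
    lastFloor = windowMin;
    windowMin = Infinity; // reset for the next window
  }
  process.stdout.write(JSON.stringify(line) + '\n');
}
setInterval(sampleMemory, 15_000).unref(); // don't block exit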
Use this second pattern to write a snapshot from inside your own signal handler when you need a filename convention or an S3 upload that the bare --heapsnapshot-signal flag cannot give you.
const v8 = require('v8'); // built-in, no install
const path = require('path');
// Synchronous write: forces a full GC, then serializes the heap.
function writeSnapshot(tag) {
  const file = path.join(
    process.env.SNAPSHOT_DIR || '/var/tmp', // writable dir
    `heap-${tag}-${process.pid}-${Date.now()}.heapsnapshot`
  );
  v8.writeHeapSnapshot(file); // blocks event loop!
  return file; // hand off to uploader here
}
// SIGUSR2 is safe to send via `kill -USR2 <pid>` from the shell.
process.on('SIGUSR2', () => {
  const file = writeSnapshot('sigusr2'); // one capture per signal
  console.error(`heap snapshot: ${file}`); // stderr, not stdout
});
Use this third pattern on legacy Node.js (pre-12) or when you specifically want the heapdump module’s callback so you can react after the write completes.
// npm i heapdump (needs a native toolchain to compile)
const heapdump = require('heapdump');
// The callback fires once the file is flushed to disk.
function dumpAndReport() {
  const name = `/var/tmp/leak-${Date.now()}.heapsnapshot`;
  heapdump.writeSnapshot(name, (err, filename) => {
    if (err) return console.error('heapdump failed', err);
    // Safe point to upload or page an on-call engineer.
    console.error(`wrote ${filename}`); // filename echoes name
  });
}
process.on('SIGUSR2', dumpAndReport); // async result
Use these Clinic.js invocations so the whole team captures reports the same way instead of memorising flag combinations. Save them as shell aliases or npm scripts.
# Doctor: classify leak vs loop-delay vs CPU (30s load).
clinic doctor \
  --on-port 'autocannon -d 30 localhost:3000' \
  -- node server.js

# Bubbleprof: map async ops that retain memory (20s).
clinic bubbleprof \
  --on-port 'autocannon -d 20 localhost:3000' \
  -- node server.js

# HeapProfiler: sampling allocations over the run.
clinic heapprofiler -- node server.js
Symptom-to-Fix Reference
| Symptom | Root Cause | Immediate Action | Measurable Impact |
|---|---|---|---|
| heapUsed floor ratchets up each minute | Objects retained by a module cache or global array | Compare two snapshots by # Delta; add eviction or WeakMap | Floor stabilises within 2–3 load batches |
| external climbs, heapUsed flat | Buffer or addon memory outside V8 | Audit Buffer pools; snapshot won’t help | external growth flattens; RSS drops |
| Snapshot capture stalls loop 2–3 s | Full GC + serialize on large heap | Capture on a drained instance only | No user-facing latency during dump |
| Doctor flags “memory issue” | Confirmed upward heap trend | Arm --heapsnapshot-signal=SIGUSR2 | Snapshots pinpoint leaking constructor |
| (closure) dominates retained size | Per-request closure in long-lived scope | Expand Retainers; pass only needed data | Closure retained size falls sharply |
| .heapsnapshot too big for DevTools | Heap over ~1.5 GB serialized | Snapshot smaller instance or raise ceiling | File opens; comparison completes |
| FATAL ERROR: Reached heap limit | Live set exceeds old-space cap | Raise --max-old-space-size then fix leak | Process stops crashing under load |
Edge Cases & Gotchas
The snapshot freezes the event loop
v8.writeHeapSnapshot() and heapdump both force a full GC and then serialize synchronously. On a 1 GB heap that is a 1–3 s stall during which the process answers no requests and fails health checks. Never wire SIGUSR2 to fire on all instances at once. Pull one node out of the load balancer, snapshot it, then return it — or snapshot a canary instance that mirrors production traffic.
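Because each capture stalls the loop, it can help to rate-limit the handler so a repeated or scripted signal cannot trigger back-to-back dumps. The guard below is a sketch under that assumption; `makeSnapshotGuard` and `cooldownMs` are hypothetical names, and `takeSnapshot` stands for whatever capture function you already use.

```javascript
// Wrap a snapshot function so it runs at most once per cooldown period.
// `now` is injectable so the guard can be exercised without real waiting.
function makeSnapshotGuard(takeSnapshot, cooldownMs, now = Date.now) {
  let lastRun = -Infinity; // no capture has happened yet
  return function guarded(tag) {
    const t = now();
    if (t - lastRun < cooldownMs) return null; // skipped: still cooling down
    lastRun = t;
    return takeSnapshot(tag);
  };
}
```

For example, `process.on('SIGUSR2', () => guarded('sigusr2'))` with a ten-minute cooldown ignores a second signal that arrives while the first dump is still fresh. This limits one process only; coordinating across instances remains a deployment concern.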
Snapshots miss external and arrayBuffers memory
A .heapsnapshot serializes the V8 object graph. Buffer backing stores, raw ArrayBuffer memory, and native addon allocations live in external/arrayBuffers and appear only as small wrapper objects in the snapshot, not their true byte cost. If Clinic.js Doctor shows external growth, stop reaching for snapshots and audit Buffer pooling and addon free() paths instead — the same distinction that separates managed-heap leaks from the ones covered under Node.js Server-Side Memory Management as a whole.
heapdump needs a native toolchain
The heapdump npm module compiles native code with node-gyp, which requires Python and a C++ compiler in the build image. In minimal Alpine or distroless containers the install fails. On Node.js 12+ there is no reason to take that dependency — the built-in v8.writeHeapSnapshot() and --heapsnapshot-signal flag do the same job with zero install.
Comparing snapshots from different Node.js versions
Constructor names, internal object layouts, and pointer-compression behaviour differ between Node.js major versions. A baseline captured on Node.js 18 and a comparison captured on Node.js 20 will show spurious deltas that are really version differences, not leaks. Always capture both snapshots from the same binary, and record the exact node --version alongside each file.
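Recording the runtime next to each file can be automated with a sidecar. The sketch below is one possible convention, not an established one: the `.meta.json` suffix and the helper names `snapshotMetadata`, `writeSidecar`, and `sameRuntime` are all assumptions.

```javascript
const fs = require('fs');

// Describe the runtime that produced a snapshot.
function snapshotMetadata(snapshotFile) {
  return {
    snapshotFile,
    nodeVersion: process.version, // e.g. the output of `node --version`
    v8Version: process.versions.v8,
    arch: process.arch,
    platform: process.platform,
    pid: process.pid,
    capturedAt: new Date().toISOString(),
  };
}

// Write the metadata next to the snapshot and return the sidecar path.
function writeSidecar(snapshotFile) {
  const sidecar = `${snapshotFile}.meta.json`;
  const body = JSON.stringify(snapshotMetadata(snapshotFile), null, 2);
  fs.writeFileSync(sidecar, body);
  return sidecar;
}

// Two snapshots are comparable only if they came from the same runtime.
function sameRuntime(metaA, metaB) {
  return metaA.nodeVersion === metaB.nodeVersion && metaA.arch === metaB.arch;
}
```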
Clinic.js sampling changes the timing profile
Clinic.js Doctor and HeapProfiler add sampling overhead that slightly slows the process and shifts GC timing. A leak that surfaces at 500 requests/second under normal load may need a longer autocannon duration to reproduce under instrumentation. Treat Clinic as a classifier that tells you what kind of problem you have; use raw process.memoryUsage() sampling for the precise growth-rate numbers you quote in a bug report.
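For the growth-rate figure itself, a least-squares slope over the raw samples is usually enough. This is a sketch, assuming you have collected samples as `{ t, heapUsed }` pairs with `t` in milliseconds since the run started; `growthMbPerMinute` is a hypothetical helper.

```javascript
// Least-squares slope of heapUsed over time, in MB per minute.
// samples: [{ t: millisecondsSinceStart, heapUsed: bytes }, ...]
function growthMbPerMinute(samples) {
  const n = samples.length;
  if (n < 2) return 0; // a slope needs at least two points
  const xs = samples.map((s) => s.t / 60000); // minutes
  const ys = samples.map((s) => s.heapUsed / 1024 / 1024); // MB
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let num = 0;
  let den = 0;
  for (let i = 0; i < n; i++) {
    num += (xs[i] - meanX) * (ys[i] - meanY);
    den += (xs[i] - meanX) ** 2;
  }
  return den === 0 ? 0 : num / den; // all samples at one instant: no slope
}
```

A slope fitted over many samples is less sensitive to where the GC sawtooth happened to be at the first and last sample than a simple end-minus-start difference.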
The working directory must be writable
The bare --heapsnapshot-signal=SIGUSR2 flag writes to the process’s current working directory. In a read-only container filesystem or a directory the service user cannot write to, the signal fires, the event loop stalls for the GC, and then the write silently fails — you get the latency cost with no file to show for it. Set an explicit writable path by using your own handler with v8.writeHeapSnapshot('/var/tmp/...') as in the second code pattern, and confirm the mount is writable before you rely on it during an incident.
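A writability check at startup is cheap insurance. The sketch below probes the directory by creating and deleting a tiny file; `canWriteSnapshots` and the probe file name are hypothetical, and the check says nothing about whether there is enough free space for a full snapshot.

```javascript
const fs = require('fs');
const path = require('path');

// Returns true if a file can be created and removed in `dir`.
function canWriteSnapshots(dir) {
  const probe = path.join(dir, `.snapshot-probe-${process.pid}`);
  try {
    fs.writeFileSync(probe, ''); // fails on read-only or missing directories
    fs.unlinkSync(probe);
    return true;
  } catch {
    return false;
  }
}
```

Calling it once at boot against the directory you intend to use, and logging a warning when it returns false, surfaces the problem long before an incident.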
SIGUSR2 collides with nodemon and other tools
SIGUSR2 is not exclusively yours. nodemon uses it to trigger restarts, and some process managers repurpose it too. If you arm --heapsnapshot-signal=SIGUSR2 under a supervisor that already claims the signal, your kill -USR2 may restart the process instead of dumping the heap, destroying the very state you wanted to capture. In development, either stop nodemon first or use a signal that nothing else in that environment claims. Do not pick SIGUSR1 for your handler: Node.js reserves it for starting the inspector. Document which signal each environment uses.
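Making the signal configurable per environment keeps the choice explicit. The sketch below assumes a hypothetical `SNAPSHOT_SIGNAL` environment variable and a hypothetical `resolveSnapshotSignal` helper. It rejects SIGUSR1 because Node.js documents that signal as reserved for starting the debugger, and it rejects SIGKILL and SIGSTOP because listeners cannot be installed for them.

```javascript
const os = require('os');

// Signals that should not be used for a snapshot handler.
const UNUSABLE = new Set(['SIGUSR1', 'SIGKILL', 'SIGSTOP']);

// Pick the snapshot signal from the environment, defaulting to SIGUSR2.
// SNAPSHOT_SIGNAL is a hypothetical variable name, not a Node.js setting.
function resolveSnapshotSignal(env = process.env) {
  const name = env.SNAPSHOT_SIGNAL || 'SIGUSR2';
  const known = Object.prototype.hasOwnProperty.call(os.constants.signals, name);
  if (!known) throw new Error(`unknown signal: ${name}`);
  if (UNUSABLE.has(name)) throw new Error(`signal not usable here: ${name}`);
  return name;
}
```

The returned name can be passed straight to `process.on(...)` when you register your own handler.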
Frequently Asked Questions
Is it safe to capture a heap snapshot on a live production process?
Capturing a snapshot forces a full GC and freezes the event loop for the duration of the write — typically 100 ms to several seconds depending on heap size. On a 1 GB heap expect a 1–3 s stall. Trigger it on a drained instance or one pulled from the load balancer, never on every node simultaneously.
Should I use the heapdump module or the built-in --heapsnapshot-signal flag?
On Node.js 12+ prefer the built-in v8.writeHeapSnapshot() or the --heapsnapshot-signal=SIGUSR2 flag — they ship with the runtime and need no native compilation. The standalone heapdump npm module is only worth it on legacy Node.js versions or when you need its programmatic callback API.
Why does Clinic.js Doctor say my problem is external memory, not the heap?
Doctor reads process.memoryUsage(). If external and arrayBuffers climb while heapUsed stays flat, the growth is in Buffers, ArrayBuffer backing stores, or native addons that live outside V8’s managed heap. A .heapsnapshot will not show that memory, so switch to process.memoryUsage() sampling and Buffer pool auditing instead.
How large will a production .heapsnapshot file be and can I open it?
A .heapsnapshot is roughly 1.5–2x the live heap size on disk, so a 700 MB heap produces a JSON file of roughly 1–1.4 GB. DevTools can struggle above ~2 GB; raise the memory ceiling by launching Chrome with a larger V8 heap, for example --js-flags="--max-old-space-size=8192", or load the file into a dedicated tool rather than the browser tab you are debugging in.
Related
- Node.js Server-Side Memory Management — the parent guide covering server-side heap, streams, and worker isolation
- heapdump vs Clinic vs node --inspect — a focused comparison of the three capture methods
- Interpreting Heap Snapshots for Memory Analysis — the main section on reading retainer chains and object graphs
- Take & Compare Heap Snapshots in Chrome, Step by Step — the comparison-view technique that applies identically to server snapshots