SSR Heap Exhaustion & Per-Request Memory

Server-side rendering turns every incoming HTTP request into a burst of short-lived heap allocation: a fresh component tree, a data layer populated from your API, a serialised markup buffer, and the framework scaffolding that ties them together. In isolation a single render is cheap. Under concurrency the picture changes — each in-flight request keeps its entire object graph alive until the response flushes, so peak heap scales with how many renders overlap. This guide sits under Node.js Server-Side Memory Management and explains where per-request memory comes from, how module-scope caches and singletons turn transient allocation into permanent retention, and how to bound both. For the runtime limits these renders push against, see Node.js memory limits & out-of-heap errors; for the Next.js-specific patterns, jump to fixing Next.js SSR leaks.


Conceptual Grounding

An SSR request has a strict memory lifecycle. When the handler runs, V8 allocates a request-scoped object graph in the young generation: the virtual DOM or template tree, a store or query cache seeded with fetched data, context providers, and — for string renderers — a growing markup buffer. The moment the response is written and the request object is dereferenced, that graph becomes unreachable and is eligible for collection. In a healthy server it never survives long enough to be promoted to old space; the scavenger reclaims it within a GC cycle or two.

Three things break this clean lifecycle. Concurrency multiplies the live set: at concurrency N, roughly N request graphs are simultaneously reachable, so a 6 MB render at 200 in-flight requests pins around 1.2 GB even though no single request leaks. Module-scope retention promotes request data to a longer life: a cache Map, a memo table, or an accidental array push at module scope keeps references to per-request objects after the response flushes, so the object is promoted to old space and never collected. Singleton state leaks across requests: a store, an HTTP client, or a renderer instance created once at module load and mutated per request accumulates data from every request that touches it.
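The concurrency arithmetic above can be written down as a quick sizing check. This is a minimal sketch: the function names and the baselineMB parameter (heap used by the idle process) are illustrative assumptions, and the per-render figure should come from your own measurement rather than from the 6 MB used here.

```javascript
// Hypothetical sizing helpers; names and parameters are illustrative.

// Approximate live set when `inFlight` renders overlap.
function peakLiveSetMB(renderMB, inFlight) {
  return renderMB * inFlight;
}

// Largest concurrency that fits in `budgetMB` after subtracting the
// idle-process baseline (an assumed, separately measured value).
function maxInFlightFor(budgetMB, renderMB, baselineMB = 0) {
  return Math.max(0, Math.floor((budgetMB - baselineMB) / renderMB));
}

console.log(peakLiveSetMB(6, 200));        // 1200 (MB, about 1.2 GB)
console.log(maxInFlightFor(1024, 6, 100)); // 154
```

Running the second helper against your real heap budget gives a starting value for a concurrency cap; treat it as an estimate, since render size varies by route.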

The distinction matters because the fixes are opposite. Per-request pressure is bounded by capping concurrency and shrinking peak render size; cross-request retention is bounded by moving state into request scope and putting hard limits on caches. Confusing the two — raising --max-old-space-size when the real problem is an unbounded singleton — only delays the crash.

There is also a timing subtlety worth internalising. Young-generation collection is fast and frequent, so a request graph that dies before promotion is essentially free. Promotion happens when an object survives two scavenges, which is exactly what a long render, a slow upstream API, or a module-scope reference forces. Once request data lands in old space, only a major mark-sweep collection can reclaim it, and those pauses grow with old-space occupancy. This is why an SSR server that looks healthy at 10 requests per second can degrade non-linearly at 100: not only is the live set larger, but more request objects survive long enough to be promoted, and the collector spends progressively longer in stop-the-world pauses. The diagram below maps how three concurrent requests share module scope while each carries its own request graph.

Figure: SSR Per-Request vs Module-Scope Memory. Three concurrent requests (A, B, C) each allocate a request-scoped object graph (vDOM, store, buffer) that is freed after the response flushes, while a shared module-scope cache Map and a singleton store retain references across requests and grow without bound, leading to old-space promotion and eventually a heap limit breach (FATAL ERROR: JS heap out of memory).
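The promotion and pause behaviour described above can be observed directly with Node's perf_hooks module. The sketch below is one possible way to log GC pauses; the helper names gcKindName and watchGcPauses are hypothetical, and the fallback to entry.kind is an assumption for older Node versions that do not populate entry.detail.

```javascript
const { PerformanceObserver, constants } = require('node:perf_hooks');

// Map V8's GC kind constant to a readable label.
function gcKindName(kind) {
  if (kind === constants.NODE_PERFORMANCE_GC_MINOR) return 'minor (scavenge)';
  if (kind === constants.NODE_PERFORMANCE_GC_MAJOR) return 'major (mark-sweep)';
  return 'other';
}

// Call `onPause` for every GC entry; returns a function that stops observing.
function watchGcPauses(onPause) {
  const observer = new PerformanceObserver((list) => {
    for (const entry of list.getEntries()) {
      // Newer Node versions expose the kind on entry.detail (assumed fallback otherwise).
      const kind = entry.detail ? entry.detail.kind : entry.kind;
      onPause({ kind: gcKindName(kind), ms: entry.duration });
    }
  });
  observer.observe({ entryTypes: ['gc'] });
  return () => observer.disconnect();
}

// Usage (inside the server process):
// const stop = watchGcPauses((p) => console.log(p.kind, p.ms.toFixed(1), 'ms'));
```

A rising share of major collections, or major pauses that lengthen as load increases, suggests request objects are surviving into old space.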

Diagnostic Workflow

Follow these steps in order. The goal is to classify the growth as per-request (self-clearing) or cross-request (retained) before touching any code, because the two demand opposite fixes.

Step 1 — Record a single-render footprint. Wrap the render handler with process.memoryUsage() before and after the response and log the heapUsed delta. Run one request in isolation.

Expected output: A stable per-request delta such as heapUsed +6.1 MB that returns to baseline after the next GC cycle. This is your per-render budget.
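Step 1 could be wired up roughly as follows. This is a sketch under assumptions: measureRender and heapDeltaMB are hypothetical helpers, and render stands in for whatever async function produces your markup.

```javascript
// Hypothetical helpers for Step 1; `render` is your own render function.

// heapUsed difference between two process.memoryUsage() readings, in MB.
function heapDeltaMB(before, after) {
  return Number(((after.heapUsed - before.heapUsed) / 1048576).toFixed(1));
}

// Run one render and log how much the live JS heap changed around it.
async function measureRender(render, ...args) {
  const before = process.memoryUsage();
  const result = await render(...args);
  const after = process.memoryUsage();
  const deltaMB = heapDeltaMB(before, after);
  console.log(`heapUsed ${deltaMB >= 0 ? '+' : ''}${deltaMB} MB`);
  return { result, deltaMB };
}
```

The delta can be negative or noisy if a collection happens to run mid-render, so it is worth repeating the single request a few times and taking a typical value.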

Step 2 — Apply steady concurrent load. Drive the SSR route with a load generator at fixed concurrency, for example:

# 50 concurrent connections, 3 minutes, against the SSR route
npx autocannon -c 50 -d 180 \
  http://localhost:3000/product/42

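Sample the server process from inside while the load runs. process.memoryUsage() reports only the process that calls it, so this snippet belongs in the server's entry file, not in a separate node invocation:

// print RSS + heapUsed once per second during the load run
const sampler = setInterval(() => {
  const m = process.memoryUsage();          // live counters
  const mb = n => (n / 1048576).toFixed(1); // bytes → MB
  console.log("rss", mb(m.rss),             // resident set
              "heapUsed", mb(m.heapUsed));  // live JS heap
}, 1000);
sampler.unref(); // sampling must not keep the process alive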

Expected output: rss and heapUsed climb during ramp-up, then plateau if memory is purely per-request.

Step 3 — Force GC between bursts to classify the growth. Launch Node with --expose-gc, run a load burst, stop it, call global.gc(), and read heapUsed. Repeat for several bursts.

Expected output: Per-request pressure returns heapUsed to baseline after each global.gc(). Cross-request retention leaves heapUsed ratcheting upward — each post-GC reading is higher than the last.
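The classification in Step 3 can be expressed as a small check over the post-GC readings. The helpers below are hypothetical, and the 1 MB tolerance is an assumed noise threshold that you would tune for your own server.

```javascript
// Hypothetical helpers for Step 3; run the server with --expose-gc.

// Force a full collection if available, then read the live JS heap in MB.
function postGcHeapMB() {
  if (typeof global.gc === 'function') global.gc(); // defined only with --expose-gc
  return process.memoryUsage().heapUsed / 1048576;
}

// Classify a series of post-GC readings, one per load burst.
function classifyReadings(readingsMB, toleranceMB = 1) {
  const [first, ...rest] = readingsMB;
  if (rest.length === 0) return 'inconclusive'; // need at least two bursts

  // Every later reading is back within tolerance of the first one.
  const flat = rest.every((r) => Math.abs(r - first) <= toleranceMB);
  if (flat) return 'per-request';

  // rest[i] is compared with readingsMB[i], the reading just before it.
  const ratchets = rest.every((r, i) => r > readingsMB[i]);
  return ratchets ? 'retained' : 'inconclusive';
}

console.log(classifyReadings([50, 50.4, 49.8, 50.2])); // per-request
console.log(classifyReadings([50, 60, 71, 83]));       // retained
```

An inconclusive result usually means more bursts are needed before deciding which fix applies.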

Step 4 — Capture and compare heap snapshots under load. Open chrome://inspect, attach to the Node process started with --inspect, and go to DevTools → Memory → Heap Snapshot. Take one snapshot mid-load, run more traffic, take a second, then switch the view dropdown to Comparison and sort by the Delta column.

Expected output: A per-request server shows near-zero delta for request constructors. A retaining server shows a positive delta on a specific constructor — often a plain Object, Array, or your store class — whose Retainers path leads to a module-scope Map or singleton.
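Where attaching DevTools to the process is impractical, snapshots can also be written from inside the process with Node's built-in v8.writeHeapSnapshot() and loaded into the DevTools Memory panel afterwards. The sketch below assumes a POSIX host (SIGUSR2 is not available on Windows); the helper names and the file naming scheme are hypothetical.

```javascript
const v8 = require('node:v8');

// Hypothetical naming scheme so successive snapshots sort in capture order.
function snapshotFileName(label, date = new Date()) {
  const stamp = date.toISOString().replace(/[:.]/g, '-');
  return `${label}-${stamp}.heapsnapshot`;
}

// Register a signal handler that writes a snapshot on demand.
// Writing a snapshot is synchronous: it blocks the event loop and needs
// extra memory while it runs, so avoid it when the heap is near its limit.
function installSnapshotSignal(signal = 'SIGUSR2', label = 'ssr') {
  process.on(signal, () => {
    const file = v8.writeHeapSnapshot(snapshotFileName(label));
    console.log('heap snapshot written to', file);
  });
}

// Usage: call installSnapshotSignal() at startup, then send the signal
// once mid-load and once after more traffic to get two files to compare.
```

Load both files in DevTools and use the same Comparison view and Delta sort described in Step 4.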

Step 5 — Bound the allocation. Move singleton request state into request scope, cap module-scope caches with an LRU plus TTL, and switch string rendering to a streaming renderer to lower peak buffered bytes. Re-run Steps 2–4 and confirm post-GC heapUsed is flat across bursts. For the heap-limit mechanics behind these breaches, see why Node.js hits the heap limit.


Code Patterns & Signatures

Pattern 1: A module-scope cache that retains every request. Use this to spot the single most common SSR retention bug — an unbounded cache keyed on request-specific data.

// --- LEAKY: cache grows one entry per unique request ---
const renderCache = new Map(); // module scope: never cleared

export function handleRender(req) {
  const html = renderApp(req.url);   // per-request work
  renderCache.set(req.url, html);    // retains html forever
  return html;                       // every unique URL adds ~KBs
}

// --- FIXED: bounded LRU with a size cap and TTL ---
import { LRUCache } from 'lru-cache';

const renderCache = new LRUCache({
  max: 500,              // hard ceiling on entries
  ttl: 1000 * 60,        // entries expire after 60 s
  maxSize: 64 * 1024 * 1024, // cap on summed sizes (~64 M chars)
  // size = string length; lru-cache expects a positive integer
  sizeCalculation: (v) => Math.max(1, v.length),
});

export function handleRender(req) {
  const hit = renderCache.get(req.url); // undefined on a miss
  if (hit !== undefined) return hit;    // reuse without alloc
  const html = renderApp(req.url);
  renderCache.set(req.url, html);       // eviction bounded
  return html;
}

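Heap impact: the leaky Map grows without limit, and a heap snapshot shows its (map) retained size climbing across the comparison. The LRU holds heapUsed flat once it reaches whichever cap it hits first: 500 entries or roughly 64 M characters of cached markup (about 64 MB for Latin-1 strings, more if the markup is stored as two-byte characters).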

Pattern 2: A singleton store leaking state across requests. Use this to catch data from one user bleeding into the next request and inflating retained size over time.

// --- LEAKY: one store instance shared by all requests ---
const store = createStore(); // created once at module load

export function handleRender(req) {
  store.dispatch(setUser(req.user)); // mutates shared state
  return renderApp(store);           // never reset → grows
}

// --- FIXED: a fresh store per request, freed after flush ---
export function handleRender(req) {
  const store = createStore();       // request-scoped
  store.dispatch(setUser(req.user)); // isolated per request
  const html = renderApp(store);
  return html;                       // store unreachable now
}

Heap impact: the shared store accumulates actions and derived state from every request; the per-request store is collected as soon as the response flushes, so retained size per request drops to zero after GC.

Pattern 3: Streaming instead of buffering the whole document. Use this when peak per-request memory — not total allocation — is what breaches the heap limit.

// --- HIGH PEAK: full markup string built before sending ---
import { renderToString } from 'react-dom/server';

export function handleRender(req, res) {
  const html = renderToString(<App url={req.url} />);
  // entire document lives in the heap at once
  res.end(`<!doctype html>${html}`);
}

// --- LOW PEAK: stream chunks with socket backpressure ---
import { renderToPipeableStream }
  from 'react-dom/server';

export function handleRender(req, res) {
  const { pipe } = renderToPipeableStream(
    <App url={req.url} />,
    {
      onShellReady() {
        res.setHeader('content-type', 'text/html');
        pipe(res); // backpressure caps buffered bytes
      },
      onShellError() {
        res.statusCode = 500; // end the response so the
        res.end();            // request graph is released
      },
    }
  );
}

Heap impact: renderToString peak memory scales with full document size (megabytes for large pages); renderToPipeableStream keeps peak buffered bytes near the stream highWaterMark, so peak heapUsed per request falls sharply even though total bytes rendered is unchanged.

Pattern 4: Guarding concurrency so the live set has a ceiling. Use this to put a hard cap on how many request graphs can be reachable at once.

// bound in-flight renders so peak live set is predictable
let inFlight = 0;
const MAX = 40; // ceiling on concurrent renders

export function handleRender(req, res, next) {
  if (inFlight >= MAX) {
    res.statusCode = 503;        // shed load early
    res.setHeader('retry-after', '1');
    return res.end('busy');      // avoid heap blowout
  }
  inFlight++;                    // reserve a slot
  renderApp(req)
    .then((html) => res.end(html))
    .catch(next)                     // forward render errors
    .finally(() => { inFlight--; }); // release slot
}

Heap impact: with MAX = 40 and a 6 MB render, the reachable request set is capped near 240 MB regardless of traffic spikes, keeping the process inside its --max-old-space-size budget instead of crashing.
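Pattern 4 sheds load once the cap is reached. If you would rather make excess requests wait, a small promise-based limiter is one option. The sketch below is illustrative only: createLimiter and its maxQueued parameter are hypothetical names, and the queue is capped because an unbounded queue of waiting requests is itself a source of retention.

```javascript
// Hypothetical limiter: at most `maxConcurrent` tasks run at once,
// at most `maxQueued` wait, and anything beyond that is rejected.
function createLimiter(maxConcurrent, maxQueued = 100) {
  let active = 0;
  const waiting = [];

  // Hand the freed slot straight to the next waiter so a newly arriving
  // task cannot take it in between; only decrement when nobody is waiting.
  const release = () => {
    const next = waiting.shift();
    if (next) next();
    else active--;
  };

  return async function run(task) {
    if (active >= maxConcurrent) {
      if (waiting.length >= maxQueued) throw new Error('render queue full');
      await new Promise((resolve) => waiting.push(resolve)); // slot arrives via release()
    } else {
      active++;
    }
    try {
      return await task();
    } finally {
      release();
    }
  };
}
```

In a handler this might look like limit(() => renderApp(req)), with the rejection for a full queue mapped to the same 503 response that Pattern 4 sends.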


Symptom-to-Fix Reference

| Symptom | Root Cause | Immediate Action | Measurable Impact |
|---|---|---|---|
| RSS climbs only under load, flat when idle | N in-flight renders multiply live set | Cap concurrent renders with a queue | Peak RSS bounded to MAX × render size |
| heapUsed ratchets up after each GC | Module-scope cache never evicts | Swap Map for LRU with max + TTL | Post-GC heapUsed flat across bursts |
| Data from one user shows in next render | Singleton store mutated per request | Create store per request | Retained size per request drops to 0 |
| OOM crash on large pages, fine on small | renderToString buffers whole doc | Switch to renderToPipeableStream | Peak per-request heap cut to highWaterMark |
| Old space grows, young gen stable | Request objects promoted via cache | Snapshot Comparison, cut retainer | heapUsed returns to baseline post-GC |
| Slow leak over hours, no spike | Array push at module scope | Move array to request scope | Delta on Array constructor goes to 0 |
| GC pauses lengthen as uptime grows | Old space near heap limit | Bound live set, size flag to it | Major GC pause drops back under 50 ms |

Edge Cases & Gotchas

Global Fetch Cache and Data-Layer Deduping

Framework data layers (React Query, Apollo, Next.js fetch caching) often install a request-deduping cache. If that cache is created at module scope rather than per request, every SSR fetch result is retained for the process lifetime. The fix is to create the query client or cache inside the request handler so it is garbage-collected with the rest of the request graph. Confirm by filtering the heap snapshot for the cache constructor and checking that its retained size no longer grows across the Comparison view. The caching versus memory bloat guide covers this trade-off in depth.
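The shape of the fix is the same whichever data layer you use: construct the deduping cache inside the handler. The sketch below uses a plain Map and no library; createRequestLoader, handleRender and load are hypothetical stand-ins, not the API of React Query, Apollo or Next.js.

```javascript
// Hypothetical request-scoped loader: dedupes identical loads within ONE
// request. `load` stands in for whatever fetches data in your app.
function createRequestLoader(load) {
  const pending = new Map(); // reachable only through this loader
  return function get(key) {
    if (!pending.has(key)) {
      // The Promise constructor runs `load` now and turns a sync throw into a rejection.
      pending.set(key, new Promise((resolve) => resolve(load(key))));
    }
    return pending.get(key);
  };
}

async function handleRender(req, load) {
  const get = createRequestLoader(load); // created per request, never at module scope
  const [user, sameUser] = await Promise.all([get('user'), get('user')]);
  return { url: req.url, user, deduped: user === sameUser };
}
```

Because the Map is only reachable through the loader, it becomes collectable together with the rest of the request graph once the handler returns.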

Streaming Without Consuming Backpressure

Switching to renderToPipeableStream only lowers peak memory if the destination applies backpressure. If you buffer the stream into a string before sending, or pipe into a writable that never signals backpressure (its write() always returns true, so the renderer never has to wait for drain), you reinstate the full-document peak you were trying to avoid. Always pipe directly into the response socket and let its highWaterMark throttle the renderer. Verify by watching peak heapUsed per request drop; if it stays at the renderToString level, backpressure is not reaching the renderer.

Closures Capturing the Request Object

An event listener, timer, or promise callback registered during a render can capture the entire req/res pair in its closure scope. Because the response object references buffers and socket state, one stray listener pins the whole request graph past flush. Detach every per-request listener before the response ends, ideally via an AbortController tied to request completion. The mechanics mirror browser-side closure memory leaks, where a small function reference anchors a large retained graph.
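One way to tie listener cleanup to request completion is sketched below. It assumes the listener is attached to a long-lived Node EventEmitter (called bus here) and that res emits 'close' when the response ends, as http.ServerResponse does; subscribeForRequest is a hypothetical helper name.

```javascript
const { EventEmitter } = require('node:events');

// Attach `listener` to a long-lived emitter for the duration of one request.
// Aborting the controller removes the listener, which unpins whatever the
// listener's closure captured (typically req and res).
function subscribeForRequest(bus, res, eventName, listener) {
  const controller = new AbortController();
  bus.on(eventName, listener);
  controller.signal.addEventListener(
    'abort',
    () => bus.off(eventName, listener),
    { once: true }
  );
  res.once('close', () => controller.abort()); // request finished or socket dropped
  return controller.signal; // can also be passed to fetch() to cancel upstream calls
}
```

Returning the signal lets the same abort cancel in-flight upstream requests, so nothing started by the render outlives the response.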

Large Serialised State in the HTML Payload

SSR frameworks embed the data layer’s state as a JSON blob in the markup for client hydration. A store bloated with unfiltered API responses produces a multi-megabyte inline script, and that string lives in the heap at peak render. Trim the serialised state to only what hydration needs before rendering; measure the size of the __NEXT_DATA__ or equivalent payload and treat anything over a few hundred KB as a retention smell.
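Measuring and trimming the payload might look like the sketch below. The 300 KB budget is an assumed reading of "a few hundred KB", and pickForHydration with its allow-list of top-level keys is a hypothetical helper; real stores usually need a more selective filter.

```javascript
// Assumed budget for the inlined hydration state; tune to your pages.
const STATE_BUDGET_BYTES = 300 * 1024;

// UTF-8 size of the JSON that would be inlined for hydration.
function serialisedStateBytes(state) {
  return Buffer.byteLength(JSON.stringify(state), 'utf8');
}

// Keep only the top-level keys hydration needs (hypothetical allow-list).
function pickForHydration(state, keys) {
  return Object.fromEntries(
    keys.filter((k) => k in state).map((k) => [k, state[k]])
  );
}

// Report the payload size and whether it exceeds the assumed budget.
function checkStateBudget(state) {
  const bytes = serialisedStateBytes(state);
  return { bytes, overBudget: bytes > STATE_BUDGET_BYTES };
}

console.log(serialisedStateBytes({ a: 1 })); // 7
```

Logging checkStateBudget for each route in development is a cheap way to spot pages whose state has grown past what hydration needs.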

Per-Request Timers and Intervals Surviving the Response

A setInterval or long setTimeout started during a render keeps its callback, and everything the callback closes over, reachable from the timer subsystem, which is a GC root. If the response flushes before the timer fires, the request graph is pinned until the timer fires or is cleared; a setInterval that is never cleared pins it for the life of the process. This is easy to introduce accidentally through a polling helper or a retry timer in the data layer. Always store the timer handle in request scope and clear it when the response ends. Verify by taking a snapshot under load and checking that no request-scoped constructors are retained by a Timeout object in the Retainers panel.
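Clearing a per-request timer when the response ends can be as small as the sketch below. startRequestTimer is a hypothetical helper, and res is assumed to emit 'close' when the response completes or the connection drops, as Node's http.ServerResponse does.

```javascript
// Hypothetical helper: start an interval that cannot outlive the response.
function startRequestTimer(res, callback, intervalMs) {
  const timer = setInterval(callback, intervalMs); // handle stays in request scope
  res.once('close', () => clearInterval(timer));   // unpin the callback's closure
  return timer;
}
```

The same pattern applies to setTimeout with clearTimeout; the point is that the handle and its cleanup are created together.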

Raising the Heap Limit Instead of Bounding Memory

Bumping --max-old-space-size=4096 is the wrong first move for unbounded growth: it enlarges the working set the collector must scan, lengthening major GC pauses, and only postpones the OOM. Reserve the flag for a genuinely larger but bounded working set — for example, after you have capped concurrency to 40 and measured a stable 3 GB live set. The broader treatment of these limits lives in the Node.js memory limits & out-of-heap errors guide.
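Before changing the flag it helps to see what limit the process is actually running with. The sketch below reads it from v8.getHeapStatistics(); the helper names and the 25% safety margin are assumptions for illustration, and heap_size_limit covers the whole V8 heap, so it can read slightly higher than the old-space flag value.

```javascript
const v8 = require('node:v8');

const toMB = (bytes) => Math.round(bytes / 1048576);

// Current V8 heap ceiling and usage for this process.
function heapHeadroom() {
  const s = v8.getHeapStatistics();
  return {
    limitMB: toMB(s.heap_size_limit),
    usedMB: toMB(s.used_heap_size),
  };
}

// Suggested --max-old-space-size for a measured, bounded live set.
// The 25% margin is an assumption, not a V8 or Node recommendation.
function suggestOldSpaceMB(measuredLiveSetMB, margin = 0.25) {
  return Math.ceil(measuredLiveSetMB * (1 + margin));
}

console.log(heapHeadroom());
console.log(suggestOldSpaceMB(3072)); // 3840
```

For the 3 GB example above this suggests a flag value of about 3840, applied only after the concurrency cap has made the live set stable.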


Frequently Asked Questions

Why does my SSR heap grow only under concurrent load?

Each in-flight request holds its own object graph alive until its response flushes. At concurrency N the live set is roughly N times a single render’s retained size, so a per-request footprint that is harmless in isolation multiplies into heap exhaustion once N in-flight renders overlap. The growth vanishes at low concurrency because requests complete and their graphs are collected before the next arrives. The practical test is to force GC between load bursts: if heapUsed returns to baseline each time, the memory is per-request and the fix is bounding concurrency, not chasing a leak.

Is renderToPipeableStream more memory-efficient than renderToString?

Yes for peak memory, which is the figure that drives heap-limit breaches. renderToString materialises the entire markup string in the heap before a single byte is sent, so peak per-request memory scales with total document size. renderToPipeableStream emits chunks as they render and lets the response socket apply backpressure, so peak buffered bytes stay bounded by the stream highWaterMark rather than the full page. Total bytes allocated over the request is similar, but the peak — the number that decides whether V8 hits its old-space ceiling — is far lower.

How do I tell a per-request leak from a module-scope cache leak?

Force a full GC between load bursts with --expose-gc and global.gc(). If post-GC heapUsed returns to its baseline after each burst, the memory is per-request and self-clearing; the fix is bounding peak concurrency or peak render size. If post-GC heapUsed ratchets upward burst over burst and never falls, something at module scope — a cache Map, a growing array, or a mutated singleton — is retaining request data across requests. A DevTools → Memory → Heap Snapshot → Comparison view will name the retaining constructor and its path to a GC root.

Should I cap concurrency or raise --max-old-space-size?

Cap concurrency first. Raising --max-old-space-size only buys headroom for a genuinely larger working set; if per-request memory is unbounded it merely delays the crash and lengthens major GC pauses because the collector scans a bigger heap. Bound in-flight renders with a queue or a load-balancer concurrency limit so the live set has a hard ceiling, then size the old-space flag to that ceiling plus a safety margin. The flag is a sizing decision made after you have bounded the working set, not a substitute for bounding it.