How Does DOM Replay Work? Technical Deep-Dive

Step 1 — The initial snapshot

When the SDK loads, it serialises the current DOM tree into a compact JSON representation. Every element becomes a node with its tag, attributes, computed styles (only the inheritable ones; the rest is reconstructed from stylesheets), text content, and a unique numeric ID. The result is typically 30-200KB uncompressed for a normal page, and it is compressed further before transport.
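As a rough illustration of the snapshot step, the sketch below walks a tree depth-first and assigns each node a numeric ID. The `SourceNode` and `SnapshotNode` shapes and the `serialiseTree` name are assumptions made for this example, not any SDK's real schema; a plain object tree stands in for the DOM so the code runs outside a browser, and computed styles are left out for brevity.

```typescript
// Minimal structural stand-in for a DOM node (hypothetical shape).
// A real SDK would walk actual Element and Text nodes instead.
interface SourceNode {
  tag?: string; // present for elements
  attributes?: Record<string, string>;
  text?: string; // present for text nodes
  children?: SourceNode[];
}

interface SnapshotNode {
  id: number;
  tag?: string;
  attributes?: Record<string, string>;
  text?: string;
  children: SnapshotNode[];
}

// Depth-first walk. IDs are assigned in pre-order, so a parent always
// receives a smaller ID than any of its descendants.
function serialiseTree(root: SourceNode): SnapshotNode {
  let nextId = 1;
  const visit = (node: SourceNode): SnapshotNode => {
    const out: SnapshotNode = { id: nextId++, children: [] };
    if (node.tag !== undefined) out.tag = node.tag;
    if (node.attributes !== undefined) out.attributes = { ...node.attributes };
    if (node.text !== undefined) out.text = node.text;
    for (const child of node.children ?? []) out.children.push(visit(child));
    return out;
  };
  return visit(root);
}

const snapshot = serialiseTree({
  tag: "div",
  attributes: { class: "card" },
  children: [{ tag: "p", children: [{ text: "Hello" }] }],
});

// The JSON string is what would be handed to the transport layer.
const payload = JSON.stringify(snapshot);
```

The numeric IDs matter more than they look: later mutation and input events can refer to a node by ID instead of repeating its full description.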

Step 2 — Continuous mutation capture

The browser's MutationObserver API fires whenever the DOM changes. The SDK subscribes with { childList: true, attributes: true, characterData: true, subtree: true } so it sees every node add/remove, attribute change, and text update. Each mutation is encoded as a delta against the prior state (added node, removed node, attribute set, text replaced) and timestamped.
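One way to picture the delta encoding is the sketch below. The `Delta` format, the `RecordLike` shape, and `encodeMutation` are hypothetical names invented for this example; `RecordLike` is a simplified view of a browser MutationRecord in which nodes have already been mapped to their snapshot IDs. In a browser, the options object would be passed to `MutationObserver.observe()`.

```typescript
// The observer options quoted above, as a plain object.
const observerOptions = {
  childList: true,
  attributes: true,
  characterData: true,
  subtree: true,
};

// Hypothetical delta format; real SDKs define their own wire schema.
type Delta =
  | { kind: "add"; parentId: number; nodeId: number; ts: number }
  | { kind: "remove"; parentId: number; nodeId: number; ts: number }
  | { kind: "attr"; nodeId: number; name: string; value: string | null; ts: number }
  | { kind: "text"; nodeId: number; value: string; ts: number };

// Simplified MutationRecord with nodes already resolved to numeric IDs.
interface RecordLike {
  type: "childList" | "attributes" | "characterData";
  targetId: number;
  addedIds?: number[];
  removedIds?: number[];
  attributeName?: string;
  newValue?: string | null;
}

// Turns one record into zero or more timestamped deltas.
function encodeMutation(rec: RecordLike, ts: number): Delta[] {
  const parentId = rec.targetId;
  switch (rec.type) {
    case "childList": {
      const removed = (rec.removedIds ?? []).map(
        (nodeId): Delta => ({ kind: "remove", parentId, nodeId, ts })
      );
      const added = (rec.addedIds ?? []).map(
        (nodeId): Delta => ({ kind: "add", parentId, nodeId, ts })
      );
      return removed.concat(added);
    }
    case "attributes":
      return [{ kind: "attr", nodeId: rec.targetId, name: rec.attributeName ?? "", value: rec.newValue ?? null, ts }];
    case "characterData":
      return [{ kind: "text", nodeId: rec.targetId, value: rec.newValue ?? "", ts }];
  }
}

// A child swap under node 1: node 3 is removed and node 4 is added.
const deltas = encodeMutation(
  { type: "childList", targetId: 1, addedIds: [4], removedIds: [3] },
  1200
);
```

Emitting removals before additions is a choice made for this sketch so that a replayer never briefly holds both the old and the new child.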

Step 3 — Input + behavioural events

In parallel, the SDK hooks into the input and pointer event stream:

Mouse: mousemove (downsampled to ~20Hz), click, scroll, hover (mouseover/mouseout).
Keyboard: keydown, keyup, input — masked for PII fields.
Touch: PointerEvent with pressure + tilt, pinch-zoom via visualViewport.
Focus: focus, blur, focusin, focusout.
Viewport: resize, scroll, visibilitychange.
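The mousemove downsampling in the list above can be sketched as a simple time-based filter. `PointerSample` and `makeDownsampler` are illustrative names, and the 50 ms interval is simply the reciprocal of the ~20Hz figure; a real SDK may use a different strategy, such as keeping the last sample in each window rather than the first.

```typescript
interface PointerSample {
  x: number;
  y: number;
  ts: number; // timestamp in milliseconds
}

// Returns a predicate that keeps at most one sample per `intervalMs`.
function makeDownsampler(intervalMs: number) {
  let lastKept = -Infinity;
  return (sample: PointerSample): boolean => {
    if (sample.ts - lastKept < intervalMs) return false;
    lastKept = sample.ts;
    return true;
  };
}

// Simulated mousemove stream: one event every 10 ms, from 0 to 190 ms.
const moves: PointerSample[] = [];
for (let ts = 0; ts < 200; ts += 10) moves.push({ x: ts, y: ts, ts });

// 50 ms between kept samples corresponds to roughly 20 Hz.
const keep = makeDownsampler(50);
const recorded = moves.filter(keep);
// Of the 20 simulated events, the samples at 0, 50, 100 and 150 ms are kept.
```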

Step 4 — Network + console capture

The SDK proxies fetch and XMLHttpRequest, listens for WebSocket frames, hooks EventSource for Server-Sent Events, and intercepts sendBeacon. Each request gets a timestamp, method, URL, headers (with sensitive ones redacted), and request/response body (size-capped + PII-scrubbed). Console output is captured by monkey-patching console.log/info/warn/error/debug and serialising the args (depth-limited to avoid circular-reference issues).
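The two scrubbing ideas in this step, header redaction and depth-limited serialisation of console arguments, might look roughly like the sketch below. The function names, the `[REDACTED]` and `[Truncated]` markers, and the default depth of 3 are all assumptions for illustration. The depth limit doubles as the guard against circular references: recursion always stops after `maxDepth` levels, so no visited-object tracking is needed.

```typescript
// Header names treated as sensitive in this sketch (compared case-insensitively).
const SENSITIVE_HEADERS = ["authorization", "cookie", "x-api-key"];

function redactHeaders(headers: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const name of Object.keys(headers)) {
    const sensitive = SENSITIVE_HEADERS.indexOf(name.toLowerCase()) !== -1;
    out[name] = sensitive ? "[REDACTED]" : headers[name];
  }
  return out;
}

// Depth-limited copy of a console argument. Anything nested deeper than
// `maxDepth` is replaced by a marker string.
function serialiseArg(value: unknown, maxDepth = 3): unknown {
  if (value === null || typeof value !== "object") {
    return typeof value === "function" ? "[Function]" : value;
  }
  if (maxDepth === 0) return "[Truncated]";
  if (Array.isArray(value)) {
    return value.map((item) => serialiseArg(item, maxDepth - 1));
  }
  const source = value as Record<string, unknown>;
  const out: Record<string, unknown> = {};
  for (const key of Object.keys(source)) {
    out[key] = serialiseArg(source[key], maxDepth - 1);
  }
  return out;
}

// An object that points at itself would make a bare JSON.stringify throw.
const cyclic: Record<string, unknown> = { name: "root" };
cyclic["self"] = cyclic;
const safe = JSON.stringify(serialiseArg(cyclic, 2));

const headers = redactHeaders({ Authorization: "Bearer abc", Accept: "text/html" });
```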

Step 5 — PII masking at capture time

Before any bytes leave the browser, the SDK runs the captured payload through a PII filter:

Password fields (input[type=password]) always masked.
Regex for known PII patterns: email, phone, SSN.
Luhn validation for credit-card numbers.
Sensitive headers redacted (Authorization, Cookie, x-api-key).
Optional: an on-device named-entity-recognition model (GLiNER on ONNX/WebGPU when the Relyv extension is installed) for free-form PII like names embedded in support-chat text.
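The regex and Luhn checks from the list above can be combined as in the following sketch. The two patterns and the `[EMAIL]` / `[CARD]` replacement tokens are simplified assumptions for illustration; a production filter would need broader patterns (phone, SSN) and more careful handling of locale-specific formats.

```typescript
// Luhn checksum: starting from the rightmost digit, double every second
// digit, subtract 9 from any result above 9, and require the total to be
// divisible by 10.
function passesLuhn(digits: string): boolean {
  let sum = 0;
  let doubleNext = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // 48 is the char code of "0"
    if (d < 0 || d > 9) return false;
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return digits.length > 0 && sum % 10 === 0;
}

// Deliberately simple patterns, for illustration only.
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
// 13 to 19 digits, optionally separated by single spaces or hyphens.
const CARD_CANDIDATE = /\b\d(?:[ -]?\d){12,18}\b/g;

function maskPii(text: string): string {
  return text
    .replace(EMAIL, "[EMAIL]")
    .replace(CARD_CANDIDATE, (match) =>
      // Only mask digit runs that pass the checksum, so order numbers
      // and similar identifiers are usually left alone.
      passesLuhn(match.replace(/[ -]/g, "")) ? "[CARD]" : match
    );
}

const masked = maskPii("mail ada@example.com card 4111 1111 1111 1111");
```

Running the Luhn check after the regex match is what keeps false positives down: a 16-digit order ID matches the pattern but only about one in ten random digit strings passes the checksum.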

Step 6 — Compression + transport

The event stream is buffered, deflated with CompressionStream (≈65% reduction), and shipped to the backend over an authenticated channel. Most tools batch events with a 50ms flush interval to amortise the network overhead.
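The buffering half of this step can be sketched as a small batcher that flushes when the interval has elapsed or the batch grows too large. `EventBatcher` and its parameters are hypothetical, and the clock is passed in explicitly so the logic can be exercised without timers. Compression is left out of the code; in a browser, each flushed batch could be serialised and piped through `new CompressionStream("deflate")` before upload.

```typescript
// Hypothetical in-memory batcher. `send` receives each completed batch.
class EventBatcher<T> {
  private buffer: T[] = [];
  private lastFlush: number;
  private send: (batch: T[]) => void;
  private flushIntervalMs: number;
  private maxBatchSize: number;

  constructor(
    send: (batch: T[]) => void,
    flushIntervalMs: number,
    maxBatchSize: number,
    startTime = 0
  ) {
    this.send = send;
    this.flushIntervalMs = flushIntervalMs;
    this.maxBatchSize = maxBatchSize;
    this.lastFlush = startTime;
  }

  // `now` is supplied by the caller rather than read from a clock.
  push(event: T, now: number): void {
    this.buffer.push(event);
    const due = now - this.lastFlush >= this.flushIntervalMs;
    if (due || this.buffer.length >= this.maxBatchSize) this.flush(now);
  }

  flush(now: number): void {
    if (this.buffer.length === 0) return;
    this.send(this.buffer);
    this.buffer = [];
    this.lastFlush = now;
  }
}

const sent: number[][] = [];
const batcher = new EventBatcher<number>((batch) => { sent.push(batch); }, 50, 100);

batcher.push(1, 0);  // buffered
batcher.push(2, 20); // buffered
batcher.push(3, 60); // 60 ms since last flush: sends [1, 2, 3]
batcher.push(4, 70); // buffered
batcher.flush(80);   // explicit flush, e.g. on page hide: sends [4]
```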

Step 7 — Reconstruction in the viewer

The viewer reads the initial snapshot, builds the DOM inside a sandboxed iframe, and then replays the mutation/event timeline with its original timing (with optional 0.5x to 8x speed control). The result is a reconstruction of the actual page rather than a recording of it: scripts are stripped (so they don't re-run side-effects), but the DOM is fully inspectable. You can open browser dev tools inside the replay, hover elements, see network calls in the integrated waterfall, and replay console errors.
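The timing side of replay reduces to a small calculation: each event is scheduled at its offset from the first event, divided by the playback speed. The sketch below shows only that calculation; `TimedEvent` and `playbackDelays` are illustrative names, and a real viewer would also apply each delta to the rebuilt DOM when its delay elapses.

```typescript
interface TimedEvent {
  ts: number; // capture timestamp in milliseconds
}

// Converts capture timestamps into playback delays relative to the first
// event, scaled by speed (0.5 = half speed, 8 = eight times faster).
function playbackDelays(events: TimedEvent[], speed: number): number[] {
  if (speed <= 0) throw new Error("speed must be positive");
  if (events.length === 0) return [];
  const start = events[0].ts;
  return events.map((e) => (e.ts - start) / speed);
}

const timeline: TimedEvent[] = [{ ts: 1000 }, { ts: 1500 }, { ts: 3000 }];

const normal = playbackDelays(timeline, 1);   // [0, 500, 2000]
const fast = playbackDelays(timeline, 2);     // [0, 250, 1000]
const slow = playbackDelays(timeline, 0.5);   // [0, 1000, 4000]
```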

Why this beats screen video

A screen video is opaque pixels: you cannot ask it questions. A reconstructed DOM is structured data: you can search it, diff it, inspect its computed styles, replay its network calls, and feed it to AI for analysis. Debugging, test generation, AI summarisation, and cross-session diffing all work on structured data; a video would first have to be run through OCR to participate, and even then the network and console context would be missing.

Frequently asked questions

Does the replay re-execute the page's JavaScript?

No. Scripts are stripped at capture time so they don't fire side-effects on replay (analytics pings, API calls). The DOM mutations the original scripts made are replayed instead, so the visible result is identical without re-running the logic.

How accurate is the reconstruction?

Pixel-accurate in the common case. Edge cases that can drift: canvas/WebGL content (some tools sample frames, others render placeholders), animated GIFs (timing may differ), and CSS-in-JS libraries that generate class names at runtime (the SDK must capture the generated style rules themselves, including rules inserted programmatically into a stylesheet, not just the class names that refer to them).

What about iframes and shadow DOM?

Both are supported by mature SDKs. Same-origin iframes are recorded as nested snapshots; cross-origin iframes appear as opaque placeholders for security reasons. Shadow DOM is traversed via the open shadow-root API; closed shadow roots can't be captured (browser-enforced).

How do CSS animations replay?

CSS animations replay at their natural duration because the stylesheet is captured along with the DOM. JS-driven animations (e.g., GSAP, Framer Motion) replay because the DOM mutations they produced are in the event stream.

What happens to fetch/XHR responses?

Captured request and response payloads are shown in the integrated network waterfall. The replay does not actually re-fetch — it shows you what the original session received.

Can I export a session for offline analysis?

Yes, on tools that support it. Relyv exports every session as a single self-contained .html file with the viewer embedded inline — replay works in any browser with no internet connection. Other tools keep replays behind their own UI.

How big is a typical session?

After compression, 200KB-2MB for a typical 5-minute session. Heavy DOM mutation (dashboards with constant updates) or large network responses can push that higher; tools usually cap response-body capture at 100-500KB per request.

Why don't all session-replay tools use DOM replay then?

Video-based replay is simpler to engineer (you just need MediaRecorder or canvas sampling) and easier to integrate (no DOM-serialisation edge cases). DOM replay is more useful but harder to build, which is why the older free/cheap tools tend to be video-based and the engineering-grade tools are DOM-based.
