Step 1 — The initial snapshot
When the SDK loads, it serialises the current DOM tree into a compact JSON representation. Every element becomes a node with its tag, attributes, computed styles (only the inheritable ones — the rest is reconstructed from stylesheets), text content, and a unique numeric ID. The result is typically 30-200KB for a normal page, gzipped down further by the transport layer.
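As a rough illustration of the snapshot walk, the sketch below assigns pre-order numeric IDs to a tree. It assumes a simplified input shape (`SourceNode`, hypothetical) in place of real DOM elements, and the `SnapshotNode` field names are illustrative rather than the SDK's actual wire format. Computed styles are left out for brevity.

```typescript
// Hypothetical, simplified stand-in for a DOM element or text node.
interface SourceNode {
  tag?: string; // absent for text nodes
  attributes?: Record<string, string>;
  text?: string; // set for text nodes
  children?: SourceNode[];
}

// Illustrative snapshot shape: one entry per node, each with a unique numeric ID.
interface SnapshotNode {
  id: number;
  tag: string | null;
  attributes: Record<string, string>;
  text: string | null;
  children: SnapshotNode[];
}

function createSnapshot(root: SourceNode): { tree: SnapshotNode; nodeCount: number } {
  let nextId = 1;
  const visit = (node: SourceNode): SnapshotNode => {
    const id = nextId++; // parent gets its ID before its children (pre-order)
    return {
      id,
      tag: node.tag ?? null,
      attributes: { ...(node.attributes ?? {}) },
      text: node.text ?? null,
      children: (node.children ?? []).map((child) => visit(child)),
    };
  };
  const tree = visit(root);
  return { tree, nodeCount: nextId - 1 };
}
```

The numeric IDs matter later: mutation deltas can refer to nodes by ID instead of re-sending them.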
Step 2 — Continuous mutation capture
The browser's MutationObserver API fires whenever the DOM changes. The SDK subscribes with { childList: true, attributes: true, characterData: true, subtree: true } so it sees every node add/remove, attribute change, and text update. Each mutation is encoded as a delta against the prior state (added node, removed node, attribute set, text replaced) and timestamped.
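The delta encoding described above might look roughly like the following. This is a sketch under assumptions: `RecordLike` is a hypothetical, ID-based stand-in for the browser's `MutationRecord` (which carries node references, not IDs), and the `Delta` shape is illustrative.

```typescript
// Observer options quoted from the text above. In a browser they would be passed as:
//   new MutationObserver(callback).observe(document, OBSERVER_OPTIONS)
const OBSERVER_OPTIONS = { childList: true, attributes: true, characterData: true, subtree: true };

// Illustrative delta shapes, one per kind of change named in the text.
type Delta =
  | { kind: "add"; t: number; parentId: number; nodeId: number }
  | { kind: "remove"; t: number; parentId: number; nodeId: number }
  | { kind: "attr"; t: number; nodeId: number; name: string; value: string | null }
  | { kind: "text"; t: number; nodeId: number; value: string };

// Hypothetical stand-in for MutationRecord, with nodes already mapped to snapshot IDs.
interface RecordLike {
  type: "childList" | "attributes" | "characterData";
  targetId: number;
  addedIds?: number[];
  removedIds?: number[];
  attributeName?: string;
  newValue?: string | null;
}

function encodeMutations(records: RecordLike[], t: number): Delta[] {
  const deltas: Delta[] = [];
  for (const r of records) {
    if (r.type === "childList") {
      // Removals are emitted before additions so a replayer never sees duplicates.
      for (const nodeId of r.removedIds ?? []) {
        deltas.push({ kind: "remove", t, parentId: r.targetId, nodeId });
      }
      for (const nodeId of r.addedIds ?? []) {
        deltas.push({ kind: "add", t, parentId: r.targetId, nodeId });
      }
    } else if (r.type === "attributes") {
      if (r.attributeName !== undefined) {
        // A null value stands for "attribute removed".
        deltas.push({ kind: "attr", t, nodeId: r.targetId, name: r.attributeName, value: r.newValue ?? null });
      }
    } else {
      deltas.push({ kind: "text", t, nodeId: r.targetId, value: r.newValue ?? "" });
    }
  }
  return deltas;
}
```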
Step 3 — Input + behavioural events
In parallel, the SDK hooks into the input and pointer event stream:
- <strong>Mouse:</strong> mousemove (downsampled to ~20Hz), click, scroll, and hover (mouseover/mouseout).
- <strong>Keyboard:</strong> keydown, keyup, input, with values masked in PII fields.
- <strong>Touch:</strong> PointerEvent with pressure + tilt, pinch-zoom via visualViewport.
- <strong>Focus:</strong> focus, blur, focusin, focusout.
- <strong>Viewport:</strong> resize, scroll, visibilitychange.
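The ~20Hz mousemove downsampling mentioned in the list above can be sketched as a simple time-gap filter. The `PointerSample` shape and the `createDownsampler` name are assumptions for illustration; an actual SDK may use a different strategy, such as keeping the last sample per window.

```typescript
// Illustrative sample shape: capture timestamp in ms plus pointer coordinates.
interface PointerSample {
  t: number;
  x: number;
  y: number;
}

// Returns a predicate that keeps a sample only if at least 1000/hz ms
// have passed since the previously kept sample.
function createDownsampler(hz: number): (sample: PointerSample) => boolean {
  const minGapMs = 1000 / hz;
  let lastKeptAt = -Infinity;
  return (sample: PointerSample): boolean => {
    if (sample.t - lastKeptAt < minGapMs) {
      return false; // too soon after the last kept sample: drop
    }
    lastKeptAt = sample.t;
    return true;
  };
}
```

In a browser, the predicate would be called from a mousemove listener with `event.timeStamp` as `t`, and only samples that return `true` would be recorded.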
Step 4 — Network + console capture
The SDK proxies fetch and XMLHttpRequest, listens for WebSocket frames, hooks EventSource for Server-Sent Events, and intercepts sendBeacon. Each request gets a timestamp, method, URL, headers (with sensitive ones redacted), and request/response body (size-capped + PII-scrubbed). Console output is captured by monkey-patching console.log/info/warn/error/debug and serialising the args (depth-limited to avoid circular-reference issues).
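A minimal sketch of the depth-limited console capture follows. The names `serialiseArg` and `patchLogger` are hypothetical, as are the placeholder strings. The patch is applied to a console-like object passed in by the caller, which keeps the sketch independent of any particular runtime.

```typescript
// Converts an arbitrary value into a JSON-safe structure.
// Objects nested deeper than maxDepth are replaced with a placeholder, and an
// object that appears among its own ancestors is replaced with "[Circular]".
function serialiseArg(value: unknown, maxDepth: number = 3, ancestors: object[] = []): unknown {
  if (typeof value === "function") return "[Function]";
  if (typeof value === "symbol") return String(value);
  if (value === null || typeof value !== "object") return value;
  if (ancestors.indexOf(value) !== -1) return "[Circular]";
  if (ancestors.length >= maxDepth) return Array.isArray(value) ? "[Array]" : "[Object]";
  const path = ancestors.concat([value]);
  if (Array.isArray(value)) {
    return value.map((item) => serialiseArg(item, maxDepth, path));
  }
  const out: Record<string, unknown> = {};
  for (const key of Object.keys(value)) {
    out[key] = serialiseArg((value as Record<string, unknown>)[key], maxDepth, path);
  }
  return out;
}

type LogFn = (...args: unknown[]) => void;

// Wraps one method on a console-like object: records serialised args,
// then forwards the call to the original method unchanged.
function patchLogger(
  target: Record<string, LogFn>,
  level: string,
  sink: (entry: { level: string; args: unknown[] }) => void,
): void {
  const original = target[level];
  target[level] = (...args: unknown[]): void => {
    sink({ level, args: args.map((arg) => serialiseArg(arg)) });
    original.apply(target, args);
  };
}
```

Forwarding to the original method keeps the page's own console output intact, so the capture stays invisible to the developer using dev tools.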
Step 5 — PII masking at capture time
Before any bytes leave the browser, the SDK runs the captured payload through a PII filter:
- Password fields (<code>input[type=password]</code>) always masked.
- Regex for known PII patterns: email, phone, SSN.
- Luhn validation for credit-card numbers.
- Sensitive headers redacted (Authorization, Cookie, x-api-key).
- Optional: an on-device NER model (GLiNER on ONNX/WebGPU when the Relyv extension is installed) for free-form PII like names embedded in support-chat text.
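The deterministic parts of the filter listed above (regex patterns, Luhn validation, header redaction) can be sketched as follows. The regular expressions are deliberately simple assumptions, not the SDK's actual patterns. Phone matching is omitted because formats vary widely by region, and the replacement tokens are illustrative.

```typescript
// Luhn checksum over a string of 13 to 19 digits (typical card-number lengths).
function passesLuhn(digits: string): boolean {
  if (!/^\d{13,19}$/.test(digits)) return false;
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    // Walk from the rightmost digit; double every second digit.
    let d = digits.charCodeAt(digits.length - 1 - i) - 48;
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}

// Simplified, illustrative patterns.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const SSN_PATTERN = /\b\d{3}-\d{2}-\d{4}\b/g;
const CARD_CANDIDATE = /\b\d(?:[ -]?\d){12,18}\b/g;

function maskText(text: string): string {
  return text
    .replace(EMAIL_PATTERN, "[email]")
    .replace(SSN_PATTERN, "[ssn]")
    // Only digit runs that pass the Luhn check are treated as card numbers.
    .replace(CARD_CANDIDATE, (match) => (passesLuhn(match.replace(/[ -]/g, "")) ? "[card]" : match));
}

// Header names from the list above, compared case-insensitively.
const SENSITIVE_HEADERS = ["authorization", "cookie", "x-api-key"];

function redactHeaders(headers: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const name of Object.keys(headers)) {
    out[name] = SENSITIVE_HEADERS.indexOf(name.toLowerCase()) !== -1 ? "[redacted]" : headers[name];
  }
  return out;
}
```

The Luhn check is what keeps order numbers and other long digit strings from being masked as cards: a pattern match alone only nominates a candidate.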
Step 6 — Compression + transport
The event stream is buffered, deflated with CompressionStream (≈65% reduction), and shipped to the backend over an authenticated channel. Most tools batch events with a 50ms flush interval to amortise the network overhead.
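The batching step can be sketched as a small buffer that flushes once the current window is older than the flush interval. `EventBatcher` is a hypothetical name, and this version takes timestamps explicitly so its behaviour is deterministic. A real SDK would more likely drive `flush()` from a timer and compress each batch (for example with `CompressionStream`) before sending.

```typescript
class EventBatcher<T> {
  private buffer: T[] = [];
  private windowStart: number | null = null;
  private readonly flushIntervalMs: number;
  private readonly send: (batch: T[]) => void;

  constructor(flushIntervalMs: number, send: (batch: T[]) => void) {
    this.flushIntervalMs = flushIntervalMs;
    this.send = send;
  }

  // Adds an event; if the current window has been open for at least
  // flushIntervalMs, the buffered events are sent first.
  push(event: T, now: number): void {
    if (this.windowStart !== null && now - this.windowStart >= this.flushIntervalMs) {
      this.flush();
    }
    if (this.windowStart === null) {
      this.windowStart = now;
    }
    this.buffer.push(event);
  }

  // Sends whatever is buffered (if anything) and starts a fresh window.
  flush(): void {
    if (this.buffer.length > 0) {
      this.send(this.buffer);
    }
    this.buffer = [];
    this.windowStart = null;
  }
}
```

A final `flush()` when the page is hidden or unloaded would avoid losing the tail of the session.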
Step 7 — Reconstruction in the viewer
The viewer reads the initial snapshot, builds the DOM inside a sandboxed iframe, and then replays the mutation/event timeline with its original timing (optionally at 0.5x to 8x speed). The result is the actual page: scripts are stripped (so they don't re-run side effects), but the DOM is fully inspectable. You can open browser dev tools inside the replay, hover over elements, see network calls in the integrated waterfall, and replay console errors.
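The timing side of replay reduces to simple arithmetic: each event's offset from the first event, divided by the playback speed. A minimal sketch, assuming events carry a capture timestamp `t` in milliseconds (the `TimedEvent` and `replayDelays` names are hypothetical):

```typescript
// Illustrative event shape: only the capture timestamp matters for scheduling.
interface TimedEvent {
  t: number; // ms since some fixed origin at capture time
}

// Returns, for each event, the delay in ms from the start of playback
// at which it should be applied.
function replayDelays(events: TimedEvent[], speed: number): number[] {
  if (speed < 0.5 || speed > 8) {
    throw new RangeError("speed must be between 0.5 and 8");
  }
  if (events.length === 0) return [];
  const start = events[0].t;
  return events.map((event) => (event.t - start) / speed);
}
```

For example, events captured at 1000, 1500, and 3000 ms replay at offsets 0, 250, and 1000 ms when the speed is 2. A viewer would then apply each delta to the iframe's DOM at the computed offset.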
Why this beats screen video
A screen video is opaque pixels: you cannot ask it questions. A reconstructed DOM is structured data: you can search it, diff it, inspect its computed styles, replay its network calls, and feed it to AI for analysis. Every modern engineering workflow (debugging, test generation, AI summarisation, cross-session diff) needs structured data; a video would first have to be run through OCR to participate.