
How PII masking works in session replay.

Compliant session replay needs personal data masked <em>before</em> it leaves the browser. The mechanics: regex on standard patterns, Luhn validation on payment data, attribute selectors for custom fields, and optional on-device ML for free text. Here's the full stack.

TL;DR
On-device PII masking redacts personal data in the browser before the SDK transmits anything. Four mechanisms stack: regex for standard patterns (email, phone, SSN), Luhn validation for credit cards (to avoid false positives), attribute selectors (data-relyv-mask) for custom fields, and optional on-device ML (a small named-entity recognition model such as GLiNER) for free-text PII. Server-side masking technically violates GDPR Art. 5(1)(c) data minimisation, because the unredacted data has already been transmitted to the vendor.

The one-sentence answer

On-device PII masking is the practice of redacting personal data (names, emails, phone numbers, payment data, free text) inside the user's browser before the session-replay SDK transmits anything to the vendor — using regex, validation rules, CSS/attribute selectors, and optionally on-device machine learning. Server-side masking — redacting after the data arrives at the vendor — is not equivalent and does not satisfy GDPR Article 5(1)(c) data minimisation.

Why server-side masking is not compliant

Server-side masking means the unredacted PII was transmitted to the vendor, processed by their infrastructure (even briefly), and only then redacted. Under GDPR Article 5(1)(c), processing must be "adequate, relevant and limited to what is necessary." The moment unredacted PII reaches the vendor, the controller has processed it — even if it's deleted within seconds. The ICO and CNIL both treat this as a data-minimisation violation. Vendors that claim "we mask everything server-side" are putting their customers at legal risk; the customer (data controller) is the party that gets fined.

The four-layer on-device masking stack

A compliant replay SDK stacks four mechanisms in order of specificity:

1. Regex on standard patterns

The first layer catches the obvious: email addresses (a practical subset of the RFC 5322 grammar), phone numbers (E.164 plus common national formats), national ID numbers (US Social Security numbers and their UK and EU equivalents), and free-text patterns that look like credentials (Bearer tokens, AWS access keys, GitHub tokens). The SDK runs these patterns against every text node and attribute value before serialization. The main tradeoff is the false-positive rate: an overly broad regex catches things that aren't PII, such as a 16-digit order number that happens to look like a credit card.
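
A minimal TypeScript sketch of this first layer follows. The rule list, the replacement strings, and the maskText name are illustrative assumptions, not Relyv's implementation; the phone rule accepts only strict E.164 (no spaces or dashes), and national formats and non-US ID numbers would need their own patterns.

```typescript
// Layer 1 sketch: regex masking applied to a string before serialization.
// Patterns are deliberately simple and illustrative, not a production rule set.
interface MaskRule {
  label: string;
  pattern: RegExp; // must carry the "g" flag so every occurrence is replaced
  replacement: string;
}

const REGEX_RULES: MaskRule[] = [
  // Practical email pattern: a simplification of the RFC 5322 grammar.
  { label: "email", pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, replacement: "***@***" },
  // Strict E.164: "+", a non-zero first digit, then 7 to 14 more digits, no separators.
  { label: "phone-e164", pattern: /\+[1-9]\d{7,14}\b/g, replacement: "+***" },
  // US Social Security number in the common 123-45-6789 layout.
  { label: "us-ssn", pattern: /\b\d{3}-\d{2}-\d{4}\b/g, replacement: "***-**-****" },
  // AWS access key IDs: "AKIA" (long-term) or "ASIA" (temporary) plus 16 characters.
  { label: "aws-key", pattern: /\b(?:AKIA|ASIA)[0-9A-Z]{16}\b/g, replacement: "***" },
  // Classic-format GitHub tokens: ghp_/gho_/ghu_/ghs_/ghr_ plus 36 characters.
  { label: "github-token", pattern: /\bgh[pousr]_[A-Za-z0-9]{36}\b/g, replacement: "***" },
  // Bearer credentials: keep the scheme, drop the token.
  { label: "bearer", pattern: /Bearer\s+[A-Za-z0-9\-._~+\/]+=*/g, replacement: "Bearer ***" },
];

function maskText(input: string): string {
  // String.prototype.replace resets lastIndex for global regexes, so the
  // shared RegExp objects are safe to reuse across calls.
  return REGEX_RULES.reduce((text, rule) => text.replace(rule.pattern, rule.replacement), input);
}
```

Running every rule over every text node is what makes the false-positive rate matter: each extra pattern is another chance to mangle a harmless value, which is why the next layer adds a checksum gate for payment data.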

2. Luhn validation for payment data

Credit-card numbers carry a built-in checksum (the Luhn algorithm). The SDK runs candidate strings of 13 to 19 digits through Luhn before masking: only strings that pass are treated as card numbers and redacted. Because only about one in ten random digit strings passes the checksum, this cuts false positives sharply compared with a naive regex that would mask every 16-digit number. The same approach works for IBANs (mod-97 check), CPF (the Brazilian taxpayer ID, with mod-11 check digits), and a handful of other formats that carry check digits.
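
The Luhn gate and the IBAN mod-97 check can be sketched in TypeScript as below. The function names, the candidate regex (13 to 19 digits with optional single spaces or dashes), and the keep-last-four output are illustrative choices, not a specific SDK's API.

```typescript
// Luhn checksum: from the rightmost digit, double every second digit,
// subtract 9 from any doubled value above 9, and require sum % 10 === 0.
function passesLuhn(digits: string): boolean {
  let sum = 0;
  let doubleIt = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // "0" is char code 48
    if (doubleIt) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleIt = !doubleIt;
  }
  return sum % 10 === 0;
}

// Candidates: 13 to 19 digits, optionally separated by single spaces or dashes.
const CARD_CANDIDATE = /\b\d(?:[ -]?\d){12,18}\b/g;

function maskCardNumbers(text: string): string {
  return text.replace(CARD_CANDIDATE, (match) => {
    const digits = match.replace(/[ -]/g, "");
    // A failed checksum means "probably not a card" (an order number, say): leave it.
    if (!passesLuhn(digits)) return match;
    return "**** **** **** " + digits.slice(-4);
  });
}

// IBAN check (ISO 7064 MOD 97-10): move the first four characters to the end,
// map letters A-Z to 10-35, and require the resulting number mod 97 to be 1.
function passesIbanCheck(iban: string): boolean {
  const compact = iban.replace(/\s+/g, "").toUpperCase();
  if (!/^[A-Z]{2}\d{2}[A-Z0-9]{11,30}$/.test(compact)) return false;
  const rearranged = compact.slice(4) + compact.slice(0, 4);
  let remainder = 0;
  for (let i = 0; i < rearranged.length; i++) {
    const code = rearranged.charCodeAt(i);
    // "A" is char code 65, so code - 55 maps A..Z to 10..35.
    const value = code >= 65 ? code - 55 : code - 48;
    // Letters expand to two decimal digits, so shift by 100 instead of 10.
    remainder = (remainder * (value > 9 ? 100 : 10) + value) % 97;
  }
  return remainder === 1;
}
```

For example, maskCardNumbers turns "4111 1111 1111 1111" (a well-known Luhn-valid test number) into "**** **** **** 1111" but leaves "1234 5678 9012 3456" untouched, because that string fails the checksum. Computing the IBAN remainder digit by digit avoids building a 30-plus-digit integer.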

3. Attribute and selector-based masking

For fields the regex can't identify by content alone (user-provided names, custom IDs, internal-only fields), the SDK supports explicit declarations:

<input data-relyv-mask placeholder="Internal customer ID">
<div data-relyv-mask>{{ customer.fullName }}</div>

Or a selector list passed to init():

init({
  apiKey: '...',
  mask: ['.pii', '[data-private]', 'input[name="ssn"]'],
});

Engineers add the markers at the source code level — once, in the component template, never duplicated per page.
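
One way the selector check could work is sketched below, assuming masking is inherited by descendants (so a marked container covers everything inside it). The ClosestCapable type and the helper names are hypothetical; the only real API relied on is Element.closest(), which accepts a comma-separated selector list and tests the element itself before its ancestors.

```typescript
// Structural type satisfied by a real DOM Element: Element.closest() takes a
// comma-separated selector list and tests the element itself, then its ancestors.
interface ClosestCapable {
  closest(selectors: string): unknown;
}

// Marker attribute from the article, always active.
const BUILT_IN_MASK_SELECTOR = "[data-relyv-mask]";

// Combine the built-in marker with the init({ mask: [...] }) list once, at
// startup, so each capture pays for a single closest() call.
function buildMaskSelector(userSelectors: string[]): string {
  return [BUILT_IN_MASK_SELECTOR, ...userSelectors].join(", ");
}

// True when the element or any ancestor matches, so one marker on a container
// covers every descendant. For a text node, pass its parentElement.
function isMaskedBySelector(el: ClosestCapable, combinedSelector: string): boolean {
  return el.closest(combinedSelector) != null;
}
```

One design consequence of combining selectors: a single malformed entry makes closest() throw a SyntaxError for the whole list, so a real SDK would likely validate each user selector at init.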

4. Optional on-device ML model for free text

Free-text inputs (comments, support messages, search queries) can contain PII that regex won't catch, such as a user typing "my name is Sarah and I live at 123 Main St". A small on-device named-entity recognition (NER) model such as GLiNER (~80 MB) can run in the browser via ONNX Runtime Web with its WebGPU backend and flag entities for masking. The trade-off: an 80 MB model download is too heavy for most sites, so this layer is usually enabled only via a browser extension or on specific high-risk pages (support forms, checkout review).
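
The model itself is out of scope here, but the step after inference can be sketched: turning entity spans into a redacted string. The EntitySpan shape (character offsets plus a label) and the EntityDetector type are assumptions standing in for whatever an ONNX Runtime Web session actually returns; redactSpans and maskFreeText are hypothetical names.

```typescript
// Assumed model output: character offsets into the text plus an entity label.
interface EntitySpan {
  start: number; // inclusive character offset
  end: number; // exclusive character offset
  label: string; // e.g. "person", "address"
}

// Hypothetical detector signature; wrap the real in-browser model call in this shape.
type EntityDetector = (text: string) => Promise<EntitySpan[]>;

// Replace each detected span with a label placeholder. Spans are sorted and
// overlapping spans merged first, so offsets stay valid while rebuilding.
function redactSpans(text: string, spans: EntitySpan[]): string {
  const sorted = [...spans].sort((a, b) => a.start - b.start);
  const merged: EntitySpan[] = [];
  for (const span of sorted) {
    const last = merged[merged.length - 1];
    if (last && span.start < last.end) {
      last.end = Math.max(last.end, span.end); // keep the first span's label
    } else {
      merged.push({ ...span }); // copy so the caller's spans are not mutated
    }
  }
  let out = "";
  let cursor = 0;
  for (const span of merged) {
    out += text.slice(cursor, span.start) + `[${span.label.toUpperCase()}]`;
    cursor = span.end;
  }
  return out + text.slice(cursor);
}

async function maskFreeText(text: string, detect: EntityDetector): Promise<string> {
  return redactSpans(text, await detect(text));
}
```

Given person and address spans for the example sentence, the output would read "my name is [PERSON] and I live at [ADDRESS]". Using label placeholders rather than asterisks keeps the replay readable for support staff without revealing the entity.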

What good masking looks like in practice

A few patterns that distinguish well-built masking from naïve masking:

  • <strong>Mask before serialization, not after.</strong> The DOM snapshot the SDK takes should already have redacted values — not raw values that get masked in the upload payload.
  • <strong>Mask at multiple levels.</strong> Text nodes, attribute values, console.log arguments, network request bodies, network response bodies, query parameters. PII leaks through every channel.
  • <strong>Preserve structure, not content.</strong> A masked email shows as <code>***@***</code>, not <code>***</code>; a masked credit card shows as <code>**** **** **** 1234</code> instead of being redacted entirely. Debuggability is preserved without exposing the actual digits.
  • <strong>Mask before Web Worker handoff.</strong> If capture runs in a Web Worker, the data sent to the worker must already be masked — not masked inside the worker after transfer.
  • <strong>Provide an "inspect-the-payload" tool.</strong> Engineers should be able to record a session locally, capture the raw upload payload, and verify nothing leaked. Vendors that can't show you the raw payload are hiding something.
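  • <strong>Exclude sensitive elements by default.</strong> Password fields, hidden inputs, and fields whose autocomplete attribute marks them as sensitive (for example <code>cc-number</code> or <code>one-time-code</code>) are never captured at all, regardless of selectors.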
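
The structure-preserving and mask-before-serialization points can be sketched together. maskPreservingShape, snapshotField, and the PiiKind categories are hypothetical names; the output shapes follow the examples in the list above, and the generic text rule is an assumption that hides only ASCII letters and digits.

```typescript
type PiiKind = "email" | "card" | "text";

// Structure-preserving masks in the shapes described in the list above.
function maskPreservingShape(value: string, kind: PiiKind): string {
  switch (kind) {
    case "email":
      // Keep the "@" so the replay still reads as an email address.
      return "***@***";
    case "card": {
      // Keep the last four digits for debugging; drop the rest.
      const digits = value.replace(/\D/g, "");
      return "**** **** **** " + digits.slice(-4);
    }
    case "text":
      // Keep spaces and punctuation, hide ASCII letters and digits.
      // Note: this still reveals length, which may itself be sensitive.
      return value.replace(/[A-Za-z0-9]/g, "*");
  }
}

interface FieldSnapshot {
  id: string;
  value: string;
}

// Mask while building the snapshot object, so the raw value never reaches
// JSON.stringify, the upload buffer, or a Web Worker.
function snapshotField(id: string, raw: string, kind: PiiKind): string {
  const snapshot: FieldSnapshot = { id, value: maskPreservingShape(raw, kind) };
  return JSON.stringify(snapshot);
}
```

The ordering is the point: masking inside snapshotField means no code path ever holds a serialized copy of the raw value, which is what makes "mask before serialization" checkable.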

What gets missed (and how to mitigate)

Even with all four layers, three failure modes are common:

  • <strong>Images of PII.</strong> A user uploads an ID document or a screenshot containing personal data. The SDK can't parse pixels. Mitigation: don't capture <code>&lt;img src="..."&gt;</code> for user-uploaded content; substitute a placeholder.
  • <strong>iframe content from third parties.</strong> Cross-origin iframes are opaque to the SDK; same-origin iframes are readable, so masking must recurse into them. Mitigation: configure the SDK to treat iframes from unknown origins as fully masked.
  • <strong>JavaScript that mutates the DOM after the snapshot.</strong> If a payment widget renders sensitive data into the DOM 200 ms after page load, the MutationObserver picks it up, so the masking layer must run on every mutation, not just on the initial snapshot. Mitigation: verify that the mutation handler applies masking to each change.
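
A sketch of a mutation handler that masks every change before it is recorded. ObservedNode, ObservedMutation, and maskMutations are hypothetical names; the structural types are subsets of the real Node and MutationRecord, so a browser could pass MutationRecord objects straight in. To keep it short, added subtrees are reduced to their textContent, whereas a real SDK would serialize them node by node.

```typescript
// Structural subsets of the DOM types involved; real Node and MutationRecord
// objects provide these properties (Elements also provide getAttribute).
interface ObservedNode {
  textContent: string | null;
  getAttribute?(name: string): string | null;
}

interface ObservedMutation {
  type: "childList" | "characterData" | "attributes";
  target: ObservedNode;
  addedNodes: ArrayLike<ObservedNode>;
  attributeName: string | null;
}

interface MaskedChange {
  type: ObservedMutation["type"];
  value: string;
}

// Run the masker on every change, not just the initial snapshot.
// Simplification: added subtrees are reduced to their textContent.
function maskMutations(records: ObservedMutation[], mask: (text: string) => string): MaskedChange[] {
  const changes: MaskedChange[] = [];
  for (const record of records) {
    if (record.type === "childList") {
      for (let i = 0; i < record.addedNodes.length; i++) {
        changes.push({ type: "childList", value: mask(record.addedNodes[i].textContent ?? "") });
      }
    } else if (record.type === "characterData") {
      changes.push({ type: "characterData", value: mask(record.target.textContent ?? "") });
    } else if (record.attributeName !== null) {
      const raw = record.target.getAttribute?.(record.attributeName) ?? "";
      changes.push({ type: "attributes", value: mask(raw) });
    }
  }
  return changes;
}
```

In a browser this would be wired as new MutationObserver((records) => maskMutations(records, mask)), observing document.documentElement with childList, subtree, characterData, and attributes all set to true, where mask is the combined masker from the earlier layers.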

How Relyv implements masking

Relyv's SDK stacks all four layers by default:

  • <strong>Regex + Luhn:</strong> Email, phone (E.164 + common national formats), national IDs (US SSN and UK/EU equivalents), credit card (with Luhn validation), Bearer/JWT/AWS/GitHub tokens, masked at every capture point (text, attributes, console, network).
  • <strong>Attribute selectors:</strong> <code>data-relyv-mask</code> attribute marks custom fields. <code>mask: [...]</code> at init takes a CSS selector list. Password/hidden fields excluded outright.
  • <strong>On-device ML:</strong> GLiNER (~80 MB) optionally loaded via the browser extension for free-text PII detection. Off by default; opt-in per workspace.
  • <strong>Inspect-the-payload tool:</strong> Dashboard → Settings → "Inspect raw payload" shows the next captured session's upload before transmission, so you can verify masking worked on a real flow.
  • <strong>Web Worker handoff:</strong> Masking runs on the main thread <em>before</em> events transfer to the capture worker. The worker never sees raw PII.
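
The handoff rule in the last bullet can be illustrated with a small sketch. It is not Relyv's code: CaptureEvent, WorkerLike, and handOffToWorker are hypothetical names, and a real Worker satisfies WorkerLike because it exposes postMessage.

```typescript
// Hypothetical event shape produced by capture code on the main thread.
interface CaptureEvent {
  source: "text" | "attribute" | "console" | "network";
  value: string;
}

// Anything with postMessage; a real Worker satisfies this shape.
interface WorkerLike {
  postMessage(message: unknown): void;
}

// Mask on the main thread, then hand off. postMessage structured-clones its
// argument, so the worker receives only what is passed here: the masked copy.
function handOffToWorker(events: CaptureEvent[], mask: (value: string) => string, worker: WorkerLike): void {
  const masked: CaptureEvent[] = events.map((e) => ({ source: e.source, value: mask(e.value) }));
  worker.postMessage(masked);
}
```

Because the raw events array is never passed to postMessage, there is no copy of unmasked values on the worker side to leak, even if the worker's own code has a bug.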

Frequently asked questions

Does on-device masking slow down the page?

Negligibly. Regex and Luhn validation run in microseconds per text node, a cost dwarfed by rendering the page itself. Attribute-selector masking adds almost no runtime cost (one selector match per element at capture time). The on-device ML layer (GLiNER) is the only heavyweight option, which is why it is opt-in.

What's the difference between masking and excluding?

Masking redacts the value but keeps the structure (a masked name field is captured as a run of asterisks). Excluding skips the element entirely (a password field doesn't appear in the replay at all). Password, hidden, and sensitive-autocomplete fields are typically excluded; visible fields with PII content are masked.
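
A sketch of the mask-versus-exclude decision for a single form field. FieldInfo, decideCapture, and the chosen token set are assumptions for illustration; the tokens themselves (cc-number, cc-csc, cc-exp, one-time-code, current-password, new-password) are standard HTML autofill values.

```typescript
interface FieldInfo {
  type: string; // the input's type attribute, e.g. "password"
  autocomplete: string; // the autocomplete attribute, e.g. "shipping cc-number"
  markedForMasking: boolean; // result of the data-relyv-mask / mask: [...] selector check
}

type CaptureDecision = "exclude" | "mask" | "capture";

// Autofill tokens treated as sensitive in this sketch; extend as needed.
const SENSITIVE_AUTOCOMPLETE = new Set([
  "cc-number", "cc-csc", "cc-exp", "one-time-code", "current-password", "new-password",
]);

function decideCapture(field: FieldInfo): CaptureDecision {
  // Exclusion wins over everything: these fields never enter the replay.
  if (field.type === "password" || field.type === "hidden") return "exclude";
  // The autocomplete attribute can hold several space-separated tokens.
  const tokens = field.autocomplete.toLowerCase().split(/\s+/);
  if (tokens.some((t) => SENSITIVE_AUTOCOMPLETE.has(t))) return "exclude";
  // Visible PII fields keep their structure but lose their content.
  if (field.markedForMasking) return "mask";
  return "capture";
}
```

Checking exclusion first means a selector mistake can never downgrade a password field to "merely masked".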

Can I mask network request bodies?

Yes. A good replay SDK runs the same masking pipeline over fetch/XHR request and response bodies before serialization. JWTs in Authorization headers, PII in JSON payloads, and query parameters in URLs all need to flow through the masker. Some vendors mask only the visible DOM; verify that your tool of choice masks the network layer too.
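
A sketch of pushing the three network channels through a string masker. maskJson, maskHeaders, and maskUrl are hypothetical names, and the masker is passed in as a parameter; URL and URLSearchParams are the standard Web APIs (also available in Node.js).

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Mask every string value in a parsed JSON body, keeping keys and nesting.
// Limitation: numbers pass through, so a card number sent as a JSON number
// would need an extra numeric check.
function maskJson(value: Json, mask: (s: string) => string): Json {
  if (typeof value === "string") return mask(value);
  if (Array.isArray(value)) return value.map((item) => maskJson(item, mask));
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const key of Object.keys(value)) out[key] = maskJson(value[key], mask);
    return out;
  }
  return value;
}

// Keep the auth scheme ("Bearer") but drop the credential; a bare value is fully
// masked. Only Authorization is handled here; other credential headers need the same.
function maskHeaders(headers: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const name of Object.keys(headers)) {
    const value = headers[name];
    if (name.toLowerCase() !== "authorization") {
      out[name] = value;
    } else {
      const parts = value.trim().split(/\s+/);
      out[name] = parts.length > 1 ? `${parts[0]} ***` : "***";
    }
  }
  return out;
}

// Mask every query-parameter value; parameter names and order are kept.
function maskUrl(url: string, mask: (s: string) => string): string {
  const parsed = new URL(url);
  const masked = new URLSearchParams();
  parsed.searchParams.forEach((value, key) => masked.append(key, mask(value)));
  parsed.search = masked.toString();
  return parsed.toString();
}
```

Decoding query values before masking matters: an email in a URL usually arrives as jane%40example.com, which a pattern looking for a literal "@" would miss.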

How do I test that masking actually works?

Three checks: (1) record a session that includes a credit-card form and inspect the raw upload payload — the digits should be redacted. (2) Open browser dev tools, look at the SDK's network requests, verify no plaintext PII appears. (3) Use the vendor's "inspect raw payload" tool if they provide one. If they don't — that's a red flag.
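
Checks (1) and (2) can be partly automated by seeding a test session with known fake PII and scanning the captured payload for it. findLeaks is a hypothetical helper; it assumes you have already saved the raw upload body as a string, for example copied from the SDK's request in browser dev tools.

```typescript
// Returns every seed value that survived masking in a recognizable form.
function findLeaks(payload: string, seeds: string[]): string[] {
  const payloadDigits = payload.replace(/\D/g, "");
  return seeds.filter((seed) => {
    // Verbatim, or URL-encoded as it would appear in a captured query string.
    if (payload.includes(seed) || payload.includes(encodeURIComponent(seed))) return true;
    // Card-like seeds: also catch reformatted digits (spaces or dashes stripped).
    // Deliberately conservative: digits from adjacent fields can run together.
    const seedDigits = seed.replace(/\D/g, "");
    return seedDigits.length >= 12 && payloadDigits.includes(seedDigits);
  });
}
```

Usage: record a session in which you type, say, jane@example.com and 4111 1111 1111 1111 into the forms, then call findLeaks(rawPayload, ["jane@example.com", "4111 1111 1111 1111"]) and treat any non-empty result as a failed check. An occasional false alarm from the digit check is an acceptable price for a leak test.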

Is there a CCPA equivalent for masking?

CCPA doesn't explicitly require on-device masking the way GDPR Art. 5(1)(c) implies, but the right to delete (CCPA §1798.105) is much easier to satisfy when the data was never stored unmasked in the first place. The same on-device masking that satisfies GDPR data minimisation also makes CCPA compliance simpler.

What about HIPAA?

HIPAA covered entities (health plans, healthcare clearinghouses, and most healthcare providers) and their business associates need a signed BAA from the replay vendor, plus PHI masked on-device. Most session-replay vendors don't sign BAAs and aren't HIPAA-compliant by default, so check before deploying in a healthcare context. The on-device masking stack described here is the technical prerequisite, but the legal contract (BAA) is also required.
