Auto-Generate Playwright Tests From Session Replay

Generating Playwright tests from real user sessions.

Updated 2026-05-26 Relyv.ai Editorial Team

Every captured session is a sequence of user actions and observed state. With the right selector-extraction and assertion-inference pipeline, that sequence becomes a runnable Playwright spec automatically. Here are the mechanics, the code, and the limits.

The one-sentence answer

Auto-generating a Playwright test from a session replay means walking the captured interaction stream (clicks, keystrokes, navigations, network responses), extracting a stable selector for each interacted element, and emitting a .spec.ts file that re-runs the same sequence with Playwright's API plus assertions inferred from the observed state. The recording is the source of truth; the generated spec is a deterministic re-execution of it.

Why generate tests from sessions instead of writing them

Hand-authoring end-to-end tests has two well-known problems: they drift from real user behaviour (engineers test the happy path they imagine; users hit the path no one thought to test) and they take time the QA team usually doesn't have. Generated tests fix both. Every real user session that contained a bug becomes a regression test for that bug. Every important user flow you watched in replay can become a test in one click. The test corpus grows alongside the product, grounded in production behaviour.

What a session replay captures that a Playwright test needs

A good replay SDK captures the full set of inputs Playwright needs to replay the session faithfully:

Click events with target selectors (DOM path + nearby data-* attributes + accessible role + text content).
Keyboard input per element (the actual values typed, masked or unmasked depending on PII rules).
Scroll position per scrollable container at each timestamp.
Navigation events (URL changes, including soft-routed History API pushes).
Network calls with request + response status + body shape (so assertions can verify "the POST /checkout returned 200").
DOM mutations with timestamps (so assertions can verify "the success banner appeared after the click").
Console errors (so the generated test can assert no error fired during the flow).
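One way to picture the capture list above is as a discriminated union with one variant per event type. The sketch below is illustrative only: the type names and every field name (kind, ts, target, masked, and so on) are assumptions, not an actual SDK schema.

```typescript
// Hypothetical event schema for illustration; not a real SDK format.
type TargetHints = {
  domPath: string;   // CSS path observed at capture time
  testId?: string;   // data-testid attribute, if present
  role?: string;     // accessible role
  name?: string;     // accessible name
  text?: string;     // visible text content
};

type CapturedEvent =
  | { kind: "click"; ts: number; target: TargetHints }
  | { kind: "input"; ts: number; target: TargetHints; value: string; masked: boolean }
  | { kind: "scroll"; ts: number; container: string; top: number }
  | { kind: "navigation"; ts: number; url: string; soft: boolean }
  | { kind: "network"; ts: number; method: string; url: string; status: number }
  | { kind: "mutation"; ts: number; summary: string }
  | { kind: "console-error"; ts: number; message: string };

// Counts events per kind: a quick sanity check on what a capture contains.
function countByKind(events: CapturedEvent[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const e of events) {
    counts[e.kind] = (counts[e.kind] || 0) + 1;
  }
  return counts;
}
```

Keeping a single kind discriminator lets later pipeline stages narrow the type with a plain comparison instead of runtime shape checks.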

The five-step generation pipeline

How a captured session becomes a runnable spec, step by step:

1. Filter the action stream

A raw capture has hundreds of events per second (mouse moves, scroll deltas, micro-mutations). The generator first filters to the meaningful actions — typically the click/keystroke/navigation/submit events. Mouse moves and scrolls are usually dropped unless they cross a meaningful threshold (scrolled past a fold, hovered a tooltip target for >500ms).
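The filtering rule described above could be sketched as a simple predicate. This is a minimal illustration under assumed names: the RawEvent shape and the crossedFold, durationMs, and isTooltipTarget fields are hypothetical, and only the 500ms hover threshold comes from the text.

```typescript
// Hypothetical, simplified event shape; field names are assumptions.
type RawEvent =
  | { kind: "click" | "input" | "navigation" | "submit"; ts: number }
  | { kind: "mousemove"; ts: number }
  | { kind: "scroll"; ts: number; crossedFold: boolean }
  | { kind: "hover"; ts: number; durationMs: number; isTooltipTarget: boolean };

const HOVER_THRESHOLD_MS = 500; // threshold quoted in the text

// Decides whether a raw event should survive into the generated test.
function isMeaningful(e: RawEvent): boolean {
  if (e.kind === "scroll") return e.crossedFold;
  if (e.kind === "hover") {
    return e.isTooltipTarget && e.durationMs > HOVER_THRESHOLD_MS;
  }
  if (e.kind === "mousemove") return false;
  return true; // click, input, navigation, submit
}

function filterActions(events: RawEvent[]): RawEvent[] {
  return events.filter(isMeaningful);
}
```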

2. Extract a stable selector per interacted element

This is the hardest step. A click event has a target element — but the DOM path that worked at capture time may not work at test time (random IDs, CSS-in-JS class names, virtualised lists). A good selector-extraction algorithm tries strategies in order of stability:

data-testid attribute (most stable; intentionally added by engineers).
role + accessible name (semantic + screen-reader-aware; survives CSS refactors), e.g. getByRole('button', { name: 'Add to cart' }).
text content for buttons, links (e.g. getByText('Add to cart')).
label-based selector for form inputs (getByLabel('Email')).
nth-of-type + parent role as fallback (least stable; flag for review).
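The priority order above could be sketched as a chain of checks that returns the first strategy that applies. This is an assumption-laden illustration, not the actual extraction algorithm: TargetHints, PickedSelector, and pickSelector are hypothetical names, and the function emits Playwright locator source code as strings.

```typescript
// Hypothetical description of a clicked element; field names are assumptions.
type TargetHints = {
  tag: string;
  nthOfType: number;    // 1-based index among same-tag siblings
  testId?: string;
  role?: string;
  name?: string;        // accessible name
  text?: string;
  label?: string;
  parentRole?: string;
};

type PickedSelector = { locator: string; strategy: string; needsReview: boolean };

// Quote a value as a string literal for the generated source.
const q = (s: string): string => JSON.stringify(s);

function pickSelector(t: TargetHints): PickedSelector {
  if (t.testId) {
    return { locator: `page.getByTestId(${q(t.testId)})`, strategy: "testid", needsReview: false };
  }
  if (t.role && t.name) {
    return {
      locator: `page.getByRole(${q(t.role)}, { name: ${q(t.name)} })`,
      strategy: "role",
      needsReview: false,
    };
  }
  if (t.text && (t.tag === "button" || t.tag === "a")) {
    return { locator: `page.getByText(${q(t.text)}, { exact: true })`, strategy: "text", needsReview: false };
  }
  if (t.label) {
    return { locator: `page.getByLabel(${q(t.label)})`, strategy: "label", needsReview: false };
  }
  // Least stable fallback: always flagged for engineer review.
  const css = `${t.tag}:nth-of-type(${t.nthOfType})`;
  const locator = t.parentRole
    ? `page.getByRole(${q(t.parentRole)}).locator(${q(css)})`
    : `page.locator(${q(css)})`;
  return { locator, strategy: "nth-of-type", needsReview: true };
}
```

Returning the strategy name alongside the locator makes it straightforward for a later review step to surface the fragile cases.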

3. Emit the Playwright actions

Each filtered event maps to a Playwright API call: click → locator.click(); keystroke sequence on a single input → locator.fill(); navigation → page.goto() (or a URL assertion after a click for soft-routed nav); scroll past fold → page.evaluate() with a window.scrollTo call. Wait conditions are inserted automatically: when a click triggered a network call, the generator registers page.waitForResponse() against the captured URL pattern before the click and awaits it afterwards, so a fast response cannot arrive before the listener is in place.
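A minimal sketch of that event-to-code mapping, assuming a hypothetical Action type whose locator field already holds locator source text. The point it illustrates is ordering: the response wait is created before the click and awaited after it.

```typescript
// Hypothetical normalised actions; names and fields are assumptions.
type Action =
  | { kind: "goto"; url: string }
  | { kind: "fill"; locator: string; value: string }
  | { kind: "click"; locator: string; responsePattern?: RegExp };

// Emits the source lines for one action; index keeps variable names unique.
function emitAction(a: Action, index: number): string[] {
  if (a.kind === "goto") return [`await page.goto(${JSON.stringify(a.url)});`];
  if (a.kind === "fill") {
    return [`await ${a.locator}.fill(${JSON.stringify(a.value)});`];
  }
  const click = `await ${a.locator}.click();`;
  if (!a.responsePattern) return [click];
  const respVar = `resp${index}`;
  return [
    // Start listening first so a fast response is not missed.
    `const ${respVar} = page.waitForResponse(${a.responsePattern.toString()});`,
    click,
    `await ${respVar};`,
  ];
}

function emitActions(actions: Action[]): string[] {
  const lines: string[] = [];
  actions.forEach((a, i) => {
    emitAction(a, i).forEach((line) => lines.push(line));
  });
  return lines;
}
```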

4. Infer assertions from observed state

The captured state at each step grounds assertions. After a successful checkout, the captured DOM showed an order-confirmation panel, so the generator emits await expect(page.getByRole('heading', { name: /order confirmed/i })).toBeVisible(). After a form submission, the captured network call returned 200, so the generator asserts the status. After an interaction, no console error fired, so the generator wires page.on('console', ...) to fail on error-level messages and page.on('pageerror', ...) to fail on uncaught exceptions.
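Assertion inference can be pictured as a mapping from an observed fact to one line of generated test code. The sketch below assumes a hypothetical Observation type with three variants; a real pipeline would need many more cases and some way of deciding which observations are worth asserting.

```typescript
// Hypothetical observations derived from captured state; names are assumptions.
type Observation =
  | { kind: "heading-appeared"; text: string }
  | { kind: "response"; varName: string; status: number }
  | { kind: "url"; pattern: RegExp };

// Escape text so it can be embedded safely in a generated regex literal.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\\/]/g, "\\$&");
}

function emitAssertion(o: Observation): string {
  if (o.kind === "heading-appeared") {
    const re = `/${escapeRegExp(o.text)}/i`;
    return `await expect(page.getByRole('heading', { name: ${re} })).toBeVisible();`;
  }
  if (o.kind === "response") {
    // varName is the promise returned earlier by page.waitForResponse().
    return `expect((await ${o.varName}).status()).toBe(${o.status});`;
  }
  return `await expect(page).toHaveURL(${o.pattern.toString()});`;
}
```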

5. Emit the .spec.ts file

A test file gets written with the session's metadata as comments at the top (URL, capture date, user-agent, replay-link) and the generated actions inside a test('description', async ({ page }) => { ... }) block. The description comes from the session's flow summary (AI-generated from the action stream) so future engineers can read it as documentation.
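Assembling the file is mostly string concatenation. As a rough sketch, assuming a hypothetical SessionMeta type and an emitSpec helper that receives the already-generated body lines:

```typescript
// Hypothetical session metadata; field names are assumptions.
type SessionMeta = {
  sessionId: string;
  capturedAt: string;
  url: string;
  userAgent: string;
  replayLink: string;
  summary: string; // flow summary used as the test description
};

// Builds the full .spec.ts source from metadata and generated body lines.
function emitSpec(meta: SessionMeta, bodyLines: string[]): string {
  const header = [
    `// auto-generated from session ${meta.sessionId} (${meta.capturedAt})`,
    `// captured: ${meta.url}`,
    `// user-agent: ${meta.userAgent}`,
    `// replay: ${meta.replayLink}`,
    `import { test, expect } from '@playwright/test';`,
    ``,
  ];
  // Indent non-empty body lines so they sit inside the test callback.
  const body = bodyLines.map((line) => (line === "" ? "" : `  ${line}`));
  return header
    .concat([`test(${JSON.stringify(meta.summary)}, async ({ page }) => {`])
    .concat(body)
    .concat([`});`, ``])
    .join("\n");
}
```

Writing the metadata as comments rather than test annotations keeps the output a plain spec file that needs no custom reporter to read.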

What the generated test looks like — an example

A captured checkout-flow session for an e-commerce site produces something like this:

// auto-generated from session 0x7a2f...e91d (2026-05-25T14:33:12Z)
// flow: search → add to cart → checkout → success
// captured: https://staging.shop.example.com
// replay: https://relyv.ai/s/0x7a2f...e91d
import { test, expect } from '@playwright/test';

test('checkout: blue running shoes', async ({ page }) => {
  await page.goto('/products');

  await page.getByPlaceholder('Search products').fill('blue running shoes');
  await page.getByRole('button', { name: 'Search' }).click();

  await page.getByRole('link', { name: /Nimbus 7.*Blue/ }).click();
  await page.getByRole('button', { name: 'Add to cart' }).click();

  await page.getByRole('link', { name: 'Checkout' }).click();
  await page.getByLabel('Card number').fill('4242424242424242');
  await page.getByLabel('Expiry').fill('12/27');
  await page.getByLabel('CVC').fill('123');

  const checkoutResp = page.waitForResponse(/\/checkout$/);
  await page.getByRole('button', { name: 'Pay now' }).click();
  expect((await checkoutResp).status()).toBe(200);

  await expect(page.getByRole('heading', { name: /order confirmed/i })).toBeVisible();
});

Limits + what manual review still has to catch

Generation isn't a free lunch. The generator should always flag, not silently include:

nth-of-type selectors: fragile across UI changes; the engineer should add a data-testid instead.
Time-sensitive assertions: a captured network call that returned in 80ms might take 800ms in CI; explicit waits beat implicit timing.
Hard-coded values: captured credit-card numbers, emails, addresses. The generator masks them during capture, and the review step prompts the engineer to template them.
Cross-session state: a flow that depended on session-local state (logged-in user, populated cart) needs setup the generator can't infer.
Visual regressions: the generated spec captures functional behaviour; visual-regression assertions (screenshot comparisons) need a separate step.
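Some of the flags listed above lend themselves to mechanical checks. The sketch below is one possible approach under stated assumptions: GeneratedStep, ReviewFlag, and the heuristics in looksSensitive are hypothetical, and they cover only fragile selectors, fixed sleeps, and card-like or email-like values.

```typescript
// Hypothetical record of one generated step; names are assumptions.
type GeneratedStep = { code: string; strategy: string; value?: string };
type ReviewFlag = { stepIndex: number; reason: string };

// Rough heuristic: 13 to 19 digits (card-like) or something email-shaped.
function looksSensitive(value: string): boolean {
  const digits = value.replace(/[\s-]/g, "");
  const cardLike = /^\d{13,19}$/.test(digits);
  const emailLike = /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(value);
  return cardLike || emailLike;
}

function flagForReview(steps: GeneratedStep[]): ReviewFlag[] {
  const flags: ReviewFlag[] = [];
  steps.forEach((step, i) => {
    if (step.strategy === "nth-of-type") {
      flags.push({ stepIndex: i, reason: "fragile nth-of-type selector; consider a data-testid" });
    }
    if (step.value !== undefined && looksSensitive(step.value)) {
      flags.push({ stepIndex: i, reason: "hard-coded value; template it" });
    }
    if (/waitForTimeout\(/.test(step.code)) {
      flags.push({ stepIndex: i, reason: "fixed sleep; prefer an explicit wait condition" });
    }
  });
  return flags;
}
```

Cross-session state and visual regressions are not covered here; they depend on context that a per-step check cannot see.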

How Relyv does it

Relyv generates Playwright + Cypress specs from any captured session, with the selector-extraction strategy described above (data-testid → role → text → label → fallback). Generated specs are previewed before commit so the engineer can edit, run locally, and add to the test suite via one-click PR. The full pipeline lives in /features/playwright-test-generation; the underlying capture event stream is documented in how DOM replay works.

Frequently asked questions

Can session replay really auto-generate working Playwright tests?

Yes, with caveats. The generator emits a runnable .spec.ts that replays the captured actions and asserts on observed state. The generated test works on the first run when the selectors it picked are stable. About 70-80% of generated specs run as-is in our internal benchmarks; the rest need an engineer to swap a fragile selector or template a hard-coded value. Either way, the engineer starts from a near-complete test, not a blank file.

What selectors does the generator pick?

In priority order: data-testid attribute (if present), Playwright role-based selectors (getByRole with accessible name), text content for buttons/links, label-based selectors for form inputs (getByLabel), and CSS path with nth-of-type as a fallback. The fallback case is always flagged for engineer review.

Does this work for React, Vue, Angular, Svelte apps?

Yes — the generator operates on DOM events and the rendered HTML, not framework internals. Single-page apps with virtualised lists, Suspense boundaries, or animation libraries work fine because the capture is at the DOM level. CSS-in-JS that emits random class names is the main edge case; the generator falls back to role/text/label selectors which avoid the class-name dependency.

Can generated tests run in CI?

Yes — generated specs are standard Playwright .spec.ts files. They run in any CI that runs Playwright (GitHub Actions, GitLab CI, CircleCI, etc.). The generator emits a config-aware spec (it uses the project's existing playwright.config.ts if one exists in the same repo) so test isolation, retries, and reporters work out of the box.

What about Cypress?

Relyv emits Cypress specs as an alternative output. The selector-extraction logic is the same; the API maps differently (cy.get().click(), cy.get().type(), and cy.intercept() with cy.wait() instead of page.locator().click(), page.locator().fill(), and page.waitForResponse()). Pick the output that matches your existing test suite.
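To make the API difference concrete, here is a small sketch of one emitter with two output targets. The Step type and emitStep function are hypothetical and handle only CSS-based click and type steps; role-based and label-based selectors would need a Cypress-side equivalent that this sketch does not attempt.

```typescript
// Hypothetical minimal step; CSS selectors only, for illustration.
type Step =
  | { kind: "click"; css: string }
  | { kind: "type"; css: string; value: string };

type Target = "playwright" | "cypress";

// Emits one line of test source for the chosen framework.
function emitStep(step: Step, target: Target): string {
  const sel = JSON.stringify(step.css);
  if (target === "cypress") {
    // Cypress commands chain off cy.get() and are not awaited.
    return step.kind === "click"
      ? `cy.get(${sel}).click();`
      : `cy.get(${sel}).type(${JSON.stringify(step.value)});`;
  }
  return step.kind === "click"
    ? `await page.locator(${sel}).click();`
    : `await page.locator(${sel}).fill(${JSON.stringify(step.value)});`;
}
```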

How is this different from a record-and-playback tool like Selenium IDE?

Record-and-playback tools require a human to manually record the flow in a special browser, and the resulting tests typically rely on brittle CSS-path selectors. Session-replay-based generation works on every captured real user session in production, with no recording session needed, and uses more resilient selectors (role + label + text) that survive UI refactors. The unit of "what gets tested" goes from "what QA bothered to record" to "what users actually do."

Does the generator handle authentication?

For captured sessions where login happened during the session, the auth flow appears as part of the generated spec. For tests that depend on a pre-authenticated state, you can either (a) generate the auth flow once and reuse via Playwright's storageState, or (b) wrap generated tests in your existing auth fixture. The generator emits a comment flagging where auth is assumed.
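Option (a) might look like the config fragment below. It is a sketch under assumptions: the project names and the playwright/.auth/user.json path are placeholders, and it presumes a separate setup spec (for example a generated login flow saved as auth.setup.ts) that ends by calling page.context().storageState({ path: 'playwright/.auth/user.json' }).

```typescript
// playwright.config.ts (fragment; paths and project names are placeholders)
import { defineConfig } from '@playwright/test';

export default defineConfig({
  projects: [
    // Runs the login flow once and writes cookies and local storage to disk.
    { name: 'setup', testMatch: /.*\.setup\.ts/ },
    {
      name: 'generated',
      // Wait for the setup project, then start every test already signed in.
      dependencies: ['setup'],
      use: { storageState: 'playwright/.auth/user.json' },
    },
  ],
});
```

With this arrangement the generated specs themselves stay free of login steps, which keeps them shorter and avoids repeating credentials in every file.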