Session Replay is a Band-Aid (And AI Can Do Better)

Session replay tools like FullStory, LogRocket, and Hotjar have become standard in frontend observability. The pitch is compelling: record everything users do, replay it like a video when something goes wrong. Finally, you can see exactly what happened.

But session replay solves the right problem with the wrong approach. It's a band-aid on a deeper architectural gap. With AI, we can do something better.

The Problem Session Replay Tries to Solve

Session replay exists because of a fundamental visibility gap: when a user reports something broke, we have no idea what actually happened from their perspective.

Think about how debugging usually goes. User says "checkout didn't work." Support asks for details. User says "I clicked the button and nothing happened." That's all you get. No browser version. No error messages. No sequence of actions. No state.

Server logs show the API call failed with a 500, but not why the user made that call, what they saw before it, or how they reacted after. Traditional logging captures the backend but leaves the frontend as a black box.

Session replay promises to fill this gap. Record the DOM, capture mouse movements, replay it all later. See what the user saw.

Why It's a Band-Aid

Session replay works by recording DOM mutations and reconstructing a video-like playback. It captures pixels, not meaning.

Privacy is a nightmare. You're recording everything. Every form field, every personal detail, every sensitive piece of data that appears on screen. Yes, you can mask PII, but masking is fragile. Miss one field and you've recorded someone's SSN. The regulatory overhead alone makes this approach expensive.

Storage explodes. DOM diffs for every session across every user add up to terabytes fast. You're storing massive amounts of data "just in case" someone reports a bug. The signal-to-noise ratio is terrible.

Finding the needle takes forever. When something goes wrong, you have hours of recordings to search through. Which session matters? Which 10 seconds of that 30-minute session contains the bug? You're watching video hoping to spot the problem.

You can't query it. Session replay is fundamentally a passive consumption experience. You watch. You can't ask "show me all users who rage-clicked on the submit button" or "what percentage of users hesitated on the payment form for more than 30 seconds?" The data exists but it's locked in video format.

It's siloed from your APM. The recording shows what happened on the frontend, but it doesn't connect to your backend traces. You see the user click a button, but correlating that to the specific API call, the database query, the error in your logs—that's manual detective work.

A Different Approach

Now that we have AI, what would we build if we were starting from scratch?

We don't need video. We need understanding. Instead of recording DOM mutations, capture semantically rich events and let AI synthesize the story.

Instead of watching a 45-minute recording hoping to find the bug, you ask: "What happened to user David Hope on the checkout page?"

And AI responds:

David loaded checkout at 14:32. He filled the form correctly but hesitated 45 seconds on the payment section. He clicked "Submit" 6 times in rapid succession—classic rage clicking. The Stripe API returned a 402 (card declined) but your frontend swallowed the error. The user saw nothing. He refreshed twice, then abandoned. Backend trace ID: abc123.

That's not a recording. That's a narrative synthesized from structured data. And it's more useful than any video.

The Instrumentation This Requires

To make this work, you need to capture meaning, not pixels. Instead of "click at coordinates (342, 891)", you capture "user clicked the Submit Order button in the CheckoutForm component."

The events look something like this:

{
  "event": "user.interaction",
  "action": "click",
  "target": {
    "semantic_name": "Submit Order",
    "component": "CheckoutForm",
    "element": "button#submit"
  },
  "frustration_signals": {
    "rapid_clicks": 6,
    "time_since_page_load": 45000
  },
  "trace_id": "abc123",
  "session_id": "...",
  "user": { "id": "david.hope" }
}

You also capture the signals that session replay shows but doesn't understand:

  • Rage clicks: Multiple rapid clicks on the same element (frustration)
  • Dead clicks: Clicks on non-interactive elements (user expected something)
  • Thrashing: Rapid scrolling up and down (user is lost)
  • Form hesitation: Long pauses on form fields (confusion or context-switching)
  • Error blindness: User continues after an error they didn't notice

The key difference: these become queryable, structured data. Not pixels in a video.
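
To make that concrete, here's a minimal sketch of rage-click detection in the browser. It's illustrative, not the prototype described later: the 4-clicks-in-1-second threshold, the /events endpoint, and the data-semantic-name convention are all assumptions.

// Minimal rage-click detector (TypeScript). Counts clicks on the same element
// inside a short window and emits a structured frustration event when the
// count crosses a threshold. Thresholds and the endpoint are illustrative.
const WINDOW_MS = 1_000;
const RAGE_THRESHOLD = 4;
const recentClicks = new Map<EventTarget, number[]>();

function emitEvent(event: Record<string, unknown>): void {
  // Placeholder transport; in practice this would feed the OTLP pipeline below.
  navigator.sendBeacon('/events', JSON.stringify(event));
}

document.addEventListener('click', (e) => {
  const target = e.target as HTMLElement;
  const now = performance.now();
  const clicks = (recentClicks.get(target) ?? []).filter((t) => now - t < WINDOW_MS);
  clicks.push(now);
  recentClicks.set(target, clicks);

  if (clicks.length >= RAGE_THRESHOLD) {
    emitEvent({
      event: 'user.frustration',
      signal: 'rage_click',
      rapid_clicks: clicks.length,
      target: {
        element: target.tagName.toLowerCase() + (target.id ? `#${target.id}` : ''),
        // Semantic name supplied by the app, e.g. via a data attribute (assumed convention).
        semantic_name: target.dataset.semanticName ?? null,
      },
    });
  }
});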

Why OTLP and Elastic?

This approach works naturally with OpenTelemetry and existing APM infrastructure.

OTLP already carries traces, metrics, and logs; semantic frontend events are just another signal riding the same pipeline. Instrument the browser with OTel's web SDK plus custom semantic instrumentation, send the events via OTLP, and they land in Elastic alongside your backend traces.
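
A minimal wiring sketch, assuming the standard OpenTelemetry JS web packages and an OTLP/HTTP collector endpoint. The URL, tracer name, and attribute keys are placeholders, and the exact provider setup varies by SDK version.

import { WebTracerProvider, BatchSpanProcessor } from '@opentelemetry/sdk-trace-web';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';

// Ship browser spans over OTLP/HTTP to the same collector the backend uses.
// Note: older SDK versions use provider.addSpanProcessor() instead of the
// spanProcessors constructor option.
const provider = new WebTracerProvider({
  spanProcessors: [
    new BatchSpanProcessor(
      new OTLPTraceExporter({ url: 'https://collector.example.com/v1/traces' })
    ),
  ],
});

const tracer = provider.getTracer('frontend-semantics');

// Custom semantic instrumentation: record an interaction as a span carrying
// the structured attributes from the event shape above.
export function recordInteraction(action: string, semanticName: string, component: string): void {
  const span = tracer.startSpan('user.interaction', {
    attributes: {
      'interaction.action': action,
      'interaction.semantic_name': semanticName,
      'interaction.component': component,
    },
  });
  span.end();
}

A real setup would also register the provider globally and wire up context propagation; this just shows the shape of the pipe.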

That trace_id field connects the frontend interaction to the backend request. When AI narrates "the Stripe API returned 402," it's pulling that from the linked backend span. No manual correlation.
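
This is also where OTel's existing fetch instrumentation helps: it injects the W3C traceparent header into outgoing requests, so the backend span and the frontend interaction end up on the same trace. A sketch, assuming the standard instrumentation packages; the CORS URL pattern is a placeholder.

import { registerInstrumentations } from '@opentelemetry/instrumentation';
import { FetchInstrumentation } from '@opentelemetry/instrumentation-fetch';

// Propagate W3C trace context (traceparent) on outgoing fetch calls so backend
// spans join the same trace as the frontend interaction. Assumes a tracer
// provider is already registered (see the setup sketch above).
registerInstrumentations({
  instrumentations: [
    new FetchInstrumentation({
      // Allow the traceparent header on cross-origin API calls (placeholder pattern).
      propagateTraceHeaderCorsUrls: [/^https:\/\/api\.example\.com\//],
    }),
  ],
});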

Elastic gives you the query engine. AI gives you the interface. The same infrastructure you use for APM now powers intelligent session understanding.
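
To make "queryable" concrete: once the events are indexed, a question like "who rage-clicked Submit Order in the last day?" is an ordinary search. Here's a hypothetical query using the Elasticsearch JS client; the index name, field mappings, and @timestamp field are assumptions based on the event shape above.

import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'https://elastic.example.com' });

// "Which users rage-clicked the Submit Order button in the last 24 hours?"
// Assumes keyword mappings for the term fields and a standard @timestamp.
const result = await client.search({
  index: 'frontend-events',
  size: 0,
  query: {
    bool: {
      filter: [
        { term: { event: 'user.interaction' } },
        { term: { 'target.semantic_name': 'Submit Order' } },
        { range: { 'frustration_signals.rapid_clicks': { gte: 4 } } },
        { range: { '@timestamp': { gte: 'now-24h' } } },
      ],
    },
  },
  aggs: {
    users: { terms: { field: 'user.id' } },
  },
});

console.log(result.aggregations);

The AI layer sits on top of queries like this: translate the natural-language question, run the search, narrate the result.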

What I'm Building

I've started prototyping this. A browser instrumentation layer that captures semantic events with frustration signals. An OTLP pipeline to Elastic. And an AI layer that can answer questions about user sessions.

It's early. But the core thesis is holding up: structured semantic events + AI synthesis is more powerful than DOM recording + video playback.

I'll write more as this progresses. Building in public, as they say.

Why This Matters

Session replay was a reasonable solution when you couldn't understand what users experienced. Recording everything and playing it back was better than nothing.

But "record everything" is brute force. It trades storage and privacy for visibility. It gives you data, not understanding.

Capture meaning instead. Let AI synthesize. Ask questions instead of watching video. Connect frontend to backend through traces.

That's a different architecture for the same problem.