Building AI-Native Session Replay with OpenTelemetry
In my last post, I argued that traditional session replay is a band-aid: it records pixels instead of meaning, creates privacy nightmares, and can't be queried. I proposed an alternative: capture semantic events, detect frustration signals, and let AI synthesize understanding.
I've built the first version. Here's how it works.
Code: github.com/davidgeorgehope/sessionreplay
The Architecture: Logs + Traces
Two complementary OTLP signals:
- Logs for discrete events: clicks, frustration signals, navigation, errors
- Traces for business operations with duration: checkout flows, API calls
Why not just traces? Long-lived spans are a known, still-unsolved problem in OpenTelemetry. A span stays in memory until end() is called, so if the browser tab closes before the span ends, its data is lost. Page-teardown hooks don't rescue it: beforeunload does not fire reliably, especially on mobile, and Chrome is phasing out the unload event.
Logs solve this elegantly: events send immediately via OTLP, no lifecycle management needed. A user click is captured the moment it happens, not when some parent span eventually closes.
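To make "send immediately" concrete, here is a minimal sketch of one event encoded as an OTLP/HTTP JSON log record and handed off right away. The function names, the scope name, and the injected post callback are illustrative assumptions, not the agent's actual API. In a browser, post would typically wrap fetch with keepalive: true so the request can outlive the page.

```typescript
type AttrValue = string | number | boolean;

interface SessionEvent {
  name: string;
  attributes: Record<string, AttrValue>;
}

// OTLP JSON wraps every attribute value in a typed AnyValue object.
function toAnyValue(v: AttrValue): Record<string, AttrValue> {
  if (typeof v === "string") return { stringValue: v };
  if (typeof v === "boolean") return { boolValue: v };
  // 64-bit integers travel as decimal strings in OTLP JSON.
  return Math.floor(v) === v ? { intValue: String(v) } : { doubleValue: v };
}

// Builds the request body for POST <collector>/v1/logs.
function buildOtlpLogPayload(serviceName: string, event: SessionEvent, nowMs: number) {
  return {
    resourceLogs: [
      {
        resource: {
          attributes: [{ key: "service.name", value: toAnyValue(serviceName) }],
        },
        scopeLogs: [
          {
            scope: { name: "session-replay-sketch" }, // hypothetical scope name
            logRecords: [
              {
                // Milliseconds to nanoseconds, kept as a string to avoid precision loss.
                timeUnixNano: String(Math.floor(nowMs)) + "000000",
                body: { stringValue: event.name },
                attributes: Object.keys(event.attributes).map((key) => ({
                  key,
                  value: toAnyValue(event.attributes[key]),
                })),
              },
            ],
          },
        ],
      },
    ],
  };
}

// No span lifecycle: each event is serialized and handed to `post` at once.
function emitNow(
  post: (url: string, body: string) => void,
  endpoint: string,
  serviceName: string,
  event: SessionEvent,
): void {
  post(endpoint, JSON.stringify(buildOtlpLogPayload(serviceName, event, Date.now())));
}
```

Because nothing is buffered waiting for a parent to close, the worst case on tab close is losing the single event in flight rather than a whole span tree.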
What Gets Captured
Every event includes session and user context automatically:
{
  "body": "user.click",
  "attributes": {
    "session.id": "sess_abc123",
    "session.sequence": 42,
    "user.id": "david",
    "user.email": "[email protected]",
    "event.category": "user.interaction",
    "target.semantic_name": "Submit Order",
    "target.element": "button",
    "page.url": "https://example.com/checkout"
  }
}
Notice target.semantic_name. Instead of "click at coordinates (342, 891)", we capture "Submit Order button". Meaning, not pixels.
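One plausible way to derive a semantic name is to prefer explicit accessibility labels, then visible text, then fall back to the tag name. The sketch below assumes that priority order; it is not necessarily how the agent resolves names. ElementLike is a structural stand-in so the function runs outside a browser, and a real DOM Element satisfies it.

```typescript
// Structural stand-in for the parts of a DOM Element this sketch reads.
interface ElementLike {
  tagName: string;
  textContent: string | null;
  getAttribute(name: string): string | null;
}

// Hypothetical priority: aria-label, then visible text, then title, then tag name.
// Form field values are deliberately never read, to keep user input out of events.
function semanticName(el: ElementLike, maxLength = 60): string {
  const candidates = [
    el.getAttribute("aria-label"),
    el.textContent,
    el.getAttribute("title"),
  ];
  for (const raw of candidates) {
    // Collapse whitespace so multi-line button markup yields one clean label.
    const text = (raw ?? "").replace(/\s+/g, " ").trim();
    if (text.length > 0) return text.slice(0, maxLength);
  }
  return el.tagName.toLowerCase();
}
```

In a click handler this would be called on the nearest interactive ancestor of the event target, so a click on an icon inside a button still reports the button's label.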
Frustration Detection
The agent automatically detects frustration signals that session replay shows but doesn't understand:
Rage clicks: Multiple rapid clicks on the same element. The user expected something to happen and it didn't. Classic broken button scenario.
{
  "body": "user.frustration.rage_click",
  "attributes": {
    "frustration.type": "rage_click",
    "frustration.score": 0.8,
    "frustration.click_count": 6,
    "frustration.duration_ms": 850,
    "target.semantic_name": "Submit Order"
  }
}
Dead clicks: Clicks on non-interactive elements. The user thought something was clickable but it wasn't. Often indicates confusing UI.
Thrashing: Rapid scroll direction changes. The user is lost, searching for something they can't find.
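Of the three signals, rage clicks are the simplest to sketch: keep a sliding window of click timestamps per target and emit once the burst is dense enough. The thresholds (3 clicks within 1000 ms) and the scoring formula below are illustrative assumptions, not the agent's actual values; only the event shape follows the example above.

```typescript
interface RageClickSignal {
  body: "user.frustration.rage_click";
  attributes: {
    "frustration.type": "rage_click";
    "frustration.score": number;
    "frustration.click_count": number;
    "frustration.duration_ms": number;
    "target.semantic_name": string;
  };
}

class RageClickDetector {
  private minClicks: number;
  private windowMs: number;
  private target = "";
  private times: number[] = [];

  constructor(minClicks = 3, windowMs = 1000) {
    this.minClicks = minClicks;
    this.windowMs = windowMs;
  }

  // Feed every click. Returns a signal on each click once the burst has
  // reached `minClicks`; a real agent would probably debounce these.
  onClick(target: string, timeMs: number): RageClickSignal | null {
    if (target !== this.target) {
      // A click elsewhere ends the burst on the previous element.
      this.target = target;
      this.times = [];
    }
    this.times.push(timeMs);
    // Keep only clicks inside the sliding window that ends at this click.
    this.times = this.times.filter((t) => timeMs - t <= this.windowMs);
    const count = this.times.length;
    if (count < this.minClicks) return null;
    return {
      body: "user.frustration.rage_click",
      attributes: {
        "frustration.type": "rage_click",
        // Hypothetical scoring: grows with click count, saturating at 1.0.
        "frustration.score": Math.min(1, count / (this.minClicks * 2)),
        "frustration.click_count": count,
        "frustration.duration_ms": timeMs - this.times[0],
        "target.semantic_name": target,
      },
    };
  }
}
```

Keying the window on the semantic target rather than on coordinates means a user hammering a button that shifts slightly on the page still counts as one burst.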
These become queryable structured data. In Elastic, you can ask "show me all users with frustration score above 0.7 in the last hour" and get actual answers.
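Thrashing can be turned into the same kind of structured event by counting scroll direction reversals inside a time window. This is a sketch under assumed thresholds (4 reversals within 2000 ms); the event body and the frustration.direction_changes attribute name are hypothetical, chosen only to mirror the rage-click shape.

```typescript
interface ThrashSignal {
  body: "user.frustration.thrashing";
  attributes: {
    "frustration.type": "thrashing";
    "frustration.direction_changes": number;
    "frustration.duration_ms": number;
  };
}

class ScrollThrashDetector {
  private minReversals: number;
  private windowMs: number;
  private lastY: number | null = null;
  private lastDirection = 0; // 1 = down, -1 = up, 0 = not yet known
  private reversals: number[] = []; // timestamps of direction changes

  constructor(minReversals = 4, windowMs = 2000) {
    this.minReversals = minReversals;
    this.windowMs = windowMs;
  }

  // Feed the scroll offset on every scroll event.
  onScroll(y: number, timeMs: number): ThrashSignal | null {
    if (this.lastY === null) {
      this.lastY = y;
      return null;
    }
    const delta = y - this.lastY;
    this.lastY = y;
    if (delta === 0) return null;
    const direction = delta > 0 ? 1 : -1;
    if (this.lastDirection !== 0 && direction !== this.lastDirection) {
      this.reversals.push(timeMs);
    }
    this.lastDirection = direction;
    // Drop reversals that have aged out of the window.
    this.reversals = this.reversals.filter((t) => timeMs - t <= this.windowMs);
    if (this.reversals.length < this.minReversals) return null;
    return {
      body: "user.frustration.thrashing",
      attributes: {
        "frustration.type": "thrashing",
        "frustration.direction_changes": this.reversals.length,
        "frustration.duration_ms": timeMs - this.reversals[0],
      },
    };
  }
}
```

Because the output carries frustration.type like every other signal, the same queries that find rage clicks would pick these up without special handling.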
Parent-Child Spans for Business Flows
For operations with duration and hierarchy, traces are the right tool. Here's how to create a checkout flow with child spans:
import { trace, context, getTracer } from '@anthropic/session-replay-browser-agent';

const tracer = getTracer();

// Parent span (becomes Transaction in Elastic APM)
const checkoutSpan = tracer.startSpan('checkout.buy_now');

// Execute child operations within parent context
context.with(trace.setSpan(context.active(), checkoutSpan), () => {
  // Child span 1: validate cart
  const validateSpan = tracer.startSpan('checkout.validate_cart');
  validateSpan.setAttribute('cart.items', 3);
  validateSpan.end();

  // Child span 2: process payment
  const paymentSpan = tracer.startSpan('checkout.process_payment');
  paymentSpan.setAttribute('payment.method', 'credit_card');
  // ... payment logic
  paymentSpan.end();

  // Child span 3: confirm order
  const confirmSpan = tracer.startSpan('checkout.confirm_order');
  confirmSpan.end();
});

checkoutSpan.end();
In Elastic APM, this shows as a Transaction (checkout.buy_now) with child Spans nested underneath. You can see exactly where time was spent.
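Since a span that never reaches end() is never exported, it can help to wrap each step so the span is closed even when the step throws. The helper below is a hypothetical convenience, not part of the package; SpanLike and TracerLike are structural stand-ins for the pieces of the OpenTelemetry span and tracer interfaces it touches. It handles only the span lifecycle for synchronous steps; parent-child nesting still comes from running it inside context.with as shown above.

```typescript
// Structural stand-ins for the parts of the OpenTelemetry API used here.
interface SpanLike {
  setAttribute(key: string, value: string | number | boolean): unknown;
  recordException(error: Error): void;
  end(): void;
}

interface TracerLike {
  startSpan(name: string): SpanLike;
}

// Runs `fn` inside a span and guarantees the span is ended exactly once.
function withSpan<T>(tracer: TracerLike, name: string, fn: (span: SpanLike) => T): T {
  const span = tracer.startSpan(name);
  try {
    return fn(span);
  } catch (err) {
    // Attach the failure to the span, then let the caller handle it.
    span.recordException(err instanceof Error ? err : new Error(String(err)));
    throw err;
  } finally {
    // Runs on success and on failure, so the span is always exported.
    span.end();
  }
}
```

A step such as cart validation would then read withSpan(tracer, 'checkout.validate_cart', (span) => { span.setAttribute('cart.items', 3); }), with no separate end() call to forget.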
Trace-Log Correlation
Logs emitted within an active trace context automatically include trace.id and span.id:
// Inside a trace context
emitSessionEvent({
  name: 'payment.started',
  attributes: { 'payment.method': 'credit_card' }
});

// The log automatically gets:
// {
//   "trace.id": "32f9e17f...",
//   "span.id": "9ac7c0be...",
//   ...
// }
This means when you query logs in Elastic, you can jump directly to the related trace. When you're looking at a trace, you can see all the user events that happened during that operation. No manual correlation needed.
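The correlation step itself is small. A minimal sketch, assuming the emitter can ask for the active span's context at the moment an event is emitted: the correlate function and the injected getActiveSpanContext callback are hypothetical names, and the trace.id / span.id keys simply follow the output shown above. With @opentelemetry/api, the callback could be () => trace.getSpan(context.active())?.spanContext().

```typescript
// The two fields of an OpenTelemetry SpanContext this sketch needs.
interface SpanContextLike {
  traceId: string;
  spanId: string;
}

type Attributes = Record<string, string | number | boolean>;

// Returns a copy of `attributes`, stamped with trace identifiers when a span is active.
function correlate(
  attributes: Attributes,
  getActiveSpanContext: () => SpanContextLike | undefined,
): Attributes {
  const ctx = getActiveSpanContext();
  // Outside any span the event is still emitted, just without trace fields.
  if (!ctx) return { ...attributes };
  return { ...attributes, "trace.id": ctx.traceId, "span.id": ctx.spanId };
}
```

Reading the context at emit time, rather than caching it, is what lets the same call produce correlated logs inside a checkout span and plain logs everywhere else.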
Using It in Your App
Installation:
npm install @anthropic/session-replay-browser-agent
Basic setup:
import {
  createSessionReplayProvider,
  createSessionLogProvider,
  setUser,
  enableAllInstrumentations
} from '@anthropic/session-replay-browser-agent';

// Initialize providers
createSessionReplayProvider({
  serviceName: 'my-app',
  endpoint: 'https://your-elastic.cloud:443/v1/traces',
  apiKey: 'your-api-key',
});

createSessionLogProvider({
  serviceName: 'my-app',
  endpoint: 'https://your-elastic.cloud:443/v1/logs',
  apiKey: 'your-api-key',
});

// Set user identity after login
setUser({
  id: 'david',
  email: '[email protected]',
  name: 'David Hope'
});

// Enable automatic instrumentation
enableAllInstrumentations();
That's it. Clicks, navigation, errors, and frustration signals are now being captured and sent to Elastic.
Querying in Elastic
ES|QL makes this data immediately useful:
// All events for a specific user
FROM logs-generic.otel-default
| WHERE attributes.user.id == "david"
| SORT @timestamp ASC

// Frustrated users in the last hour
FROM logs-generic.otel-default
| WHERE attributes.frustration.type IS NOT NULL AND @timestamp > NOW() - 1 hour
| STATS count = COUNT() BY attributes.user.id
| SORT count DESC

// Rage clicks by page
FROM logs-generic.otel-default
| WHERE attributes.frustration.type == "rage_click"
| STATS count = COUNT() BY attributes.page.url
| SORT count DESC
The data is structured for querying. You can ask questions that a video recording simply can't answer.
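These queries can also be run programmatically through Elasticsearch's ES|QL endpoint (POST /_query), which returns column metadata plus rows of values. The sketch below only builds the request and reshapes a response, so it makes no network call itself; the helper names are illustrative, and the base URL and API key are placeholders. It is meant for a backend or script, since an Elasticsearch API key should not ship to the browser.

```typescript
// Default (row-oriented) shape of an ES|QL response body.
interface EsqlResponse {
  columns: { name: string; type: string }[];
  values: unknown[][];
}

// Builds the URL and fetch-style options for POST <baseUrl>/_query.
function buildEsqlRequest(baseUrl: string, apiKey: string, query: string) {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/_query`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `ApiKey ${apiKey}`,
      },
      body: JSON.stringify({ query }),
    },
  };
}

// Zips column names with each row so results read like ordinary objects.
function rowsFromEsql(res: EsqlResponse): Record<string, unknown>[] {
  return res.values.map((row) => {
    const obj: Record<string, unknown> = {};
    res.columns.forEach((col, i) => {
      obj[col.name] = row[i];
    });
    return obj;
  });
}
```

A caller would pass url and init to fetch, parse the JSON body, and hand it to rowsFromEsql to get one object per user or per page.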
What's Next: The AI Layer
The instrumentation layer is done. Next: AI synthesis. Ask "what happened to user David on checkout?" and get a narrative that pulls from logs and traces.
Session IDs link events together. Trace IDs connect to backend operations. Frustration scores highlight important moments. An LLM can synthesize this into a story of what happened, faster than watching video.
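As a rough illustration of what the synthesis input could look like, the sketch below orders events by session.sequence and renders each one as a compact line an LLM can read. The line format and the function names are hypothetical; nothing here reflects an implemented AI layer.

```typescript
interface TimelineEvent {
  body: string;
  attributes: Record<string, string | number | boolean>;
}

// Renders one event as a short, human-readable line.
function describeEvent(e: TimelineEvent): string {
  const seq = e.attributes["session.sequence"];
  const target = e.attributes["target.semantic_name"];
  const score = e.attributes["frustration.score"];
  let line = `#${seq === undefined ? "?" : seq} ${e.body}`;
  if (target !== undefined) line += ` on "${target}"`;
  if (score !== undefined) line += ` (frustration ${score})`;
  return line;
}

// Orders events by session.sequence and joins them into a timeline.
function buildTimeline(events: TimelineEvent[]): string {
  const order = (e: TimelineEvent) => Number(e.attributes["session.sequence"] ?? 0);
  return events
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => order(a) - order(b))
    .map(describeEvent)
    .join("\n");
}
```

The resulting text is a few hundred bytes per session step, which is what makes handing an entire session to a model practical where a video would not be.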
Try It
The code is open source: github.com/davidgeorgehope/sessionreplay
Clone it, point it at your Elastic Cloud instance, and see semantic session events start flowing in. The demo app includes a load test that generates realistic user behavior including frustration signals.
This is still early. But the core thesis from my last post is holding up: structured semantic events + frustration detection is more useful than DOM recording. And it's privacy-friendly, storage-efficient, and queryable from day one.