LLMs Keep Hallucinating My URLs (So I Built a Fix)
I asked Claude to build me a publications page with links to my articles. It did a great job—clean layout, nice organization, even got the article titles right. But every single URL was wrong.
The URLs looked plausible. They followed Elastic's URL patterns. They just didn't exist. Claude had hallucinated them.
This is a known problem with LLMs, but it's particularly annoying with URLs because they're so easy to get wrong and so hard to notice. A hallucinated URL looks exactly like a real one until you click it and get a 404.
What Went Wrong
I asked Claude Code to create a page listing my Elastic blog posts. Claude knew I had written articles. It even knew the titles. But it didn't have the actual URLs, so it generated plausible-looking ones based on patterns from training data.
My article "The hidden costs of tool sprawl" became /blog/sre-guide-observability-tool-sprawl-consolidation. The real URL? /blog/guide-observability-consolidation. Close, but not close enough.
The frustrating part: Claude has web search. It could have looked up the real URLs. But the model generates tokens probabilistically—it doesn't "know" when it's making up a URL versus remembering one. It just predicts the next token based on context.
The Architectural Problem
The obvious fix—validate URLs after generation—has a problem: the output still routes through the LLM. Even if you validate and tell the model "that URL is wrong," it might hallucinate a different wrong URL in response.
The deeper issue is that LLMs generate text, not references. When you ask for a URL, the model doesn't look up a fact and return it—it predicts what characters would come next in a plausible response. That's fundamentally different from retrieval.
A few architectural approaches work:
- Structured references: The model outputs citation indices like [source:3], and a post-processor maps those to actual URLs from search results. The LLM never generates the URL string itself (see the sketch after this list).
- Two-phase workflow: First search for real URLs, store them in a reference file, then generate content using only URLs from that file.
- Output validation as a filter: Validate URLs in the final output and strip or flag invalid ones—without feeding corrections back to the model.
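To make the structured-references idea concrete, here's a minimal sketch of the post-processing step, assuming the search phase has already written a sources.txt file with one "<index> <url>" pair per line (the file name, format, and marker syntax are all illustrative, not a real API):
#!/usr/bin/env bash
# Illustrative post-processor: swap [source:N] markers for real URLs.
# Assumes sources.txt lines look like: 3 https://www.elastic.co/blog/some-post
DRAFT="${1:-draft.md}"
while read -r INDEX URL; do
  # Replace every [source:INDEX] marker with the URL gathered during search
  # (GNU sed; on macOS use `sed -i ''`)
  sed -i "s|\[source:${INDEX}\]|${URL}|g" "$DRAFT"
done < sources.txt
Because the model only ever emits the markers, there's no URL string for it to get wrong.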
The Solution: A Claude Code Hook
I went with the third option. Claude Code supports hooks—shell commands that execute at specific points in the agent's lifecycle. There are hooks for before/after tool calls, when sessions start, when the agent finishes responding, and more. They're an official extension mechanism, and they provide something crucial: deterministic behavior in a probabilistic system.
I wrote a hook that fires after any file write, extracts URLs, and validates them via HTTP HEAD requests.
If any URL returns a 404 (or fails to connect), the hook fails with an error message listing the broken links. Claude sees the error, searches for the real URLs, and fixes the file.
The validation doesn't re-inject into the model's generation. It blocks bad output and provides feedback. The model then has to find correct URLs, which it can do when forced to search.
# The hook validates URLs in .html, .astro, .md files
# Extracts URLs with grep, checks each with curl
# Fails if any return 4xx/5xx
if [ -n "$BROKEN_URLS" ]; then
echo "URL VALIDATION FAILED"
echo "Found $FAILED broken URL(s) in $FILE_PATH:"
echo -e "$BROKEN_URLS"
echo "Please search for the correct URLs."
exit 1
fi
Does It Work?
Yes. After I set up the hook, I asked Claude to add some links. It generated hallucinated URLs (old habits), the hook caught them, and Claude then searched for the real URLs and fixed the file. The feedback loop works.
It's not perfect. The hook runs after generation, so there's still wasted work. A better solution would enforce URL sourcing upfront—requiring the model to search before including any external link. But this catches the problem, and catching it is most of the battle.
The Takeaway
LLMs are good at many things, but retrieving arbitrary strings isn't one of them. URLs, phone numbers, API keys, version numbers—anything without semantic structure—will be unreliable.
Don't wait for the model to get better at memorizing. Architect around the limitation: structured references, validation, separation of generation from retrieval.
Or, in the spirit of observability: if you can't trust the output, instrument it. Measure what's actually happening. Add a check that fails loudly when something goes wrong.
That's basically what this hook does. It's observability for LLM outputs.
Try It Yourself
If you use Claude Code, you can add this hook to your project. Create .claude/settings.json:
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": ".claude/hooks/validate-urls.sh"
          }
        ]
      }
    ]
  }
}
Then add the validation script at .claude/hooks/validate-urls.sh. The full implementation is in this site's repo if you want to grab it.
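If you'd rather start from a sketch, a minimal version might look something like this. It assumes Claude Code passes the hook input as JSON on stdin with the edited file's path under tool_input.file_path, and that jq and curl are installed; the grep pattern is deliberately simplistic. Don't forget to make the script executable (chmod +x).
#!/usr/bin/env bash
# Minimal sketch of a URL-validating PostToolUse hook (not the exact script
# from the repo). Reads the hook's JSON input from stdin.
FILE_PATH=$(jq -r '.tool_input.file_path // empty')
[ -f "$FILE_PATH" ] || exit 0

# Only check content files
case "$FILE_PATH" in
  *.html|*.astro|*.md) ;;
  *) exit 0 ;;
esac

BROKEN_URLS=""
FAILED=0
# Extract http(s) URLs and probe each with a HEAD request, following redirects
for URL in $(grep -oE 'https?://[^"<> )]+' "$FILE_PATH" | sort -u); do
  STATUS=$(curl -sIL -o /dev/null -w '%{http_code}' --max-time 10 "$URL")
  if [ "$STATUS" -lt 200 ] || [ "$STATUS" -ge 400 ]; then
    BROKEN_URLS="${BROKEN_URLS}\n  ${URL} (HTTP ${STATUS})"
    FAILED=$((FAILED + 1))
  fi
done

if [ -n "$BROKEN_URLS" ]; then
  echo "URL VALIDATION FAILED"
  echo "Found $FAILED broken URL(s) in $FILE_PATH:"
  echo -e "$BROKEN_URLS"
  echo "Please search for the correct URLs."
  exit 1
fi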
Hooks run automatically. You don't have to remember to validate. That's how guardrails should work.