How AI Agents Will Discover Your Brand in 2026

SEO was the discoverability playbook for browsers. For AI agents, the signals are different — and mostly machine-readable. A practical checklist, from running inside the discovery layer.

April 15, 2026 · Ge Jiaqi · ~8 min read

If you run a brand or a merchant storefront, here is the awkward version of a question you already have: when a user asks Claude, ChatGPT, Perplexity, Cursor, or any of the hundreds of smaller agents "where should I buy X?", does your brand come up?

For most merchants in 2026, the honest answer is: sometimes, unpredictably, and mostly for the wrong reasons. That will keep being the case until merchants consciously build for agent discoverability — which is a different discipline from SEO, with a different audit list and a different set of measurable signals.

This post walks through what actually works, based on running a small piece of the discovery layer and watching which signals matter.

1. The sources an agent actually reads

Agents don't see the web the way a human search user does. They read three distinct surfaces, in roughly this order of preference:

The training corpus. The LLM backing the agent has a fixed-at-training view of your brand. If your site had good structured data two years ago, your brand is inside the model. If it didn't, you are mostly invisible there. You can't edit this; you can only influence the next training cycle.

Retrieval-augmented search. When an agent answers a question it isn't sure about, it runs a web search through whatever search provider its host has wired up — usually Google or Bing, sometimes Brave, sometimes a vertical search API. The search ranks pages, the agent re-reads the top handful, and the answer is synthesized. Classic SEO still matters here, but the signals that rank well for agents skew toward machine-readable structured data, not toward keyword-density marketing copy.

Direct tool calls. Modern agents increasingly have tool-calling capability — Model Context Protocol (MCP) is the open standard that has crystallized around this in 2025-2026. If your brand is reachable via an MCP server the agent has installed, the agent can ask it about you synchronously inside the conversation, with zero web-search latency.

Each of these surfaces rewards a different investment. Together they define the 2026 discoverability playbook.

2. The merchant-side checklist

Here is the short list we see working, in rough priority order.

a. schema.org on every product page

This is the table-stakes move. Every product detail page should carry schema.org/Product JSON-LD in the <head>, with name, brand, offers.price, offers.priceCurrency, offers.availability, and offers.areaServed for the regions you actually fulfil to. Product pages without this structure end up invisible to a retrieval-based agent, because there is nothing typed for the model to reason over.
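A minimal Product node of the shape described above might look like this (the name, price, and values are illustrative, not a template from any particular storefront):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Example Ceramic Mug",
  "brand": { "@type": "Brand", "name": "Example Brand" },
  "offers": {
    "@type": "Offer",
    "price": "24.90",
    "priceCurrency": "SGD",
    "availability": "https://schema.org/InStock",
    "areaServed": { "@type": "Country", "name": "SG" }
  }
}
</script>
```

The availability value should be one of the schema.org enumeration URLs (InStock, OutOfStock, PreOrder), not free text, so the agent gets a typed signal rather than a string to guess at.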

The category page and brand homepage should carry Organization and CollectionPage nodes with the same level of fidelity. sameAs should point to your verified social profiles and your canonical storefront URL.
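An Organization node with sameAs links, again with illustrative values:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Brand",
  "url": "https://www.example.com/",
  "sameAs": [
    "https://www.instagram.com/examplebrand",
    "https://www.linkedin.com/company/examplebrand"
  ]
}
</script>
```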

b. llms.txt at the site root

Jeremy Howard's llms.txt convention (2024) gave LLM crawlers a standard way to find a machine-readable site index — roughly what robots.txt did for search crawlers, but pointing at a curated list of pages worth reading rather than rules about pages to avoid. Agents increasingly check for /llms.txt before doing a deep crawl. If yours isn't there, they fall back to guessing your sitemap.

Your llms.txt should list your main category pages, your brand pages, a product-feed reference if you have one, and any documentation that explains what you sell and who you serve. See ours at xurprise.ai/llms.txt as a small reference.
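An illustrative llms.txt, following the convention's markdown shape (an H1 title, a blockquote summary, then H2 sections of links); every URL and description here is a placeholder:

```markdown
# Example Brand

> Direct-to-consumer ceramics, shipping to Singapore and Malaysia.

## Catalogue
- [Mugs](https://www.example.com/mugs): full mug range with prices
- [Product feed](https://www.example.com/feed.json): machine-readable catalogue

## About
- [Who we serve](https://www.example.com/about): brand story and fulfilment regions
```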

c. Permissive robots.txt for agent crawlers

A surprising number of merchants have robots.txt that blocks GPTBot, ClaudeBot, PerplexityBot, and Google-Extended. Every rule you add there is a door you close. Unless you have a specific reason to block one of these crawlers (intellectual-property concern, licensing ambiguity, compute cost), consider them the same as Googlebot: let them in.
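A permissive stanza for the crawlers named above might look like this (adjust to your own policy):

```txt
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /
```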

d. An MCP endpoint if you want to be in the first page of answers

The single highest-leverage move in 2026 is publishing an MCP server that exposes your product catalogue as a typed tool. An agent that has your MCP installed will, with near certainty, call yours before it tries a web search — because tool calls are lower latency, more reliable, and the agent's host has already paid for the integration. You want to be in the call-before-web-search slot, not the call-after-web-search slot.

If running your own MCP server is more infrastructure than your team wants to take on, you can show up in someone else's. xurprise is one such aggregator-of-agent-surfaces; the Official MCP Registry lists many more.
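What the tool behind such an endpoint does is conceptually simple: a typed, filterable catalogue lookup. The sketch below is not the MCP SDK itself, just the kind of handler function a server would wrap, with a hypothetical in-memory catalogue standing in for your commerce backend:

```python
from dataclasses import dataclass

@dataclass
class Product:
    sku: str
    name: str
    price: float
    currency: str
    regions: tuple[str, ...]  # ISO country codes the offer serves
    in_stock: bool

# Hypothetical catalogue; a real server would query your commerce backend.
CATALOGUE = [
    Product("MUG-01", "Ceramic Mug", 24.90, "SGD", ("SG", "MY"), True),
    Product("MUG-02", "Travel Mug", 39.00, "SGD", ("SG",), False),
]

def search_products(region: str, max_price: float) -> list[dict]:
    """The typed lookup an MCP tool handler could wrap: in-stock items
    served to `region` at or under `max_price`."""
    return [
        {"sku": p.sku, "name": p.name, "price": p.price, "currency": p.currency}
        for p in CATALOGUE
        if p.in_stock and region in p.regions and p.price <= max_price
    ]
```

The point of the typed signature is that the agent can call it with structured arguments ("region R, under price P") and get back structured rows, instead of scraping a rendered page.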

e. Agent-readable region signals

One of the biggest failure modes we see: product pages that don't declare the regions they actually serve. The agent is left to infer them from address clues, from currency symbols, from shipping-estimate widgets rendered in JavaScript it can't run. If you only ship to Singapore, say so in structured data. If you ship to Malaysia and Indonesia but not Vietnam, say that too. Offer.eligibleRegion and Offer.areaServed are the fields. Region signals decide whether a Singapore user's agent recommends you or skips you.
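In JSON-LD the region fields look like this (illustrative values; schema.org also accepts Place or GeoShape here, but ISO country codes are the simplest form):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Offer",
  "price": "24.90",
  "priceCurrency": "SGD",
  "eligibleRegion": ["SG", "MY", "ID"],
  "ineligibleRegion": "VN"
}
</script>
```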

f. Stable URLs and canonical tags

Agents hate ambiguity. If your product URL rotates based on tracking parameters, you will appear in the retrieval index as several different pages, none of which rank well, all of which look to the agent like near-duplicates. Pick one canonical URL per SKU. Use <link rel="canonical">. Keep the URLs short and human-readable.
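Canonicalization can also be enforced at the application layer before URLs ever reach a sitemap or feed. A minimal sketch, assuming a hypothetical list of tracking parameters to strip (extend it to match your own campaign stack):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical tracking parameters to strip; extend for your marketing stack.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "utm_term",
                   "utm_content", "gclid", "fbclid"}

def canonical_url(url: str) -> str:
    """One stable URL per SKU: drop tracking params, sort what remains,
    and normalize the trailing slash."""
    scheme, netloc, path, query, _fragment = urlsplit(url)
    kept = sorted((k, v) for k, v in parse_qsl(query) if k not in TRACKING_PARAMS)
    return urlunsplit((scheme, netloc, path.rstrip("/") or "/", urlencode(kept), ""))
```

The same canonical form should be what you emit in `<link rel="canonical">`, so the crawler-visible and application-visible versions never disagree.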

g. Direct citations from indexable content

Link equity still matters. If your brand is mentioned in a well-indexed blog post, a Reddit comment, a review roundup, or a vertical community, the agent is more likely to surface you. This is the one thing on the checklist that still looks like "PR" rather than "engineering" — but now the PR target is not the user, it's the training corpus of the next model.

3. The signals that will matter in 18 months

Three things we think are coming that aren't standard yet:

Real-time product feeds exposed via MCP. Most merchants today publish static sitemaps. Within 18 months, serious e-commerce brands will expose a streaming, typed MCP endpoint that an agent can query with "what's in stock in region R under price P" and get a tight answer. Merchants that build this first will be recommended ahead of merchants that don't.

Agent-native loyalty signals. Agents are learning which merchants deliver when they say they will, which ones have honest stock levels, and which ones handle returns cleanly. Those signals will feed into recommendation weightings. Reputation data is about to become as important to agents as it already is to humans — but it will be gathered differently, through API-accessible fulfilment and review signals.

Structured multilingual availability. For merchants that serve multi-language markets (Southeast Asia, Switzerland, Canada, India, most of Africa), being legible in each language's native script turns out to matter a lot. An agent serving a user in Thai doesn't want to get back English-only descriptions. Multilingual indexing is going to stop being a nice-to-have.

The framing shift that matters most: you are no longer writing for a user, you are writing for the agent that is writing for the user. Your copy is still read by humans eventually, but the gatekeeper has moved up a layer.

4. Measuring it

Classic SEO had clear metrics — rankings, impressions, click-through rates. Agent discoverability is harder to measure because the agent's conversation with the user is private. Three proxy signals work surprisingly well:

  1. Named-agent crawler hits. Watch your Cloudflare / CDN analytics for GPTBot, ClaudeBot, PerplexityBot, Google-Extended. Rising crawler traffic is the leading indicator of rising training / retrieval presence.
  2. Referrer-free clicks from tracked domains. When agents surface your link and the user clicks, the referer header is often empty or from the agent host. A rising share of "direct-ish" traffic on product pages that don't have any marketing campaign attached is almost always agent-sourced.
  3. Direct-question testing. Once a week, open Claude / ChatGPT / Perplexity and ask them directly, in the categories you compete in, where to buy X. Does your brand show up? In what position? Over time, is it trending up or down? This is the qualitative version of keyword-rank tracking.
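Signal 1 doesn't need a dashboard; raw access logs are enough. A minimal sketch that counts hits from the crawlers named above, assuming the crawler's user-agent token appears somewhere in each log line:

```python
from collections import Counter

# Named agent crawlers to watch for; extend this tuple as new ones appear.
AGENT_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended")

def count_agent_hits(log_lines):
    """Count access-log lines whose user-agent names a known agent crawler."""
    hits = Counter()
    for line in log_lines:
        for bot in AGENT_BOTS:
            if bot in line:
                hits[bot] += 1
                break  # attribute each line to at most one crawler
    return hits
```

Run weekly over the same log window and the trend line, not the absolute count, is the number to watch.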

5. Where to start

If your team has a week, start with the items that pay off fastest:

  1. Audit robots.txt and unblock the agent crawlers you have no specific reason to exclude.
  2. Add schema.org/Product JSON-LD, with price, availability, and regions served, to every product page.
  3. Publish /llms.txt at the site root, listing your category and brand pages.
  4. Pick one canonical URL per SKU and add <link rel="canonical"> to everything else.
  5. Start watching your CDN analytics for named-agent crawler hits.

None of these are speculative. They all return measurable results within a crawl cycle or two. Start there.


— Ge Jiaqi · April 15, 2026