AI Visibility Checklist: The 16 Checks Every Website Needs in 2026

The complete AI visibility checklist for 2026: 16 concrete checks across Technical SEO, Content, Structured Data, GEO Readiness, and AI Crawler Access to make sure ChatGPT, Perplexity, and Google AI can find and cite your site.

Why a Checklist Matters in 2026

Most websites are invisible to AI search engines, and their owners do not know it. They check Google rankings, ignore everything else, and wake up one day to find that ChatGPT, Perplexity, and Google AI Overviews never mention them.

This is not a vague risk. Over 60% of websites accidentally block at least one AI crawler, and the vast majority lack the structured data, citability signals, and entity clarity that AI models need to recommend a business confidently.

This checklist breaks down the 16 most important AI visibility checks for 2026. You can run them manually using the instructions below, or use our free AI Exposure audit to run all 16 in 60 seconds.

Category 1: Technical SEO (4 Checks)

The foundation. If AI crawlers cannot reach your site or parse it cleanly, nothing else matters.

☐ 1. robots.txt allows AI crawlers and references sitemap

Your robots.txt should not block GPTBot, ClaudeBot, PerplexityBot, Google-Extended, or any other AI crawler. It should also reference your sitemap.

User-agent: *
Allow: /

Sitemap: https://yoursite.com/sitemap.xml

See our full guide to AI crawlers for details on all 11 major bots.

☐ 2. sitemap.xml exists and lists all important pages

Publish a valid /sitemap.xml with <lastmod> dates on every URL. Submit it to Google Search Console and Bing Webmaster Tools so crawlers discover updates fast.
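
A minimal sketch of what that file looks like (the URLs and dates are placeholders):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://yoursite.com/</loc>
    <lastmod>2026-01-15</lastmod>
  </url>
  <url>
    <loc>https://yoursite.com/pricing</loc>
    <lastmod>2026-01-10</lastmod>
  </url>
</urlset>

Most CMSs and static site generators can generate this automatically; the main thing to verify is that <lastmod> actually updates when the page changes.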

☐ 3. Canonical URL set on every page

Every page should declare its canonical URL:

<link rel="canonical" href="https://yoursite.com/page-path" />

This prevents duplicate-content confusion when AI models compare versions of your page.

☐ 4. Open Graph tags present

Helps social platforms and some AI engines understand your page identity:

<meta property="og:title" content="..." />
<meta property="og:description" content="..." />
<meta property="og:image" content="..." />

Category 2: Content Quality (4 Checks)

AI models prefer content that is clear, factual, and citable. Marketing fluff gets ignored.

☐ 5. Exactly one H1 that describes the page

Each page should have a single <h1> that clearly describes what the page is about. Multiple H1s confuse AI parsing.

☐ 6. At least 1,000 words of informative content on key pages

Pages with under 300 words are routinely deprioritized by AI engines because there is not enough context to cite from. Aim for 1,000+ words on your homepage and key landing pages.

☐ 7. FAQ section with 5+ questions

A clear FAQ section, ideally with FAQPage schema, gives AI engines ready-made Q&A pairs to surface in their answers. This is one of the highest-ROI signals.

☐ 8. Marketing-to-information ratio under 2%

Pages dominated by marketing phrases like “world-class,” “industry-leading,” or “innovative solutions” are penalized. AI models reward fact-rich content with specific numbers, dates, and concrete claims.

Category 3: Structured Data (3 Checks)

Schema.org markup gives AI engines a machine-readable map of your business. See our structured data guide for full code examples.

☐ 9. Organization schema with sameAs links

Add JSON-LD Organization schema on your homepage with sameAs links to LinkedIn, Twitter, Crunchbase, Wikipedia, and any other authoritative profile. This is the single highest-impact addition for AI entity recognition.
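
A minimal sketch (the company name, URLs, and profile links are placeholders; swap in your own):

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Your Company",
  "url": "https://yoursite.com",
  "logo": "https://yoursite.com/logo.png",
  "sameAs": [
    "https://www.linkedin.com/company/yourcompany",
    "https://twitter.com/yourcompany",
    "https://www.crunchbase.com/organization/yourcompany"
  ]
}
</script>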

☐ 10. WebSite schema with SearchAction

A WebSite schema with a SearchAction lets AI engines understand how to send users to a search on your site. Especially valuable for content-heavy sites.
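
One common shape, with yoursite.com as a placeholder (the {search_term_string} token is part of the schema.org convention, not something to replace):

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "WebSite",
  "url": "https://yoursite.com",
  "potentialAction": {
    "@type": "SearchAction",
    "target": "https://yoursite.com/search?q={search_term_string}",
    "query-input": "required name=search_term_string"
  }
}
</script>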

☐ 11. FAQPage schema on FAQ content

If you have an FAQ section (check 7), wrap it in FAQPage JSON-LD so AI engines can pull individual Q&A pairs directly into their answers.
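
A minimal sketch with one placeholder Q&A pair (add one Question entry per FAQ item, with the text matching what is visible on the page):

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is AI visibility?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "AI visibility is how easily AI search engines can find, parse, and cite your website."
      }
    }
  ]
}
</script>

You can validate the markup with Google's Rich Results Test or the Schema.org validator.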

Category 4: GEO Readiness (3 Checks)

Generative Engine Optimization signals specific to AI search. These are what differentiate a site that gets cited from one that gets ignored.

☐ 12. llms.txt file at /llms.txt

A machine-readable summary of your site at yoursite.com/llms.txt. Acts as an “elevator pitch” AI models can fall back on. See our llms.txt guide for templates.
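
A minimal sketch following the llms.txt proposal (the name, description, and links are placeholders):

# Your Company

> Your Company is a [type] that helps [audience] to [benefit].

## Key pages

- [Pricing](https://yoursite.com/pricing): plans and pricing details
- [Docs](https://yoursite.com/docs): product documentation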

☐ 13. Clear entity description in the first section of the homepage

AI engines need to understand who you are in one sentence. Your homepage should clearly state: “X is a [type] that helps [audience] to [benefit].” No marketing fluff — just a clean factual definition.

☐ 14. At least 5 citable blocks (facts, statistics, definitions)

Pages should contain self-contained, fact-rich paragraphs (130-170 words each) with specific numbers, dates, or definitions. These are what AI models quote when answering user questions.

Category 5: AI Crawler Access (2 Checks)

Even with perfect content, a blocked crawler means zero visibility.

☐ 15. All Tier 1 AI bots explicitly allowed

The most important bots to check individually:

Bot             Company      Role
GPTBot          OpenAI       ChatGPT training + browsing
OAI-SearchBot   OpenAI       ChatGPT search results
ChatGPT-User    OpenAI       Live ChatGPT browsing
ClaudeBot       Anthropic    Claude content access
PerplexityBot   Perplexity   Perplexity citations

None of these should appear under Disallow in your robots.txt.
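
If you prefer to allow them explicitly rather than rely on a wildcard rule, a robots.txt sketch like this works (Tier 2 bots such as Google-Extended follow the same pattern):

User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

Note that crawlers obey the most specific matching User-agent group, so once you add a named group for a bot, that group (not the wildcard) must contain the Allow rule.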

☐ 16. Google-Extended and major Tier 2 bots allowed

Google-Extended controls whether your content appears in Google AI Overviews and Gemini. Blocking it has zero impact on Google Search rankings but kills your AI Overviews visibility. Also check Applebot-Extended (Siri), Bytespider (TikTok AI), and CCBot (Common Crawl, used by many models).

How to Run This Checklist in 60 Seconds

You can go through these 16 checks manually — open robots.txt, inspect your HTML, validate schemas, count citable paragraphs — but it takes a few hours per site.

Or you can run a free AI Exposure audit and get all 16 results in under a minute, plus a prioritized action plan with step-by-step fixes and code examples for everything that fails.

What the Best Sites Get Right

The websites that AI engines consistently cite share five traits:

  1. They were intentional about GEO from day one instead of bolting it on later
  2. They publish structured data on every important page
  3. They include an llms.txt file describing their business clearly
  4. They never block AI crawlers — see our full crawler guide
  5. They write fact-rich content with specific numbers and citations

You do not need to be a Fortune 500 company to get cited by AI. You need to be discoverable, citable, and clearly scoped to your topic.


Want to know exactly which of these 16 checks your site passes or fails? Run a free AI Exposure audit — get your score across all 16 checks in 60 seconds, with a prioritized action plan including step-by-step fixes.
