llms.txt: The New robots.txt for AI Crawlers

llms.txt is a plain-text Markdown file placed at the root of a website (yourdomain.com/llms.txt) that tells AI language models which pages are most relevant for training, indexing, and citation. It was proposed in September 2024 by Jeremy Howard of Answer.AI (also a co-founder of fast.ai) and has since been adopted by hundreds of sites, including Cloudflare, Anthropic, and Perplexity.

The analogy to robots.txt is intentional but imprecise. robots.txt tells crawlers which paths they may not crawl. llms.txt tells AI systems what to prioritise: it is an opt-in, affirmative signal, not a restriction mechanism. A site without an llms.txt is still crawlable; a site with a well-structured one gives AI systems a curated map that increases the likelihood the right content is surfaced.

Which AI crawlers respect llms.txt

Adoption is growing but uneven. As of mid-2025, confirmed support includes:

| Crawler | Operator | llms.txt support |
| --- | --- | --- |
| ClaudeBot | Anthropic | Yes |
| OAI-SearchBot | OpenAI | Partial (robots.txt primary) |
| PerplexityBot | Perplexity AI | Yes |
| GoogleOther | Google | Under evaluation |
| Applebot-Extended | Apple | No |
| ChatGPT-User | OpenAI | Partial |

The llms.txt format

The file is written in plain Markdown. In the llmstxt.org specification only the H1 title is strictly required; the blockquote summary and the H2 sections of links are optional. The building blocks are:

| Syntax | Meaning |
| --- | --- |
| # Site name | H1: the name of your site or product |
| > Tagline | Blockquote: one-sentence description used as context |
| ## Section | H2: groups related pages together |
| - [Title](URL): description | Linked list items: pages with brief descriptions |
| ## Optional | Sections for: docs, API reference, examples |
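
To make the format concrete, here is a minimal Python sketch of how a consumer might read these building blocks. It is an illustration under assumptions, not a reference parser: the names `LlmsTxt`, `LINK_ITEM`, and `parse_llms_txt` are hypothetical, and treating indented lines as continuations of the previous description is our own choice for handling wrapped list items.

```python
import re
from dataclasses import dataclass, field

# Matches a list item of the form "- [Title](URL)" with an optional ": description".
LINK_ITEM = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$"
)


@dataclass
class LlmsTxt:
    title: str = ""
    summary: str = ""
    # Maps an H2 section name to a list of (title, url, description) tuples.
    sections: dict = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsTxt:
    """Parse the H1, the blockquote summary, and the H2 link sections."""
    doc = LlmsTxt()
    current = None  # name of the H2 section currently being read
    summary_lines = []
    for raw in text.splitlines():
        line = raw.rstrip()
        if line.startswith("# ") and not doc.title:
            doc.title = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif line.startswith(">") and current is None:
            summary_lines.append(line[1:].strip())
        elif current is not None:
            match = LINK_ITEM.match(line.strip())
            if match:
                doc.sections[current].append(
                    (match["title"], match["url"], match["desc"] or "")
                )
            elif raw.startswith((" ", "\t")) and line.strip() and doc.sections[current]:
                # Assumption: an indented line continues the previous description.
                title, url, desc = doc.sections[current][-1]
                doc.sections[current][-1] = (title, url, f"{desc} {line.strip()}".strip())
    doc.summary = " ".join(summary_lines)
    return doc
```

Calling `parse_llms_txt(open("llms.txt").read())` on a file in this shape should give you the site name, the tagline, and one list of links per section, which is roughly the "curated map" an AI system would work from.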

A template for marketing sites

Here is the structure we use on client projects. The goal is to give AI systems enough context to understand what the company does, who it serves, and where the authoritative content lives — in under 150 lines.

# Nous Frame

> Independent web design studio. We design, build, and maintain
> conversion-focused websites with editorial craft and technical precision.

Nous Frame works with ambitious brands — primarily in tech, finance,
and professional services — to ship websites that combine visual
excellence with measurable commercial outcomes.

## Services

- [Web Design & Development](/services): Custom-built marketing sites,
  landing pages, and web applications. No page builders.
- [SEO & GEO Optimisation](/services#seo): Technical SEO, Core Web
  Vitals optimisation, and Generative Engine Optimisation for AI search.
- [Ongoing Maintenance](/services#maintenance): Hosting, security, and
  iterative improvement post-launch.

## Resources

- [What is GEO](/resources/geo-vs-seo): How Generative Engine
  Optimisation differs from classic SEO and why both are now required.
- [Core Web Vitals 2026](/resources/core-web-vitals-2026): The three
  metrics Google ranks on and how to hit 90+ on mobile.
- [Schema.org for marketing sites](/resources/schema-org-marketing):
  Minimum viable structured data graph for a professional services site.

## Optional

- [llms.txt](/llms.txt): This file
- [Sitemap](/sitemap.xml): Full site index

llms-full.txt: the extended variant

There is also an llms-full.txt variant that includes the full text of key pages rather than just links. This is particularly useful for documentation-heavy sites where AI systems frequently need the full content of a reference page (API docs, technical specs, policy documents). For most marketing sites, the standard llms.txt is sufficient: the AI crawler will follow the links and retrieve the content itself.
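
If you do want an llms-full.txt, it can be generated from the same page list. The Python sketch below shows one possible layout, assuming you already have each page's body as Markdown; the function name `build_llms_full`, the "Source:" line, and the one-H2-per-page structure are our assumptions, not a layout prescribed by the specification.

```python
def build_llms_full(site_name: str, summary: str, pages: list) -> str:
    """Concatenate page bodies into a single Markdown document.

    pages is a list of (title, url, markdown_body) tuples.
    """
    parts = [f"# {site_name}", "", f"> {summary}", ""]
    for title, url, body in pages:
        # One H2 per page, followed by where the text came from and the text itself.
        parts.extend([f"## {title}", "", f"Source: {url}", "", body.strip(), ""])
    return "\n".join(parts)
```

Page bodies that contain their own H1 or H2 headings would need their heading levels shifted down first; this sketch does not handle that.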

What llms.txt does not do

Adding an llms.txt file does not guarantee your content will be cited. It is a discoverability signal, not a ranking guarantee. The quality, specificity, and authority of the content on the pages you list determine citation frequency. An llms.txt pointing to thin, vague content will not move the needle. Think of it as the index at the front of a textbook: it only helps if the chapters are worth reading.

It also does not prevent AI training on your content. To opt out of AI training crawls, you still need robots.txt directives targeting specific user agents (for example, the line "User-agent: CCBot" followed on the next line by "Disallow: /" blocks Common Crawl's crawler). These are separate mechanisms with separate purposes.
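
You can confirm what a robots.txt rule actually blocks with Python's standard-library `urllib.robotparser`. The sketch below uses the CCBot rule from the example above; the helper name `allowed` and the example.com URLs are placeholders for illustration.

```python
from urllib.robotparser import RobotFileParser

# The opt-out rule for Common Crawl, written as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed(ROBOTS_TXT, "CCBot", "https://example.com/llms.txt"))      # False
print(allowed(ROBOTS_TXT, "ClaudeBot", "https://example.com/llms.txt"))  # True
```

The rule blocks CCBot everywhere, including from llms.txt itself, while other crawlers are unaffected, which illustrates that the two files operate independently.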
