# /llms.txt—a proposal to provide information to help LLMs use websites
Jeremy Howard
2024-09-03

> *Much of this content is copied from the [llms.txt
> proposal](https://llmstxt.org/). See that link for more details,
> examples, and tools for using llms.txt files.*

## Summary

We propose adding a `/llms.txt` file to websites that are designed for
reading by language models, not just humans. `llms.txt` is a file that
outlines the information that a model may want to retrieve (with links)
when assembling context for LLM prompts relevant to a website. Here’s an
[example llms.txt file](https://docs.fastht.ml/llms.txt).

The `llms.txt` file may contain both external and internal links, which
are curated to be appropriate for the subject matter, and suitable for
direct ingestion by LLMs. The problem this solves is that today,
constructing the right context for LLMs based on a website is ambiguous
— do you:

1.  Crawl the sitemap and include every page, trying to automatically
    format into an LLM-friendly form?
2.  Selectively include external links in addition to the sitemap?
3.  For specific domains like software documentation should you also try
    to include all the source code?

Site authors know best, and can provide a list of content that an LLM
should use. We do not prescribe exactly how you should assemble and
format the context. We’re only proposing a standard where authors can
outline the most pertinent and LLM-friendly context, and tools can later
assemble and format that context in a form suitable for the application.
`llms.txt` is programmatically parseable.

We also propose making an LLM-friendly version of web pages available at
a URL adds an `.md` suffix to the regular address—for instance, here is
a [regular HTML docs
page](https://docs.fastht.ml/tutorials/by_example.html), along with
exact same URL but with [a .md
extension](https://docs.fastht.ml/tutorials/by_example.html.md). For an
example of how these can work together, here’s [a custom
GPT](https://chatgpt.com/g/g-rnbNNmdz6-answers-about-answer-ai) which
answers questions about Answer.AI using the information from
[Answer.AI’s llms.txt](https://www.answer.ai/llms.txt).

## Background

Today websites are not just used to provide information to people, but
they are also used to provide information to large language models. For
instance, language models are often used to enhance development
environments used by coders, with many systems including an option to
ingest information about programming libraries and APIs from website
documentation.

Providing information for language models is a little different to
providing information for humans, although there is plenty of overlap.
Language models generally like to have information in a more concise
form. This can be more similar to what a human expert would want to
read. Language models can ingest a lot of information quickly, so it can
be helpful to have a single place where all of the key information can
be collated.

## Proposal

<figure>
<img src="llms-logo.png" class="lightbox floatr" width="150"
alt="llms.txt logo" />
<figcaption aria-hidden="true">llms.txt logo</figcaption>
</figure>

We propose that those interested in providing LLM-friendly content add a
`/llms.txt` file to their site. This is a markdown file that provides
brief background information and guidance, along with links to markdown
files (which can also link to external sites) providing more detailed
information. This can be used, for instance, in order to provide
information necessary for coders to use a library, or as part of
research to learn about a person or organization and so forth. You are
free to use the `llms.txt` logo on your site to indicate your support if
you wish.

llms.txt markdown is human and LLM readable, but is also in a precise
format allowing fixed processing methods (i.e. classical programming
techniques such as parsers and regex). For instance, there is an
[llms-txt](../intro.html) project providing a CLI and Python module for
parsing `llms.txt` files and generating LLM context from them. There is
also a [sample JavaScript](../llmstxt-js.html) implementation.

We furthermore propose that pages on websites that have information that
might be useful for LLMs to read provide a clean markdown version of
those pages at the same URL as the original page, but with `.md`
appended. (URLs without file names should append `index-commonmark.md`
instead.)

The [FastHTML project](https://fastht.ml) follows these two proposals
for its documentation. For instance, here is the [FastHTML docs
llms.txt](https://docs.fastht.ml/llms.txt). And here is an example of a
[regular HTML docs
page](https://docs.fastht.ml/tutorials/by_example.html), along with
exact same URL but with [a .md
extension](https://docs.fastht.ml/tutorials/by_example.html.md). Note
that all [nbdev](https://nbdev.fast.ai/) projects now create .md
versions of all pages by default, and all Answer.AI and fast.ai software
projects using nbdev have had their docs regenerated with this
feature—for instance, see the [markdown
version](https://fastcore.fast.ai/docments.html.md) of [fastcore’s
docments module](https://fastcore.fast.ai/docments.html).

This proposal does not include any particular recommendation for how to
process the file, since it will depend on the application. For example,
FastHTML automatically builds a new version of two markdown files
including the contents of the linked URLs, using an XML-based structure
suitable for use in LLMs such as Claude. The two files are:
[llms-ctx.txt](https://docs.fastht.ml/llms-ctx.txt), which does not
include the optional URLs, and
[llms-ctx-full.txt](https://docs.fastht.ml/llms-ctx-full.txt), which
does include them. They are created using the
[`llms_txt2ctx`](https://llmstxt.org/intro.html#cli) command line
application, and the FastHTML documentation includes information for
users about how to use them.

llms.txt files can be used in various scenarios. For software libraries,
they can provide a structured overview of documentation, making it
easier for LLMs to locate specific features or usage examples. In
corporate websites, they can outline organizational structure and key
information sources. Information about new legislation and necessary
background and context could be curated in an `llms.txt` file to help
stakeholders understand it.

llms.txt files can be adapted for various domains. Personal portfolio or
CV websites could use them to help answer questions about an individual.
In e-commerce, they could outline product categories and policies.
Educational institutions might use them to summarize course offerings
and resources.

## Format

At the moment the most widely and easily understood format for language
models is Markdown. Simply showing where key Markdown files can be found
is a great first step. Providing some basic structure helps a language
model to find where the information it needs can come from.

The `llms.txt` file is unusual in that it uses Markdown to structure the
information rather than a classic structured format such as XML. The
reason for this is that we expect many of these files to be read by
language models and agents. Having said that, the information in
`llms.txt` follows a specific format and can be read using standard
programmatic-based tools.

The `llms.txt` file spec is for files located in the root path
`/llms.txt` of a website (or, optionally, in a subpath). A file
following the spec contains the following sections as markdown, in the
specific order:

- An H1 with the name of the project or site. This is the only required
  section
- A blockquote with a short summary of the project, containing key
  information necessary for understanding the rest of the file
- Zero or more markdown sections (e.g. paragraphs, lists, etc) of any
  type except headings, containing more detailed information about the
  project and how to interpret the provided files
- Zero or more markdown sections delimited by H2 headers, containing
  “file lists” of URLs where further detail is available
  - Each “file list” is a markdown list, containing a required markdown
    hyperlink `[name](url)`, then optionally a `:` and notes about the
    file.

Here is a mock example:

``` markdown
# Title

> Optional description goes here

Optional details go here

## Section name

- [Link title](https://link_url): Optional link details

## Optional

- [Link title](https://link_url)
```

Note that the “Optional” section has a special meaning—if it’s included,
the URLs provided there can be skipped if a shorter context is needed.
Use it for secondary information which can often be skipped.

To create effective `llms.txt` files, consider these guidelines: Use
concise, clear language. When linking to resources, include brief,
informative descriptions. Avoid ambiguous terms or unexplained jargon.
Run a tool that expands your `llms.txt` file into an LLM context file
and test a number of language models to see if they can answer questions
about your content.
