Large Language Models are everywhere and are getting better at understanding the web almost in real time.
However, because their context windows are limited, they might miss key information about a website amid ads, scripts, banners, and other content that is irrelevant to what the page is actually about.
That's where the llms.txt file comes in: it lets us offer a compressed version of our site, or of individual pages, in a format that LLMs easily understand: Markdown.
In this article, we will learn how to add an llms.txt file to a Rails application, along with some best practices.
Before anything, let's learn what it is and the reasons to add it. If you're already familiar with those, skip to the application setup.
What is the llms.txt file
At its core, it's a Markdown file located at /llms.txt which aims to offer LLM-friendly content. It is supposed to add relevant background information to help LLMs understand what our site or a given page is all about.
The main benefits of using the file are:
- Efficiency: LLMs can produce better responses at inference time even if they weren't trained on the data in the llms.txt file.
- Improved context: because context windows are one of the limitations of LLMs, handing them only relevant information makes better use of the context they can hold.
- Better user experience: as we improve the way LLMs learn about our content, users can get a better experience because it's easier to match their intent to our offerings.
- Machine and human consumable content: providing a simplified overview of our sites can help LLMs, but it can also help humans find what they're looking for.
Currently, it's a proposed standard, so things might change in the future, but the core concept probably won't.
The standard considers that the llms.txt file will mainly be useful for inference, that is, at the time the user is seeking assistance, and not so much for training. However, the file can become part of the training data in the future.
Why add it
Adding an llms.txt file to your apps and websites is not only about helping LLMs by providing a friendlier format.
Just like with traditional on-page SEO, the goal is to partially control the information about our sites or products that people can access. We might want to influence our product's messaging, pointing out the features we consider important, or even talk about our product's weaknesses or disadvantages.
We can also use the file to point LLMs toward documentation or significant information that can get confusing to parse, especially if that information changes or if we have to keep older versions of documentation lying around for compatibility purposes.
Further reading: Adding a sitemap to a Rails application
According to an AI traffic study by Semrush, traditional SEO factors are currently driving large portions of visibility in LLMs. But we don't know for how long. They actually predict that organic search visitors will decrease in favor of LLM visitors, which, according to them, are currently 4.4 times as valuable.
Even though there's no certainty that current models actually use the file, the fact that an increasing amount of traffic, and conversions, is being driven by LLMs means that we should do our best to provide information that increases our chances of appearing in LLM suggestions.
Format
The llms.txt file doesn't use a structured format like sitemaps, mainly because language models and agents can interpret information that doesn't follow a strict pattern.
To make it lightweight and readable by traditional programmatic tools, llms.txt uses Markdown.
The proposed format, which seems to be loosely followed at the moment, looks like the following:
- A level 1 heading with the name of the site. It's the only required part of the file.
- A summary of the project: included right below the main heading using a blockquote (>). It's useful to let LLMs know about our product while letting potential customers or users know if our product can be a good fit.
- Zero or more Markdown sections: below the summary, we can include more paragraphs, lists, or even links to give more detailed information about the project and how to interpret the file. For example, if we have the detailed llms-full.txt file, we might include a reference to it here.
- Zero or more H2-delimited sections: these can be thought of as containers to better organize the information we want to provide. The standard proposes that each section should contain a list of URLs with some further details.
- A skippable "Optional" section: this section should also be a list of links, but it's understood that LLMs might ignore it if including it would exceed their context window.
An example for our imaginary podcasting platform could look like this:
# Harmonic
> Harmonic delivers crystal-clear audio, seamless publishing, and powerful analytics for podcasters. Instantly launch, grow, and monetize your show with intuitive tools, customizable branding, and wide distribution.
For complete documentation in a single file, see [Full documentation](https://harmonic.test/llms-full.txt).
## Features
- [Seamless uploading](https://harmonic.test/uploading)
- [Advanced analytics](https://harmonic.test/analytics)
- [Customizable branding](https://harmonic.test/branding)
## Optional
- [Podcast hosting platform](https://harmonic.test/blog/podcast-hosting-platform)
- [Audio setup for podcasting](https://harmonic.test/blog/audio-setup-for-podcasting)
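To make the structure more concrete, here is a minimal sketch of how a consumer might read such a file. It is not part of the proposal: `LlmsTxtDoc`, `parse_llms_txt` and `essential_links` are hypothetical names, and the parser only understands the loose layout described above (one H1, an optional blockquote, H2 sections holding link lists).

```ruby
# Hypothetical reader for the loose llms.txt layout; names are illustrative only.
LlmsTxtDoc = Struct.new(:title, :summary, :sections, keyword_init: true)

# Matches list items such as "- [Label](https://example.test/path)".
LINK_ITEM = /\A-\s*\[(?<label>[^\]]+)\]\((?<url>[^)\s]+)\)/

def parse_llms_txt(text)
  doc = LlmsTxtDoc.new(title: nil, summary: nil, sections: {})
  current = nil # name of the H2 section we are currently inside, if any

  text.each_line(chomp: true) do |raw|
    line = raw.strip
    case line
    when /\A# (.+)/ then doc.title ||= Regexp.last_match(1)
    when /\A## (.+)/
      current = Regexp.last_match(1)
      doc.sections[current] = []
    when /\A> (.+)/ then doc.summary ||= Regexp.last_match(1)
    when LINK_ITEM
      link = { label: Regexp.last_match[:label], url: Regexp.last_match[:url] }
      doc.sections[current] << link if current
    end
  end
  doc
end

# Links a consumer would keep when it decides to skip the "Optional" section.
def essential_links(doc)
  doc.sections.reject { |name, _links| name.casecmp?("optional") }.values.flatten
end

sample = <<~MD
  # Harmonic
  > Harmonic delivers crystal-clear audio for podcasters.
  ## Features
  - [Seamless uploading](https://harmonic.test/uploading)
  ## Optional
  - [Podcast hosting platform](https://harmonic.test/blog/podcast-hosting-platform)
MD

doc = parse_llms_txt(sample)
puts doc.title
puts essential_links(doc).map { |link| link[:url] }
```

The point of the sketch is the last helper: anything we place under Optional is the first thing a tool is allowed to drop, so the links we really care about belong in the other sections.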
Application setup
To show how to add the llms.txt file to our app, we will create a fictional SaaS podcasting application with a landing page and a simple blog.
Let's start by creating the application:
rails new podcast --css=tailwind --javascript=esbuild
Then, we will add a Post model:
bin/rails generate scaffold Post title description:text body:text
Let's run the migrations:
bin/rails db:migrate
Then, after some styling work, our site looks like this:
As you can see, we have a landing page describing our product, and a blog with an index of posts and the ability to visit individual blog posts.
Let's start by adding the llms.txt file at the root of our site:
Adding the llms.txt file
The first step to implement the standard in Rails applications is to add an llms.txt file at the root level of our domain.
Let's start by adding the route:
# config/routes.rb
get "llms.txt" => "pages#llms"
Then, in our PagesController, we add the action with a respond_to block to make it clear that we're expecting requests with the text MIME type:
class PagesController < ApplicationController
  # Rest of the code
  def llms
    respond_to do |format|
      format.text
    end
  end
end
Then, we add the content of the sample file we previously showed to the app/views/pages/llms.text.erb view.
Now, when we access llms.txt we should get something like this:
Of course, the way you format this file depends on your site's structure and what you consider to be important.
If you want to find inspiration for how to format your llms.txt file, take a look at llmstxthub, a directory that includes links to real examples from actual companies.
Make sure to browse as many examples as you can to see the different ways in which people are formatting the file and get inspiration from them.
To add the llms-full.txt version, we just have to add the route and the corresponding action, together with a matching llms_full.text.erb view:
# config/routes.rb
get "llms.txt" => "pages#llms"
get "llms-full.txt" => "pages#llms_full"
# app/controllers/pages_controller.rb
class PagesController < ApplicationController
  # Rest of the code
  def llms
    respond_to do |format|
      format.text
    end
  end
  def llms_full
    respond_to do |format|
      format.text
    end
  end
end
Finally, we should be able to access both files with our desired content:
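What goes into llms-full.txt is up to us. As one possible approach, and only as a sketch, the file could be assembled by concatenating the Markdown of each post under the site heading. `FullPost` and `build_llms_full` below are hypothetical names, and the struct stands in for the Post model so the example runs without Rails.

```ruby
# Hypothetical stand-in for the Post model (title and Markdown body only).
FullPost = Struct.new(:title, :body, keyword_init: true)

# Builds a single Markdown document: site heading, summary, then one H2 per post.
def build_llms_full(site_name:, summary:, posts:)
  blocks = ["# #{site_name}", "> #{summary}"]
  posts.each do |post|
    blocks << "## #{post.title}"
    blocks << post.body.to_s.strip
  end
  blocks.join("\n\n") + "\n"
end

posts = [
  FullPost.new(title: "Podcast hosting platform", body: "What to look for in a host.\n"),
  FullPost.new(title: "Audio setup for podcasting", body: "Microphones, interfaces and rooms.\n")
]

puts build_llms_full(site_name: "Harmonic", summary: "Podcast hosting with analytics.", posts: posts)
```

In the Rails app, the same idea could live in the llms_full view by iterating over the posts loaded in the action, so the file stays in sync with the blog.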
Adding plain markdown pages
Now that we are rendering both files, let's learn how to serve existing content, like our blog posts, as Markdown from the same URL as the post with the .md extension appended.
Of course, the post might have a Markdown body like we did in the syntax highlighting with Rails article, but we cannot just render the body, we probably have some extra work to do.
The first thing we need to do is add a new MIME type. To achieve that, let's create a mime_types initializer and add the Markdown type:
# config/initializers/mime_types.rb
Mime::Type.register "text/markdown", :md
This allows us to respond with the text/markdown type when the .md extension is used.
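On the linking side, the llms.txt proposal pairs each page with a Markdown twin at the same URL plus .md, and suggests index.html.md for URLs that don't end in a file name. The helper below is a small sketch of that mapping using only the standard library. `markdown_url` is a hypothetical name, and the app in this article only serves the .md variant for posts, so the index.html.md branch is shown for completeness.

```ruby
require "uri"

# Maps a page URL to the URL of its Markdown version, following the
# ".md appended" convention (hypothetical helper, not a Rails API).
def markdown_url(page_url)
  uri = URI.parse(page_url)
  path = uri.path.to_s
  path = "/" if path.empty?

  uri.path =
    if path.end_with?("/")
      "#{path}index.html.md" # no file name: use the index.html.md convention
    elsif path.end_with?(".md")
      path                   # already a Markdown URL
    else
      "#{path}.md"
    end
  uri.to_s
end

puts markdown_url("https://harmonic.test/blog/podcast-hosting-platform")
puts markdown_url("https://harmonic.test/blog/")
```

A helper like this can be handy when generating the link lists for llms.txt, so that every entry points at the lightweight version of the page.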
Now, assuming we have the Markdown content stored in a body attribute, and that Post also has the published_at and author attributes used below, let's start with an approach where we add a to_markdown (or to_llms_txt) method to the Post model:
# app/models/post.rb
class Post < ApplicationRecord
  # Markdown version of the post: H1 title, byline, then the body
  def to_markdown
    <<~MARKDOWN
      # #{title}
      Published on: #{published_at.strftime('%b %d, %Y')}, by #{author}
      #{body}
    MARKDOWN
  end
end
Then, let's render it in our posts_controller.rb (Post.friendly.find comes from the friendly_id gem; plain Post.find works if you don't use slugs):
class PostsController < ApplicationController
  def show
    @post = Post.friendly.find(params[:id])
    respond_to do |format|
      format.html
      format.md { render plain: @post.to_markdown }
    end
  end
end
Now, we should get the following result:
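To get a feel for that output without booting Rails, the method can be approximated on a plain Ruby object. This is a sketch under assumptions: `DraftablePost` is a hypothetical stand-in for the Post model, and the guards for a missing publication date or author are an addition of this sketch, not something the model above does.

```ruby
# Hypothetical stand-in for Post; only the attributes used by to_markdown.
DraftablePost = Struct.new(:title, :author, :published_at, :body, keyword_init: true) do
  def to_markdown
    # Build the byline only from the pieces that are present.
    byline = []
    byline << "Published on: #{published_at.strftime('%b %d, %Y')}" if published_at
    byline << "by #{author}" if author

    ["# #{title}", byline.join(", "), body.to_s.strip]
      .reject(&:empty?)
      .join("\n\n") + "\n"
  end
end

post = DraftablePost.new(
  title: "Audio setup for podcasting",
  author: "Jane Doe",
  published_at: Time.new(2025, 1, 15),
  body: "Start with a dynamic microphone and a quiet room."
)

puts post.to_markdown
```

Separating the heading, byline and body with blank lines keeps each one a distinct Markdown block, which matters once the body starts with a regular paragraph.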
However, you might have resources that require more than just adding an H1 and publication date to a Markdown body.
In that case, instead of rendering explicitly in the controller, we don't pass a block to format.md and define the corresponding view within the app/views/posts directory:
class PostsController < ApplicationController
  def show
    @post = Post.friendly.find(params[:id])
    respond_to do |format|
      format.html
      format.md
    end
  end
end
And then, in the views:
<%# app/views/posts/show.md.erb %>
# <%= @post.title %>
Published on: <%= @post.published_at.strftime('%b %d, %Y') %>, by <%= @post.author %>
<%= @post.body %>
And we get the same result as before:
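The template itself can be tried outside Rails with the standard library's ERB, which is a convenient way to check the Markdown it produces. In this sketch `PreviewPost` and `render_post_markdown` are hypothetical names, the template uses a local `post` instead of `@post`, and blank lines are added between the blocks.

```ruby
require "erb"

# Same structure as show.md.erb, with a local `post` instead of @post.
POST_TEMPLATE = <<~TEMPLATE
  # <%= post.title %>

  Published on: <%= post.published_at.strftime('%b %d, %Y') %>, by <%= post.author %>

  <%= post.body %>
TEMPLATE

# Hypothetical stand-in for the Post model.
PreviewPost = Struct.new(:title, :author, :published_at, :body, keyword_init: true)

def render_post_markdown(post)
  ERB.new(POST_TEMPLATE).result_with_hash(post: post)
end

post = PreviewPost.new(
  title: "Podcast hosting platform",
  author: "Jane Doe",
  published_at: Time.new(2025, 1, 15),
  body: "What to look for when choosing a host."
)

puts render_post_markdown(post)
```

One caveat worth checking in a real app: plain ERB never escapes output, whereas Rails' ERB handler, as far as I know, only skips HTML escaping for text/plain templates. In show.md.erb, characters such as > or & in the body may therefore come out escaped; if the body is trusted, `<%== @post.body %>` is one way to output it untouched.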
How it's being used
At the time of writing this, I performed a quick check to get a sense of how much llms.txt is actually being used.
The first interesting thing was that most of the bigger marketing companies weren't using the file.
Then, I stumbled upon some tech-oriented products that were using it, but most of the ones that I could think of weren't using it.
So I decided to write a simple script to check for the existence of the file against a modest dataset of 500 SaaS companies that I got from Kaggle.
The results are the following:
- From the 500 SaaS companies, only around 3.56% have an llms.txt file.
- The average file has 820 lines, 233 links, 16 sections and is 85 KB in size.
- The formatting varies a lot: this probably means that companies figure that LLMs are capable of interpreting their way of producing the file. Some companies include keywords, some others add Allowed and Not allowed sections, some use a very commercial keyword-optimized approach, some focus on documentation and some focus on commercial intent.
- A vast majority of them are not using the .md formatted pages for the linked resources.
Next, I decided to do the same analysis for a list of 200 marketing companies: they're supposed to be early adopters of things that give them a competitive marketing advantage.
However, the results were very similar: around 3.72% of the marketing companies had the file at the time of the analysis.
We cannot jump to conclusions from such a small sample, but it appears that we're still in the very early stage of the standard's adoption.
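For anyone who wants to repeat this kind of check, here is a rough sketch of what such a script could look like. It is not the script used for the numbers above: the method names, the heuristics for deciding that a response is a real llms.txt file, and the five-second timeouts are all assumptions of this example, and redirects are not followed.

```ruby
require "net/http"
require "uri"

# Heuristic: a 200 response that is not an HTML page (many sites answer
# unknown paths with an HTML "soft 404") and that starts with an H1.
def llms_txt_response?(status:, content_type:, body:)
  return false unless status == 200
  return false if content_type.to_s.include?("text/html")

  body.to_s.lstrip.start_with?("# ")
end

# Basic size metrics for a file: lines, Markdown links, H2 sections, bytes.
def llms_txt_stats(body)
  {
    lines: body.lines.count,
    links: body.scan(/\[[^\]]+\]\([^)]+\)/).count,
    sections: body.lines.count { |line| line.start_with?("## ") },
    bytes: body.bytesize
  }
end

# Fetches https://<domain>/llms.txt and returns its stats, or nil when the
# file is missing or the request fails.
def check_domain(domain)
  uri = URI("https://#{domain}/llms.txt")
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: true, open_timeout: 5, read_timeout: 5) do |http|
    http.get(uri.path)
  end
  found = llms_txt_response?(
    status: response.code.to_i,
    content_type: response["Content-Type"],
    body: response.body
  )
  found ? llms_txt_stats(response.body) : nil
rescue StandardError
  nil
end

# Usage (performs real HTTP requests, so it is left commented out):
# results = %w[example.com example.org].to_h { |domain| [domain, check_domain(domain)] }
```

Keeping the classification and the statistics as pure functions makes it easy to adjust the heuristics, which matters here because the formatting of real files varies so much.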
This can mean that it's an ideal time to add it to our sites and projects, especially if we consider that we can probably generate the content of the file from material we already have.
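As a sketch of that last idea, the file can be generated from data we already have instead of being written by hand. `LlmsLink` and `build_llms_txt` are hypothetical names; in the Rails app the links would presumably come from the posts and the URL helpers, which are replaced here by plain structs so the example is self-contained.

```ruby
# Hypothetical value object for one entry of a link list.
LlmsLink = Struct.new(:label, :url, :notes, keyword_init: true)

# Builds the llms.txt body: H1, blockquote summary, then one H2 per section
# with "- [label](url): notes" entries (notes are optional).
def build_llms_txt(title:, summary:, sections:)
  lines = ["# #{title}", "", "> #{summary}"]
  sections.each do |heading, links|
    lines << "" << "## #{heading}" << ""
    links.each do |link|
      entry = "- [#{link.label}](#{link.url})"
      entry += ": #{link.notes}" if link.notes
      lines << entry
    end
  end
  lines.join("\n") + "\n"
end

sections = {
  "Features" => [
    LlmsLink.new(label: "Advanced analytics", url: "https://harmonic.test/analytics", notes: "Listener metrics")
  ],
  "Optional" => [
    LlmsLink.new(label: "Audio setup for podcasting", url: "https://harmonic.test/blog/audio-setup-for-podcasting")
  ]
}

puts build_llms_txt(
  title: "Harmonic",
  summary: "Podcast hosting with publishing tools and analytics.",
  sections: sections
)
```

Generating the file this way means new posts show up in llms.txt automatically, at the cost of having to decide in code which content belongs under Optional.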
TL;DR
LLMs are changing the way users consume content all over the world, and they are doing it at a rapid pace.
Just like sitemaps, meta tags, structured data or robots.txt files became standard for on-page SEO, the llms.txt file is a standard proposed by Jeremy Howard, co-founder of fast.ai, which aims to help language models produce more useful results for users.
The main advantages of using this file are that we can make better use of the context window of LLMs, provide them with useful information about our projects or sites, and partially control the narrative around our products.
The llms.txt standard suggests the use of a Markdown file which contains a description of our projects together with a list of useful links that might help LLMs provide useful information to our users.
Beyond the llms.txt file, which is supposed to represent a summary of our site, we can also add an llms-full.txt file which provides additional information if needed.
To add both of these files, let's start with the routes file:
# config/routes.rb
get "llms.txt" => "pages#llms"
get "llms-full.txt" => "pages#llms_full"
Then, we add the corresponding controller code:
# app/controllers/pages_controller.rb
class PagesController < ApplicationController
  # Rest of the code
  def llms
    respond_to do |format|
      format.text
    end
  end
  def llms_full
    respond_to do |format|
      format.text
    end
  end
end
Next, we can add a view for each one of these actions. Let's add a view for llms.txt:
# Harmonic
> Harmonic delivers crystal-clear audio, seamless publishing, and powerful analytics for podcasters. Instantly launch, grow, and monetize your show with intuitive tools, customizable branding, and wide distribution.
## Features
- [Seamless uploading](https://harmonic.test/uploading)
- [Advanced analytics](https://harmonic.test/analytics)
- [Customizable branding](https://harmonic.test/branding)
## Optional
- [Podcast hosting platform](https://harmonic.test/blog/podcast-hosting-platform)
- [Audio setup for podcasting](https://harmonic.test/blog/audio-setup-for-podcasting)
Now, when we visit /llms.txt we get the Markdown content.
The next step in the proposed standard is that the pages linked from the file should remain accessible through their regular URL but also offer a simplified Markdown version when we append the .md extension.
To achieve this using Rails, we have to register Markdown as a MIME type within an initializer:
# config/initializers/mime_types.rb
Mime::Type.register "text/markdown", :md
Next, we add a view to render the appropriate Markdown:
<%# app/views/posts/show.md.erb %>
# <%= @post.title %>
Published on: <%= @post.published_at.strftime('%b %d, %Y') %>, by <%= @post.author %>
<%= @post.body %>
Then, we render the view when the appropriate request comes through:
class PostsController < ApplicationController
  def show
    @post = Post.friendly.find(params[:id])
    respond_to do |format|
      format.html
      format.md
    end
  end
end
With these tips, you should be able to add an llms.txt file to your site and serve the pages linked from it as a simplified Markdown version of the actual content.
I hope you enjoyed this article and that you use the tips in it to make your site ready for the Generative Engine Optimization era of the web.
Have a good one, and happy coding!