A table of contents is a useful way to provide readers an improved navigation experience for our content.
UX is an increasingly important ranking factor, so helping users find what they're looking for is not only beneficial for them but also for us.
In this article, we will learn how to add a table of contents feature to a Rails app using the Nokogiri gem and Stimulus to improve the navigation experience.
Let's start by understanding why adding a table of contents is a good idea:
Why add a table of contents
First and foremost, a table of contents is simply a succinct and convenient navigation list.
Its primary purpose is to give users an overview of the content, the subtopics it covers and, overall, a nice way to quickly jump to a specific part of the content that might particularly interest them.
Adding a TOC results in happier users, but it also provides some benefits that can help our sites:
- Lower bounce rates and higher visit duration: as long as the content we're creating satisfies the search intent, users will spend more time on our site, which is a strong ranking signal of content quality.
- Ability to rank for more keywords: keywords that are present in navigation can produce anchor links on the search results page. These are links below the site's name that help users navigate directly to a specific part of the content. They help improve our click-through rate, and when the section title matches a search term it's possible to rank for keywords that don't exactly match the target keyword for our site.
- Users are more likely to return: happier users are more likely to come back to our site, have a positive association with our brand or be more open to what we have to offer.
All in all, whenever it's feasible, adding a table of contents to your site is almost always a good decision.
What we will build
For this tutorial, we will create a blog with Rails with a Post model that has title and content attributes.
To generate the table of contents we will take a two-step approach. First, we will parse the HTML using the Nokogiri gem and generate an array of hashes; then, we will render that array. The array has the following structure:
items = [
  {
    label: "What are Active Record Callbacks?",
    id: "what-are-active-record-callbacks",
    children: [
      {
        label: "Available callbacks in Rails",
        id: "available-callbacks-in-rails",
        children: []
      }
    ]
  },
  {
    label: "Best practices for using callbacks",
    id: "best-practices-for-using-callbacks",
    children: []
  }
]
We don't need to worry about the absolute level of the headings, only about the relation between them.
So, we need a label, an id, which we can obtain from the heading content, and a children array that contains the headings that are below the current heading.
This way of generating the TOC allows us to separate the generation from the rendering, which means we can pass the results to a partial, ViewComponent or even generate a JSON response that can be consumed by a client in case we need to.
For this tutorial, we will render the table of contents using ViewComponent just to showcase how it's done; you can customize the rendering to your liking.
The end result will be something like this:
Plus, we will add tests in order to make sure everything works as expected.
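Because the generator returns plain arrays and hashes, the same structure can be handed to different renderers. As a minimal sketch of the JSON case mentioned above, using only Ruby's standard json library (the toc_to_json helper name is hypothetical, and the items mirror the sample structure shown earlier):

```ruby
require "json"

# Hypothetical helper: serializes a TOC (array of nested hashes) to JSON.
def toc_to_json(items)
  JSON.generate(items)
end

items = [
  {
    label: "What are Active Record Callbacks?",
    id: "what-are-active-record-callbacks",
    children: [
      {label: "Available callbacks in Rails", id: "available-callbacks-in-rails", children: []}
    ]
  }
]

json = toc_to_json(items)
puts json

# Parsing the JSON back with symbolized keys yields the original structure.
round_trip = JSON.parse(json, symbolize_names: true)
puts round_trip == items
```

Inside a Rails controller, passing the same array to render json: should serialize it in a similar way, so a separate helper is probably not needed there.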
Rails app setup
First we will create a new app, create the databases and run the first migration in order to create the schema.rb file:
rails new toc --css=tailwind -j esbuild
cd toc && bin/rails db:create
bin/rails db:migrate
Next, we will create a Post model using Rails scaffolding:
bin/rails generate scaffold Post title content:text
We won't be using Action Text or any specific editor to edit the HTML as we don't really care how the HTML is generated. However, you should pick an HTML editor for your application if you want the
Generating the table of contents with TDD
In order to build this feature using TDD we will add an empty TableOfContentsGenerator class to a newly created content folder in the lib directory.
Even though we could arguably add this class to the models directory or as a service object, it actually feels like something that could eventually become a gem because it's application agnostic.
After adding the empty class we will create a test in test/lib/content/table_of_contents_generator_test.rb with our first test case:
# test/lib/content/table_of_contents_generator_test.rb
require 'test_helper'

class Content::TableOfContentsGeneratorTest < ActiveSupport::TestCase
  test "it returns an empty array if there are no headings" do
    html_content = "<p>Not a heading</p>"
    toc = Content::TableOfContentsGenerator.new(html_content).generate
    assert_equal([], toc)
  end
end
After running it, it fails because we haven't defined anything in the class yet. We have to at least add an initializer that receives the content as an argument and a generate method that, for the moment, returns an empty array to make the test pass:
# lib/content/table_of_contents_generator.rb
class Content::TableOfContentsGenerator
  def initialize(html)
  end

  def generate
    []
  end
end
Now that our test is passing, we add two more tests that will pass with our current implementation:
# test/lib/content/table_of_contents_generator_test.rb
test "it returns an empty array if headings are present but not contentable" do
  html_content = "<h6>Section 1</h6><p>I'm a part of section one</p>"
  toc = Content::TableOfContentsGenerator.new(html_content).generate
  assert_equal([], toc)
end

test "it returns an empty array if the content is empty" do
  html_content = ""
  toc = Content::TableOfContentsGenerator.new(html_content).generate
  assert_equal([], toc)
end
Now we have three passing tests with a method that returns an array, but we need to make this class do something useful, so we write a test that will fail this time:
# test/lib/content/table_of_contents_generator_test.rb
test "it generates a correct TOC when only top level headings are present" do
  html_content = "<h2 id='section-1'>Section 1</h2><p>I'm part of section one</p>"
  toc = Content::TableOfContentsGenerator.new(html_content).generate
  expected = [{label: "Section 1", id: "section-1", children: []}]
  assert_equal(expected, toc)
end
The test fails because we are expecting an array containing one hash, and we're just getting an empty array.
To make the test pass and generate our first successful table of contents, we will use the Nokogiri gem, an XML and HTML parser that allows us to read, write, modify and query documents.
Under the hood, Nokogiri does something similar to what the browser does: it parses HTML/XML text into a tree structure of objects. The browser calls that the DOM (Document Object Model), while Nokogiri represents it as a Document instance.
And, like the browser with querySelector, Nokogiri allows us to traverse the document, select specific parts of it and manipulate it to achieve our goals using CSS or XPath selectors.
As you can imagine, we can use those features to generate a table of contents using Nokogiri by selecting the headings in a document and processing them to fit our needs.
So, back to our feature and its tests, let's make our test go back to green by adding Nokogiri to the equation and returning a list of <h2> headings in the desired format:
# lib/content/table_of_contents_generator.rb
class Content::TableOfContentsGenerator
  def initialize(html)
    @document = Nokogiri::HTML(html)
  end

  def generate
    toc = []
    @document.css("h2").each do |heading|
      toc.push({label: heading.text, id: heading['id'], children: []})
    end
    toc
  end
end
Now our tests should be passing. If you're wondering what the code above does:
- Nokogiri::HTML generates a tree representation of our HTML.
- @document.css("h2") returns a NodeSet, an array-like collection of objects representing every <h2> element in our document.
- We define an empty array to represent the table of contents, and then we iterate over the <h2> elements and push a hash representation of each one of them into the toc array.
Now that everything passes, we need to add one more test for the scenario where we have <h3> tags within sections that have an <h2> tag.
# test/lib/content/table_of_contents_generator_test.rb
test "it generates a TOC with nested headings" do
  html_content = <<-HTML
    <h2 id="section-1">Section 1</h2>
    <p>I'm a part of section one</p>
    <h3 id="section-1-1">Section 1.1</h3>
  HTML
  expected = [
    {
      label: "Section 1",
      id: "section-1",
      children: [{label: "Section 1.1", id: "section-1-1", children: []}]
    }
  ]
  toc = Content::TableOfContentsGenerator.new(html_content).generate
  assert_equal(expected, toc)
end
This test fails because we're not actually doing anything to populate the children array.
We know that we expect <h3> tags to be inside <h2> sections, so we can retrieve both types of headings, iterate over them, check whether the current heading is an <h2> or an <h3>, and push the latter to the children array of the parent heading:
# lib/content/table_of_contents_generator.rb
class Content::TableOfContentsGenerator
  TARGET_HEADINGS = "h2, h3"

  def initialize(html)
    @document = Nokogiri::HTML(html)
  end

  def generate
    toc = []
    current_h2 = nil
    @document.css(TARGET_HEADINGS).each do |heading|
      if heading.name == "h2"
        current_h2 = {label: heading.text, id: heading['id'], children: []}
        toc.push(current_h2)
      else
        current_h2[:children].push({label: heading.text, id: heading['id'], children: []})
      end
    end
    toc
  end
end
This code makes the tests pass by keeping track of the most recent <h2> in a temporary current_h2 variable. The variable is nil until the first <h2> is found; from then on, it holds the hash for that heading, so every <h3> we encounter is appended to the children key of that hash.
You may notice that we have some repetition going on: we're generating the heading hash representation twice.
Now that we have test coverage and the tests are passing is a good time to refactor the code to remove the repetition.
If you take a closer look at the hash, it's not really a heading representation but a TOC item, so we can extract that part into a value object using the Data core class introduced in Ruby 3.2:
class Content::TocItem < Data.define(:label, :id, :children); end
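For readers who haven't used Data yet, here is a small sketch of how such a value object behaves. It assumes Ruby 3.2 or later and uses a top-level TocItem constant (without the Content namespace, which is an assumption made only for this example) so that it runs outside a Rails app:

```ruby
# Value object for a single TOC entry (Data requires Ruby 3.2+).
TocItem = Data.define(:label, :id, :children)

# The constructor accepts either all positional or all keyword arguments.
positional = TocItem.new("Section 1", "section-1", [])
keyword = TocItem.new(label: "Section 1", id: "section-1", children: [])

# Value objects compare by their attributes.
puts positional == keyword

# to_h returns the plain hash shape that the generator's tests expect.
p keyword.to_h

# Instances have no setters; `with` returns a modified copy instead.
renamed = keyword.with(label: "Introduction")
puts renamed.label
puts keyword.label
```

Calling to_h at the end keeps the generator's public output as plain hashes, so the tests written so far don't need to change.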
So now, our generate method would look like this:
# lib/content/table_of_contents_generator.rb
def generate
  toc = []
  current_h2 = nil
  @document.css(TARGET_HEADINGS).each do |heading|
    item = Content::TocItem.new(heading.text, heading['id'], []).to_h
    if heading.name == "h2"
      current_h2 = item
      toc.push(current_h2)
    else
      current_h2[:children].push(item)
    end
  end
  toc
end
Now our tests are passing, and our code matches the feature requirements. The TocItem object might seem like overkill, but I think that it actually helps to understand what we're trying to represent.
Everything seems to be working fine. However, you might have noticed that if our content doesn't have any <h2> tags but does have one or more <h3> tags, the whole thing fails, because current_h2 is still nil when we try to push to its children.
Let's add another test to handle that scenario:
# test/lib/content/table_of_contents_generator_test.rb
test "it generates a TOC if only h3 headings are present" do
  html_content = "<h3 id='section-1-1'>Section 1.1</h3>"
  expected = [{label: "Section 1.1", id: "section-1-1", children: []}]
  toc = Content::TableOfContentsGenerator.new(html_content).generate
  assert_equal(expected, toc)
end
Then, we update our code to push the item to the children of the current <h2> when one is present, and directly to the TOC otherwise:
# lib/content/table_of_contents_generator.rb
def generate
  toc = []
  current_h2 = nil
  @document.css(TARGET_HEADINGS).each do |heading|
    item = Content::TocItem.new(heading.text, heading['id'], []).to_h
    if heading.name == "h2"
      current_h2 = item
      toc.push(current_h2)
    elsif current_h2.present?
      current_h2[:children].push(item)
    else
      toc.push(item)
    end
  end
  toc
end
Tests should be passing once again. Now, you might notice that we're checking for specific things like heading.name == "h2". That is fine by itself, but it also means that our code knows too much about the structure of the TARGET_HEADINGS constant, which might not be desirable because it makes the class less extensible than it could be.
Also, we're currently limited to generating a TOC up to the <h3> level, which is fine for most content, but sometimes we need to generate a deeper TOC, Wikipedia style.
Let's start the fix by adding a test for a scenario with multiple nested headings:
# test/lib/content/table_of_contents_generator_test.rb
test "it generates a TOC with multiple nested headings" do
  html_content = <<-HTML
    <h2 id="section-1">Section 1</h2>
    <p>I'm a part of section one</p>
    <h3 id="section-1-1">Section 1.1</h3>
    <p>I'm a part of section one point one</p>
    <h3 id="section-1-2">Section 1.2</h3>
    <p>I'm a part of section one point two</p>
    <h4 id="section-1-1-1">Section 1.1.1</h4>
    <p>I'm a part of section one point one point one</p>
    <h2 id="section-2">Section 2</h2>
    <p>I'm part of this other section</p>
  HTML
  expected = [
    {
      label: "Section 1",
      id: "section-1",
      children: [
        {label: "Section 1.1", id: "section-1-1", children: []},
        {label: "Section 1.2", id: "section-1-2", children: [{
          label: "Section 1.1.1",
          id: "section-1-1-1",
          children: []
        }]}
      ]
    },
    {label: "Section 2", id: "section-2", children: []}
  ]
  toc = Content::TableOfContentsGenerator.new(html_content).generate
  assert_equal(expected, toc)
end
Of course, after running the tests they should be failing. In order to make the tests pass we have to change the code and it gets a bit more complex:
# lib/content/table_of_contents_generator.rb
class Content::TableOfContentsGenerator
  TARGET_HEADINGS = ["h2", "h3", "h4"]

  def initialize(content)
    @document = Nokogiri::HTML(content)
  end

  def generate
    toc = []
    last_level = 0
    current_section = toc
    headings.each do |heading|
      # Convert the h2, h3, h4 elements into 1, 2 or 3
      level = TARGET_HEADINGS.index(heading.name) + 1
      if level > last_level && !current_section.empty?
        # Going deeper: nest under the last item of the current section
        current_section.last[:children] ||= []
        current_section = current_section.last[:children]
      elsif level < last_level
        # Going back up: this version always returns to the top level
        current_section = toc
      end
      # Data.new takes either all positional or all keyword arguments, not a mix
      current_section << Content::TocItem.new(label: heading.text.strip, id: heading['id'], children: []).to_h
      last_level = level
    end
    toc
  end

  private

  def headings
    @document.css(TARGET_HEADINGS.join(", "))
  end
end

class Content::TocItem < Data.define(:label, :id, :children); end
All the tests should be passing now. If you're wondering what the code above does:
- The toc variable is the list of items.
- We define the last_level and current_section temp variables in order to keep track of the hierarchy of elements. last_level represents the heading level that we processed last, starting at 0, while current_section determines whether we push the item that represents the current heading to the toc array or to the children array of the last element in the list.
- Assuming we have an <h2> that has an <h3> and an <h4> as "children" (they're below it in the content): in the first iteration, level is 1, which is greater than last_level (0). However, current_section is empty, so we skip the conditional altogether and push the item representation of the heading into current_section, which is the toc array.
- The following heading is the <h3>. last_level is now 1 and level is 2, so level is greater than last_level, and current_section is no longer empty. We therefore point current_section at the children array of the last element, which is the <h2> item, and push the new item into it, so we end up with an <h2> item that has the <h3> among its children.
- Next comes the <h4>: last_level is now 2, level is 3 and current_section is not empty, so the conditional evaluates to true. current_section becomes the children array of the latest element, which is the <h3> item, and we append the new item to it.
- When a heading has a lower level than the previous one, current_section is reset to the toc array, so the item is pushed to the top level.
As you can see, the code basically decides whether the item we push to the TOC is a top-level element or a child of the last item.
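One limitation of the approach above is that a heading with a lower level than the previous one always goes back to the top-level array, so an <h3> that follows an <h4> ends up at the top level instead of under its <h2>. A minimal sketch of a stack-based alternative follows. It assumes headings are given as plain [tag, label, id] triples rather than Nokogiri nodes, and the nest_headings name and its targets parameter are hypothetical, chosen so the snippet runs without any gems:

```ruby
# Hypothetical helper: nests headings by keeping a stack of "open" sections.
# Input is a list of [tag, label, id] triples in document order.
def nest_headings(headings, targets: %w[h2 h3 h4])
  toc = []
  # Each stack entry pairs a heading level with the children array that
  # deeper headings should be pushed into. Level 0 is the TOC root.
  stack = [[0, toc]]

  headings.each do |tag, label, id|
    index = targets.index(tag)
    next if index.nil? # ignore headings we don't target

    level = index + 1
    # Close every open section that is at the same level or deeper.
    stack.pop while stack.last[0] >= level

    item = {label: label, id: id, children: []}
    stack.last[1] << item
    stack.push([level, item[:children]])
  end

  toc
end

headings = [
  ["h2", "Section 1", "section-1"],
  ["h3", "Section 1.1", "section-1-1"],
  ["h4", "Section 1.1.1", "section-1-1-1"],
  ["h3", "Section 1.2", "section-1-2"],
  ["h2", "Section 2", "section-2"]
]

toc = nest_headings(headings)
p toc.map { |item| item[:label] }
p toc.first[:children].map { |item| item[:label] }
```

In this sketch "Section 1.2" lands next to "Section 1.1" under "Section 1", because popping the stack returns to the nearest shallower section rather than to the root.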
Also, please note that this feature assumes that the headings will have an id attribute, which we will use to link to the specific content section. Make sure you're adding them in a callback when creating or updating posts:
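class Post < ApplicationRecord
  before_save :add_ids_to_headings

  private

  def add_ids_to_headings
    # Parse as a fragment so to_html doesn't wrap the content in <html>/<body>
    doc = Nokogiri::HTML.fragment(content)
    doc.css("h2, h3, h4, h5").each do |heading|
      # parameterize is defined on String, so call it on the heading text
      heading.set_attribute("id", heading.text.parameterize)
    end
    self.content = doc.to_html
  end
end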
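One thing the callback above does not handle is two headings with the same text, which would receive the same id and make the second anchor unreachable. Here is a minimal sketch of one way to keep ids unique, written in plain Ruby so it runs without Rails. The slugify method is a rough, hypothetical stand-in for ActiveSupport's parameterize, and unique_heading_ids is likewise a hypothetical helper:

```ruby
# Rough stand-in for String#parameterize: lowercase the text, replace runs
# of non-alphanumeric characters with a dash, trim leading/trailing dashes.
def slugify(text)
  text.downcase.gsub(/[^a-z0-9]+/, "-").gsub(/\A-+|-+\z/, "")
end

# Returns one id per heading text, appending -2, -3, ... to repeated slugs.
def unique_heading_ids(texts)
  seen = Hash.new(0)
  texts.map do |text|
    slug = slugify(text)
    seen[slug] += 1
    seen[slug] == 1 ? slug : "#{slug}-#{seen[slug]}"
  end
end

p unique_heading_ids(["Getting started", "Examples", "Examples"])
# prints ["getting-started", "examples", "examples-2"]
```

This sketch does not guard against a heading whose own slug already looks like a suffixed one (for instance a heading literally titled "Examples 2"), so treat it as a starting point rather than a complete solution.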
Now that we are generating the TOC as an array of hashes, let's make a component in charge of rendering it:
Rendering the TOC with ViewComponent
The first step is to install the ViewComponent gem:
bundle add view_component
bundle install
Next, we generate our TableOfContentsComponent from the command line:
bin/rails generate component TableOfContents
Then we add the following:
# app/components/table_of_contents_component.rb
class TableOfContentsComponent < ViewComponent::Base
  def initialize(html_content:)
    @toc_hash = Content::TableOfContentsGenerator.new(html_content).generate
  end
end
Now, in the HTML we can traverse the list and display it as we want. I'm using Tailwind for this:
<nav>
  <ol>
    <% @toc_hash.each do |item| %>
      <li class="toc-item mb-3" data-toc-target="item" data-id="<%= item[:id] %>">
        <%= link_to item[:label], "##{item[:id]}", class: "text-gray-700 text-base font-medium" %>
        <% if item[:children].any? %>
          <ul class="pl-3">
            <% item[:children].each do |child| %>
              <li class="toc-item mb-2" data-id="<%= child[:id] %>" data-toc-target="item">
                <%= link_to child[:label], "##{child[:id]}", class: "text-gray-700 text-sm" %>
              </li>
            <% end %>
          </ul>
        <% end %>
      </li>
    <% end %>
  </ol>
</nav>
As you can see, we're just iterating twice, which will only show the <h2> headings and their direct children, like <h3>. We could use a recursion-based approach but, for most cases, this is more than enough because we don't want to complicate the TOC too much. However, you're free to generate the view as you see fit.
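If deeper nesting is needed, the recursion-based approach mentioned above could look roughly like the following sketch. It builds the markup as a string in plain Ruby so it can run outside Rails; the render_toc helper is hypothetical, and in a real component the same recursion would more likely live in a partial or a helper method:

```ruby
require "erb"

# Hypothetical helper: renders a nested TOC as an <ol> of links,
# recursing into :children until an item has none.
def render_toc(items)
  return "" if items.empty?

  list_items = items.map do |item|
    label = ERB::Util.html_escape(item[:label])
    id = ERB::Util.html_escape(item[:id])
    nested = render_toc(item[:children])
    %(<li data-id="#{id}"><a href="##{id}">#{label}</a>#{nested}</li>)
  end

  "<ol>#{list_items.join}</ol>"
end

toc = [
  {
    label: "Section 1",
    id: "section-1",
    children: [
      {label: "Section 1.1", id: "section-1-1", children: []}
    ]
  }
]

puts render_toc(toc)
```

Because each level calls render_toc on its own children, the same method handles two, three or more levels without extra loops in the template.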
Right now, the table of contents works correctly, but we can further improve it by adding an .active class to the currently active item and handling link clicks with a smooth scroll.
Improving the navigation experience with Stimulus
First, we will add smooth scrolling to the target element when clicking on a link within the table of contents:
// app/javascript/controllers/toc_controller.js
import { Controller } from "@hotwired/stimulus";

export default class extends Controller {
  static targets = ["item", "content"];

  connect() {
    this._setLinkScroll();
  }

  _setLinkScroll() {
    this.itemTargets.forEach((item) => {
      item.addEventListener("click", (e) => {
        e.preventDefault();
        const target = this.contentTarget.querySelector(
          e.target.getAttribute("href")
        );
        target?.scrollIntoView({ behavior: "smooth", block: "start" });
      });
    });
  }
}
Next we need to set the controller at the top level of the Post show view so that the controller has access to both the table of contents and the content:
<div class="max-w-screen-xl mx-auto py-16" data-controller="toc">
  <div class="grid grid-cols-12 gap-x-4">
    <div class="col-span-8">
      <h1 class="text-3xl font-bold text-gray-900"><%= @post.title %></h1>
      <%# Optional: assumes Post has a cover attachment, which the scaffold above does not add %>
      <%= image_tag @post.cover, class: "w-full h-auto rounded-lg mt-4" if @post.cover.present? %>
      <div class="post-content mt-4 max-w-none" data-toc-target="content">
        <%= sanitize @post.content, tags: %w[h2 h3 h4 h5 p ul ol li pre code strong em s], attributes: %w[id] %>
      </div>
    </div>
    <div class="col-span-4">
      <div class="sticky top-6 rounded-xl px-6 pl-4">
        <h3 class="text-lg font-semibold text-gray-900 mb-2">Table of Contents</h3>
        <%= render TableOfContentsComponent.new(html_content: @post.content) %>
      </div>
    </div>
  </div>
</div>
Now, in order to give visual feedback to our users, we will add the .active class to the currently active item, which corresponds to the heading the user is currently reading.
We make sure to add data-toc-target="item" to the <li> elements in table_of_contents_component.html.erb.
To set the behavior, we modify the toc_controller and add an event listener for the scroll event that highlights the item that is currently active:
// app/javascript/controllers/toc_controller.js
import { Controller } from "@hotwired/stimulus";

export default class extends Controller {
  static targets = ["item", "content"];

  connect() {
    this._setLinkScroll();
    // Keep a reference to the bound handler so disconnect() can remove it;
    // calling bind() again would create a different function object
    this._onScroll = this._handleScroll.bind(this);
    window.addEventListener("scroll", this._onScroll);
  }

  disconnect() {
    window.removeEventListener("scroll", this._onScroll);
  }

  _setLinkScroll() {
    this.itemTargets.forEach((item) => {
      item.addEventListener("click", (e) => {
        e.preventDefault();
        const target = this.contentTarget.querySelector(
          e.target.getAttribute("href")
        );
        target?.scrollIntoView({ behavior: "smooth", block: "start" });
      });
    });
  }

  _highlightLink(id) {
    this.itemTargets.forEach((link) => {
      if (link.dataset.id === id) {
        link.classList.add("active");
      } else {
        link.classList.remove("active");
      }
    });
  }

  _handleScroll() {
    const currentScrollPosition = window.scrollY;
    // Find the last heading that has scrolled past the top of the viewport
    const activeHeading = Array.from(
      this.contentTarget.querySelectorAll("h2, h3, h4")
    )
      .reverse()
      .find((heading) => {
        const headingPosition =
          heading.getBoundingClientRect().top + window.scrollY;
        return headingPosition <= currentScrollPosition;
      });
    if (activeHeading) {
      this._highlightLink(activeHeading.id);
    }
  }
}
As you can see, we add a scroll listener in connect() and remove it in disconnect(). The scroll is handled by the _handleScroll function, which finds the closest heading above the top of the viewport and adds the active class to the corresponding element in the table of contents.
We could also use the IntersectionObserver API, but for a simple use case like this, we can get away with this solution.
Don't forget to add the CSS to customize the appearance of the active elements. Get creative!
After doing all of this, the feature looks like this:
Summary
Adding a table of contents is a good idea because it can help our users better enjoy our application, especially if content is an important part of it.
In this article we created a table of contents feature with TDD by using the Nokogiri gem and some logic extracted into a custom object.
After fulfilling the requirements we added a custom component using ViewComponent and improved the experience with a Stimulus controller to handle smooth scrolling to the target section and active item highlighting.
There are probably simpler ways to build a table of contents feature, but I wanted to showcase how a more complex example could be worked out.
Hope you found the article useful and that it helps you achieve your goal.
Happy coding!