Rails: Easy Sitemaps

A simple way to improve your SEO is to let Google know what is available on your site. Sure, they crawl you automatically, but no crawler is perfect. The easiest way to give them the info they need is to create sitemaps and submit them through Google's Search Console.

The Controller

The simplest way to start with sitemaps in Rails is to create a sitemaps controller. To begin, it'll have two actions, but it can easily grow over time if necessary.

class SitemapsController < ApplicationController
  def index
    respond_to do |format|
      format.xml
    end
  end

  def pages
    respond_to do |format|
      format.xml
    end
  end
end

Note: If you require authentication in ApplicationController, make sure you skip it for this controller. Crawlers can't sign in.

index will return a sitemap of sitemaps. You can substitute the lastmod time with whatever makes sense.

# index.xml.builder
xml.instruct!
xml.sitemapindex xmlns: "http://www.sitemaps.org/schemas/sitemap/0.9" do
  xml.sitemap do
    xml.loc sitemap_pages_url
    xml.lastmod Time.utc(2020, 12, 21, 11).strftime("%Y-%m-%dT%H:%M:%S+00:00")
  end
end
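If you'd rather not repeat the strftime format string everywhere, here is a minimal sketch of a helper that does the same job. The name sitemap_lastmod is hypothetical (it isn't part of the app above), and it assumes you want every timestamp expressed in UTC. It leans on Time#iso8601 from Ruby's standard library, which writes UTC as a trailing "Z" instead of "+00:00". Both are valid W3C datetime forms, which is what lastmod expects.

```ruby
require "time" # adds Time#iso8601 (also available as Time#xmlschema)

# Hypothetical helper, not from the app above: formats any Time as a
# W3C datetime in UTC, suitable for a sitemap <lastmod> element.
def sitemap_lastmod(time)
  # getutc returns a new Time in UTC and leaves the receiver untouched
  time.getutc.iso8601
end

puts sitemap_lastmod(Time.utc(2020, 12, 21, 11))
# prints 2020-12-21T11:00:00Z
```

In a Rails app, something like this could live in a view helper and be called from the builder templates as xml.lastmod sitemap_lastmod(some_time).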

pages will act as the individual sitemap referenced above in our sitemap of sitemaps. Here is an example of the sitemap for www.flippercloud.io.

# pages.xml.builder
xml.instruct!
xml.urlset xmlns: "http://www.sitemaps.org/schemas/sitemap/0.9" do
  xml.url do
    xml.loc root_url
    xml.lastmod Time.utc(2020, 12, 21, 11).strftime("%Y-%m-%dT%H:%M:%S+00:00")
  end
  xml.url do
    xml.loc documentation_url
    xml.lastmod Time.utc(2020, 12, 21, 11).strftime("%Y-%m-%dT%H:%M:%S+00:00")
  end
  xml.url do
    xml.loc sign_up_url
    xml.lastmod Time.utc(2020, 12, 21, 11).strftime("%Y-%m-%dT%H:%M:%S+00:00")
  end
  xml.url do
    xml.loc sign_in_url
    xml.lastmod Time.utc(2020, 12, 21, 11).strftime("%Y-%m-%dT%H:%M:%S+00:00")
  end
  xml.url do
    xml.loc password_reset_url
    xml.lastmod Time.utc(2020, 12, 21, 11).strftime("%Y-%m-%dT%H:%M:%S+00:00")
  end
end

Right now, we don't have many pages, so it is quite simple. Just remember, this builder view is plain old Ruby. If your pages are in the database, you can query and iterate them and set the lastmod time to updated_at or whatever makes sense.
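As a rough sketch of the database-backed version, here is the same idea in plain Ruby so it runs outside Rails. Everything named here is an assumption for illustration: Page is a Struct standing in for an ActiveRecord model, slug and updated_at are assumed columns, and sitemap_entries is a hypothetical helper.

```ruby
# Page stands in for an ActiveRecord model; slug and updated_at are
# assumed columns, not something from the app above.
Page = Struct.new(:slug, :updated_at)

# Turns records into the loc/lastmod pairs a builder view would emit.
def sitemap_entries(pages, base_url)
  pages.map do |page|
    {
      loc: "#{base_url}/#{page.slug}",
      # Convert to UTC first so the hardcoded +00:00 offset is accurate.
      lastmod: page.updated_at.getutc.strftime("%Y-%m-%dT%H:%M:%S+00:00"),
    }
  end
end

pages = [
  Page.new("docs", Time.utc(2020, 12, 21, 11)),
  Page.new("pricing", Time.utc(2021, 1, 4, 9, 30)),
]

sitemap_entries(pages, "https://www.example.com").each do |entry|
  puts "#{entry[:loc]} #{entry[:lastmod]}"
end
```

In the real builder view you would skip the intermediate hashes: iterate the records directly and call xml.url, xml.loc and xml.lastmod inside the loop, using a route helper for the URL.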

Down the road, if you add more sitemaps (say one for blog posts and categories) all you need to do is create another action and add the sitemap to your sitemaps index.

The last piece to make your sitemaps work is a couple of routes:

get "/sitemap.xml", to: "sitemaps#index", as: :sitemaps
get "/sitemap-pages.xml", to: "sitemaps#pages", as: :sitemap_pages

The Tests

We can test these routes manually in a browser, but since we want to ensure we don't accidentally break these, we'll drop some tests in.

require 'test_helper'

class SitemapsControllerTest < ActionDispatch::IntegrationTest
  test "GET index renders list of sitemaps" do
    get sitemaps_path, env: {"HOST" => "www.flippercloud.io"}
    assert_response :success
    assert_select "sitemapindex sitemap loc", "http://www.flippercloud.io/sitemap-pages.xml"
  end

  test "GET pages renders sitemap" do
    get sitemap_pages_path, env: {"HOST" => "www.flippercloud.io"}
    assert_response :success
    assert_select "urlset url loc", "http://www.flippercloud.io/"
    assert_select "urlset url loc", "http://www.flippercloud.io/docs"
    assert_select "urlset url loc", "http://www.flippercloud.io/signup"
    assert_select "urlset url loc", "http://www.flippercloud.io/signin"
    assert_select "urlset url loc", "http://www.flippercloud.io/password-reset"
  end
end

Submitting to Google Search Console

While these changes are deploying, you can add your app as a property in Search Console. Then, once they are out in production, you can head to the Sitemaps page and submit them to Google.

A couple minutes of work and now Google has a much better idea of what is available on your site and when it was last updated.

Don't Forget Other Robots

Now that you've submitted these to Google, the last step is to declare them in your robots.txt file. This makes it easy for other robots (DuckDuckGo, etc.) to pick them up.

It's as easy as adding a few lines like this:

Sitemap: https://www.flippercloud.io/sitemap.xml
Sitemap: https://www.flippercloud.io/sitemap-pages.xml

More Complex Example

Also, sitemaps can be as complex as you need. For example, speakerdeck.com's are quite a bit more complicated.

class SitemapsController < ApplicationController
  layout nil

  def index
    start_date = Time.utc(2010, 10)
    end_date = Time.now.to_date

    @months = []
    date = start_date.beginning_of_month
    while date <= end_date.beginning_of_month do
      @months << date.to_date
      date = date.advance(months: 1)
    end

    respond_to do |format|
      format.xml
    end
  end

  def month
    headers['Content-Type'] = 'application/xml'
    month_start = Time.utc(params[:year], params[:month], 1).beginning_of_month
    month_end = month_start.end_of_month
    @talks = Talk.published.where("created_at BETWEEN :month_start AND :month_end", month_start: month_start, month_end: month_end).sorted.limit(50_000).includes(:owner)

    respond_to do |format|
      format.xml
    end
  end
end

This generates a sitemap of sitemaps with a sitemap per month of all published and publicly viewable talks. The index view then iterates the months:

xml.instruct!
xml.sitemapindex xmlns: "http://www.sitemaps.org/schemas/sitemap/0.9" do
  @months.each do |month|
    xml.sitemap do
      xml.loc sitemap_url(year: month.year, month: month.month)
      xml.lastmod month
    end
  end
end
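The month list that feeds this view is easy to check outside Rails. Below is a minimal sketch of the same loop using only Ruby's standard Date class instead of Active Support. The method name sitemap_months is hypothetical, and the dates are just sample input.

```ruby
require "date"

# Hypothetical helper: returns the first day of every month from
# start_date's month through end_date's month, inclusive.
def sitemap_months(start_date, end_date)
  first = Date.new(start_date.year, start_date.month, 1)
  last = Date.new(end_date.year, end_date.month, 1)

  months = []
  month = first
  while month <= last
    months << month
    month = month >> 1 # Date#>> advances by one calendar month
  end
  months
end

months = sitemap_months(Date.new(2010, 10, 1), Date.new(2011, 1, 15))
puts months.map(&:to_s).inspect
# prints ["2010-10-01", "2010-11-01", "2010-12-01", "2011-01-01"]
```

Each Date prints as YYYY-MM-DD, which is also a valid lastmod value, so the view can pass the month straight to xml.lastmod.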

And the individual month sitemap queries for the talks published during that time frame and iterates them to generate the urlset:

xml.instruct!
xml.urlset xmlns: "http://www.sitemaps.org/schemas/sitemap/0.9" do
  @talks.each do |talk|
    xml.url do
      xml.loc owner_talk_url(talk.owner, talk)
      xml.lastmod talk.updated_at.strftime("%Y-%m-%dT%H:%M:%S+00:00")
    end
  end
end
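The limit(50_000) in the month action lines up with the sitemap protocol, which allows at most 50,000 URLs in a single sitemap file. If one month could ever exceed that, the limit would silently drop the rest. One option is to split the URLs across several files. Here is a small sketch of the splitting step only; sitemap_chunks and the example URLs are made up for illustration, not how Speaker Deck does it.

```ruby
# Maximum number of URLs the sitemaps.org protocol allows per sitemap file.
MAX_URLS_PER_SITEMAP = 50_000

# Hypothetical helper: splits a list of URLs into groups that each fit
# in one sitemap file.
def sitemap_chunks(urls, max = MAX_URLS_PER_SITEMAP)
  urls.each_slice(max).to_a
end

urls = (1..120_001).map { |i| "https://www.example.com/talks/#{i}" }
chunks = sitemap_chunks(urls)
puts chunks.map(&:size).inspect
# prints [50000, 50000, 20001]
```

Each chunk would then need its own entry in the sitemap index, the same way each month gets one above.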

Hat tip to Esteve Castells for the recommendation to do a sitemap per month for Speaker Deck.