Generating a Sitemap from an Algolia Index

This tutorial shows you how to use algolia-sitemap, an open-source wrapper for algoliasearch that dynamically generates sitemaps from your Algolia indices. Because your content changes over time, generating the sitemap from the index keeps the two synchronized. The tool relies on a sitemap index file, which lets you have multiple sitemaps, each with up to 50,000 entries.

For background, see the information on the impact of Algolia on SEO.

Why generate a sitemap

Great content and UX only matter if people can actually find them. Search Engine Optimization (SEO) is a key traction strategy for most websites, and sitemaps play a big role in it. A sitemap is a file that describes all the pages of your website. It’s meant to be consumed by a machine (Googlebot or other crawlers) to help it index your content. It gives useful information, such as which pages should be prioritized or how often a page changes.

Sitemaps are especially effective when the content is loaded asynchronously. That’s the case for single page applications, progressive web apps, or any page that displays content from a fetch request, which is the case when we use Algolia on the front-end.

Because of the flexibility of facets, Algolia can power navigation in addition to search result pages. That lets you have dynamic category pages based on the products in your index. Those are great candidates to add to your sitemap! Actually, since your index holds all the information about your products, we can use it to generate the whole sitemap of your website.

Dataset

[
  {
    "name": "Apple - iPhone 5s 16GB",
    "shortDescription": "...",
    "bestSellingRank": 339,
    "thumbnailImage": "...",
    "salePrice": 699.99,
    "manufacturer": "Apple",
    "url": "...",
    "type": "HardGood",
    "image": "...",
    "customerReviewCount": 1397,
    "categories": ["Mobile Phones", "Phones & Tablets"],
    "shipping": "Free shipping",
    "salePrice_range": "501 - 2000",
    "objectID": "1752456"
  }
]

We will use an e-commerce dataset where each record is a product. They all have a categories attribute which can hold one or more categories. To follow along, you can download the dataset. You can also have a look at how to import it in Algolia.

Installing algolia-sitemap

Before starting, you need to install algolia-sitemap. The package requires Node v6 or above; you can check your version by running node -v. To add it to your project, run yarn add algolia-sitemap or npm install algolia-sitemap.

Create a sitemap of all the records in your index

Let’s create a sitemap with all the products of our catalog to make sure search engines know where to find them. There are two steps to do it with algolia-sitemap. But before that, let’s create a generate_sitemap.js file.

The first step is configuration. You need to provide your credentials (application ID and API key) and the name of your index. Make sure that the key has the browse permission. You can generate one from the API Keys tab of your dashboard, or use your Admin API Key. Be careful not to share that key with anyone!

const algoliaSitemap = require('algolia-sitemap');

const algoliaConfig = {
  appId: 'YourApplicationID',
  apiKey: 'YourAPIKey', // Must have the browse permission
  indexName: 'your_index_name',
};
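
Since the key must stay private, one option is to keep the credentials out of the script and read them from environment variables. The sketch below assumes three variable names (ALGOLIA_APP_ID, ALGOLIA_BROWSE_KEY, ALGOLIA_INDEX_NAME) that are purely illustrative; the loadAlgoliaConfig helper is not part of algolia-sitemap.

```javascript
// Hypothetical environment variable names; rename them to fit your setup.
const REQUIRED_VARIABLES = [
  'ALGOLIA_APP_ID',
  'ALGOLIA_BROWSE_KEY',
  'ALGOLIA_INDEX_NAME',
];

// Builds the algoliaConfig object from environment variables and fails early
// with an explicit message when one of them is missing.
function loadAlgoliaConfig(env = process.env) {
  const missing = REQUIRED_VARIABLES.filter(name => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  return {
    appId: env.ALGOLIA_APP_ID,
    apiKey: env.ALGOLIA_BROWSE_KEY, // Must have the browse permission
    indexName: env.ALGOLIA_INDEX_NAME,
  };
}
```

With this helper, the configuration step becomes const algoliaConfig = loadAlgoliaConfig(); and the key never appears in the source file.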

Once configured, you have to provide the hitToParams callback to the algoliaSitemap function. This callback is called once for each record in your index, and lets you map a record to an entry in your sitemap. The return value of your callback must be an object whose attributes are the same as those of a <url> entry in a sitemap.xml file. Only loc is required; the rest are optional:

  • loc: The URL of the detail page
  • [lastmod]: The last modified date
  • [priority]: The priority of this page compared to other pages in your site (values range from 0.0 to 1.0)
  • [changefreq]: Describes how frequently the page is likely to change
  • [alternates]: Alternate versions of this link
  • [alternates.languages]: Array of languages that are enabled for this link
  • [alternates.hitToURL]: A function to transform a language into a URL

For more information about the format of the values, check the sitemap protocol page.

We will keep it simple and only output the loc for each of our products:

const hitToParams = ({ url }) => ({ loc: url })
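
If you want richer entries, the optional attributes listed above can be filled in the same callback. The following is only a sketch: updatedAt is a hypothetical attribute (a Unix timestamp in seconds) that the sample dataset does not contain, the changefreq and priority values are arbitrary, and the per-language URL scheme is an assumption about your site.

```javascript
// Sketch of a callback that returns more than `loc`.
// `updatedAt` is hypothetical: the sample dataset has no such attribute.
const hitToParamsDetailed = ({ url, objectID, updatedAt }) => {
  const entry = {
    loc: url,
    changefreq: 'weekly', // one of the values allowed by the sitemap protocol
    priority: 0.8, // between 0.0 and 1.0
    alternates: {
      languages: ['fr', 'de'],
      // Assumed URL scheme for translated pages; adapt it to your routing
      hitToURL: lang => `https://example.com/${lang}/products/${objectID}`,
    },
  };
  // Only emit lastmod when the record actually carries a date
  if (typeof updatedAt === 'number') {
    entry.lastmod = new Date(updatedAt * 1000).toISOString();
  }
  return entry;
};
```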

Now we have everything we need to generate our sitemaps. Let’s call the algoliaSitemap function:

algoliaSitemap({
  algoliaConfig,
  sitemapLoc: 'https://example.com/sitemaps',
  outputFolder: 'sitemaps',
  hitToParams
});

If you follow along, make sure to modify the credentials and the parameter of hitToParams so that it corresponds to your records. You will also need to create a sitemaps folder, where all the sitemaps will be generated.
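
Rather than creating the folder by hand, the script can create it before calling algoliaSitemap. This is a minimal sketch using Node’s built-in fs module; ensureOutputFolder is a hypothetical helper name, and the recursive option needs a more recent Node version (10.12 or above) than the minimum required by algolia-sitemap.

```javascript
const fs = require('fs');
const path = require('path');

// Creates the output folder (and any missing parent folders) if needed.
// Calling it on an existing folder is harmless.
function ensureOutputFolder(folder) {
  const absolutePath = path.resolve(folder);
  fs.mkdirSync(absolutePath, { recursive: true });
  return absolutePath;
}
```

Calling ensureOutputFolder('sitemaps') at the top of generate_sitemap.js would then match the outputFolder value used above.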

Now we can run node generate_sitemap.js. It will create the sitemaps in the sitemaps folder. As you will notice, there are two types of sitemap files:

  • the sitemap-index file has a link to each sitemap
  • the sitemaps have links to your products

To make sure the generated sitemaps are correct, we can use any sitemap validator online like XML Sitemap Checker. This website isn’t operated by Algolia, so we won’t be able to provide support on it.
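
To picture how the two file types relate, here is a rough illustration of the idea, not the library’s actual implementation: entries are split into groups of at most 50,000, and the index lists one file per group. The sitemap.N.xml file naming is an assumption made for the example.

```javascript
const MAX_ENTRIES_PER_SITEMAP = 50000;

// Splits a list of entries into groups that each fit in one sitemap file.
function chunk(entries, size = MAX_ENTRIES_PER_SITEMAP) {
  const chunks = [];
  for (let i = 0; i < entries.length; i += size) {
    chunks.push(entries.slice(i, i + size));
  }
  return chunks;
}

// Builds the XML of a sitemap index pointing to `sitemapCount` sitemap files.
// The file names (sitemap.0.xml, sitemap.1.xml, ...) are illustrative.
function buildSitemapIndex(sitemapLoc, sitemapCount) {
  const items = [];
  for (let i = 0; i < sitemapCount; i++) {
    items.push(`  <sitemap><loc>${sitemapLoc}/sitemap.${i}.xml</loc></sitemap>`);
  }
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...items,
    '</sitemapindex>',
  ].join('\n');
}
```

For instance, an index with 120,001 products would need three sitemap files, and buildSitemapIndex('https://example.com/sitemaps', 3) would list all three.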

Create a sitemap for categories

Let’s see how we can generate entries for category pages. While it might sound obvious, we need to make sure each category can be accessed by a URL. Remember, our records have a categories attribute that looks like this:

{
  "categories": ["Mobile Phones", "Phones & Tablets"]
}

We can see that the product belongs to two categories. Let’s assume that those categories can be accessed by going to https://example.com/CATEGORY_NAME.
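
Category names such as Phones & Tablets contain spaces and an ampersand, so they need to be encoded before being placed in a URL. A minimal sketch, assuming your category pages accept percent-encoded names (categoryToUrl and BASE_URL are illustrative names, not part of algolia-sitemap):

```javascript
const BASE_URL = 'https://example.com';

// Turns a category name into the URL of its page.
// encodeURIComponent escapes spaces, "&" and other reserved characters.
function categoryToUrl(category) {
  return `${BASE_URL}/${encodeURIComponent(category)}`;
}
```

If your site uses slugs instead (for example phones-and-tablets), replace the encoding step with your own slug function.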

The idea is to modify our hitToParams function to return an array of entries, one for each category of the passed hit that has not already been added.

Let’s declare a variable to hold all the processed categories:

const alreadyAdded = {};

This will allow us to know if a category has already been processed by doing alreadyAdded[c]. Let’s leverage this in our new hitToParams function:

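const hitToParams = ({ categories }) => {
  // Guard first: some records may not have any categories
  if (!categories || !categories.length) {
    return false;
  }
  // Keep only the categories that aren't in the sitemap yet
  const newCategories = categories.filter(category => !alreadyAdded[category]);
  if (!newCategories.length) {
    return false;
  }
  return newCategories.map(category => {
    alreadyAdded[category] = true;
    return {
      // Encode the name so that spaces and "&" produce a valid URL
      loc: `https://example.com/${encodeURIComponent(category)}`,
    };
  });
};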

For each hit, we check if there are categories that haven’t been added to the sitemap yet. When that’s the case, we mark them and add them! Now we can save all our category pages to our sitemap! You can check this example directly from our repository.
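
To see the deduplication at work, the sketch below runs a variant of the callback on three sample hits. It is an alternative take rather than the code from the repository: it tracks categories in a Set instead of a plain object, which avoids false positives for names such as constructor that already exist on every object, and it percent-encodes the category name. The sample hits are made up for the illustration.

```javascript
const seenCategories = new Set();

const hitToCategoryParams = ({ categories }) => {
  // Records without categories produce no entry
  const newCategories = (categories || []).filter(
    category => !seenCategories.has(category)
  );
  if (newCategories.length === 0) {
    return false;
  }
  return newCategories.map(category => {
    seenCategories.add(category);
    return { loc: `https://example.com/${encodeURIComponent(category)}` };
  });
};

// Made-up hits: the second and third repeat categories seen earlier
const sampleHits = [
  { categories: ['Mobile Phones', 'Phones & Tablets'] },
  { categories: ['Phones & Tablets', 'Tablets'] },
  { categories: ['Mobile Phones'] },
];

const results = sampleHits.map(hitToCategoryParams);
console.log(JSON.stringify(results, null, 2));
```

The first hit yields two entries, the second only adds Tablets, and the third returns false because its single category is already in the sitemap.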

Create a sitemap for both products and categories

By doing a simple modification to the above script, we can use algolia-sitemap to generate a sitemap for both our products and our categories. We only need to push the current product in addition to its categories. Here’s how:

const hitToParams = ({ categories, url }) => {
  // Some records may not have any categories
  const newCategories = (categories || []).filter(
    category => !alreadyAdded[category]
  );
  const categoryEntries = newCategories.map(category => {
    alreadyAdded[category] = true;
    return {
      loc: `https://example.com/${encodeURIComponent(category)}`,
    };
  });
  // Always add the product itself, even when all its categories are already listed
  return [...categoryEntries, { loc: url }];
};

Ping search engines to let them know that the sitemap changed

Finally, we can let search engines know that our sitemap changed directly from our script. Most search engines have a ping mechanism that allows users to inform them of a new sitemap. For Google and Bing, all we need is to send a GET request:

  • Google: http://www.google.com/webmasters/sitemaps/ping?sitemap=http://example.com/sitemap.xml
  • Bing: http://www.bing.com/webmaster/ping.aspx?siteMap=http://example.com/sitemap.xml

Use your favorite HTTP client to do it. In this example, we use node-fetch:

const fetch = require('node-fetch'); // node-fetch v2 (CommonJS)

Promise.all([
  fetch('http://www.google.com/webmasters/sitemaps/ping?sitemap=http://example.com/sitemap.xml'),
  fetch('http://www.bing.com/webmaster/ping.aspx?siteMap=http://example.com/sitemap.xml')
])
  .then(() => {
    console.log('Done');
  })
  .catch(error => {
    console.error('Ping failed', error);
  });
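
If the sitemap URL is not hard-coded, it is safer to encode it before adding it to the query string. The sketch below builds the two ping URLs mentioned above from any sitemap URL and checks the HTTP status of each response. buildPingUrls and pingSearchEngines are hypothetical helper names, and the fetch function is passed in as an argument so that any compatible client can be used.

```javascript
// Builds the ping URLs for a given sitemap URL, using the endpoints
// listed above. The sitemap URL is percent-encoded as a query parameter.
function buildPingUrls(sitemapUrl) {
  const encoded = encodeURIComponent(sitemapUrl);
  return [
    `http://www.google.com/webmasters/sitemaps/ping?sitemap=${encoded}`,
    `http://www.bing.com/webmaster/ping.aspx?siteMap=${encoded}`,
  ];
}

// Sends one GET request per search engine and rejects if any of them
// answers with a non-2xx status.
function pingSearchEngines(fetchFn, sitemapUrl) {
  return Promise.all(
    buildPingUrls(sitemapUrl).map(url =>
      fetchFn(url).then(response => {
        if (!response.ok) {
          throw new Error(`Ping failed (${response.status}): ${url}`);
        }
        return url;
      })
    )
  );
}
```

With node-fetch, this would be called as pingSearchEngines(fetch, 'http://example.com/sitemap.xml') once the sitemaps have been generated.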

Conclusion

We saw how to generate a sitemap from our index. By doing so, we make sure search engines know how to access our product pages and our category pages. We also have a way to dynamically keep our index and sitemap in sync. Finally, we inform the search engines that our sitemap has changed, so they can re-index it.
