Hello everyone, and thanks for dropping by. Today I would like to explore how one might go about creating their own sitemap generator script for a static Next.js website. Of course, there are packages out there that do just this, but there is great value in writing your own and understanding the solution fully.
I'll assume that readers have some familiarity with Next.js, JavaScript, Node.js, and SEO. I'll also assume you have a static Next.js website, or can create one, to follow along with the post. Ok, now let's get started.
The motivation for this post came from some work I was doing on my personal website, afro-cloud.com. The goal of the site is to be an engaging, educational blog whose content is digestible for any reader. I wanted to customize the blog as much as I liked at little to no cost, and to reacquaint myself with React.js. I initially thought a vanilla React.js application would be great, but single-page applications (SPAs) have limitations for SEO if you aren't server-side rendering your pages. So I chose to host a static Next.js site on Google Cloud Storage to make my content indexable by search engines.
Let's refocus on the concept of sitemaps. You can think of a sitemap as a list of the pages on your website that helps search engines understand its contents. In short, it tells search engine crawlers which content on your site is important to you. Sitemaps aren't limited to pages, either: they can also describe other types of content on your site, such as videos. So if you want your pages indexed by a search engine like Google, it's important to tell its crawlers what content matters to you. The sitemap.xml file conveys this information, and we will be building a script to generate one for a static Next.js website.
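To make the format concrete, here is a minimal sketch of what a sitemap.xml contains, built by hand: a <urlset> element in the sitemaps.org namespace with one <url> per page, each holding a <loc> and an optional <lastmod>. buildSitemapXml and escapeXml are hypothetical helpers for illustration only; the script in this post uses the sitemap NPM package instead, which handles more of the details for us.

```javascript
// Hand-built illustration of the sitemap format. buildSitemapXml and escapeXml are
// hypothetical helpers; the post itself generates this XML with the sitemap package.

/**
 * Entity-escapes the five characters the sitemap protocol requires to be escaped.
 * @param {string} value
 * @return {string}
 */
const escapeXml = (value) =>
  value
    .replace(/&/g, "&amp;") // must run first so later entities aren't double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");

/**
 * Builds a sitemap.xml string from entries of the form { loc, lastmod }.
 * @param {{loc: string, lastmod?: string}[]} entries
 * @return {string}
 */
const buildSitemapXml = (entries) => {
  const urls = entries.map((entry) => {
    // lastmod is optional in the protocol, so only emit it when present.
    const lastmod = entry.lastmod
      ? "\n    <lastmod>" + escapeXml(entry.lastmod) + "</lastmod>"
      : "";
    return "  <url>\n    <loc>" + escapeXml(entry.loc) + "</loc>" + lastmod + "\n  </url>";
  });
  return (
    '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
    urls.join("\n") +
    "\n</urlset>\n"
  );
};

console.log(
  buildSitemapXml([
    { loc: "https://www.wethebest.com/", lastmod: "2024-03-10" },
    { loc: "https://www.wethebest.com/about.html", lastmod: "2024-03-10" },
  ])
);
```

The protocol requires URLs and other values to be entity-escaped, which is why escapeXml runs on every value before it lands in the XML.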
Next.js has two different routing paradigms: the App Router and the Pages Router. For the purposes of this post, we will be using the App Router. The TL;DR is that all pages for our website live within a top-level folder called app/. Within app/, nested directories represent the hierarchy of pages for our site, and we tell Next.js that a directory corresponds to a page by placing a page.jsx or page.tsx file inside it. Take this directory structure for our app, represented as a nested list, for example:
- app/
  - page.tsx
  - about/
    - page.tsx
  - blog/
    - page.tsx
    - post1/
      - page.tsx
    - post2/
      - page.tsx
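For this example, we have the following page paths:
- app/page.tsx
- app/about/page.tsx
- app/blog/page.tsx
- app/blog/post1/page.tsx
- app/blog/post2/page.tsx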
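For the sake of example, let's say we're building a website with the domain wethebest.com. If we translate the above list to client routes, we get:
- wethebest.com/
- wethebest.com/about
- wethebest.com/blog
- wethebest.com/blog/post1
- wethebest.com/blog/post2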
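Since we are using a static website approach, we need to map our client-side routes to the fully qualified paths of the HTML files generated for each page, so that refreshing the browser loads the same page. Doing so yields the following:
- wethebest.com/index.html
- wethebest.com/about.html
- wethebest.com/blog.html
- wethebest.com/blog/post1.html
- wethebest.com/blog/post2.html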
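The mapping above can be sketched as a small function. This is a minimal sketch, assuming the same page.tsx file name and wethebest.com hostname used throughout the post, and a static export that emits one .html file per route. pagePathToSitemapUrl is a hypothetical helper, not part of the final script. For the root page it returns the bare "/" URL rather than index.html, which is also how the finished script lists its root entry.

```javascript
// Sketch of the page-file -> HTML URL mapping. pagePathToSitemapUrl is hypothetical.
const PAGE_FILENAME = "page.tsx";
const SITEMAP_HOSTNAME = "https://www.wethebest.com";

/**
 * Maps a page file path relative to the project root (e.g. "app/blog/post1/page.tsx")
 * to the fully qualified URL of its generated HTML file.
 * @param {string} pagePath
 * @return {string}
 */
const pagePathToSitemapUrl = (pagePath) => {
  const segments = pagePath.split("/");
  // Only accept page files that live under app/.
  if (segments[0] !== "app" || segments[segments.length - 1] !== PAGE_FILENAME) {
    throw new Error("Not a page file under app/: " + pagePath);
  }
  // Drop the leading "app" folder and the trailing page file name.
  const routeSegments = segments.slice(1, -1);
  // app/page.tsx is the root route, listed as the bare hostname.
  if (routeSegments.length === 0) {
    return SITEMAP_HOSTNAME + "/";
  }
  return SITEMAP_HOSTNAME + "/" + routeSegments.join("/") + ".html";
};

["app/page.tsx", "app/about/page.tsx", "app/blog/post1/page.tsx"].forEach((pagePath) =>
  console.log(pagePath + " -> " + pagePathToSitemapUrl(pagePath))
);
```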
Ok, so now we have some pages, but what does Next.js offer for creating a sitemap.xml for our application? Next.js has some built-in support (a sitemap.js or sitemap.ts file in app/ that returns the entries you want), but you have to maintain that list of entries yourself. We can do better. I, for one, am lazy and don't want to add a new entry to a configuration every time I add a page to my site, so let's create our own script to generate the sitemap for us.
So how might you start tackling the problem of writing our own sitemap.xml generator script? We know how client-side routing works in Next.js for our use case. We know that each URL entry in our sitemap.xml needs to include the page's static HTML file. And we know that whenever we add a new page to our site, we want its URL reflected in our sitemap.xml. Hopefully you're thinking about reading the contents of our app directory and using that information to build the sitemap. If not, no worries! We'll go through it together. Let's start by creating a sitemap.js file at the root of our project, alongside our package.json, with the following contents:
const fs = require("node:fs");
const { SitemapStream, streamToPromise } = require("sitemap");
const { Readable } = require("stream");
const SITEMAP_HOSTNAME = "https://www.wethebest.com";
(async () => {
  // Create [SitemapStream] that will ultimately populate our sitemap.xml.
  const stream = new SitemapStream({
    hostname: SITEMAP_HOSTNAME,
  });
})();
We'll be using the synchronous Node.js file system API to read the contents of our directories. We'll also be using an NPM package called sitemap (install it with npm install sitemap), which provides utilities that make creating a sitemap easy. To take advantage of the sitemap package, we'll feed it a Readable stream from Node.js's built-in stream module. The SITEMAP_HOSTNAME constant is the base we'll use to build the URLs in our sitemap.xml file. Finally, we have an immediately invoked async function expression, which runs whenever we execute this file. Now let's add the code to traverse our app/ directory and find the pages for our website.
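If the stream plumbing is unfamiliar, the following sketch shows the same shape using only Node.js built-ins: Readable.from turns an array of entry objects into an object-mode stream, a Transform stands in for SitemapStream, and a promise collects the output much like streamToPromise does. toUrlElements and collectToString are hypothetical helpers; the real SitemapStream does considerably more, such as writing the <urlset> wrapper and resolving relative URLs against the hostname.

```javascript
// Stand-in for SitemapStream + streamToPromise using only Node.js built-ins.
// toUrlElements and collectToString are hypothetical helpers for illustration.
const { Readable, Transform } = require("node:stream");

// Object mode on the writable side: accepts { url, lastmod } entries and emits
// one <url> element per entry as text (no escaping, no <urlset> wrapper).
const toUrlElements = () =>
  new Transform({
    writableObjectMode: true,
    transform(entry, _encoding, callback) {
      callback(
        null,
        "<url><loc>" + entry.url + "</loc><lastmod>" + entry.lastmod + "</lastmod></url>\n"
      );
    },
  });

// Resolves with everything the stream emitted, joined into one string.
const collectToString = (stream) =>
  new Promise((resolve, reject) => {
    const chunks = [];
    stream.on("data", (chunk) => chunks.push(Buffer.from(chunk)));
    stream.on("end", () => resolve(Buffer.concat(chunks).toString()));
    stream.on("error", reject);
  });

(async () => {
  const entries = [
    { url: "https://www.wethebest.com/", lastmod: "2024-03-10" },
    { url: "https://www.wethebest.com/about.html", lastmod: "2024-03-10" },
  ];
  // Same shape as the post: Readable.from([...]).pipe(stream), then await the output.
  const body = await collectToString(Readable.from(entries).pipe(toUrlElements()));
  console.log(body);
})();
```

The key idea is that pipe() returns the destination stream, so the expression Readable.from(...).pipe(stream) can be handed straight to whatever collects the output.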
The first thing we'll want to do is explore our top-level app/ folder and find the file paths of all our pages. We do this so that we can map each file path to the corresponding HTML file for our site (look back at the routing section if you need a refresher). Update your sitemap.js file to look like this:
const fs = require("node:fs");
const { SitemapStream, streamToPromise } = require("sitemap");
const { Readable } = require("stream");
// Absolute path to our app/ directory; sitemap.js lives at the project root.
const APP_ROOT = __dirname + "/app";
const PAGE_FILENAME = "page.tsx";
const SITEMAP_HOSTNAME = "https://www.wethebest.com";
const SITEMAP_PATHS = [];
/**
Populates [SITEMAP_PATHS] with the file paths of the pages to be included
in our sitemap.xml file.
@param {string} path The path we are currently searching.
@param {boolean} isRoot Flag that denotes whether or not we are processing our root path.
@return {void}
*/
const getAllUrlsForSitemap = (path, isRoot) => {
  // Get the names of all entries within our current directory.
  const files = fs.readdirSync(path);
  files.forEach((file) => {
    // Not using template strings due to article format on website.
    const dirPath = path + "/" + file;
    // Skip the root page (app/page.tsx); we add that entry separately later.
    if (file === PAGE_FILENAME && !isRoot) {
      SITEMAP_PATHS.push(dirPath);
    } else {
      try {
        const stats = fs.statSync(dirPath);
        if (stats.isDirectory()) {
          // Continue searching until we exhaust all directories.
          getAllUrlsForSitemap(dirPath, false);
        }
      } catch (err) {
        // Ignore entries we can't stat (e.g. broken symlinks).
      }
    }
  });
};
(async () => {
  // Create [SitemapStream] that will ultimately populate our sitemap.xml.
  const stream = new SitemapStream({
    hostname: SITEMAP_HOSTNAME,
  });
  // Populate [SITEMAP_PATHS] based on our [PAGE_FILENAME].
  getAllUrlsForSitemap(APP_ROOT, true);
})();
You'll notice that we have a new function: getAllUrlsForSitemap. It walks the contents of our app/ directory, looking for page.tsx files as well as nested directories. We special-case our root page (app/page.tsx) because we'll add that entry later. Since some of our pages are nested, we need to explore more than the top-level app folder: readdirSync returns the names of all entries in a directory, and we loop through them, recursing into any subdirectories we find. Finally, when we find a page, we add its path to SITEMAP_PATHS, which we'll process later on. Now let's tie it all together.
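The same walk can also be written without a separate statSync call. This is an alternative sketch, not the post's implementation: it passes the withFileTypes option to readdirSync so each entry is an fs.Dirent that already knows whether it is a directory, and it returns an array instead of pushing into a module-level SITEMAP_PATHS. findPageFiles is a hypothetical name.

```javascript
// Alternative directory walk using Dirent objects. findPageFiles is hypothetical.
const fs = require("node:fs");
const nodePath = require("node:path");

/**
 * Recursively collects the paths of page files beneath `dir`.
 * @param {string} dir Directory to search.
 * @param {string} pageFilename File name that marks a route, e.g. "page.tsx".
 * @param {boolean} includeOwnPage Whether to include the page file directly inside `dir`.
 * @return {string[]}
 */
const findPageFiles = (dir, pageFilename, includeOwnPage) => {
  const results = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const entryPath = nodePath.join(dir, entry.name);
    if (entry.isDirectory()) {
      // Nested directories are never the root, so always include their pages.
      results.push(...findPageFiles(entryPath, pageFilename, true));
    } else if (entry.isFile() && entry.name === pageFilename && includeOwnPage) {
      results.push(entryPath);
    }
  }
  return results;
};

// Run from the project root; skip quietly if there is no app/ directory here.
if (fs.existsSync("app")) {
  console.log(findPageFiles("app", "page.tsx", false));
}
```

For the example layout on macOS or Linux, this should yield paths such as app/about/page.tsx and app/blog/post1/page.tsx. One behavioral difference from the statSync version: a symlinked directory is reported as a symbolic link rather than a directory, so this sketch does not follow it.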
We now have the paths of all the pages we need for our sitemap; all that's left is to map them to their HTML files and create the sitemap. Update your sitemap.js file to look like this:
const fs = require("node:fs");
const { SitemapStream, streamToPromise } = require("sitemap");
const { Readable } = require("stream");
// Absolute path to our app/ directory; sitemap.js lives at the project root.
const APP_ROOT = __dirname + "/app";
const PAGE_FILENAME = "page.tsx";
const SITEMAP_HOSTNAME = "https://www.wethebest.com";
const SITEMAP_PATHS = [];
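// Force the en-US format (M/D/YYYY) so splitting on "/" gives [month, day, year]
// regardless of the machine's locale.
const TODAYS_DATE_AS_ARRAY = new Date().toLocaleDateString("en-US").split("/");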
const WRITE_FILE_PATH = "./app/sitemap.xml";
/**
Populates [SITEMAP_PATHS] with the file paths of the pages to be included
in our sitemap.xml file.
@param {string} path The path we are currently searching.
@param {boolean} isRoot Flag that denotes whether or not we are processing our root path.
@return {void}
*/
const getAllUrlsForSitemap = (path, isRoot) => {
  // Get the names of all entries within our current directory.
  const files = fs.readdirSync(path);
  files.forEach((file) => {
    // Not using template strings due to article format on website.
    const dirPath = path + "/" + file;
    // Skip the root page (app/page.tsx); we add that entry separately later.
    if (file === PAGE_FILENAME && !isRoot) {
      SITEMAP_PATHS.push(dirPath);
    } else {
      try {
        const stats = fs.statSync(dirPath);
        if (stats.isDirectory()) {
          // Continue searching until we exhaust all directories.
          getAllUrlsForSitemap(dirPath, false);
        }
      } catch (err) {
        // Ignore entries we can't stat (e.g. broken symlinks).
      }
    }
  });
};
/**
Returns an object containing the url and lastmod sitemap properties.
@param {string} path Absolute path to a page file found by getAllUrlsForSitemap.
@param {string} lastmod Last modification date for the page, as YYYY-MM-DD.
@return {Object} url: URL for page on website. lastmod: last modification date for our page.
*/
const mapToSitemapFormat = (path, lastmod) => ({
  // Turn ".../app/blog/post1/page.tsx" into "blog/post1.html"; SitemapStream resolves
  // this relative URL against SITEMAP_HOSTNAME.
  // Not using template strings due to article format on website.
  url:
    path.replace(APP_ROOT + "/", "").replace("/" + PAGE_FILENAME, "") + ".html",
  lastmod,
});
/**
Returns a two-digit string representation of a date unit, e.g. "3" becomes "03".
@param {string} unit Date unit (month or day) as a number in string format.
@return {string} The unit, left-padded with a zero if it has a single digit.
*/
const padDateUnit = (unit) => (unit.length < 2 ? "0" + unit : unit);
(async () => {
  console.info("Generating sitemap...");
  // Create [SitemapStream] that will ultimately populate our sitemap.xml.
  const stream = new SitemapStream({
    hostname: SITEMAP_HOSTNAME,
  });
  // Populate [SITEMAP_PATHS] based on our [PAGE_FILENAME].
  getAllUrlsForSitemap(APP_ROOT, true);
  try {
    // Create our sitemap.xml contents. Not using template strings due to article format on website.
    const LAST_MODIFIED_DATE =
      TODAYS_DATE_AS_ARRAY[2] +
      "-" +
      padDateUnit(TODAYS_DATE_AS_ARRAY[0]) +
      "-" +
      padDateUnit(TODAYS_DATE_AS_ARRAY[1]);
    const ROOT_SITEMAP_ENTRY = {
      url: SITEMAP_HOSTNAME,
      lastmod: LAST_MODIFIED_DATE,
    };
    const content = await streamToPromise(
      Readable.from([
        ROOT_SITEMAP_ENTRY,
        ...SITEMAP_PATHS.map((url) =>
          mapToSitemapFormat(url, LAST_MODIFIED_DATE)
        ),
      ]).pipe(stream)
    ).then((data) => data.toString());
    console.info("Successfully generated sitemap...");
    // Write our sitemap contents to [WRITE_FILE_PATH].
    fs.writeFileSync(WRITE_FILE_PATH, content);
  } catch (err) {
    console.error(err);
  }
})();
The key difference between this snippet and the previous one is that we now tie together the last two pieces needed to generate our sitemap. We map each entry in SITEMAP_PATHS to the sitemap format by creating an object with url and lastmod keys. These objects are fed into the sitemap package, which generates the sitemap in XML format, and finally we write sitemap.xml to the location of our choosing. Let's execute the script by running "node sitemap.js" (note that you will need Node.js installed, and you should run the command from the directory containing sitemap.js, since WRITE_FILE_PATH is relative to the current working directory). You should now see a sitemap.xml file in your app folder that looks something like this:
<?xml version="1.0" encoding="UTF-8"?>
<urlset
  xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
  xmlns:news="http://www.google.com/schemas/sitemap-news/0.9"
  xmlns:xhtml="http://www.w3.org/1999/xhtml"
  xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
  xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
  <url>
    <loc>https://www.wethebest.com/</loc>
    <lastmod>2024-03-10T00:00:00.000Z</lastmod>
  </url>
  <url>
    <loc>https://www.wethebest.com/about.html</loc>
    <lastmod>2024-03-10T00:00:00.000Z</lastmod>
  </url>
  <url>
    <loc>https://www.wethebest.com/blog.html</loc>
    <lastmod>2024-03-10T00:00:00.000Z</lastmod>
  </url>
  <url>
    <loc>https://www.wethebest.com/blog/post1.html</loc>
    <lastmod>2024-03-10T00:00:00.000Z</lastmod>
  </url>
  <url>
    <loc>https://www.wethebest.com/blog/post2.html</loc>
    <lastmod>2024-03-10T00:00:00.000Z</lastmod>
  </url>
</urlset>
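The lastmod values above start out as the YYYY-MM-DD string the script builds from toLocaleDateString(); as the output shows, the sitemap package expands it into a full ISO timestamp. If you'd rather not depend on locale-formatted strings at all, a minimal alternative is to read the date parts directly from a Date object. formatLastmod is a hypothetical helper, and like the original script it uses the machine's local date.

```javascript
// Locale-independent YYYY-MM-DD builder. formatLastmod is a hypothetical helper.

/**
 * Formats a Date as YYYY-MM-DD using its local year, month, and day.
 * @param {Date} date
 * @return {string}
 */
const formatLastmod = (date) =>
  [
    String(date.getFullYear()),
    String(date.getMonth() + 1).padStart(2, "0"), // getMonth() is zero-based
    String(date.getDate()).padStart(2, "0"),
  ].join("-");

// Prints today's local date as YYYY-MM-DD.
console.log(formatLastmod(new Date()));
```

With a helper like this, LAST_MODIFIED_DATE could simply be formatLastmod(new Date()), and TODAYS_DATE_AS_ARRAY and padDateUnit would no longer be needed.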
Nice! Now, whenever you add a new page to your Next.js app, rerunning the script picks it up automatically. I highly recommend running this sitemap generation as part of your production build so the sitemap is regenerated whenever you deploy a new version of your website; for example, an npm "prebuild" script that runs "node sitemap.js" executes automatically before "npm run build". I hope you found this post useful. Cheers!