feat: improved sitemap #3579

Merged
merged 8 commits into from
Jun 16, 2022
18 changes: 18 additions & 0 deletions .changeset/popular-cherries-float.md
@@ -0,0 +1,18 @@
---
'@astrojs/sitemap': minor
---

# Key features

- Split a large sitemap into multiple sitemaps by a custom entry limit.
- Add sitemap-specific attributes such as `changefreq`, `lastmod` and `priority`.
- Customize the final output via a JS function.
- Localization support.
- Reliability: all config options are validated.

## Important changes

The integration always generates at least two files instead of one:

- `sitemap-index.xml` - index file;
- `sitemap-{i}.xml` - actual sitemap.
1 change: 1 addition & 0 deletions examples/integrations-playground/astro.config.mjs
@@ -9,5 +9,6 @@ import solid from '@astrojs/solid-js';

// https://astro.build/config
export default defineConfig({
  site: 'https://example.com',
  integrations: [lit(), react(), tailwind(), turbolinks(), partytown(), sitemap(), solid()],
});
183 changes: 182 additions & 1 deletion packages/integrations/sitemap/README.md
@@ -64,7 +64,35 @@ export default {
}
```

Now, [build your site for production](https://docs.astro.build/en/reference/cli-reference/#astro-build) via the `astro build` command. You should find your sitemap under `dist/sitemap.xml`!
Now, [build your site for production](https://docs.astro.build/en/reference/cli-reference/#astro-build) via the `astro build` command. You should find your _sitemap_ under `dist/sitemap-index.xml` and `dist/sitemap-0.xml`!

Generated sitemap content for a two-page website:

**sitemap-index.xml**

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://stargazers.club/sitemap-0.xml</loc>
  </sitemap>
</sitemapindex>
```

**sitemap-0.xml**

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:news="http://www.google.com/schemas/sitemap-news/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1" xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
  <url>
    <loc>https://stargazers.club/</loc>
  </url>
  <url>
    <loc>https://stargazers.club/second-page/</loc>
  </url>
</urlset>
```

You can also check our [Astro Integration Documentation][astro-integration] for more on integrations.

@@ -111,5 +139,158 @@ export default {
}
```

### entryLimit

The maximum number of entries per sitemap file, given as a non-negative `Number`. The default value is 45000. If your site has more entries than the limit, the integration splits them across multiple sitemap files, all listed in the sitemap index. See the explanation on [Google](https://developers.google.com/search/docs/advanced/sitemaps/large-sitemaps).

__astro.config.mjs__

```js
import sitemap from '@astrojs/sitemap';

export default {
  site: 'https://stargazers.club',
  integrations: [
    sitemap({
      entryLimit: 10000,
    }),
  ],
}
```
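
To make the splitting behaviour concrete, here is a minimal sketch of how a list of URLs could be divided by `entryLimit`. The helper `chunkEntries` and the sample URLs are hypothetical and are not taken from the integration's source; only the `sitemap-{i}.xml` naming comes from this PR. The sketch also assumes a limit of at least 1.

```typescript
// Hypothetical helper: splits entries into groups of at most `entryLimit` items.
function chunkEntries<T>(entries: T[], entryLimit: number): T[][] {
  if (!Number.isInteger(entryLimit) || entryLimit < 1) {
    throw new RangeError('entryLimit must be a positive integer in this sketch');
  }
  const chunks: T[][] = [];
  for (let i = 0; i < entries.length; i += entryLimit) {
    chunks.push(entries.slice(i, i + entryLimit));
  }
  return chunks;
}

// 25 sample URLs with a limit of 10 yield three files holding 10, 10 and 5 entries.
const urls = Array.from({ length: 25 }, (_, i) => `https://stargazers.club/page-${i}/`);
const files = chunkEntries(urls, 10).map((chunk, i) => ({
  name: `sitemap-${i}.xml`,
  count: chunk.length,
}));
console.log(files);
```

Each resulting file name would then appear as a `<loc>` entry in `sitemap-index.xml`.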

### changefreq, lastmod, priority

`changefreq` - How frequently the page is likely to change. Available values: `always` \| `hourly` \| `daily` \| `weekly` \| `monthly` \| `yearly` \| `never`.

`priority` - The priority of this URL relative to other URLs on your site. Valid values range from 0.0 to 1.0.

`lastmod` - The date the page was last modified.

`changefreq` and `priority` are ignored by Google.

See a detailed explanation of the sitemap-specific options on [sitemaps.org](https://www.sitemaps.org/protocol.html).


:exclamation: This integration uses the `astro:build:done` hook, which exposes only the generated page paths. With the current version of Astro, the integration therefore cannot analyze a page's source, frontmatter, etc. It can add the `changefreq`, `lastmod` and `priority` attributes only to all pages at once or to none.

__astro.config.mjs__

```js
import sitemap from '@astrojs/sitemap';

export default {
  site: 'https://stargazers.club',
  integrations: [
    sitemap({
      changefreq: 'weekly',
      priority: 0.7,
      lastmod: new Date('2022-02-24'),
    }),
  ],
}
```

### serialize

A sync or async function called for each sitemap entry just before it is written to disk.

It receives a `SitemapItem` object as its parameter. The object consists of `url` (required, the absolute page URL) and the optional `changefreq`, `lastmod`, `priority` and `links` properties.

The optional `links` property contains a `LinkItem` list of alternate pages, including the parent page.
The `LinkItem` type has two required fields: `url` (the fully-qualified URL for the version of this page for the specified language) and `hreflang` (a supported language code targeted by this version of the page).

The `serialize` function should return a `SitemapItem`, modified or not.

The example below shows how to set the sitemap-specific properties for individual pages.

__astro.config.mjs__

```js
import sitemap from '@astrojs/sitemap';

export default {
  site: 'https://stargazers.club',
  integrations: [
    sitemap({
      serialize(item) {
        if (/your-special-page/.test(item.url)) {
          item.changefreq = 'daily';
          item.lastmod = new Date();
          item.priority = 0.9;
        }
        return item;
      },
    }),
  ],
}
```
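
Since `serialize` may also be async, the following sketch shows one possible asynchronous variant. The `SitemapItem` and `LinkItem` shapes are written out from the description above and may differ from the types the package actually exports; `lookupLastModified` and `serializeAll` are hypothetical stand-ins for a data source and for whatever loop calls `serialize`.

```typescript
// Shapes as described in the README text; the real exported types may differ.
interface LinkItem {
  url: string;
  hreflang: string;
}

interface SitemapItem {
  url: string;
  changefreq?: string;
  lastmod?: Date;
  priority?: number;
  links?: LinkItem[];
}

// Hypothetical async lookup standing in for a CMS or file-system query.
async function lookupLastModified(url: string): Promise<Date | undefined> {
  const known: Record<string, string> = {
    'https://stargazers.club/blog/': '2022-06-01',
  };
  const value = known[url];
  return value ? new Date(value) : undefined;
}

// Async `serialize`: enriches an item when data is available, otherwise returns it untouched.
async function serialize(item: SitemapItem): Promise<SitemapItem> {
  const lastmod = await lookupLastModified(item.url);
  return lastmod ? { ...item, lastmod, changefreq: 'weekly' } : item;
}

// Hypothetical driver that applies `serialize` to every entry.
async function serializeAll(items: SitemapItem[]): Promise<SitemapItem[]> {
  return Promise.all(items.map((item) => serialize(item)));
}
```

In a real config only the `serialize` function would be passed to `sitemap({ serialize })`; the driver is there just to make the sketch runnable on its own.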

### i18n

To localize a sitemap, pass the `i18n` option to the integration config. The integration checks the generated page paths for the presence of locale keys.

The `i18n` object has two required properties:

- `defaultLocale`: `String`. Its value must be one of the `locales` keys.
- `locales`: `Record<String, String>`, key/value pairs. The key is used to look for a locale part in a page path. The value is a language attribute; only the English alphabet and hyphens are allowed. See more about the language attribute on [MDN](https://developer.mozilla.org/en-US/docs/Web/HTML/Global_attributes/lang).


Read more about localization on Google in [Advanced SEO](https://developers.google.com/search/docs/advanced/crawling/localized-versions#all-method-guidelines).

__astro.config.mjs__

```js
import sitemap from '@astrojs/sitemap';

export default {
  site: 'https://stargazers.club',
  integrations: [
    sitemap({
      i18n: {
        defaultLocale: 'en', // All URLs that don't contain `es` or `fr` after `https://stargazers.club/` will be treated as the default locale, i.e. `en`
        locales: {
          en: 'en-US', // The `defaultLocale` value must be present in the `locales` keys
          es: 'es-ES',
          fr: 'fr-CA',
        },
      },
    }),
  ],
};
```

The sitemap content will be:

```xml
...
  <url>
    <loc>https://stargazers.club/</loc>
    <xhtml:link rel="alternate" hreflang="en-US" href="https://stargazers.club/"/>
    <xhtml:link rel="alternate" hreflang="es-ES" href="https://stargazers.club/es/"/>
    <xhtml:link rel="alternate" hreflang="fr-CA" href="https://stargazers.club/fr/"/>
  </url>
  <url>
    <loc>https://stargazers.club/es/</loc>
    <xhtml:link rel="alternate" hreflang="en-US" href="https://stargazers.club/"/>
    <xhtml:link rel="alternate" hreflang="es-ES" href="https://stargazers.club/es/"/>
    <xhtml:link rel="alternate" hreflang="fr-CA" href="https://stargazers.club/fr/"/>
  </url>
  <url>
    <loc>https://stargazers.club/fr/</loc>
    <xhtml:link rel="alternate" hreflang="en-US" href="https://stargazers.club/"/>
    <xhtml:link rel="alternate" hreflang="es-ES" href="https://stargazers.club/es/"/>
    <xhtml:link rel="alternate" hreflang="fr-CA" href="https://stargazers.club/fr/"/>
  </url>
  <url>
    <loc>https://stargazers.club/es/second-page/</loc>
    <xhtml:link rel="alternate" hreflang="es-ES" href="https://stargazers.club/es/second-page/"/>
    <xhtml:link rel="alternate" hreflang="fr-CA" href="https://stargazers.club/fr/second-page/"/>
    <xhtml:link rel="alternate" hreflang="en-US" href="https://stargazers.club/second-page/"/>
  </url>
...
```
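
The grouping behind this output can be sketched roughly as follows. This is an illustration only: `splitLocale` and `alternatesFor` are hypothetical names, and the PR's real path parsing lives in `./utils/parse-url`, which is not shown here. The sketch assumes that only non-default locale keys appear as the first path segment, as the config comment above suggests.

```typescript
const site = 'https://stargazers.club/';
const defaultLocale = 'en';
const locales: Record<string, string> = { en: 'en-US', es: 'es-ES', fr: 'fr-CA' };

// Hypothetical parser: returns the locale key and the locale-independent path.
function splitLocale(url: string): { locale: string; path: string } {
  const rest = url.startsWith(site) ? url.slice(site.length) : url;
  const [first, ...others] = rest.split('/');
  const isNonDefaultLocale =
    first !== defaultLocale && Object.keys(locales).includes(first);
  return isNonDefaultLocale
    ? { locale: first, path: others.join('/') }
    : { locale: defaultLocale, path: rest };
}

// Collects every page sharing the same locale-independent path as `url`.
function alternatesFor(url: string, allUrls: string[]) {
  const { path } = splitLocale(url);
  return allUrls
    .filter((other) => splitLocale(other).path === path)
    .map((other) => ({ url: other, hreflang: locales[splitLocale(other).locale] }));
}

const pages = [
  'https://stargazers.club/',
  'https://stargazers.club/es/',
  'https://stargazers.club/fr/',
  'https://stargazers.club/about/',
];

// The three home pages are alternates of each other; `about/` has no translations.
console.log(alternatesFor('https://stargazers.club/es/', pages));
```

A page with a single match, such as `about/` here, would get no `xhtml:link` entries, which mirrors the `filtered.length > 1` check in `generate-sitemap.ts`.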

[astro-integration]: https://docs.astro.build/en/guides/integrations-guide/
[astro-ui-frameworks]: https://docs.astro.build/en/core-concepts/framework-components/#using-framework-components
11 changes: 9 additions & 2 deletions packages/integrations/sitemap/package.json
@@ -13,20 +13,27 @@
},
"keywords": [
"astro-component",
"seo"
"seo",
"sitemap"
],
"bugs": "https://github.com/withastro/astro/issues",
"homepage": "https://astro.build",
"exports": {
".": "./dist/index.js",
"./package.json": "./package.json"
},
"files": [
"dist"
],
"scripts": {
"build": "astro-scripts build \"src/**/*.ts\" && tsc",
"build:ci": "astro-scripts build \"src/**/*.ts\"",
"dev": "astro-scripts dev \"src/**/*.ts\""
},
"dependencies": {},
"dependencies": {
"sitemap": "^7.1.1",
"zod": "^3.17.3"
},
"devDependencies": {
"astro": "workspace:*",
"astro-scripts": "workspace:*"
5 changes: 5 additions & 0 deletions packages/integrations/sitemap/src/config-defaults.ts
@@ -0,0 +1,5 @@
import type { SitemapOptions } from './index';

export const SITEMAP_CONFIG_DEFAULTS: SitemapOptions & any = {
  entryLimit: 45000,
};
9 changes: 9 additions & 0 deletions packages/integrations/sitemap/src/constants.ts
@@ -0,0 +1,9 @@
export const changefreqValues = [
  'always',
  'hourly',
  'daily',
  'weekly',
  'monthly',
  'yearly',
  'never',
] as const;
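
Declaring the array `as const` lets TypeScript derive a union type from it. The sketch below shows one way that could be used; the `ChangeFreq` alias and the `isChangeFreq` guard are hypothetical and written in plain TypeScript, whereas the PR itself adds `zod` as a dependency for config validation.

```typescript
const changefreqValues = [
  'always',
  'hourly',
  'daily',
  'weekly',
  'monthly',
  'yearly',
  'never',
] as const;

// Union of the literal values: 'always' | 'hourly' | ... | 'never'.
type ChangeFreq = (typeof changefreqValues)[number];

// Hypothetical runtime guard; the integration's own validation may differ.
function isChangeFreq(value: unknown): value is ChangeFreq {
  return (
    typeof value === 'string' &&
    (changefreqValues as readonly string[]).includes(value)
  );
}

const candidate: unknown = 'weekly';
if (isChangeFreq(candidate)) {
  // Inside this branch `candidate` is narrowed to ChangeFreq.
  const freq: ChangeFreq = candidate;
  console.log(`valid changefreq: ${freq}`);
}
```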
55 changes: 55 additions & 0 deletions packages/integrations/sitemap/src/generate-sitemap.ts
@@ -0,0 +1,55 @@
import type { SitemapItemLoose } from 'sitemap';

import type { SitemapOptions } from './index';
import { parseUrl } from './utils/parse-url';

// Matches status-code pages such as `/404` or `/500/`
const STATUS_CODE_PAGE_REGEXP = /\/[0-9]{3}\/?$/;

/** Build the list of sitemap entries for a set of page URLs */
export function generateSitemap(pages: string[], finalSiteUrl: string, opts: SitemapOptions) {
  const { changefreq, priority: prioritySrc, lastmod: lastmodSrc, i18n } = opts || {};
  // TODO: find way to respect <link rel="canonical"> URLs here
  const urls = [...pages].filter((url) => !STATUS_CODE_PAGE_REGEXP.test(url));
  urls.sort((a, b) => a.localeCompare(b, 'en', { numeric: true })); // sort alphabetically so sitemap is same each time

  const lastmod = lastmodSrc?.toISOString();
  const priority = typeof prioritySrc === 'number' ? prioritySrc : undefined;

  const { locales, defaultLocale } = i18n || {};
  const localeCodes = Object.keys(locales || {});

  const getPath = (url: string) => {
    const result = parseUrl(url, i18n?.defaultLocale || '', localeCodes, finalSiteUrl);
    return result?.path;
  };
  const getLocale = (url: string) => {
    const result = parseUrl(url, i18n?.defaultLocale || '', localeCodes, finalSiteUrl);
    return result?.locale;
  };

  const urlData = urls.map((url) => {
    let links;
    if (defaultLocale && locales) {
      const currentPath = getPath(url);
      if (currentPath) {
        // All URLs that share this locale-independent path are alternates of each other
        const filtered = urls.filter((subUrl) => getPath(subUrl) === currentPath);
        if (filtered.length > 1) {
          links = filtered.map((subUrl) => ({
            url: subUrl,
            lang: locales[getLocale(subUrl)!],
          }));
        }
      }
    }

    return {
      url,
      links,
      lastmod,
      priority,
      changefreq, // : changefreq as EnumChangefreq,
    } as SitemapItemLoose;
  });

  return urlData;
}
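
The first two steps of `generateSitemap`, filtering out status-code pages and sorting with numeric collation, can be tried in isolation. The regular expression and the sort call below are copied from the file above; the sample URLs are made up for illustration.

```typescript
const STATUS_CODE_PAGE_REGEXP = /\/[0-9]{3}\/?$/;

const pages = [
  'https://stargazers.club/page-10/',
  'https://stargazers.club/404/',
  'https://stargazers.club/page-2/',
  'https://stargazers.club/500',
  'https://stargazers.club/blog/2022/',
];

// Drop URLs whose last path segment is exactly three digits (e.g. 404, 500),
// then sort so that `page-2` comes before `page-10`.
const urls = pages
  .filter((url) => !STATUS_CODE_PAGE_REGEXP.test(url))
  .sort((a, b) => a.localeCompare(b, 'en', { numeric: true }));

console.log(urls);
```

Note that `blog/2022/` survives the filter because its last segment has four digits, while any real page whose final segment is a three-digit number would be excluded as well.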