Add new configuration property "sitemapPath" to change the sitemap path #74

Closed
wants to merge 4 commits into from


dj-fiorex
Contributor

Description

Hello, this is my first PR. I needed a way to use a custom path for the sitemap.xml file instead of the hardcoded /sitemap.xml, so I added a new configuration property called sitemapPath. It has type string | undefined and can be set via the CLI, CI, or the configuration file.

Example configuration file:

export default {
  site: 'https://google.com',
  outputPath: 'google-ci-2246',
  ci: {
    buildStatic: true
  },
  scanner: {
    sitemapPath: 'sitemap/index.xml',
    maxRoutes: 50,
    device: 'desktop'
  },
  debug: true
}
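
To illustrate how a sitemapPath value like the one above might be combined with site, here is a minimal sketch. The helper name resolveSitemapUrl and the fallback to sitemap.xml are assumptions made for illustration; this is not the code from the PR or from Unlighthouse itself.

```typescript
// Hypothetical helper (not Unlighthouse's implementation): builds the
// absolute sitemap URL from the configured site and an optional sitemapPath.
function resolveSitemapUrl(site: string, sitemapPath?: string): string {
  // Assumed default: fall back to the conventional location when unset.
  const path = sitemapPath ?? 'sitemap.xml'
  // A trailing slash on the base keeps any sub-path of the site intact
  // when the relative path is resolved against it.
  const base = site.endsWith('/') ? site : `${site}/`
  // Strip leading slashes so '/sitemap/index.xml' and 'sitemap/index.xml'
  // resolve to the same URL.
  return new URL(path.replace(/^\/+/, ''), base).toString()
}

// Usage with the values from the example configuration:
// resolveSitemapUrl('https://google.com', 'sitemap/index.xml')
//   returns 'https://google.com/sitemap/index.xml'
// resolveSitemapUrl('https://google.com')
//   returns 'https://google.com/sitemap.xml'
```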

@netlify

netlify bot commented May 9, 2023

Deploy Preview for unlighthouse canceled.

Name Link
🔨 Latest commit c037874
🔍 Latest deploy log https://app.netlify.com/sites/unlighthouse/deploys/645acbe81447aa0008aa00fe

@harlan-zw
Owner

Hey @dj-fiorex, thanks for the PR! The code looks good.

I think one fundamental issue, outside of your great work, is that the robots.txt configuration isn't respected. I've made a PR to support that here: #79

One breaking change there relative to your code is that we support multiple sitemaps. I've added support for this via the config, but if you want to update this PR to support it via the command line, I would be happy to work with you on getting it merged.

@dj-fiorex
Contributor Author

dj-fiorex commented May 10, 2023

Hello @harlan-zw, thanks for your words, much appreciated. I think the best thing I can do is wait until your PR is merged and then add my config again.
That way you can:

  • get urls from sitemap taken from robots
  • get urls from sitemap taken from sitemapPath
  • get urls from crawling

What do you think?

P.S. Could you please write some docs on how to enable/disable reading from robots.txt?
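
As a rough sketch of how the three URL sources proposed in this comment could be combined, the function below merges several URL lists and drops duplicates. The name mergeUrlSources and the idea of passing each source as a plain array are assumptions for illustration only; the PR does not show how Unlighthouse combines them.

```typescript
// Hypothetical helper: merges URL lists from several sources (for example
// robots.txt sitemaps, a sitemapPath sitemap, and crawling) into one list,
// keeping the first occurrence of each URL and preserving order.
function mergeUrlSources(...sources: string[][]): string[] {
  const seen = new Set<string>()
  for (const urls of sources) {
    for (const url of urls) {
      // Set.add ignores values that are already present.
      seen.add(url)
    }
  }
  return Array.from(seen)
}

// Usage with placeholder URLs:
// mergeUrlSources(
//   ['https://example.com/', 'https://example.com/a'],  // from robots.txt sitemaps
//   ['https://example.com/a', 'https://example.com/b'], // from sitemapPath
//   ['https://example.com/b', 'https://example.com/c'], // from crawling
// )
// returns ['https://example.com/', 'https://example.com/a',
//          'https://example.com/b', 'https://example.com/c']
```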

@harlan-zw
Owner

Thanks again for the work @dj-fiorex. Closing this in favor of #116

@harlan-zw harlan-zw closed this May 22, 2023
2 participants