Implement respectRobotsTxtFile crawler option #1144

Open
B4nan opened this issue Apr 8, 2025 · 1 comment
Labels: product roadmap, t-tooling

Comments


B4nan commented Apr 8, 2025

This option automatically fetches the robots.txt file based on the current request and adheres to the disallow directives.
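For context, here is a minimal sketch of the mechanics such an option would automate, using only the Python standard library. The function name and per-origin caching scheme are illustrative, not Crawlee API:

```python
# Illustrative sketch only, not Crawlee API: fetch robots.txt for the
# request's origin (cached per origin) and check each URL against it.
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

_parsers: dict[str, RobotFileParser] = {}  # one cached parser per origin


def is_allowed(url: str, user_agent: str = '*') -> bool:
    """Return True if robots.txt permits `user_agent` to fetch `url`."""
    parts = urlparse(url)
    origin = f'{parts.scheme}://{parts.netloc}'
    parser = _parsers.get(origin)
    if parser is None:
        parser = RobotFileParser(f'{origin}/robots.txt')
        parser.read()  # fetches and parses the file over HTTP
        _parsers[origin] = parser
    return parser.can_fetch(user_agent, url)
```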

The JS version was implemented via the following PRs:

We will first need to implement the RobotsTxtFile and Sitemap classes:
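A rough sketch of what the `RobotsTxtFile` surface could look like. The class and method names are assumptions based on this description, and parsing is delegated to the stdlib for brevity; the real design may differ:

```python
# Hypothetical interface sketch for RobotsTxtFile; not the final design.
import urllib.request
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


class RobotsTxtFile:
    def __init__(self, url: str, content: str) -> None:
        self._parser = RobotFileParser(url)
        self._parser.parse(content.splitlines())

    @classmethod
    def load(cls, url: str) -> 'RobotsTxtFile':
        """Fetch robots.txt from the origin of the given URL."""
        parts = urlparse(url)
        robots_url = f'{parts.scheme}://{parts.netloc}/robots.txt'
        with urllib.request.urlopen(robots_url) as response:
            content = response.read().decode('utf-8', errors='ignore')
        return cls(robots_url, content)

    def is_allowed(self, url: str, user_agent: str = '*') -> bool:
        """Check a URL against the parsed allow/disallow directives."""
        return self._parser.can_fetch(user_agent, url)

    def get_sitemaps(self) -> list[str]:
        """Sitemap URLs declared in robots.txt (input for a Sitemap class)."""
        return self._parser.site_maps() or []
```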

github-actions bot added the t-tooling label on Apr 8, 2025

B4nan commented Apr 8, 2025

If this ends up being part of a major version bump, we can enable `respectRobotsTxtFile` by default; we want to do the same in the JS version once we get to v4.
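Hypothetical usage once the option lands, assuming it is exposed as a crawler constructor argument mirroring the JS name (the import path and snake_case spelling are assumptions):

```python
# Assumed import path and option name; shown only to illustrate the intent.
from crawlee.crawlers import BeautifulSoupCrawler

crawler = BeautifulSoupCrawler(
    respect_robots_txt_file=True,  # skip requests disallowed by robots.txt
)
# In a major version bump, True could become the default.
```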

B4nan added the product roadmap label on Apr 8, 2025