
Microsoft Bing Robots.txt Tester said it does not accept Disallow line with empty value #101

Open
tony95271 opened this issue Aug 4, 2020 · 1 comment

Comments


tony95271 commented Aug 4, 2020

django-robots generates the following by default:

User-agent: *
Disallow:

Sitemap: https://mysite.com/sitemap.xml

Bing Robots.txt Tester reports an error on line 2 (the empty Disallow line).

@some1ataplace

Microsoft Bing Robots.txt Tester reports an error on line 2 of the default robots.txt generated by django-robots because that Disallow directive has an empty value. Strictly speaking, the original Robots Exclusion Protocol permits an empty Disallow (it means "all URLs may be retrieved"), but Bing's tester expects every Disallow directive to specify a path or pattern of paths that the named user agents may not crawl, so it flags the empty line.

To satisfy Bing's tester, you can either remove the empty Disallow directive altogether or give it a path or pattern of paths to disallow. For example, to block crawling of all pages under the /admin/ path, modify the robots.txt file as follows:

User-agent: *
Disallow: /admin/

Sitemap: https://mysite.com/sitemap.xml


It's possible to change the default output of django-robots to meet Bing's expectation.

Instead of emitting a bare Disallow:, list every path you want to block:

User-agent: *
Disallow: /admin/
Disallow: /secret_page/
Disallow: /unpublished_articles/

Sitemap: https://mysite.com/sitemap.xml

This format gives each Disallow directive a specific path rather than an empty value. Customize the list of paths as needed for your site.
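
If you want those rules to come from django-robots itself, note that the app stores them in the database, where they are normally entered through the Django admin. As a hedged sketch (assuming django-robots' Rule and Url models in robots.models, with their robot, sites, disallowed, and pattern fields), the same rule can be created programmatically, e.g. in a data migration or a shell session:

from django.contrib.sites.models import Site
from robots.models import Rule, Url

# Sketch only: create a wildcard rule and attach the disallowed paths.
# The same data can be entered through the Django admin instead.
rule, _ = Rule.objects.get_or_create(robot='*')
rule.sites.add(Site.objects.get_current())
for pattern in ['/admin/', '/secret_page/', '/unpublished_articles/']:
    url, _ = Url.objects.get_or_create(pattern=pattern)
    rule.disallowed.add(url)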

Note that django-robots does not document a ROBOTS_DISALLOWED_URLS setting; its rules normally live in the database as Rule objects. If you would rather drive the output from your Django settings file, you could introduce a project-level setting like this (the name and shape here are hypothetical):

ROBOTS_DISALLOWED_URLS = {
    # Maps a user agent to the paths it may not crawl (hypothetical setting).
    '*': [
        '/admin/',
        '/secret_page/',
        '/unpublished_articles/',
    ],
}

On its own this setting does nothing; a small custom view, sketched below, can read it and emit a robots.txt in the format shown above. Adjust the list of URLs to match the pages you want to block.
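
A minimal sketch of such a view, assuming the hypothetical ROBOTS_DISALLOWED_URLS setting above and a hard-coded sitemap URL (replace https://mysite.com/sitemap.xml with your own):

from django.conf import settings
from django.http import HttpResponse

def robots_txt(request):
    # Build robots.txt from the hypothetical ROBOTS_DISALLOWED_URLS setting;
    # fall back to a single wildcard agent with no disallowed paths.
    lines = []
    for agent, paths in getattr(settings, 'ROBOTS_DISALLOWED_URLS', {'*': []}).items():
        lines.append(f'User-agent: {agent}')
        lines.extend(f'Disallow: {path}' for path in paths)
        lines.append('')
    lines.append('Sitemap: https://mysite.com/sitemap.xml')
    return HttpResponse('\n'.join(lines), content_type='text/plain')

Then wire it to /robots.txt in your URLconf:

from django.urls import path

urlpatterns = [
    path('robots.txt', robots_txt),
]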
