[SEO Audits] Page is not blocked from indexing #3182

Closed
rviscomi opened this issue Aug 29, 2017 · 2 comments

@rviscomi
Member

rviscomi commented Aug 29, 2017

Audit group: Crawling and indexing
Description: Page is not blocked from indexing
Failure description: Page is blocked from indexing
Help text: The “Robots” directives tell crawlers how your content should be indexed. Directives like noindex and none prevent indexing. Learn more about the Robots meta tag and header.

Success conditions:

  • No tag matching the query selector meta[name=robots] has a content attribute with a comma-separated value in the blocklist below
  • X-Robots-Tag response header does not have a value in the blocklist below

Blocklist:

  • noindex
  • none
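
The two success conditions can be sketched as a pure function over the `content` values of the matching meta tags and the `X-Robots-Tag` header values. This is a minimal TypeScript sketch, not the audit's actual implementation: the names `BLOCKLIST`, `parseDirectives`, `hasBlockingDirective`, and `isIndexable` are hypothetical, and it assumes the caller has already collected the raw strings from the page and the response. It compares directives exactly as written and ignores any user-agent prefix in the header.

```typescript
// Hypothetical helper names; the blocklist values come from the issue text.
const BLOCKLIST: string[] = ["noindex", "none"];

/** Splits a robots value on commas and trims whitespace around each directive. */
function parseDirectives(value: string): string[] {
  return value
    .split(",")
    .map((directive) => directive.trim())
    .filter((directive) => directive.length > 0);
}

/** Returns true when any comma-separated directive is in the blocklist. */
function hasBlockingDirective(value: string): boolean {
  return parseDirectives(value).some(
    (directive) => BLOCKLIST.indexOf(directive) !== -1
  );
}

/**
 * Applies both success conditions: neither a robots meta `content` value
 * nor an X-Robots-Tag header value may contain a blocklisted directive.
 */
function isIndexable(metaContents: string[], xRobotsTagValues: string[]): boolean {
  return !metaContents.concat(xRobotsTagValues).some(hasBlockingDirective);
}
```

For example, `isIndexable(["index, follow"], [])` passes, while `isIndexable([], ["none"])` fails because of the header value.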

Notes:

  • Site owners can block crawlers via: meta tags, robots.txt rules, or X-Robots-Tag response header.
    • In the next version, check that location.href is not disallowed by robots.txt
  • The robots meta name attribute is interchangeable with other bot names including:
    • msnbot
    • bingbot
    • google
    • yandex
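
If the audit were to treat these bot names as equivalent to `robots`, the tag selection could look roughly like the sketch below. The `MetaTag` shape, `ROBOTS_META_NAMES`, and `selectRobotsMetaTags` are hypothetical names, the list of names is copied from the note above, and the case-insensitive name match is an assumption.

```typescript
// Hypothetical shape for a meta tag already extracted from the page.
interface MetaTag {
  name: string;
  content: string;
}

// "robots" plus the bot names listed in the note above.
const ROBOTS_META_NAMES: string[] = ["robots", "msnbot", "bingbot", "google", "yandex"];

/** Keeps only the meta tags whose name is one of the robots-style names. */
function selectRobotsMetaTags(tags: MetaTag[]): MetaTag[] {
  return tags.filter(
    // Assumption: names are matched case-insensitively.
    (tag) => ROBOTS_META_NAMES.indexOf(tag.name.toLowerCase()) !== -1
  );
}
```

The `content` of each selected tag would then be checked against the blocklist in the same way as a plain `robots` tag.
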
@kdzwinel
Collaborator

Edge cases and my current strategy to deal with them:

  1. NOINDEX - not sure if uppercase/mixedcase is supported - assuming it's valid
  2. noindex nofollow - not sure if space (instead of comma) is supported - assuming it's not valid
  3. noindex, nofollow, all - not sure if all overrides previous restrictions - assuming it's not valid
  4. unavailable_after - not supporting
  5. <meta name="googlebot" - not supporting

@rviscomi let me know if you'd like to make any changes.

@rviscomi
Member Author

rviscomi commented Oct 19, 2017

  1. NOINDEX - not sure if uppercase/mixedcase is supported - assuming it's valid

👍

  2. noindex nofollow - not sure if space (instead of comma) is supported - assuming it's not valid

👍

  3. noindex, nofollow, all - not sure if all overrides previous restrictions - assuming it's not valid

Let's fail as long as we see anything in the blocklist.

  4. unavailable_after - not supporting

Let's try to support this. Fail the audit if the date is in the past, otherwise provide some kind of warning that the page will soon stop being indexable.

  5. meta name="googlebot" - not supporting

👍
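
The behavior agreed on in this thread could be sketched as follows. This is a rough TypeScript illustration under stated assumptions, not the audit's real code: `AuditResult`, `BLOCKING_DIRECTIVES`, `UNAVAILABLE_AFTER_PREFIX`, and `evaluateRobotsValue` are hypothetical names. It assumes the `unavailable_after:` date is in a form `Date.parse` accepts (ISO 8601 here), ignores dates it cannot parse, and reads "space instead of comma is not valid" as meaning that a space-separated value is not recognized as a blocking directive. Date formats that themselves contain commas would break the comma split and are not handled.

```typescript
type AuditResult = "pass" | "warn" | "fail";

const BLOCKING_DIRECTIVES: string[] = ["noindex", "none"];
const UNAVAILABLE_AFTER_PREFIX = "unavailable_after:";

/** Evaluates one robots value (meta content or header value) at time `now`. */
function evaluateRobotsValue(value: string, now: Date): AuditResult {
  let result: AuditResult = "pass";
  const directives = value.split(",").map((directive) => directive.trim());

  for (const directive of directives) {
    // Case 1: directive names are matched case-insensitively.
    const lower = directive.toLowerCase();

    // Case 3: anything in the blocklist fails, even if "all" is also present.
    if (BLOCKING_DIRECTIVES.indexOf(lower) !== -1) {
      return "fail";
    }

    // Case 4: fail if the date has passed, otherwise warn.
    if (lower.indexOf(UNAVAILABLE_AFTER_PREFIX) === 0) {
      const dateText = directive.slice(UNAVAILABLE_AFTER_PREFIX.length).trim();
      const timestamp = Date.parse(dateText);
      if (isNaN(timestamp)) {
        continue; // Assumption: an unparseable date is ignored.
      }
      if (timestamp <= now.getTime()) {
        return "fail";
      }
      result = "warn";
    }
  }
  return result;
}
```

Passing `now` as a parameter keeps the function deterministic, so the warning and failure paths for `unavailable_after` can be exercised with fixed dates.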
