Add "noindex" tag to hidden or removed pages to keep them out of search engines #131

Open
BobCorrigan opened this issue Jan 6, 2016 · 0 comments

After the recent spam attack on EOL, it's become clear we need to ensure that bad content - if not completely removed from the EOL database - is at least appropriately marked to ensure it doesn't end up in Google or other search engines. The overall page rank and reputation of EOL is negatively impacted by bad content, so this is urgent.

Google documents two approaches (source: https://support.google.com/webmasters/answer/81749/?utm_source=wnc_634403&utm_medium=gamma&utm_campaign=wnc_634403&utm_content=msg_694100&hl=en):

Use "nofollow" tags

Together with Yahoo! and MSN, Google introduced the "nofollow" HTML microformat several years ago, and the attribute has been widely adopted. Any link with the rel="nofollow" attribute will not be used to calculate PageRank or determine the relevancy of your pages for a user query. For example, if a spammer includes a link in your comments like this:

<a href="http://www.example.com/">This is a nice site!</a>
it will get converted to:

<a href="http://www.example.com/" rel="nofollow">This is a nice site!</a>
This new link will not be taken into account when calculating PageRank. This won't prevent spam, but it will avoid problems with passing PageRank and deter spammers from targeting your site. By default, many blogging sites (such as Blogger) automatically add this attribute to any posted comments.
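
As a rough illustration of the conversion described above, here is a minimal Python sketch that forces rel="nofollow" onto every link in a user-submitted HTML fragment. The class and function names (`NofollowRewriter`, `add_nofollow`) are hypothetical and not part of the EOL codebase, and this is not a sanitizer: it only rewrites the `rel` attribute, drops comments, and assumes the fragment has already been cleaned of scripts and other unsafe markup.

```python
from html import escape
from html.parser import HTMLParser


class NofollowRewriter(HTMLParser):
    """Re-emit an HTML fragment, forcing rel="nofollow" on every <a> tag."""

    def __init__(self):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _emit(self, tag, attrs, closing=""):
        if tag == "a":
            # Discard any rel value the submitter supplied, then add ours.
            attrs = [(k, v) for k, v in attrs if k != "rel"]
            attrs.append(("rel", "nofollow"))
        rendered = "".join(
            f" {k}" if v is None else f' {k}="{escape(v, quote=True)}"'
            for k, v in attrs
        )
        self.out.append(f"<{tag}{rendered}{closing}>")

    def handle_starttag(self, tag, attrs):
        self._emit(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit(tag, attrs, closing=" /")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def add_nofollow(fragment: str) -> str:
    """Return the fragment with rel="nofollow" set on all links."""
    rewriter = NofollowRewriter()
    rewriter.feed(fragment)
    rewriter.close()
    return "".join(rewriter.out)


print(add_nofollow('<a href="http://www.example.com/">This is a nice site!</a>'))
# prints: <a href="http://www.example.com/" rel="nofollow">This is a nice site!</a>
```

Parsing the fragment rather than pattern-matching on the raw string means an existing `rel` value supplied by the submitter is replaced instead of duplicated.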

Prevent untrusted content from showing in search

If your site allows users to create pages like profile pages, forum threads, or websites, you can deter spam abuse by preventing new or untrusted content from showing up in search.

For example, you can use the noindex meta standard to block access to pages for new and not-yet-trusted users. Like this:

<meta name="googlebot" content="noindex">

Or you can use the robots.txt standard to temporarily block the page:

Disallow: /guestbook/newpost.php
Once you believe the user is legitimate and not a spammer, you can remove the crawling or indexing restrictions. There are a number of ways that you can tell if a new user is a spammer, including using signals from your community.
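
One way to picture the "restrict until trusted" idea is the Python sketch below. Everything in it is an assumption for illustration: the `Author` fields, the thresholds, and the function names are hypothetical, and real trust signals for EOL would need to be chosen by the maintainers. It simply emits a robots `noindex` meta tag for pages whose author has not yet met the trust criteria, and nothing once they have.

```python
from dataclasses import dataclass

# Generic robots meta tag asking crawlers not to index the page.
NOINDEX_META = '<meta name="robots" content="noindex">'


@dataclass
class Author:
    """Hypothetical trust signals; not the actual EOL user model."""
    account_age_days: int
    approved_contributions: int


def is_trusted(author: Author, min_age_days: int = 30, min_approved: int = 3) -> bool:
    """Assumed rule: an account is trusted once it is old enough and
    has had enough contributions approved by the community."""
    return (
        author.account_age_days >= min_age_days
        and author.approved_contributions >= min_approved
    )


def robots_meta_for(author: Author) -> str:
    """Return the tag to place in <head>, or an empty string to allow indexing."""
    return "" if is_trusted(author) else NOINDEX_META


new_user = Author(account_age_days=1, approved_contributions=0)
regular = Author(account_age_days=400, approved_contributions=25)
print(repr(robots_meta_for(new_user)))
print(repr(robots_meta_for(regular)))
```

Because the tag is computed at render time, lifting the restriction needs no separate cleanup step: once the author passes the check, the next crawl sees the page without `noindex`.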


Recommended approaches:

  1. Physically remove offending pages from the EOL database. If it's not there, it can't be crawled.
  2. When a page containing spam content is marked "hidden" or disassociated from a taxon, it should be marked "nofollow" per the first Google approach above. This is the most urgent of these activities for preventing future issues.
  3. Pages on the site that are at risk of abuse by users (e.g., user profiles) should be marked "noindex".
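
The three recommendations above could be combined into a single decision point, sketched here in Python. This is only an illustration under stated assumptions: `PageState`, `robots_header`, and `respond` are hypothetical names, the directives are sent through the `X-Robots-Tag` HTTP response header rather than a meta tag, removed pages are assumed to answer with 404, and hidden pages get both `noindex` and `nofollow` because the issue body asks for "nofollow" while the issue title asks for "noindex".

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class PageState(Enum):
    """Hypothetical page states; not the actual EOL data model."""
    VISIBLE = "visible"
    HIDDEN = "hidden"    # marked hidden or disassociated from a taxon
    REMOVED = "removed"  # physically deleted from the database


def robots_header(state: PageState, user_generated: bool) -> Optional[str]:
    """Pick the X-Robots-Tag value for a page that is still served."""
    if state is PageState.HIDDEN:
        # Assumption: apply both directives to hidden pages.
        return "noindex, nofollow"
    if user_generated:
        # Pages at risk of abuse, such as user profiles.
        return "noindex"
    return None


def respond(state: PageState, user_generated: bool = False) -> Tuple[int, Dict[str, str]]:
    """Return (HTTP status, extra response headers) for a page request."""
    if state is PageState.REMOVED:
        # Nothing left to crawl, so no robots directive is needed.
        return 404, {}
    headers: Dict[str, str] = {}
    value = robots_header(state, user_generated)
    if value is not None:
        headers["X-Robots-Tag"] = value
    return 200, headers


print(respond(PageState.HIDDEN))
print(respond(PageState.VISIBLE, user_generated=True))
print(respond(PageState.REMOVED))
```

Keeping the rule in one function makes it easier to audit which page types are excluded from search engines and to adjust the policy if the maintainers settle on a different directive for hidden pages.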
@AmrMMorad AmrMMorad assigned AmrMMorad and unassigned AmrMMorad Jan 12, 2016
@YoustinaAtef YoustinaAtef self-assigned this Jan 12, 2016
@BobCorrigan BobCorrigan changed the title from 'Add "nofollow" tag to hidden or removed content to keep out of search engines' to 'Add "noindex" tag to hidden or removed pages to keep them out of search engines' Jan 12, 2016