Implement X-Robots-tag headers (avoid dev/stage websites "leaking" into SERPs) #804
Comments
It may not be needed in most development cases, but it would be good if this snippet could be added to the docs or wiki as a 'nice to have' for some use cases.
@maniqui could you add this to the wiki in the "make it better" section?
@nimbupani Done. Under the .htaccess section I've added a link to this issue. Now I'll close it.
Thanks Julián!
<FilesMatch ".(doc|pdf|png|jpe?g|gif)$">
Sorry, how does this discriminate between development and live sites? I am looking for a way to discriminate between the two using a conditional statement, so that the X-Robots-Tag header doesn't have to be manually removed from the .htaccess file when deploying to live. Will this work?
@tanwill The snippet you quote doesn't discriminate between development and live sites. The FilesMatch directive limits the scope of the enclosed directive to files whose names match the pattern. The outer block sets headers on every response, and the enclosed FilesMatch directive then sets different headers for that subset of files. Also, off the top of my head, I don't think there's any way to do what you want to do. But I could be wrong. Stack Overflow might be a good place to search for a way to do that.
Thank you, @roblarsen.
I just found a way to do this, @roblarsen. Instead of editing the .htaccess file, just edit header.php. Put the conditional there:
Ah, yeah, of course you can do that sort of logic in PHP. I was referring to doing it in this context, in .htaccess. Glad to see that you got your problem sorted out.
Thanks. Yeah, it's a workaround. I think you're right that there may not be a non-convoluted way to do this in .htaccess.
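For what it's worth, one approach that may work directly in .htaccess, assuming mod_setenvif and mod_headers are both enabled, is to set an environment variable based on the Host header and send the X-Robots-Tag header only when that variable is present. This is a sketch, not something tested in this thread; the variable name IS_NON_PRODUCTION and the dev./stage. hostname prefixes are hypothetical:

```apache
<IfModule mod_setenvif.c>
  # Flag requests whose Host header starts with "dev." or "stage."
  # (hypothetical hostnames; adjust to your own naming scheme).
  SetEnvIfNoCase Host "^(dev|stage)\." IS_NON_PRODUCTION
</IfModule>

<IfModule mod_headers.c>
  # Send the header only when the flag above was set for this request,
  # so the same .htaccess file can be deployed unchanged to the live site.
  Header set X-Robots-Tag "noindex, nofollow" env=IS_NON_PRODUCTION
</IfModule>
```

With this in place, a request for dev.example.com should receive the header, while a request for the live hostname should not.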
I would tweak the regex ever so slightly to capture a few additional file types (.woff, .woff2, and .ttf for fonts; .ppt, .pptx, .xls, .xlsx, .doc, .docx, .dot, and .dotx for documents; and .htm as well as .html) and use it as:
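A sketch of what such a tweaked pattern could look like, covering the extensions listed above. The exact grouping and the header value are assumptions, not necessarily the commenter's original regex:

```apache
<IfModule mod_headers.c>
  # The leading dot is escaped so it matches a literal "." before the extension.
  # "x?" and "2?" make the trailing character optional (doc/docx, woff/woff2, htm/html).
  <FilesMatch "\.(docx?|dotx?|pptx?|xlsx?|pdf|png|jpe?g|gif|woff2?|ttf|html?)$">
    Header set X-Robots-Tag "noindex, noarchive, nosnippet"
  </FilesMatch>
</IfModule>
```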
You could try something like this for a
Short version: On public websites under development, disallowing crawlers in robots.txt isn't enough.
See:
http://yoast.com/prevent-site-being-indexed/
http://code.google.com/web/controlcrawlindex/docs/robots_meta_tag.html
Matt Cutts (from Google Webmaster Central) on this topic:
http://www.youtube.com/watch?v=KBdEwpRQRD0
Long version:
Yes, compliant robots won't crawl it, but that doesn't exactly mean that the website (well, its URL) won't get indexed.
Matt Cutts explains it fairly well in the linked video, but here's the short version: if your website gets linked from somewhere else (other websites, for example), Google may still list your website's URL in SERPs, even though it respects your robots.txt.
I've experienced one or two cases where development websites "leaked" into Google SERPs (I'm probably the guilty one, for being careless and not doing the robots.txt disallow dance).
Why? I can't tell exactly.
Maybe I, another developer, or the client "talked" about the dev website (dev.example.com) in some email exchange via Gmail, and thus Big Brother Google "got to know" about the existence of dev.example.com...
So, once your development website gets leaked into SERPs, you probably want to remove it from there.
Doing the robots.txt disallow dance may not help with that, as your website has already leaked.
Thus, X-Robots-Tag headers may be a good approach to solve the situation, imo.
What's more, this approach should not be used in tandem with disallowing crawlers via robots.txt: a crawler that is blocked from fetching a URL never sees the header sent with it.
From Google's docs on Control Crawling and Indexing:
So, here is the proposed snippet to add in .htaccess.
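A minimal sketch of such a snippet, assuming mod_headers is enabled. It is reconstructed from the FilesMatch line quoted in the comments and the description of how the block behaves, so the exact directive values are assumptions:

```apache
<IfModule mod_headers.c>
  # Sent with every response: ask compliant crawlers not to index,
  # follow links from, or cache anything on this site.
  Header set X-Robots-Tag "noindex, nofollow, noarchive"

  # For documents and images, override the header with a value
  # suited to non-HTML files. The dot is escaped to match a literal ".".
  <FilesMatch "\.(doc|pdf|png|jpe?g|gif)$">
    Header set X-Robots-Tag "noindex, noarchive, nosnippet"
  </FilesMatch>
</IfModule>
```

Note that this applies to whichever site serves the file, so it would need to be removed or made conditional before deploying to a live site.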
Please, feel free to improve the wording I've used to try to explain this.
In any case, even if it doesn't fit into the goal/philosophy of H5BP, I hope others can benefit from this knowledge.