Canonical links breaking spiders #181

Closed
sminnee opened this issue Jul 9, 2018 · 6 comments

Comments

@sminnee
Member

sminnee commented Jul 9, 2018

Following up on #156 and #180.

According to Swiftype support staff, having a canonical URL link that points to a redirect back to your current page is a bad thing to do for all search engines. It is breaking Swiftype's ability to index any content, and they've claimed that it's a confusing thing to do for all spiders, including Google's.

If we want to use Swiftype, we'd need to work out a way of reducing duplicate search results in Google without creating infinite loops in our canonical links:

Comments below suggest that we fix this in the following way:

This solution was rejected:

@robbieaverill
Contributor

Yep, I agree.

@dhensby
Contributor

dhensby commented Jul 11, 2018

The best way to think of a canonical tag is as a hidden 301 for spiders. It basically says "this page doesn't really exist; you should be indexing <canonical page> instead."

So yes, it is really bad for all spiders if you want them to be able to index all the content for all the versions. Edit: sorry, I missed the point. It is super bad if the canonical URL points at a URL that then does a proper 301 back to the current URL.
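To make the failure mode concrete, here is a minimal sketch (not the docs site's actual code) that checks whether a page's canonical URL redirects, possibly over several hops, back to the page that declared it. It assumes the redirects are available as a simple in-memory map of source path to target path; the paths `/en/installation`, `/en/3/installation` and `/en/4/installation` are hypothetical examples.

```python
def follow_redirects(url, redirects, max_hops=10):
    """Follow a static redirect map until reaching a URL that does not redirect."""
    for _ in range(max_hops + 1):
        target = redirects.get(url)
        if target is None:
            return url  # final destination: this URL serves content
        url = target
    raise RuntimeError("too many redirects (possible redirect cycle)")


def canonical_loops_back(page_url, canonical_url, redirects):
    """True when the canonical URL ends up redirecting back to page_url."""
    if canonical_url == page_url:
        return False  # a self-referencing canonical tag is fine
    return follow_redirects(canonical_url, redirects) == page_url


# Hypothetical setup: the unversioned URL 301s to the current major version.
redirects = {"/en/installation": "/en/4/installation"}

# The 4.x page declares the unversioned URL as canonical, which 301s back to it.
print(canonical_loops_back("/en/4/installation", "/en/installation", redirects))  # True
# A 3.x page with the same canonical is not a loop; it just defers to 4.x.
print(canonical_loops_back("/en/3/installation", "/en/installation", redirects))  # False
```

The first case is the "canonical points to a redirect back to the current page" pattern that confuses spiders; the second is an ordinary canonical pointing elsewhere.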

The problem is: how do we then stop Google showing results for older versions above newer versions? Page ranking takes in a plethora of signals, but a fairly well understood one is how many links there are to a page. It only makes sense that the old docs pages will have more inbound links than the newer ones, and @chillu has identified that this is a problem other docs sites suffer from too.

readthedocs.io uses canonical links in a similar way to us (view the source of https://phpunit.readthedocs.io/en/7.0/installation.html).

(edit: This paragraph is still true, but not too relevant to what Sam has raised) Now, I'd suggest that Swiftype shouldn't be acting as a normal spider; it should be indexing all content for search and allowing the customer to say which pages should or shouldn't be used for search results (perhaps a "respect canonical tags" flag is in order). I would not consider it unusual for a website to have its own search index that doesn't respect canonical tags, because they are intended only for public search engines. Of course we can't expect them to change their product overnight, but this is a point that should be put to them.
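A site-search indexer with the suggested "respect canonical tags" switch might decide whether to index a page roughly as sketched below. This is a hypothetical illustration using Python's standard `html.parser`, not Swiftype's API or behaviour; the function names and the `respect_canonical` parameter are invented for the example.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags also arrive here via handle_startendtag.
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr.get("href")


def should_index(page_url, html, respect_canonical=True):
    """Hypothetical site-search rule: optionally skip pages that defer elsewhere."""
    if not respect_canonical:
        return True  # own-site search: index every version's content
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical is None or finder.canonical == page_url
```

With `respect_canonical=False` the indexer behaves like the "index all content" search described above, while the default mimics a public search engine that only indexes the canonical page.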

@chillu
Member

chillu commented Jul 11, 2018

Dan's RTD example:

```html
<!--
Always link to the latest version, as canonical.
http://docs.readthedocs.org/en/latest/canonical.html
-->
<link rel="canonical" href="https://phpunit.readthedocs.io/en/7.1/installation.html" />
```

That looks like Sam's second option. If RTD has decided that's a good option for them, I think it should be our default choice as well. If there were any issues with frequently changing the canonical tags, they would've picked that up by now, and I don't see any relevant open issues about it. As a sanity check, the Laravel docs do this as well.
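The RTD-style approach can be sketched as a function that rewrites a versioned docs path to the same page under the latest version, so the canonical target is a real page rather than a redirect. This assumes a hypothetical `/en/<version>/<page>` URL layout (as in the phpunit example) and that the latest version number is known from configuration; it is an illustration, not RTD's or our implementation.

```python
import html
import re

# Assumed layout: /en/<version>/<rest of path>
VERSION_PATH = re.compile(r"^/en/(?P<version>[^/]+)/(?P<rest>.*)$")


def canonical_for(path, latest, base):
    """Map a versioned path to the same page under the latest version."""
    match = VERSION_PATH.match(path)
    if match is None:
        return base + path  # unversioned page: canonical is itself
    return f"{base}/en/{latest}/{match.group('rest')}"


def canonical_tag(path, latest, base):
    """Render the <link rel="canonical"> element for a page."""
    href = html.escape(canonical_for(path, latest, base), quote=True)
    return f'<link rel="canonical" href="{href}" />'


print(canonical_tag("/en/7.0/installation.html", "7.1", "https://phpunit.readthedocs.io"))
```

For the 7.0 installation page this produces the same tag as the RTD example. Because the target includes the explicit version number, it never 301s back to the page that declared it. The sketch assumes every page exists at the same path in the latest version; pages that were removed or renamed would need a fallback (for example, a self-referencing canonical).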

@dhensby
Contributor

dhensby commented Jul 12, 2018

Ah, sorry; I think I've got the wrong end of the stick here. I didn't realise our canonical tags pointed to a URL which itself 301-ed to somewhere else (and occasionally to the same page whose canonical URL had just redirected us). Yes, that's super bad and needs to stop.

I'd go for the 2nd option; as @chillu points out, that seems fairly standard, and I think it's more helpful for users to see the version number in the URL.

It will still be interesting to see whether Swiftype will index docs pages that do have canonical tags (e.g. 3.x docs pages), as it seems they are respecting them a bit too eagerly. Clearly Google can handle canonical tags with this kind of redirect loop.

@sminnee
Member Author

sminnee commented Jan 31, 2019

> Clearly Google can handle canonical tags with this kind of redirect loop.

Google are... big.

@sminnee
Member Author

sminnee commented Nov 19, 2019

This has been resolved with the Gatsby rewrite.

@sminnee sminnee closed this as completed Nov 19, 2019
4 participants