[SEO Audits] Document has a valid rel=canonical #3178

Closed
rviscomi opened this issue Aug 29, 2017 · 18 comments

@rviscomi
Member

rviscomi commented Aug 29, 2017

Audit group: Content best practices
Description: Document has a valid rel=canonical
Failure description: Document does not have a valid rel=canonical ({value})
Help text: Canonical links suggest which URL to show in search results. Read more in Use canonical URLs.

Success conditions:

  • Query selector head > link[rel=canonical] doesn’t match any elements; otherwise
    • href value of canonical link is not set to the root; otherwise
      • origins of href and current page are different
      • location.pathname is not the root.
  • href value of canonical link is absolute

Valid examples:

  • URL example.de/, <link rel="canonical" href="example.com/de/">
  • URL example.com/de/, <link rel="canonical" href="example.de/">

Invalid examples:

  • URL example.com/blog/, <link rel="canonical" href="example.com/">
  • URL example.de/, <link rel="canonical" href="example.com/">
  • URL example.com/de/, <link rel="canonical" href="/">
@kdzwinel
Collaborator

For the record: we agreed that we should also support canonical Link headers.
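For illustration, here is a minimal sketch of pulling canonical targets out of a Link header value such as `<https://example.com/>; rel="canonical"`. The helper name `canonicalsFromLinkHeader` is hypothetical and this is not the code used by the audit; the parsing is deliberately simplified and assumes parameter values contain no commas or semicolons.

```javascript
// Hypothetical helper (not from the audit's codebase): returns the URLs of all
// link-values in a Link header whose rel parameter includes "canonical".
// Simplifying assumption: parameter values contain no commas or semicolons.
function canonicalsFromLinkHeader(headerValue) {
  const urls = [];
  // Each link-value looks like: <url>; param=value; param="value"
  const linkPattern = /<([^>]*)>([^,]*)/g;
  let match;
  while ((match = linkPattern.exec(headerValue)) !== null) {
    const url = match[1];
    const params = match[2];
    // rel may be quoted or unquoted and may list several relation types.
    const relMatch = /;\s*rel\s*=\s*(?:"([^"]*)"|([^\s;"]+))/i.exec(params);
    if (!relMatch) continue;
    const rels = (relMatch[1] ?? relMatch[2] ?? '').toLowerCase().split(/\s+/);
    if (rels.includes('canonical')) urls.push(url);
  }
  return urls;
}
```

The returned strings are raw header values, so they would still need the same URL validation as an href taken from a link tag.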

Questions:

  1. How do we deal with multiple canonicals? There can be multiple headers and/or tags. ATM I'm processing only the first one found (headers before tags). But according to this, if there are multiple canonicals pointing to different URLs, search engines will ignore all of them.

  2. How do we deal with invalid URLs? ATM I'm failing the audit when new URL(href) fails.

  3. Where is this part coming from? I didn't find a rule that would explain this.

href value of canonical link is not set to the root; otherwise
origins of href and current page are different
location.pathname is not the root.

Why can't example.de/ have a canonical link to example.com/, when it can have one to example.com/de/?

  4. Your examples indicate that canonical can point to a different domain. The official docs seem to agree, saying "With content syndication, it's also easy for content to be distributed to different URLs and domains entirely.". However, all the examples given in the docs show the same domain. Also, according to Yahoo and Bing, different domains are not supported. Should we check that with John?

@rviscomi
Member Author

Good questions :)

How do we deal with multiple canonicals?

We should fail the audit and specify the failure reason being conflicting canonical links.

How do we deal with invalid URLs? ATM I'm failing the audit when new URL(href) fails.

Failing when invalid SGTM. Again, we should be clear about the failure reason.

Where is this part coming from?

The advice came from an offline thread. Forwarding to you for context.

Your examples indicate that canonical can point to a different domain.

Yeah worth checking to be sure. It does seem like the docs use different domains, for example with/without the www subdomain, although having different TLDs may not be valid.

@kdzwinel
Collaborator

kdzwinel commented Dec 11, 2017

Thanks for a quick response!

The advice came from an offline thread. Forwarding to you for context.

Thanks! It does sound a bit GoogleBot specific, doesn't it?

It does seem like the docs use different domains, for example with/without the www subdomain, although having different TLDs may not be valid.

Right, I meant TLDs. Sorry for the confusion.


I absolutely agree that we should show the failure reason, especially now that we have a couple of them. Should we just use the displayValue (it's the one appended to the end of the failure description)? A table feels like overkill here, and debugString (red text) is used more for audit failures.

ATM we have these failure reasons:

  • "multiple conflicting URLs"
  • "invalid URL"
  • "relative URL"
  • "points to a different TLD" (to be checked with John)
  • "???" (one with canonical link set to root)

How does that list look to you? I'd appreciate a bit of help with the copy for the last one :)

@kdzwinel
Collaborator

kdzwinel commented Dec 22, 2017

To sum up our email/hangouts discussion, we decided to fail in these cases:

  1. multiple conflicting URLs
  2. invalid URL
  3. relative URL
  4. current URL and canonical URL have different domains
  5. current URL is not a root but points to a root of the same origin
  6. current URL is a hreflang and canonical URL is a different hreflang
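As a rough sketch of how cases 2 and 3 could be told apart, the snippet below relies on the fact that the `URL` constructor only accepts absolute URLs when no base is given. The function name `classifyCanonicalHref` and the shape of its return value are assumptions made for illustration, not the audit's actual code.

```javascript
// Hypothetical helper: separates "relative URL" from "invalid URL".
// Returns {url, failureReason}; failureReason is null when href is absolute.
function classifyCanonicalHref(href, pageUrl) {
  try {
    // Without a base argument, only absolute URLs parse successfully.
    return {url: new URL(href), failureReason: null};
  } catch (err) {
    // Not absolute; check below whether it at least resolves against the page.
  }
  try {
    new URL(href, pageUrl);
    return {url: null, failureReason: 'relative URL'};
  } catch (err) {
    return {url: null, failureReason: 'invalid URL'};
  }
}
```

With this sketch, `classifyCanonicalHref('/', 'https://example.com/de/')` reports "relative URL", while an href that cannot be parsed even with a base reports "invalid URL".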

While writing tests I found two more edge cases:

  1. current URL and canonical URL have different domains - I assume that the domain is the last two parts of the hostname (e.g. test.example.com); unfortunately, this assumption breaks for second-level domains (one.co.uk and two.co.uk will be considered as having the same domain). The only solution I can see here involves creating a safelist of all second-level domains, but this seems impractical. IMO we should keep the current solution.

  2. This article suggests that we should fail not only when multiple conflicting canonical URLs are found but always when multiple canonical URLs are found. I'll double check that with John.
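To make the first edge case concrete, here is a small sketch of the "last two parts of the hostname" assumption. The function names are hypothetical and the code only illustrates the heuristic described above, including its known false positive for second-level domains.

```javascript
// Hypothetical helper: treats the last two labels of the hostname as the domain.
function naiveDomain(hostname) {
  return hostname.split('.').slice(-2).join('.');
}

// Compares two absolute URLs using the naive domain heuristic.
function haveSameNaiveDomain(urlA, urlB) {
  return naiveDomain(new URL(urlA).hostname) === naiveDomain(new URL(urlB).hostname);
}

// Subdomains of the same site compare equal, as intended.
console.log(haveSameNaiveDomain('https://test.example.com/', 'https://www.example.com/')); // true
// Different TLDs are detected.
console.log(haveSameNaiveDomain('https://example.de/', 'https://example.com/')); // false
// Known limitation: both hostnames reduce to "co.uk", so this is a false positive.
console.log(haveSameNaiveDomain('https://one.co.uk/', 'https://two.co.uk/')); // true
```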

@kdzwinel
Collaborator

@rviscomi I got asked why we fail the canonical audit when someone has the same canonical in both the request header and the head of the page (this happens for, e.g., https://www.12starsmedia.com/). At first it felt like a bug, but this quote tells me that we did it on purpose (?)

This article suggests that we should fail not only when multiple conflicting canonical URLs are found but always when multiple canonical URLs are found. I'll double check that with John.

It does feel counterintuitive, and the linked article doesn't really say, now that I reread it, what I claimed it was saying 🤔 Do you remember the discussion about it?

@TimothyLoyer

Maybe this? I'm not certain if they mean identical in this case, but it could make sense. Admittedly our duplicate is due to our using the SEOmatic plugin for Craft. If it seems likely Google would penalize us for this, we're more than happy to address it with the plugin authors.

In cases of multiple declarations of rel=canonical, Google will likely ignore all the rel=canonical hints. Any benefit that a legitimate rel=canonical might have offered will be lost.

@kdzwinel
Collaborator

I'm not certain if they mean identical in this case

Yeah, I'm not 100% sure about that either. Maybe Rick will remember, and if not, we will double check with John.

@TimothyLoyer

TimothyLoyer commented Apr 11, 2018

Also, see the second point of the conclusions:

Check that rel=canonical is only specified once (if at all) and in the head of the page.

Thank you for all your help with this @kdzwinel. :)

@rviscomi
Member Author

I don't remember exactly, but rereading the doc, it does seem like we're doing the right thing. Please do reach out to John and confirm that having the same canonical URL in both a header and a link tag is invalid.

@khalwat

khalwat commented Apr 12, 2018

This article suggests that we should fail not only when multiple conflicting canonical URLs are found but always when multiple canonical URLs are found. I'll double check that with John.

Hello everyone, I'm the author of the SEOmatic plugin for Craft CMS 2 that @TimothyLoyer has referenced.

What I'm doing is adding the exact same canonical URL both as a tag, and also as a header.

I also do this for the robots tag and X-Robots-Tag -- neither was done for any specific reason, other than "why not?"

From the linked article:

Another issue is when pages include multiple rel=canonical links to different URLs. This happens frequently in conjunction with SEO plugins that often insert a default rel=canonical link, possibly unbeknownst to the webmaster who installed the plugin. In cases of multiple declarations of rel=canonical, Google will likely ignore all the rel=canonical hints. Any benefit that a legitimate rel=canonical might have offered will be lost.

To me this implies that there is only an issue if there are multiple conflicting canonical URLs? If they are the same URL, whether appearing multiple times as a tag or one as a tag, another as a header... I'm not seeing anything stating this is an issue?

I'm happy to alter the plugin to do whatever best practices are, but on this topic, I wasn't able to find anything definitive one way or another?

c.f.: nystudio107/craft-seomatic#68

@rviscomi
Member Author

IMO the ambiguity comes from this sentence:

In cases of multiple declarations of rel=canonical, Google will likely ignore all the rel=canonical hints.

Out of context, it's not clear if "multiple" refers to any two canonical URLs. The previous sentence about different URLs could just be an example of a common cause of this type of error, or it could be the only case.

I reached out to my resident SEO expert and I'll update this thread with their guidance.

@khalwat

khalwat commented Apr 12, 2018

Ultimately, what really matters is how Google and, to a lesser extent, other search engines handle this situation. I would think that as long as the canonical URLs are not in conflict, they should be okay with it, but I have no knowledge of Google's internal workings on this front.

@auralon

auralon commented Apr 13, 2018

Given that it is highlighted as a problem by the Lighthouse tool (which is a Google product), I'd guess that it is recommended to only serve one canonical URL (be it via header or link tag). However, multiple robots tags are not highlighted as an issue by the Lighthouse tool.

@khalwat

khalwat commented Apr 13, 2018

@auralon That definitely could be; but Google is a big place, and the team that works on Lighthouse may or may not overlap the team that works on GoogleBot.

@auralon

auralon commented Apr 13, 2018

@khalwat true, true!

@rviscomi
Member Author

rviscomi commented Apr 13, 2018

I got confirmation from John Mueller himself (thanks John!) that this is a bug. We should only fail when there are multiple different canonical URLs.
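A minimal sketch of what "only fail when there are multiple different canonical URLs" could look like: collect every canonical found in headers and tags, normalize each one, and fail only if more than one distinct URL remains. The name `findConflictingCanonicals` is hypothetical, and the sketch assumes every href has already been validated as an absolute URL.

```javascript
// Hypothetical helper: returns the distinct canonical URLs when they conflict,
// or an empty array when all declarations point to the same URL.
// Assumes each href is already known to be a valid absolute URL.
function findConflictingCanonicals(hrefs) {
  // Serializing through URL makes equivalent spellings compare equal,
  // e.g. "https://example.com" and "https://example.com/".
  const unique = new Set(hrefs.map(href => new URL(href).href));
  return unique.size > 1 ? [...unique] : [];
}
```

Under this approach, the same canonical declared once in a Link header and once in the head yields an empty array, so the audit would pass.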

Reopening the issue. @TimothyLoyer @khalwat @auralon would any of you like to implement the fix? @kdzwinel is working on a higher priority issue (#4359) so this may not be fixed as quickly.

rviscomi reopened this Apr 13, 2018

@khalwat

khalwat commented Apr 13, 2018

Great, thanks for tracking this down!

@TimothyLoyer

Thank you, all, for looking into this!
