Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a canonical URL meta tag #302

Merged
merged 2 commits into from
May 10, 2018
Merged

Conversation

thomasleese
Copy link
Contributor

This is built from the base_path of the content item which means it strips out any query parameters or fragments.

Trello Card

@tijmenb tijmenb temporarily deployed to govuk-publishing-compon-pr-302 May 10, 2018 08:27 Inactive
@kevindew
Copy link
Member

Would this be for every page? I don't know a huge amount about govuk_publishing_components so I may be wrong but off the top of my head pages like https://www.gov.uk/foreign-travel-advice/thailand/safety-and-security have different path to base_path (basically any with a prefix base_path)

@thomasleese
Copy link
Contributor Author

@kevindew Yeah, I think it should be for every page. I did have a think about whether to use base_path or take the request page. I stuck with base_path as I think that's actually a better definition of the canonical URL, in the case of foreign travel advice I think you could argue that it makes sense for /foreign-travel-advice/thailand to be the canonical URL for /foreign-travel-advice/thailand/safety-and-security but of course we have a problem when the canonical URL ends up being /government. Do you think it would be better to use the request path?

@sihugh
Copy link
Contributor

sihugh commented May 10, 2018

Might be worth talking to a content designer about this? Mainstream guides have a bunch of parts with the same base path but different slugs. Some are more coherent as a single entity than others. I'd recommend Jon Sanger

@kevindew
Copy link
Member

@thomasleese Sure

I don't think it's correct that /foreign-travel-advice/thailand/safety-and-security has a canonical of /foreign-travel-advice/thailand as google would not index safety and security in that case.

I'm not really sure about request paths either as I'm not sure if there are places where we use query strings to indicate particular content (maybe smart answers but not sure if we index those)

My instinct is that this might just have to be not a catch-all solution and more of an opt-in depending on what is being rendered. That's on the basis that I think incorrectly implemented canonical tags will cause more harm than them missing.

Do we have a contact in SEO that we can hit up with questions?

@thomasleese
Copy link
Contributor Author

@kevindew In terms of query strings, we've decided that content where the user can filter shouldn't be index and that's covered in https://trello.com/c/PHjO8hP2/139-noindex-pagination-from-external-search-2. Smart answers uses paths not query strings to indicate content so that shouldn't be a problem.

I'm more in favour of using the request path now as the canonical URL, but it's true that applying this to all pages there are inevitably cases where either option doesn't work and could cause harm. We could have logic in the component that understands how to build the canonical URL of different document types, but that does feel better suited in the individual frontend apps.

Based on this and our earlier discussion on Slack @bevanloon and @tijmenb what do you think?

@thomasleese
Copy link
Contributor Author

(For context, @kevindew, my initial idea was to have a blacklist of query parameters we know that we want to remove from the request URL to build a canonical URL (i.e. utm_) and then leave the rest untouched, meaning we get a slight improvement on the situation now and we can add to that blacklist as we see fit.)

@kevindew
Copy link
Member

okey doke, cool re query strings.

If we use request path it feels like it's sufficiently dumb it might not provide much value to me. As it stands google uses sitemap to work out what it thinks is the canonical path so I can't imagine it's getting caught mistaking analytics parameters often and then that means if we have instances of rendering same content on multiple pages they'll be flagged as having different canonical paths.

Sorry Tom, I know I'm getting more caught up in the value of the task than the implementation which isn't particularly helpful.

@thomasleese thomasleese force-pushed the add-canonical-url-meta-tag branch from 9fc7acd to 9ea8b80 Compare May 10, 2018 11:12
@tijmenb tijmenb temporarily deployed to govuk-publishing-compon-pr-302 May 10, 2018 11:13 Inactive
@@ -106,6 +107,14 @@ def add_step_by_step_tags(meta_tags)
meta_tags
end

def add_canonical_tag(meta_tags)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we allow them to specify the base_path through a local assign? and then fall back to base_path?

It might be even clearer still if rather than include_canonical as a boolean we just had canonical_path as a local assign then it'd be less magical and more configurable. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed up a change. It makes the attribute configurable but if you leave it as true it sticks with the default base path (to avoid having to add too much logic to the frontend apps), what do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That change looks good to me 👍 - although as it's a path I think we should still prefix with Plek website root when it's configured

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I've amended that now!

@thomasleese thomasleese force-pushed the add-canonical-url-meta-tag branch from 9ea8b80 to 602c29d Compare May 10, 2018 11:28
@tijmenb tijmenb temporarily deployed to govuk-publishing-compon-pr-302 May 10, 2018 11:28 Inactive
@thomasleese thomasleese force-pushed the add-canonical-url-meta-tag branch from 602c29d to 23e197c Compare May 10, 2018 11:35
@tijmenb tijmenb temporarily deployed to govuk-publishing-compon-pr-302 May 10, 2018 11:35 Inactive
it "renders the canonical URL in a meta tag if overridden" do
render_component(
content_item: { base_path: "/test" },
canonical_path: "http://www.dev.gov.uk/test2",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just this to go 😉

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you spotted...

@thomasleese thomasleese force-pushed the add-canonical-url-meta-tag branch 2 times, most recently from 30e44e9 to d1d9cfb Compare May 10, 2018 11:36
@tijmenb tijmenb temporarily deployed to govuk-publishing-compon-pr-302 May 10, 2018 11:36 Inactive
assert_empty render_component(content_item: {
base_path: "/test"
}).strip
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry I was all set to give the ✅ but there's some quote inconsistencies

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... I've made them all single quotes I think... I like to think they were consistent with the inconsistent style of the file...

This is an optional meta tag which defaults to not being shown.
It's built from the base_path of the content item which means it
strips out any query parameters or fragments.
@thomasleese thomasleese force-pushed the add-canonical-url-meta-tag branch from d1d9cfb to e40ea33 Compare May 10, 2018 11:41
@tijmenb tijmenb temporarily deployed to govuk-publishing-compon-pr-302 May 10, 2018 11:41 Inactive
Copy link
Member

@kevindew kevindew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

This is required as the updated meta tag component uses it to build
a URL.
@tijmenb
Copy link
Contributor

tijmenb commented May 10, 2018

I'm away today but let's chat tomorrow how this would work. If it's an optional thing it might be better to implement per-app anyway, it's more explicit. I'm not sure apps would know whether or not to add the tag currently.

@andysellick andysellick mentioned this pull request May 10, 2018
thomasleese added a commit that referenced this pull request May 11, 2018
…tag"

This reverts commit b53fa52, reversing
changes made to 8f9ed42.

After adding this we realised that it would be better placed for the
individual frontend applications to add the canonical tag as they work
closer to the content and would understand how to generate different
canonical URLs depending on document type and various other factors.

Although we could choose to leave it in here, we felt it would be
better for apps to be explicit about the tag rather than having a
default which might cause more harm than good.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants