Fix issues with meta tags #1228

Merged
merged 21 commits into from
Jul 2, 2021

Conversation

ybnd
Member

@ybnd ybnd commented Jun 14, 2021

References

Description

Implement Google Scholar review suggestions for meta tags

Instructions for Reviewers

The following meta tags were changed:

  • citation_pdf_url: select the Bitstream using a MIME type allowlist (if there is more than one option and no primary Bitstream in the ORIGINAL Bundle)
  • citation_abstract_html_url: use dc.identifier.uri if available, fall back to the current URL
  • citation_publisher: use dc.publisher (except for dissertations and tech reports, which use other tags)
  • Rename citation_date to citation_publication_date
  • Remove og:title and og:description meta tags
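The citation_abstract_html_url rule above amounts to a simple fallback. The sketch below is illustrative only and is not the code from this PR: the `MetadataMap` shape and the `getAbstractHtmlUrl` name are hypothetical, and it ignores everything except the dc.identifier.uri lookup.

```typescript
// Hypothetical minimal shape of an Item's metadata: field name -> list of values.
type MetadataMap = Record<string, string[]>;

// Sketch of the citation_abstract_html_url rule: prefer the first non-empty
// dc.identifier.uri value (e.g. the Handle URL), otherwise use the current page URL.
function getAbstractHtmlUrl(metadata: MetadataMap, currentUrl: string): string {
  const uris = metadata["dc.identifier.uri"] ?? [];
  const uri = uris.find((value) => value.trim().length > 0);
  return uri ?? currentUrl;
}
```

Preferring dc.identifier.uri keeps the tag stable even when the page is served from a host such as localhost, which is exactly the case the Testing steps check for.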

Testing

  1. Open some Item pages and inspect the head > meta in their HTML

    • citation_abstract_html_url should not contain links to localhost; if dc.identifier.uri is filled out, it should contain a link to the Item's handle. Otherwise, it should match the Item page URL
    • citation_publisher should be present if dc.publisher is filled out
    • citation_publication_date should be present
    • The following meta tags should not be present: citation_date, og:title, og:description
  2. For Items with one ORIGINAL Bitstream

    • citation_pdf_url should contain a link to that Bitstream
  3. For Items with more than one ORIGINAL Bitstream

    • If a primary Bitstream is specified, citation_pdf_url should link to it
    • Otherwise, citation_pdf_url should contain a link to the first ORIGINAL Bitstream with one of the allowlisted MIME types
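The citation_pdf_url selection order exercised by steps 2 and 3 can be sketched as follows. This is a hedged illustration, not the PR's implementation: the `Bitstream` shape, the `selectCitationPdfBitstream` name and the one-entry allowlist are all hypothetical (the PR's actual MIME type list is not reproduced here).

```typescript
// Hypothetical minimal Bitstream shape; the real DSpace model has many more fields.
interface Bitstream {
  id: string;
  mimeType: string;
}

// Illustrative allowlist only; the real list used by the PR may differ.
const CITATION_PDF_MIME_ALLOWLIST: readonly string[] = ["application/pdf"];

/**
 * Picks the Bitstream for citation_pdf_url from the ORIGINAL bundle:
 * 1. a single ORIGINAL Bitstream is used as-is;
 * 2. otherwise the primary Bitstream wins, if one is set and present;
 * 3. otherwise the first Bitstream whose MIME type is on the allowlist.
 * Returns undefined when nothing qualifies (no citation_pdf_url tag is emitted).
 */
function selectCitationPdfBitstream(original: Bitstream[], primaryId?: string): Bitstream | undefined {
  if (original.length === 1) {
    return original[0];
  }
  if (primaryId !== undefined) {
    const primary = original.find((b) => b.id === primaryId);
    if (primary !== undefined) {
      return primary;
    }
  }
  return original.find((b) => CITATION_PDF_MIME_ALLOWLIST.includes(b.mimeType));
}
```

For example, with an image and a PDF in ORIGINAL and no primary set, the PDF is chosen; if the image is marked as primary, the image is chosen instead.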

Checklist

  • My PR is small in size (e.g. less than 1,000 lines of code, not including comments & specs/tests), or I have provided reasons as to why that's not possible.
  • My PR passes TSLint validation using yarn run lint
  • My PR doesn't introduce circular dependencies
  • My PR includes TypeDoc comments for all new (or modified) public methods and classes. It also includes TypeDoc for large or complex private methods.
  • My PR passes all specs/tests and includes new/updated specs or tests based on the Code Testing Guide.
  • If my PR includes new, third-party dependencies (in package.json), I've made sure their licenses align with the DSpace BSD License based on the Licensing of Contributions documentation.

@lgtm-com

lgtm-com bot commented Jun 14, 2021

This pull request introduces 1 alert when merging 64049fd into d253790 - view on LGTM.com

new alerts:

  • 1 for Unused variable, import, function or class

@artlowel artlowel added bug component: SEO Search Engine Optimization e/4 Estimate in hours high priority testathon Reported by a tester during Community Testathon labels Jun 14, 2021
@artlowel artlowel added this to the 7.0 milestone Jun 14, 2021
@lgtm-com

lgtm-com bot commented Jun 14, 2021

This pull request introduces 1 alert when merging 04b4f1c into d253790 - view on LGTM.com

new alerts:

  • 1 for Unused variable, import, function or class

@ybnd
Member Author

ybnd commented Jun 14, 2021

@Art @tdonohue LGTM & unit test fix incoming

@ybnd ybnd force-pushed the w2p-79768_fix-issues-with-meta-tags branch from aab8bd3 to 34b117e Compare June 14, 2021 08:53
@tdonohue tdonohue self-requested a review June 17, 2021 14:47
Member

@tdonohue tdonohue left a comment

@ybnd and @artlowel : Overall, this fixes the issues you've described, but I've noticed an odd caching issue when testing with this PR. It seems the <meta> tags don't always refresh when you browse to a new page. Here's what I'm doing:

  1. Start on a Community page, click on a Collection, then on an Item. Select "Inspect" in your browser and look at the <head> section.
    • You'll see a <meta property='title'> with the Collection name and a second one with the Item name.
    • You'll see a <meta property='citation_title'> with the Collection name and a second one with the Item name.
    • All other <meta> tags for the Item will be there though, which is correct.
  2. Click back to the Collection. Select "Inspect" in your browser and look at the <head> section.
    • The Item <meta> tags (from step 1) are still visible. They seem to be cached, as they obviously shouldn't appear on a Collection.
  3. Click on a different Item in the same Collection. Select "Inspect" in your browser and look at the <head> section.
    • Now the <meta> tags are still odd: some still come from the first Item (from step 1), while others are correct for the Item you are on.

Overall, the behavior I'm seeing almost seems like a caching issue, as if these <meta> tags are not being entirely "reset" (to empty) when you browse to a new page, which results in very odd behavior. (NOTE: This caching issue doesn't appear to be specific to this PR, as I can also sometimes get it to occur on demo7.dspace.org. So, if you'd rather treat this as a separate bug, we can do so. Besides this bug, the rest of the behavior of this PR seems correct.)
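The "reset then set" behavior described above can be sketched in isolation. This is a hypothetical model, not DSpace's metadata service: `FakeHead` stands in for `document.head`, and `PageMetaTags` simply remembers which tag names it added for the previous page and removes all of them before adding the next page's tags.

```typescript
// A meta tag modeled as a name/content pair (stand-in for <meta name="..." content="...">).
interface MetaTag {
  name: string;
  content: string;
}

// Hypothetical in-memory stand-in for document.head.
class FakeHead {
  tags: MetaTag[] = [];
}

// Sketch: on every page change, first remove every tag whose name this service
// added for the previous page, then add (and remember) the new page's tags.
class PageMetaTags {
  private added = new Set<string>();

  constructor(private readonly head: FakeHead) {}

  setForPage(tags: MetaTag[]): void {
    // 1. reset: drop all tags this service added last time (every match, not just the first)
    this.head.tags = this.head.tags.filter((t) => !this.added.has(t.name));
    this.added.clear();
    // 2. set: add the new page's tags and remember their names for the next reset
    for (const tag of tags) {
      this.head.tags.push(tag);
      this.added.add(tag.name);
    }
  }
}
```

Tags the service never added (such as a viewport tag from the app shell) are left alone, so only page-specific metadata is reset between pages.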

@ybnd
Member Author

ybnd commented Jun 28, 2021

> @ybnd and @artlowel : Overall, this fixes the issues you've described, but I've noticed an odd caching issue when testing with this PR. It seems the <meta> tags don't always refresh when you browse to a new page

@tdonohue @artlowel After a bit of poking around on the current demo7.dspace.org I only ever saw the <title> meta tag change, and not even consistently.

Looks like this distinctUntilKeyChanged filters out way more changes than it should, e.g. when going back/forward between a Collection & an Item I see dso.uuid change in that pipe but it doesn't pass that filter... not sure what that's about.

@artlowel
Member

artlowel commented Jun 30, 2021

@tdonohue and @ybnd I fixed the issue. The cause was not only the distinctUntilKeyChanged issue @ybnd mentioned, but also the fact that the service responded both to route changes and to calls from DSO page components as they initialized. The second call often started before the first had finished processing, which caused some tags that had just been added to be immediately removed again, others to be added twice, etc.

I fixed it by removing the calls on DSO pages, and only listening to route changes. That's possible now (but wasn't when the metadataservice was originally written) because we're using resolvers on those pages, and they all put the resolved object in the dso property in the route data.

I spent an additional 4 hours on this, though, most of it not on the solution itself but on the tests. They were basically integration tests in a unit test file: they didn't mock anything, which caused them to break with the slightest change. I practically had to rewrite them from scratch, getting rid of anything that isn't strictly necessary for the test.
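Listening only to route changes works because the resolvers put the resolved object in the `dso` property of the route data. A minimal sketch of reading it follows; it is a hypothetical illustration, not the service's code. `RouteSnapshotLike` mirrors only the `data` and `firstChild` fields of Angular's `ActivatedRouteSnapshot`, and in DSpace the resolved value may be wrapped (for example in a RemoteData object) rather than being the bare DSO.

```typescript
// Hypothetical minimal shape mirroring ActivatedRouteSnapshot's data/firstChild.
interface RouteSnapshotLike {
  data: Record<string, unknown>;
  firstChild: RouteSnapshotLike | null;
}

// Walk from the root route down the active child chain and return the
// deepest `dso` value found in route data (undefined if no route resolved one).
function getResolvedDso(root: RouteSnapshotLike): unknown {
  let found: unknown = undefined;
  let current: RouteSnapshotLike | null = root;
  while (current !== null) {
    if ("dso" in current.data) {
      found = current.data["dso"];
    }
    current = current.firstChild;
  }
  return found;
}
```

Because the value comes from a single source (the route), there is no second, component-triggered update that can interleave with it.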

@artlowel artlowel requested a review from tdonohue June 30, 2021 12:27
@tdonohue
Member

Thanks @artlowel. I'll add 4 hours to the original ticket and re-test this as soon as possible.

@tdonohue tdonohue removed the e/4 Estimate in hours label Jun 30, 2021
Member

@tdonohue tdonohue left a comment

@artlowel & @ybnd : I retested today. With @artlowel's updates the behavior seems improved; however, I'm still seeing the citation_author field (and seemingly only that field) cached between pages. Here's what I'm doing:

  1. Run yarn start
  2. Bring up the UI. Immediately open "Inspect" and expand the <head> section in a separate window
  3. Now, in the UI, browse to an Item. I see the <meta> tags auto-refresh in my "Inspect" window.
  4. From that Item page, click on the Collection in the breadcrumbs. Most <meta> tags auto-refresh, but the citation_author tags from the Item still exist from the previous page.
  5. Click on the Community in the breadcrumbs. Again, most <meta> tags auto-refresh, but the citation_author tags from the Item still exist.

Last time I tested, I saw this behavior across the <meta property="title"> and <meta property="citation_title"> tags...that bug seems fixed. But somehow the behavior still exists for citation_author. Is there something in the UI that could be caching the author names themselves? It's odd that this specific field is now the issue.

Overall, again, I'm in favor of this PR. I'd just really like to find a way to improve SEO by ensuring we always have accurate <meta> tags. If you both feel this final bug needs to be a separate ticket, we can split it out...but I'd still rate it as high priority for 7.0.
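One plausible reason a multi-valued tag like citation_author would linger while single-valued tags refresh (an assumption; the thread does not say what the final fix was): removing tags by selector removes only the first match. Angular's `Meta.removeTag` removes the element returned by `getTag`, which is the first match; an Item with several authors has several citation_author tags, so the others survive. The sketch below models the difference on a plain array; `removeFirst` and `removeAll` are hypothetical helpers.

```typescript
interface MetaTag {
  name: string;
  content: string;
}

// Removes only the FIRST tag with the given name (like a querySelector-based removal).
function removeFirst(tags: MetaTag[], name: string): MetaTag[] {
  const index = tags.findIndex((t) => t.name === name);
  return index === -1 ? tags : [...tags.slice(0, index), ...tags.slice(index + 1)];
}

// Removes EVERY tag with the given name, as multi-valued tags such as citation_author require.
function removeAll(tags: MetaTag[], name: string): MetaTag[] {
  return tags.filter((t) => t.name !== name);
}
```

With Angular's `Meta` service, the "remove all" variant would loop over `getTags(...)` and pass each element to `removeTagElement`.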

@tdonohue tdonohue requested a review from atarix83 July 1, 2021 14:37
@ybnd
Member Author

ybnd commented Jul 2, 2021

@tdonohue Should be fixed now 👍

@tdonohue tdonohue self-requested a review July 2, 2021 14:04
Member

@tdonohue tdonohue left a comment

👍 Perfect! Thanks @ybnd and @artlowel ! This is now working perfectly for me & is ready to be merged. Thanks for tracking down what was going on with the meta tags!

@tdonohue
Member

tdonohue commented Jul 2, 2021

Merging this immediately, as it was previously agreed in meetings that this PR needs only 1 approval (it includes a large number of refactored/updated specs), and I'd like to get it in place for final quick tests from Google Scholar.

Labels
bug component: SEO Search Engine Optimization high priority testathon Reported by a tester during Community Testathon
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Several minor SEO issues in HTML <meta> tags
3 participants