Restrict Angular SSR to paths in the sitemap #3682

Open: alanorth wants to merge 1 commit into main

alanorth
Contributor

@alanorth alanorth commented Nov 22, 2024

References

Description

Only enable Angular SSR for paths in the DSpace sitemap. This is a compromise reached after analyzing high CPU usage in DSpace 7+ and discussing it with the Google Scholar team. There is no need to spend CPU and memory generating and caching SSR pages for request paths that are not "primary" DSpace objects, such as search and browse. Those paths contain data derived from the primary objects themselves, and bots can spend endless time crawling them.

This solution was originally proposed by @vitorsilverio in #3110 (comment). I have opted to include the list of SSR paths as a const instead of including them from another file because it is cleaner.
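As a rough illustration of the approach (a sketch, not the actual diff in this PR), the check could look like the following. The names `SSR_PATHS` and `isSsrAllowedPath` are hypothetical, and the prefix list is an assumption based on the primary object types named in this conversation.

```typescript
// Hypothetical allowlist of path prefixes for primary DSpace objects.
// The exact entries are an assumption, not copied from the PR.
const SSR_PATHS: readonly string[] = [
  '/items/',
  '/entities/',
  '/collections/',
  '/communities/',
  '/bitstream/',
  '/bitstreams/',
];

// Returns true when the request path starts with one of the allowlisted prefixes.
function isSsrAllowedPath(requestPath: string): boolean {
  return SSR_PATHS.some((prefix) => requestPath.startsWith(prefix));
}
```

In server.ts, a result of `false` would mean skipping server-side rendering and sending the client-side application instead, so paths such as /search never occupy the SSR cache.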

Some notes:

  • This will require manual porting to DSpace 7
  • We should keep an eye on upstream work related to inlineCriticalCss because it improves the user experience. We disabled it in DSpace 7.6.2 and 8.1 because it made SSR performance even worse

Instructions for Reviewers

Please add a more detailed description of the changes made by your PR. At a minimum, providing a bulleted list of changes in your PR is helpful to reviewers.

List of changes in this PR:

  • Restrict SSR to request paths for primary DSpace objects like bitstreams, items, entities, communities, and collections

Include guidance for how to test or review your PR.
Try browsing the repository to see if all pages work as expected.

Checklist

This checklist provides a reminder of what we are going to look for when reviewing your PR. You do not need to complete this checklist prior to creating your PR (draft PRs are always welcome).
However, reviewers may request that you complete any actions in this list if you have not done so. If you are unsure about an item in the checklist, don't hesitate to ask. We're here to help!

  • My PR is created against the main branch of code (unless it is a backport or is fixing an issue specific to an older branch).
  • My PR is small in size (e.g. less than 1,000 lines of code, not including comments & specs/tests), or I have provided reasons as to why that's not possible.
  • My PR passes ESLint validation using npm run lint
  • My PR doesn't introduce circular dependencies (verified via npm run check-circ-deps)
  • My PR includes TypeDoc comments for all new (or modified) public methods and classes. It also includes TypeDoc for large or complex private methods.
  • My PR passes all specs/tests and includes new/updated specs or tests based on the Code Testing Guide.
  • My PR aligns with Accessibility guidelines if it makes changes to the user interface.
  • My PR uses i18n (internationalization) keys instead of hardcoded English text, to allow for translations.
  • My PR includes details on how to test it. I've provided clear instructions to reviewers on how to successfully test this fix or feature.
  • If my PR includes new libraries/dependencies (in package.json), I've made sure their licenses align with the DSpace BSD License based on the Licensing of Contributions documentation.
  • If my PR includes new features or configurations, I've provided basic technical documentation in the PR itself.
  • If my PR fixes an issue ticket, I've linked them together.

Because Angular SSR is not very efficient, after discussion with
the Google Scholar team we realized a compromise would be to only
use SSR for pages in the DSpace sitemap.
@alanorth added the following labels on Nov 22, 2024: bug; high priority; performance / caching (Related to performance, caching or embedded objects); port to dspace-7_x (This PR needs to be ported to `dspace-7_x` branch for next bug-fix release); port to dspace-8_x (This PR needs to be ported to `dspace-8_x` branch for next bug-fix release)
@alanorth
Contributor Author

Tests are failing because CI is checking for SSR on /home. We can fix this by:

  1. Adding /home to the SSR paths, or
  2. Using another path

The first option is probably best because /home is one of the few paths guaranteed to work by default in DSpace. On the other hand, I just realized our list of SSR-enabled paths will include such endless tarpits like:

https://demo.dspace.org/entities/person/3b087e38-cd6b-4d85-9409-99a9f6f03425?spc.page=1&query=search

With entity search pages we have many combinations of pages depending on filters and number of items similar to /search. Bots will crawl those and get SSR pages, which is a massive waste of CPU and memory.

Perhaps this requires a re-think. What about inverting the logic and enabling SSR for everything, but disabling it on certain paths?
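A minimal sketch of what the inverted logic might look like, under stated assumptions: the names `SSR_EXCLUDED_PATHS` and `shouldRenderOnServer` are hypothetical, the excluded prefixes are only examples, and treating every request with a query string as a derived view is an assumption made here to catch entity search URLs like the one above.

```typescript
// Hypothetical denylist: paths whose pages are derived from primary objects.
const SSR_EXCLUDED_PATHS: readonly string[] = ['/search', '/browse/'];

// SSR is on by default and switched off for excluded paths.
function shouldRenderOnServer(requestUrl: string): boolean {
  // Separate the path from the query string without relying on a URL parser.
  const queryStart = requestUrl.indexOf('?');
  const pathname = queryStart === -1 ? requestUrl : requestUrl.slice(0, queryStart);

  // Assumption: a non-empty query string marks a filtered or paginated view.
  const hasQuery = queryStart !== -1 && queryStart < requestUrl.length - 1;

  const excluded = SSR_EXCLUDED_PATHS.some((prefix) => pathname.startsWith(prefix));
  return !excluded && !hasQuery;
}
```

With this sketch, /entities/person/<uuid> would still be rendered on the server, while the same path with `?spc.page=1&query=search` would not.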

@ybnd
Member

ybnd commented Nov 22, 2024

> On the other hand, I just realized our list of SSR-enabled paths will include such endless tarpits like:
>
> https://demo.dspace.org/entities/person/3b087e38-cd6b-4d85-9409-99a9f6f03425?spc.page=1&query=search
>
> With entity search pages we have many combinations of pages depending on filters and number of items similar to /search. Bots will crawl those and get SSR pages, which is a massive waste of CPU and memory.

@alanorth #3231 should cover that

@tdonohue
Member

tdonohue commented Nov 22, 2024

@alanorth : Thank you so much for getting this PR created! I was just asking someone to do this in yesterday's Developers Meeting.

Regarding the failing tests, I'd recommend adding /home to the list of SSR paths, because many bots/harvesters will start at your homepage (especially if they don't use sitemaps). So, I think that the homepage should always provide SSR.

One other suggestion. I think it'd be better to make these paths configurable instead of hardcoding them in the server.ts. It could look something like this:

ssr:
    paths: [ '/items/', '/entities/', '/collections/', '/communities/', '/bitstream/', '/bitstreams/' ]

(You'd have to update the existing ssr-config.interface.ts to support this new option)

Then in the code use environment.ssr.paths.

I'd argue that there also should be a way to enable SSR for everything (to retain current behavior). Perhaps that's the default behavior if this environment.ssr.paths configuration is unspecified or empty.
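To make the idea concrete, here is a hedged sketch of how the check could read the configured paths and fall back to the current behavior. The interface `SsrPathsConfig` and the function `isSsrEnabledFor` are hypothetical names for illustration; the existing interface in ssr-config.interface.ts has other fields that are not reproduced here.

```typescript
// Hypothetical shape of the new option only; not the real SSR config interface.
interface SsrPathsConfig {
  // Path prefixes to render on the server. Unspecified or empty means "all paths".
  paths?: string[];
}

function isSsrEnabledFor(requestPath: string, config: SsrPathsConfig): boolean {
  const paths = config.paths ?? [];
  if (paths.length === 0) {
    // No restriction configured: retain the current behavior (SSR everywhere).
    return true;
  }
  return paths.some((prefix) => requestPath.startsWith(prefix));
}
```

A site that wants its homepage rendered on the server would then include '/home' in `paths` alongside the object prefixes, and a site that leaves `paths` out keeps SSR for every request.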

Overall, I do like this PR & support adding it quickly. I just want to add more flexibility to the configuration, as there's a chance that different sites will want to add additional paths (or keep the default behavior of SSR enabled for every path).

@alanorth
Contributor Author

> @alanorth : Thank you so much for getting this PR created! I was just asking someone to do this in yesterday's Developers Meeting.

You're welcome. I saw the meeting notes and was surprised that there wasn't already a PR, since I've been using versions of this patch for a few months already.

> Regarding the failing tests, I'd recommend adding /home to the list of SSR paths...

Yes, agreed.

> One other suggestion. I think it'd be better to make these paths configurable

Oh good idea, I didn't know about ssr-config.interface.ts. I will be offline for a few days but can work on this soon.

Projects
Status: 🙋 Needs Reviewers Assigned
Development

Successfully merging this pull request may close these issues.

(Discussion) High CPU usage in DSpace frontend related to Angular Server Side Rendering (SSR)
3 participants