Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

assets + stylesheet assets #1475

Open
wants to merge 193 commits into
base: master
Choose a base branch
from

Conversation

eoghanmurray
Copy link
Contributor

@eoghanmurray eoghanmurray commented May 14, 2024

Medium to large enhancement to rrweb, building upon #1239


Asset Events

Assets are a new type of event that embody a serialized version of a http resource captured during snapshotting. Some examples are images, media files and stylesheets. Resources can be fetched externally (from cache) in the case of a href, or internally for blob: urls and same-origin stylesheets. Asset events are emitted subsequent to either a FullSnapshot or an IncrementalSnapshot (mutation), and although they may have a later timestamp, during replay they are rebuilt as part of the snapshot that they are associated with. In the case where e.g. a stylesheet is referenced at the time of a FullSnapshot, but hasn't been downloaded yet, there can be a subsequent mutation event with a later timestamp which, along with the asset event, can recreate the experience of a network-delayed load of the stylesheet.

Assets to mitigate stylesheet processing cost

In the case of stylesheets, rrweb does some record-time processing in order to serialize the css rules which had a negative effect on the initial page loading times and how quickly the FullSnapshot was taken (see https://pagespeed.web.dev/). These are now taken out of the main thread and processed asynchronously to be emitted (up to processStylesheetsWithin ms) later. There is no corresponding delay on the replay side so long as the stylesheet has been successfully emitted.

Asset Capture Configuration

The captureAssets configuration option allows you to customize the asset capture process. It is an object with the following properties:

  • objectURLs (default: true): This property specifies whether to capture same-origin blob: assets using object URLs. Object URLs are created using the URL.createObjectURL() method. Setting objectURLs to true enables the capture of object URLs.

  • origins (default: false): This property determines which origins to capture assets from. It can have the following values:

    • false or []: Disables capturing any assets apart from object URLs, stylesheets (unless set to false) and images (if that setting is turned on).
    • true: Captures assets from all origins.
    • [origin1, origin2, ...]: Captures assets only from the specified origins. For example, origins: ['https://s3.example.com/'] captures all assets from the origin https://s3.example.com/.
  • images (default: false or true if inlineImages is true in rrweb.record config): When set, this option turns on asset capturing for all images irrespective of their origin. Unless this configuration option is explicitly set to false, images may still be captured if their src url matches the origins setting above.

  • stylesheets (default: 'without-fetch'): When set to true, this turns on capturing of all stylesheets and style elements via the asset system irrespective of origin. The default of 'without-fetch' is designed to match with the previous inlineStylesheet behaviour, whereas the true value allows capturing of stylesheets which are otherwise inaccessible due to CORS restrictions to be captured via a fetch call, which will normally use the browser cache. Unless this is explicitly set to false, a stylesheet will be captured if it matches via the origins config above.

  • stylesheetsRuleThreshold (default: 0): only invoke the asset system for stylesheets with more than this number of rules. Defaults to zero (rather than say 100) as it only looks at the 'outer' rules (e.g. could have a single media rule which nests 1000s of sub rules). This default may be increased based on feedback.

  • processStylesheetsWithin (default: 2000): This property defines the maximum time in milliseconds that the browser should delay before processing stylesheets. Inline <style> elements will be processed within half this value. Lower this value if you wish to improve the odds that short 'bounce' visits will emit the asset before visitor unloads page. Set to zero or a negative number to process stylesheets synchronously, which can cause poor scores on e.g. https://pagespeed.web.dev/ ("Third-party code blocked the main thread").

Copy link

changeset-bot bot commented May 14, 2024

🦋 Changeset detected

Latest commit: 4a0e267

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 19 packages
Name Type
rrweb-snapshot Major
rrweb Major
rrdom Major
@rrweb/types Major
@rrweb/rrweb-plugin-canvas-webrtc-record Major
@rrweb/rrweb-plugin-canvas-webrtc-replay Major
@rrweb/rrweb-plugin-console-record Major
@rrweb/rrweb-plugin-console-replay Major
@rrweb/rrweb-plugin-sequential-id-record Major
@rrweb/rrweb-plugin-sequential-id-replay Major
rrdom-nodejs Major
rrweb-player Major
@rrweb/all Major
@rrweb/replay Major
@rrweb/record Major
@rrweb/packer Major
@rrweb/utils Major
@rrweb/web-extension Major
rrvideo Major

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

eoghanmurray added a commit that referenced this pull request Aug 6, 2024
Support a contrived/rare case where a <style> element has multiple text node children (this is usually only possible to recreate via javascript append) ... this PR fixes cases where there are subsequent text mutations to these nodes; previously these would have been lost

* In this scenario, a new CSS comment may now be inserted into the captured `_cssText` for a <style> element to show where it should be broken up into text elements upon replay: `/* rr_split */`
* The new 'can record and replay style mutations' test is the principal way to the problematic scenarios, and is a detailed 'catch-all' test with many checks to cover most of the ways things can fail
* There are new tests for splitting/rebuilding the css using the rr_split marker
* The prior 'dynamic stylesheet' route is now the main route for serializing a stylesheet; dynamic stylesheet were missed out in #1533 but that case is now covered with this PR

This PR was originally extracted from #1475 so the  initial motivation was to change the approach on stringifying <style> elements to do so in a single place.  This is also the motivating factor for always serializing <style> elements via the `_cssText` attribute rather than in it's childNodes; in #1475 we will be delaying populating `_cssText` for performance and instead recorrding them as assets.

Thanks for the detailed review to  Justin Halsall <Juice10@users.noreply.github.com> & Yun Feng <https://github.com/YunFeng0817>
@eoghanmurray eoghanmurray force-pushed the stylesheet-assets branch 2 times, most recently from c593165 to efaae6d Compare August 23, 2024 15:48
ShayMalchi added a commit to SaolaAI/rrweb that referenced this pull request Nov 6, 2024
* Skip mask check on leaf elements (rrweb-io#1512)

* Minor fixup for rrweb-io#1349; the 'we can avoid the check on leaf elements' optimisation wasn't being applied as `n.childNodes` was always truthy even when there were no childNodes.

Changing it to `n.childNodes.length` directly there (see rrweb-io#1402) actually caused a bug as during a mutation, we serialize the text node directly, and need to jump to the parentElement to do the check.
This is why I've reimplemented this optimisation inside `needMaskingText` where we are already had an `isElement` test

Thanks to @Paulhejia (https://github.com/Paulhejia/rrweb/) for spotting that `Boolean(n.childNodes)` is aways true.

* Assuming all jest should have been removed in rrweb-io#1033 (rrweb-io#1511)

* all references to jest should have been removed in rrweb-io#1033
* clarify that `cross-env` is used to ensure that environmental variables get applied on Windows (previous usage of cross-env was removed in rrweb-io#1033)

* Fix async assertions in test files (rrweb-io#1510)

* fix: await assertSnapshot in test files for async assertions

* Fix maskInputFn is ignored during the creation of the full snapshot (rrweb-io#1386)

Fix that the optional `maskInputFn` was being accidentally ignored during the creation of the full snapshot

* Improve development tooling (rrweb-io#1516)

- Running `yarn build` in a `packages/*/` directory will trigger build of all dependencies too, and cache them if possible.
- Fix for `yarn dev` breaking for `rrweb` package whenever changing files in `rrweb` package
- Update typescript, turbo, vite and vite-plugin-dts
- Require `workspaces-to-typescript-project-references` from `prepublish`

* Version Packages (alpha) (rrweb-io#1513)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Keep all packages in sync

* feat: add new css parser - postcss (rrweb-io#1458)

* feat: add new css parser

* make selectors change

* selectors and tests

* media changes

* remove old css references

* better variable name

* use postcss and port tests

* fix media test

* inline plugins

* fix failing multiline selector

* correct test result

* move tests to correct file

* cleanup all tests

* remove unused css-tree

* update bundle

* cleanup dependencies

* revert config files to master

* remove d.ts files

* update snapshot

* reset rebuilt test

* apply fuzzy css matching

* remove extra test

* Fix imports

* Newer versions of nswapi break rrdom-nodejs tests.
Example:
 FAIL  test/document-nodejs.test.ts > RRDocument for nodejs environment > RRDocument API > querySelectorAll
TypeError: e[api] is not a function
 ❯ byTag ../../node_modules/nwsapi/src/nwsapi.js:390:37
 ❯ Array.<anonymous> ../../node_modules/nwsapi/src/nwsapi.js:327:113
 ❯ collect ../../node_modules/nwsapi/src/nwsapi.js:1578:32
 ❯ Object._querySelectorAll [as select] ../../node_modules/nwsapi/src/nwsapi.js:1533:36
 ❯ RRDocument.querySelectorAll src/document-nodejs.ts:96:24

* Migrate from jest to vitest

* Order of selectors has changed with postcss

* Remove unused eslint

---------

Co-authored-by: Justin Halsall <Juice10@users.noreply.github.com>

* fix: console assert only logs when arg 0 is falsy (rrweb-io#1530)

* fix: console assert only logs when arg 0 is falsy

* [Feature] Include takeFullSnapshot function in rrweb (rrweb-io#1527)

* export takeFullSnapshot function in rrweb

* chore: reduce flakey test due to '[vite] connected' message (rrweb-io#1525)

* fix: duplicate textContent for style element cause incremental style mutation invalid (rrweb-io#1417)

fix style element corner case
 - historically we have recorded duplicated css content in certain cases (demonstrated by the attached replayer test). This fix ensures that the replayer doesn't doubly add the content, which can cause problems when further mutations occur
---------
Review and further tests contributed by: Eoghan Murray <eoghan@getthere.ie>

* Added support for deprecated addRule & removeRule methods (rrweb-io#1515)

* Added support for deprecated addRule & removeRule methods

* Respect addRule default value

* fix: nested stylesheets should have absolute URLs (rrweb-io#1533)

* Replace relative URLs with absolute URLs when stringifying stylesheets

* Add test to show desired behavior for imported stylesheets from seperate directory

* Rename `absoluteToStylesheet` to `absolutifyURLs` and call it once after stringifying imported stylesheet

* Don't create the intermediary array of the spread operator

* Formalize that `stringifyRule` should expect a sheet href

* Ensure a <style> element can also import and gets it's url absolutized

* Handle case where non imported stylesheet has relative urls that need to be absolutified

* Clarify in test files where jpegs are expected to appear in absolutified urls

* Move absolutifyURLs call for import rules out of trycatch

* Add a benchmarking test for stringifyStylesheet

* Avoid the duplication on how to fall back

---------

Co-authored-by: Eoghan Murray <eoghan@getthere.ie>
Co-authored-by: eoghanmurray <eoghanmurray@users.noreply.github.com>

* Support top-layer <dialog> recording & replay (rrweb-io#1503)

* chore: its important to run `yarn build:all` before running `yarn dev`

* feat: trigger showModal from rrdom and rrweb

* feat: Add support for replaying modal and non modal dialog elements

* chore: Update dev script to remove CLEAR_DIST_DIR flag

* Get modal recording and replay working

* DRY up dialog test and dedupe snapshot images

* feat: Refactor dialog test to use updated attribute name

* feat: Update dialog test to include rr_open attribute

* chore: Add npm dependency happy-dom@14.12.0

* Add more test cases for dialog

* Clean up naming

* Refactor dialog open code

* Revert changed code that doesn't do anything

* Add documentation for unimplemented type

* chore: Remove unnecessary comments in dialog.test.ts

* rename rr_open to rr_openMode

* Replace todo with a skipped test

* Add better logging for CI

* Rename rr_openMode to rr_open_mode

rrdom downcases all attribute names which made `rr_openMode` tricky to deal with

* Remove unused images

* Move after iframe append based on @YunFeng0817's comment
rrweb-io#1503 (comment)

* Remove redundant dialog handling from rrdom.

rrdom already handles dialog element creation it's self

* Rename variables for dialog handling in rrweb replay module

* Update packages/rrdom/src/document.ts

---------

Co-authored-by: Eoghan Murray <eoghan@getthere.ie>

* Added session downloader for chrome extension (rrweb-io#1522)

* Added session downloader for chrome extension

- The session list now has a button to download sessions as .json files for use with rrweb-player
- Improved styling for the delete and download buttons

* Reverse monkey patch built in methods to support LWC (rrweb-io#1509)

* Get around monkey patched Nodes

* inlineImages: Setting of `image.crossOrigin` is not always necessary (rrweb-io#1468)

Setting of the `crossorigin` attribute is not necessary for same-origin images, and causes an immediate image reload (albeit from cache) necessitating the use of a load event listener which subsequently mutates the snapshot.  This change allows us to  avoid the mutation of the snapshot for the same-origin case.

* Modify inlineImages test to remove delay and show that we can inline images without mutation

* Add an explicit test for when the `image.crossOrigin = 'anonymous';` method is necessary.  Uses a combination of about:blank and our test server to simulate a cross-origin context

* Other test changes: there were some spurious rrweb mutations being generated by the addition of the crossorigin attribute that are now elimnated from the rrweb/__snapshots__/integration.test.ts.snap after this PR - this is good

* Move `childNodes` to @rrweb/utils

* Use non-monkey patched versions of the `childNodes`, `parentNode` `parentElement` `textContent` accessors

* Add getRootNode and contains, and add comprehensive todo list

* chore: Update turbo.json tasks for better build process

* Update caniuse-lite

* chore: Update eslint-plugin-compat to version 5.0.0

* chore: Bump @rrweb/utils version to 2.0.0-alpha.15

* delete unused yarn.lock files

* Set correct @rrweb/utils version in package.json

* Migrate over some accessors to reverse-monkey-patched version

* Add missing functions

* Fix illegal invocation error

* Revert closer to what it was.

This feels incorrect to me (Justin Halsall), but some of the tests break without it so I'm restoring this to be closer to its original here:
https://github.com/rrweb-io/rrweb/blame/cfd686d488a9b88dba6b6f8880b5e4375dd8062c/packages/rrweb-snapshot/src/snapshot.ts#L1011

* Reverse monkey patch all methods LWC hijacks

* Make tests more stable

* Safely handle rrdom nodes in hasShadowRoot

* Remove duplicated test

* Use variable `serverURL` in test

* Use monorepo default browserlist

* Fix typing issue for new typescript

* Remove unused package

* Remove unused code

* Add prefix to reverse-monkey-patched methods to make them more explicit

* Add default exports to @rrweb/utils

---------

Co-authored-by: Eoghan Murray <eoghan@getthere.ie>

* Version Packages (alpha) (rrweb-io#1526)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Single style capture (rrweb-io#1437)

Support a contrived/rare case where a <style> element has multiple text node children (this is usually only possible to recreate via javascript append) ... this PR fixes cases where there are subsequent text mutations to these nodes; previously these would have been lost

* In this scenario, a new CSS comment may now be inserted into the captured `_cssText` for a <style> element to show where it should be broken up into text elements upon replay: `/* rr_split */`
* The new 'can record and replay style mutations' test is the principal way to the problematic scenarios, and is a detailed 'catch-all' test with many checks to cover most of the ways things can fail
* There are new tests for splitting/rebuilding the css using the rr_split marker
* The prior 'dynamic stylesheet' route is now the main route for serializing a stylesheet; dynamic stylesheet were missed out in rrweb-io#1533 but that case is now covered with this PR

This PR was originally extracted from rrweb-io#1475 so the  initial motivation was to change the approach on stringifying <style> elements to do so in a single place.  This is also the motivating factor for always serializing <style> elements via the `_cssText` attribute rather than in it's childNodes; in rrweb-io#1475 we will be delaying populating `_cssText` for performance and instead recorrding them as assets.

Thanks for the detailed review to  Justin Halsall <Juice10@users.noreply.github.com> & Yun Feng <https://github.com/YunFeng0817>

* Simplify the hover replacement function (rrweb-io#1535)

Simplify the hover replacement function, which has been borrowed from postcss-pseudo-classes

Note: 'parses nested commas in selectors correctly' was failing after this PR, however I don't think that the previous behaviour was desirable, so have added a new test to formalize this expectation

* fix some typos in optimize-storage.md (rrweb-io#1565)

* fix some typos in optimize-storage.md

* Update docs/recipes/optimize-storage.md

* Create metal-mugs-mate.md

---------

Co-authored-by: Justin Halsall <Juice10@users.noreply.github.com>

* fix(rrdom): Ignore invalid DOM attributes when diffing (rrweb-io#1561)

* fix(rrdom): Ignore invalid DOM attributes when diffing (rrweb-io#213)

We encountered an issue where replays with invalid attributes (e.g.
`@click`) would break rendering the replay after seeking. The exception
bubbles up to
[here](https://github.com/rrweb-io/rrweb/blob/62093d4385a09eb0980c2ac02d97eea5ce2882be/packages/rrweb/src/replay/index.ts#L270-L279),
which means the replay will continue to play, but the replay mirror will
be incomplete.

Closes https://github.com/getsentry/team-replay/issues/458

* add changeset

* fix(snapshot): dimensions for blocked element not being applied (rrweb-io#1331)

fix for replay of a blocked element when using 'fast forward' (rrdom)

 - Dimensions were not being properly applied when you seek to a position in the replay. Need to use `setProperty` rather than trying to set the width/height directly

* ref: isParentRemoved to cache subtree (rrweb-io#1543)

* ref: isParentRemoved to cache subtree
* ref: cache at insertion too
* ref: remove wrapper function

---------

Co-authored-by: Justin Halsall <Juice10@users.noreply.github.com>

* changeset to 2.0.13

* fix snapshot build

---------

Co-authored-by: Eoghan Murray <eoghan@getthere.ie>
Co-authored-by: Justin Halsall <Juice10@users.noreply.github.com>
Co-authored-by: Alexey Babik <alexey.babik@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: David Newell <david@posthog.com>
Co-authored-by: Paul D'Ambra <paul@posthog.com>
Co-authored-by: Christopher Arredondo <arredgroup@gmail.com>
Co-authored-by: Yun Feng <yun.feng0817@gmail.com>
Co-authored-by: minja malešević <33005827+okejminja@users.noreply.github.com>
Co-authored-by: Jeff Nguyen <jeffduy@gmail.com>
Co-authored-by: eoghanmurray <eoghanmurray@users.noreply.github.com>
Co-authored-by: Arun Kunigiri <kirankunigiri@gmail.com>
Co-authored-by: Riadh Mouamnia <85134557+riadhmouamnia@users.noreply.github.com>
Co-authored-by: Billy Vong <billyvg@users.noreply.github.com>
Co-authored-by: Jonas <jonas@badalic.com>
Co-authored-by: Shay Malchi <shay@saola.ai>
…multi-page session).

 - instead of resetting between FullSnapshots, we instead record assets against a timestamp
 - prefer assets with timestamps subseqent to a snapshot (as if a reset happened)
 - if none can be found, find the most recent prior asset for a url

The asset manager now requires a rudimentary idea of where the replayer is at prior to applying an asset
…that we don't delay fullsnapshot rendering awaiting for them
… it's important to distinguish between the first style element on one page, and the first style element on another page (which would likely have different content)
…ullsnapshots. I don't fully understand the reason for returning the `failed` status; have copied over the commit message (from when allAdded was introduced) as the comment 'we don't expect assets to arrive later'
… in "should be able to modify a node's attribute once asset is loaded" test)
…ets loaded' - I believe this change is needed after the change to always storing the `src` in `rr_captured_src`, but I'm unsure as to why I didn't pick it up earlier; maybe I wasn't running these tests
…g nondeterminism in the test (e.g. in 'should nest record iframe' between the emission of the iframe and emission of the style asset which are both (asynchronously) triggered by the fullsnapshot)
…o. Add in a missing case where a <picture> element should be considered an image
…r "Ensure blocking class is respected to prevent processing of assets"
…ion ordering due to synchronous emission of an asset during a full snapshot before the FullSnapshot event could be emitted. At least if they explicitly set processStylesheetsWithin=0 then there is a cause relating to this outcome. A further update might ensure that all Assets emitted during a FullSnapshot take on the timestamp of the FullSnapshot
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants