Incremental performance improvements to element creation #3169

Merged Apr 2, 2023 (10 commits)

gbj (Contributor) commented Mar 18, 2023

Description

This PR includes a series of incremental improvements to element creation speed. It is a smaller effect than I'd anticipated overall, but measurable for each step. There are two general performance approaches used:

  1. Enabling wasm-bindgen string interning, which reduces the cost of copying frequently-used strings across the WASM-JS boundary
  2. Using element.cloneNode() instead of document.createElement(), which is significantly faster on the browsers I've tested. (I implemented this with a very simple O(n) linear search, where n is the number of distinct elements used in an app, as I'm assuming the number of HTML elements used in any app is small enough for hashing the element names and using a HashMap not to be worth it, but I didn't benchmark a HashMap approach.)

Trade-offs:

  • String interning costs a constant amount for each distinct string used, in exchange for faster speed when reusing strings. This means that for applications that involve creating the same elements or setting the same attributes multiple times, it is a net win.
  • The node cloning approach adds some significant complexity over the more intuitive document().create_element() approach
  • Each level of increased runtime speed tends to come with a slight increase in binary size
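
To make the interning trade-off concrete, here is a minimal toy model in plain Rust. It is not wasm-bindgen's implementation: `Interner`, `copies`, and `copies_for_rows` are hypothetical names, and the "boundary copy" is only simulated by a counter. It shows the one-time cost per distinct string and the free reuse afterwards.

```rust
use std::collections::HashMap;

/// Toy interning cache (hypothetical, for illustration only): each distinct
/// string is "copied across the boundary" once and reused by handle afterwards.
struct Interner {
    handles: HashMap<String, u32>,
    copies: usize, // number of simulated boundary copies
}

impl Interner {
    fn new() -> Self {
        Interner { handles: HashMap::new(), copies: 0 }
    }

    fn intern(&mut self, s: &str) -> u32 {
        if let Some(&handle) = self.handles.get(s) {
            return handle; // already seen: no copy needed
        }
        self.copies += 1; // first use: pay the one-time cost
        let handle = self.handles.len() as u32;
        self.handles.insert(s.to_owned(), handle);
        handle
    }
}

/// Simulates rendering `rows` table rows that all reuse the same three names.
fn copies_for_rows(rows: usize) -> usize {
    let mut interner = Interner::new();
    for _ in 0..rows {
        for name in ["tr", "td", "class"] {
            interner.intern(name);
        }
    }
    interner.copies
}

fn main() {
    // Prints "copies for 1000 rows: 3".
    println!("copies for 1000 rows: {}", copies_for_rows(1000));
}
```

In this model the number of copies depends only on the number of distinct strings, not on how many rows are created, which is why repeated elements and attributes benefit most.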

I've made this in a series of commits with increasing levels of performance, each building on top of the last, so you can decide which, if any, you want to adopt.

  • intern element tag names
  • intern attribute and event listener names
  • intern attribute values
  • use cached node cloning instead of element creation

Here are local results for js-framework-benchmark for each approach:
[Two screenshots of the benchmark results, taken 2023-03-18 at 3:30:55 PM and 3:30:47 PM]

I'd note there seems to be a superlinear slowdown somewhere with element creation, such that current Yew scores 1.58 on "create 1000 rows" but 2.20 on "create 10,000 rows," and the node cloning approach is 1.41 on "create 1000" and 1.80 on "create 10,000."

I won't be sad if you decide none of this is worth it, given the magnitude of the improvements is not that big. Just wanted to offer it!

Checklist

  • I have reviewed my own code
  • I have added tests

github-actions bot commented Mar 18, 2023

Visit the preview URL for this PR (updated for commit 10d5f89):

https://yew-rs-api--pr3169-performance-improvem-5zc0a3lu.web.app

(expires Sun, 09 Apr 2023 16:39:04 GMT)

🔥 via Firebase Hosting GitHub Action 🌎

github-actions bot commented Mar 18, 2023

Benchmark - SSR

Yew Master

Benchmark | Round | Min (ms) | Max (ms) | Mean (ms) | Standard Deviation
Baseline | 10 | 353.656 | 355.595 | 354.452 | 0.673
Hello World | 10 | 628.596 | 631.143 | 629.305 | 0.692
Function Router | 10 | 2150.388 | 2158.887 | 2155.662 | 2.460
Concurrent Task | 10 | 1006.972 | 1008.672 | 1007.683 | 0.569
Many Providers | 10 | 1644.552 | 1663.200 | 1652.729 | 6.190

Pull Request

Benchmark | Round | Min (ms) | Max (ms) | Mean (ms) | Standard Deviation
Baseline | 10 | 353.459 | 355.630 | 354.645 | 0.692
Hello World | 10 | 630.781 | 633.112 | 631.369 | 0.736
Function Router | 10 | 2136.385 | 2177.344 | 2144.923 | 11.882
Concurrent Task | 10 | 1006.590 | 1008.737 | 1007.958 | 0.678
Many Providers | 10 | 1651.992 | 1677.968 | 1659.592 | 9.058

github-actions bot commented Mar 18, 2023

Size Comparison

examples master (KB) pull request (KB) diff (KB) diff (%)
async_clock 101.879 104.199 +2.320 +2.278%
boids 171.810 174.132 +2.322 +1.352%
communication_child_to_parent 92.598 94.914 +2.316 +2.502%
communication_grandchild_with_grandparent 103.522 105.841 +2.318 +2.239%
communication_grandparent_to_grandchild 99.693 102.019 +2.325 +2.332%
communication_parent_to_child 89.931 92.247 +2.316 +2.576%
contexts 106.125 108.443 +2.318 +2.185%
counter 87.972 90.289 +2.317 +2.634%
counter_functional 88.307 90.624 +2.317 +2.624%
dyn_create_destroy_apps 90.819 93.140 +2.320 +2.555%
file_upload 102.270 104.142 +1.872 +1.831%
function_memory_game 164.169 166.488 +2.319 +1.413%
function_router 331.980 333.572 +1.592 +0.479%
function_todomvc 159.681 162.001 +2.320 +1.453%
futures 225.227 227.547 +2.320 +1.030%
game_of_life 108.117 110.438 +2.320 +2.146%
immutable 182.636 186.136 +3.500 +1.916%
inner_html 84.624 86.941 +2.317 +2.738%
js_callback 110.230 112.552 +2.321 +2.106%
keyed_list 198.554 200.873 +2.319 +1.168%
mount_point 87.732 90.052 +2.319 +2.644%
nested_list 111.226 113.543 +2.317 +2.083%
node_refs 94.783 97.101 +2.317 +2.445%
password_strength 1542.321 1544.560 +2.238 +0.145%
portals 95.771 98.089 +2.317 +2.420%
router 303.426 305.024 +1.599 +0.527%
simple_ssr 140.751 143.052 +2.301 +1.635%
ssr_router 368.772 370.384 +1.611 +0.437%
suspense 107.342 109.661 +2.319 +2.161%
timer 90.846 93.164 +2.318 +2.552%
todomvc 142.163 144.483 +2.320 +1.632%
two_apps 88.618 90.943 +2.325 +2.624%
web_worker_fib 152.561 154.892 +2.331 +1.528%
webgl 87.260 89.579 +2.319 +2.658%

⚠️ The following examples have changed their size significantly:

examples master (KB) pull request (KB) diff (KB) diff (%)
async_clock 101.879 104.199 +2.320 +2.278%
boids 171.810 174.132 +2.322 +1.352%
communication_child_to_parent 92.598 94.914 +2.316 +2.502%
communication_grandchild_with_grandparent 103.522 105.841 +2.318 +2.239%
communication_grandparent_to_grandchild 99.693 102.019 +2.325 +2.332%
communication_parent_to_child 89.931 92.247 +2.316 +2.576%
contexts 106.125 108.443 +2.318 +2.185%
counter 87.972 90.289 +2.317 +2.634%
counter_functional 88.307 90.624 +2.317 +2.624%
dyn_create_destroy_apps 90.819 93.140 +2.320 +2.555%
file_upload 102.270 104.142 +1.872 +1.831%
function_memory_game 164.169 166.488 +2.319 +1.413%
function_todomvc 159.681 162.001 +2.320 +1.453%
futures 225.227 227.547 +2.320 +1.030%
game_of_life 108.117 110.438 +2.320 +2.146%
immutable 182.636 186.136 +3.500 +1.916%
inner_html 84.624 86.941 +2.317 +2.738%
js_callback 110.230 112.552 +2.321 +2.106%
keyed_list 198.554 200.873 +2.319 +1.168%
mount_point 87.732 90.052 +2.319 +2.644%
nested_list 111.226 113.543 +2.317 +2.083%
node_refs 94.783 97.101 +2.317 +2.445%
portals 95.771 98.089 +2.317 +2.420%
simple_ssr 140.751 143.052 +2.301 +1.635%
suspense 107.342 109.661 +2.319 +2.161%
timer 90.846 93.164 +2.318 +2.552%
todomvc 142.163 144.483 +2.320 +1.632%
two_apps 88.618 90.943 +2.325 +2.624%
web_worker_fib 152.561 154.892 +2.331 +1.528%
webgl 87.260 89.579 +2.319 +2.658%

WorldSEnder (Member) left a comment

Thanks for the PR and the performance comparisons.

My conclusion is that I favor interning attribute keys, tag names and event types, but not attribute values, by default. I'd also keep it opt-in for the user in some sense.

.create_element(tag)
.expect("can't create element for vtag")
thread_local! {
static CACHED_ELEMENTS: RefCell<Vec<(String, Element)>> = Default::default();

Member

I'm not sure I can follow your argument for using a linear search here. Since it's caching by tag, we might even be able to fine-tune the hashing used to avoid collisions, but even without this, a HashMap would be less surprising. The "usual" website uses between 40 and 100 different tagNames.

I ran new Set(Array.from(document.querySelectorAll("*")).map(e => e.tagName)) on the top 50 list. A few outliers are explained by the liberal use of custom elements, especially on the Google pages. The above also counts html, body and a few other elements that probably do not appear in the app itself, but I think expecting the average yew app to use at least 30 different elements is reasonable. I don't see the linear search being faster than a lookup in the map, and the memory overhead is most likely negligible.

Would you try this with a HashMap with a default capacity of, say 32?

Contributor Author

Sure, let me give a bit more of my reasoning, since we should start from the assumption that a HashMap is the right call here.

  1. For any given comparison between O(n) and O(1), it's good to keep in mind that if n is relatively small and the constant cost hidden behind the O(1) is relatively large, the O(n) approach may actually be more efficient. For example, if we only have 2 items in the Vec, it will obviously be cheaper to do a linear search than to look it up in a HashMap. If we have 10,000 items, it will obviously be cheaper to look it up in the HashMap.
  2. I made the guess that n = 30 is somewhere in the "not a significant difference" range. I agree this is important to actually test rather than making an assumption.
  3. Relative to the cost of DOM rendering itself these differences are likely minimal.
  4. Binary size: Yew already has Vec<Element> in it so this doesn't add meaningful binary size. It doesn't (afaict) have a HashMap<String, Element> anywhere, so this is a new data structure to be monomorphized and included in the binary.
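
For readers following along, here is a minimal sketch of the two cache shapes being compared. It assumes a stand-in `FakeElement` type in place of `web_sys::Element`, with `clone()` playing the role of `clone_node()`. The names `VecCache`, `MapCache`, and `created_counts` are hypothetical and not from the PR; only the `Vec<(String, Element)>` layout and the capacity of 32 come from the discussion above.

```rust
use std::collections::HashMap;

/// Stand-in for a DOM element; `clone()` plays the role of `cloneNode()`.
#[derive(Clone, Debug, PartialEq)]
struct FakeElement {
    tag: String,
}

/// Linear-search cache: O(n) lookup over the distinct tags seen so far.
#[derive(Default)]
struct VecCache {
    entries: Vec<(String, FakeElement)>,
    created: usize, // how many times we fell back to "createElement"
}

impl VecCache {
    fn get(&mut self, tag: &str) -> FakeElement {
        if let Some((_, el)) = self.entries.iter().find(|(t, _)| t.as_str() == tag) {
            return el.clone(); // cache hit: clone the template node
        }
        self.created += 1; // cache miss: create once and remember it
        let el = FakeElement { tag: tag.to_owned() };
        self.entries.push((tag.to_owned(), el.clone()));
        el
    }
}

/// Hash-based cache with the capacity of 32 suggested in the review.
struct MapCache {
    entries: HashMap<String, FakeElement>,
    created: usize,
}

impl MapCache {
    fn new() -> Self {
        MapCache { entries: HashMap::with_capacity(32), created: 0 }
    }

    fn get(&mut self, tag: &str) -> FakeElement {
        if let Some(el) = self.entries.get(tag) {
            return el.clone();
        }
        self.created += 1;
        let el = FakeElement { tag: tag.to_owned() };
        self.entries.insert(tag.to_owned(), el.clone());
        el
    }
}

/// Requests the same three tags `rows` times from both caches and returns
/// how many real creations each cache needed.
fn created_counts(rows: usize) -> (usize, usize) {
    let mut vec_cache = VecCache::default();
    let mut map_cache = MapCache::new();
    for _ in 0..rows {
        for tag in ["tr", "td", "a"] {
            assert_eq!(vec_cache.get(tag), map_cache.get(tag));
        }
    }
    (vec_cache.created, map_cache.created)
}

fn main() {
    println!("{:?}", created_counts(1000));
}
```

Both caches behave identically from the caller's point of view; the difference under discussion is only the lookup cost per request and the extra code a `HashMap<String, Element>` adds to the binary.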

I did just do a HashMap version with capacity 32 as you suggested. Here are the benchmark results
[Two screenshots of the benchmark results, taken 2023-03-18 at 5:37:05 PM and 5:37:16 PM]

I'm pleased to say you're right and I'm wrong here, in that even at this small n the HashMap is winning. You can see it when creating 1000 elements if you average "create 1000" and "append 1000", and a much bigger difference at 10,000. Of course on this particular run this wasn't enough to swamp the general statistical noise so the Vec approach was "faster" overall but this is not significant.

Note however that the HashMap version adds another 4kb to the WASM binary size.

So that's the tradeoff to consider in using a HashMap instead: even faster element creation that may or may not be measurable vs. 4kb in the binary.

Member

Is 4KB (I assume you meant bytes with a B and not bits) really that much difference when WASM binary is highly compressible and is streamed to the client? I'm not sure. In any sizeable application, the binary size can be in MBs. 4KB is merely a drop in the bucket for that

packages/yew/src/dom_bundle/btag/attributes.rs (outdated, resolved)
packages/yew/Cargo.toml (outdated, resolved)

gbj (Contributor Author) commented Mar 21, 2023

I've just pushed a few changes incorporating the feedback here, so the included optimizations are

  1. cache element creation and use clone_node() (supersedes interning tag names)
  2. intern attribute names and event types

I've added an enable-interning feature in yew for convenience that enables the same feature in wasm-bindgen, so users don't have to add a wasm-bindgen dependency themselves. This is off by default.

Unless I've missed something, this is done from my end.

futursolo (Member) commented

I am not sure if we should be providing a feature for interning. For any sizeable application, you would need to have wasm-bindgen as a dependency, which is where interning matters.

WorldSEnder (Member) commented Mar 26, 2023

Looks good to me from an implementation perspective. Note that some documentation should be provided for the users. Can you add a paragraph to the optimizations documentation on how to enable this, after we decide whether we want a feature in yew?

I'd argue for an "enable-interning" feature in yew directly:

  • it would be easier to teach to users, as it's just adding a flag and more prominently visible
  • on that note, it should have documentation of that feature in the crate level docs
  • if wasm-bindgen ever updates or changes the mechanism, we could change transparently to the new mechanism (hopefully)
  • a "sizeable" application might also put all the "ugly" wasm-bindgen stuff into a separate component library, and only enable string interning in the final executable.

I don't see the downsides of the feature forwarding.

futursolo (Member) commented Mar 26, 2023

> if wasm-bindgen ever updates or changes the mechanism, we could change transparently to the new mechanism (hopefully)

If wasm-bindgen adopts a different method or deprecates interning (e.g. WebIDL?), not providing this transitive feature means one less maintenance burden for us. If wasm-bindgen can transition transparently, it is only natural to assume they would do so themselves; if they have to introduce it as a breaking change, I do not think we can apply it without it being a breaking change either. So avoiding this feature would help us avoid one potential breaking change.

> a "sizeable" application might also put all the "ugly" wasm-bindgen stuff into a separate component library, and only enable string interning in the final executable.

wasm-bindgen not only provides binding features, but also other things like #[wasm_bindgen(start)] to register an additional entry point and wasm_bindgen::prelude::*, which provides things like JsCast, UnwrapThrowExt, etc., all of which are useful for sizeable applications at the application level.

> it would be easier to teach to users, as it's just adding a flag and more prominently visible
> I don't see the downsides of the feature forwarding.

I would see this as maintenance overhead that could otherwise be avoided.

There are implications around interning, as it requires every string that is passed through to the JavaScript APIs to be hashed (since it needs to look up whether the string is interned). This cost is added on top of the existing UTF-8 / UTF-16 encoding and decoding cost.
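
As a rough illustration of that cost, here is a simplified model in plain Rust. It is an assumption-laden sketch, not wasm-bindgen's actual code: `Boundary`, `pass`, and `simulate` are hypothetical names, and only explicitly interned strings are treated as cached. Every string passed pays a hash lookup; strings that are not interned additionally pay the UTF-16 encoding.

```rust
use std::collections::HashSet;

/// Simplified model of a WASM-to-JS string boundary with interning enabled.
struct Boundary {
    interned: HashSet<String>,
    hash_lookups: usize, // paid by every string that crosses the boundary
    encodes: usize,      // paid only by strings that are not interned
}

impl Boundary {
    fn new() -> Self {
        Boundary { interned: HashSet::new(), hash_lookups: 0, encodes: 0 }
    }

    /// Explicitly marks a string as interned (cached on the JS side).
    fn intern(&mut self, s: &str) {
        self.interned.insert(s.to_owned());
    }

    /// Passes a string across the boundary and returns the number of
    /// UTF-16 code units that had to be encoded (0 on a cache hit).
    fn pass(&mut self, s: &str) -> usize {
        self.hash_lookups += 1; // the lookup happens whether or not it hits
        if self.interned.contains(s) {
            return 0;
        }
        self.encodes += 1;
        s.encode_utf16().count()
    }
}

/// Passes one interned name and one unique value per call.
/// Returns (hash lookups, encodes).
fn simulate(calls: usize) -> (usize, usize) {
    let mut boundary = Boundary::new();
    boundary.intern("class");
    for i in 0..calls {
        boundary.pass("class"); // repeated attribute name: hit
        boundary.pass(&format!("row-{}", i)); // unique attribute value: miss
    }
    (boundary.hash_lookups, boundary.encodes)
}

fn main() {
    // Prints "(2000, 1000)": every string was hashed, only the misses were encoded.
    println!("{:?}", simulate(1000));
}
```

Under this model, unique strings such as dynamic attribute values gain nothing from interning and still pay for the lookup, which fits the earlier preference for interning attribute names but not attribute values.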

Hence, if we include interning as a feature, we need to provide documentation for this feature flag so users can fully understand the implications and when to use it. This is something we can avoid by simply pointing to the documentation of wasm-bindgen's intern function.

At the end of the day, I wouldn't be against adding this feature flag, if you think it's still worth it after considering:

  1. Users will likely need wasm-bindgen as a dependency.
  2. We have to write documentation for this.
  3. We potentially have to handle issues and questions around users / libraries enabling / using this feature incorrectly.
  4. It may potentially become a breaking change for us.

WorldSEnder (Member) commented Mar 27, 2023

> At the end of the day, I wouldn't be against adding this feature flag, if you think it's still worth it after considering:
>
> 1. Users will likely need wasm-bindgen as a dependency.
> 2. We have to write documentation for this.
> 3. We potentially have to handle issues and questions around users / libraries enabling / using this feature incorrectly.
> 4. It may potentially become a breaking change for us.

All good points. After considering them, I still think we should mention string interning in the crate-level and optimization docs, but just point users to the wasm-bindgen feature directly. That should be less risky and less taxing for us in the long run, and equally usable.
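
For reference, pointing users at the wasm-bindgen feature directly would amount to something like the following fragment in an application's Cargo.toml. The feature name is taken from the `enable-interning = ["wasm-bindgen/enable-interning"]` line proposed in this PR; the version requirement shown is an assumption and should match whatever the application already uses.

```toml
[dependencies]
# Assumed version requirement; the feature name comes from this PR's discussion.
wasm-bindgen = { version = "0.2", features = ["enable-interning"] }
```

With the feature enabled, frequently reused strings can be registered through wasm-bindgen's intern function, whose documentation describes when doing so is worthwhile.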

ranile (Member) left a comment

If you merge the changes from master, the CI should be green


[features]
ssr = ["dep:html-escape", "dep:base64ct", "dep:bincode"]
csr = []
hydration = ["csr", "dep:bincode"]
enable-interning = ["wasm-bindgen/enable-interning"]

Member

Can you please remove this feature? (see above discussion)

Suggested change
enable-interning = ["wasm-bindgen/enable-interning"]

If you would like to also add documentation for interning, it should go in the optimizations docs.

ranile added the performance and A-yew (Area: The main yew crate) labels Apr 1, 2023
voidpumpkin added the S-waiting-on-author (Status: awaiting action from the author of the issue/PR) label Apr 2, 2023
voidpumpkin removed the S-waiting-on-author (Status: awaiting action from the author of the issue/PR) label Apr 2, 2023

ranile (Member) left a comment

Looks good to me! Thanks for taking the time to work on this

voidpumpkin (Member) left a comment

🚀

voidpumpkin merged commit bdf5712 into yewstack:master Apr 2, 2023