-
-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Incremental performance improvements to element creation #3169
Conversation
Visit the preview URL for this PR (updated for commit 10d5f89): https://yew-rs-api--pr3169-performance-improvem-5zc0a3lu.web.app (expires Sun, 09 Apr 2023 16:39:04 GMT) 🔥 via Firebase Hosting GitHub Action 🌎 |
Benchmark - SSRYew Master
Pull Request
|
Size Comparison
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR and the performance comparisons.
My conclusion is that I favor interning attribute keys, tag names and event types, but not attribute values, by default. But also to keep it opt-in in some sense by the user.
.create_element(tag) | ||
.expect("can't create element for vtag") | ||
thread_local! { | ||
static CACHED_ELEMENTS: RefCell<Vec<(String, Element)>> = Default::default(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I can follow your argument for using a linear search here. Since it's caching by tag, we might even be able to fine-tune the hashing used to avoid collisions, but even without this, a HashMap
would be less surprising. The "usual" website uses between 40-100 different tagNames.
I ran new Set(Array.from(document.querySelectorAll("*")).map(e => e.tagName))
on the top 50 list. A few outliers are explained by the liberal use of custom elements on the google pages especially. The above also counts html
, body
and a few other elements that probably do not appear in the app itself, but I think expecting the average yew app to use 30 different elements at least is reasonable. I don't see the linear search being faster than a lookup in the map, and the memory overhead it most likely negligible.
Would you try this with a HashMap
with a default capacity of, say 32?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, let me give a bit more of my reasoning, since we should start from the assumption that a HashMap is the right call here.
- For any given comparison between
O(n)
andO(1)
it's good to keep in mind that ifn
is relatively small and1
is relatively large,n
may actually be more efficient. For example, if we only have 2 items in the Vec, it will obviously be cheaper to do a linear search than to look it up in a HashMap. If we have 10,000 items, it will obviously be cheaper to look it up in the HashMap. - I made the guess that n = 30 is somewhere in the "not a significant difference" range. I agree this is important to actually test rather than making an assumption.
- Relative to the cost of DOM rendering itself these differences are likely minimal.
- Binary size: Yew already has
Vec<Element>
in it so this doesn't add meaningful binary size. It doesn't (afaict) have aHashMap<String, Element>
anywhere, so this is a new data structure to be monomorphized and included in the binary.
I did just do a HashMap version with capacity 32 as you suggested. Here are the benchmark results
I'm pleased to say you're right and I'm wrong here, in that even at this small n the HashMap is winning. You can see it when creating 1000 elements if you average "create 1000" and "append 1000", and a much bigger difference at 10,000. Of course on this particular run this wasn't enough to swamp the general statistical noise so the Vec approach was "faster" overall but this is not significant.
Note however that the HashMap version adds another 4kb to the WASM binary size.
So that's the tradeoff to consider in using a HashMap instead: even faster element creation that may or may not be measurable vs. 4kb in the binary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is 4KB (I assume you meant bytes with a B and not bits) really that much difference when WASM binary is highly compressible and is streamed to the client? I'm not sure. In any sizeable application, the binary size can be in MBs. 4KB is merely a drop in the bucket for that
I've just pushed a few changes incorporating the feedback here, so the included optimizations are
I've added an Unless I've missed something this is done from my end, as far as I can tell. |
I am not sure if we should be providing a feature for interning. For any sizeable application, you would need to have wasm-bindgen as a dependency, which is where interning matters. |
Looks good to from an implementation perspective. Note that that some documentation should be provided for the users. Can you add a paragraph to the documentation about optimizations on how to enable that, after we decide if we want a feature in yew? I'd argue for an
I don't see the downsides of the feature forwarding. |
If wasm-bindgen adapts a different method or deprecating interning (e.g.: WebIDL?), not providing this transitive feature will make 1 less maintaining overhead for us. If wasm-bindgen can transition transparently, it would only be natural to assume they would also do so, if they have to introduce it a breaking change, I do not think we can apply it without it being a breaking change either. Which means that avoiding this feature would help us to avoid 1 potential breaking change.
wasm-bindgen not only provides binding features, but other things like
I would see this as maintaining overhead that can be otherwise avoided. There are implications around interning as it would require all strings that is passed through to the JavaScript APIs to be hashed (as it needs to look up whether it is interned). This cost is add on top of the existing UTF-8 - UTF-16 encoding / decoding cost. Hence, if we include interning as a feature, we need to provide documentation around this feature flag so users can fully understand the implications and when to use it. In which this is something that we can avoid by simply pointing this to wasm-bindgen's intern function documentation. By the end of the day, I wouldn't be against adding this feature flag, if you think it's still worth it after considering:
|
All good points. After considering, I still think we should mention string interning in the crate level and optimization docs, but just point users to the wasm-bindgen feature directly. Should be less risky and less taxing for us in the long run and equally as usable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you merge the changes from master, the CI should be green
packages/yew/Cargo.toml
Outdated
|
||
[features] | ||
ssr = ["dep:html-escape", "dep:base64ct", "dep:bincode"] | ||
csr = [] | ||
hydration = ["csr", "dep:bincode"] | ||
enable-interning = ["wasm-bindgen/enable-interning"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please remove this feature? (see above discussion)
enable-interning = ["wasm-bindgen/enable-interning"] |
If you would like to also add documentation for interning, it should go in optimizations docs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! Thanks for taking the time to work on this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
Description
This PR includes a series of incremental improvements to element creation speed. It is a smaller effect than I'd anticipated overall, but measurable for each step. There are two general performance approaches used:
wasm-bindgen
string interning, which reduces the cost of copying frequently-used strings across the WASM-JS boundaryelement.cloneNode()
instead ofdocument.createElement()
, which is significantly faster on the browsers I've tested. (I implemented this with a very simple O(n) linear search, where n is the number of distinct elements used in an app, as I'm assuming the number of HTML elements used in any app is small enough for hashing the element names and using a HashMap not to be worth it, but I didn't benchmark a HashMap approach.)Trade-offs:
document().create_element()
approachI've made this in a series of commits with increasing levels of performance, each building on top of the last, so you can decide which, if any, you want to adopt.
Here are local results for js-framework-benchmark for each approach:
I'd note there seems to be an exponential slowdown somewhere with element creation, such that current Yew scores 1.58 on "create 1000 rows" but 2.20 on "create 10,000 rows," and the node cloning approach is 1.41 on "create 1000" and 1.80 on "create 10,000."
I won't be sad if you decide none of this is worth it, given the magnitude of the improvements is not that big. Just wanted to offer it!
Checklist