Expected to be used for large sizes? #1
Comments
My thought is that even if we tell folks to use it for large regions, they'll use it for small ones too, so we'll have to handle that anyway. I think @lukewagner originally suggested that the size have page units to prevent that. Is it worth it though? What's the cost to the VM to have to handle small regions?
The benefit I see for clamping to page sizes is that we remove any expectation that the wasm engine might optimize small copies, which lets engines compile these instructions to a simple out-of-line call.
Wouldn't that remove the binary size saving?
That's an interesting point, but I wasn't aware that this feature was expected to reduce binary sizes by any significant amount in any case. It would certainly change the nature of the feature (and what engines needed to do) if reducing binary size for small copies were a goal.
I think clamping to page sizes would cripple this feature and result in a proliferation of user code that tries to divide original requests into a page-multiple-sized chunk followed by cleanup code. That's a classic abstraction inversion.
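For illustration, here is a minimal sketch of that chunk-plus-cleanup pattern in WebAssembly text format, assuming page-multiple-only sizes. It uses the `memory.copy` name from the finalized bulk memory proposal, which may postdate this thread, and the function name is purely hypothetical.

```wat
(module
  (memory 1)
  ;; Copy $len bytes from $src to $dst under the assumption that memory.copy
  ;; may only be given a multiple of the 64 KiB page size: one bulk copy for
  ;; the rounded-down prefix, then a byte-wise cleanup loop for the remainder.
  (func $copy_any (param $dst i32) (param $src i32) (param $len i32)
    (local $i i32)
    ;; round the length down to a multiple of 65536
    (local.set $i (i32.and (local.get $len) (i32.const 0xffff0000)))
    (if (local.get $i)
      (then (memory.copy (local.get $dst) (local.get $src) (local.get $i))))
    ;; copy the remaining (len mod 65536) bytes one at a time
    (block $done
      (loop $tail
        (br_if $done (i32.ge_u (local.get $i) (local.get $len)))
        (i32.store8 (i32.add (local.get $dst) (local.get $i))
                    (i32.load8_u (i32.add (local.get $src) (local.get $i))))
        (local.set $i (i32.add (local.get $i) (i32.const 1)))
        (br $tail)))))
```

With byte granularity, all of this collapses to a single `memory.copy`.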
Why would there not be a single implementation of that chunk-and-cleanup code, e.g. in the toolchain's memcpy, rather than a proliferation of copies in user code?
Just coming back to this... It seems like the wasm page size is a bit too large a granularity -- the microbenchmark shows benefits for sizes < 64K.
True, though we also seem to have assumed a mostly symbiotic relationship with producers, where they'll produce good code so the VM doesn't have to perform complex optimizations. I think it's reasonable to assume the same here -- if we give guidelines for the producer (TBD 😉), then can the VM assume that it isn't going to have to optimize a constant 4-byte memcpy that should have just been a load/store pair?
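For concreteness, a sketch of the two forms being compared, in WebAssembly text format; `memory.copy` is the name from the finalized proposal and the function names are purely illustrative.

```wat
(module
  (memory 1)
  ;; a constant 4-byte copy expressed with the bulk instruction...
  (func $copy4_bulk (param $dst i32) (param $src i32)
    (memory.copy (local.get $dst) (local.get $src) (i32.const 4)))
  ;; ...and the single load/store pair it should reduce to
  (func $copy4_scalar (param $dst i32) (param $src i32)
    (i32.store (local.get $dst) (i32.load (local.get $src)))))
```

The question above is whether engines can be relied on to perform that reduction themselves, or whether producers should emit the load/store form directly.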
I'd hope so.
I agree. I think it would be safer to leave it at the byte granularity, assume that these operations will get used at both big and small sizes, and leave it to implementations to decide how (if at all) they want to optimise the small-size cases.
But in practice, if every wasm engine doesn't reliably optimize the small-constant-size case (which, from what I understand, is very commonly used) there will be a significant perf cliff which will require the toolchain (to provide reliable perf to its users) to do the lowering to loads anyway. With page-size-quanta, the responsibility for who does what is clear. I don't see how this cripples the feature since this is an advanced optimization emitted only in special cases by compilers, not something anyone writes by hand in the source language.
Well, it will force producers to produce sequences that mix calls and inline code, which will be verbose and will also inherently not be optimised for more than one target processor (how do you make the unroll vs vectorise vs unrolled-and-vectorised vs call-out tradeoffs if you don't know what you're running on?). I'm also not convinced that the small-constant-size case is uncommon: I frequently see lots of small memcpy/memmove calls when profiling natively compiled Rust. I do understand what you're getting at, though. Would it be feasible and/or helpful to add an advisory section to the spec that states a minimum set of copy/fill cases that an optimising Wasm implementation can reasonably be expected to do well inline? That is to say, add some kind of quasi performance guarantee to the contract?
Yeah, I suppose a non-normative note that states the contract, even if informally, could effectively make it the browser's "fault" if they didn't optimize appropriately, so producers could feel confident in always emitting the instruction.

Also, thinking more about what a producer would need to do to optimally use a page-quanta version, it seems like more trouble than it is worth.

I'm fine with byte-granularity, then.
@lukewagner would you rather also have an alignment hint, so you can do fancy stuff on top?
If you're talking about a "page-aligned" hint, I don't think it would help (the case browsers would have to specially optimize is when the size is small and constant; for all others we'd just call out to the libc memcpy anyway).

Or perhaps you mean 1/2/4/8/16-byte alignment? When coupled with a constant size, such that the engine is inlining a straight sequence of loads/stores, I guess I could see this being useful for the same reason that the alignment hint is present on scalar loads/stores, but that is a separate point from the one I made (and rescinded) above.
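For reference, this is the existing align hint on scalar accesses that the comparison refers to. A minimal sketch in WebAssembly text format, with an illustrative function name: a constant 16-byte copy inlined as two i64 load/store pairs, where the align immediates tell the engine the accesses are expected to be 8-byte aligned.

```wat
(module
  (memory 1)
  ;; constant 16-byte copy inlined as two i64 load/store pairs;
  ;; align=8 hints that the addresses are expected to be 8-byte aligned
  (func $copy16 (param $dst i32) (param $src i32)
    (i64.store align=8 (local.get $dst)
               (i64.load align=8 (local.get $src)))
    (i64.store offset=8 align=8 (local.get $dst)
               (i64.load offset=8 align=8 (local.get $src)))))
```

A hypothetical alignment immediate on the bulk copy instruction would carry the same kind of hint for constant-size copies whose pointer alignment is known only to the producer.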
The tracking issue for this feature says that these operations are expected to be used for large regions of memory. I don't see this mentioned in the Overview.md. Is this still an expectation?