This repository was archived by the owner on Nov 3, 2021. It is now read-only.

Expected to be used for large sizes? #1

sunfishcode opened this issue Sep 8, 2017 · 14 comments
@sunfishcode (Member)

The tracking issue for this feature says

> We expect that WebAssembly producers will use these operations when the region
> size is known to be large, and will use loads/stores otherwise.

I don't see this mentioned in the Overview.md. Is this still an expectation?
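For concreteness, the quoted expectation distinguishes two lowerings a producer could emit, sketched below in WebAssembly text format using the `memory.copy` name the proposal later settled on (the locals and the 8-byte size are illustrative, not from the tracking issue):

```wat
;; Large or statically-unknown region: a single bulk instruction.
(memory.copy (local.get $dst) (local.get $src) (local.get $len))

;; Small region of known size (here 8 bytes): plain loads/stores instead.
(i64.store (local.get $dst) (i64.load (local.get $src)))
```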

@binji (Member)

binji commented Sep 8, 2017

My thought is that even if we tell folks to use it for large regions, they'll use it for small ones too, so we'll have to handle that anyway. I think @lukewagner originally suggested that the size have page units to prevent that. Is it worth it though? What's the cost to the VM to have to handle small regions?

@lukewagner (Member)

The benefit I see for clamping to page sizes is that we remove any expectation that the wasm engine might optimize move_memory/set_memory by doing either of:

- using constant-propagation to see if the size is constant and, if so, inlining something fast
- some sort of IC to make tiny cases super-fast (i.e., not calling out to libc)

which lets engines compile move_memory to a call to libc memmove and be done with it.

@jfbastien (Member)

Wouldn't that remove the binary size saving?

@lukewagner (Member)

That's an interesting point, but I wasn't aware that this feature was expected to reduce binary sizes by any significant amount in any case. It would certainly change the nature of the feature (and what engines needed to do) if move_memory was used aggressively for this purpose.

@titzer

titzer commented Sep 25, 2017

I think clamping to page sizes would cripple this feature and result in a proliferation of user code that tries to divide original requests into a page-multiple-sized chunk followed by cleanup code. That's a classic abstraction inversion.
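The chunk-plus-cleanup pattern being described might look like the following sketch, assuming a hypothetical page-quanta `move_memory` whose size operand counts 64 KiB pages (all names and the encoding are illustrative, not part of the proposal):

```wat
;; Bulk-copy the whole 64 KiB pages in one instruction...
(move_memory (local.get $dst) (local.get $src)
             (i32.shr_u (local.get $len) (i32.const 16)))
;; ...then the producer must emit its own cleanup loop for the remainder
;; (assume $dst/$src have already been advanced past the whole-page portion).
(local.set $tail (i32.and (local.get $len) (i32.const 0xFFFF)))
(block $done
  (loop $next
    (br_if $done (i32.eqz (local.get $tail)))
    (i32.store8 (local.get $dst) (i32.load8_u (local.get $src)))
    (local.set $dst (i32.add (local.get $dst) (i32.const 1)))
    (local.set $src (i32.add (local.get $src) (i32.const 1)))
    (local.set $tail (i32.sub (local.get $tail) (i32.const 1)))
    (br $next)))
```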

@lukewagner (Member)

Why would there not be a single implementation of memcpy in libc? In general, we haven't used "toolchains will have to implement" as an argument to include things in wasm (e.g., trig).

@binji (Member)

binji commented Oct 26, 2017

Just coming back to this...

It seems like the wasm page size is a bit too large a granularity -- the microbenchmark shows benefits for sizes < 64 KiB.

> In general, we haven't used "toolchains will have to implement" as an argument to include things in wasm (e.g., trig).

True, though we also seem to have assumed a mostly symbiotic relationship with producers, where they'll produce good code so the VM doesn't have to perform complex optimizations. I think it's reasonable to assume the same here -- if we give guidelines for the producer (TBD 😉) then can the VM assume that it isn't going to have to optimize a constant 4 byte memcpy that should have just been a load/store pair?
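The degenerate case in question, sketched in wat (illustrative names; `memory.copy` is the name the proposal later adopted):

```wat
;; A constant 4-byte copy expressed with the bulk operation...
(memory.copy (local.get $dst) (local.get $src) (i32.const 4))
;; ...versus the load/store pair the producer could have emitted directly:
(i32.store (local.get $dst) (i32.load (local.get $src)))
```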

@jfbastien (Member)

> if we give guidelines for the producer (TBD 😉) then can the VM assume that it isn't going to have to optimize a constant 4 byte memcpy that should have just been a load/store pair?

I'd hope so.

@julian-seward1

> I think clamping to page sizes would cripple this feature and result in a proliferation of user code that tries to divide original requests into a page-multiple-sized chunk followed by cleanup code.

I agree. I think it would be safer to leave it at the byte granularity, assume that these operations will get used at both big and small sizes, and leave it to implementations to decide how (if at all) they want to optimise the small-size cases.

@lukewagner (Member)

But in practice, if every wasm engine doesn't reliably optimize the small-constant-size case (which, from what I understand, is very commonly used) there will be a significant perf cliff which will require the toolchain (to provide reliable perf to its users) to do the lowering to loads anyway. With page-size-quanta, the responsibility for who does what is clear.

I don't see how this cripples the feature since this is an advanced optimization emitted only in special cases by compilers, not something anyone writes by hand in the source language.

@julian-seward1

Well, it will force producers to emit sequences that mix calls and inline code, which will be verbose and also inherently not optimised for more than one target processor (how do you make the unroll vs vectorise vs unroll-and-vectorise vs call-out tradeoffs if you don't know what you're running on?). I'm also not convinced that the small-constant-size case is uncommon: when profiling natively compiled Rust, I frequently see many small memcpy/memmove calls.

I do understand what you're getting at, though. Would it be feasible and/or helpful to add an advisory section to the spec that states a minimum set of copy/fill cases that an optimising Wasm implementation can reasonably be expected to do well, inline? That is to say, add some kind of quasi performance-guarantee to the contract?

@lukewagner (Member)

Yeah, I suppose a non-normative note that states the contract, even if informally, could effectively make it the browser's "fault" if they didn't optimize appropriately, so producers could feel confident in always emitting mem.copy/mem.set.

Also, thinking more about what a producer would need to do to optimally use a page-quanta mem.copy/mem.set, it does seem suboptimal. In particular, if we use the existing wasm 64 KiB page size, that means up to (128 KiB - 2) bytes of suboptimal copying per call, possibly significantly suboptimal if the producer doesn't do the extra work to use 64-bit copies (and, later, 128-bit). If we use a non-wasm-page-size (< 64 KiB) quantum, it'll feel rather arbitrary and probably look increasingly silly as CPUs evolve. Also, a fully-optimized memcpy wasm impl might cost a few hundred bytes, which adds to the fixed runtime overhead we'd generally like to avoid for webby use cases.

I'm fine with byte-granularity, then.

@jfbastien (Member)

@lukewagner would you rather also have an alignment hint, so you can do fancy stuff on top?

@lukewagner (Member)

If you're talking about "page-aligned" hint, I don't think it would help (the case browsers would have to specially-optimize is when the size was small and constant; for all others we'd just call out to the libc memmove).

Or perhaps you mean 1/2/4/8/16-byte alignment? When coupled with a constant size, such that the engine is inlining a straight sequence of load/stores, I guess I could see this being useful for the same reason that the alignment hint is present on scalar loads/stores, but that is a separate point from the one I made/rescinded above.
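For example, a constant-size copy carrying a hypothetical 8-byte alignment hint could let the engine inline aligned wide accesses, analogous to the `align` immediate on scalar loads/stores (sketch only; such a hint is not part of the proposal):

```wat
;; Inlined 16-byte copy, assuming both pointers are 8-byte aligned:
(i64.store align=8 (local.get $dst)
           (i64.load align=8 (local.get $src)))
(i64.store offset=8 align=8 (local.get $dst)
           (i64.load offset=8 align=8 (local.get $src)))
```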
