
memory.copy|fill semantics limit optimizations for short constant lengths #111

Closed
eqrion opened this issue Sep 12, 2019 · 53 comments

@eqrion
Contributor

eqrion commented Sep 12, 2019

As discussed in #1, the expectation is that producers will use memory.copy|fill for short lengths in addition to long lengths. We’ve already seen this to be the case, and have been investigating a performance regression resulting from LLVM 9 using memory.copy for short constant lengths [1].
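For concreteness, here is a minimal hypothetical sketch of the kind of source pattern at play (the struct and function names are invented): a short, fixed-size aggregate copy that LLVM models as a memcpy intrinsic and that LLVM 9, with bulk memory enabled, can lower to a memory.copy instruction rather than a few loads and stores.

```cpp
// Hypothetical example of a short constant-length copy. With bulk-memory
// enabled, LLVM 9 can lower the implied 12-byte memcpy intrinsic to a
// memory.copy instruction instead of three i32 load/store pairs.
struct Vec3 {
  float x, y, z;
};

void set_position(Vec3* dst, const Vec3* src) {
  *dst = *src;  // a 12-byte fixed-size copy
}
```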

Part of that regression comes from a sub-optimal out-of-line (OOL) call to the system memmove, but to really bring performance up to par, we'd like to inline these short memory.copy operations as loads and stores.

This has turned out to be challenging to implement in a way that performs as well as or better than the Wasm loads and stores that were emitted previously.

There are several problems resulting from the following:

  1. We must assume no alignment for src and dest
  2. When a range is partially OOB, we must write all bytes up to the point where the range goes OOB
  3. We must copy correctly when src and dest overlap

(1) and (2) are related. Because we don't know the alignment of src or dest, we cannot use transfers wider than a single byte at a time (e.g., 32-bit, 64-bit, or 128-bit), or else we'd risk the final store being partially OOB and, due to misalignment of src or dest, failing to write all bytes up to the boundary.
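To make the constraint concrete, here is a minimal byte-wise reference sketch of the current partial-write semantics, with a hypothetical `mem`/`memSize` pair standing in for linear memory. Any inlined lowering must match this byte-level result at a partially-OOB boundary, which unaligned wide stores cannot guarantee:

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Byte-at-a-time memory.fill under the current (partial-write) semantics:
// every in-bounds byte below the boundary must be written before the trap.
void fill_bytewise(uint8_t* mem, size_t memSize,
                   size_t dest, uint8_t value, size_t len) {
  for (size_t i = 0; i < len; i++) {
    if (dest + i >= memSize)
      throw std::runtime_error("trap: OOB");  // earlier bytes already written
    mem[dest + i] = value;
  }
}
```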

The system memmove can work around this by aligning the src/dest pointers, using wide transfer widths, and fixing up slop afterwards. But this isn’t feasible for inlined code.

The problem with (3) is that we need to generate two sequences of loads and stores: one for when src < dest, where we must copy from high to low, and another for copying from low to high. This adds code size and a comparison that we didn't need to do before. It could potentially be solved in a branchless way by using vector registers as a temporary buffer, but that approach still has difficulty with (1) and (2).
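A minimal sketch of the two-sequence problem (bounds checks omitted; names hypothetical): the src/dest comparison and the duplicated loop are exactly the extra code being described.

```cpp
#include <cstddef>
#include <cstdint>

// Overlap-safe copy: forward when dest is at or below src, backward
// otherwise. An inlined memory.copy needs both sequences plus the branch.
void copy_overlap_safe(uint8_t* mem, size_t dest, size_t src, size_t len) {
  if (dest <= src) {
    for (size_t i = 0; i < len; i++)   // low -> high
      mem[dest + i] = mem[src + i];
  } else {
    for (size_t i = len; i-- > 0;)     // high -> low
      mem[dest + i] = mem[src + i];
  }
}
```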

There seem to be several options that could improve this situation.

  1. We could ask LLVM not to emit memory.copy in these situations. memory.copy is not equivalent to memcpy; it's memmove, and it has stricter semantics than LLVM requires. For example, with struct copies LLVM should know the alignment and that there is no overlap; recovering this information at runtime is unfortunate. The downsides are a potential binary size increase and being limited to the load and store widths defined in Wasm (e.g., no SIMD yet).

  2. We could modify the behavior for partially OOB ranges to not write any bytes at all. This would allow us to load all src bytes into vector registers, then store them to dest from high to low. If there is a trap, it will happen immediately and nothing will be written. This fixes (1) and (3) by changing (2). The uncertainty here is whether this remains possible with a future ‘memory.protect’ instruction.

  3. We could add an alignment hint similar to the one used for plain loads and stores. We could then emit loads and stores at the width of the alignment, along with a guard checking the alignment. If the guard fails, we'd fall back to a slow path; if it succeeds, we'd have a guarantee for (2). This approach still has the problem of (3), and adding an overlap hint doesn't seem feasible, due to the complexity of the guard required.

cc @lars-t-hansen @julian-seward1 @lukewagner

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1570112

@alexcrichton
Contributor

@tlively do you perhaps have thoughts on this with respect to LLVM's assumption that memory.copy is always better than inline loads/stores? (Looks related to this code.)

@tlively
Member

tlively commented Sep 12, 2019

Hi, I implemented the bulk memory operations in LLVM. The way LLVM is set up now, I have two knobs to turn: the maximum number of stores to emit before switching to memory.copy, with and without optimizing for size. Right now they are both set to 1, so any copy that would take at least two stores is emitted as a memory.copy instead, no matter the optimization options. This is optimal for code size, but I would be happy to change it for better performance. Do you have data about the size at which memory.copy starts being more efficient than loads?

If we don't have reliable data to inform this tuning, I could also just restore the LLVM default of maxing out at 8 loads or 4 loads when optimizing for size. Is it the case that executing memory.copy is at least as efficient as calling an implementation of memcpy compiled to wasm?
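For readers unfamiliar with the knobs being described, here is an illustrative sketch of the decision, not the actual LLVM backend code; the constant names mirror LLVM's MaxStoresPerMemcpy/MaxStoresPerMemcpyOptSize target parameters, with the values mentioned above:

```cpp
#include <cstddef>

// Illustrative only: emit inline load/store pairs for a constant-length
// memcpy when it fits under the per-target store budget, else emit
// memory.copy. With both knobs set to 1, anything needing two or more
// stores becomes memory.copy regardless of optimization level.
constexpr unsigned kMaxStoresPerMemcpy = 1;         // current wasm backend value
constexpr unsigned kMaxStoresPerMemcpyOptSize = 1;  // when optimizing for size

bool shouldInlineAsLoadsStores(size_t copyBytes, size_t storeWidth,
                               bool optForSize) {
  size_t storesNeeded = (copyBytes + storeWidth - 1) / storeWidth;
  unsigned budget = optForSize ? kMaxStoresPerMemcpyOptSize
                               : kMaxStoresPerMemcpy;
  return storesNeeded <= budget;
}
```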

@alexcrichton
Contributor

The data I have personal experience with comes from Firefox, and AFAIK Firefox had not, before now, attempted to optimize memory.copy (and the efforts to optimize it led to this issue). From that experience, though, Firefox's current out-of-line memcpy can be up to 5x slower than a wasm-written memcpy.

In some sense, though, I don't think there's great data one way or the other on how to optimize codegen of memory.copy, since engines are still in flux and in the process of fully optimizing it. Data we get today (for example through Firefox) needs a grain of salt when interpreting it because the engine may not be fully optimized yet. As currently specified, though, I think @eqrion has a good point that memory.copy may not be a great fit for inline loads/stores because it has stricter semantics (memory.copy doesn't know whether src/dst overlap and has to partially write bytes in the OOB case, whereas inline loads/stores have to deal with neither). In that sense, even with a possible change to memory.copy's specification, part of this issue is that it would probably still be beneficial for LLVM to emit inline loads/stores for some memcpy operations instead of memory.copy for all of them.

@conrad-watt
Contributor

One thing that I bounced off @rossberg at the last in-person meeting was a possible alternative semantics to copy/fill which makes it entirely non-deterministic which parts of the copy/fill complete in the case of an OOB trap.

The memory model already makes it non-deterministic which parts of the copy/fill are visible to racing concurrent threads, so this would only affect the observable state of memory after a trap, and would make the possible observations in this case mirror the possible observations in the racing case.

This has the advantage that it allows any copy/reverse-copy/ahead-of-time-check implementation to be conformant, and it always allows copy/fill to be expressed in terms of Wasm loads/stores, which would help the issue here. It's also still deterministic in the non-OOB case. The disadvantages are that it is uglier to spec, and that it might introduce more non-determinism than is desirable if traps become easy to recover from in the future (e.g. if they become catchable).

I'd definitely prefer not re-introducing guaranteed ahead-of-time OOB traps. If we end up observably exposing the copying order later on anyway (e.g. through memory.protect or memory.shrink), then as @eqrion says, this problem would appear again.

@lukewagner
Member

lukewagner commented Sep 12, 2019

Eagerly adding another source of non-determinism to wasm's very short list of non-determinism seems unfortunate. Are there any other issues with ahead-of-time OOB traps other than racy-memory.protect/memory.shrink?

For these two racy cases, first, I'm a bit doubtful that they will in fact be added:

  1. memory.protect seems hard to implement efficiently without hardware signal handling, and I'm not sure if that's a dependency we should add to wasm across all devices/systems
  2. memory.shrink could be implemented without signal handling, but it is hazardous to any host embedding that wants to pass out views to memory (typed arrays in JS, or just raw pointers in C) [edit: and, also, the intended resource-freeing benefits can be mostly realized with a safer/simpler madvise(DONTNEED)-like operation.]

But if they were added, then perhaps that is the only case where we would allow non-deterministic writes to have occurred, keeping all other cases fully deterministic?

@conrad-watt
Contributor

conrad-watt commented Sep 12, 2019

Eagerly adding another source of non-determinism to wasm's very short list of non-determinism seems unfortunate. Are there any other issues with ahead-of-time OOB traps other than racy-memory.protect/memory.shrink?

The only other thing I could think of off-the-cuff would be a "thread kill" operation which promises to obey Wasm's formal semantics (i.e. interrupting in-between small-step reductions). But I'm not sure whether this could be implemented in the presence of reasonable compiler optimisations.

Possibly some memory mapping shenanigans could be another way the order becomes observable? I don't have any concrete ideas on that front.

For these two racy cases, first, I'm a bit doubtful that they will in fact be added:
...
But if they were added, then perhaps that is the only case where we would allow non-deterministic writes to have occurred, keeping all other cases fully deterministic?

Way back during one of the CG discussions on this, I floated something similar as the "irregular solution", and it seemed to be instinctively disliked because you get all-or-nothing "transactionality" in most cases, except where you have a racing protect; the preference being to have the same guarantees in all cases. It's possible that I was nudging too hard against it.

I've been passionate about this in the past because I believed that memory.protect would definitely become real eventually. This might not have been a well-informed position, and I wouldn't be as strident again if the issue were revisited.

@lars-t-hansen
Contributor

Are there any other issues with ahead-of-time OOB traps other than racy-memory.protect/memory.shrink?

Might an ahead-of-time trap sometimes cause the copy to be slower? Suppose you have a fast block-copy primitive a la REP MOVS and that we know definitively whether src < dest or dest < src, so that the copy direction is known (could be determined at runtime); and say the copy length is known or known to be small (ditto). Then on a system using VM tricks for bounds checking, I expect that we could probably just emit the REP MOVS inline and let the memory protection take care of bounds checks. Requiring explicit checks up-front would be slower.

Even if REP MOVS is not available, some unrolled and suitably scheduled load/store pairs would do the same trick (certainly for known length).
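For reference, a minimal sketch of the block-copy primitive under discussion, written with GCC/Clang extended inline asm for x86-64 (illustrative only; an engine would emit this from its JIT, and bounds checking here is assumed to be handled by the VM-trick guard pages described above):

```cpp
#include <cstddef>

// Forward byte copy via REP MOVSB on x86-64. Bounds checking is left to
// guard pages in the scheme described above; a hardware fault would land
// mid-copy, which is why required up-front trap checks would add overhead
// before this instruction.
void rep_movsb_copy(void* dst, const void* src, size_t len) {
  asm volatile("rep movsb"
               : "+D"(dst), "+S"(src), "+c"(len)  // RDI, RSI, RCX
               :
               : "memory");
}
```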

I don't mean these considerations to stand in the way of ahead-of-time OOB traps, since I think that for small known-length copies the compiler-to-wasm should just emit loads and stores anyway; for other cases, a little ahead-of-time overhead for memory.copy should be fine.

(I also think memory.copy should be renamed memory.move, given its semantics.)

@lars-t-hansen
Contributor

What LLVM wants to express is "I know the size, I know the alignment, I know they don't overlap, and I just want good performance and small code size" and in reality the size is probably fairly small and divisible by the word size. We are tying ourselves in knots over shoehorning this into memory.copy, which is too general.

Thus another couple of stray thoughts:

One thought is that we keep saying that we'll eventually provide higher-level compression for bytecode precisely to capture (say) a sequence of loads and stores, so frankly bytecode size should not actually be much of an issue. (Though (a) that work seems to have stalled and (b) it doesn't quite address the performance argument since in principle some sequence of loads and stores can't always be compiled as cleverly as memcpy can be.)

Another thought is that for the desired semantics, we could provide load-multiple and store-multiple instructions (a la 68K and ARM). Load-multiple would load k units onto the eval stack; store-multiple would store k units from it; k is constant. If we provide ascending and descending indexing the compiler can opt for the bounds checking guarantees it needs. Pairs of these ops would be easy to recognize in any JIT and good code could be generated both scheduling-wise and bounds-checking-wise on platforms with decent bounds checking. Bytecode size should be fine. If we want to be conservative we provide only 32-bit variants initially, limit k to something smallish, and require word alignment of at least the store-multiple pointer (typically requires a run-time check, but a cheap one).
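To make the shape of this proposal concrete, here is a hedged C++ model of what a load-multiple/store-multiple pair could mean; everything here (names, 32-bit units, ascending order) is hypothetical and would be up to the actual proposal:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical model: load.multiple pushes k 32-bit units onto the eval
// stack (modeled as a returned array); store.multiple pops k units. A
// constant k plus an alignment guard on the store pointer would give a JIT
// enough to emit a straight-line, well-scheduled sequence.
template <size_t K>
std::array<uint32_t, K> load_multiple(const uint8_t* mem, size_t addr) {
  std::array<uint32_t, K> units;
  for (size_t i = 0; i < K; i++)
    std::memcpy(&units[i], mem + addr + 4 * i, 4);  // ascending index
  return units;
}

template <size_t K>
void store_multiple(uint8_t* mem, size_t addr,
                    const std::array<uint32_t, K>& units) {
  for (size_t i = 0; i < K; i++)
    std::memcpy(mem + addr + 4 * i, &units[i], 4);
}
```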

@eqrion
Contributor Author

eqrion commented Sep 13, 2019

Hi, I implemented the bulk memory operations in LLVM. The way LLVM is set up now, I have two knobs to turn: the maximum number of stores to emit before switching to memory.copy, with and without optimizing for size. Right now they are both set to 1, so any copy that would take at least two stores is emitted as a memory.copy instead, no matter the optimization options. This is optimal for code size, but I would be happy to change it for better performance. Do you have data about the size at which memory.copy starts being more efficient than loads?

If we don't have reliable data to inform this tuning, I could also just restore the LLVM default of maxing out at 8 loads or 4 loads when optimizing for size. Is it the case that executing memory.copy is at least as efficient as calling an implementation of memcpy compiled to wasm?

I don't think we have this data. The hard part of gathering it is that our OOL 'memory.copy' implementation is relatively slow and has some major room for improvement. Comparing unrolled wasm loads/stores against it at this point is likely to show them outperforming for longer than is reasonable.

I'm also not sure at this point whether our 'memory.copy' is faster than 'memcpy' compiled to Wasm. It certainly is for large sizes, but there's some incidental overhead we'd like to remove that could make Wasm memcpy faster in some cases right now. This should be fixed in the medium term, of course.

@eqrion
Contributor Author

eqrion commented Sep 13, 2019

I should try to clarify one point I didn't really explain well.

Tweaking LLVM's heuristics sounds like it could definitely help this situation, but I generally don't see the 'memory.copy' instruction (as it is specified today) as a good choice for struct copies and other things with 'short' constant lengths. The reason is that we lose alignment and overlap information that we then have to recover at runtime, whereas LLVM could just emit the loads and stores using that information. 'memory.copy' is still very useful, but it should focus on larger copies, where it's likely to outperform Wasm loads and stores.

There are a number of tweaks that could help here; I think @lukewagner's suggestion makes the most sense to me. If we could go with that, then we should be able to continue allowing LLVM to optimize for size by using memory.copy aggressively.

@lars-t-hansen also has a good point that it seems like we may be trying to shoehorn this use-case into 'memory.copy' when it doesn't make sense. I'm open to alternatives.

@lukewagner
Member

@conrad-watt Ah, thanks for reminding me of the CG slides; I think jetlag interacts poorly with long-term memory creation :-P I think the new information here, not present in that discussion, but which is what we'd expect to hear during the Implementation phase, is that there are non-trivial performance/complexity implications of the different trapping semantics choices.

Also, in the (imho, unlikely) case we add memory.protect/memory.shrink in the future, saying that memory.copy is transactional in a single-threaded and also non-racy context, but that it produces safe-but-non-deterministic results in the threaded+racy case, seems roughly in line with what we say for races in general, and so not terribly irregular.

@tlively
Member

tlively commented Sep 13, 2019

Ok, it sounds like making memory.copy at least as efficient as a small sequence of loads would be a lot of spec and implementation work, if it's possible at all, and is probably not worth the effort. I will restore LLVM's default tuning parameters for optimizing memcpy intrinsics to load-store sequences.

It sounds like it would be a reasonable expectation in the medium-term that memory.copy will perform about as well as memcpy compiled to wasm, so I will continue lowering memcpy intrinsics that aren't lowered to load-store sequences to memory.copy instructions rather than calls to memcpy.

@conrad-watt
Contributor

Also, in the (imho, unlikely) case we add memory.protect/memory.shrink in the future, saying that memory.copy is transactional in a single-threaded and also non-racy context, but that it produces safe-but-non-deterministic results in the threaded+racy case, seems roughly in line with what we say for races in general, and so not terribly irregular.

I would be fine with this. It has the advantage of being a reasonable balance between a neater/potentially less implementation-onerous semantics now, and a still-acceptable semantics in the (unlikely?) case where protect becomes real.

If we ran with this idea, we'd be effectively reverting the "up to the last byte" OOB behaviour changes for instantiation and bulk memory.

@eqrion
Contributor Author

eqrion commented Sep 16, 2019

Okay, I’d like to formally propose changing the trapping behavior here so that we do not partially write in the case of a range that is partially OOB. This would effectively revert the trapping behavior back to what it originally was.

To summarize this thread, this change is motivated by the discovered implementation complexity of trying to generate an inlined version of memory.fill/copy for short constant lengths. By reverting this behavior we believe we can generate more efficient code by sidestepping problems with partial writes when source or dest is unaligned.

The major concern that motivated the change to allow partial writes was the interaction with a future memory.map instruction. There’s some uncertainty about whether memory.map will happen due to reliance on hardware signal handling or shadow memory. If it does occur, I believe we can work around this with Luke’s suggestion by saying that memory.copy is transactional in a single-threaded and also non-racy context, but that it produces safe-but-non-deterministic results in a threaded+racy context.

The semantics of memory.init/fill/copy and table.init/fill/copy would need to be changed. The table instructions would be modified to keep them symmetrical with the memory instructions. Additionally, active segment initialization for data and element segments would be affected.

This would effectively be partial reverts of #50 (for bulk-mem instructions) and #57 (for active segment initialization).

If there are no further arguments against this change here, I would like to add this to the next CG meeting agenda.

@tlively
Member

tlively commented Sep 16, 2019

I'm not sure there's any reason to change the spec proposal for this issue. The issue is that code becomes slower when using bulk memory operations for small fixed-size copies. We can fix this the hard way, by changing the spec and doing a lot of optimization work in all the engines, or the easy way, by simply not using bulk memory operations for small fixed-size copies.

@eqrion
Contributor Author

eqrion commented Sep 16, 2019

Yes, it's true that we can resolve the immediate issue by changing LLVM to not use the current bulk memory ops for small fixed-size copies. However, it seems like we would be missing a chance to improve the specification before it ships so that it is useful for this use case.

I guess it depends on the potential code size and quality improvements we can achieve, but it would be nice to have an instruction that is performant for this use case.

@tlively
Member

tlively commented Sep 16, 2019

IIUC, memory.copy with a small constant size could not be inlined efficiently without having alignment hints like a normal load or store, and also without being able to indicate the proper copy direction. All the concrete spec changes proposed in this thread have had to do with observable behaviors in the presence of memory protection changes and would not completely solve the problem of efficient inlining. If you want an instruction meant for these small, easily-inlinable copies, you would need something else that looks like the multi-load and multi-store instructions that @lars-t-hansen discussed in #111 (comment).

@eqrion
Contributor Author

eqrion commented Sep 16, 2019

It should be possible to implement an efficient inline copy with just the partial-write behavior reverted. Apologies, I made a veiled reference to this in the OP but should have made it more explicit.

Essentially, you load all of the src bytes into registers (using SSE/vector registers, there should be enough storage for most inline sizes we care about), then store the src bytes from high to low. If any src byte is OOB, you'll trap before writing. If any dest byte is OOB, you'll trap on the first write. This was taken from [1], minus the complicated signal-handling logic for handling partial writes.
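A minimal sketch of that scheme under the proposed no-partial-write semantics, with per-access checks standing in for trapping loads/stores; a real implementation would keep `tmp` in SSE/vector registers rather than memory:

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Inline copy for a short constant length: all loads happen before any
// store, so a src OOB trap writes nothing; stores go high -> low, so a
// dest OOB trap fires on the very first (highest) store, again before any
// byte lands. Overlap is handled for free by the buffering.
template <size_t Len>
void inline_copy(uint8_t* mem, size_t memSize, size_t dest, size_t src) {
  uint8_t tmp[Len];  // stands in for SSE/vector registers
  for (size_t i = 0; i < Len; i++) {
    if (src + i >= memSize) throw std::runtime_error("trap");  // no writes yet
    tmp[i] = mem[src + i];
  }
  for (size_t i = Len; i-- > 0;) {
    if (dest + i >= memSize) throw std::runtime_error("trap");  // first store is highest
    mem[dest + i] = tmp[i];
  }
}
```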

Additionally, adding alignment information could help as well, but that's a different spec change that would need discussion.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1570112#c23

@eqrion
Contributor Author

eqrion commented Oct 1, 2019

@conrad-watt
Contributor

The main thing that stuck in my mind after the last CG was the following.

Even if we commit to the byte-wise semantics now, I wouldn't be willing to bet that all of today's implementations would still conform to those semantics if memory.protect was implemented. Even with the naive implementation just calling memmove, the C/C++ level semantics obviously doesn't guarantee anything like this - you'd have to tunnel through some layers of abstraction to ensure conformance. For example, an implementation of memmove using a real intermediate buffer could not be used to faithfully implement memory.copy in the presence of memory.protect. This situation would get even less clear if the "race to the bottom" on performance suggested by @lars-t-hansen were to happen, with hyper-optimised implementations doing weird things based on assumptions about bounds.

To me, this suggests that there has to be a point at which we're ok with non-determinism in interrupted copy/fill (e.g. due to protect). I think accepting it in the presence of a racy protect is ok - you already have non-determinism from the thread-local interleaving at that point - the only "determinism" you can preserve is something about the fiction of the abstract thread-local semantics for copy/fill.

All this to say, I'm starting to feel that I made a mistake by advocating for the byte-wise semantics, and I'd support going back to the bounds checked semantics, so long as people could be on board with the underlying copy/write order for bulk instructions being specified as non-deterministic, if it ever became observable (for now, it would be sufficient to specify it in the natural byte-by-byte way, since it wouldn't be observable).

@rossberg
Member

rossberg commented Oct 2, 2019

One (perhaps minor) disadvantage is that the proposed change destroys one of the properties we tried to establish with the simplifications discussed at the last F2F meeting, namely that every bulk operation is equivalent to a sequence or loop of individual loads/stores. In particular, a memory.copy where src > dst with potential overlap, such that it must copy low to high, can no longer be transformed that way, but requires at least an extra read. This also means that there will probably be a special case in the semantics.

@penzn
Contributor

penzn commented Oct 3, 2019

Sorry, I was not aware of the prior discussions. I feel that the byte-wise semantics would be tricky to implement even if alignment is known. @eqrion I took a quick look at the rep movs code in the bug; it seems reasonable (as far as I can understand the implementation). We may need to build it to understand why it was not performing well (cc @rrwinterton).

@rossberg
Member

rossberg commented Oct 4, 2019

I've come around to being in favour of the proposed change.

@RossTate

RossTate commented Oct 8, 2019

In particular, a memory.copy where src > dst with potential overlap, such that it must copy low to high, can no longer be transformed that way, but requires at least an extra read.

There is a middle ground that addresses @rossberg's concern while also enabling the proposed optimization. The semantics can be that a trap can occur only if at any point during the time of the copy there is an address in the specified range that is not allowed to be written to (by the copying thread). In such a situation, there is no guarantee as to which writes were completed. (And the instruction can only complete successfully if all writes were completed successfully.)

The current proposal (i.e. the proposal before the CG meeting) and the proposed fix (during the CG meeting) are both special cases of this semantics, and they are both efforts to determinize a corner-case of the semantics. But if we take @conrad-watt's observation that non-determinism is inevitable in this situation, we see that the middle ground here is only non-deterministic (with respect to trapping) in the presence of racy changes to memory validity, and only non-deterministic (with respect to partial writes) in the presence of racy reads or if someone catches the trap. So the middle ground has some non-determinism, but it's essentially only visible to an outside observer.

I'm not particularly advocating for any of the three choices - I just wanted to provide a third option in case it was helpful.

@rossberg
Member

rossberg commented Oct 8, 2019

@RossTate, AFAICS, it wouldn't be possible to unroll an instruction with such a semantics into an equivalent sequence of individual accesses (because that fully determines which writes occur before a trap). So I am not sure how this semantics would address the concern I brought up. I am also not sure how it could, technically, be defined in such a way that it actually is observably different from the completely non-deterministic semantics that @conrad-watt mentioned.

@conrad-watt
Contributor

conrad-watt commented Oct 8, 2019

If there's no initial bounds check, I agree that @RossTate's suggestion sounds like (#111 (comment)). This would allow unrolling, because it would just be a narrowing of the non-deterministic possibilities (i.e., instead of being allowed to trap anywhere having written "anything", the program now follows the semantics of the unrolled writes).

Parroting @lukewagner (#111 (comment)), having an initial bounds check allows the semantics to be deterministic in the vast majority of cases. We don't know what form shrink and protect will take, if they're even implemented at all, and I agree that it would be nice to protect WebAssembly's determinism to the best of our ability. Even if they are spec'd, we might decide in future that shrink and protect should only be used with a new kind of "mappable" memory, and in this case "non-mappable" memory could stay fully deterministic.

@penzn
Contributor

penzn commented Oct 8, 2019

Aside from describing writes with overlapping source and destination, what benefits do the "sequential" semantics have? In the case of an out-of-bounds access there would be some data that did not get written, and recovering from that would not be fundamentally different under either sequential or bulk semantics. Non-determinism would not really make it easier, as it would require re-attempting the whole operation, just like the "bulk" semantics. All three possible semantics would require some clarification for the shared-memory case anyway, with shared-memory semantics being at least somewhat non-deterministic.

IMO, having the operation complete or fail as a whole makes the implementation much more transparent.

@rossberg
Member

rossberg commented Oct 8, 2019

@penzn, one benefit is that a compiler can safely transform one form into the other and vice versa, e.g., for optimisation purposes, without having to prove the absence of (or account for) tricky corner cases. Being able to reduce a complex operation to a simple one also tends to simplify the language semantics and thus reasoning about programs.

moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Nov 5, 2019
… r=lth

This commit changes all bulk-memory instructions to perform up-front bounds
checks and trap if any access would be out-of-bounds before writing.

This affects:
 * memory.init,copy,fill
 * table.init,copy,fill
 * data segment instantiation (reduces to memory.init)
 * elem segment instantiation (reduces to table.init)

Spec issue: WebAssembly/bulk-memory-operations#111

Differential Revision: https://phabricator.services.mozilla.com/D51755

eqrion added a commit to eqrion/wasm-bulk-memory-operations that referenced this issue Nov 7, 2019
Spec issue: WebAssembly#111

This commit changes the semantics of bulk-memory instructions to perform
an upfront bounds check and trap if any access would be out-of-bounds without
writing.

This affects the following:
 * memory.init/copy/fill
 * table.init/copy (fill requires reftypes)
 * data segment init (lowers to memory.init)
 * elem segment init (lowers to table.init)
rossberg pushed a commit that referenced this issue Nov 14, 2019
Spec issue: #111

This commit changes the semantics of bulk-memory instructions to perform
an upfront bounds check and trap if any access would be out-of-bounds without
writing.

This affects the following:
 * memory.init/copy/fill
 * table.init/copy (fill requires reftypes)
 * data segment init (lowers to memory.init)
 * elem segment init (lowers to table.init)
@rossberg
Member

@eqrion, I think this can be closed.

@eqrion eqrion closed this as completed Nov 21, 2019
sbc100 added a commit to WebAssembly/wabt that referenced this issue Dec 26, 2019
The two primary changes involved are:

1. Removal of `assert_return_canonical_nan`/`arithmetic nan` in favor of
   special `nan:canonical`/`nan:arithmetic` constants that can only be
   used in test expectations.
   See: WebAssembly/spec#1104

2. New trapping behaviour for bulk memory operations.  Range checks are
   now performed up front for operations such as memory.fill and
   memory.copy.
   See: WebAssembly/bulk-memory-operations#111
   And: WebAssembly/bulk-memory-operations#123
   The old behaviour is still kept around to support table.fill which
   is defined in reference-types proposal and has yet to be updated.

3. nullref is now permitted in the text and binary format.