-
Notifications
You must be signed in to change notification settings - Fork 747
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Wasm GC] wasm-ctor-eval: Handle cycles of data #5685
Conversation
Co-authored-by: Thomas Lively <tlively@google.com>
Sorry for the review delay here. This is still on my list of things to get to. The code LGTM fwiw, and I just have to look over the large test file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your patience with the back-and-forth here. TODO comment looks good, and going with the simplest possible algorithm seems reasonable for now.
Another thing we might want to consider at some point is that all these globals will root objects that might otherwise be garbage collected. It would be possible in principle to avoid creating new globals at all, but it would be a code size and performance trade off because the start function would have to traverse the original globals to find the patch points.
(nop) | ||
(loop | ||
(global.set $global | ||
(i32.const 999) ;; this must not be applied to the global! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like it would be valid to apply 999 to the global, wouldn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I guess in this simple module it isn't observable. But in general that would be dangerous. I'm adding this comment now:
;; (It is true that in this simple module it would be ok to set 999 to the
;; global, but if the global were exported for example then that would not
;; be the case, nor would it be the case if the code did $global = $global + 1
;; or such. That is, since the global.set is not evalled away, its effects
;; must not be applied; we do both atomically or neither, so that the
;; global.set's execution only happens once.)
test/lit/ctor-eval/gc-cycle.wast
Outdated
;; CHECK-NEXT: ) | ||
;; CHECK-NEXT: ) | ||
|
||
;; CHECK: (func $0_3 (type $none_=>_none) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe you wrote before that naming the function $test
won't make wasm-ctor-eval preserve its name. Why does function test
get renamed to 0_3
? Shouldn't wasm-ctor-eval be able to preserve the function names?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can at least keep the base name, I guess, even if it adds a prefix... I added base names now.
It adds a prefix because in general an export might have another use. So we copy the function, modify that, and make the export use that one (so that other uses still use the original, unmodified function). As a result the export points to a function with a different name than before.
Hmm, we can add globals for objects as we go, that are later GC'd in later execution. But when we get to that later execution we'd be ok, I think. It's true the globals would remain, but globaldce will remove them, as they have no uses. The one thing I was worried about as I read your comment was the start function that we add to, but we clear it each time, so I think that works out too? |
I don't understand how all references to the new globals will be optimized out. Surely we can't optimize out the uses we add to the start function? |
We clear those out each time we apply the state. Imagine we have this: function foo() {
var x = { y: 10, z: null };
x.z = x;
// time A
log(x.y);
// time B
return 10;
} After we eval up to "time A" the program will look like this: var global$x = { y: 10, z: null };
function start() {
global$x.z = global$x;
}
function foo() {
var x = global$x;
log(x.y);
// time B
return 10;
} When we get to "time B" we have this: var global$x = { y: 10, z: null };
function start() {
}
function foo() {
return 10;
} Note that we cleared out the start function. We clear it out in each step and then re-fill it based on the state currently alive, but at this time A subtle point is that the global is not enough to keep things alive, since ctor-eval evals globals and then evals functions (the same as wasm modules do when they are initialized), so when it works on |
I still must be misunderstanding fundamental. Where did the call to |
Imagine it's evalled out, that is, it's a |
But it's the global state that gets serialized that I'm worried about forming new roots. Are you saying that when wasm-ctor-eval finishes, all the global state that has been serialized has been removed? In that case, why do we serialize any of the global state at all? |
We serialize the state that is still needed. The program needs to execute as if we didn't run Imagine we have this: function foo() {
var x = { a: 10, b: null };
var y = { a: 20, b: null };
x.b = x;
y.b = y;
// .. use x and y in some computations, but do not capture refs to them ..
return y;
} When we eval up to the comment (including), we'd end up with this: var global$y = { a: 10, b: null };
start {
global$y.b = global$y;
}
function foo() {
return global$y;
} We ended up serializing one object, At least, that's how it should work. I'm not 100% sure current limitations allow it to succeed perfectly, there are some hacks there. But if not it should be fixable. |
Aha, so in that case |
Oh right, that's true. If it turns out to be a serious issue in VMs then we could work on an optimization pass to "localize" globals... |
Ok yes, this was the problem I was trying to describe. I think it would be very difficult to try to localize globals, but it would be possible to change the back-patching logic for cycles to at least avoid creating additional globals used just for back-patching. |
We don't create any new globals just for back-patching, though - we define all GC allocations in globals anyhow. That's the simple thing to do so that they can be used anywhere they are needed. Figuring out which GC allocations don't need to be in globals seems as hard as localizing globals, at first thought? |
No, it should be easier because wasm-ctor-eval knows what the original globals are, and it can construct their values as trees of allocation operations in constant expressions combined with tree-traversing back patching logic in the start function for cycles. That doesn't require any kind of global dataflow analysis that a general localization optimization would require. |
Ah, now I see what you mean @tlively . Yes, it might make sense to try to avoid adding globals here because of that. |
A cycle of data is something we can't just naively emit as wasm globals. If at runtime we end up, for example, with an object A that refers to itself, then we can't just emit (global $A (struct.new $A (global.get $A))) The struct.get is of this very global, and such a self-reference is invalid. So we need to break such cycles as we emit them. The simple idea used here is to find paths in the cycle that are nullable and mutable, and replace the initial value with a null that is fixed up later in the start function: (global $A (struct.new $A (ref.null $A))) (func $start (struct.set (global.get $A) (global.get $A))) ) This is not optimal in terms of breaking cycles, but it is fast (linear time) and simple, and does well in practice on j2wasm (where cycles in fact occur).
A cycle of data is something we can't just naively emit as wasm globals. If
at runtime we end up, for example, with an object A that refers to itself,
then we can't just emit
The struct.get is of this very global, and such a self-reference is invalid. So
we need to break such cycles as we emit them. The simple idea used here
is to find paths in the cycle that are nullable and mutable, and replace the
initial value with a null that is fixed up later in the start function:
This was left as a TODO, but it turns out it is common in e.g. j2wasm output.