-
Notifications
You must be signed in to change notification settings - Fork 74
Polymorphic devirtualization and funcref not being a subtype of eqref? #239
Comments
I think we need to have a meeting where we develop a more concerted plan to address performance issues. For example, we have preliminary data that suggests implementing This is not to say we should not explore the sort of speculative specialization you're investigating; in fact, speculative specialization based on the class (i.e. the whole v-table, rather than individual methods) is pretty common. From what I can tell, you're essentially implementing the first step of guarded method inlining. But since you're doing this for v-table functions, it sounds like you're only doing this for class methods, whereas there's even more to be gained for interface methods. |
@RossTate Interesting, I'd definitely be curious to hear more about |
I think this might still be worth discussing since we can't really devirtualize bimorphic call sites on the producer side without being able to compare function references for equality. |
Can you compare the metaobjects instead? |
Yeah, we might be able to do a |
@titzer @tlively Using the metaobject is a good idea, yes. We do something like that in the GSI pass: (struct.get $vtable-type i
(..ref..)
)
;; => ;;
(select
(value1)
(value2)
(ref.eq
(..ref..)
(global.get $global1)
)
) If a location reads from a particular vtable type, and there are just two vtables of that type in the whole program, and they are stored in globals (all assumptions that are true in J2Wasm in some cases), then we can do a select between their values. Sadly, this didn't give us significant benefits so far. I'll investigate and perhaps expand that pass that at some point. Note though that even if we could do this perfectly, it would not be as good as a function comparison I think, since in theory comparing to a global is slower than comparing to a function (the global is known at link time but the function at compile time). Not sure if that matters though. Anyhow, I wouldn't block the MVP on this. It does seem like there are downsides to allowing function comparisons (like not being able to merge all duplicate functions). And in principle we can always extend post-MVP by allowing comparisons if we decide it's worth it, as that would just make more code validate. |
I don't have data on this yet, but some notes: To clarify better the case we can optimize better with comparable function references, that is when there are just a few target functions despite having many vtables/itables: vtableCat = [A];
vtableDog = [B];
vtableRabbit = [A];
vtableBird = [B];
vtableFish = [A]; In this case vtable[0]() can be optimized to if (vtable[0] == A) A() else B() Larger, but now we have direct calls which we can inline and further optimize. I'll gather data for J2Wasm on this, but that data would just be on the current state of things there, and the real question is how common that situation will be in the future, compared to the other situation of just 2 possible tables (where we can compare the tables themselves): vtableCat = [A];
vtableDog = [B];
vtable[0]()
=>
if (vtable == vtableCat) A() else B() Another thought, we could replace all function references in vtables and itables with indexes into a big (immutable in practice) table, and then use |
I think we had tentatively started leaning towards function subtyping on |
The best path forward on this issue would be to get some experimental performance data to see how much better it would be to optimize based on the function references directly. |
I gathered some empirical data today. The J2Wasm binary has
There might still be potential to optimize with function comparisons, but it doesn't seem that promising. |
Thanks, @kripken! It sounds like the next step here is to follow-up on the other optimization opportunities you've found so far then measure again to see how many cases are left where comparing function references directly could still help. Does that sound right? |
Yes. But given the low amount of promise here I'd be fine with leaving this to after the MVP, as I said before. (For posterity, here is the hackish code I used to measure this, if I or someone else wants to remeasure later). |
As mentioned in the last GC CG meeting, devirtualization helps quite a lot on j2cl, something like a 41% speedup. That handles the case of a single call target being possible, that is, we load from a vtable and binaryen can infer that the vtable must contain a particular function reference, and so we replace the load with that reference, which then allows the call to become direct and even inlined.
Looking into the polymorphic case, that is, where there is a small number of possible function references but more than one, I was hoping to do something like this:
Later optimizations can then replace a
call_ref
of aselect
of two function references into an if over two possible calls, etc. However, the condition of the select hits a problem, as funcref is not a subtype of eqref - function references cannot be compared for equality.I couldn't find a detailed discussion of that, but IIRC the motivation was to allow VMs to optimize things like folding two identical functions into a single one, etc. That sounds reasonable, but the devirtualization issue shows that might be an optimization tradeoff which is not obvious?
Gathering some data, if I disable validation in binaryen then allowing 2 functions instead of 1 leads to 14K more places where we can turn a get from a vtable into a constant (well, a select over constants). Allowing 3 raises that to 17K, and 4 to 19K. (At some amount this becomes less useful, though, of course.) For comparison, the total number of
call_ref
s is 42K, so even with 2 functions we are talking about potentially optimizing away a third of indirect call sites, which sounds like it could be very significant.Alternatives:
i32
index that we can select on. The problem is that we'd need to replace the vtable field in all relevant subtypes and supertypes, which may not be practical in general. Adding an additional field is another option, but would add memory and runtime overhead.ref.test
on the vtable. That might work if the different functions come from different types, which I believe is the general case (but I'd need to check). How fast isref.test
expected to be?The text was updated successfully, but these errors were encountered: