Polymorphic devirtualization and funcref not being a subtype of eqref? #239
Description
As mentioned in the last GC CG meeting, devirtualization helps quite a lot on j2cl, something like a 41% speedup. That handles the case of a single call target being possible, that is, we load from a vtable and binaryen can infer that the vtable must contain a particular function reference, and so we replace the load with that reference, which then allows the call to become direct and even inlined.
Looking into the polymorphic case, that is, where there is a small number of possible function references but more than one, I was hoping to do something like this:
(struct.get vtable)
=>
(select
  (first possible function)
  (second possible function)
  (ref.eq (struct.get vtable) (first possible function))
)
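To make the intended rewrite concrete, here is a minimal sketch on a toy IR in Python. It is only an illustration: the tuple-based IR, the function name `rewrite_vtable_get`, and the `possible_funcs` input (standing in for whatever analysis infers the vtable contents) are all hypothetical and are not binaryen's actual IR or API.

```python
# Toy IR: expressions are nested tuples, e.g. ("struct.get", "vtable").
# Illustrative model only; this is NOT binaryen's real IR or API.

def rewrite_vtable_get(get_expr, possible_funcs):
    """Rewrite a vtable load given the functions it may return.

    possible_funcs is assumed to be the exhaustive list of targets
    found by a (hypothetical) whole-program analysis.
    """
    if len(possible_funcs) == 1:
        # Monomorphic case: the load becomes a constant function reference.
        return ("ref.func", possible_funcs[0])
    if len(possible_funcs) == 2:
        first, second = possible_funcs
        # Polymorphic case: select yields its first operand when the
        # condition is true and its second operand otherwise.
        return (
            "select",
            ("ref.func", first),
            ("ref.func", second),
            # Comparing function references is the step that does not validate.
            ("ref.eq", get_expr, ("ref.func", first)),
        )
    # No known targets, or too many of them: leave the load unchanged.
    return get_expr


load = ("struct.get", "vtable")
print(rewrite_vtable_get(load, ["$A"]))
print(rewrite_vtable_get(load, ["$A", "$B"]))
```

The two-target limit here is arbitrary; a real pass would presumably make it tunable.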
Later optimizations can then turn a `call_ref` of a `select` of two function references into an `if` over two possible direct calls, etc. However, the condition of the `select` hits a problem: funcref is not a subtype of eqref, so function references cannot be compared for equality.
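The follow-up step, turning a call through a select into an if over two direct calls, could look roughly like this on a toy tuple-based IR in Python. The IR shape and the name `lower_call_of_select` are hypothetical, and the sketch assumes the arguments and the condition have no side effects, since the rewrite evaluates the condition before the arguments.

```python
# Toy IR: ("call_ref", args, target), ("select", if_true, if_false, cond),
# ("ref.func", name), ("call", name, args), ("if", cond, then, else).
# Illustrative model only; this is NOT binaryen's real IR or API.

def lower_call_of_select(expr):
    """Turn call_ref(args, select(f1, f2, cond)) into if(cond, call f1, call f2).

    Assumes args and cond are free of side effects, so that reordering
    their evaluation is safe.
    """
    if expr[0] != "call_ref":
        return expr
    _, args, target = expr
    if target[0] != "select":
        return expr
    _, if_true, if_false, cond = target
    if if_true[0] != "ref.func" or if_false[0] != "ref.func":
        # Only constant function references can become direct calls.
        return expr
    return (
        "if",
        cond,
        ("call", if_true[1], args),   # taken when cond is true
        ("call", if_false[1], args),  # taken when cond is false
    )


call = (
    "call_ref",
    (("local.get", "$x"),),
    (
        "select",
        ("ref.func", "$A"),
        ("ref.func", "$B"),
        ("ref.eq", ("struct.get", "vtable"), ("ref.func", "$A")),
    ),
)
print(lower_call_of_select(call))
```

Once both arms are direct calls, each one becomes a candidate for inlining independently.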
I couldn't find a detailed discussion of that, but IIRC the motivation was to allow VMs to optimize things like folding two identical functions into a single one, etc. That sounds reasonable, but the devirtualization issue shows that might be an optimization tradeoff which is not obvious?
Gathering some data, if I disable validation in binaryen then allowing 2 functions instead of 1 leads to 14K more places where we can turn a get from a vtable into a constant (well, a select over constants). Allowing 3 raises that to 17K, and 4 to 19K. (At some point this becomes less useful, though, of course.) For comparison, the total number of `call_ref`s is 42K, so even with 2 functions we are talking about potentially optimizing away a third of indirect call sites, which sounds like it could be very significant.
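For reference, the ratios behind that estimate work out as follows. This is just arithmetic on the rounded figures above, and it assumes each newly constant vtable get corresponds to one `call_ref` site, which is the comparison being made here rather than something measured separately. The names are made up for the example.

```python
# Rounded figures from the measurements above.
TOTAL_CALL_REFS = 42_000
extra_constant_gets = {2: 14_000, 3: 17_000, 4: 19_000}  # max targets -> sites


def share_of_call_refs(count, total=TOTAL_CALL_REFS):
    """Fraction of all call_ref sites that `count` sites would represent."""
    return count / total


for max_targets, count in extra_constant_gets.items():
    print(f"up to {max_targets} targets: {share_of_call_refs(count):.1%}")
```

With these inputs the shares come to roughly 33%, 40%, and 45%, so most of the gain already arrives at 2 targets.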
Alternatives:
- Rewrite the types, replacing the funcref with an `i32` index that we can select on. The problem is that we'd need to replace the vtable field in all relevant subtypes and supertypes, which may not be practical in general. Adding an additional field is another option, but would add memory and runtime overhead.
- Use `ref.test` on the vtable. That might work if the different functions come from different types, which I believe is the general case (but I'd need to check). How fast is `ref.test` expected to be?
- Perform such devirtualization in the VM and not the toolchain. Doing it statically is probably not reasonable (as a large LTO-style optimization), but using runtime profiling data it might be.