You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I think I may have a solution for this, but it'd be awesome if it could be improved on.
For standalone traits (no supertrait), nothing changes. For traits with supertraits, I'm adding two types of pointer offset to the vtable. The first, Oa is the offset to trait A's vtable. The second, Ωa is the offset from trait A's vtable pointer to its offset table, where the O values for all its supertraits are listed in order. Here's what it looks like with these example traits:
Trait bounds
------------
A (requires only A)
B (requires only B)
C: A (requires C, A)
D: A + B (requires D, A, B)
E: C + B (requires E, C, A, B)
F: C + D (requires F, C, A, D, B)
G: C + E + F (requires G, C, A, E, B, F, D)
Vtable layout
* indicates the actual location of the vtable pointer.
Ω values are negative offsets.
-------------
A: *A
B: *B
C: Oa Ωc *C A
|__|
Ωc
D: Oa Ob Ωd *D A B
|_____|
Ωd
E: Oc Oa Ob Ωe *E Ωc C A B
| |__| |
|___Ωc___|
Ωe
F: Oc Oa Od Ob Oa Ob Ωf *F Ωc C A Ωd D B
| |__| |_____|
|___Ωc______| Ωd
Ωf
* this is a naive implementation, I'll talk about condensing it in a minute.
G: Oc Oa Oe Ob Of Od Oc Oa Ob Oc Oa Od Ob Ωg *G Ωc C A Ωe E B Ωf F Ωd D
| |__| | |_____|___________|
|___Ωc____________|_____Ωd_| Ωf
Ωg Ωe
* some unnecessary repetitions here too.
Ok, so now that the basics of this are out of the way, the only overlaps I showed so far were the trivial ones - where if you scan the existing set of offsets you find a subset in the same order as the one you're trying to add. We definitely want to rearrange the order of the bounds though to condense the offset tables as much as possible. In this case, we can reorder them and get this:
F: Oc Oa Ob Od Ωf *F Ωc C A Ωd D B
| |__| | |
| |Ωc___| |
|____ Ωd____|
Ωf
G: Oc Oa Ob Od Oe Of Ωg *G Ωc C A Ωe E B Ωf F Ωd D
| |__| | | |
| |Ωc___| | |
|_____Ωd_| | |
|___Ωe______| |
|_____Ωf__________|
Ωg
I probably could have picked a better example, because there will be cases where repetition is unavoidable, and I imagine there might even be cases where the optimal solution for the whole program involves leaving a hole in the offset table for some trait that basically acts as a "wildcard" where combined bounds can fill in whatever they need to optimize the size of their offset tables. I don't have an algorithm for this yet.
So, how casting works should be fairly obvious, but consider a cast from D to B, given these criteria which this system satisfies:
We have a pointer to trait object D, where the vtable pointer points to D's method table (indicated by D in the example).
We know the order of the O values for D, whatever and wherever they are, will be the same for every instance of D.
We know that Ωd is located at (C syntax) ((size_t *)D - 1).
We know the vtable for B is at Ob + *Ob.
So the cast/coercion (d: Box<D>) as Box<B> becomes (D + *Ωd + 1 /* index of Ob for D */) + *(D + Ωd + 1) == (D + *(D - 1) + 1) + *(D + *(D - 1) + 1).
Two pointer dereferences, I know. At least they're close together in cache, and there are some optimizations I'll get to. So why not just list the offsets inline and only do one dereference? That's the first optimization - for a trait like C with only one supertrait, it's really overkill to list two separate offsets, so for that case we should just replace Ωc with Oa. And that may end up happening in a lot of cases, because while you may have a trait that requires several other traits, you probably only use one or two of them for virtual dispatch, and there's no sense generating offset tables when they're never going to be used. But the main reason why I picked this implementation is that it supports Box<A + B> really easily.
All you do is create a pseudo trait A + B that doesn't implement any of its own methods, but requires A + B (obviously), and add it as a requirement to everything that requires both those traits. Now they end up with an Ωa+b in their vtable that takes three dereferences to get a method out of it, so we want to do one of those with the cast. Since the offsets are listed in a known order, the vtable for a combination type like this is *Ωa+b == the first of Oa, Ob. Now calling a method on one of these types takes at most two adds and two dereferences, and hopefully LLVM will cache a lot of these accesses if we're say, calling mostly methods from A on that type.
Other potential optimizations include:
If the compiler knows what concrete type it most likely is, it can emit an Objective-C style "is this what I think it is" hot path and avoid any extra lookups at all most of the time.
If it works out that all implementations of A + B list their vtables in the same order with the same distance between them, then casts between them are trivial. The algorithm that tries to achieve this may not be quite as trivial.
Thoughts?
The text was updated successfully, but these errors were encountered:
I think I may have a solution for this, but it'd be awesome if it could be improved on.
For standalone traits (no supertrait), nothing changes. For traits with supertraits, I'm adding two types of pointer offset to the vtable. The first,
Oa
is the offset to traitA
's vtable. The second,Ωa
is the offset from traitA
's vtable pointer to its offset table, where theO
values for all its supertraits are listed in order. Here's what it looks like with these example traits:Ok, so now that the basics of this are out of the way, the only overlaps I showed so far were the trivial ones - where if you scan the existing set of offsets you find a subset in the same order as the one you're trying to add. We definitely want to rearrange the order of the bounds though to condense the offset tables as much as possible. In this case, we can reorder them and get this:
I probably could have picked a better example, because there will be cases where repetition is unavoidable, and I imagine there might even be cases where the optimal solution for the whole program involves leaving a hole in the offset table for some trait that basically acts as a "wildcard" where combined bounds can fill in whatever they need to optimize the size of their offset tables. I don't have an algorithm for this yet.
So, how casting works should be fairly obvious, but consider a cast from
D
toB
, given these criteria which this system satisfies:D
, where the vtable pointer points toD
's method table (indicated byD
in the example).O
values forD
, whatever and wherever they are, will be the same for every instance ofD
.Ωd
is located at (C syntax)((size_t *)D - 1)
.B
is atOb + *Ob
.(d: Box<D>) as Box<B>
becomes(D + *Ωd + 1 /* index of Ob for D */) + *(D + Ωd + 1) == (D + *(D - 1) + 1) + *(D + *(D - 1) + 1)
.Two pointer dereferences, I know. At least they're close together in cache, and there are some optimizations I'll get to. So why not just list the offsets inline and only do one dereference? That's the first optimization - for a trait like
C
with only one supertrait, it's really overkill to list two separate offsets, so for that case we should just replaceΩc
withOa
. And that may end up happening in a lot of cases, because while you may have a trait that requires several other traits, you probably only use one or two of them for virtual dispatch, and there's no sense generating offset tables when they're never going to be used. But the main reason why I picked this implementation is that it supportsBox<A + B>
really easily.All you do is create a pseudo trait
A + B
that doesn't implement any of its own methods, but requiresA + B
(obviously), and add it as a requirement to everything that requires both those traits. Now they end up with anΩa+b
in their vtable that takes three dereferences to get a method out of it, so we want to do one of those with the cast. Since the offsets are listed in a known order, the vtable for a combination type like this is*Ωa+b == the first of Oa, Ob
. Now calling a method on one of these types takes at most two adds and two dereferences, and hopefully LLVM will cache a lot of these accesses if we're say, calling mostly methods fromA
on that type.Other potential optimizations include:
A + B
list their vtables in the same order with the same distance between them, then casts between them are trivial. The algorithm that tries to achieve this may not be quite as trivial.Thoughts?
The text was updated successfully, but these errors were encountered: