Document the efficiency of union splitting for large unions #44131
base: master
j-fu commented on Feb 11, 2022
- Benchmarks show that, probably since "RFC: inference: remove union-split limit for linear signatures" #37378 (1.6.0), union splitting is efficient for large unions.
- The union splitting example proposed here has jldoctests which work for unions of 5 types.
- Information on efficiently handling collections of different types is scattered across Discourse threads, including macro-based solutions for implementing manual dispatch. These may not be necessary, as the functionality they cover is mostly available in Julia itself.
* Benchmarks (https://discourse.julialang.org/t/avoiding-vectors-of-abstract-types/61883/15) show that since JuliaLang#37378 (1.6.0), union splitting is efficient for large unions
* The union splitting example proposed has jldoctests which work for unions of 5 types
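The pattern the summary above refers to can be sketched as follows. This is a minimal illustration, not the example from the PR: the types `Circle` and `Square`, the function `area`, and `total_area` are hypothetical names chosen here. It shows the same data stored once with element type `Any` and once with a `Union` element type, which is the form that gives the compiler a closed set of types to split over.

```julia
# Hypothetical types and methods, for illustration only (not from the PR).
struct Circle
    r::Float64
end

struct Square
    a::Float64
end

area(c::Circle) = pi * c.r^2
area(s::Square) = s.a^2

# Sum `area` over a collection; the call inside the loop is where dispatch happens.
function total_area(shapes)
    s = 0.0
    for x in shapes
        s += area(x)
    end
    return s
end

# Without an explicit element type, unrelated struct types widen to `Any`.
shapes_any = [Circle(1.0), Square(2.0)]

# The "unionized" collection: the element type lists every possible type.
shapes_union = Union{Circle,Square}[Circle(1.0), Square(2.0)]

total_area(shapes_any) ≈ total_area(shapes_union)  # same result either way
```

Both calls return the same value; only the information available to the compiler differs. To compare them on a given Julia version, one could time `total_area` on larger collections with `@btime` from BenchmarkTools. No timings are claimed here.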
Note: I figured out that this does not seem to work when dispatching on two unions. Maybe this is what is meant by "linear" in #37378... This needs to be documented as well.
Hello, please also have a look at this post: Union splitting really seems to hit a bottleneck when used for structs whose fields need to be accessed.
There's real value here, but it probably needs to be integrated with other performance tips. This seems to be an expanded version of https://docs.julialang.org/en/v1/manual/performance-tips/#man-performance-abstract-container. In particular I think illustrating the manual-dispatch alternative would be a worthwhile addition. However, I would strive for as much brevity as possible, and if a more "demo"/discursive style is needed then perhaps link to an external blog post?
@@ -851,6 +851,113 @@ or thousands of variants compiled for it. Each of these increases the size of th
code, the length of internal lists of methods, etc. Excess enthusiasm for values-as-parameters
can easily waste enormous resources.

## ["Unionize" collections](@id unionize-collections)

When working e.g with agent based models or finite elements with varying element geometries, a common pattern is the occurence of collections (e.g. Vectors) of objects of different types on which one wants to perform certain actions depending on their type. By default, the element type of a vector of objects of different struct types is a common supertype, often `Any`. For dispatch -- choosing the right method of a function to be applied -- the compiler needs
> When working e.g with agent based models or finite elements with varying element geometries
I use collections of heterogeneous object types all the time and never work with either agent-based models or finite elements
I will second Tim's comment, and also mention that many (most?) readers won't know what "agent based models" or "finite elements with varying element geometries" means, so this is a barrier for understanding the point.
Thank you for the hints, and the further explanations on Discourse. I tend to agree with them and will work on an update once I have learned more from @timholy about the different facets of dispatch.
FWIW, in Agents.jl you can see a comparison of the same operation with multiple types vs. only one: https://github.com/JuliaDynamics/Agents.jl/blob/main/test/performance/variable_agent_types_simple_dynamics.jl (results are at the end of the file). From 4 types on, the impact is big.
## ["Unionize" collections](@id unionize-collections)

When working e.g with agent based models or finite elements with varying element geometries, a common pattern is the occurence of collections (e.g. Vectors) of objects of different types on which one wants to perform certain actions depending on their type. By default, the element type of a vector of objects of different struct types is a common supertype, often `Any`. For dispatch -- choosing the right method of a function to be applied -- the compiler needs
to assume that new matching types can be added after compilation. Thus arises the need for expensive [dynamic dispatch](https://discourse.julialang.org/t/dynamic-dispatch/6963/2) at runtime.
It's not just the ability to add new methods/types: if we specialized on all potential calls in the known world, compilation would never finish.
struct T1 end

f(::T1,x)=1x
The `x` parameter seems superfluous and adds complexity.
```

With __"manual dispatch"__, each time when c is accessed as a function parameter, due to the test via `isa`, the compiler knows the type of `c` and can choose the proper method of `f` at compile time, resulting in signficant savings at runtime:
Worth noting these manual-dispatch blocks help only when `T` is a concrete type. A block like

    if isa(c, AbstractVector)
        s += f(c)
    elseif isa(c, AbstractDict)
        s += f(c)
    ...
    end

can actually hurt performance (though it can occasionally protect you from invalidation).
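For contrast with the abstract-type block above, here is a minimal sketch of manual dispatch where every `isa` test targets a concrete type. Only `T1` appears in the PR diff; `T2`, the single-argument methods of `f`, and `sum_manual` are assumptions made for this illustration.

```julia
# Hypothetical concrete types; only `T1` appears in the PR diff.
struct T1 end
struct T2 end

# Hypothetical single-argument methods (the `x` parameter is dropped here).
f(::T1) = 1
f(::T2) = 2

# Manual dispatch: inside each branch the `isa` test has narrowed `c` to a
# concrete type, so the call to `f` in that branch can be resolved statically.
function sum_manual(collection)
    s = 0
    for c in collection
        if c isa T1
            s += f(c)
        elseif c isa T2
            s += f(c)
        else
            s += f(c)  # fallback: ordinary dynamic dispatch for any other type
        end
    end
    return s
end

sum_manual(Any[T1(), T2(), T1()])
```

The branches look redundant because each one calls `f(c)`, but what differs is the type the compiler knows for `c` in each branch. `isconcretetype(T1)` is `true`, while `isconcretetype(AbstractVector)` is `false`, which is why the same construction with abstract types does not give this benefit.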
@@ -531,8 +531,7 @@ ERROR: TypeError: in typeassert, expected Union{Int64, AbstractString}, got a va

 The compilers for many languages have an internal union construct for reasoning about types; Julia
 simply exposes it to the programmer. The Julia compiler is able to generate efficient code in the
-presence of `Union` types with a small number of types [^1], by generating specialized code
-in separate branches for each possible type.
+presence of `Union` types, by generating specialized code in separate branches for each possible type.
While the limit has been lifted for specific situations, as you discovered it's not completely gone. Maybe best to acknowledge there are still limits?
It seems there are no longer performance drops with many types in a `Union` in 1.11! See https://discourse.julialang.org/t/avoiding-vectors-of-abstract-types/61883/19?u=filchristou; many other examples I tried also show no dynamic dispatch, even with many types. In my opinion it's probably worth adding this section now.