Document the efficiency of union splitting for large unions #44131

Open: wants to merge 3 commits into base: `master`
107 changes: 107 additions & 0 deletions doc/src/manual/performance-tips.md
@@ -851,6 +851,113 @@ or thousands of variants compiled for it. Each of these increases the size of th
code, the length of internal lists of methods, etc. Excess enthusiasm for values-as-parameters
can easily waste enormous resources.

## ["Unionize" collections](@id unionize-collections)

When working, e.g., with agent-based models or finite elements with varying element geometries, a common pattern is a collection (e.g. a `Vector`) of objects of different types, on which one wants to perform actions that depend on each object's type. By default, the element type of a vector holding objects of different struct types is a common supertype, often `Any`. For dispatch (choosing the right method of a function to apply), the compiler then needs
Member


> When working e.g with agent based models or finite elements with varying element geometries

I use collections of heterogeneous object types all the time and never work with either agent-based models or finite elements

Member


I will second Tim's comment, and also mention that many (most?) readers won't know what "agent based models" or "finite elements with varying element geometries" means, so this is a barrier for understanding the point.

Author


Thank you for the hints and the further explanations on Discourse. I tend to agree with them and will work on an update after I have learned more from @timholy about the different facets of dispatch.

Contributor

@Tortar, Feb 5, 2024


FWIW, in Agents.jl you can see a comparison of the same operation with multiple types vs. only one: https://github.com/JuliaDynamics/Agents.jl/blob/main/test/performance/variable_agent_types_simple_dynamics.jl (results at the end of the file); from 4 types on, the impact is big.

to assume that new matching types can be added after compilation. Thus arises the need for expensive [dynamic dispatch](https://discourse.julialang.org/t/dynamic-dispatch/6963/2) at runtime.
Member


It's not just the ability to add new methods/types: if we specialized on all potential calls in the known world, compilation would never finish.



```jldoctest unionsplit; setup = :(using Random; Random.seed!(1234)), filter = r"[0-9\.]+ seconds \(.*?\)"
N=100_000

struct T1 end

f(::T1,x)=1x
# [Review comment, Member] The `x` parameter seems superfluous and adds complexity.

function sumup_f(collection)
    s=0.0
    for i=1:length(collection)
        s+=f(collection[i],1)
    end
    s
end

t1_collection=[T1() for i=1:N]
sumup_f(t1_collection) # compile
typeof(t1_collection)
# output
Vector{T1} (alias for Array{T1, 1})
```


Define further types:
```jldoctest unionsplit; filter = r"[0-9\.]+ seconds \(.*?\)"
struct T2 end
struct T3 end
struct T4 end
struct T5 end

f(::T2,x)=2x
f(::T3,x)=3x
f(::T4,x)=4x
f(::T5,x)=5x

any_collection=[rand((T1,T2,T3,T4,T5))() for i=1:N]
sumup_f(any_collection) # compile
typeof(any_collection)
# output
Vector{Any} (alias for Array{Any, 1})
```

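When the collection is defined in the default way (resulting in a `Vector{Any}`), the method of `f` for each element has to be selected at runtime via __dynamic dispatch__, and each iteration performs a heap allocation (note the 100 k allocations for `N = 100_000` elements), resulting in significant runtime overhead: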
```jldoctest unionsplit; filter = r"[0-9\.]+ seconds \(.*?\)"
@time sumup_f(t1_collection); nothing
@time sumup_f(any_collection); nothing
# output
0.000095 seconds (1 allocation: 16 bytes)
0.005557 seconds (100.00 k allocations: 1.526 MiB)
```


With __"manual dispatch"__, the `isa` test in each branch tells the compiler the concrete type of `c` within that branch, so it can choose the proper method of `f` at compile time, resulting in significant savings at runtime:
Member


Worth noting these manual-dispatch blocks help only when T is a concrete type. A block like

    if isa(c, AbstractVector)
        s += f(c)
    elseif isa(c, AbstractDict)
        s += f(c)
    ...
    end

can actually hurt performance (though it can occasionally protect you from invalidation).

```jldoctest unionsplit; filter = r"[0-9\.]+ seconds \(.*?\)"
function sumup_f_manual(collection)
    s=0.0
    for i=1:length(collection)
        c=collection[i]
        # Within each branch, the compiler knows the concrete type of `c`.
        if isa(c,T1)
            s+=f(c,1)
        elseif isa(c,T2)
            s+=f(c,1)
        elseif isa(c,T3)
            s+=f(c,1)
        elseif isa(c,T4)
            s+=f(c,1)
        elseif isa(c,T5)
            s+=f(c,1)
        end
    end
    s
end
sumup_f_manual(any_collection) # compile
@time sumup_f_manual(any_collection); nothing
# output
0.000796 seconds (1 allocation: 16 bytes)
```
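Spelling out such an `isa` chain by hand becomes tedious as the number of types grows. The following is a minimal sketch of generating it with a macro; `@isa_chain`, the types `A` and `B`, and the function `g` are hypothetical names introduced only for illustration (not part of Base or any package). As with the hand-written version, the listed types should be concrete, otherwise the branches do not give the compiler a concrete type to work with.

```julia
# Hypothetical macro (not part of Base or any package):
#   @isa_chain x (T1, T2, ...) body
# expands to
#   if x isa T1; body; elseif x isa T2; body; ...; else error(...); end
macro isa_chain(x, types, body)
    x isa Symbol || error("@isa_chain: first argument must be a variable name")
    (types isa Expr && types.head === :tuple) ||
        error("@isa_chain: second argument must be a tuple of types")
    # Innermost fallback for elements whose type is not listed.
    ex = :(error("unexpected element type: ", typeof($(esc(x)))))
    # Build the chain from the last type to the first, so the tests run in the listed order.
    for T in reverse(types.args)
        ex = Expr(:if, :($(esc(x)) isa $(esc(T))), esc(body), ex)
    end
    return ex
end

# Hypothetical example types and methods.
struct A end
struct B end
g(::A) = 1
g(::B) = 2

function sum_g(collection)
    s = 0
    for c in collection
        # In each generated branch the compiler knows the concrete type of `c`.
        @isa_chain c (A, B) (s += g(c))
    end
    return s
end

sum_g(Any[A(), B(), B()])  # 5
```

The chain is built back to front so that the generated `if`/`elseif` tests appear in the order the types are listed; elements of an unlisted type raise an error instead of silently being skipped.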

While it is possible to generate the manual dispatch code with macros, a more accessible remedy exists: "unionizing" the collection, i.e. pinning its element type to the union of the possible types of its entries:

```jldoctest unionsplit; filter = r"[0-9\.]+ seconds \(.*?\)"
const UnionT=Union{T1,T2,T3,T4,T5}
union_collection=UnionT[s for s ∈ any_collection]
sumup_f(union_collection) # compile
typeof(union_collection)
# output
Vector{Union{T1, T2, T3, T4, T5}} (alias for Array{Union{T1, T2, T3, T4, T5}, 1})
```

The compiler then knows that the set of possible element types is finite, constrained by the list of types in the union. Consequently, it can automatically generate code similar to the manual dispatch statement above. This feature is called __[union splitting](https://julialang.org/blog/2018/08/union-splitting/)__ and provides performance similar to or better than the "manual" approach.
```jldoctest unionsplit; filter = r"[0-9\.]+ seconds \(.*?\)"
@time sumup_f(union_collection); nothing
# output
0.000097 seconds (1 allocation: 16 bytes)
```
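The `Union` does not have to be written out by hand. Below is a minimal sketch of a hypothetical helper `unionize` (not a Base function), assuming the element types present when the collection is built are the only ones it will ever hold: it collects the concrete types found in a `Vector{Any}` and copies the data into a vector with their `Union` as element type. `Circle` and `Square` are hypothetical example types.

```julia
struct Circle end
struct Square end

# Hypothetical helper (not part of Base): copies `v` into a new vector whose
# element type is the `Union` of the concrete types currently stored in `v`.
# Its return type depends on the data, so it is type-unstable by design:
# call it once, outside performance-critical loops.
function unionize(v::AbstractVector)
    U = Union{unique(map(typeof, v))...}
    return Vector{U}(v)
end

shapes = Any[Circle(), Square(), Circle()]
u = unionize(shapes)
eltype(u) == Union{Circle, Square}  # true
```

Passing the result to a separate function (a "function barrier") lets that function be compiled for the concrete `Vector{Union{...}}` type, so the type instability of `unionize` itself stays outside the hot loop.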
This pattern can be applied in at least the following situations:

- __Collection of objects:__ as discussed above via e.g. defining `Vector{Union{T1,T2,T3,T4,T5}}`.
- __Collection of types:__ Julia allows types to be used as values, so they can be stored in a collection as well, and it is possible to dispatch on a (concrete or abstract) type argument by defining methods like `f(::Type{T})`. A corresponding "unionized" collection can be defined e.g. as `Vector{Union{Type{T1},Type{T2},Type{T3},Type{T4},Type{T5}}}`.
- __Collection of functions:__ Instead of objects or types, one can also store functions in a collection. As each function has its own type, calling a function retrieved from such a collection will again lead to dynamic dispatch, unless the collection is defined similar to `Vector{Union{typeof(f1),typeof(f2),typeof(f3),typeof(f4),typeof(f5)}}`.
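A minimal sketch of the second and third cases, using the hypothetical types `Cat` and `Dog` and the hypothetical functions `sound`, `h1` and `h2`. By default, Julia gives such collections the abstract element types `DataType` and `Function`, while the "unionized" versions enumerate the concrete possibilities.

```julia
# Hypothetical types and functions for illustration.
struct Cat end
struct Dog end
sound(::Type{Cat}) = 1   # dispatch on the type itself
sound(::Type{Dog}) = 2

h1(x) = x + 1
h2(x) = 2x

# Default construction yields abstract element types.
eltype([Cat, Dog])  # DataType
eltype([h1, h2])    # Function

# "Unionized" collections list every concrete possibility.
animal_types = Union{Type{Cat}, Type{Dog}}[Cat, Dog, Dog]
funcs = Union{typeof(h1), typeof(h2)}[h1, h2, h1]

function total_sound(ts)
    s = 0
    for T in ts
        s += sound(T)
    end
    return s
end

function apply_all(fs, x)
    s = 0
    for f in fs
        s += f(x)
    end
    return s
end

total_sound(animal_types)  # 5
apply_all(funcs, 3)        # 14
```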

## [Access arrays in memory order, along columns](@id man-performance-column-major)

Multidimensional arrays in Julia are stored in column-major order. This means that arrays are
8 changes: 3 additions & 5 deletions doc/src/manual/types.md
@@ -531,8 +531,7 @@ ERROR: TypeError: in typeassert, expected Union{Int64, AbstractString}, got a va

The compilers for many languages have an internal union construct for reasoning about types; Julia
simply exposes it to the programmer. The Julia compiler is able to generate efficient code in the
presence of `Union` types with a small number of types [^1], by generating specialized code
in separate branches for each possible type.
presence of `Union` types, by generating specialized code in separate branches for each possible type.
Member


While the limit has been lifted for specific situations, as you discovered it's not completely gone. Maybe best to acknowledge there are still limits?


A particularly useful case of a `Union` type is `Union{T, Nothing}`, where `T` can be any type and
[`Nothing`](@ref) is the singleton type whose only instance is the object [`nothing`](@ref). This pattern
@@ -1091,7 +1090,7 @@ Immutable composite types with no fields are called *singletons*. Formally, if
1. `T` is an immutable composite type (i.e. defined with `struct`),
1. `a isa T && b isa T` implies `a === b`,

then `T` is a singleton type.[^2] [`Base.issingletontype`](@ref) can be used to check if a
then `T` is a singleton type.[^1] [`Base.issingletontype`](@ref) can be used to check if a
type is a singleton type. [Abstract types](@ref man-abstract-types) cannot be singleton
types by construction.
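A short illustration of `Base.issingletontype`, using the hypothetical example types `Marker` and `Point`:

```julia
struct Marker end   # immutable, no fields: a singleton type
struct Point        # immutable, but has a field: not a singleton
    x::Int
end

Base.issingletontype(Marker)  # true
Base.issingletontype(Point)   # false
Base.issingletontype(Number)  # false: abstract types cannot be singletons
Marker() === Marker()         # true: there is only one instance
```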

@@ -1585,5 +1584,4 @@ in unfavorable cases, you can easily end up making the performance of your code
In particular, you would never want to write actual code as illustrated above. For more information
about the proper (and improper) uses of `Val`, please read [the more extensive discussion in the performance tips](@ref man-performance-value-type).

[^1]: "Small" is defined by the `MAX_UNION_SPLITTING` constant, which is currently set to 4.
[^2]: A few popular languages have singleton types, including Haskell, Scala and Ruby.
[^1]: A few popular languages have singleton types, including Haskell, Scala and Ruby.