-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set operation (setdiff
, union
, intersect
, symdiff
) performance issues
#40501
Comments
I'm not able to reproduce this, but |
Interesting, I see this in Julia 1.7.1: julia> versioninfo()
Julia Version 1.7.1
Commit ac5cc99908 (2021-12-22 19:35 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Xeon(R) E-2176M CPU @ 2.70GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-12.0.1 (ORCJIT, skylake)
julia> @btime setdiff($([1, 2, 3]), $([1, 2]))
329.487 ns (8 allocations: 528 bytes)
1-element Vector{Int64}:
3
julia> @btime setdiff($([1, 2, 3]), $([1, 2]), $([]))
167.212 ns (5 allocations: 480 bytes)
1-element Vector{Int64}:
3
julia> @btime setdiff($([1, 2, 3]), $([1, 2]), $([]), $([]))
181.436 ns (6 allocations: 496 bytes)
1-element Vector{Int64}:
3 but it looks like it is now fixed on nightly: julia> versioninfo()
Julia Version 1.8.0-DEV.1342
Commit 1f266b895e (2022-01-18 19:22 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Xeon(R) E-2176M CPU @ 2.70GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.0 (ORCJIT, skylake)
Environment:
LD_PRELOAD = libgtk3-nocsd.so.0
LD_LIBRARY_PATH = /usr/lib/x86_64-linux-gnu/libcutensor/11.1/:/opt/intel/compilers_and_libraries_2020.0.166/linux/tbb/lib/intel64_lin/gcc4.7:/opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin:/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin:/usr/local/cuda-11.1/lib64::/home/mfishman/software/itensor/lib
JULIA_EDITOR = vim
julia> @btime setdiff($([1, 2, 3]), $([1, 2]))
191.334 ns (6 allocations: 544 bytes)
1-element Vector{Int64}:
3
julia> @btime setdiff($([1, 2, 3]), $([1, 2]), $([]))
234.517 ns (6 allocations: 512 bytes)
1-element Vector{Any}:
3
julia> @btime setdiff($([1, 2, 3]), $([1, 2]), $([]), $([]))
244.295 ns (7 allocations: 528 bytes)
1-element Vector{Any}:
3 @oscardssmith, I'm not sure if this issue is related. For larger sets, it only leads to a small slowdown: julia> versioninfo()
Julia Version 1.7.1
Commit ac5cc99908 (2021-12-22 19:35 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Xeon(R) E-2176M CPU @ 2.70GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-12.0.1 (ORCJIT, skylake)
julia> @btime setdiff($(1:50), $(1:49))
1.221 μs (14 allocations: 2.27 KiB)
1-element Vector{Int64}:
50
julia> @btime setdiff($(1:50), $(1:49), $([]))
1.097 μs (10 allocations: 2.18 KiB)
1-element Vector{Int64}:
50
julia> @btime setdiff($(1:100), $(1:99))
2.242 μs (14 allocations: 3.79 KiB)
1-element Vector{Int64}:
100
julia> @btime setdiff($(1:100), $(1:99), $([]))
2.109 μs (10 allocations: 3.70 KiB)
1-element Vector{Int64}:
100 Interestingly, it looks like the two-input versions speed up on nightly, but the three-input version slows down: julia> versioninfo()
Julia Version 1.8.0-DEV.1342
Commit 1f266b895e (2022-01-18 19:22 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Xeon(R) E-2176M CPU @ 2.70GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.0 (ORCJIT, skylake)
Environment:
LD_PRELOAD = libgtk3-nocsd.so.0
LD_LIBRARY_PATH = /usr/lib/x86_64-linux-gnu/libcutensor/11.1/:/opt/intel/compilers_and_libraries_2020.0.166/linux/tbb/lib/intel64_lin/gcc4.7:/opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin:/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin:/usr/local/cuda-11.1/lib64::/home/mfishman/software/itensor/lib
JULIA_EDITOR = vim
julia> @btime setdiff($(1:50), $(1:49))
938.607 ns (9 allocations: 1.74 KiB)
1-element Vector{Int64}:
50
julia> @btime setdiff($(1:50), $(1:49), $([]))
1.850 μs (10 allocations: 1.74 KiB)
1-element Vector{Any}:
50
julia> @btime setdiff($(1:100), $(1:99))
1.861 μs (9 allocations: 2.87 KiB)
1-element Vector{Int64}:
100
julia> @btime setdiff($(1:100), $(1:99), $([]))
3.833 μs (10 allocations: 2.87 KiB)
1-element Vector{Any}:
100 Code reorganizations like the one proposed by @bkamins may be more important in that example. |
In terms of performance of the algorithms themselves, I found that in practice for small sets (maybe the crossover is around 10), it is faster to avoid converting to Perhaps something like that would be of interest in Base, or in a separate package. |
Oh. I was looking on nightly. |
I think the solution might just be to do a vector based implementation for small sets. That would belong in |
Thank you for opening this issue (I wanted to do it after the SO answer, but had some other things to do today). I think we should also benchmark also cases of large Also we could add a special case for |
setdiff(::AbstractVector...)
is slightly slower for two inputssetdiff
, union
, intersect
, symdiff
) performance issues
I've made a gist of implementations of Julia set operations ( Here are some results: julia> using BenchmarkTools
julia> include("small_set_operations.jl")
_union! (generic function with 2 methods)
julia> versioninfo()
Julia Version 1.7.1
Commit ac5cc99908 (2021-12-22 19:35 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Xeon(R) E-2176M CPU @ 2.70GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-12.0.1 (ORCJIT, skylake)
Environment:
JULIA_EDITOR = vim
julia> @btime setdiff($([1, 2, 3, 4]), $([1, 2, 3]))
338.877 ns (7 allocations: 528 bytes)
1-element Vector{Int64}:
4
julia> @btime _setdiff($([1, 2, 3, 4]), $([1, 2, 3]))
51.320 ns (1 allocation: 96 bytes)
1-element Vector{Int64}:
4
julia> @btime intersect($([1, 2, 3, 4]), $([1, 2, 3]))
484.041 ns (12 allocations: 944 bytes)
3-element Vector{Int64}:
1
2
3
julia> @btime _intersect($([1, 2, 3, 4]), $([1, 2, 3]))
57.184 ns (2 allocations: 144 bytes)
3-element Vector{Int64}:
1
2
3
julia> @btime symdiff($([1, 2, 3, 4]), $([1, 2, 3]))
222.391 ns (6 allocations: 544 bytes)
1-element Vector{Int64}:
4
julia> @btime _symdiff($([1, 2, 3, 4]), $([1, 2, 3]))
68.136 ns (2 allocations: 144 bytes)
1-element Vector{Int64}:
4
julia> @btime union($([1, 2, 3, 4]), $([1, 2, 3]))
188.991 ns (6 allocations: 544 bytes)
4-element Vector{Int64}:
1
2
3
4
julia> @btime _union($([1, 2, 3, 4]), $([1, 2, 3]))
86.902 ns (3 allocations: 208 bytes)
4-element Vector{Int64}:
1
2
3
4 I think a downside of this approach is that it doesn't handle repeated elements (input vectors with nonunique vectors) in the same way as the versions which convert back and forth from EDIT: And here are some results using julia> @btime setdiff($(BitSet([1, 2, 3, 4])), $(BitSet([1, 2, 3])))
68.885 ns (3 allocations: 176 bytes)
BitSet with 1 element:
4
julia> @btime intersect($(BitSet([1, 2, 3, 4])), $(BitSet([1, 2, 3])))
71.939 ns (3 allocations: 176 bytes)
BitSet with 3 elements:
1
2
3
julia> @btime symdiff($(BitSet([1, 2, 3, 4])), $(BitSet([1, 2, 3])))
68.020 ns (3 allocations: 176 bytes)
BitSet with 1 element:
4
julia> @btime union($(BitSet([1, 2, 3, 4])), $(BitSet([1, 2, 3])))
68.975 ns (3 allocations: 176 bytes)
BitSet with 4 elements:
1
2
3
4 Obviously these results will depend strongly on the set sizes involved. |
I haven't investigated the large set limit at all, but I'm curious if |
Oh, this is probably one of the cases where it would be a lot better if we had #10116 |
At some point I tried benchmarking various available dictionary/set options for implementing set operations on small sets:
I didn't do a very systematic job and can't find the code now, but in the end for small sets I concluded it was best to just roll my own version which just directly used It looks like |
Yes, correct, that’s the intention with those two types. Note that sometimes I thought OrderedCollections split out the code from DataStructures that was originally #10116? (Maybe I am confused on this point). |
Great, thanks for confirming. Looks like I could probably just use Sounds like that is correct: #10116 (comment) |
EDIT: This has morphed into a more general discussion about set operation performance, beyond the initial issue presented.
I'm finding that
setdiff
is slightly slower with exactly 2 inputs:and the same on nightly:
It's not the case for
Set
inputs, so is probably caused by the generic code here. I've been staring at that code trying to figure out what would be special about inputting 2 Vectors and can't figure it out...The text was updated successfully, but these errors were encountered: