Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

High (default) gcthreads count causing slowdown on Julia 1.10.0-beta2 #51044

Closed
jaakkor2 opened this issue Aug 25, 2023 · 33 comments
Closed

High (default) gcthreads count causing slowdown on Julia 1.10.0-beta2 #51044

jaakkor2 opened this issue Aug 25, 2023 · 33 comments
Assignees
Labels
GC Garbage collector regression Regression in behavior compared to a previous version
Milestone

Comments

@jaakkor2
Copy link
Contributor

Just showing versioninfo() takes 55 s (varies from run to run)

$ ./julia/julia-1.10.0-beta2/bin/julia --threads 55 --gcthreads=16 --startup-file=no
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.10.0-beta2 (2023-08-17)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> @time display(versioninfo())
Julia Version 1.10.0-beta2
Commit a468aa198d0 (2023-08-17 06:27 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 64 × AMD EPYC 7371 16-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver1)
  Threads: 70 on 64 virtual cores
nothing
 55.698993 seconds (439.98 k allocations: 29.438 MiB, 99.01% gc time, 99.99% compilation time)

but then setting --threads auto takes more than I could wait

$ ./julia/julia-1.10.0-beta2/bin/julia --threads auto --startup-file=no
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.10.0-beta2 (2023-08-17)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> Threads.ngcthreads()
32

julia> @time display(versioninfo())

Setting --threads auto --gcthreads 1 seems to be fine.

@giordano giordano added the GC Garbage collector label Aug 25, 2023
@giordano giordano added this to the 1.10 milestone Aug 25, 2023
@d-netto d-netto self-assigned this Aug 25, 2023
@d-netto d-netto added the regression Regression in behavior compared to a previous version label Aug 25, 2023
@d-netto
Copy link
Member

d-netto commented Aug 25, 2023

Can't reproduce:

./julia --threads=64 --gcthreads=1
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.10.0-beta2 (2023-08-17)
 _/ |\__'_|_|_|\__'_|  |  
|__/                   |

julia> @time display(versioninfo())
Julia Version 1.10.0-beta2
Commit a468aa198d (2023-08-17 06:27 UTC)
Build Info:

    Note: This is an unofficial build, please report bugs to the project
    responsible for this build and not to the Julia project unless you can
    reproduce the issue using official builds available at https://julialang.org/downloads

Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 128 × AMD EPYC 7502 32-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver2)
  Threads: 64 on 128 virtual cores
nothing
  0.497072 seconds (440.23 k allocations: 29.451 MiB, 1.12% gc time, 95.90% compilation time)
./julia --threads=64 --gcthreads=64
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.10.0-beta2 (2023-08-17)
 _/ |\__'_|_|_|\__'_|  |  
|__/                   |

julia> @time display(versioninfo())
Julia Version 1.10.0-beta2
Commit a468aa198d (2023-08-17 06:27 UTC)
Build Info:

    Note: This is an unofficial build, please report bugs to the project
    responsible for this build and not to the Julia project unless you can
    reproduce the issue using official builds available at https://julialang.org/downloads

Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 128 × AMD EPYC 7502 32-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver2)
  Threads: 127 on 128 virtual cores
nothing
  0.506744 seconds (440.23 k allocations: 29.451 MiB, 1.81% gc time, 95.35% compilation time)

@d-netto
Copy link
Member

d-netto commented Aug 25, 2023

The results in the post above were measured using a smaller number of threads than the number of cores reported in versioninfo.

Setting the number of GC threads to something higher than the number of virtual cores seems to cause a regression:

./julia --threads=128 --gcthreads=1
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.10.0-beta2 (2023-08-17)
 _/ |\__'_|_|_|\__'_|  |  
|__/                   |

julia> @time display(versioninfo())
Julia Version 1.10.0-beta2
Commit a468aa198d (2023-08-17 06:27 UTC)
Build Info:

    Note: This is an unofficial build, please report bugs to the project
    responsible for this build and not to the Julia project unless you can
    reproduce the issue using official builds available at https://julialang.org/downloads

Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 128 × AMD EPYC 7502 32-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver2)
  Threads: 128 on 128 virtual cores
nothing
  0.498495 seconds (440.23 k allocations: 29.451 MiB, 1.73% gc time, 95.73% compilation time)
./julia --threads=128 --gcthreads=132
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.10.0-beta2 (2023-08-17)
 _/ |\__'_|_|_|\__'_|  |  
|__/                   |

julia> @time display(versioninfo())
Julia Version 1.10.0-beta2
Commit a468aa198d (2023-08-17 06:27 UTC)
Build Info:

    Note: This is an unofficial build, please report bugs to the project
    responsible for this build and not to the Julia project unless you can
    reproduce the issue using official builds available at https://julialang.org/downloads

Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 128 × AMD EPYC 7502 32-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver2)
  Threads: 259 on 128 virtual cores
nothing
  2.171682 seconds (440.23 k allocations: 29.451 MiB, 76.90% gc time, 98.86% compilation time)

@jaakkor2
Copy link
Contributor Author

jaakkor2 commented Nov 4, 2023

This problem persists on 1.10.0-rc1. This is a Linux box with 2 x AMD EPYC 7371.

$ ./julia-1.10.0-rc1/bin/julia --startup-file=no --threads=auto
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.10.0-rc1 (2023-11-03)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> Threads.nthreads()
64

julia> Threads.ngcthreads()
32

julia> Threads.nthreadpools()
2

julia> versioninfo()

does not print anything as the the Julia process spins at 3200 %CPU.

Without -t auto I get

$ ./julia-1.10.0-rc1/bin/julia --startup-file=no
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.10.0-rc1 (2023-11-03)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> versioninfo()
Julia Version 1.10.0-rc1
Commit 5aaa9485436 (2023-11-03 07:44 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 64 × AMD EPYC 7371 16-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver1)
  Threads: 1 on 64 virtual cores

julia> (Threads.nthreads(), Threads.ngcthreads(), Threads.nthreadpools())
(1, 1, 2)

@vchuravy vchuravy modified the milestone: 1.10 Nov 4, 2023
@jaakkor2
Copy link
Contributor Author

jaakkor2 commented Nov 8, 2023

@KristofferC I think this was removed from milestone 1.10 prematurely.

@LilithHafner LilithHafner added this to the 1.10 milestone Nov 8, 2023
@LilithHafner
Copy link
Member

Re-adding to the milestone because this regressing is visible with -t auto on systems with high core count (summarizing above results, I have not personally reproduced this)

Perhaps adjusting the default number of gc threads would be a reasonable hotfix for the 1.10 regression?

@ufechner7
Copy link

ufechner7 commented Nov 9, 2023

Just limit the default number of gc threads to four? Or to 8?

@jaakkor2
Copy link
Contributor Author

jaakkor2 commented Nov 9, 2023

--gcthreads=16 is too much for my system. I observed up to 70 s pauses using

for i in {1..20}; do julia -t auto --gcthreads=16 --startup-file=no -e "using InteractiveUtils; @time versioninfo();"; done

FWIW, I run https://github.com/JuliaCI/GCBenchmarks on Julia 1.10.0-rc1

$ julia --startup-file=no --project=. run_benchmarks.jl multithreaded binary_tree -t 8 -s 8
bench = "tree_immutable.jl"
(gcthreads, threads) = (1, 8)
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       6283 │    3720 │      1894 │       1613 │          441 │              3719 │     1579 │         54 │
│  median │       6617 │    3917 │      2221 │       1787 │          468 │              3893 │     1711 │         61 │
│ maximum │       6892 │    4271 │      2431 │       1840 │          622 │              4189 │     2010 │         62 │
│   stdev │        212 │     173 │       148 │         73 │           51 │               135 │      109 │          3 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
(gcthreads, threads) = (2, 8)
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       5393 │    2836 │      1081 │       1755 │          368 │              3779 │     1578 │         51 │
│  median │       6128 │    3270 │      1367 │       1894 │          404 │              4149 │     1712 │         52 │
│ maximum │       6425 │    3422 │      1473 │       1948 │          426 │              4450 │     1807 │         57 │
│   stdev │        339 │     158 │       110 │         54 │           18 │               235 │       56 │          2 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
(gcthreads, threads) = (4, 8)
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       5368 │    2816 │      1001 │       1815 │          338 │              3720 │     1578 │         49 │
│  median │       5572 │    3017 │      1139 │       1851 │          355 │              4020 │     1588 │         54 │
│ maximum │       6084 │    3086 │      1209 │       1949 │          385 │              6345 │     1853 │         55 │
│   stdev │        193 │      87 │        73 │         46 │           16 │               763 │       90 │          2 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
(gcthreads, threads) = (8, 8)
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       5538 │    2825 │      1028 │       1797 │          321 │              3762 │     1573 │         48 │
│  median │       5650 │    3030 │      1147 │       1879 │          334 │              4100 │     1637 │         54 │
│ maximum │       6118 │    3150 │      1271 │       1928 │          371 │              5003 │     1826 │         56 │
│   stdev │        205 │     100 │        82 │         40 │           20 │               355 │       95 │          3 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
bench = "tree_mutable.jl"
(gcthreads, threads) = (1, 8)
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │      12044 │    7448 │      3909 │       3539 │          419 │              5082 │     1179 │         58 │
│  median │      12835 │    7758 │      4035 │       3680 │          439 │              5241 │     1263 │         61 │
│ maximum │      13318 │    8270 │      4589 │       3770 │          568 │             13585 │     1459 │         65 │
│   stdev │        357 │     278 │       256 │         70 │           50 │              2648 │       84 │          2 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
(gcthreads, threads) = (2, 8)
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │      10054 │    5357 │      1850 │       3497 │          278 │              4473 │     1192 │         49 │
│  median │      10685 │    5848 │      2230 │       3609 │          292 │              5058 │     1286 │         55 │
│ maximum │      11262 │    6392 │      2572 │       3871 │          313 │              6286 │     1531 │         58 │
│   stdev │        345 │     331 │       240 │        132 │           10 │               474 │       95 │          3 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
(gcthreads, threads) = (4, 8)
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       9465 │    4667 │      1077 │       3127 │          211 │              4751 │     1180 │         48 │
│  median │       9734 │    5010 │      1180 │       3703 │          234 │              5151 │     1277 │         51 │
│ maximum │      11219 │    6567 │      2884 │       3871 │          317 │              9820 │     1506 │         59 │
│   stdev │        625 │     667 │       785 │        223 │           41 │              1502 │       87 │          4 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
(gcthreads, threads) = (8, 8)
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       8707 │    4242 │       740 │       3456 │          198 │              4706 │     1211 │         46 │
│  median │       9851 │    4616 │       806 │       3769 │          236 │              5097 │     1302 │         49 │
│ maximum │      12010 │    6949 │      3049 │       3910 │          341 │              8696 │     1491 │         60 │
│   stdev │       1213 │    1211 │      1147 │        135 │           63 │              1172 │       98 │          6 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘

@jaakkor2
Copy link
Contributor Author

jaakkor2 commented Nov 9, 2023

One more benchmark run with 64 threads

$ julia --startup-file=no --project=. run_benchmarks.jl multithreaded binary_tree -t 64 -s 8
bench = "tree_immutable.jl"
(gcthreads, threads) = (1, 64)
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       5738 │    3039 │      1371 │       1668 │          564 │              3181 │     2186 │         53 │
│  median │       6205 │    3445 │      1632 │       1819 │          639 │              3388 │     2277 │         55 │
│ maximum │       6632 │    3887 │      1934 │       1952 │          847 │              5302 │     2370 │         59 │
│   stdev │        245 │     215 │       146 │         73 │           76 │               647 │       51 │          2 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
(gcthreads, threads) = (2, 64)
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       5250 │    2518 │       813 │       1704 │          499 │              3379 │     2212 │         48 │
│  median │       5588 │    2820 │      1001 │       1823 │          534 │              3709 │     2272 │         51 │
│ maximum │       5723 │    3004 │      1161 │       1898 │          651 │              5812 │     2323 │         52 │
│   stdev │        149 │     124 │       112 │         64 │           56 │               851 │       40 │          1 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
(gcthreads, threads) = (4, 64)
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       5015 │    2337 │       530 │       1770 │          450 │              3288 │     2231 │         46 │
│  median │       5573 │    2795 │       909 │       1859 │          514 │              3695 │     2284 │         50 │
│ maximum │       5858 │    3025 │      1054 │       1971 │          654 │              4100 │     2370 │         52 │
│   stdev │        293 │     240 │       189 │         68 │           84 │               301 │       44 │          2 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
(gcthreads, threads) = (8, 64)
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       5164 │    2351 │       404 │       1838 │          446 │              3202 │     2217 │         46 │
│  median │       5610 │    2804 │       899 │       1892 │          545 │              3551 │     2287 │         50 │
│ maximum │       5774 │    2996 │      1036 │       1973 │          633 │              4579 │     2351 │         52 │
│   stdev │        182 │     181 │       178 │         45 │           64 │               431 │       35 │          2 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
bench = "tree_mutable.jl"
(gcthreads, threads) = (1, 64)
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │      11401 │    6894 │      3285 │       3550 │          669 │              4412 │     2321 │         58 │
│  median │      12017 │    7239 │      3526 │       3751 │          794 │              5015 │     2487 │         61 │
│ maximum │      12359 │    7691 │      3749 │       3942 │          912 │              7176 │     2600 │         62 │
│   stdev │        256 │     251 │       180 │        107 │           79 │               841 │       71 │          2 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
(gcthreads, threads) = (2, 64)
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       9322 │    5079 │      1587 │       3263 │          637 │              4575 │     2374 │         54 │
│  median │      10181 │    5580 │      1870 │       3735 │          666 │              5056 │     2468 │         55 │
│ maximum │      11006 │    6280 │      2379 │       4065 │          777 │              7232 │     2536 │         57 │
│   stdev │        508 │     350 │       242 │        238 │           48 │               980 │       51 │          1 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
(gcthreads, threads) = (4, 64)
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       9405 │    4831 │      1082 │       3620 │          255 │              3878 │     1383 │         50 │
│  median │      10016 │    5373 │      1627 │       3815 │          670 │              5230 │     2460 │         54 │
│ maximum │      11197 │    6398 │      2541 │       3964 │          731 │              6536 │     2536 │         59 │
│   stdev │        567 │     628 │       630 │        118 │          137 │               825 │      345 │          3 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
(gcthreads, threads) = (8, 64)
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       8960 │    4440 │       734 │       3705 │          554 │              4373 │     2281 │         49 │
│  median │      10310 │    5475 │      1599 │       3848 │          651 │              4963 │     2485 │         53 │
│ maximum │      11187 │    6293 │      2419 │       4020 │          710 │             10504 │     2529 │         57 │
│   stdev │        902 │     814 │       777 │         99 │           54 │              1841 │       72 │          3 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘

@ufechner7
Copy link

What is the way forward? Just limit the default number of gcthreads to 4 or 8, or something else?

@d-netto
Copy link
Member

d-netto commented Nov 27, 2023

We're capping the number of GC threads to number of cores. X-ref: #52192.

@jaakkor2
Copy link
Contributor Author

We're capping the number of GC threads to number of cores. X-ref: #52192.

On my machine -t auto gives 32 GC threads, and that does not work, see #51044 (comment). Even 16 GC threads is bad. The setup is two EPYC 7371 16-core processors.

@ufechner7
Copy link

ufechner7 commented Nov 27, 2023

We're capping the number of GC threads to number of cores. X-ref: #52192.

This is not sufficient. At least not for the DEFAULT number of threads. If a user explicitly specifies a higher number than 4 or 8 that might be OK, but not as default.

@ufechner7
Copy link

Solving this issue would only require a few lines of code, for example like this:

const MAX_DEFAULT_GC_THREADS = 4
default_gcthreads=Int64(round(Sys.CPU_THREADS/4))
if default_gcthreads > MAX_DEFAULT_GC_THREADS
    default_gcthreads = MAX_DEFAULT_GC_THREADS
end

If we would agree that this is the solution I could write a pull request.

@jaakkor2
Copy link
Contributor Author

Sys.CPU_THREADS returns 2 on my MacBook5,1 with Intel P7350. Int64(round(Sys.CPU_THREADS/4)) gives 0.

@KristofferC
Copy link
Member

Capping it to 8 might be the "quick fix" for 1.10. I don't think we have seen any CPU where that would be problematic.

@ufechner7
Copy link

Sys.CPU_THREADS returns 2 on my MacBook5,1 with Intel P7350. Int64(round(Sys.CPU_THREADS/4)) gives 0.

Well, on this MacBook it probably does not make any sense to use multi-threaded garbage collection, so zero is correct...

@ufechner7
Copy link

So current suggestion:

const MAX_DEFAULT_GC_THREADS = 8
default_gcthreads=Int64(round(Sys.CPU_THREADS/4))
if default_gcthreads > MAX_DEFAULT_GC_THREADS
    default_gcthreads = MAX_DEFAULT_GC_THREADS
end

@KristofferC
Copy link
Member

https://github.com/JuliaLang/julia/pull/52192/files would be slightly tweaked yes.

@ufechner7
Copy link

ufechner7 commented Nov 29, 2023

Well, you cannot tweak a commit, you can only create a new commit/ pull request, correct?

@ufechner7
Copy link

@KristofferC Can you review my pull request?

@d-netto
Copy link
Member

d-netto commented Nov 30, 2023

As mentioned above, I was able to reproduce a slowdown (on both master and 1.10) when oversubscribing my machine. For this scenario, #52294 seems to help:

  • master (36 GC threads on 36 cores):
julia> Threads.ngcthreads()
36

julia> @time for i in 1:50; GC.gc(); end
  0.791003 seconds (99.44% gc time)
  • master (72 GC threads on 36 cores):
julia> Threads.ngcthreads()
72

julia> @time for i in 1:50; GC.gc(); end
  5.519171 seconds (99.90% gc time)

julia> Threads.ngcthreads()
36

julia> @time for i in 1:50; GC.gc(); end
  0.761869 seconds (99.51% gc time)
julia> Threads.ngcthreads()
72

julia> @time for i in 1:50; GC.gc(); end
  0.788096 seconds (99.19% gc time)

Unfortunately I don't have easy access to an AMD EPYC at the moment, so not sure if it fixes the issue above.

We've also backported #52294 at RAI and run our CI and a few benchmarks internally: it seems fine from a correctness and performance point of view. Backporting the GC scheduler PR above to 1.10 could be an option, but obviously it has its own share of risks.

@PallHaraldsson
Copy link
Contributor

Isn't it best to merge that then, and backport to 1.10, to not have it different from 1.11? And NOT cap at 8, was that a hack that should then not be needed? We want aggressive good defaults for 1.10 and could add a cap in a point release if needed. Anyone could also do it manually (e.g. those on unusual hardware; two-socket), with CLI option.

It seems worse for me to cap too much (and otherwise have different from 1.11) and then later to relax in a point-release.

[You can allocate in all your threads, and it seems to me same amount of GC threads (or more?!) would be useful to not fall behind many allocating threads. I'm skeptical of 1/2 let alone 1/4th the amount of GC threads. What do other languages do?]

@ufechner7
Copy link

It seems worse for me to cap too much (and otherwise have different from 1.11) and then later to relax in a point-release.

[You can allocate in all your threads, and it seems to me same amount of GC threads (or more?!) would be useful to not fall behind many allocating threads. I'm skeptical of 1/2 let alone 1/4th the amount of GC threads. What do other languages do?]

For me Julia (sometimes) crashes when I use PyPlot together with too many GC threads... So I prefer a lower GC thread count... I prefer safe defaults.

@d-netto
Copy link
Member

d-netto commented Dec 1, 2023

For me Julia (sometimes) crashes when I use PyPlot together with too many GC threads... So I prefer a lower GC thread count... I prefer safe defaults.

This is derailing the discussion of this issue. If Julia is crashing with GC threads, then that's a serious correctness bug and we should have a separate issue for that. I'd encourage you to open an issue and share a reproducer so that we can take a look.

@jaakkor2
Copy link
Contributor Author

jaakkor2 commented Dec 1, 2023

I tried to find the artifact of #52294, but I could not figure out the path. I could test the PR preferably in binary format.

This is not it https://julialangnightlies-s3.julialang.org/bin/linux/x86_64/1.11/julia-5f372689e6-linux-x86_64.tar.gz.

@PallHaraldsson
Copy link
Contributor

PallHaraldsson commented Dec 1, 2023

@d-netto Can we merge #52294 ASAP for 1.11/master, is it likely to make the "crash" go away? I'm not sure why but I see it has "This pull request may be merged without approvals."

For me Julia (sometimes) crashes when I use PyPlot together with too many GC threads... So I prefer a lower GC thread count... I prefer safe defaults.

I also prefer safe, non-crashing, if you meant that as in segfault (as opposed to OOM, which is also annoying, not a "crash", not what you're experiencing?).

I want safe with or without -t auto, but at least it's not the default, and then no GC threads, so you will be safe by default. It's just that I think capping GC threads at 8 or any value would hide a bug, i.e. make it less likely, but still non-zero probability. Then it might actually be better to have it more frequent...

@ufechner7 I don't rule out merging your capping PR but I think we should first try his, you and others test 1.11, to see if it gives confidence that it can be backported to 1.10, then your PR could be the alternative. Since this is the last open issue for 1.10, then I think we should move quickly on this, maybe even merge to 1.10 for testing?

FYI: This crash #52256 was removed from 1.10 milestone, 3 days ago. I'm not sure what to read into that, since it's till open, is it just not considered a blocker, or was the GC incorrectly implicated (even if the GC not the reason, should it still be a blocker)? And one hour ago this (1.10) crash issue #52363 opened, not yet on the milestone.

Also FYI, I DID suggest -t auto be made the default, but despite many using it, e.g. I believe Pluto by default and VS Code, it was suggested unsafe, not thread-safe, and I realized it never will be/made default. So maybe it's just not supported option in some sense, though I would like it not to get worse between versions...

@d-netto
Copy link
Member

d-netto commented Dec 4, 2023

And one hour ago this (1.10) crash issue #52363 opened, not yet on the milestone.

Even though this crash manifests as a GC corruption/segfault, it seems to be stemming from a bug in the LLVM compiler.

@d-netto
Copy link
Member

d-netto commented Dec 5, 2023

I tried to find the artifact of #52294, but I could not figure out the path. I could test the PR preferably in binary format.

Not sure if there would be a binary available for download. @vchuravy or @gbaraldi might know.

@topolarity
Copy link
Member

The latest nightlies at https://julialang.org/downloads/nightlies/ now include #52294, which merged today

@jaakkor2
Copy link
Contributor Author

jaakkor2 commented Dec 6, 2023

The latest nightlies at https://julialang.org/downloads/nightlies/ now include #52294, which merged today

It works for me!

julia> @time versioninfo()
Julia Version 1.11.0-DEV.1026
Commit 42094740f8f (2023-12-05 18:24 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 64 × AMD EPYC 7371 16-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-15.0.7 (ORCJIT, znver1)
  Threads: 95 on 64 virtual cores
  0.754491 seconds (491.96 k allocations: 28.245 MiB, 2.49% gc time, 99.56% compilation time)

@ufechner7
Copy link

ufechner7 commented Dec 7, 2023

Can this be closed now, or only after #52294 is back-ported?

@ufechner7
Copy link

ufechner7 commented Dec 12, 2023

My pull request was closed, #52294 is back-ported, so why can this bug not be closed?
See: #52503

@IanButterworth
Copy link
Member

Rationale for closing seems fair. Someone can reopen if not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
GC Garbage collector regression Regression in behavior compared to a previous version
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants