Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Profiling many spawn and sync does not work on Windows #43124

Closed
tkf opened this issue Nov 18, 2021 · 4 comments · Fixed by #55515
Closed

Profiling many spawn and sync does not work on Windows #43124

tkf opened this issue Nov 18, 2021 · 4 comments · Fixed by #55515
Labels
bug Indicates an unexpected problem or unintended behavior profiler

Comments

@tkf
Copy link
Member

tkf commented Nov 18, 2021

@DilumAluthge noticed that the test I added in #43072 fails sometimes (presumably only) on Windows #43072 (comment)

The test is to run this script

function spawnmany(n)
if n > 2
m = n ÷ 2
t = Threads.@spawn spawnmany(m)
spawnmany(m)
wait(t)
end
end
@profile spawnmany(parse(Int, get(ENV, "NTASKS", "2000000")))

and check that it does not error or time out. There's no backtrace from this script (yet) because I forgot to redirect stdout and stderr.

@tkf
Copy link
Member Author

tkf commented Nov 19, 2021

The initial version of #43125 contained some logging upon error and...

I get the log output

┌ Error: Failed test: "spawn and wait *a lot* of tasks in @profile" (n = 2000000)
│ 
...
│   proc.exitcode = 3221225477
│   proc.termsignal = 0

in https://build.julialang.org/#/builders/65/builds/5543

3221225477 is STATUS_ACCESS_VIOLATION so there's something wrong. But it looks like there's no backtrace.

Note that n = 2000000 is the largest number of tasks in the test. I wonder if ignoring this case is sufficient.

--- #43125 (comment)

@tkf
Copy link
Member Author

tkf commented Nov 27, 2021

@vtjnash noticed that Windows CI can hit the timeout too (even after #43125):

Error in testset threads:
Test Failed at C:\buildbot\worker-tabularasa\tester_win64\build\share\julia\test\threads.jl:214
  Expression: !timeout

https://build.julialang.org/#/builders/65/builds/5821

@tkf
Copy link
Member Author

tkf commented Nov 28, 2021

Oops, my PR comment in #43247 contained a substring fixes #43124.

@tkf tkf reopened this Nov 28, 2021
@tkf tkf added bug Indicates an unexpected problem or unintended behavior help wanted Indicates that a maintainer wants help on an issue or pull request labels Nov 28, 2021
@LilithHafner LilithHafner added the system:mac Affects only macOS label Jun 5, 2023
@LilithHafner
Copy link
Member

This also fails somtimes on mac (see https://buildkite.com/julialang/julia-master/builds/24520#0188847a-6af8-4be7-92cd-c03932978df4/591-594)

┌ Error: A "spawn and wait lots of tasks" test failed
│   n = 2000000
│   proc.exitcode = 0
│   proc.termsignal = 15
│   success(proc) = false
│   timeout = true
└ @ Main.Test31Main_threads /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-macmini-x64-4.0/build/default-macmini-x64-4-0/julialang/julia-master/julia-f407a4cac3/share/julia/test/threads.jl:313
  threads                                           (1) \|         failed at 2023-06-04T00:31:23.823
Test Failed at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-macmini-x64-4.0/build/default-macmini-x64-4-0/julialang/julia-master/julia-f407a4cac3/share/julia/test/threads.jl:319
  Expression: success(proc)
   
Test Failed at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-macmini-x64-4.0/build/default-macmini-x64-4-0/julialang/julia-master/julia-f407a4cac3/share/julia/test/threads.jl:320
  Expression: !timeout

on test x86_64-apple-darwin

@vtjnash vtjnash removed help wanted Indicates that a maintainer wants help on an issue or pull request multithreading Base.Threads and related functionality system:windows Affects only Windows system:mac Affects only macOS labels Aug 19, 2024
vtjnash added a commit that referenced this issue Aug 21, 2024
Removes the warning on platforms where CFI_NORETURN appears likely to be
sufficient alone for this to work (from observation in gdb/lldb) and
re-enables the test on all platforms so we can see if more work here is
needed at all (e.g. similar annotations in jl_setjmp/jl_longjmp).

Refs #43124
vtjnash added a commit that referenced this issue Aug 27, 2024
Removes the warning on platforms where CFI_NORETURN appears likely to be
sufficient alone for this to work (from observation in gdb/lldb) and
re-enables the test on all platforms so we can see if more work here is
needed at all (e.g. similar annotations in jl_setjmp/jl_longjmp).

Refs #43124
udesou pushed a commit to udesou/julia that referenced this issue Aug 28, 2024
Removes the warning on platforms where CFI_NORETURN appears likely to be
sufficient alone for this to work (from observation in gdb/lldb) and
re-enables the test on all platforms so we can see if more work here is
needed at all (e.g. similar annotations in jl_setjmp/jl_longjmp).

Refs JuliaLang#43124
KristofferC pushed a commit that referenced this issue Sep 12, 2024
Move the registers onto the stack, so that they only are present when
the Task is actually switched out, saving memory when the Task is not
running yet or already finished. It makes this mostly just a huge
renaming job.

On Linux x86_64 this reduces it from 376 bytes to 184 bytes.

Has some additional advantages too, such as copy_stack tasks (e.g. with
always_copy_stacks) can migrate to other threads before starting if they
are not sticky.

Also fixes a variable that got mixed up by #54639 and caused
always_copy_stacks to abort, since the stack limits were wrong.

Also now fixes #43124, though I
am not quite confident enough in it to re-enable that test right now.
KristofferC pushed a commit that referenced this issue Sep 12, 2024
Removes the warning on platforms where CFI_NORETURN appears likely to be
sufficient alone for this to work (from observation in gdb/lldb) and
re-enables the test on all platforms so we can see if more work here is
needed at all (e.g. similar annotations in jl_setjmp/jl_longjmp).

Refs #43124
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Indicates an unexpected problem or unintended behavior profiler
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants