Segfault fix #30

Merged
merged 65 commits into from
Mar 13, 2023
23b59d3
Keeping only the suspected faulty test (by commenting out rest (!))
alexandrebouchard Mar 6, 2023
40001d0
Removing Turing from test to isolate only one error at the time
alexandrebouchard Mar 6, 2023
ff5aac9
Check behaviour on different threadlevels
alexandrebouchard Mar 6, 2023
d00ff53
Back to same threadlevel as not changing crashing behaviour
alexandrebouchard Mar 6, 2023
69bb270
More conservative tag ub?
alexandrebouchard Mar 6, 2023
d94252f
Revert "More conservative tag ub?"
alexandrebouchard Mar 6, 2023
bcf016d
Trying on OpenMPI instead of mpich
alexandrebouchard Mar 6, 2023
379df4b
Going back to mpich, trying threadlevel = :multiple
alexandrebouchard Mar 7, 2023
2cd3571
Back to funneled as :multiple still crashes
alexandrebouchard Mar 7, 2023
c8a0261
Further simplification
alexandrebouchard Mar 7, 2023
6217650
Candidate fix
alexandrebouchard Mar 8, 2023
1f16936
Reintroducing other tests
alexandrebouchard Mar 8, 2023
0ca198f
purge tests from Turing, use only DynamicPPL
miguelbiron Mar 8, 2023
ee276fc
Another free
alexandrebouchard Mar 8, 2023
491c26b
Trying to run tests with more rounds
alexandrebouchard Mar 8, 2023
fab7934
system MPI test
miguelbiron Mar 9, 2023
2ed0aae
fix typos
miguelbiron Mar 9, 2023
c271261
remove step that does not make sense outside MPI.jl
miguelbiron Mar 9, 2023
e3f569a
setup MPIPreferences in the test env
miguelbiron Mar 9, 2023
77824ea
possible fix to Pkg missing
miguelbiron Mar 9, 2023
c69b76a
possible fix to Pkg missing
miguelbiron Mar 9, 2023
523f9c1
add missing env
miguelbiron Mar 9, 2023
949b628
another try
miguelbiron Mar 9, 2023
895a45f
test now has its own Project.toml
miguelbiron Mar 9, 2023
13c9cd2
force Pkg.instantiate in ChildProcess
miguelbiron Mar 9, 2023
1772d82
dont use --project in ChildProcess
miguelbiron Mar 9, 2023
bdf71ce
ChildProcess inherits the exact same active project
miguelbiron Mar 9, 2023
2bcd825
new approach
miguelbiron Mar 9, 2023
eadbfcd
Changing approach for Isend/isend: explicit Waitall for all requests
alexandrebouchard Mar 9, 2023
57d0921
Is this a Turing multi-threading issue?
alexandrebouchard Mar 9, 2023
63aad97
Testing single-thread for all
alexandrebouchard Mar 9, 2023
7a8de7e
Adding a temporary "dry run" to see if problem cause by some compilation
alexandrebouchard Mar 9, 2023
5812127
Commenting out Turing and Blang, to see behaviour on toy_mvn and swap…
alexandrebouchard Mar 9, 2023
270333f
openmpi_jll test
miguelbiron Mar 9, 2023
2b49008
openmpi_jll test
miguelbiron Mar 9, 2023
9c6497f
openmpi_jll test
miguelbiron Mar 9, 2023
d68dc6c
openmpi_jll test
miguelbiron Mar 9, 2023
38de7cf
more messages from Child processes
miguelbiron Mar 9, 2023
8537307
extra arg to OpenMPI
miguelbiron Mar 9, 2023
c2b732b
rm mpi version query
miguelbiron Mar 9, 2023
0e82a84
use old julia_cmd + rm manifest as in MPI.jl CI
miguelbiron Mar 9, 2023
82a4210
add --oversubscribe for OpenMPI + rethrow exception
miguelbiron Mar 9, 2023
700c712
simplify logging
miguelbiron Mar 9, 2023
ebcb3a6
force instantiate + precompile
miguelbiron Mar 9, 2023
ad69b34
Add mpi_args to mpi_test
alexandrebouchard Mar 10, 2023
6bde8fe
Temporary: trying to speed up some key tests
alexandrebouchard Mar 10, 2023
7b5e7d4
Fix
alexandrebouchard Mar 10, 2023
fc34b85
Fix the fix + reintroducing the system-MPI tests
alexandrebouchard Mar 10, 2023
5ba839e
Add back libmpich-dev to resume investigation on ghostbug
alexandrebouchard Mar 10, 2023
2b37c0f
Trying to simplify CI setup needed to reproduce ghostbug
alexandrebouchard Mar 10, 2023
0847901
Fix last commit
alexandrebouchard Mar 10, 2023
f75a280
toy_mvn not enough to manifest ghostbug, trying Turing
alexandrebouchard Mar 10, 2023
93033fc
test mpich+openmpi using brew
miguelbiron Mar 10, 2023
d46d429
fix wrong abi detection for mpich
miguelbiron Mar 10, 2023
7a11074
move xtra args to MPI struct + remove prints + failsafe for empty cur…
miguelbiron Mar 10, 2023
d89911a
mpiexec args for childprocess
miguelbiron Mar 10, 2023
aa233e8
re-introduce all CI tests + fix bug in building mpi cmd
miguelbiron Mar 11, 2023
42993fd
add support for using without Project.toml
miguelbiron Mar 11, 2023
42bdd75
add comment explaining why we wait on Isend
miguelbiron Mar 11, 2023
c5cb70a
mpiexec_args is a Cmd now
miguelbiron Mar 11, 2023
9e31b49
re-instate all tests
miguelbiron Mar 11, 2023
74a4355
Test hypothesis that GC+multithread is issue; determine all MPIs affects
alexandrebouchard Mar 11, 2023
7454ae9
force instantiate in mpi_test + add MicrosoftMPI test
miguelbiron Mar 11, 2023
18ef655
adding back all test
miguelbiron Mar 12, 2023
b9c34d9
Change mpi_active impl to fix open mpi bug
alexandrebouchard Mar 13, 2023
2 changes: 1 addition & 1 deletion src/mpi_utils/Entangler.jl
@@ -105,7 +105,7 @@ mpi_active() =
     Comm_size(COMM_WORLD) > 1
 end

-init_mpi() = Init() #threadlevel = :funneled)
+init_mpi() = Init(threadlevel = :funneled)

As a troubleshooting step, you should try MPI_THREAD_MULTIPLE. The default is thread-single, which is nearly the same as thread-funneled. But if you have a race condition in calls into MPI, then you need thread-multiple.

Member Author


Thank you for the suggestion! Unfortunately, the segfault still arises with threadlevel = :multiple. On the positive side, I am now able to reproduce the problem locally.
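
For anyone repeating this experiment, it may help to confirm that the MPI library actually granted the requested thread level, since an implementation is allowed to provide a lower one. Below is a minimal sketch, assuming MPI.jl 0.20 or later (where `MPI.Init` takes a `threadlevel` keyword); it is a standalone illustration, not code from this PR.

```julia
using MPI

# Request full multi-threaded support, as suggested in the review comment.
MPI.Init(threadlevel = :multiple)

# Ask the library which thread level it actually provides.
provided = MPI.Query_thread()

# Report from rank 0 only, to avoid one line of output per process.
if MPI.Comm_rank(MPI.COMM_WORLD) == 0
    if provided != MPI.THREAD_MULTIPLE
        @warn "MPI did not grant THREAD_MULTIPLE" provided
    else
        println("Thread level provided: ", provided)
    end
end

MPI.Finalize()
```

Run it under the same launcher as the failing test (for example with `mpiexec` and several ranks) so that the check uses the same MPI build the tests use.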


"""
$SIGNATURES