Segfault fix #30
Merged
65 commits
23b59d3 Keeping only the suspected faulty test (by commenting out rest (!)) (alexandrebouchard)
40001d0 Removing Turing from test to isolate only one error at the time (alexandrebouchard)
ff5aac9 Check behaviour on different threadlevels (alexandrebouchard)
d00ff53 Back to same threadlevel as not changing crashing behaviour (alexandrebouchard)
69bb270 More conservative tag ub? (alexandrebouchard)
d94252f Revert "More conservative tag ub?" (alexandrebouchard)
bcf016d Trying on OpenMPI instead of mpich (alexandrebouchard)
379df4b Going back to mpich, trying threadlevel = :multiple (alexandrebouchard)
2cd3571 Back to funneled as :multiple still crashes (alexandrebouchard)
c8a0261 Further simplification (alexandrebouchard)
6217650 Candidate fix (alexandrebouchard)
1f16936 Reintroducing other tests (alexandrebouchard)
0ca198f purge tests from Turing, use only DynamicPPL (miguelbiron)
ee276fc Another free (alexandrebouchard)
491c26b Trying to run tests with more rounds (alexandrebouchard)
fab7934 system MPI test (miguelbiron)
2ed0aae fix typos (miguelbiron)
c271261 remove step that does not make sense outside MPI.jl (miguelbiron)
e3f569a setup MPIPreferences in the test env (miguelbiron)
77824ea possible fix to Pkg missing (miguelbiron)
c69b76a possible fix to Pkg missing (miguelbiron)
523f9c1 add missing env (miguelbiron)
949b628 another try (miguelbiron)
895a45f test now has its own Project.toml (miguelbiron)
13c9cd2 force Pkg.instantiate in ChildProcess (miguelbiron)
1772d82 dont use --project in ChildProcess (miguelbiron)
bdf71ce ChildProcess inherits the exact same active project (miguelbiron)
2bcd825 new approach (miguelbiron)
eadbfcd Changing approach for Isend/isend: explicit Waitall for all requests (alexandrebouchard)
57d0921 Is this a Turing multi-threading issue? (alexandrebouchard)
63aad97 Testing single-thread for all (alexandrebouchard)
7a8de7e Adding a temporary "dry run" to see if problem cause by some compilation (alexandrebouchard)
5812127 Commenting out Turing and Blang, to see behaviour on toy_mvn and swap… (alexandrebouchard)
270333f openmpi_jll test (miguelbiron)
2b49008 openmpi_jll test (miguelbiron)
9c6497f openmpi_jll test (miguelbiron)
d68dc6c openmpi_jll test (miguelbiron)
38de7cf more messages from Child processes (miguelbiron)
8537307 extra arg to OpenMPI (miguelbiron)
c2b732b rm mpi version query (miguelbiron)
0e82a84 use old julia_cmd + rm manifest as in MPI.jl CI (miguelbiron)
82a4210 add --oversubscribe for OpenMPI + rethrow exception (miguelbiron)
700c712 simplify logging (miguelbiron)
ebcb3a6 force instantiate + precompile (miguelbiron)
ad69b34 Add mpi_args to mpi_test (alexandrebouchard)
6bde8fe Temporary: trying to speed up some key tests (alexandrebouchard)
7b5e7d4 Fix (alexandrebouchard)
fc34b85 Fix the fix + reintroducing the system-MPI tests (alexandrebouchard)
5ba839e Add back libmpich-dev to resume investigation on ghostbug (alexandrebouchard)
2b37c0f Trying to simplify CI setup needed to reproduce ghostbug (alexandrebouchard)
0847901 Fix last commit (alexandrebouchard)
f75a280 toy_mvn not enough to manifest ghostbug, trying Turing (alexandrebouchard)
93033fc test mpich+openmpi using brew (miguelbiron)
d46d429 fix wrong abi detection for mpich (miguelbiron)
7a11074 move xtra args to MPI struct + remove prints + failsafe for empty cur… (miguelbiron)
d89911a mpiexec args for childprocess (miguelbiron)
aa233e8 re-introduce all CI tests + fix bug in building mpi cmd (miguelbiron)
42993fd add support for using without Project.toml (miguelbiron)
42bdd75 add comment explaining why we wait on Isend (miguelbiron)
c5cb70a mpiexec_args is a Cmd now (miguelbiron)
9e31b49 re-instate all tests (miguelbiron)
74a4355 Test hypothesis that GC+multithread is issue; determine all MPIs affects (alexandrebouchard)
7454ae9 force instantiate in mpi_test + add MicrosoftMPI test (miguelbiron)
18ef655 adding back all test (miguelbiron)
b9c34d9 Change mpi_active impl to fix open mpi bug (alexandrebouchard)
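
For readers unfamiliar with the pattern named in commit eadbfcd ("explicit Waitall for all requests"), here is a minimal sketch of it. This is not the code from this PR: it assumes MPI.jl 0.20 or later (keyword-argument `Isend`/`Irecv!`), and the ring exchange, buffer names, and tag are made up for illustration. The general MPI rule it illustrates is that a buffer passed to a non-blocking send or receive should not be reused until the corresponding request has completed.

```julia
using MPI

MPI.Init()
comm  = MPI.COMM_WORLD
rank  = MPI.Comm_rank(comm)
nproc = MPI.Comm_size(comm)

# Hypothetical ring exchange: send to the next rank, receive from the previous one.
dest = mod(rank + 1, nproc)
src  = mod(rank - 1, nproc)

send_buf = fill(Float64(rank), 4)
recv_buf = similar(send_buf)

# Post the receive first, then the send, and keep every request handle.
rreq = MPI.Irecv!(recv_buf, comm; source = src, tag = 0)
sreq = MPI.Isend(send_buf, comm; dest = dest, tag = 0)

# Block until both requests have completed before touching either buffer again.
MPI.Waitall([rreq, sreq])
```

Run under `mpiexec -n 2 julia script.jl` (or as a single process, where each rank exchanges with itself); after `Waitall` returns, `recv_buf` holds the sender's rank in every entry.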
@@ -7,7 +7,8 @@
.vscode/settings.json
build
.interfaces.md

*.log
*.err
machines.txt
results
.includes_bu.jl
As a troubleshooting step, you should try `MPI_THREAD_MULTIPLE`. The default is thread-single, which is nearly the same as thread-funneled. But if you have a race condition in calls into MPI, then you need thread-multiple.
Thank you for the suggestion! Unfortunately, the segfault still arises with `threadlevel = :multiple`. On the positive side, I am now able to reproduce the problem locally.
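
For anyone following along, a minimal sketch of what requesting `threadlevel = :multiple` looks like in MPI.jl, and how to check what the MPI library actually granted. This is an illustration rather than the test code from this PR; the warning text and the printed message are assumptions. Checking the granted level may be worthwhile because an MPI implementation is allowed to provide a lower level than the one requested.

```julia
using MPI

# Request full multi-threaded support; MPI.Init returns the level that was granted.
provided = MPI.Init(threadlevel = :multiple)

# MPI.Query_thread() reports the thread level currently in effect.
level = MPI.Query_thread()
if level != MPI.THREAD_MULTIPLE
    @warn "MPI library granted a lower thread level than requested" level
end

rank = MPI.Comm_rank(MPI.COMM_WORLD)
println("rank $rank: thread level = $level")
```

If the warning fires, the run is not actually exercising thread-multiple, so a crash observed there says little about whether that level would help.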