Segmentation fault sampling without instrumentation python app. #220
Comments
How many threads is the application creating? This isn't a segfault. This is omnitrace hitting its limit for the number of active threads (which is set to 2048 for release builds).
This is what gdb is telling me when the exception is raised:
and
So there are 291 threads active at that point. This is Python code using GPUs, and Python is known for forking many processes. What other info should I try to capture?
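One way to capture the thread count from inside the Python process, rather than from gdb, is to read the kernel's count for that process. This is a minimal sketch, assuming Linux with /proc mounted. The helper names `parse_thread_count` and `count_threads` are hypothetical and not part of omnitrace. It only counts threads in a single process, so forked children would need to be checked separately.

```python
import os


def parse_thread_count(status_text):
    """Return the integer from the 'Threads:' line of a /proc/<pid>/status dump."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            # The line looks like "Threads:\t291"; keep only the number.
            return int(line.split(":", 1)[1].strip())
    raise ValueError("no 'Threads:' line found")


def count_threads(pid):
    """Read the kernel's thread count for one process (assumes Linux /proc)."""
    with open(f"/proc/{pid}/status") as fh:
        return parse_thread_count(fh.read())
```

Calling `count_threads(os.getpid())` periodically from the training script, or `count_threads(pid)` for each child PID, would show how the count grows before the failure.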
If I run strace with omnitrace I count 864 occurrences of
By the way, I have made some improvements to handling forks in #250, and I have a solution that allows omnitrace to support any number of threads.
I get the segmentation fault below with the OpenSUSE version for ROCm 5.3.3; the same code built for ROCm 5.2.3 works well with the corresponding omnitrace release. The workload is a Python (PyTorch) code. It is run as
omnitrace-sample --include rcclp -c $wd/omnitrace.cfg -- python -u ./train.py ...
The RCCLP include doesn't make a difference. I appreciate it is hard to debug these things without the actual code, but at the same time it is not trivial to build. I am thinking it might be easier to get some guidance on what I should look for to troubleshoot.