
ThreadSanitizer: thread leak from HIP runtime #3182

Closed
al42and opened this issue Mar 13, 2023 · 7 comments

Comments

@al42and

al42and commented Mar 13, 2023

Running any app that uses the HIP API under TSAN triggers a "thread leak" warning at exit:

$ hipcc tsan.cpp -g -fsanitize=thread -o tsan && ./tsan
clang-15: warning: ignoring '-fsanitize=thread' option as it is not currently supported for target 'amdgcn-amd-amdhsa' [-Woption-ignored]
Detected 2 devices
==================
WARNING: ThreadSanitizer: thread leak (pid=2176329)
  Thread T2 (tid=2176332, finished) created by main thread at:
    #0 pthread_create /long_pathname_so_that_rpms_can_package_the_debug_info/src/external/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1022:3 (tsan+0x263213)
    #1 std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State>>, void (*)()) <null> (libstdc++.so.6+0xd70a8) (BuildId: c90e6603c7cdf84713cd445700a575d3ea446d9b)

SUMMARY: ThreadSanitizer: thread leak (/lib/x86_64-linux-gnu/libstdc++.so.6+0xd70a8) (BuildId: c90e6603c7cdf84713cd445700a575d3ea446d9b) in std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State>>, void (*)())
==================
ThreadSanitizer: reported 1 warnings

Tested with ROCm 5.4.1 on MI50 and ROCm 5.4.2 on RX 6400.

Code used (anything doing HIP API calls should work):

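```cpp
#include "hip/hip_runtime.h"
#include <iostream>

int main() {
  int n = 0;
  hipError_t err = hipGetDeviceCount(&n);
  if (err != hipSuccess) {
    std::cerr << "hipGetDeviceCount failed: " << hipGetErrorString(err) << "\n";
    return 1;
  }
  std::cout << "Detected " << n << " devices\n";
  return 0;
}
```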
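For context (this is not taken from the HIP sources): ThreadSanitizer reports a "thread leak" at exit for a thread that has finished but was never joined or detached, which matches the "finished" state of T2 in the logs. The HIP-free sketch below reproduces that pattern with plain pthreads; `worker`, `start_leaky_thread`, and `start_joined_thread` are hypothetical names. Built with `-fsanitize=thread`, calling `start_leaky_thread()` from `main` should typically yield a similar report, while `start_joined_thread()` should not.

```cpp
#include <atomic>
#include <pthread.h>

// Counts how many times the worker body has run.
static std::atomic<int> g_worker_runs{0};

static void* worker(void*) {
  g_worker_runs.fetch_add(1);
  return nullptr;
}

// Starts a thread and never joins or detaches it. If the thread has
// finished by the time the process exits, TSAN reports a thread leak.
bool start_leaky_thread() {
  pthread_t t;
  return pthread_create(&t, nullptr, worker, nullptr) == 0;
}

// Same worker, but joined before returning, so no leak is reported.
bool start_joined_thread() {
  pthread_t t;
  if (pthread_create(&t, nullptr, worker, nullptr) != 0) {
    return false;
  }
  return pthread_join(t, nullptr) == 0;
}
```

Whether a runtime-internal thread like T2 should be joined or detached during shutdown is up to the HIP/HSA runtime; the sketch only shows what TSAN is flagging.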
@jatinx
Contributor

jatinx commented Mar 14, 2023

Thanks for reporting, will look into it.

@ppanchad-amd

@al42and Apologies for the lack of response. Can you please test with the latest ROCm 6.0.2 (HIP 6.0.32831)? If it is resolved, please close the ticket. Thanks!

@al42and
Author

al42and commented Apr 11, 2024

Don't have 6.0.2 at hand, but the problem still occurs with 6.0.0:

$ hipcc tsan.cpp -g -fsanitize=thread -o tsan && ./tsan
clang: warning: ignoring '-fsanitize=thread' option as it is not currently supported for target 'amdgcn-amd-amdhsa' [-Woption-ignored]
Detected 1 devices
==================
WARNING: ThreadSanitizer: thread leak (pid=3481992)
  Thread T2 (tid=3482001, finished) created by main thread at:
    #0 pthread_create /long_pathname_so_that_rpms_can_package_the_debug_info/src/external/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1048 (tsan+0x296503)
    #1 <null> <null> (libhsa-runtime64.so.1+0x2972b) (BuildId: fdfae95418d176670b25ac26f0542b05d0aec181)
    #2 hipGetDeviceCount ??:? (libamdhip64.so.6+0xa9c23) (BuildId: c119a12e92604d9b1dd360dcf538793bfab296a4)
    #3 __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58 (libc.so.6+0x29d8f) (BuildId: c289da5071a3399de893d2af81d6a30c62646e1e)

SUMMARY: ThreadSanitizer: thread leak (/opt/rocm-6.0.0/lib/llvm/bin/../../../lib/libhsa-runtime64.so.1+0x2972b) (BuildId: fdfae95418d176670b25ac26f0542b05d0aec181) 
==================
ThreadSanitizer: reported 1 warnings

Note for others trying to reproduce: Since hipcc in ROCm 6.0 is based on Clang 17, it requires a workaround for TSAN on newer kernels: google/sanitizers#1716 (comment). But this is not directly related to the issue here.

@al42and
Author

al42and commented May 20, 2024

Still happens with 6.1.1:

$ hipcc --version
HIP version: 6.1.40092-038397aaa
AMD clang version 17.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-6.1.1 24154 f53cd7e03908085f4932f7329464cd446426436a)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-6.1.1/llvm/bin
Configuration file: /opt/rocm-6.1.1/lib/llvm/bin/clang++.cfg

$ hipcc tsan.cpp -g -fsanitize=thread -o tsan && ./tsan
clang: warning: ignoring '-fsanitize=thread' option as it is not currently supported for target 'amdgcn-amd-amdhsa' [-Woption-ignored]
Detected 1 devices
/usr/bin/addr2line: DWARF error: invalid or unhandled FORM value: 0x23
==================
WARNING: ThreadSanitizer: thread leak (pid=64226)
  Thread T2 (tid=64235, finished) created by main thread at:
    #0 pthread_create ??:? (tsan+0x29c90b)
    #1 <null> <null> (libhsa-runtime64.so.1+0x2c0fc) (BuildId: 8575df86329e78c19cac825f819d82b0361816da)
    #2 hipGetCmdName ??:? (libamdhip64.so.6+0xad053) (BuildId: daff87db3cceb0402dea325b66af7507d54d0eb2)
    #3 __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58 (libc.so.6+0x29d8f) (BuildId: 962015aa9d133c6cbcfb31ec300596d7f44d3348)

SUMMARY: ThreadSanitizer: thread leak ??:? in pthread_create
==================
ThreadSanitizer: reported 1 warnings

@ppanchad-amd

@al42and We have an internal ticket to investigate this issue. Thanks!

@darren-amd

Hi @al42and,

I tried to reproduce the issue you are facing but could not find any leaking threads with the latest version of ROCm (6.2.2). I verified this with ThreadSanitizer and gdb.

However, I did run into a separate ThreadSanitizer problem: an error message about an unexpected memory mapping. If you see something similar, note that a recent kernel update raised vm.mmap_rnd_bits from 28 to 32 on amd64 systems, while ThreadSanitizer was updated to support only up to 30 ASLR bits (see: ThreadSanitizer ASLR Change). To work around this, reduce the ASLR bits from 32 to 30:

sudo sysctl vm.mmap_rnd_bits=30
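Before launching a TSAN build, the current setting could be checked programmatically. A minimal sketch, assuming the value is exposed at /proc/sys/vm/mmap_rnd_bits (the file behind the `vm.mmap_rnd_bits` sysctl) and taking the 30-bit limit quoted in this comment at face value; `read_int_file` and `rnd_bits_ok_for_tsan` are hypothetical helpers, and older TSAN runtimes may tolerate fewer bits, hence the parameter.

```cpp
#include <fstream>
#include <optional>
#include <string>

// Reads a single integer from a sysctl-style file such as
// /proc/sys/vm/mmap_rnd_bits. Returns std::nullopt if the file is
// missing or does not start with an integer.
std::optional<int> read_int_file(const std::string& path) {
  std::ifstream in(path);
  int value = 0;
  if (!(in >> value)) {
    return std::nullopt;
  }
  return value;
}

// True if `bits` does not exceed the number of ASLR bits the TSAN
// runtime is assumed to handle. The default of 30 is the limit quoted
// in this thread; it is an assumption, not a queried value.
bool rnd_bits_ok_for_tsan(int bits, int max_supported_bits = 30) {
  return bits <= max_supported_bits;
}
```

A test harness could read `/proc/sys/vm/mmap_rnd_bits` with `read_int_file` and, when `rnd_bits_ok_for_tsan` returns false, print a hint to run the `sysctl` command before starting the sanitized binary.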

Please give that a try on the latest version of ROCm and let me know if the issue persists, thanks!

@al42and
Author

al42and commented Oct 17, 2024

Hi @darren-amd

> I tried to reproduce the issue you are facing but could not find any threads that were leaking with the latest version of ROCm (6.2.2). I verified with threadSanitizer and gdb.

Thanks. I can confirm that the issue can no longer be reproduced with 6.2.2, while it still happens on the same machine with 6.1.1.
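Not suggested in this thread, but for anyone stuck on an affected release (6.1.1 and earlier here), one stopgap is to silence thread-leak reports. A minimal sketch, assuming a TSAN runtime that honors the `__tsan_default_options` hook and its `report_thread_leaks` flag; note that it hides every thread-leak report, including genuine ones in your own code.

```cpp
// Compiled into the application; the TSAN runtime calls this hook at
// startup to obtain default options. Options given in the TSAN_OPTIONS
// environment variable are applied on top of these defaults.
extern "C" const char* __tsan_default_options() {
  // Disables all thread-leak reports, not only the one from libhsa-runtime64.
  return "report_thread_leaks=0";
}
```

The same effect is available without rebuilding, e.g. `TSAN_OPTIONS=report_thread_leaks=0 ./tsan`; upgrading to 6.2.2 remains the actual fix.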

> However, there was an issue with threadSanitizer where I got an error message with unexpected memory mapping. If you face a similar issue, there was a recent kernel update that bumped vm.mmap_rnd_bits up from 28 to 32 for amd64 systems. There was also an update to support only up to 30 ASLR bits for threadSanitizer: ThreadSanitizer ASLR Change. Therefore, to solve this issue, you would have to reduce ASLR bits from 32 to 30:

Yes, I'm aware of that; see the note in #3182 (comment).

@al42and al42and closed this as completed Oct 17, 2024

4 participants