It looks like all_gpu_id_array is not freed when KMT is unloaded. If KMT is initialized multiple times in the same process, the array is leaked on every init/unload cycle. hsakmt_fmm_destroy_process_apertures seems to clean up the other global (gpu_mem) but not all_gpu_id_array, which it should also free.
From ASAN:
Direct leak of 8 byte(s) in 1 object(s) allocated from:
#0 0x5ff5b2387bcf in malloc (/home/nod/src/iree-build/runtime/src/iree/hal/drivers/amdgpu/cts/amdgpu_all_driver_test+0x223bcf) (BuildId: 1530ccada4eb72df)
#1 0x74e567024f56 in hsakmt_fmm_init_process_apertures /home/nod/src/ROCR-Runtime/libhsakmt/src/fmm.c:2642:22
#2 0x74e567034da9 in hsaKmtAcquireSystemProperties /home/nod/src/ROCR-Runtime/libhsakmt/src/topology.c:2190:8
#3 0x74e566ea3a10 in rocr::AMD::BuildTopology() /home/nod/src/ROCR-Runtime/runtime/hsa-runtime/core/runtime/amd_topology.cpp:306:36
#4 0x74e566ea420e in rocr::AMD::Load() /home/nod/src/ROCR-Runtime/runtime/hsa-runtime/core/runtime/amd_topology.cpp:433:18
#5 0x74e566ee96c2 in rocr::core::Runtime::Load() /home/nod/src/ROCR-Runtime/runtime/hsa-runtime/core/runtime/runtime.cpp:1995:17
#6 0x74e566ee0945 in rocr::core::Runtime::Acquire() /home/nod/src/ROCR-Runtime/runtime/hsa-runtime/core/runtime/runtime.cpp:140:51
#7 0x74e566eaaf83 in rocr::HSA::hsa_init() /home/nod/src/ROCR-Runtime/runtime/hsa-runtime/core/runtime/hsa.cpp:206:52
#8 0x74e566f567f5 in hsa_init /home/nod/src/ROCR-Runtime/runtime/hsa-runtime/core/common/hsa_table_interface.cpp:70:35
#9 0x5ff5b243eeed in iree_hsa_init /home/nod/src/iree/runtime/src/iree/hal/drivers/amdgpu/util/libhsa_tables.h:11:1
#10 0x5ff5b243e426 in iree_hal_amdgpu_libhsa_initialize /home/nod/src/iree/runtime/src/iree/hal/drivers/amdgpu/util/libhsa.c:498:14
#11 0x5ff5b2400e80 in iree_hal_amdgpu_driver_load_libhsa /home/nod/src/iree/runtime/src/iree/hal/drivers/amdgpu/driver.c:231:26
#12 0x5ff5b2400b63 in iree_hal_amdgpu_driver_create /home/nod/src/iree/runtime/src/iree/hal/drivers/amdgpu/driver.c:270:26
#13 0x5ff5b23d4222 in iree_hal_amdgpu_driver_factory_try_create /home/nod/src/iree/runtime/src/iree/hal/drivers/amdgpu/registration/driver_module.c:40:26
#14 0x5ff5b23fffaa in iree_hal_driver_registry_try_create /home/nod/src/iree/runtime/src/iree/hal/driver_registry.c:314:14
#15 0x5ff5b23c94f9 in iree::hal::cts::TryGetDriver(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, iree_hal_driver_t**) /home/nod/src/iree/runtime/src/iree/hal/cts/cts_test_base.h:73:26
#16 0x5ff5b23ca866 in iree::hal::cts::DriverTest::CreateDriver() /home/nod/src/iree/runtime/src/iree/hal/cts/driver_test.h:38:14
#17 0x5ff5b23c81ee in iree::hal::cts::DriverTest_QueryAndCreateAvailableDevicesByOrdinal_Test::TestBody() /home/nod/src/iree/runtime/src/iree/hal/cts/driver_test.h:103:17
#18 0x5ff5b2525ce8 in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/nod/src/iree/third_party/googletest/googletest/src/gtest.cc:2635:10
#19 0x5ff5b24e5491 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/nod/src/iree/third_party/googletest/googletest/src/gtest.cc:2671:14
#20 0x5ff5b2498e23 in testing::Test::Run() /home/nod/src/iree/third_party/googletest/googletest/src/gtest.cc:2710:5
#21 0x5ff5b249a796 in testing::TestInfo::Run() /home/nod/src/iree/third_party/googletest/googletest/src/gtest.cc:2856:11
#22 0x5ff5b249bde6 in testing::TestSuite::Run() /home/nod/src/iree/third_party/googletest/googletest/src/gtest.cc:3034:30
#23 0x5ff5b24bef9e in testing::internal::UnitTestImpl::RunAllTests() /home/nod/src/iree/third_party/googletest/googletest/src/gtest.cc:5964:44
#24 0x5ff5b252f928 in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/nod/src/iree/third_party/googletest/googletest/src/gtest.cc:2635:10
#25 0x5ff5b24ea6b6 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/nod/src/iree/third_party/googletest/googletest/src/gtest.cc:2671:14
#26 0x5ff5b24be225 in testing::UnitTest::Run() /home/nod/src/iree/third_party/googletest/googletest/src/gtest.cc:5543:10
#27 0x5ff5b2400690 in RUN_ALL_TESTS() /home/nod/src/iree/third_party/googletest/googletest/include/gtest/gtest.h:2334:73
#28 0x5ff5b24005b3 in main /home/nod/src/iree/runtime/src/iree/testing/gtest_main.cc:20:13
#29 0x74e575c29d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
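
For illustration, here is a minimal, self-contained sketch of the pattern described above and of the kind of fix being suggested. It is not the actual libhsakmt code: the sketch_* function names, the element types, and the allocation sizes are hypothetical, and only the two global names (all_gpu_id_array and gpu_mem) come from the report. The idea is simply that the destroy path frees every global the init path allocates and resets the pointers so that repeated init/destroy cycles do not leak.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the two globals named in the report.
 * The real types in libhsakmt may differ. */
static uint32_t *all_gpu_id_array = NULL;
static void *gpu_mem = NULL;

/* Sketch of an init routine that allocates both globals.
 * Returns 0 on success, -1 on failure. */
int sketch_init_process_apertures(size_t num_gpus)
{
    if (num_gpus == 0)
        return -1;

    gpu_mem = calloc(num_gpus, 64); /* 64 is an arbitrary placeholder size */
    all_gpu_id_array = malloc(num_gpus * sizeof(*all_gpu_id_array));
    if (!gpu_mem || !all_gpu_id_array) {
        /* Undo any partial allocation so a failed init does not leak. */
        free(gpu_mem);
        free(all_gpu_id_array);
        gpu_mem = NULL;
        all_gpu_id_array = NULL;
        return -1;
    }
    return 0;
}

/* Sketch of a destroy routine that releases both globals.
 * The report suggests the real destroy path only releases gpu_mem;
 * the second free() below is the missing piece. */
void sketch_destroy_process_apertures(void)
{
    free(gpu_mem);
    gpu_mem = NULL;

    free(all_gpu_id_array);
    all_gpu_id_array = NULL;
}
```

With both pointers freed and reset to NULL, calling the init/destroy pair repeatedly in one process should leave nothing for ASAN to report from this allocation site.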