-
Notifications
You must be signed in to change notification settings - Fork 541
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix memory error in devArrMatch #3860
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM pending CI
@hcho3 orthogonal request to the issue being fixed: could you open an issue to remove the duplicated file from cuML? We probably need to do some cleanup regarding situations like this |
@gpucibot merge |
Codecov Report
@@ Coverage Diff @@
## branch-0.20 #3860 +/- ##
===============================================
- Coverage 85.96% 85.40% -0.56%
===============================================
Files 225 227 +2
Lines 16986 17313 +327
===============================================
+ Hits 14602 14787 +185
- Misses 2384 2526 +142
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
Related: rapidsai/raft#229 We recently discovered a memory error in the `devArrMatch()` function: https://gpuci.gpuopenanalytics.com/job/rapidsai/job/gpuci-v0.20/job/cuml/job/prb/job/cuml-gpu-test/CUDA=11.0,GPU_LABEL=gpu,OS=ubuntu16.04,PYTHON=3.7/161/console ``` 13:09:34 [----------] 44 tests from FilTests/TreeliteDenseFilTest 13:09:34 [ RUN ] FilTests/TreeliteDenseFilTest.Import/0 13:09:34 *** Error in `./test/ml': free(): invalid pointer: 0x00007f632b691fe8 *** 13:09:34 ======= Backtrace: ========= 13:09:34 /lib/x86_64-linux-gnu/libc.so.6(+0x777f5)[0x7f632b3447f5] 13:09:34 /lib/x86_64-linux-gnu/libc.so.6(+0x8038a)[0x7f632b34d38a] 13:09:34 /lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f632b35158c] 13:09:34 /workspace/ci/artifacts/cuml/cpu/conda_work/cpp/build/libcuml++.so(_ZN8treelite9ModelImplIffED0Ev+0xf5)[0x7f632c556405] ``` which was traced to the `devArrMatch()` function as follows: ``` $ valgrind ./cpp/build/test/ml --gtest_filter=FilTests/TreeliteDenseFilTest.Import/0 ==6398== Mismatched free() / delete / delete [] ==6398== at 0x483D1CF: operator delete(void*, unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==6398== by 0x209287: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (in /home/ubuntu/cuml/cpp/build/test/ml) ==6398== by 0x2AC253: testing::AssertionResult raft::devArrMatch<float, raft::CompareApprox<float> >(float const*, float const*, unsigned long, raft::CompareApprox<float>, CUstream_st*) (in /home/ubuntu/cuml/cpp/build/test/ml) ==6398== by 0x2AC51C: ML::BaseFilTest::compare() (in /home/ubuntu/cuml/cpp/build/test/ml) ==6398== by 0x4858098D: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x48580BE0: testing::Test::Run() (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x48580F0E: testing::TestInfo::Run() (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x48581035: testing::TestSuite::Run() (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x485815EB: testing::internal::UnitTestImpl::RunAllTests() (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x48581858: testing::UnitTest::Run() (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x4853007E: main (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest_main.so) ==6398== Address 0x232bfa860 is 0 bytes inside a block of size 160,000 alloc'd ==6398== at 0x483C583: operator new[](unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==6398== by 0x2ABFF8: testing::AssertionResult raft::devArrMatch<float, raft::CompareApprox<float> >(float const*, float const*, unsigned long, raft::CompareApprox<float>, CUstream_st*) (in /home/ubuntu/cuml/cpp/build/test/ml) ==6398== by 0x2AC51C: ML::BaseFilTest::compare() (in /home/ubuntu/cuml/cpp/build/test/ml) ==6398== by 0x4858098D: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x48580BE0: testing::Test::Run() (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x48580F0E: testing::TestInfo::Run() (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x48581035: testing::TestSuite::Run() (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x485815EB: testing::internal::UnitTestImpl::RunAllTests() (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x48581858: testing::UnitTest::Run() (in/home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest.so) ==6398== by 0x4853007E: main (in /home/ubuntu/miniconda3/envs/cuml_dev/lib/libgtest_main.so) ``` **Diagnosis**. The `devArrMatch` functions are allocating a temporary buffer using `new[]` and then assigning it to a `shared_ptr<T>`. This is not valid because the destructor of `shared_ptr<T>` will invoke `delete`, not `delete[]`. Calling `delete` with a buffer allocated by `new[]` leads to an undefined behavior. See https://docs.microsoft.com/en-us/cpp/code-quality/c6278?view=msvc-160. **Proposed fix**. Use `std:unique_ptr<T[]>` instead to store temporary buffers. Authors: - Philip Hyunsu Cho (https://github.com/hcho3) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#3860
Related: rapidsai/raft#229
We recently discovered a memory error in the
devArrMatch()
function: https://gpuci.gpuopenanalytics.com/job/rapidsai/job/gpuci-v0.20/job/cuml/job/prb/job/cuml-gpu-test/CUDA=11.0,GPU_LABEL=gpu,OS=ubuntu16.04,PYTHON=3.7/161/consolewhich was traced to the
devArrMatch()
function as follows:Diagnosis. The
devArrMatch
functions are allocating a temporary buffer usingnew[]
and then assigning it to ashared_ptr<T>
. This is not valid because the destructor ofshared_ptr<T>
will invokedelete
, notdelete[]
. Callingdelete
with a buffer allocated bynew[]
leads to an undefined behavior. See https://docs.microsoft.com/en-us/cpp/code-quality/c6278?view=msvc-160.Proposed fix. Use
std:unique_ptr<T[]>
instead to store temporary buffers.