Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash in shared_ptr<> when compiled with -O3 #1806

Closed
sirzooro opened this issue Nov 25, 2022 · 7 comments · Fixed by #1815
Closed

Crash in shared_ptr<> when compiled with -O3 #1806

sirzooro opened this issue Nov 25, 2022 · 7 comments · Fixed by #1815
Labels
bug Something isn't working

Comments

@sirzooro
Copy link

Describe your environment
opentelemetry-cpp-1.7.0 compiled from official release package, configured with cmake -DCMAKE_POSITION_INDEPENDENT_CODE=ON -DBUILD_SHARED_LIBS=ON -DWITH_OTLP=ON -DWITH_OTLP_GRPC=OFF -DWITH_PROMETHEUS=ON -DCMAKE_INSTALL_PREFIX=/usr ..
g++ (GCC) 10.3.1 20210422 (Red Hat 10.3.1-1)
CentOS Stream release 8
Linux localhost.localdomain 4.18.0-408.el8.x86_64 #1 SMP Mon Jul 18 17:42:52 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Steps to reproduce
Compile below code using g++ -O3 -g -o test test.cc -lopentelemetry_common and run it:

#include <opentelemetry/sdk/common/global_log_handler.h>

class MyLogger : public opentelemetry::sdk::common::internal_log::LogHandler
{
public:
    virtual void Handle(
        opentelemetry::sdk::common::internal_log::LogLevel level, const char* file, int line,
        const char* msg, const opentelemetry::sdk::common::AttributeMap& attributes) noexcept override
    {}

    static opentelemetry::nostd::shared_ptr<MyLogger> getInstance() { return instance; }

private:
    static opentelemetry::nostd::shared_ptr<MyLogger> instance;
};

opentelemetry::nostd::shared_ptr<MyLogger> MyLogger::instance = opentelemetry::nostd::shared_ptr<MyLogger>(new MyLogger());

int main()
{
    opentelemetry::sdk::common::internal_log::GlobalLogHandler::SetLogHandler(MyLogger::getInstance());
    return 0;
}

What is the expected behavior?
No crash

What is the actual behavior?
Crash. This happen only when code is compiled with -O3. -O2 and lower optimization levels works fine (no crash, no issues from Valgrind).

Additional context
Crash details from gdb:

(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x0000000000402199 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x417f10)
    at /opt/rh/gcc-toolset-10/root/usr/include/c++/10/bits/shared_ptr_base.h:151
#2  std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x417f10)
    at /opt/rh/gcc-toolset-10/root/usr/include/c++/10/bits/shared_ptr_base.h:151
#3  std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=<optimized out>, __in_chrg=<optimized out>)
    at /opt/rh/gcc-toolset-10/root/usr/include/c++/10/bits/shared_ptr_base.h:733
#4  std::__shared_ptr<opentelemetry::v1::sdk::common::internal_log::LogHandler, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=<optimized out>, __in_chrg=<optimized out>)
    at /opt/rh/gcc-toolset-10/root/usr/include/c++/10/bits/shared_ptr_base.h:1183
#5  std::shared_ptr<opentelemetry::v1::sdk::common::internal_log::LogHandler>::~shared_ptr (this=<optimized out>,
    __in_chrg=<optimized out>) at /opt/rh/gcc-toolset-10/root/usr/include/c++/10/bits/shared_ptr.h:121
#6  opentelemetry::v1::nostd::shared_ptr<opentelemetry::v1::sdk::common::internal_log::LogHandler>::shared_ptr_wrapper::~shared_ptr_wrapper (this=<optimized out>, __in_chrg=<optimized out>)
    at /usr/include/opentelemetry/nostd/shared_ptr.h:43
#7  opentelemetry::v1::nostd::shared_ptr<opentelemetry::v1::sdk::common::internal_log::LogHandler>::operator= (
    other=..., this=<optimized out>) at /usr/include/opentelemetry/nostd/shared_ptr.h:115
#8  opentelemetry::v1::sdk::common::internal_log::GlobalLogHandler::SetLogHandler (eh=...)
    at /usr/include/opentelemetry/sdk/common/global_log_handler.h:110
#9  main () at test.cc:21

Crash details from Valgrind (I had to stop it using Ctrl-C):

==34199== Memcheck, a memory error detector
==34199== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==34199== Using Valgrind-3.16.0 and LibVEX; rerun with -h for copyright info
==34199== Command: ./test
==34199==
==34199== Invalid read of size 4
==34199==    at 0x4021B9: __exchange_and_add (atomicity.h:50)
==34199==    by 0x4021B9: __exchange_and_add_dispatch (atomicity.h:84)
==34199==    by 0x4021B9: _M_release (shared_ptr_base.h:155)
==34199==    by 0x4021B9: ~__shared_count (shared_ptr_base.h:733)
==34199==    by 0x4021B9: ~__shared_ptr (shared_ptr_base.h:1183)
==34199==    by 0x4021B9: ~shared_ptr (shared_ptr.h:121)
==34199==    by 0x4021B9: ~shared_ptr_wrapper (shared_ptr.h:43)
==34199==    by 0x4021B9: operator= (shared_ptr.h:115)
==34199==    by 0x4021B9: SetLogHandler (global_log_handler.h:110)
==34199==    by 0x4021B9: main (test.cc:21)
==34199==  Address 0x5d69d88 is 8 bytes inside a block of size 24 free'd
==34199==    at 0x4C375FC: operator delete(void*) (vg_replace_malloc.c:584)
==34199==    by 0x404C42B: std::_Sp_counted_ptr<opentelemetry::v1::sdk::common::internal_log::LogHandler*, (__gnu_cxx::_Lock_policy)2>::~_Sp_counted_ptr() (in /usr/lib64/libopentelemetry_common.so)
==34199==    by 0x404C520: std::_Sp_counted_ptr<opentelemetry::v1::sdk::common::internal_log::LogHandler*, (__gnu_cxx::_Lock_policy)2>::_M_destroy() (in /usr/lib64/libopentelemetry_common.so)
==34199==    by 0x4021B5: _M_release (shared_ptr_base.h:174)
==34199==    by 0x4021B5: _M_release (shared_ptr_base.h:151)
==34199==    by 0x4021B5: ~__shared_count (shared_ptr_base.h:733)
==34199==    by 0x4021B5: ~__shared_ptr (shared_ptr_base.h:1183)
==34199==    by 0x4021B5: ~shared_ptr (shared_ptr.h:121)
==34199==    by 0x4021B5: ~shared_ptr_wrapper (shared_ptr.h:43)
==34199==    by 0x4021B5: operator= (shared_ptr.h:115)
==34199==    by 0x4021B5: SetLogHandler (global_log_handler.h:110)
==34199==    by 0x4021B5: main (test.cc:21)
==34199==  Block was alloc'd at
==34199==    at 0x4C36586: operator new(unsigned long) (vg_replace_malloc.c:342)
==34199==    by 0x404C27F: std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<opentelemetry::v1::sdk::common::internal_log::LogHandler*>(opentelemetry::v1::sdk::common::internal_log::LogHandler*) (in /usr/lib64/libopentelemetry_common.so)
==34199==    by 0x404C1C0: std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<opentelemetry::v1::sdk::common::internal_log::LogHandler*>(opentelemetry::v1::sdk::common::internal_log::LogHandler*, std::integral_constant<bool, false>) (in /usr/lib64/libopentelemetry_common.so)
==34199==    by 0x404BF75: std::__shared_ptr<opentelemetry::v1::sdk::common::internal_log::LogHandler, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<opentelemetry::v1::sdk::common::internal_log::LogHandler, void>(opentelemetry::v1::sdk::common::internal_log::LogHandler*) (in /usr/lib64/libopentelemetry_common.so)
==34199==    by 0x404BE30: std::shared_ptr<opentelemetry::v1::sdk::common::internal_log::LogHandler>::shared_ptr<opentelemetry::v1::sdk::common::internal_log::LogHandler, void>(opentelemetry::v1::sdk::common::internal_log::LogHandler*) (in /usr/lib64/libopentelemetry_common.so)
==34199==    by 0x404BC1F: opentelemetry::v1::nostd::shared_ptr<opentelemetry::v1::sdk::common::internal_log::LogHandler>::shared_ptr(opentelemetry::v1::sdk::common::internal_log::LogHandler*) (in /usr/lib64/libopentelemetry_common.so)
==34199==    by 0x404B903: opentelemetry::v1::sdk::common::internal_log::GlobalLogHandler::GetHandlerAndLevel() (in /usr/lib64/libopentelemetry_common.so)
==34199==    by 0x402155: SetLogHandler (global_log_handler.h:110)
==34199==    by 0x402155: main (test.cc:21)
==34199==
==34199==
==34199== More than 10000000 total errors detected.  I'm not reporting any more.
==34199== Final error counts will be inaccurate.  Go fix your program!
==34199== Rerun with --error-limit=no to disable this cutoff.  Note
==34199== that errors may occur in your program without prior warning from
==34199== Valgrind, because errors are no longer being displayed.
==34199==
^C==34199==
==34199== Process terminating with default action of signal 2 (SIGINT)
==34199==    at 0x4021B6: __exchange_and_add (atomicity.h:50)
==34199==    by 0x4021B6: __exchange_and_add_dispatch (atomicity.h:84)
==34199==    by 0x4021B6: _M_release (shared_ptr_base.h:155)
==34199==    by 0x4021B6: ~__shared_count (shared_ptr_base.h:733)
==34199==    by 0x4021B6: ~__shared_ptr (shared_ptr_base.h:1183)
==34199==    by 0x4021B6: ~shared_ptr (shared_ptr.h:121)
==34199==    by 0x4021B6: ~shared_ptr_wrapper (shared_ptr.h:43)
==34199==    by 0x4021B6: operator= (shared_ptr.h:115)
==34199==    by 0x4021B6: SetLogHandler (global_log_handler.h:110)
==34199==    by 0x4021B6: main (test.cc:21)
==34199==
==34199== HEAP SUMMARY:
==34199==     in use at exit: 32 bytes in 2 blocks
==34199==   total heap usage: 5 allocs, 3 frees, 72,768 bytes allocated
==34199==
==34199== LEAK SUMMARY:
==34199==    definitely lost: 0 bytes in 0 blocks
==34199==    indirectly lost: 0 bytes in 0 blocks
==34199==      possibly lost: 0 bytes in 0 blocks
==34199==    still reachable: 32 bytes in 2 blocks
==34199==         suppressed: 0 bytes in 0 blocks
==34199== Rerun with --leak-check=full to see details of leaked memory
==34199==
==34199== For lists of detected and suppressed errors, rerun with: -s
==34199== ERROR SUMMARY: 10000000 errors from 1 contexts (suppressed: 0 from 0)
@sirzooro sirzooro added the bug Something isn't working label Nov 25, 2022
@sirzooro
Copy link
Author

sirzooro commented Nov 25, 2022

I have noticed in allocation callstack from Valgrind that shared pointer type changes from nostd::shared_ptr to std::shared_ptr, corresponding two lines are below. This change is either unexpected or not handled properly in the code. I suspect that nostd::shared_ptr ABI at -O3 is not compatible with one at -O2 and below.

==34199==    by 0x404BE30: std::shared_ptr<opentelemetry::v1::sdk::common::internal_log::LogHandler>::shared_ptr<opentelemetry::v1::sdk::common::internal_log::LogHandler, void>(opentelemetry::v1::sdk::common::internal_log::LogHandler*) (in /usr/lib64/libopentelemetry_common.so)
==34199==    by 0x404BC1F: opentelemetry::v1::nostd::shared_ptr<opentelemetry::v1::sdk::common::internal_log::LogHandler>::shared_ptr(opentelemetry::v1::sdk::common::internal_log::LogHandler*) (in /usr/lib64/libopentelemetry_common.so)

@marcalff
Copy link
Member

This part of the call stack probably comes from:

api/include/opentelemetry/nostd/shared_ptr.h

  explicit shared_ptr(pointer ptr)
  {
    std::shared_ptr<T> ptr_(ptr);
    new (buffer_.data) shared_ptr_wrapper{std::move(ptr_)};
  }

@sirzooro
Copy link
Author

I tried to run this with Clang Undefined Behavior Sanitizer and got this:

 clang++ -fsanitize=undefined -O3 -g -o test test.cc -lopentelemetry_common; ./test
/usr/include/opentelemetry/nostd/shared_ptr.h:116:21: runtime error: member call on address 0x7ffcd995fbc8 which does not point to an object of type 'opentelemetry::nostd::shared_ptr<opentelemetry::sdk::common::internal_log::LogHandler>::shared_ptr_wrapper'
0x7ffcd995fbc8: note: object is of type 'opentelemetry::v1::nostd::shared_ptr<MyLogger>::shared_ptr_wrapper'
 04 7f 00 00  a0 8c 44 00 00 00 00 00  d0 12 c3 01 00 00 00 00  b0 2f c4 01 00 00 00 00  c0 fc 95 d9
              ^~~~~~~~~~~~~~~~~~~~~~~
              vptr for 'opentelemetry::v1::nostd::shared_ptr<MyLogger>::shared_ptr_wrapper'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /usr/include/opentelemetry/nostd/shared_ptr.h:116:21 in
/usr/include/opentelemetry/nostd/shared_ptr.h:98:30: runtime error: member call on address 0x7ffcd995fbc8 which does not point to an object of type 'opentelemetry::nostd::shared_ptr<opentelemetry::sdk::common::internal_log::LogHandler>::shared_ptr_wrapper'
0x7ffcd995fbc8: note: object is of type 'opentelemetry::v1::nostd::shared_ptr<MyLogger>::shared_ptr_wrapper'
 04 7f 00 00  a0 8c 44 00 00 00 00 00  d0 12 c3 01 00 00 00 00  b0 2f c4 01 00 00 00 00  c0 fc 95 d9
              ^~~~~~~~~~~~~~~~~~~~~~~
              vptr for 'opentelemetry::v1::nostd::shared_ptr<MyLogger>::shared_ptr_wrapper'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /usr/include/opentelemetry/nostd/shared_ptr.h:98:30 in

Looks like related to or duplicate of #1337

@owent
Copy link
Member

owent commented Nov 28, 2022

Do you compile otel-cpp with -DWITH_STL=ON?

@sirzooro
Copy link
Author

No. here are my options:
cmake -DCMAKE_POSITION_INDEPENDENT_CODE=ON -DBUILD_SHARED_LIBS=ON -DWITH_OTLP=ON -DWITH_OTLP_GRPC=OFF -DWITH_PROMETHEUS=ON -DCMAKE_INSTALL_PREFIX=/usr ..

@owent
Copy link
Member

owent commented Nov 28, 2022

Thanks, the problem is when using nostd::shared_ptr it construct with a wrong type when we move a it from some base or derived type.I have created a PR(#1815) to fix it.

@sirzooro
Copy link
Author

Thanks for the fix, I do not see crash anymore. Clang sanitizer also does not report anything.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants