-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DHCP timeout on AWS guest #20
Comments
Can you try the debug version? |
Adding @gleb-cloudius |
On 01/28/15 19:56, Avi Kivity wrote:
Yes. Debug seems to report trash. Note that the below is reported before DHCP sending discover ASAN:SIGSEGV==7801==ERROR: AddressSanitizer: SEGV on unknown address 0x602000076480 (pc 0x00000074e165 sp 0x7fff92516e58 bp 0x7fff92516ee0 T0) AddressSanitizer can not provide additional info.
|
On 01/28/2015 08:04 PM, vladzcloudius wrote:
Can you bisect the debug build to find where the error started? It seems unrelated to the patch.
|
duplicate of #18 |
…o_with Fixes failures in debug mode: ``` $ build/debug/tests/unit/closeable_test -l all -t deferred_close_test WARNING: debug mode. Not for benchmarking or production random-seed=3064133628 Running 1 test case... Entering test module "../../tests/unit/closeable_test.cc" ../../tests/unit/closeable_test.cc(0): Entering test case "deferred_close_test" ../../src/testing/seastar_test.cc(43): info: check true has passed ==9449==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! terminate called after throwing an instance of 'seastar::broken_promise' what(): broken promise ==9449==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7fbf1f49f000; bottom 0x7fbf40971000; size: 0xffffffffdeb2e000 (-558702592) False positive error reports may follow For details see google/sanitizers#189 ================================================================= ==9449==AddressSanitizer CHECK failed: ../../../../libsanitizer/asan/asan_thread.cpp:356 "((ptr[0] == kCurrentStackFrameMagic)) != (0)" (0x0, 0x0) #0 0x7fbf45f39d0b (/lib64/libasan.so.6+0xb3d0b) #1 0x7fbf45f57d4e (/lib64/libasan.so.6+0xd1d4e) #2 0x7fbf45f3e724 (/lib64/libasan.so.6+0xb8724) #3 0x7fbf45eb3e5b (/lib64/libasan.so.6+0x2de5b) #4 0x7fbf45eb51e8 (/lib64/libasan.so.6+0x2f1e8) #5 0x7fbf45eb7694 (/lib64/libasan.so.6+0x31694) #6 0x7fbf45f39398 (/lib64/libasan.so.6+0xb3398) #7 0x7fbf45f3a00b in __asan_report_load8 (/lib64/libasan.so.6+0xb400b) #8 0xfe6d52 in bool __gnu_cxx::operator!=<dl_phdr_info*, std::vector<dl_phdr_info, std::allocator<dl_phdr_info> > >(__gnu_cxx::__normal_iterator<dl_phdr_info*, std::vector<dl_phdr_info, std::allocator<dl_phdr_info> > > const&, __gnu_cxx::__normal_iterator<dl_phdr_info*, std::vector<dl_phdr_info, std::allocator<dl_phdr_info> > > const&) /usr/include/c++/10/bits/stl_iterator.h:1116 #9 0xfe615c in dl_iterate_phdr ../../src/core/exception_hacks.cc:121 #10 0x7fbf44bd1810 in _Unwind_Find_FDE (/lib64/libgcc_s.so.1+0x13810) #11 0x7fbf44bcd897 (/lib64/libgcc_s.so.1+0xf897) #12 0x7fbf44bcea5f (/lib64/libgcc_s.so.1+0x10a5f) #13 0x7fbf44bcefd8 in _Unwind_RaiseException (/lib64/libgcc_s.so.1+0x10fd8) #14 0xfe6281 in _Unwind_RaiseException ../../src/core/exception_hacks.cc:148 #15 0x7fbf457364bb in __cxa_throw (/lib64/libstdc++.so.6+0xaa4bb) #16 0x7fbf45e10a21 (/lib64/libboost_unit_test_framework.so.1.73.0+0x1aa21) #17 0x7fbf45e20fe0 in boost::execution_monitor::execute(boost::function<int ()> const&) (/lib64/libboost_unit_test_framework.so.1.73.0+0x2afe0) #18 0x7fbf45e21094 in boost::execution_monitor::vexecute(boost::function<void ()> const&) (/lib64/libboost_unit_test_framework.so.1.73.0+0x2b094) #19 0x7fbf45e43921 in boost::unit_test::unit_test_monitor_t::execute_and_translate(boost::function<void ()> const&, unsigned long) (/lib64/libboost_unit_test_framework.so.1.73.0+0x4d921) #20 0x7fbf45e5eae1 (/lib64/libboost_unit_test_framework.so.1.73.0+0x68ae1) #21 0x7fbf45e5ed31 (/lib64/libboost_unit_test_framework.so.1.73.0+0x68d31) #22 0x7fbf45e2e547 in boost::unit_test::framework::run(unsigned long, bool) (/lib64/libboost_unit_test_framework.so.1.73.0+0x38547) #23 0x7fbf45e43618 in boost::unit_test::unit_test_main(bool (*)(), int, char**) (/lib64/libboost_unit_test_framework.so.1.73.0+0x4d618) #24 0x44798d in seastar::testing::entry_point(int, char**) ../../src/testing/entry_point.cc:77 #25 0x4134b5 in main ../../include/seastar/testing/seastar_test.hh:65 #26 0x7fbf44a1b1e1 in __libc_start_main (/lib64/libc.so.6+0x281e1) #27 0x4133dd in _start (/home/bhalevy/dev/seastar/build/debug/tests/unit/closeable_test+0x4133dd) ``` Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210406100911.12278-1-bhalevy@scylladb.com>
When `posix_server_socket_impl::accept()` runs it may start a cross-core background fiber that inserts a pending connection into the thread local container posix_ap_server_socket_impl::conn_q. However, the continuation that enqueues the pending connection may not aactually run until after the target core calls abort_accept() (e.g. parallel shutdown via a seastar::sharded<server>::stop). This can leave an entry in the conn_q container that is destroyed when the reactor thread exits. Unfortunately the conn_q container holds conntrack::handle type that schedules additional work in its destructor. ``` class handle { foreign_ptr<lw_shared_ptr<load_balancer>> _lb; ~handle() { (void)smp::submit_to(_host_cpu, [cpu = _target_cpu, lb = std::move(_lb)] { lb->closed_cpu(cpu); }); } ... ``` When this race occurs and the destructor runs the reactor is no longer available, leading to the following memory leak in which the continuation that is scheduled onto the reactor is leaked: Direct leak of 88 byte(s) in 1 object(s) allocated from: #0 0x557c91ca5b7d in operator new(unsigned long) /v/llvm/llvm/src/compiler-rt/lib/asan/asan_new_delete.cpp:95:3 #1 0x557ca3e3cc08 in void seastar::future<void>::schedule<seastar::internal::promise_ba... ... // the unordered map here is conn_q scylladb#19 0x557ca47034d8 in std::__1::unordered_multimap<std::__1::tuple<int, seastar::socket... scylladb#20 0x7f98dcaf238e in __call_tls_dtors (/lib64/libc.so.6+0x4038e) (BuildId: 6e3c087aca9... fixes: scylladb#738 Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
When `posix_server_socket_impl::accept()` runs it may start a cross-core background fiber that inserts a pending connection into the thread local container posix_ap_server_socket_impl::conn_q. However, the continuation that enqueues the pending connection may not aactually run until after the target core calls abort_accept() (e.g. parallel shutdown via a seastar::sharded<server>::stop). This can leave an entry in the conn_q container that is destroyed when the reactor thread exits. Unfortunately the conn_q container holds conntrack::handle type that schedules additional work in its destructor. ``` class handle { foreign_ptr<lw_shared_ptr<load_balancer>> _lb; ~handle() { (void)smp::submit_to(_host_cpu, [cpu = _target_cpu, lb = std::move(_lb)] { lb->closed_cpu(cpu); }); } ... ``` When this race occurs and the destructor runs the reactor is no longer available, leading to the following memory leak in which the continuation that is scheduled onto the reactor is leaked: Direct leak of 88 byte(s) in 1 object(s) allocated from: #0 0x557c91ca5b7d in operator new(unsigned long) /v/llvm/llvm/src/compiler-rt/lib/asan/asan_new_delete.cpp:95:3 #1 0x557ca3e3cc08 in void seastar::future<void>::schedule<seastar::internal::promise_ba... ... // the unordered map here is conn_q scylladb#19 0x557ca47034d8 in std::__1::unordered_multimap<std::__1::tuple<int, seastar::socket... scylladb#20 0x7f98dcaf238e in __call_tls_dtors (/lib64/libc.so.6+0x4038e) (BuildId: 6e3c087aca9... fixes: scylladb#738 Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
When `posix_server_socket_impl::accept()` runs it may start a cross-core background fiber that inserts a pending connection into the thread local container posix_ap_server_socket_impl::conn_q. However, the continuation that enqueues the pending connection may not aactually run until after the target core calls abort_accept() (e.g. parallel shutdown via a seastar::sharded<server>::stop). This can leave an entry in the conn_q container that is destroyed when the reactor thread exits. Unfortunately the conn_q container holds conntrack::handle type that schedules additional work in its destructor. ``` class handle { foreign_ptr<lw_shared_ptr<load_balancer>> _lb; ~handle() { (void)smp::submit_to(_host_cpu, [cpu = _target_cpu, lb = std::move(_lb)] { lb->closed_cpu(cpu); }); } ... ``` When this race occurs and the destructor runs the reactor is no longer available, leading to the following memory leak in which the continuation that is scheduled onto the reactor is leaked: Direct leak of 88 byte(s) in 1 object(s) allocated from: #0 0x557c91ca5b7d in operator new(unsigned long) /v/llvm/llvm/src/compiler-rt/lib/asan/asan_new_delete.cpp:95:3 #1 0x557ca3e3cc08 in void seastar::future<void>::schedule<seastar::internal::promise_ba... ... // the unordered map here is conn_q scylladb#19 0x557ca47034d8 in std::__1::unordered_multimap<std::__1::tuple<int, seastar::socket... scylladb#20 0x7f98dcaf238e in __call_tls_dtors (/lib64/libc.so.6+0x4038e) (BuildId: 6e3c087aca9... fixes: scylladb#738 Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
When running with DHCP on AWS we get a bellow assert after about 30 seconds.
This happens only with SMP configuration and doesn't reproduce in a UP configuration.
master hash is: 24d5c31
The bisect shows that the patch responsible for the breakage is:
To reproduce run:
sudo ./build/release/apps/httpd/httpd --network-stack native --dpdk-pmd -m 512M -c 4
And wait for about 30 seconds.
The text was updated successfully, but these errors were encountered: