This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Infinite Loop while Scheduling Transactions #261

Closed
bytemaster opened this issue Aug 29, 2017 · 8 comments

Comments

@bytemaster
Contributor

image

@bytemaster bytemaster added the bug label Aug 29, 2017
@bytemaster bytemaster added this to the Test Network Release milestone Aug 29, 2017
@bytemaster
Contributor Author

bytemaster commented Aug 29, 2017

Process 20537 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x304c37000)
    frame #0: 0x0000000100237bb0 eosd`void std::__1::__sort<eos::chain::from_entries(std::__1::vector<eos::chain::schedule_entry, std::__1::allocator<eos::chain::schedule_entry> >&)::$_0&, eos::chain::schedule_entry*>(eos::chain::schedule_entry*, eos::chain::schedule_entry*, eos::chain::from_entries(std::__1::vector<eos::chain::schedule_entry, std::__1::allocator<eos::chain::schedule_entry> >&)::$_0&) + 816
eosd`std::__1::__sort<eos::chain::from_entries(std::__1::vector<eos::chain::schedule_entry, std::__1::allocator<eos::chain::schedule_entry> >&)::$_0&, eos::chain::schedule_entry*>:
->  0x100237bb0 <+816>: movq   0x10(%rbx), %rdi
    0x100237bb4 <+820>: addq   $0x10, %rbx
    0x100237bb8 <+824>: cmpl   %edx, %edi
    0x100237bba <+826>: jb     0x100237bd0               ; <+848>
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x304c37000)
  * frame #0: 0x0000000100237bb0 eosd`void std::__1::__sort<eos::chain::from_entries(std::__1::vector<eos::chain::schedule_entry, std::__1::allocator<eos::chain::schedule_entry> >&)::$_0&, eos::chain::schedule_entry*>(eos::chain::schedule_entry*, eos::chain::schedule_entry*, eos::chain::from_entries(std::__1::vector<eos::chain::schedule_entry, std::__1::allocator<eos::chain::schedule_entry> >&)::$_0&) + 816
    frame #1: 0x00000001002378a5 eosd`void std::__1::__sort<eos::chain::from_entries(std::__1::vector<eos::chain::schedule_entry, std::__1::allocator<eos::chain::schedule_entry> >&)::$_0&, eos::chain::schedule_entry*>(eos::chain::schedule_entry*, eos::chain::schedule_entry*, eos::chain::from_entries(std::__1::vector<eos::chain::schedule_entry, std::__1::allocator<eos::chain::schedule_entry> >&)::$_0&) + 37
    frame #2: 0x00000001002378a5 eosd`void std::__1::__sort<eos::chain::from_entries(std::__1::vector<eos::chain::schedule_entry, std::__1::allocator<eos::chain::schedule_entry> >&)::$_0&, eos::chain::schedule_entry*>(eos::chain::schedule_entry*, eos::chain::schedule_entry*, eos::chain::from_entries(std::__1::vector<eos::chain::schedule_entry, std::__1::allocator<eos::chain::schedule_entry> >&)::$_0&) + 37
    frame #3: 0x00000001002378a5 eosd`void std::__1::__sort<eos::chain::from_entries(std::__1::vector<eos::chain::schedule_entry, std::__1::allocator<eos::chain::schedule_entry> >&)::$_0&, eos::chain::schedule_entry*>(eos::chain::schedule_entry*, eos::chain::schedule_entry*, eos::chain::from_entries(std::__1::vector<eos::chain::schedule_entry, std::__1::allocator<eos::chain::schedule_entry> >&)::$_0&) + 37
    frame #4: 0x00000001002365e3 eosd`eos::chain::from_entries(std::__1::vector<eos::chain::schedule_entry, std::__1::allocator<eos::chain::schedule_entry> >&) + 35
    frame #5: 0x0000000100235fc9 eosd`eos::chain::block_schedule::by_threading_conflicts(std::__1::vector<fc::static_variant<std::__1::reference_wrapper<eos::chain::SignedTransaction const>, std::__1::reference_wrapper<eos::chain::GeneratedTransaction const> >, std::__1::allocator<fc::static_variant<std::__1::reference_wrapper<eos::chain::SignedTransaction const>, std::__1::reference_wrapper<eos::chain::GeneratedTransaction const> > > > const&, eos::chain::global_property_object const&) + 2009
    frame #6: 0x000000010015dd31 eosd`eos::chain::chain_controller::_generate_block(fc::time_point_sec, eos::types::Name const&, fc::ecc::private_key const&, eos::chain::block_schedule (*)(std::__1::vector<fc::static_variant<std::__1::reference_wrapper<eos::chain::SignedTransaction const>, std::__1::reference_wrapper<eos::chain::GeneratedTransaction const> >, std::__1::allocator<fc::static_variant<std::__1::reference_wrapper<eos::chain::SignedTransaction const>, std::__1::reference_wrapper<eos::chain::GeneratedTransaction const> > > > const&, eos::chain::global_property_object const&)) + 1313
    frame #7: 0x000000010015cd1e eosd`eos::chain::chain_controller::generate_block(fc::time_point_sec, eos::types::Name const&, fc::ecc::private_key const&, eos::chain::block_schedule (*)(std::__1::vector<fc::static_variant<std::__1::reference_wrapper<eos::chain::SignedTransaction const>, std::__1::reference_wrapper<eos::chain::GeneratedTransaction const> >, std::__1::allocator<fc::static_variant<std::__1::reference_wrapper<eos::chain::SignedTransaction const>, std::__1::reference_wrapper<eos::chain::GeneratedTransaction const> > > > const&, eos::chain::global_property_object const&), unsigned int) + 542
    frame #8: 0x000000010008826b eosd`eos::producer_plugin_impl::maybe_produce_block(fc::mutable_variant_object&) + 1179
    frame #9: 0x00000001000863de eosd`eos::producer_plugin_impl::block_production_loop() + 46
    frame #10: 0x000000010008e1c4 eosd`boost::asio::detail::wait_handler<boost::_bi::bind_t<eos::block_production_condition::block_production_condition_enum, boost::_mfi::mf0<eos::block_production_condition::block_production_condition_enum, eos::producer_plugin_impl>, boost::_bi::list1<boost::_bi::value<eos::producer_plugin_impl*> > > >::do_complete(boost::asio::detail::task_io_service*, boost::asio::detail::task_io_service_operation*, boost::system::error_code const&, unsigned long) + 100
    frame #11: 0x000000010010baf7 eosd`boost::asio::detail::task_io_service::do_run_one(boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex>&, boost::asio::detail::task_io_service_thread_info&, boost::system::error_code const&) + 679
    frame #12: 0x000000010010b605 eosd`boost::asio::detail::task_io_service::run(boost::system::error_code&) + 165
    frame #13: 0x000000010000b123 eosd`appbase::application::exec() + 723
    frame #14: 0x0000000100001fa9 eosd`main + 169
    frame #15: 0x00007fffb2c96235 libdyld.dylib`start + 1
(lldb)

@bytemaster
Contributor Author

This is reproducible on the https://github.com/EOSIO/eos/tree/benchmark branch with the following steps:

./eosd -d data/ --resync --skip-transaction-signatures
./eosc benchmark setup 100
./eosc benchmark transfer 100 100 1

@wanderingbort
Contributor

I'm having trouble reproducing this locally, so it's time to compare details.
I am using the benchmark branch @ 3f426bc,
building with clang 4.0:

clang version 4.0.0-1ubuntu1~16.04.2 (tags/RELEASE_400/rc1)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/5
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/5.4.0
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/6
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/6.0.0
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/5
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/5.4.0
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/6
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/6.0.0
Selected GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/5.4.0
Candidate multilib: .;@m64
Selected multilib: .;@m64

building with:

$ cmake -G Ninja -DCMAKE_CXX_COMPILER=clang++-4.0 -DCMAKE_C_COMPILER=clang-4.0
$ ninja -j 4

Then, using your steps above, I see no crashes or freezes.

I do get a lot of insufficient funds messages when I start up the client. Is that a clue?

@wanderingbort
Contributor

Blocked on more details

@bytemaster bytemaster modified the milestones: EOS Beta, Test Network Release Sep 1, 2017
@bytemaster
Contributor Author

Pushing the milestone for this fix back, now that single-threaded scheduling works without issue.

To facilitate testing, can we make the scheduling algorithm configurable from the command line?
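
One way such a switch could look is a lookup from an option string to a scheduler function pointer, since the backtrace above shows `generate_block` already taking the scheduler as a function pointer. This is only a sketch under stated assumptions: the option names, `pending_transaction`, the simplified `block_schedule`, and both placeholder schedulers are hypothetical and do not come from the eos code base.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical placeholders; the real eos::chain types are more involved.
struct pending_transaction { int id; };
struct block_schedule { int thread_count; };

// Same shape of idea as the function pointer seen in the backtrace,
// reduced to a single argument for illustration.
using scheduler_fn = block_schedule (*)(const std::vector<pending_transaction>&);

// Placeholder: puts everything on one thread.
block_schedule schedule_single_thread(const std::vector<pending_transaction>&) {
    return block_schedule{1};
}

// Placeholder: pretends every transaction can run on its own thread.
block_schedule schedule_by_conflicts(const std::vector<pending_transaction>& txs) {
    return block_schedule{static_cast<int>(txs.size())};
}

// Map a (hypothetical) command-line value to a scheduler.
scheduler_fn scheduler_from_name(const std::string& name) {
    static const std::map<std::string, scheduler_fn> table = {
        {"single-thread", &schedule_single_thread},
        {"threading-conflicts", &schedule_by_conflicts},
    };
    const auto it = table.find(name);
    if (it == table.end()) {
        throw std::invalid_argument("unknown scheduler: " + name);
    }
    return it->second;
}

// Look up the scheduler by name and run it on the pending transactions.
block_schedule schedule_with(const std::string& name,
                             const std::vector<pending_transaction>& txs) {
    const scheduler_fn fn = scheduler_from_name(name);
    assert(fn != nullptr);
    return fn(txs);
}
```

With something like this, the value parsed from the command line would be passed to `scheduler_from_name` once at startup, and the resulting pointer handed to block production.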

@heifner
Contributor

heifner commented Sep 22, 2017

Picking this up today to see if I can reproduce.

@wanderingbort wanderingbort removed their assignment Sep 22, 2017
@heifner
Contributor

heifner commented Sep 22, 2017

  • I'm able to reproduce this consistently on master with clang on Linux.
  • We are getting a bad address while sorting a vector.
    • Adding some additional debug output to attempt to pinpoint memory corruption.
  • I am able to get a core dump on every run rather quickly.

@heifner
Contributor

heifner commented Sep 22, 2017

  • This appears to be caused by passing std::sort a comparator that does not satisfy strict weak ordering.
  • Modifying the code to compare with std::tie so that strict weak ordering is honored.
  • Searched through the rest of the code and didn't find any other uses of std::sort that looked incorrect.
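
A minimal sketch of the kind of fix described above. The fields of `schedule_entry` and the original comparator are not shown in this thread, so the `entry` struct, its field names, and `bad_less` are hypothetical stand-ins; `bad_less` only illustrates one common way a comparator can violate strict weak ordering.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical stand-in for eos::chain::schedule_entry.
struct entry {
    std::uint32_t cycle;
    std::uint32_t thread;
    std::uint32_t index;
};

// Broken (illustrative only): for a = {1, 0, 0} and b = {0, 1, 0} this
// returns true for both (a, b) and (b, a), so it is not asymmetric and
// therefore not a strict weak ordering. Passing it to std::sort is
// undefined behavior and can walk past the end of the range.
inline bool bad_less(const entry& a, const entry& b) {
    return a.cycle < b.cycle || a.thread < b.thread;
}

// Lexicographic comparison through std::tie: later fields are consulted
// only when all earlier fields are equal, which yields a strict weak
// ordering.
inline bool good_less(const entry& a, const entry& b) {
    return std::tie(a.cycle, a.thread, a.index) <
           std::tie(b.cycle, b.thread, b.index);
}

// Sort with the well-formed comparator and check the postcondition.
inline void sort_entries(std::vector<entry>& entries) {
    std::sort(entries.begin(), entries.end(), good_less);
    assert(std::is_sorted(entries.begin(), entries.end(), good_less));
}
```

The `std::tie` form is convenient because adding or reordering sort keys only means editing the two tuples, which keeps both sides of the comparison in step.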

3 participants