vnode hangs in {eleveldb,write,3}
#273
Comments
I forgot to mention that we did an "in place" upgrade: the data files were kept, but Riak was upgraded to 3.0.12.
So there was an update to snappy as part of 3.0.12 - https://github.com/basho/eleveldb/releases/tag/riak_kv-3.0.12. The snappy upgrade was from 1.0.4 to 1.1.9 (6ef9202). There should be LOG files within the leveldb backend folder for the problem partition (i.e. "/var/lib/riak/leveldb/11417981541647679048466287755595961091061972992"), which are normally quite verbose, so it would be worth looking through those logs from the time of the issue.
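If it helps, a quick way to scan those LOG files for compression-related entries (a rough sketch; the path is the partition directory mentioned above, and leveldb normally rotates the previous log to LOG.old):

```
cd /var/lib/riak/leveldb/11417981541647679048466287755595961091061972992
ls -l LOG LOG.old*
grep -iE "compress|corrupt|error" LOG LOG.old* | tail -n 50
```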
Thanks, that was a good hint. It is definitely something to do with the compression. I'll keep digging.
I went back to look at the basic upgrade test https://github.com/basho/riak_test/blob/develop-3.0/tests/verify_basic_upgrade.erl, going from 3.0.9 to 3.0.12. This passed with the eleveldb backend, but only because the default compression algorithm is lz4. When I switched the test to force snappy compression, it failed (a config sketch is below).
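For reference, this is roughly how snappy can be forced for the eleveldb backend (a minimal sketch assuming the stock eleveldb cuttlefish settings; the riak_test change itself isn't shown):

```
## riak.conf
storage_backend = leveldb
leveldb.compression = on
leveldb.compression.algorithm = snappy
```

The equivalent advanced.config/app.config entry would be along the lines of `{eleveldb, [{compression, snappy}]}`.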
So it does look like there is a fairly clear incompatibility that we could have picked up in test. I haven't been able to dig into the snappy history in any meaningful way to see if there is an interim version that can safely bridge between the two. I'll have a think about workarounds that may help. One option would be to do the upgrade via rolling transfer rather than rolling restart (i.e. transferring data onto upgraded nodes, rolling one node in/out at a time), but this is a lot more time-consuming (and in that case it may be worth waiting for 3.0.16 to get the final improvements to transfers).
Also, with regards to the configuration of lz4: looking at the cuttlefish logic for compression in the eleveldb schema, there is a translation operation: https://github.com/basho/eleveldb/blob/develop/priv/eleveldb.schema#L174-L183. However, this translation isn't present in https://github.com/basho/eleveldb/blob/develop/priv/eleveldb_multi.schema#L115-L129. This might mean that compression_algorithm is ignored for multi-backends, as it never gets translated into the compression setting, and hence the backend defaults to snappy.
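For context, the translation in eleveldb.schema is roughly of this shape (a paraphrased sketch, not the exact schema code), and it is this step that eleveldb_multi.schema appears to be missing:

```erlang
%% Paraphrased sketch of the cuttlefish translation in priv/eleveldb.schema;
%% see the linked lines for the real code.
{translation,
 "eleveldb.compression",
 fun(Conf) ->
     case cuttlefish:conf_get("leveldb.compression", Conf) of
         false -> false;
         true  -> cuttlefish:conf_get("leveldb.compression.algorithm", Conf)
     end
 end}.
```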
Thanks.
Using https://docs.riak.com/riak/kv/latest/using/admin/commands/index.html#replace is what I meant by rolling transfer. This allows you to set up a new node with the new code, use cluster replace to swap out an existing node, then upgrade that node and use it to replace another node, and so on (a rough command sequence is sketched below).
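A rough outline of that workflow with riak-admin (node names are placeholders; check the linked docs for the exact procedure):

```
# On the new, already-upgraded node (riak@new):
riak-admin cluster join riak@existing
riak-admin cluster replace riak@old riak@new
riak-admin cluster plan      # review the planned ownership transfers
riak-admin cluster commit
riak-admin transfers         # wait for transfers to finish before the next node
```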
Thanks.
In Riak 3.0.12, vnodes started to hang in eleveldb:write/3 and the process mailbox could grow to 10K messages. After a while the whole node starts to behave erratically. The queue size of the vnode does not decrease, even without load.

Another interesting thing I noticed is that in the vnode state the compression is set to `snappy`, but in the config it is defined as `lz4`.

It's not fully confirmed, but it seems that 3.0.9 doesn't have the issue. Is there any incompatibility between snappy versions?
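For reference, vnode mailbox sizes and state can be inspected from `riak attach` (a rough sketch; it assumes riak_core_vnode_manager:all_vnodes/0 returns {Module, Index, Pid} tuples, and the 1000-message threshold is arbitrary):

```erlang
%% Find riak_kv vnodes whose mailbox has grown large.
BigQueues =
    [{Idx, Len}
     || {riak_kv_vnode, Idx, Pid} <- riak_core_vnode_manager:all_vnodes(),
        {message_queue_len, Len} <- [erlang:process_info(Pid, message_queue_len)],
        Len > 1000].
%% A specific vnode's state (e.g. to check the backend's compression setting)
%% can then be inspected with sys:get_state(Pid).
```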