
Getting an iterator error on db_stress #802

Closed
ofriedma opened this issue Jan 4, 2024 · 20 comments · Fixed by #811

ofriedma commented Jan 4, 2024

How to reproduce:

mkdir /tmp/rocksdb_crashtest_expected_b1cqsg8_
run this command:
./db_stress --acquire_snapshot_one_in=10000 --adaptive_readahead=0 --allow_concurrent_memtable_write=1 --allow_data_in_errors=True --allow_wbm_stalls=0 --async_io=0 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=1 --backup_max_size=104857600 --backup_one_in=100000 --batch_protection_bytes_per_key=0 --block_protection_bytes_per_key=2 --block_size=16384 --bloom_bits=5.680994457101304 --bottommost_compression_type=disable --bottommost_file_compaction_delay=0 --bytes_per_sync=262144 --cache_index_and_filter_blocks=0 --cache_size=8388608 --cache_type=lru_cache --charge_compression_dictionary_building_buffer=0 --charge_file_metadata=1 --charge_filter_construction=1 --charge_table_reader=1 --checkpoint_one_in=1000000 --checksum_type=kxxHash --clear_column_family_one_in=0 --column_families=1 --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_pri=4 --compaction_ttl=0 --compare_full_db_state_snapshot=0 --compression_checksum=0 --compression_max_dict_buffer_bytes=0 --compression_max_dict_bytes=0 --compression_parallel_threads=1 --compression_type=lz4 --compression_use_zstd_dict_trainer=1 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --crash_test=1 --customopspercent=0 --data_block_index_type=0 --db=/tmp/rocksdb_crashtest_whiteboxxvsyhqno --db_write_buffer_size=0 --delpercent=17 --delrangepercent=5 --destroy_db_initially=0 --detect_filter_construct_corruption=0 --disable_wal=0 --enable_compaction_filter=0 --enable_pipelined_write=0 --enable_speedb_features=1 --enable_thread_tracking=1 --expected_values_dir=/tmp/rocksdb_crashtest_expected_b1cqsg8_ --fail_if_options_file_error=0 --fifo_allow_compaction=1 --file_checksum_impl=none --flush_one_in=1000000 --format_version=6 --get_current_wal_file_one_in=0 --get_live_files_one_in=100000 --get_property_one_in=1000000 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=5 --index_type=0 --ingest_external_file_one_in=0 --initial_auto_readahead_size=0 
--initiate_wbm_flushes=1 --iterpercent=17 --key_len_percent_dist=100 --kill_random_test=888887 --level_compaction_dynamic_level_bytes=0 --lock_wal_one_in=1000000 --long_running_snapshots=0 --manual_wal_flush_one_in=1000000 --mark_for_compaction_one_file_in=10 --max_auto_readahead_size=524288 --max_background_compactions=1 --max_background_jobs=4 --max_bytes_for_level_base=67108864 --max_key=102400 --max_key_len=1 --max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=1048576 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=1048576 --memtable_max_range_deletions=1000 --memtable_prefix_bloom_size_ratio=0.1 --memtable_protection_bytes_per_key=0 --memtable_whole_key_filtering=1 --memtablerep=skip_list --min_write_buffer_number_to_merge=2 --mmap_read=0 --mock_direct_io=False --nooverwritepercent=5 --num_file_reads_for_auto_readahead=0 --num_iterations=60 --open_files=100 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=32 --open_write_fault_one_in=0 --ops_per_thread=20000000 --optimize_filters_for_memory=0 --paranoid_file_checks=1 --partition_filters=0 --partition_pinning=3 --pause_background_one_in=1000000 --periodic_compaction_seconds=0 --pinning_policy=scoped --prefix_size=1 --prefixpercent=6 --prepopulate_block_cache=0 --preserve_internal_time_seconds=36000 --progress_reports=0 --read_fault_one_in=32 --readahead_size=524288 --readpercent=20 --recycle_log_file_num=0 --reopen=20 --ribbon_starting_level=999 --secondary_cache_fault_one_in=32 --secondary_cache_uri=compressed_secondary_cache://capacity=8388608 --seed=2468986424 --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=104857600 --sst_file_manager_bytes_per_truncate=1048576 --start_delay_percent=70 --stats_dump_period_sec=10 --subcompactions=2 --sync=0 --sync_fault_injection=0 --sync_wal_one_in=100000 --target_file_size_base=16777216 --target_file_size_multiplier=1 --test_batches_snapshots=0 --top_level_index_pinning=0 --total_ram_size=1073741824 
--unpartitioned_pinning=1 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_dynamic_delay=1 --use_full_merge_v1=False --use_get_entity=0 --use_merge=1 --use_multi_get_entity=0 --use_multiget=0 --use_put_entity_one_in=0 --user_timestamp_size=0 --value_size_mult=32 --verify_before_write=False --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_db_one_in=100000 --verify_file_checksums_one_in=0 --verify_iterator_with_expected_state_one_in=5 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=0 --wal_compression=none --write_buffer_size=33554432 --write_dbid_to_manifest=1 --writepercent=35

run this command:

./db_stress --acquire_snapshot_one_in=10000 --adaptive_readahead=0 --allow_concurrent_memtable_write=0 --allow_data_in_errors=True --allow_wbm_stalls=0 --async_io=0 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=1 --backup_max_size=104857600 --backup_one_in=100000 --batch_protection_bytes_per_key=8 --block_protection_bytes_per_key=1 --block_size=16384 --bloom_bits=2 --bottommost_compression_type=disable --bottommost_file_compaction_delay=0 --bytes_per_sync=0 --cache_index_and_filter_blocks=1 --cache_size=8388608 --cache_type=auto_hyper_clock_cache --charge_compression_dictionary_building_buffer=0 --charge_file_metadata=1 --charge_filter_construction=0 --charge_table_reader=0 --checkpoint_one_in=1000000 --checksum_type=kxxHash --clear_column_family_one_in=0 --column_families=1 --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_pri=4 --compaction_ttl=0 --compare_full_db_state_snapshot=0 --compression_checksum=0 --compression_max_dict_buffer_bytes=2097151 --compression_max_dict_bytes=16384 --compression_parallel_threads=1 --compression_type=none --compression_use_zstd_dict_trainer=1 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --crash_test=1 --customopspercent=0 --data_block_index_type=0 --db=/tmp/rocksdb_crashtest_whiteboxxvsyhqno --db_write_buffer_size=1048576 --delpercent=28 --delrangepercent=4 --destroy_db_initially=0 --detect_filter_construct_corruption=1 --disable_wal=0 --enable_compaction_filter=0 --enable_pipelined_write=0 --enable_speedb_features=1 --enable_thread_tracking=0 --expected_values_dir=/tmp/rocksdb_crashtest_expected_b1cqsg8_ --fail_if_options_file_error=1 --fifo_allow_compaction=1 --file_checksum_impl=big --flush_one_in=1000000 --format_version=5 --get_current_wal_file_one_in=0 --get_live_files_one_in=100000 --get_property_one_in=1000000 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=3 --index_type=0 --ingest_external_file_one_in=0 
--initial_auto_readahead_size=16384 --initiate_wbm_flushes=0 --iterpercent=3 --key_len_percent_dist=100 --kill_exclude_prefixes=WritableFileWriter::Append,WritableFileWriter::WriteBuffered --kill_random_test=88889 --level_compaction_dynamic_level_bytes=0 --lock_wal_one_in=1000000 --long_running_snapshots=0 --manual_wal_flush_one_in=0 --mark_for_compaction_one_file_in=0 --max_auto_readahead_size=524288 --max_background_compactions=1 --max_background_jobs=8 --max_bytes_for_level_base=67108864 --max_key=102400 --max_key_len=1 --max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=16777216 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=1048576 --memtable_max_range_deletions=0 --memtable_prefix_bloom_size_ratio=0.5 --memtable_protection_bytes_per_key=0 --memtable_whole_key_filtering=1 --memtablerep=skip_list --min_write_buffer_number_to_merge=1 --mmap_read=1 --mock_direct_io=False --nooverwritepercent=5 --num_file_reads_for_auto_readahead=2 --num_iterations=19 --open_files=500000 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=0 --open_write_fault_one_in=0 --ops_per_thread=20000000 --optimize_filters_for_memory=0 --paranoid_file_checks=1 --partition_filters=0 --partition_pinning=1 --pause_background_one_in=1000000 --periodic_compaction_seconds=0 --pinning_policy=default --prefix_size=-1 --prefixpercent=0 --prepopulate_block_cache=0 --preserve_internal_time_seconds=0 --progress_reports=0 --read_fault_one_in=32 --readahead_size=524288 --readpercent=30 --recycle_log_file_num=0 --reopen=20 --ribbon_starting_level=999 --secondary_cache_fault_one_in=0 --secondary_cache_uri=compressed_secondary_cache://capacity=8388608 --seed=2468986424 --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=104857600 --sst_file_manager_bytes_per_truncate=1048576 --start_delay_percent=0 --stats_dump_period_sec=600 --subcompactions=3 --sync=1 --sync_fault_injection=0 --sync_wal_one_in=100000 --target_file_size_base=16777216 
--target_file_size_multiplier=1 --test_batches_snapshots=0 --top_level_index_pinning=1 --total_ram_size=536870912 --unpartitioned_pinning=1 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_dynamic_delay=1 --use_full_merge_v1=False --use_get_entity=0 --use_merge=0 --use_multi_get_entity=0 --use_multiget=1 --use_put_entity_one_in=0 --user_timestamp_size=0 --value_size_mult=32 --verify_before_write=False --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_db_one_in=100000 --verify_file_checksums_one_in=1000000 --verify_iterator_with_expected_state_one_in=5 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=0 --wal_compression=none --write_buffer_size=33554432 --write_dbid_to_manifest=1 --writepercent=35

You should get an error similar to this:

2023-12-28T14:37:27.4899757Z Expected state has key 0000000000005982, iterator is at key 0000000000005966
2023-12-28T14:37:27.4901103Z Column family: default, op_logs: S 000000000000597D N SFP 000000000000598F P S 0000000000005986 NPNPP
2023-12-28T14:37:27.4902028Z Verification failed :(

@ofriedma ofriedma added bug Something isn't working tests labels Jan 4, 2024
@ofriedma ofriedma self-assigned this Jan 4, 2024
hilikspdb commented Jan 5, 2024

  1. Run with the flag kill_random_test=0; this will prevent db_stress from killing the test, and it will fail consistently.
  2. I used ./ldb scan --from="0x0000000000001080" --to="0x0000000000001100" --db=/tmp/rocksdb_crashtest_whiteboxxvsyhqno/ --hex --no_value to get a full list of all the keys in the range ... the expected key is in the database.

@hilikspdb

the relevant dump is:
0x0000000000001089
0x000000000000108A
0x000000000000108B
0x000000000000108C
0x000000000000108D
0x000000000000108F
0x0000000000001092
0x0000000000001093
0x0000000000001094
0x0000000000001097
0x0000000000001098
0x00000000000010A3
0x00000000000010A4
0x00000000000010A6

Based on the database dump, SFP 00000000000010A0 should return 0000000000001098, and "NNPPPNNPPPP" (N = Next, P = Prev) should then return 0000000000001093 (4 Next and 7 Prev = 3 net Prev). I need to check my logic, and whether items were added after the iterator started.
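The walk above can be checked mechanically. A small sketch, assuming SFP returns the last key <= the target and each N/P moves one step forward/backward over the dumped keys:

```python
import bisect

# The keys from the ldb dump above:
keys = sorted([
    0x1089, 0x108A, 0x108B, 0x108C, 0x108D, 0x108F, 0x1092,
    0x1093, 0x1094, 0x1097, 0x1098, 0x10A3, 0x10A4, 0x10A6,
])

# SeekForPrev(0x10A0): last key <= 0x10A0
pos = bisect.bisect_right(keys, 0x10A0) - 1
assert keys[pos] == 0x1098

# Apply the op sequence "NNPPPNNPPPP" (N = Next, P = Prev)
for op in "NNPPPNNPPPP":
    pos += 1 if op == "N" else -1

print(hex(keys[pos]))  # -> 0x1093 (4 Next and 7 Prev = 3 net Prev)
```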
@Yuval-Ariel where is the patch that enables output of a trace file ...

@hilikspdb

Opening the database again passes the verification phase; hence the problem is probably in deciding which entries are part of the iterator.

@Yuval-Ariel

@hilikspdb 6ba2441
It's on by default, and the trace files are stored in the expected_values_dir.

@hilikspdb

Thanks. Anyway, the second test fails even when it starts on an empty database.

hilikspdb commented Jan 7, 2024

I have simplified the test:
rm -rf /tmp/rocksdb*/*; ./db_stress --clear_column_family_one_in=0 --column_families=1 --db=/tmp/rocksdb_crashtest_whiteboxxvsyhqno --expected_values_dir=/tmp/rocksdb_crashtest_expected_b1cqsg8_ --destroy_db_initially=1 --key_len_percent_dist=100 --kill_random_test=0 --max_key=102400 --max_key_len=1 --memtablerep=skip_list --seed=2468986424 --verify_iterator_with_expected_state_one_in=5 --write_buffer_size=33554432 --iterpercent=3 --readpercent=30 --writepercent=35 --delpercent=28 --delrangepercent=4 --prefixpercent=0

This test passes with memtablerep=skip_list, and in 2.6, while it crashes in main with memtablerep=hash_spdb ... therefore we are left with two options: a bug in the translation, or a change in the behaviour of db_stress.

@hilikspdb

Here is an even simpler version:
rm -rf /tmp/rocksdb*/*; ./db_stress --clear_column_family_one_in=0 --column_families=1 --db=/tmp/rocksdb_crashtest_whiteboxxvsyhqno --expected_values_dir=/tmp/rocksdb_crashtest_expected_b1cqsg8_ --destroy_db_initially=1 --key_len_percent_dist=100 --kill_random_test=0 --max_key=102400 --max_key_len=1 --memtablerep=hash_spdb --enable_speedb_features=false --seed=2468986424 --verify_iterator_with_expected_state_one_in=5 --write_buffer_size=33554432 --iterpercent=3 --readpercent=30 --writepercent=67 --delpercent=0 --delrangepercent=0 --prefixpercent=0


ofriedma commented Jan 7, 2024

This commit, 6213b63, seems to introduce the issue.

@udi-speedb

> This commit, 6213b63, seems to introduce the issue.

@ofriedma - that is a commit that was made before version 2.3.0?

ayulas commented Jan 8, 2024

This looks like a force-push commit that was made to the plugin, which was removed later on. In any case, it looks like changes that I didn't make were applied to my commit and force-pushed. There is also an issue with db_stress and enable_speedb_features, as I mentioned in the past, since the shared options don't support hardening; db_stress currently handles it the wrong way. I will attach a branch that handles both cases, and you can proceed from there.


ofriedma commented Jan 8, 2024

For 2.7 the error does not appear, so it was introduced somewhere between 2.7 and main.

@udi-speedb

@ofriedma - Are you referring to the rebased branch or to the official 2.7?
Also, please clarify: you said that you have identified the commit that is causing this, but that commit was made before version 2.3.
I am confused.


ofriedma commented Jan 8, 2024

The official 2.7. It appears that it probably happened after the rebase; before v2.7 there were no errors (and the commit was already there).


erez-speedb commented Jan 8, 2024

@ayulas' fixes are at branch /ayula/fix_bug.
@ofriedma @hilikspdb please review them.

@ofriedma

@erez-speedb
This issue is not related to db_stress's enable_speedb_features, because we can reproduce it without this feature entirely.

@hilikspdb

Here is a quick description of the bug and what can be done:

  • A write starts at time t and finishes at t+n.
  • An iterator starts at time t+k, where k < n.
  • A Seek is done at time t+x.
  • db_stress's internal locking mechanism prevents the Seek from starting before the end of the write.
  • The skip-list iterator uses the mutable copy and the iterator's sequence number (so data that was added to the memtable after the iterator was created is visible).
  • vector and hash_spdb use an immutable copy (so such data is not visible).
    NOTE: this is not a bug in the memtable; the definition of what to do with in-flight data is not clear.
    There are two options to fix it:
  1. Fix db_stress so that a new iterator also takes the lock on the region it is seeking on.
  2. Mimic the skip-list behaviour: add the last vector at the time of the Seek and use the sequence number to decide whether data is visible or not.
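A conceptual sketch of the divergence described above (illustrative only, not RocksDB code, and all names hypothetical): a skip-list-style iterator reads the live table and filters by sequence number, while a vector/hash_spdb-style iterator snapshots a copy at creation, so an in-flight write with an old-enough sequence number is visible to one but not the other:

```python
# Conceptual sketch (not RocksDB code) of the two iterator styles.
# Entries are (key, seq) pairs; class names are hypothetical.

class LiveIterator:
    """Skip-list style: reads the live table, filters by sequence number."""
    def __init__(self, table, seq):
        self.table, self.seq = table, seq  # shares the live list

    def keys(self):
        return sorted(k for k, s in self.table if s <= self.seq)

class FrozenIterator:
    """Vector/hash_spdb style: snapshots an immutable copy at creation."""
    def __init__(self, table, seq):
        self.copy = [(k, s) for k, s in table if s <= seq]

    def keys(self):
        return sorted(k for k, _ in self.copy)

table = [("a", 1)]
live = LiveIterator(table, seq=5)
frozen = FrozenIterator(table, seq=5)

# A write that started before iterator creation lands afterwards,
# but with a sequence number the iterator's snapshot should cover:
table.append(("b", 3))

print(live.keys())    # -> ['a', 'b']  (in-flight write is visible)
print(frozen.keys())  # -> ['a']       (copy was taken before it landed)
```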


udi-speedb commented Jan 18, 2024 via email

@hilikspdb
Copy link
Contributor

  1. As I said, data that is written while the iterator is being created should be in an "unknown" state. The issue is that the lock is taken during the first Seek and not at the creation of the iterator (the lock is taken on a range of data). This is a conceptual issue with db_stress, and the best place to fix it is there.
  2. Yes, this is the place to fix it.
  3. As far as I know, there is no support for tailing iterators.

ayulas added a commit that referenced this issue Jan 21, 2024

ayulas commented Jan 21, 2024

The issue is a result of the iterator Refresh, which by definition is not supposed to be supported by the immutable memtable iterators (hash_spdb and vector memtablerep).
I added a supported flag to the memtablerep; if it is false, we avoid doing a Refresh in the no_batched_ops_stress test.
@hilikspdb @ofriedma please review:
ayula/802-refresh-iter
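The shape of the fix can be sketched as a capability flag (hypothetical names, not the actual RocksDB/Speedb API): the memtablerep advertises whether iterator Refresh is supported, and the stress test skips the Refresh otherwise:

```python
# Illustrative sketch of the fix's shape (hypothetical names, not the
# actual RocksDB/Speedb API): a memtablerep advertises whether its
# iterators can be refreshed, and the stress test skips Refresh if not.

class MemTableRep:
    def supports_refresh_iterator(self):
        return True  # skip_list-style reps iterate the live structure

class ImmutableCopyRep(MemTableRep):
    def supports_refresh_iterator(self):
        return False  # hash_spdb/vector-style reps snapshot a copy

def maybe_refresh(rep, refresh_fn):
    """In the stress test, only call Refresh() when the rep supports it."""
    if rep.supports_refresh_iterator():
        refresh_fn()
        return True
    return False  # keep the existing iterator instead of refreshing

assert maybe_refresh(MemTableRep(), lambda: None) is True
assert maybe_refresh(ImmutableCopyRep(), lambda: None) is False
```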


ayulas commented Jan 22, 2024

@ofriedma as we discussed, I exposed the allow-refresh option. Please review:
cc249df

ofriedma pushed a commit that referenced this issue Jan 22, 2024

* Refresh of an existing iterator is not supported in hash_spdb and vector memtablerep; avoid that (#802)