-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Getting an iterator error on db_stress #802
Comments
|
the relevant dump is: based on the database dump, SFP 00000000000010A0 should return 0000000000001098 and now "NNPPPNNPPPP" (N=next P = Prev) should return 0000000000001093 (4 next and 7 prev = 3 prev ). need to check my logic and if items were added after the iterator started. |
Opening the database again pass the verification phase. hence the problem is probably on deciding which entries are part of the iterator. |
@hilikspdb 6ba2441 |
thanks , anyway the second test fail even when it starts on an empty database . |
I have simplified the test: this test pass ok with memtable=skip-list and in 2.6 while crushes in main with memtablerep=hash_spdb ... therefore we left we two options : bug in translation or a change in the behaviour of db_stress |
here is even simpler version: |
This commit : 6213b63 seems to introduce the issue |
this commit looks a force push commit that was done to the plugin which was remove later on. any way it looks that changes were done to my commit which i didnt do and forced push. also there is an issue with db stress and enable speedb features as i mentioned in the past since the shared options doesnt support hardening. currently there is a wrong way how the db stress handles it. i will attach a branch that handles both cases and you can proceed from there |
For 2.7 the error does not appear, so it has been revealed somewhere between 2.7 and main. |
@ofriedma - Are you referring to the rebased branch or to the official 2.7? |
official 2.7, It appears that it happened probably after the rebase, before v2.7 no errors(and the commit was there). |
@ayulas fixs are at branch /ayula/fix_bug |
@erez-speedb |
Here is a quick description of the bug and what can be done:
|
1. I suppose locking the region in db_stress defeats RocsDB PR's purpose of
testing simultaneous reads and writes?
2. Shouldn't db_stress have some mechanism of snapshotting the *expected
data* at the time they are creating the iterator? Or, alternatively, mark
data that is written as you describe above as "may be returned or may not
be returned, both are acceptable".
3. Does our hash_spdb support tailing iteration?
…On Thu, Jan 18, 2024 at 11:05 AM hilikspdb ***@***.***> wrote:
Here is a quick description of the bug and what can be done:
- write starts at time t and finishes at t+n
- iterators starts at time t+k where k < n
- seek is done at time t+x
- db_stress internal locking mechanism prevents the SEEK to start
before the end of the write.
- The skip list iterator uses the mutable copy and the iterator
sequence number (so data that were added to the memtable after the new
iterator is in)
- vector and hash-spdb are using an immutable copy (so data is not
added)
NOTE: this is not a bug in the memtable the definition of what to do
with data in flight is not clear .
There are two options to fix it:
1. fix db_bench so new iterator will also use lock of the region that
it is doing seek on
2. mimic skip-list behaviour add the last vector in the time of seek
and use the sequence number to decide if data is in or not.
—
Reply to this email directly, view it on GitHub
<#802 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AZKU2DG7H3LWDLBSI3T3L73YPDQUDAVCNFSM6AAAAABBM5KDQ6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQOJYGA3DQNRUGU>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
|
the issue is a result of the Refresh iterator that by definition is not suppose to be supported in the immutable iterator. (hash_spdb, vectors memtablerep) |
How to reproduce:
mkdir /tmp/rocksdb_crashtest_expected_b1cqsg8_
run this command:
./db_stress --acquire_snapshot_one_in=10000 --adaptive_readahead=0 --allow_concurrent_memtable_write=1 --allow_data_in_errors=True --allow_wbm_stalls=0 --async_io=0 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=1 --backup_max_size=104857600 --backup_one_in=100000 --batch_protection_bytes_per_key=0 --block_protection_bytes_per_key=2 --block_size=16384 --bloom_bits=5.680994457101304 --bottommost_compression_type=disable --bottommost_file_compaction_delay=0 --bytes_per_sync=262144 --cache_index_and_filter_blocks=0 --cache_size=8388608 --cache_type=lru_cache --charge_compression_dictionary_building_buffer=0 --charge_file_metadata=1 --charge_filter_construction=1 --charge_table_reader=1 --checkpoint_one_in=1000000 --checksum_type=kxxHash --clear_column_family_one_in=0 --column_families=1 --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_pri=4 --compaction_ttl=0 --compare_full_db_state_snapshot=0 --compression_checksum=0 --compression_max_dict_buffer_bytes=0 --compression_max_dict_bytes=0 --compression_parallel_threads=1 --compression_type=lz4 --compression_use_zstd_dict_trainer=1 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --crash_test=1 --customopspercent=0 --data_block_index_type=0 --db=/tmp/rocksdb_crashtest_whiteboxxvsyhqno --db_write_buffer_size=0 --delpercent=17 --delrangepercent=5 --destroy_db_initially=0 --detect_filter_construct_corruption=0 --disable_wal=0 --enable_compaction_filter=0 --enable_pipelined_write=0 --enable_speedb_features=1 --enable_thread_tracking=1 --expected_values_dir=/tmp/rocksdb_crashtest_expected_b1cqsg8_ --fail_if_options_file_error=0 --fifo_allow_compaction=1 --file_checksum_impl=none --flush_one_in=1000000 --format_version=6 --get_current_wal_file_one_in=0 --get_live_files_one_in=100000 --get_property_one_in=1000000 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=5 --index_type=0 --ingest_external_file_one_in=0 --initial_auto_readahead_size=0 --initiate_wbm_flushes=1 --iterpercent=17 --key_len_percent_dist=100 --kill_random_test=888887 --level_compaction_dynamic_level_bytes=0 --lock_wal_one_in=1000000 --long_running_snapshots=0 --manual_wal_flush_one_in=1000000 --mark_for_compaction_one_file_in=10 --max_auto_readahead_size=524288 --max_background_compactions=1 --max_background_jobs=4 --max_bytes_for_level_base=67108864 --max_key=102400 --max_key_len=1 --max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=1048576 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=1048576 --memtable_max_range_deletions=1000 --memtable_prefix_bloom_size_ratio=0.1 --memtable_protection_bytes_per_key=0 --memtable_whole_key_filtering=1 --memtablerep=skip_list --min_write_buffer_number_to_merge=2 --mmap_read=0 --mock_direct_io=False --nooverwritepercent=5 --num_file_reads_for_auto_readahead=0 --num_iterations=60 --open_files=100 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=32 --open_write_fault_one_in=0 --ops_per_thread=20000000 --optimize_filters_for_memory=0 --paranoid_file_checks=1 --partition_filters=0 --partition_pinning=3 --pause_background_one_in=1000000 --periodic_compaction_seconds=0 --pinning_policy=scoped --prefix_size=1 --prefixpercent=6 --prepopulate_block_cache=0 --preserve_internal_time_seconds=36000 --progress_reports=0 --read_fault_one_in=32 --readahead_size=524288 --readpercent=20 --recycle_log_file_num=0 --reopen=20 --ribbon_starting_level=999 --secondary_cache_fault_one_in=32 --secondary_cache_uri=compressed_secondary_cache://capacity=8388608 --seed=2468986424 --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=104857600 --sst_file_manager_bytes_per_truncate=1048576 --start_delay_percent=70 --stats_dump_period_sec=10 --subcompactions=2 --sync=0 --sync_fault_injection=0 --sync_wal_one_in=100000 --target_file_size_base=16777216 --target_file_size_multiplier=1 --test_batches_snapshots=0 --top_level_index_pinning=0 --total_ram_size=1073741824 --unpartitioned_pinning=1 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_dynamic_delay=1 --use_full_merge_v1=False --use_get_entity=0 --use_merge=1 --use_multi_get_entity=0 --use_multiget=0 --use_put_entity_one_in=0 --user_timestamp_size=0 --value_size_mult=32 --verify_before_write=False --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_db_one_in=100000 --verify_file_checksums_one_in=0 --verify_iterator_with_expected_state_one_in=5 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=0 --wal_compression=none --write_buffer_size=33554432 --write_dbid_to_manifest=1 --writepercent=35
run this command:
./db_stress --acquire_snapshot_one_in=10000 --adaptive_readahead=0 --allow_concurrent_memtable_write=0 --allow_data_in_errors=True --allow_wbm_stalls=0 --async_io=0 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=1 --backup_max_size=104857600 --backup_one_in=100000 --batch_protection_bytes_per_key=8 --block_protection_bytes_per_key=1 --block_size=16384 --bloom_bits=2 --bottommost_compression_type=disable --bottommost_file_compaction_delay=0 --bytes_per_sync=0 --cache_index_and_filter_blocks=1 --cache_size=8388608 --cache_type=auto_hyper_clock_cache --charge_compression_dictionary_building_buffer=0 --charge_file_metadata=1 --charge_filter_construction=0 --charge_table_reader=0 --checkpoint_one_in=1000000 --checksum_type=kxxHash --clear_column_family_one_in=0 --column_families=1 --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_pri=4 --compaction_ttl=0 --compare_full_db_state_snapshot=0 --compression_checksum=0 --compression_max_dict_buffer_bytes=2097151 --compression_max_dict_bytes=16384 --compression_parallel_threads=1 --compression_type=none --compression_use_zstd_dict_trainer=1 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --crash_test=1 --customopspercent=0 --data_block_index_type=0 --db=/tmp/rocksdb_crashtest_whiteboxxvsyhqno --db_write_buffer_size=1048576 --delpercent=28 --delrangepercent=4 --destroy_db_initially=0 --detect_filter_construct_corruption=1 --disable_wal=0 --enable_compaction_filter=0 --enable_pipelined_write=0 --enable_speedb_features=1 --enable_thread_tracking=0 --expected_values_dir=/tmp/rocksdb_crashtest_expected_b1cqsg8_ --fail_if_options_file_error=1 --fifo_allow_compaction=1 --file_checksum_impl=big --flush_one_in=1000000 --format_version=5 --get_current_wal_file_one_in=0 --get_live_files_one_in=100000 --get_property_one_in=1000000 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=3 --index_type=0 --ingest_external_file_one_in=0 --initial_auto_readahead_size=16384 --initiate_wbm_flushes=0 --iterpercent=3 --key_len_percent_dist=100 --kill_exclude_prefixes=WritableFileWriter::Append,WritableFileWriter::WriteBuffered --kill_random_test=88889 --level_compaction_dynamic_level_bytes=0 --lock_wal_one_in=1000000 --long_running_snapshots=0 --manual_wal_flush_one_in=0 --mark_for_compaction_one_file_in=0 --max_auto_readahead_size=524288 --max_background_compactions=1 --max_background_jobs=8 --max_bytes_for_level_base=67108864 --max_key=102400 --max_key_len=1 --max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=16777216 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=1048576 --memtable_max_range_deletions=0 --memtable_prefix_bloom_size_ratio=0.5 --memtable_protection_bytes_per_key=0 --memtable_whole_key_filtering=1 --memtablerep=skip_list --min_write_buffer_number_to_merge=1 --mmap_read=1 --mock_direct_io=False --nooverwritepercent=5 --num_file_reads_for_auto_readahead=2 --num_iterations=19 --open_files=500000 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=0 --open_write_fault_one_in=0 --ops_per_thread=20000000 --optimize_filters_for_memory=0 --paranoid_file_checks=1 --partition_filters=0 --partition_pinning=1 --pause_background_one_in=1000000 --periodic_compaction_seconds=0 --pinning_policy=default --prefix_size=-1 --prefixpercent=0 --prepopulate_block_cache=0 --preserve_internal_time_seconds=0 --progress_reports=0 --read_fault_one_in=32 --readahead_size=524288 --readpercent=30 --recycle_log_file_num=0 --reopen=20 --ribbon_starting_level=999 --secondary_cache_fault_one_in=0 --secondary_cache_uri=compressed_secondary_cache://capacity=8388608 --seed=2468986424 --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=104857600 --sst_file_manager_bytes_per_truncate=1048576 --start_delay_percent=0 --stats_dump_period_sec=600 --subcompactions=3 --sync=1 --sync_fault_injection=0 --sync_wal_one_in=100000 --target_file_size_base=16777216 --target_file_size_multiplier=1 --test_batches_snapshots=0 --top_level_index_pinning=1 --total_ram_size=536870912 --unpartitioned_pinning=1 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_dynamic_delay=1 --use_full_merge_v1=False --use_get_entity=0 --use_merge=0 --use_multi_get_entity=0 --use_multiget=1 --use_put_entity_one_in=0 --user_timestamp_size=0 --value_size_mult=32 --verify_before_write=False --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_db_one_in=100000 --verify_file_checksums_one_in=1000000 --verify_iterator_with_expected_state_one_in=5 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=0 --wal_compression=none --write_buffer_size=33554432 --write_dbid_to_manifest=1 --writepercent=35
You should get the error similar to this:
2023-12-28T14:37:27.4899757Z Expected state has key 0000000000005982, iterator is at key 0000000000005966
2023-12-28T14:37:27.4901103Z Column family: default, op_logs: S 000000000000597D N SFP 000000000000598F P S 0000000000005986 NPNPP
2023-12-28T14:37:27.4902028Z Verification failed :(
The text was updated successfully, but these errors were encountered: