White box failing due to FileExistsError #159

Closed
assaf-speedb opened this issue Sep 15, 2022 · 1 comment · Fixed by #168
Comments

assaf-speedb commented Sep 15, 2022

The white box crash test is failing in several branches.
According to @isaac-io, it is related to debug code we added.
The command I ran:

make clean; time make db_stress -j16; time make whitebox_crash_test

The crash command that fails (example):

[2022-09-14 12:20:39.838150] Running db_stress with pid=12552: ./db_stress --acquire_snapshot_one_in=10000 --adaptive_readahead=0 --allow_concurrent_memtable_write=1 --async_io=0 --avoid_flush_during_recovery=1 --avoid_unnecessary_blocking_io=1 --backup_max_size=104857600 --backup_one_in=100000 --batch_protection_bytes_per_key=8 --block_size=4096 --bloom_bits=28.895634897464433 --bottommost_compression_type=disable --cache_index_and_filter_blocks=1 --cache_size=8388608 --checkpoint_one_in=1000000 --checksum_type=kxxHash --clear_column_family_one_in=0 --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_ttl=0 --compare_full_db_state_snapshot=0 --compression_max_dict_buffer_bytes=0 --compression_max_dict_bytes=0 --compression_parallel_threads=1 --compression_type=xpress --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --customopspercent=0 --data_block_hash_table_util_ratio=0.16 --data_block_index_type=0 --db=/dev/shm/rocksdb.wKWN/rocksdb_crashtest_whitebox --db_write_buffer_size=0 --delpercent=20 --delrangepercent=0 --destroy_db_initially=0 --detect_filter_construct_corruption=1 --disable_wal=0 --enable_compaction_filter=0 --enable_pipelined_write=1 --expected_values_dir=/dev/shm/rocksdb.wKWN/rocksdb_crashtest_expected --fail_if_options_file_error=1 --file_checksum_impl=xxh64 --flush_one_in=1000000 --format_version=5 --get_current_wal_file_one_in=0 --get_live_files_one_in=100000 --get_property_one_in=1000000 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=6 --index_type=3 --iterpercent=14 --key_len_percent_dist=50,37,9,4 --kill_random_test=888887 --level_compaction_dynamic_level_bytes=True --long_running_snapshots=1 --mark_for_compaction_one_file_in=0 --max_background_compactions=20 --max_bytes_for_level_base=10485760 --max_key=10485760 --max_key_len=4 --max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=16 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=0 
--memtable_prefix_bloom_size_ratio=0.5 --memtable_whole_key_filtering=0 --memtablerep=speedb.HashSpdRepFactory --mmap_read=1 --mock_direct_io=False --nooverwritepercent=5 --num_iterations=0 --open_files=100 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=32 --open_write_fault_one_in=0 --ops_per_thread=20000000 --optimize_filters_for_memory=0 --paranoid_file_checks=1 --partition_filters=0 --partition_pinning=1 --pause_background_one_in=1000000 --periodic_compaction_seconds=0 --prefix_size=7 --prefixpercent=9 --prepopulate_block_cache=1 --progress_reports=0 --read_fault_one_in=0 --readpercent=21 --recycle_log_file_num=1 --reopen=20 --reserve_table_reader_memory=1 --ribbon_starting_level=8 --secondary_cache_fault_one_in=32 --seed=673752403 --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=104857600 --sst_file_manager_bytes_per_truncate=1048576 --subcompactions=3 --sync=False --sync_fault_injection=False --sync_wal_one_in=100000 --target_file_size_base=2097152 --target_file_size_multiplier=2 --test_batches_snapshots=1 --top_level_index_pinning=1 --unpartitioned_pinning=1 --use_block_based_filter=0 --use_clock_cache=0 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_full_merge_v1=False --use_merge=0 --use_multiget=1 --user_timestamp_size=0 --value_size_mult=32 --verify_before_write=False --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_db_one_in=100000 --wal_compression=zstd --write_buffer_size=1048576 --write_dbid_to_manifest=1 --writepercent=36

The error message:

Traceback (most recent call last):
  File "tools/db_crashtest.py", line 1110, in <module>
    main()
  File "tools/db_crashtest.py", line 1099, in main
    whitebox_crash_main(args, unknown_args)
  File "tools/db_crashtest.py", line 990, in whitebox_crash_main
    copy_tree_and_remove_old(counter, dbname)
  File "tools/db_crashtest.py", line 778, in copy_tree_and_remove_old
    shutil.copytree(dbname, dest)
  File "/usr/lib/python3.8/shutil.py", line 557, in copytree
    return _copytree(entries=entries, src=src, dst=dst, symlinks=symlinks,
  File "/usr/lib/python3.8/shutil.py", line 458, in _copytree
    os.makedirs(dst, exist_ok=dirs_exist_ok)
  File "/usr/lib/python3.8/os.py", line 223, in makedirs
    mkdir(name, mode)
FileExistsError: [Errno 17] File exists: '/dev/shm/rocksdb.wKWN/rocksdb_crashtest_whitebox_0'
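
The traceback shows `shutil.copytree` refusing to copy into a destination that already exists, presumably a backup left over from an earlier run. A minimal sketch of that behavior and one possible guard follows. `copy_tree_replacing` is a hypothetical helper for illustration, not the function in `tools/db_crashtest.py`, and the paths and file contents are placeholders.

```python
import os
import shutil
import tempfile


def copy_tree_replacing(src, dest):
    """Copy src to dest, first removing a stale dest from an earlier run.

    Hypothetical helper for illustration; not the code in db_crashtest.py.
    """
    if os.path.isdir(dest):
        shutil.rmtree(dest)
    shutil.copytree(src, dest)


def demo():
    """Reproduce the FileExistsError, then show the guarded copy working."""
    root = tempfile.mkdtemp()
    try:
        src = os.path.join(root, "db")
        dest = os.path.join(root, "db_0")
        os.makedirs(src)
        with open(os.path.join(src, "CURRENT"), "w") as f:
            f.write("placeholder\n")

        os.makedirs(dest)  # simulate a stale backup from a previous run
        try:
            shutil.copytree(src, dest)
            raised = False
        except FileExistsError:
            raised = True

        copy_tree_replacing(src, dest)
        copied = os.path.exists(os.path.join(dest, "CURRENT"))
        return raised, copied
    finally:
        shutil.rmtree(root)
```

Removing the stale directory hides the symptom but discards the earlier backup, so it may be preferable to avoid the name collision in the first place.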
isaac-io assigned isaac-io and unassigned Yuval-Ariel Sep 19, 2022
isaac-io added a commit that referenced this issue Sep 19, 2022
Without creating a unique DB name for each run, we can't run concurrent
crash tests, and the debugging backup logic might break because a previous
run failed and made a backup that it now tries to overwrite.
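
One way to get a unique DB name per run is to let `tempfile.mkdtemp` pick the directory. The sketch below assumes that approach; `make_unique_db_dir` and its default prefix are hypothetical and are not taken from the referenced commit.

```python
import tempfile


def make_unique_db_dir(base_dir, prefix="rocksdb_crashtest_whitebox_"):
    """Create and return a fresh directory under base_dir.

    tempfile.mkdtemp creates the directory itself and appends a random
    suffix, so two concurrent runs do not receive the same path.
    Hypothetical helper; not the code from the commit.
    """
    return tempfile.mkdtemp(prefix=prefix, dir=base_dir)
```

Judging from the path in the traceback, backup names appear to be derived from the DB path plus a counter, so a unique DB path would make the backup paths unique as well.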
isaac-io added the Upstreamable (can be upstreamed to RocksDB) label Sep 19, 2022
isaac-io added a commit that referenced this issue Sep 19, 2022
The debugging backup logic in whitebox wasn't cleaning up DB copies on
timeout because the cleanup logic was inside the loop. Move it to the end
of the function to ensure that backups are cleaned up on success.
isaac-io added a commit that referenced this issue Sep 19, 2022
The debugging backup logic in whitebox wasn't cleaning up DB copies on
timeout because the cleanup logic was inside the loop. Move it to the end
of the function to ensure that backups are cleaned up on success.

Also make sure to clean up DB backup copies in the narrow test.
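
The change described above can be sketched roughly as follows: backups are taken inside the loop, while their removal happens once after the loop, so it also runs when the loop ends on a timeout. Everything here is an assumption made for illustration (`run_with_backups`, the `run_once` callable, and the `"ok"`/`"timeout"`/`"failed"` status strings); it is not the code in `tools/db_crashtest.py`.

```python
import shutil


def run_with_backups(dbname, run_once, max_runs):
    """Run up to max_runs iterations, backing up the DB after each one.

    run_once(counter) is a hypothetical stand-in for one db_stress run and
    returns "ok", "timeout", or "failed". Returns (success, kept_backups).
    """
    backups = []
    for counter in range(max_runs):
        status = run_once(counter)
        if status == "failed":
            # Keep the copies so the failure can be debugged.
            return False, backups
        if status == "timeout":
            # Time budget exhausted: leave the loop, still a success.
            break
        dest = "{}_{}".format(dbname, counter)
        shutil.copytree(dbname, dest)
        backups.append(dest)

    # Cleanup sits after the loop, so it runs both when all iterations
    # complete and when the loop exits early on a timeout.
    for path in backups:
        shutil.rmtree(path, ignore_errors=True)
    return True, []
```

With the cleanup inside the loop body, the `break` on timeout would skip it and leave the copies behind, which is the situation the commit message describes.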
isaac-io added a commit that referenced this issue Sep 21, 2022
isaac-io added this to the v2.1.0 milestone Sep 21, 2022
@assaf-speedb (Contributor, Author)

I ran it multiple times on various branches and could not reproduce the failure.

assaf-speedb removed their assignment Sep 22, 2022
isaac-io added a commit that referenced this issue Sep 22, 2022
isaac-io added a commit that referenced this issue Oct 19, 2022
Yuval-Ariel pushed a commit that referenced this issue Nov 23, 2022
Yuval-Ariel pushed a commit that referenced this issue Nov 25, 2022
Yuval-Ariel pushed a commit that referenced this issue Apr 30, 2023
udi-speedb pushed a commit that referenced this issue Oct 31, 2023
udi-speedb pushed a commit that referenced this issue Dec 3, 2023