-
Notifications
You must be signed in to change notification settings - Fork 6.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support concurrent CF iteration and drop #6147
Conversation
It's easy to cause coredump when closing ColumnFamilyHandle with unreleased iterators, especially iterators release is controlled by java GC when using JNI. This patch fixed concurrent CF iteration and drop, we let iterators(actually SuperVersion) hold a ColumnFamilyData reference to prevent the CF from being released too early. fixed facebook#5982
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@siying has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your contribution!
I took a quick look and it seems to be pretty good to me. I'll take a detailed look and maybe run stress test for some while. If you have time, I suggest you run our crash test. If you have time, I suggest you run it too. It's simply "make crash_test -j" and "make crash_test_with_atomic_flush -j". They will run ours.
@siying Thanks for your review. I tried $ brew install gflags
Updating Homebrew...
==> Auto-updated Homebrew!
Updated 2 taps (homebrew/cask and homebrew/core).
==> New Formulae
fastlane
Warning: gflags 2.2.2 is already installed and up-to-date
To reinstall 2.2.2, run `brew reinstall gflags`
$ make crash_test -j 2
$DEBUG_LEVEL is 1
Makefile:168: Warning: Compiling in debug mode. Don't use the resulting binary in production
GEN util/build_version.cc
python -u tools/db_crashtest.py --simple whitebox --random_kill_odd \
888887
python -u tools/db_crashtest.py --simple blackbox
Running blackbox-crash-test with
interval_between_crash=120
total-duration=6000
Running whitebox-crash-test with
total-duration=10000
Running:./db_stress --periodic_compaction_seconds=0 --max_background_compactions=1 --expected_values_path=/var/folders/fl/gc6hkww94db7sfk79_lnv9xjtdxht2/T/tmpog7U9u --format_version=4 --partition_filters=1 --max_write_buffer_number=3 --cache_index_and_filter_blocks=1 --reopen=20 --acquire_snapshot_one_in=10000 --delpercent=4 --log2_keys_per_lock=10 --compaction_ttl=0 --block_size=16384 --use_multiget=1 --allow_concurrent_memtable_write=0 --compact_files_one_in=1000000 --target_file_size_multiplier=1 --clear_column_family_one_in=0 --max_bytes_for_level_base=67108864 --use_full_merge_v1=1 --target_file_size_base=16777216 --checkpoint_one_in=1000000 --mmap_read=0 --compression_type=snappy --writepercent=35 --index_type=2 --readpercent=50 --pause_background_one_in=1000000 --subcompactions=1 --memtablerep=skip_list --prefix_size=-1 --test_batches_snapshots=0 --column_families=1 --db=/var/folders/fl/gc6hkww94db7sfk79_lnv9xjtdxht2/T/rocksdb_crashtest_whitebox5PPfwJ --use_direct_reads=0 --use_block_based_filter=0 --max_manifest_file_size=1073741824 --delrangepercent=1 --compact_range_one_in=1000000 --open_files=500000 --destroy_db_initially=0 --progress_reports=0 --compression_zstd_max_train_bytes=0 --snapshot_hold_ops=100000 --enable_pipelined_write=1 --nooverwritepercent=1 --compression_max_dict_bytes=0 --max_key=100000000 --prefixpercent=0 --use_merge=1 --flush_one_in=1000000 --ops_per_thread=20000000 --index_block_restart_interval=5 --kill_random_test=888887 --cache_size=1048576 --recycle_log_file_num=1 --use_direct_io_for_flush_and_compaction=0 --verify_checksum=1 --bloom_bits=4 --write_buffer_size=33554432
Running db_stress with pid=92750: ./db_stress --periodic_compaction_seconds=100 --max_background_compactions=1 --expected_values_path=/var/folders/fl/gc6hkww94db7sfk79_lnv9xjtdxht2/T/tmp6ZoVgi --format_version=3 --partition_filters=0 --max_write_buffer_number=3 --verify_checksum=1 --cache_index_and_filter_blocks=1 --reopen=20 --acquire_snapshot_one_in=10000 --delpercent=4 --compaction_ttl=0 --block_size=16384 --use_multiget=1 --allow_concurrent_memtable_write=1 --compact_files_one_in=1000000 --target_file_size_multiplier=1 --clear_column_family_one_in=0 --max_bytes_for_level_base=67108864 --use_full_merge_v1=1 --target_file_size_base=16777216 --checkpoint_one_in=1000000 --mmap_read=0 --compression_type=snappy --writepercent=35 --index_type=2 --readpercent=50 --pause_background_one_in=1000000 --subcompactions=1 --ops_per_thread=100000000 --memtablerep=skip_list --set_options_one_in=0 --prefix_size=-1 --test_batches_snapshots=0 --column_families=1 --db=/var/folders/fl/gc6hkww94db7sfk79_lnv9xjtdxht2/T/rocksdb_crashtest_blackboxN3ukOv --use_direct_reads=0 --max_manifest_file_size=1073741824 --compact_range_one_in=1000000 --open_files=-1 --destroy_db_initially=0 --progress_reports=0 --compression_zstd_max_train_bytes=0 --snapshot_hold_ops=100000 --enable_pipelined_write=0 --nooverwritepercent=1 --compression_max_dict_bytes=16384 --max_key=100000000 --prefixpercent=0 --flush_one_in=1000000 --delrangepercent=1 --index_block_restart_interval=6 --cache_size=1048576 --recycle_log_file_num=1 --use_direct_io_for_flush_and_compaction=0 --use_merge=0 --bloom_bits=1.27621733141 --write_buffer_size=33554432
check_mode=0, kill option=888887, exitcode=1
Please install gflags to run rocksdb tools
TEST FAILED. See kill option and exit code above!!!
make: *** [whitebox_crash_test] Error 1
make: *** Waiting for unfinished jobs....
WARNING: db_stress ended before kill: exitcode=1
stderr has error message:
***Please install gflags to run rocksdb tools***
make: *** [blackbox_crash_test] Error 2 |
@javeme never mind. I ran it on your behave and it passed. Going forward, to run db_stress, you need to install gflags. I found the easiest way to get a compatible gflag is to download gflag from github, make it and install. Maybe next time:) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@siying has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
I realized that we need to add a unit test for it to cover the case we want to cover. Since the time difference, to speed it up, I plan to merge it.
I ran it and it passed with or without ASAN. |
You can choose to send a separate PR for unit tests after I merge the PR. |
Also, update HISTORY.md to mention this new improvement. |
This pull request has been merged in c2029f9. |
1 similar comment
This pull request has been merged in c2029f9. |
@siying Thanks very much. I'm glad to add unit tests for both c++ and java, I will submit it after a while. |
Done |
Summary: It's easy to cause coredump when closing ColumnFamilyHandle with unreleased iterators, especially iterators release is controlled by java GC when using JNI. This patch fixed concurrent CF iteration and drop, we let iterators(actually SuperVersion) hold a ColumnFamilyData reference to prevent the CF from being released too early. fixed facebook#5982 Pull Request resolved: facebook#6147 Differential Revision: D18926378 fbshipit-source-id: 1dff6d068c603d012b81446812368bfee95a5e15
Summary: improve facebook#6147 Pull Request resolved: facebook#6180 Differential Revision: D19148936 fbshipit-source-id: f691c9879fd51d54e96c1a99670cf85ca4485a89
Summary: The `ColumnFamilyData::UnrefAndTryDelete` code currently on the trunk unlocks the DB mutex before destroying the `ThreadLocalPtr` holding the per-thread `SuperVersion` pointers when the only remaining reference is the back reference from `super_version_`. The idea behind this was to break the circular dependency between `ColumnFamilyData` and `SuperVersion`: when the penultimate reference goes away, `ColumnFamilyData` can clean up the `SuperVersion`, which can in turn clean up `ColumnFamilyData`. (Assuming there is a `SuperVersion` and it is not referenced by anything else.) However, unlocking the mutex throws a wrench in this plan by making it possible for another thread to jump in and take another reference to the `ColumnFamilyData`, keeping the object alive in a zombie `ThreadLocalPtr`-less state. This can cause issues like #8440 , #8382 , and might also explain the `was_last_ref` assertion failures from the `ColumnFamilySet` destructor we sometimes observe during close in our stress tests. Digging through the archives, this unlocking goes way back to 2014 (or earlier). The original rationale was that `SuperVersionUnrefHandle` used to lock the mutex so it can call `SuperVersion::Cleanup`; however, this logic turned out to be deadlock-prone. #3510 fixed the deadlock but left the unlocking in place. #6147 then introduced the circular dependency and associated cleanup logic described above (in order to enable iterators to keep the `ColumnFamilyData` for dropped column families alive), and moved the unlocking-relocking snippet to its present location in `UnrefAndTryDelete`. Finally, #7749 fixed a memory leak but apparently exacerbated the race by (otherwise correctly) switching to `UnrefAndTryDelete` in `SuperVersion::Cleanup`. The patch simply eliminates the unlocking and relocking, which has been unnecessary ever since #3510 made `SuperVersionUnrefHandle` lock-free. This closes the window during which another thread could increase the reference count, and hopefully fixes the issues above. Pull Request resolved: #8605 Test Plan: Ran `make check` and stress tests locally. Reviewed By: pdillinger Differential Revision: D30051035 Pulled By: ltamasi fbshipit-source-id: 8fe559e4b4ad69fc142579f8bc393ef525918528
Summary: The `ColumnFamilyData::UnrefAndTryDelete` code currently on the trunk unlocks the DB mutex before destroying the `ThreadLocalPtr` holding the per-thread `SuperVersion` pointers when the only remaining reference is the back reference from `super_version_`. The idea behind this was to break the circular dependency between `ColumnFamilyData` and `SuperVersion`: when the penultimate reference goes away, `ColumnFamilyData` can clean up the `SuperVersion`, which can in turn clean up `ColumnFamilyData`. (Assuming there is a `SuperVersion` and it is not referenced by anything else.) However, unlocking the mutex throws a wrench in this plan by making it possible for another thread to jump in and take another reference to the `ColumnFamilyData`, keeping the object alive in a zombie `ThreadLocalPtr`-less state. This can cause issues like #8440 , #8382 , and might also explain the `was_last_ref` assertion failures from the `ColumnFamilySet` destructor we sometimes observe during close in our stress tests. Digging through the archives, this unlocking goes way back to 2014 (or earlier). The original rationale was that `SuperVersionUnrefHandle` used to lock the mutex so it can call `SuperVersion::Cleanup`; however, this logic turned out to be deadlock-prone. #3510 fixed the deadlock but left the unlocking in place. #6147 then introduced the circular dependency and associated cleanup logic described above (in order to enable iterators to keep the `ColumnFamilyData` for dropped column families alive), and moved the unlocking-relocking snippet to its present location in `UnrefAndTryDelete`. Finally, #7749 fixed a memory leak but apparently exacerbated the race by (otherwise correctly) switching to `UnrefAndTryDelete` in `SuperVersion::Cleanup`. The patch simply eliminates the unlocking and relocking, which has been unnecessary ever since #3510 made `SuperVersionUnrefHandle` lock-free. This closes the window during which another thread could increase the reference count, and hopefully fixes the issues above. Pull Request resolved: #8605 Test Plan: Ran `make check` and stress tests locally. Reviewed By: pdillinger Differential Revision: D30051035 Pulled By: ltamasi fbshipit-source-id: 8fe559e4b4ad69fc142579f8bc393ef525918528
It's easy to cause coredump when closing ColumnFamilyHandle with unreleased iterators, especially iterators release is controlled by java GC when using JNI.
This patch fixed concurrent CF iteration and drop, we let iterators(actually SuperVersion) hold a ColumnFamilyData reference to prevent the CF from being released too early.
fixed #5982