[rocksandra] support cassandra partition deletion #3874
base: main
@wpc has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@wpc has updated the pull request.
* Add a merge operator for partition metadata (currently partition deletion info only)
* Read partition deletion info in the Cassandra compaction filter and drop rows if their partition has been deleted
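A minimal sketch of what merging partition metadata could look like, assuming each operand is a partition deletion record carrying Cassandra's `markedForDeleteAt` and `localDeletionTime` fields, and assuming the merge keeps whichever record supersedes the others (mirroring Cassandra's `DeletionTime.supersedes`). `DeletionTime`, `Supersedes`, and `MergePartitionMeta` are hypothetical names; the operator in this PR plugs into RocksDB's merge-operator interface and works on serialized bytes rather than structs.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical in-memory form of a partition deletion record.
struct DeletionTime {
  int64_t marked_for_delete_at;  // write timestamp of the deletion (microseconds)
  int32_t local_deletion_time;   // server-local deletion time (seconds)
};

// Mirrors Cassandra's DeletionTime.supersedes(): the later markedForDeleteAt
// wins, and localDeletionTime breaks ties.
bool Supersedes(const DeletionTime& a, const DeletionTime& b) {
  return a.marked_for_delete_at > b.marked_for_delete_at ||
         (a.marked_for_delete_at == b.marked_for_delete_at &&
          a.local_deletion_time > b.local_deletion_time);
}

// Hypothetical full merge: fold the existing value (if any) with all operands,
// keeping the deletion record that supersedes all others.
std::optional<DeletionTime> MergePartitionMeta(
    const std::optional<DeletionTime>& existing,
    const std::vector<DeletionTime>& operands) {
  std::optional<DeletionTime> result = existing;
  for (const DeletionTime& op : operands) {
    if (!result || Supersedes(op, *result)) {
      result = op;
    }
  }
  return result;
}
```

Keeping only the superseding record makes the merge order-insensitive for these operands, so repeated deletions of the same partition collapse into a single entry.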
@wpc has updated the pull request.
Update the PR: make sure the iterator on the partition meta CF is deleted after use.
Summary:
To support partition-level deletion, we create a partition meta CF in each RocksDB instance and store partition deletion info in it. On the RocksDB side, the compaction filter reads partition deletion info from this CF and drops data based on marked_for_delete_at. (facebook/rocksdb#3874)

Streaming for partition metadata will be in a separate diff.

Test Plan:
Functional
========
Tested in the storyarchive cluster.

Partition dump after deletion:
```
[23:32:32 root@priv_prn/instagram/cassandra-data-storyarchiverocks/25 /var/log/cassandra]$ nodetool dumppartition storyarchive reel_media_viewer_by_ts_perm_compact_001 1773713256096284353
--- metadata:
0x816099270c387b3c989d7ecf13053ec1 0x5b0254a800056cb0504844ab
--- rows:
0x816099270c387b3c989d7ecf13053ec180000000238e8b0e 0x7fffffff8000000000000000000000056b954457139f00000010be10e78051a611e88080808080808080
0x816099270c387b3c989d7ecf13053ec18000000046a34f04 0x7fffffff8000000000000000000000056b955bb808da000000109a01d60051a711e88080808080808080
0x816099270c387b3c989d7ecf13053ec180000000c1b912cd 0x7fffffff8000000000000000000000056b952ddfd27100000010e51ae98051a511e88080808080808080
0x816099270c387b3c989d7ecf13053ec180000000db4233f7 0x7fffffff8000000000000000000000056b9533f4a2a3000000101a273c0051a611e88080808080808080
0x816099270c387b3c989d7ecf13053ec180000000f7465f43 0x7fffffff8000000000000000000000056b954625e8fa00000010d386118051a611e88080808080808080
0x816099270c387b3c989d7ecf13053ec18000000116af6e77 0x7fffffff8000000000000000000000056b959d045dbf000000103e85178051aa11e88080808080808080
```

Dump after full compaction finished:
```
[01:19:46 root@priv_prn/instagram/cassandra-data-storyarchiverocks/25 /var/log/cassandra]$ nodetool dumppartition storyarchive reel_media_viewer_by_ts_perm_compact_001 1773713256096284353
--- metadata:
0x816099270c387b3c989d7ecf13053ec1 0x5b0254a800056cb0504844ab
--- rows:
```

Performance
==========
Tested on priv_ftw/instagram/cassandra-data-feedviewstaterocks/15 (high compaction, no deletion), deployed at 5/31 2:54pm. No obvious CPU/IO regression:
https://fburl.com/ods/e9m1tdr2
https://fburl.com/ods/o5j049hb

Reviewers: svemuri, dikang, sdev, #ig-cassandra
Reviewed By: dikang
Subscribers: fdeliege, trunkagent
Differential Revision: https://phabricator.intern.facebook.com/D8063994
Signature: 8063994:1527988646:7d236751d82d4fee40e5b0ca3dd1da94d8e97e57
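The metadata value in the partition dump is 12 bytes long. Assuming it follows Cassandra's `DeletionTime` serialization (a big-endian 4-byte `localDeletionTime` followed by a big-endian 8-byte `markedForDeleteAt`), it can be decoded as sketched below. The layout is an assumption that this PR does not confirm, and `DecodeDeletionTime` and `ParseHex` are hypothetical helpers.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

struct DeletionTime {
  int32_t local_deletion_time;   // seconds since epoch
  int64_t marked_for_delete_at;  // microseconds since epoch
};

// Reads `n` bytes starting at `pos` as an unsigned big-endian integer.
uint64_t ReadBigEndian(const std::vector<uint8_t>& b, std::size_t pos, std::size_t n) {
  uint64_t v = 0;
  for (std::size_t i = 0; i < n; ++i) v = (v << 8) | b[pos + i];
  return v;
}

// Hypothetical decoder, ASSUMING this 12-byte layout:
//   bytes [0, 4)  : localDeletionTime, big-endian int32
//   bytes [4, 12) : markedForDeleteAt, big-endian int64
std::optional<DeletionTime> DecodeDeletionTime(const std::vector<uint8_t>& bytes) {
  if (bytes.size() != 12) return std::nullopt;
  DeletionTime dt;
  dt.local_deletion_time = static_cast<int32_t>(ReadBigEndian(bytes, 0, 4));
  dt.marked_for_delete_at = static_cast<int64_t>(ReadBigEndian(bytes, 4, 8));
  return dt;
}

// Converts a "0x..." hex string, as printed by the dump, into raw bytes.
std::vector<uint8_t> ParseHex(const std::string& hex) {
  std::vector<uint8_t> out;
  std::size_t start = (hex.rfind("0x", 0) == 0) ? 2 : 0;
  for (std::size_t i = start; i + 1 < hex.size(); i += 2) {
    out.push_back(static_cast<uint8_t>(std::stoul(hex.substr(i, 2), nullptr, 16)));
  }
  return out;
}
```

Under this assumed layout, `0x5b0254a800056cb0504844ab` decodes to localDeletionTime = 1526879400 (seconds) and markedForDeleteAt = 1526879400510635 (microseconds). The two agree to the second, which is at least consistent with the assumption.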
@wpc Do we still need it?
Yes, this is still needed.
@wpc @cooldoger The change LGTM! I haven't gone into the details of the Cassandra-specific logic; hopefully someone on your team can review that. Please rebase and make sure all tests pass, and I'd be happy to land it.
To support partition deletion in Rocksandra, we create a separate partition meta CF in each database and pass the DB and CF handles into the compaction filter. The compaction filter drops deleted data based on the deletion info it reads from the partition meta CF. This PR is only the first step, aimed at releasing disk space; the next step is to change the Cassandra merge operator to convert rows in deleted partitions into tombstones.
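The per-row decision made by the compaction filter can be modeled roughly as follows. This is a simplified sketch, not the PR's code: it replaces the partition meta CF (which the PR reads through the DB and CF handles) with an in-memory `std::map`, assumes the first 16 bytes of a row key are the partition key (the row keys in the test-plan dump do share the 16-byte metadata key as a prefix), assumes the row's write timestamp is available to the filter, and applies Cassandra's rule that a partition deletion shadows data whose timestamp is less than or equal to `markedForDeleteAt`. `PartitionMetaStore`, `LookupPartitionDeletion`, and `ShouldDropRow` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for the partition meta CF:
// partition key -> markedForDeleteAt (microseconds).
using PartitionMetaStore = std::map<std::string, int64_t>;

// ASSUMPTION: the partition key is a fixed-length 16-byte prefix of the row key.
constexpr std::size_t kPartitionKeyLen = 16;

// Looks up the deletion timestamp recorded for the row's partition, if any.
std::optional<int64_t> LookupPartitionDeletion(const PartitionMetaStore& meta,
                                               const std::string& row_key) {
  if (row_key.size() < kPartitionKeyLen) return std::nullopt;
  auto it = meta.find(row_key.substr(0, kPartitionKeyLen));
  if (it == meta.end()) return std::nullopt;
  return it->second;
}

// Returns true if compaction should drop this row: its partition has a
// deletion whose markedForDeleteAt is >= the row's write timestamp.
bool ShouldDropRow(const PartitionMetaStore& meta, const std::string& row_key,
                   int64_t row_timestamp) {
  std::optional<int64_t> deleted_at = LookupPartitionDeletion(meta, row_key);
  return deleted_at.has_value() && row_timestamp <= *deleted_at;
}
```

A row written after the deletion (timestamp greater than `markedForDeleteAt`) survives, which is why the filter compares timestamps instead of dropping every row under a deleted partition key.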
make format
for cassandra related test files