feat(storage): use exist_table_id to filter exist key when compaction #3038

Merged
merged 5 commits into from
Jun 9, 2022

Conversation

Li0k
Contributor

@Li0k Li0k commented Jun 7, 2022

What's changed and what's your intention?

To support cleaning up expired keys during compaction.

Please explain IN DETAIL what the changes are in this PR and why they are needed:

  • get the exist_table_ids of all TableFragments from fragment_manager
  • compact_task collects the exist_table_id via SSTableInfo
  • the compactor reclaims expired keys by comparing the table_id (decoded from the key prefix) with exist_table_id
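
The last step above can be sketched as follows. This is a minimal illustration rather than the PR's code: the key layout (one `b't'` byte followed by a big-endian `u32` table id) and the names `table_id_of`, `should_keep`, and `make_key` are assumptions made for this example.

```rust
use std::collections::HashSet;

/// Assumed key layout for this sketch: one prefix byte `b't'`, then the
/// table id as a big-endian u32, then the rest of the user key.
const TABLE_PREFIX: u8 = b't';

/// Decodes the table id from the key prefix, or returns None if the key
/// does not follow the assumed layout.
fn table_id_of(key: &[u8]) -> Option<u32> {
    if key.len() < 5 || key[0] != TABLE_PREFIX {
        return None;
    }
    let mut buf = [0u8; 4];
    buf.copy_from_slice(&key[1..5]);
    Some(u32::from_be_bytes(buf))
}

/// Returns true if the key should survive compaction.
/// Keys without a decodable table id are kept, which is a conservative
/// choice made for this sketch only.
fn should_keep(key: &[u8], existing_table_ids: &HashSet<u32>) -> bool {
    match table_id_of(key) {
        Some(id) => existing_table_ids.contains(&id),
        None => true,
    }
}

/// Hypothetical helper that builds a key in the assumed layout.
fn make_key(table_id: u32, suffix: &[u8]) -> Vec<u8> {
    let mut key = vec![TABLE_PREFIX];
    key.extend_from_slice(&table_id.to_be_bytes());
    key.extend_from_slice(suffix);
    key
}
```

With an existing set of `{1, 2}`, `should_keep(&make_key(3, b"a"), &set)` returns false, so a compactor using this check would reclaim that key.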

Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests

Refer to a related PR or issue link (optional)

@Li0k Li0k requested review from zwang28 and hzxa21 June 7, 2022 12:24
@Li0k Li0k requested a review from MrCroxx June 7, 2022 12:29
Collaborator

@hzxa21 hzxa21 left a comment

Please add some tests at least for the following cases:

  1. Data of all tables in a dropped MV will be cleared after compaction.
  2. Data of all tables in any existing MV will not be cleared after compaction.
  3. Shared buffer / local compaction works as usual

@@ -140,6 +140,9 @@ message CompactTask {
repeated common.ParallelUnitMapping vnode_mappings = 11;
// compaction group the task belongs to
uint64 compaction_group_id = 12;

// exist_table_id for compaction drop key
repeated uint32 exist_table_id = 13;
Collaborator

nits: exist_table_id -> existing_table_ids

Contributor Author

@Li0k Li0k Jun 8, 2022

Fixed.

I also added some tests:

  1. Check that all keys are dropped when existing_table_id_set is empty: after compaction, no SSTable is produced.
  2. Check compaction across different keyspaces (table_id prefixes) with existing_table_id_set: keys with a dropped table_id prefix are dropped, and keys with an existing table_id prefix remain. The test checks all keys with hummock_scan to avoid unexpected data loss.
  3. The local compaction logic and workflow are unchanged; I think they are covered by the unit tests above and by e2e.
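
The shape of the first two test cases can be sketched with a toy model. This is only an illustration under stated assumptions: `compact` and `check_drop_cases` are hypothetical names, entries are modelled as `(table_id, value)` pairs rather than encoded keys, and no real SSTable or hummock_scan call is involved.

```rust
use std::collections::HashSet;

/// Toy stand-in for a compaction run: keeps only the entries whose table id
/// (the first tuple element) is in `existing_table_ids`.
/// Real compaction works on encoded keys and SSTables; this models the filter only.
fn compact(
    input: &[(u32, &'static str)],
    existing_table_ids: &HashSet<u32>,
) -> Vec<(u32, &'static str)> {
    input
        .iter()
        .copied()
        .filter(|(table_id, _)| existing_table_ids.contains(table_id))
        .collect()
}

/// Mirrors cases 1 and 2 from the list above on the toy model.
fn check_drop_cases() {
    let input: Vec<(u32, &'static str)> = vec![(1, "a"), (1, "b"), (2, "c"), (3, "d")];

    // Case 1: the existing set is empty, so nothing survives compaction.
    let none: HashSet<u32> = HashSet::new();
    assert!(compact(&input, &none).is_empty());

    // Case 2: table 2 was dropped; tables 1 and 3 still exist.
    let existing: HashSet<u32> = vec![1u32, 3].into_iter().collect();
    let output = compact(&input, &existing);

    // Scan every surviving entry to rule out unexpected data loss.
    assert_eq!(output, vec![(1, "a"), (1, "b"), (3, "d")]);
    assert!(output.iter().all(|(table_id, _)| *table_id != 2));
}
```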

Comment on lines 783 to 784
// iter.next().await?;
// continue;
Collaborator

remove commented lines

Contributor Author

Fixed.

@@ -160,6 +160,7 @@ impl Compactor {
// VNode mappings are not required when compacting shared buffer to L0
vnode_mappings: vec![],
compaction_group_id: StaticCompactionGroupId::StateDefault.into(),
exist_table_id: vec![],
Collaborator

This will cause shared buffer compaction to produce empty SSTs. We should disable the check for dropped table ids for shared buffer compaction.

Contributor Author

OK, I think we can distinguish shared buffer compaction from normal compaction by using a strategy like CompactionFilter (next commit).
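
One way such a strategy could look is sketched below. The trait shape, the struct names `KeepAllFilter` and `ExistingTableFilter`, the `select_filter` helper, and the key layout (`b't'` followed by a big-endian `u32` table id) are all assumptions for illustration and are not taken from the PR. Here `filter` returning true means "keep the key".

```rust
use std::collections::HashSet;

/// Hypothetical strategy trait: returns true if the key should be kept.
trait CompactionFilter {
    fn filter(&self, key: &[u8]) -> bool;
}

/// For shared buffer compaction: never drops anything, so an empty
/// table id list cannot lead to empty SSTs.
struct KeepAllFilter;

impl CompactionFilter for KeepAllFilter {
    fn filter(&self, _key: &[u8]) -> bool {
        true
    }
}

/// For normal compaction: drops keys whose table no longer exists.
struct ExistingTableFilter {
    existing_table_ids: HashSet<u32>,
}

impl CompactionFilter for ExistingTableFilter {
    fn filter(&self, key: &[u8]) -> bool {
        // Assumed layout: b't', then a big-endian u32 table id.
        // Keys that do not match the layout are kept.
        if key.len() < 5 || key[0] != b't' {
            return true;
        }
        let table_id = u32::from_be_bytes([key[1], key[2], key[3], key[4]]);
        self.existing_table_ids.contains(&table_id)
    }
}

/// Picks the filter according to the kind of compaction being run.
fn select_filter(
    is_shared_buffer: bool,
    existing_table_ids: HashSet<u32>,
) -> Box<dyn CompactionFilter> {
    if is_shared_buffer {
        Box::new(KeepAllFilter)
    } else {
        Box::new(ExistingTableFilter { existing_table_ids })
    }
}
```

Choosing the filter once per task keeps the per-key loop free of "which kind of compaction is this" branches.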

@Li0k Li0k force-pushed the li0k/drop_mv_from_storage branch 4 times, most recently from 818fec6 to 244f571 Compare June 8, 2022 11:53
@Li0k Li0k force-pushed the li0k/drop_mv_from_storage branch from 244f571 to 4b39ef2 Compare June 8, 2022 12:18
@Li0k Li0k marked this pull request as ready for review June 8, 2022 12:34
@Li0k Li0k changed the title [WIP] feat(storage): use exist_table_id to filter exist key when compaction feat(storage): use exist_table_id to filter exist key when compaction Jun 8, 2022
@codecov
codecov bot commented Jun 8, 2022

Codecov Report

Merging #3038 (1649caa) into main (4f7206d) will increase coverage by 0.07%.
The diff coverage is 93.96%.

@@            Coverage Diff             @@
##             main    #3038      +/-   ##
==========================================
+ Coverage   73.49%   73.57%   +0.07%     
==========================================
  Files         733      733              
  Lines       99536    99784     +248     
==========================================
+ Hits        73157    73412     +255     
+ Misses      26379    26372       -7     
Flag Coverage Δ
rust 73.57% <93.96%> (+0.07%) ⬆️

Impacted Files Coverage Δ
src/meta/src/rpc/server.rs 0.00% <0.00%> (ø)
src/meta/src/stream/meta.rs 47.04% <60.00%> (+0.41%) ⬆️
src/storage/src/hummock/compactor_tests.rs 92.25% <94.67%> (+3.55%) ⬆️
src/meta/src/hummock/hummock_manager.rs 87.00% <96.00%> (+0.15%) ⬆️
src/storage/src/hummock/compactor.rs 73.61% <98.18%> (+2.53%) ⬆️
src/meta/src/hummock/compaction/mod.rs 81.50% <100.00%> (+1.85%) ⬆️
src/meta/src/hummock/compactor_manager.rs 98.81% <100.00%> (+<0.01%) ⬆️
src/meta/src/hummock/test_utils.rs 95.65% <100.00%> (+0.05%) ⬆️
src/meta/src/stream/stream_manager.rs 68.82% <100.00%> (+0.03%) ⬆️
src/meta/src/hummock/hummock_manager_tests.rs 89.23% <0.00%> (-0.85%) ⬇️
... and 11 more

@Li0k Li0k force-pushed the li0k/drop_mv_from_storage branch from b046871 to 8980df5 Compare June 8, 2022 12:57
}
}

// in our design, frontend avoid to access keys which had be deleted, so we dont need to
// consider the epoch when the compaction_filter match (it means that mv had drop)
if !compaction_filter.filter(iter_key) {
Collaborator

Suggested change
- if !compaction_filter.filter(iter_key) {
+ if !drop && !compaction_filter.filter(iter_key) {
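
The effect of the added `!drop &&` guard can be illustrated with a small sketch. `CountingFilter` and `decide_drop` are hypothetical names, and the body of the branch (setting `drop = true`) is an assumption, since the diff excerpt does not show it. Because `&&` short-circuits, the filter is not consulted for a key that an earlier rule has already marked for dropping.

```rust
use std::cell::Cell;

/// Hypothetical filter that counts how often it is consulted and always
/// reports "do not keep".
struct CountingFilter {
    calls: Cell<usize>,
}

impl CountingFilter {
    fn filter(&self, _key: &[u8]) -> bool {
        self.calls.set(self.calls.get() + 1);
        false
    }
}

/// Combines an earlier drop decision with the filter's decision.
fn decide_drop(already_dropped: bool, filter: &CountingFilter, key: &[u8]) -> bool {
    let mut drop = already_dropped;
    // `&&` short-circuits: the filter runs only if the key is still a keep candidate.
    if !drop && !filter.filter(key) {
        drop = true;
    }
    drop
}
```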

Collaborator

@hzxa21 hzxa21 left a comment

Good job! LGTM.

@Li0k Li0k merged commit f5c62f8 into main Jun 9, 2022
@Li0k Li0k deleted the li0k/drop_mv_from_storage branch June 9, 2022 02:50