Change the compaction filter logic to let periodic compaction go through custom compaction filter, to gc expired data #5447

Merged 10 commits on Apr 4, 2023

Contributor

@luyade luyade commented Mar 28, 2023

Fix the issue that much of the expired data in the bottommost level never gets garbage collected.

What type of PR is this?

  • bug
  • feature
  • enhancement

What problem(s) does this PR solve?

Issue(s) number:

#5438

Description:

How do you solve it?

Special notes for your reviewer, ex. impact of this fix, design document, etc:

Checklist:

Tests:

  • Unit test(positive and negative cases)
  • Function test
  • Performance test
  • N/A

Affects:

  • Documentation affected (Please add the label if documentation needs to be modified.)
  • Incompatibility (If it breaks the compatibility, please describe it and add the label.)
  • If it's needed to cherry-pick (If cherry-pick to some branches is required, please label the destination version(s).)
  • Performance impacted: Consumes more CPU/Memory

Release notes:

Please confirm whether to be reflected in release notes and how to describe:

ex. Fixed the bug .....

…ugh custom compaction filter, to gc expired data
@critical27
Contributor

critical27 commented Mar 28, 2023

However, during daily running, the only custom-compaction chance will most likely be taken by upper-level compaction, such as level0 => level1. So the default 30-day periodic compaction will go through the default minor compaction, without going through the custom compaction filter. So the expired data will always be there.

I do agree with what you said here; this is an issue. In theory, we could check the input level and output level. For example, use the custom compaction filter only when the level is >= L4? But I haven't found a way to do that in CompactionFilterFactory for now.

@luyade
Contributor Author

luyade commented Mar 28, 2023

However, during daily running, the only custom-compaction chance will most likely be taken by upper-level compaction, such as level0 => level1. So the default 30-day periodic compaction will go through the default minor compaction, without going through the custom compaction filter. So the expired data will always be there.

I do agree with what you said here; this is an issue. In theory, we could check the input level and output level. For example, use the custom compaction filter only when the level is >= L4? But I haven't found a way to do that in CompactionFilterFactory for now.

Yes, we don't have a way to do compaction for only >=L4.

In my experience, as time goes on, the space data grows larger and larger with lots of expired data, which makes performance very bad and also raises the risk for a single RocksDB instance holding a large amount of data.

@critical27
Contributor

Actually, there is indeed a level parameter; see the code in KVCompactionFilter:

  bool Filter(int level,
              const rocksdb::Slice& key,
              const rocksdb::Slice& val,
              std::string*,
              bool*) const override {
    UNUSED(level);
    return kvFilter_->filter(spaceId_,
                             folly::StringPiece(key.data(), key.size()),
                             folly::StringPiece(val.data(), val.size()));
  }

We can modify it this way.
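
A minimal sketch of what using the `level` argument could look like, written against stand-in types rather than the real RocksDB and Nebula classes. `kMinLevelForCustomFilter`, `ExpiryFilter`, and `LevelAwareFilter` are hypothetical names for illustration only, and the threshold of 4 simply mirrors the `>= L4` idea from the discussion.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical threshold; mirrors the ">= L4" idea from the discussion.
constexpr int kMinLevelForCustomFilter = 4;

// Stand-in for the real key/value filter: returns true when the entry should be dropped.
using ExpiryFilter = std::function<bool(const std::string& key, const std::string& val)>;

class LevelAwareFilter {
 public:
  explicit LevelAwareFilter(ExpiryFilter inner) : inner_(std::move(inner)) {}

  // Same convention as rocksdb::CompactionFilter::Filter: true means "remove this entry".
  bool Filter(int level, const std::string& key, const std::string& val) const {
    if (level < kMinLevelForCustomFilter) {
      return false;  // keep everything at upper levels and skip the expiry check
    }
    return inner_(key, val);
  }

 private:
  ExpiryFilter inner_;
};
```

With this shape, an expired entry seen at level 0 is kept, while the same entry seen at level 4 or deeper is handed to the expiry check and may be dropped.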

@luyade
Contributor Author

luyade commented Mar 29, 2023

Actually, there is indeed a level parameter; see the code in KVCompactionFilter:

  bool Filter(int level,
              const rocksdb::Slice& key,
              const rocksdb::Slice& val,
              std::string*,
              bool*) const override {
    UNUSED(level);
    return kvFilter_->filter(spaceId_,
                             folly::StringPiece(key.data(), key.size()),
                             folly::StringPiece(val.data(), val.size()));
  }

We can modify it this way.

updated

}
LOG(INFO) << "Do default minor compaction!";
return std::unique_ptr<rocksdb::CompactionFilter>(nullptr);
// No worry, by default flush will not go through the custom compaction filter.
Contributor

@pengweisong pengweisong Mar 30, 2023

  1. Maybe we should check level here.
  2. Besides, I think when level < 4, we should also do the custom compaction now and then, with low frequency.

Contributor Author

  1. Maybe we should check level here.
  2. Besides, I think when level < 4, we should also do the custom compaction now and then, with low frequency.

But it is not possible to get the level here.
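
For context, a rough sketch of the constraint being discussed, using stand-in types instead of the real RocksDB classes. The `Context` struct below models only the `is_full_compaction` and `is_manual_compaction` fields; `CustomFilter`, `FilterFactory`, and `alwaysCreate` are hypothetical names, and the "old behaviour" condition is an assumption inferred from the thread. Because this context carries no level, the factory can only choose between always handing out a filter and handing one out for full or manual compactions only; any per-level decision has to happen later, per key.

```cpp
#include <memory>

// Stand-in for rocksdb::CompactionFilter::Context; only the two fields used here.
struct Context {
  bool is_full_compaction;
  bool is_manual_compaction;
};

// Stand-in for a custom compaction filter.
struct CustomFilter {};

struct FilterFactory {
  // Hypothetical switch: false models the assumed old behaviour, true the proposed one.
  bool alwaysCreate;

  // Returning nullptr means "no custom filter for this compaction".
  std::unique_ptr<CustomFilter> Create(const Context& ctx) const {
    if (alwaysCreate || ctx.is_full_compaction || ctx.is_manual_compaction) {
      return std::unique_ptr<CustomFilter>(new CustomFilter());
    }
    return nullptr;
  }
};
```

Under the assumed old behaviour a periodic compaction (neither full nor manual) gets no custom filter, so expired data survives it; with `alwaysCreate` set, every compaction gets one.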

Contributor

@pengweisong pengweisong Mar 30, 2023

But in your code now, even when the level < 4, the filter function is still called for each key, which is not ideal.

Contributor Author

IMO, this is a tradeoff. Although it is not a perfect solution, it should be better than leaving the expired data unhandled. What do you think? If there is a better idea, it is appreciated.

@Sophie-Xie Sophie-Xie added the ready-for-testing PR: ready for the CI test label Mar 30, 2023
critical27 previously approved these changes Apr 3, 2023
Contributor

@critical27 critical27 left a comment

LGTM, thx~

@critical27
Contributor

CI failed because of compile, please fix it~

@luyade
Contributor Author

luyade commented Apr 3, 2023

CI failed because of compile, please fix it~

Fixed.

@critical27
Contributor

The test case failed 😢, probably related to the new gflag min_level_for_custom_filter; it probably needs to be set to 0 in the test case.

@luyade
Contributor Author

luyade commented Apr 3, 2023

The test case failed 😢, probably related to the new gflag min_level_for_custom_filter; it probably needs to be set to 0 in the test case.

Yes, I noticed that failure. Just updated again for the test case.
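
One way a test can pin such a threshold is a scoped override that restores the previous value on exit. The sketch below uses a plain global as a stand-in for the gflag; `min_level_for_custom_filter_flag`, `ScopedFlag`, and `customFilterApplies` are hypothetical names, and the default of 4 is an assumption taken from the `>= L4` discussion, not from the actual flag definition.

```cpp
// Plain global standing in for the gflag; the default value here is assumed.
static int min_level_for_custom_filter_flag = 4;

// Sets an int flag for the lifetime of the guard, then restores the old value.
class ScopedFlag {
 public:
  ScopedFlag(int& flag, int value) : flag_(flag), saved_(flag) { flag_ = value; }
  ~ScopedFlag() { flag_ = saved_; }
  ScopedFlag(const ScopedFlag&) = delete;
  ScopedFlag& operator=(const ScopedFlag&) = delete;

 private:
  int& flag_;
  int saved_;
};

// True when the custom filter would be consulted for this level.
bool customFilterApplies(int level) {
  return level >= min_level_for_custom_filter_flag;
}
```

Inside a test body, constructing `ScopedFlag guard(min_level_for_custom_filter_flag, 0);` makes the filter apply at every level until the guard goes out of scope, so other tests still see the default.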

Contributor

@critical27 critical27 left a comment

Good job~~~ Thx again

@Sophie-Xie Sophie-Xie merged commit 16333f6 into vesoft-inc:master Apr 4, 2023
@luyade luyade deleted the fix_ttl_data_compaction branch April 4, 2023 05:35
Sophie-Xie pushed a commit that referenced this pull request Apr 6, 2023
…ugh custom compaction filter, to gc expired data (#5447)
dutor added a commit that referenced this pull request Apr 20, 2023
* refactor traverse output (#5464)

* refactor traverse output

* fix pruneproperties error & none_direct_dst

* fix test error

* fix shortest path

* Change the compaction filter logic to let periodic compaction go through custom compaction filter, to gc expired data (#5447)

* Push filter down cross join (#5473)

* fix comment

* push down filter through cross join

---------

Co-authored-by: Sophie <84560950+Sophie-Xie@users.noreply.github.com>

* Fix shortest path crash (#5472)

* fix crash of geo (#5475)

* fix crash of geo

* change log(fatal) to log(error)

* fix miss arg $GITHUB_OUTPUT (#5478)

* Split optimizer rules (#5470)

Fix compile

small rename

Fix tck

Fix tck

fmt

Fix tck

Fix tck

* Enhancement/optimize edge all predicate (#5481)

* fix eval contains filter on storaged (#5485)

* fix eval contains filter on storaged

* add tck case

* add tck case

* fix tck

* fix lint

* fix lint

* Fix expression util function (#5487)

fmt

Co-authored-by: Sophie <84560950+Sophie-Xie@users.noreply.github.com>

* fix ContainsFilter random fail (#5489)

* Fixed graphd startup issue (#5493)

* fix prunproperties (#5494)

* stop the pushing down of not expressions that are not rewritten to proper forms. (#5502)

* Fix edge all predicate with rank function (#5503)

Co-authored-by: Sophie <84560950+Sophie-Xie@users.noreply.github.com>

* rewrite param in subgraph & path (#5500)

* check param in subgraph

* rewrite param in path

---------

Co-authored-by: Sophie <84560950+Sophie-Xie@users.noreply.github.com>

* Fix concurrent bug about session count (#5496)

* Fix regex expression (#5507)

* Update requirements.txt (#5512)

Solidified tomli version to solve centos7 compatibility issues

* Update cluster id (#5514)

---------

Co-authored-by: jimingquan <mingquan.ji@vesoft.com>
Co-authored-by: Ryan <ydlu1987@gmail.com>
Co-authored-by: Yee <2520865+yixinglu@users.noreply.github.com>
Co-authored-by: jie.wang <38901892+jievince@users.noreply.github.com>
Co-authored-by: George <58841610+Shinji-IkariG@users.noreply.github.com>
Co-authored-by: kyle.cao <kyle.cao@vesoft.com>
Co-authored-by: codesigner <codesigner.huang@vesoft.com>
Co-authored-by: dutor <440396+dutor@users.noreply.github.com>
Co-authored-by: Cheng Xuntao <7731943+xtcyclist@users.noreply.github.com>
Co-authored-by: Yichen Wang <18348405+Aiee@users.noreply.github.com>
Sophie-Xie added a commit that referenced this pull request Apr 21, 2023
… go through custom compaction filter, to gc expired data (#5447)"

This reverts commit 37a24f1.
Sophie-Xie added a commit that referenced this pull request Apr 21, 2023
… go through custom compaction filter, to gc expired data (#5447)"

This reverts commit 37a24f1.
critical27 pushed a commit that referenced this pull request Apr 23, 2023
… go through custom compaction filter, to gc expired data (#5447)" (#5522)

This reverts commit 37a24f1.
Labels
ready-for-testing PR: ready for the CI test

4 participants