Change the compaction filter logic to let periodic compaction go through custom compaction filter, to gc expired data #5447
I do agree with what you said here; this is an issue. In theory, we could check the input level and the output level. For example, should we use the custom compaction filter only for levels >= L4? But I didn't find a way to do it in
Yes, we don't have a way to run the custom compaction only for levels >= L4. For now, in my experience, as time goes on, a space's data grows larger and larger with lots of expired data, which hurts performance badly and also raises the risk for a single RocksDB instance holding a large amount of data.
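For context on what "expired data" means here: a TTL-based compaction filter compares each value's write time with the space's TTL and drops the entry once it is older than that. The sketch below is a minimal model of that check under stated assumptions: the function name `isExpired`, a write time already decoded as seconds since the epoch, and the strict `>` comparison are all hypothetical, not Nebula's actual encoding or code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true when a value written at writeTimeSec has outlived
// ttlSec at time nowSec. All times are seconds since the epoch (an assumption).
bool isExpired(std::int64_t writeTimeSec, std::int64_t ttlSec, std::int64_t nowSec) {
  assert(ttlSec > 0);  // this sketch only models spaces that actually have a TTL
  return nowSec - writeTimeSec > ttlSec;
}
```

Entries that pass this check still occupy disk space until some compaction rewrites the file that holds them, which is why the growth described above persists.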
Actually, there is indeed a level; see the code in
We can modify it this way.
Updated.
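One way to read the level-based idea: RocksDB passes the level as the first argument of `CompactionFilter::Filter` for every key, so a threshold such as L4 can be enforced inside the filter itself. A minimal sketch of that per-key decision follows, assuming a hypothetical `shouldDropKey` helper, a hypothetical `kMinFilterLevel = 4` taken from the discussion, and an already-decoded write time.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical threshold from the discussion: only levels >= L4 run the TTL check.
constexpr int kMinFilterLevel = 4;

// Models the per-key decision inside a compaction filter. Because the level is
// only known per key, the gate is evaluated once for every key.
// Returns true when the key should be dropped.
bool shouldDropKey(int level, std::int64_t writeTimeSec, std::int64_t ttlSec,
                   std::int64_t nowSec) {
  assert(ttlSec > 0);
  if (level < kMinFilterLevel) {
    return false;  // cheap early exit: keep keys in the upper levels untouched
  }
  return nowSec - writeTimeSec > ttlSec;
}
```

The early return keeps the upper-level cost small, but the function is still invoked once per key, which is the overhead raised later in the review.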
}
LOG(INFO) << "Do default minor compaction!";
return std::unique_ptr<rocksdb::CompactionFilter>(nullptr);
// Don't worry: by default, flush does not go through the custom compaction filter.
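The fragment above returns a null filter on the "default minor compaction" path, which tells RocksDB to run that compaction without a custom filter. A simplified model of the factory decision follows. `Filter`, `TtlFilter`, `CompactionContext`, and the `filterAllCompactions` switch are hypothetical stand-ins (the switch loosely models the new gflag mentioned in the review, whose real name is not given here), and the full/manual conditions are assumptions rather than the PR's exact logic.

```cpp
#include <memory>

// Hypothetical stand-in for rocksdb::CompactionFilter.
struct Filter {
  virtual ~Filter() = default;
};

// Hypothetical TTL filter; the per-key logic is omitted here.
struct TtlFilter : Filter {};

// Hypothetical stand-in for the compaction-wide context a filter factory receives.
struct CompactionContext {
  bool isFullCompaction = false;
  bool isManualCompaction = false;
};

// Returns the custom filter, or nullptr to run the compaction without one.
std::unique_ptr<Filter> createFilter(const CompactionContext& ctx,
                                     bool filterAllCompactions) {
  if (filterAllCompactions || ctx.isFullCompaction || ctx.isManualCompaction) {
    return std::make_unique<TtlFilter>();
  }
  // Mirrors "Do default minor compaction!": no custom filter for this compaction.
  return nullptr;
}
```

Because the factory only sees compaction-wide context and not the level, turning the switch on sends every automatic compaction, periodic ones included, through the custom filter; flushes are unaffected, as the comment in the fragment notes.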
- Maybe we should check level here.
- Besides, I think when level < 4, we should also run the custom compaction now and then, at a low frequency.
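The second suggestion, running the custom filter on levels below L4 only occasionally, could be modeled as a gate consulted once per compaction. This sketch assumes the output level is known wherever the gate is evaluated (which, as the reply points out, is not the case inside the filter factory) and uses a hypothetical `LowFrequencyGate` with an illustrative time interval; neither comes from the PR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical policy: always apply the custom filter at level >= minLevel,
// and at lower levels at most once per intervalSec seconds.
class LowFrequencyGate {
 public:
  LowFrequencyGate(int minLevel, std::int64_t intervalSec)
      : minLevel_(minLevel), intervalSec_(intervalSec) {
    assert(intervalSec > 0);
  }

  // Intended to be called once per compaction, not once per key.
  bool shouldApply(int level, std::int64_t nowSec) {
    if (level >= minLevel_) {
      return true;
    }
    if (lastLowLevelRunSec_ < 0 || nowSec - lastLowLevelRunSec_ >= intervalSec_) {
      lastLowLevelRunSec_ = nowSec;  // remember the last low-level filtered run
      return true;
    }
    return false;
  }

 private:
  int minLevel_;
  std::int64_t intervalSec_;
  std::int64_t lastLowLevelRunSec_ = -1;  // -1 means "never run yet"
};
```

Gating per compaction rather than per key avoids the per-key overhead, at the cost of needing the level at a point where the current factory API does not expose it.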
But it is not possible to get the level here.
But with your current code, if the level is < 4, the filter function will still be called for each key, which is not a good approach.
IMO, this is a tradeoff. Although this is not a perfect solution, it should be better than leaving the expired data effectively unhandled. What do you think? If there is a better idea, it would be appreciated.
LGTM, thx~
CI failed because of a compile error; please fix it~
Fixed.
The test case failed 😢, probably because of the new gflag.
Yes, I noticed that failure. I just updated the test case again.
Good job~~~ Thx again
…ugh custom compaction filter, to gc expired data (#5447)
* refactor traverse output (#5464)
* refactor traverse output
* fix pruneproperties error & none_direct_dst
* fix test error
* fix shortest path
* Change the compaction filter logic to let periodic compaction go through custom compaction filter, to gc expired data (#5447)
* Push filter down cross join (#5473)
* fix comment
* push down filter through cross join
---------
Co-authored-by: Sophie <84560950+Sophie-Xie@users.noreply.github.com>
* Fix shortest path crash (#5472)
* fix crash of geo (#5475)
* fix crash of geo
* change log(fatal) to log(error)
* fix miss arg $GITHUB_OUTPUT (#5478)
* Split optimizer rules (#5470)
Fix compile small rename Fix tck Fix tck fmt Fix tck Fix tck
* Enhancement/optimize edge all predicate (#5481)
* fix eval contains filter on storaged (#5485)
* fix eval contains filter on storaged
* add tck case
* add tck case
* fix tck
* fix lint
* fix lint
* Fix expression util function (#5487)
fmt
Co-authored-by: Sophie <84560950+Sophie-Xie@users.noreply.github.com>
* fix ContainsFilter random fail (#5489)
* Fixed graphd startup issue (#5493)
* fix prunproperties (#5494)
* stop the pushing down of not expressions that are not rewritten to proper forms. (#5502)
* Fix edge all predicate with rank function (#5503)
Co-authored-by: Sophie <84560950+Sophie-Xie@users.noreply.github.com>
* rewrite param in subgraph & path (#5500)
* check param in subgraph
* rewrite param in path
---------
Co-authored-by: Sophie <84560950+Sophie-Xie@users.noreply.github.com>
* Fix concurrent bug about session count (#5496)
* Fix regex expression (#5507)
* Update requirements.txt (#5512)
Solidified tomli version to solve centos7 compatibility issues
* Update cluster id (#5514)
---------
Co-authored-by: jimingquan <mingquan.ji@vesoft.com>
Co-authored-by: Ryan <ydlu1987@gmail.com>
Co-authored-by: Yee <2520865+yixinglu@users.noreply.github.com>
Co-authored-by: jie.wang <38901892+jievince@users.noreply.github.com>
Co-authored-by: George <58841610+Shinji-IkariG@users.noreply.github.com>
Co-authored-by: kyle.cao <kyle.cao@vesoft.com>
Co-authored-by: codesigner <codesigner.huang@vesoft.com>
Co-authored-by: dutor <440396+dutor@users.noreply.github.com>
Co-authored-by: Cheng Xuntao <7731943+xtcyclist@users.noreply.github.com>
Co-authored-by: Yichen Wang <18348405+Aiee@users.noreply.github.com>
Fix the issue that a lot of expired data in the bottommost level doesn't get garbage collected.
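Why bottommost data can escape GC: files in the last level are rarely picked by size-triggered compaction, so their expired entries survive unless something else rewrites them. RocksDB's `periodic_compaction_seconds` option makes files older than the configured period eligible for compaction. The sketch below is a simplified model of that age check, not RocksDB's actual compaction picker; `SstFile` and `pickPeriodicCandidates` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of an SST file; level is kept for illustration only.
struct SstFile {
  std::string name;
  int level;
  std::int64_t creationTimeSec;
};

// Simplified model: any file whose age has reached periodSec becomes a
// periodic-compaction candidate, regardless of its level.
std::vector<std::string> pickPeriodicCandidates(const std::vector<SstFile>& files,
                                                std::int64_t nowSec,
                                                std::int64_t periodSec) {
  assert(periodSec > 0);
  std::vector<std::string> picked;
  for (const auto& f : files) {
    if (nowSec - f.creationTimeSec >= periodSec) {
      picked.push_back(f.name);
    }
  }
  return picked;
}
```

Per the PR title, such periodic rewrites now also pass through the custom compaction filter, so expired entries in those old bottommost files can be dropped when they are rewritten.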
What type of PR is this?
What problem(s) does this PR solve?
Issue(s) number:
#5438
Description:
How do you solve it?
Special notes for your reviewer, ex. impact of this fix, design document, etc:
Checklist:
Tests:
Affects:
Release notes:
Please confirm whether to be reflected in release notes and how to describe: