Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow to scan keys with the suffix #2222

Closed
wants to merge 2 commits into from

Conversation

baobaomaomeng
Copy link

After modifying the code, I'm encountering some strange issues. I don't understand why this test is failing.

first

second

third
At the same time, there's a bit of a performance issue. I'm not sure about the optimal number of scans for substring matching and suffix matching, I haven't set a limit yet. In the next couple of days, I'll try to add support for HSCAN, SSCAN, and ZSCAN.

@jihuayu
Copy link
Member

jihuayu commented Apr 5, 2024

@baobaomaomeng You can add a config to set the optimal number. We can talk about the default value later.
If this test case doesn't your need, you can hide it in your local.
You can use ./x.py check format to format the code.

@baobaomaomeng
Copy link
Author

@baobaomaomeng You can add a config to set the optimal number. We can talk about the default value later. If this test case doesn't your need, you can hide it in your local. You can use ./x.py check format to format the code.

I've now added some tests,add support hsacn zscan and SSCAN, and the final step is to add configuration options. I'm not sure how to add configuration options, nor do I know what a good default number of search times would be. I feel like I need some help in this area.

@jihuayu
Copy link
Member

jihuayu commented Apr 14, 2024

@baobaomaomeng That's great, thank you for your efforts.

Regarding how to add configuration items, you can refer to this PR.

The default number may be any number(such as 100,000). We can have the Reviewers confirm the final value, this modification is quite simple.

@baobaomaomeng
Copy link
Author

@baobaomaomeng That's great, thank you for your efforts.

Regarding how to add configuration items, you can refer to this PR.

The default number may be any number(such as 100,000). We can have the Reviewers confirm the final value, this modification is quite simple.

One point that confuses me is why the limit in the scan operator is not a configuration item, and now I need to add the maximum number to scan at once into the configuration items.

@baobaomaomeng baobaomaomeng marked this pull request as ready for review April 23, 2024 15:56
@jihuayu
Copy link
Member

jihuayu commented Apr 24, 2024

@baobaomaomeng Hi, can you resolve the conflicts?

Copy link
Member

@jihuayu jihuayu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work! Thank you for your contribution.
Can you please resolve the code conflicts first? I will continue to review the code after resolving the conflicts.

@@ -83,6 +89,7 @@ class CommandScanBase : public Commander {
std::string cursor_;
std::string prefix_;
int limit_ = 20;
int pm_ = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use enum instead of it?
Can you change and rename it to prefix_mode_?

@@ -878,7 +878,7 @@ class CommandScan : public CommandScanBase {

std::vector<std::string> keys;
std::string end_key;
auto s = redis_db.Scan(key_name, limit_, prefix_, &keys, &end_key);
auto s = redis_db.Scan(key_name, limit_, srv->GetConfig()->max_scan_num, prefix_, &keys, &end_key, pm_);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code looks ugly. Can we have some good way to deal with config?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no clue about this, do you have any suggestions?

Comment on lines 400 to 402
if (cnt_false >= scan_false_limit) {
break;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not put those codes to begin?

if (!scan_match_check()) {
    cnt_false++;
    if (cnt_false < scan_false_limit) {
        continue;
    } else {
        break;
    }
}

Comment on lines 410 to 397
if (keys->empty() && cnt_false >= scan_false_limit) {
end_cursor->append(user_key);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There may be a situation where some keys have been found, but cnt_false >= scan_false_limit.

Comment on lines 330 to 332
rocksdb::Status Database::Scan(const std::string &cursor, uint64_t scan_success_limit, uint64_t scan_false_limit,
const std::string &fix, std::vector<std::string> *keys, std::string *end_cursor,
const int pm) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you change the param fix to a more meaningful name? Can you keep the name the same as the header file?

@baobaomaomeng baobaomaomeng force-pushed the unstable branch 2 times, most recently from 98cff58 to 65a8cd5 Compare May 4, 2024 05:56
@git-hulk git-hulk changed the title add scan substring,suffix Allow to scan keys with the suffix May 4, 2024
@git-hulk git-hulk requested review from jihuayu and git-hulk May 4, 2024 06:12
.gitignore Outdated
@@ -40,9 +40,13 @@ version.h
.idea
.vscode
.cache
.gitignore
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why it's ignored?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry, that was my oversight.

.gitignore Outdated
Comment on lines 46 to 52
test
testdb

build
cmake-build-*
debug-build
test
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duplicated test

@@ -187,7 +187,27 @@ func ScanTest(t *testing.T, rdb *redis.Client, ctx context.Context) {
require.Equal(t, []string{"1", "10", "foo", "foobar"}, keys)
})

t.Run("ZSCAN with PATTERN", func(t *testing.T) {
t.Run("HSCAN with subkey PATTERN", func(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems we need more test cases

@@ -39,14 +41,20 @@ class CommandScanBase : public Commander {

return ParseAdditionalFlags<true>(parser);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please don't introduce random changes.

.gitignore Outdated
testdb

build
cmake-build-*
debug-build
test
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duplicate line, it has been added in previous section.

@@ -152,6 +153,7 @@ Config::Config() {
{"pidfile", true, new StringField(&pidfile, kDefaultPidfile)},
{"max-io-mb", false, new IntField(&max_io_mb, 0, 0, INT_MAX)},
{"max-bitmap-to-string-mb", false, new IntField(&max_bitmap_to_string_mb, 16, 0, INT_MAX)},
{"max-scan-num", false, new IntField(&max_scan_num, 2, 0, INT_MAX)},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems 2 is too smaller to be as the max_scan_num?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's just for test easy,jihuayu say they will determine the final value.

@@ -309,10 +314,12 @@ rocksdb::Status Database::Keys(const std::string &prefix, std::vector<std::strin
return rocksdb::Status::OK();
}

rocksdb::Status Database::Scan(const std::string &cursor, uint64_t limit, const std::string &prefix,
std::vector<std::string> *keys, std::string *end_cursor, RedisType type) {
rocksdb::Status Database::Scan(const std::string &cursor, uint64_t scan_success_limit, uint64_t scan_false_limit,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scan_matched_limt and scan_mismatched_limit should be a better name in this scenario.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameter of the SCAN command is too long now, it's time to group the scan parameters into a struct.

Comment on lines 317 to 319
rocksdb::Status Database::Scan(const std::string &cursor, uint64_t scan_success_limit, uint64_t scan_false_limit,
const std::string &match_str, std::vector<std::string> *keys, std::string *end_cursor,
RedisType type, const MatchType match_mode) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some questions about your design.

But before that, could you give a little introduction about your general design and supported patterns in the PR description?

Then we can have a detailed review on your code.

Refer to Redis glob-style patterns:

Supported glob-style patterns:

h?llo matches hello, hallo and hxllo
h*llo matches hllo and heeeello
h[ae]llo matches hello and hallo, but not hillo
h[^e]llo matches hallo, hbllo, ... but not hello
h[a-b]llo matches hallo and hbllo

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just added two matching patterns.
such as
*llo matches ello,hello,eeeello
l matches l,ll,llo,ello,hello

And,i add scan_false_time limit to config,beacasue if we use suffix,substring match, we need scan one by one, and the potentially long time may be unacceptable. so I add this argument in scan,zsan,sscan,hscan command。I found that the three scan methods to get the last scanned key are all different, which is not convenient for me to modify,so i add a string value in CommandSubkeyScanBase to represented the last key that was scanned。

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been thinking about how to design a scalable scan, but I don't have any ideas. This might be a bit difficult for me. I think I need some guidance on the thought process right now.

@PragmaTwice
Copy link
Member

In my opinion, we need a well-designed solution for implementing Redis glob-style patterns that is scalable, rather than just handling specific cases. If the design is not scalable, we may have to remove all modifications when aiming to support the complete pattern cases.

@baobaomaomeng
Copy link
Author

In my opinion, we need a well-designed solution for implementing Redis glob-style patterns that is scalable, rather than just handling specific cases. If the design is not scalable, we may have to remove all modifications when aiming to support the complete pattern cases.

Thank you for your review, but this is not consistent with the goal I communicated at the beginning. I now need to rewrite my code, and I think I need to consider how to write it.Could you give me some reference suggestions?

@PragmaTwice
Copy link
Member

closed by #2608.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants