Add command parser #1032

Merged
merged 14 commits into from
Oct 30, 2022

Conversation

PragmaTwice
Member

@PragmaTwice PragmaTwice commented Oct 23, 2022

Partially follows #598.

The current CommandParser is a prototype: so far it provides only a few methods that are actually usable, but that should not block merging it.

To demonstrate the use of CommandParser, I rewrote the Parse implementation for a few commands in redis_cmd.cc. The amount of code is reduced severalfold, and the parsing logic can be expressed in just a few lines, especially for commands with complex syntax.

Other changes: the macro GET_OR_RET is added to status.h to simplify status-related control flow via a statement expression.
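
For readers unfamiliar with the technique, here is a minimal sketch of how a GET_OR_RET-style macro can be built on statement expressions (a GNU extension supported by GCC and Clang). The names Result, Error, GET_OR_RET_SKETCH, ParseInt and ParseTtlMillis are hypothetical and are not the definitions in status.h; the sketch only illustrates the control flow.

```cpp
#include <cstddef>
#include <exception>
#include <string>
#include <utility>

// Hypothetical stand-ins for a status type; NOT the kvrocks definitions.
struct Error {
  std::string msg;
};

template <typename T>
class Result {
 public:
  Result(T value) : ok_(true), value_(std::move(value)) {}  // success
  Result(Error err) : ok_(false), err_(std::move(err)) {}   // failure
  bool ok() const { return ok_; }
  T& value() { return value_; }
  const Error& error() const { return err_; }

 private:
  bool ok_;
  T value_{};
  Error err_;
};

// Evaluate `expr`; on failure, return its error from the *enclosing* function,
// otherwise yield the contained value as the value of the whole expression.
#define GET_OR_RET_SKETCH(expr)          \
  ({                                     \
    auto res_ = (expr);                  \
    if (!res_.ok()) return res_.error(); \
    std::move(res_.value());             \
  })

// Parse a whole string as an int, rejecting trailing garbage such as "12x".
Result<int> ParseInt(const std::string& s) {
  try {
    std::size_t used = 0;
    int v = std::stoi(s, &used);
    if (used != s.size()) return Error{"not an integer"};
    return v;
  } catch (const std::exception&) {
    return Error{"not an integer"};
  }
}

// The error from ParseInt is forwarded without an explicit if/return pair.
Result<int> ParseTtlMillis(const std::string& s) {
  int seconds = GET_OR_RET_SKETCH(ParseInt(s));
  if (seconds <= 0) return Error{"invalid expire time"};
  return seconds * 1000;
}
```

Because the early `return` sits inside the macro, the call site reads like a plain expression, which is what lets calls such as `handle(GET_OR_RET(...))` be nested.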

require.ErrorContains(t, rdb.Do(ctx, "SET", "foo", "bar", "exat", "0").Err(), "invalid expire time")
require.ErrorContains(t, rdb.Do(ctx, "SET", "foo", "bar", "pxat", "1234xyz").Err(), "not an integer")
require.ErrorContains(t, rdb.Do(ctx, "SET", "foo", "bar", "pxat", "0").Err(), "invalid expire time")
require.ErrorContains(t, rdb.Do(ctx, "SET", "foo", "bar", "ex", "1234xyz").Err(), "non-integer")
Contributor

@PragmaTwice Really nice job!
These error messages (and others like ERR wrong number of arguments) were written that way to be consistent with the Redis protocol.

127.0.0.1:6379> set foo bar ex 1234tyg
(error) ERR value is not an integer or out of range
127.0.0.1:6379> set foo bar ex 0
(error) ERR invalid expire time in 'set' command
127.0.0.1:6379> 

So I'm not sure if it's correct to change them.

Member Author

@PragmaTwice PragmaTwice Oct 24, 2022

I think it is hard to keep all error messages the same as Redis's (there are already many differences between Redis and kvrocks error messages, including the two in your comment), and doing so may make kvrocks development unnecessarily complex. Besides, I think there is almost no practical difference between "syntax error" and "wrong number of arguments", or between "encounter non-integer characters" and "not an integer".

Contributor

Yes, you are right, it's not easy to keep the error messages identical. Since Redis doesn't have error codes, I was wondering whether Redis clients parse error messages to get something useful from them, or just distinguish error from no error. Are error messages considered part of the Redis protocol?

Member Author

I do not think it is possible to be compatible with every Redis error message, since many kvrocks errors already differ from Redis's, and some are kvrocks-only. So if clients do parse them, they already cannot rely on getting the expected message. Redis also does not guarantee that it will keep old error messages in new versions, so I do not think it is necessary to keep our error messages identical to Redis's.
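
As background for the question above: in RESP, an error reply is a line that starts with `-`, and by Redis convention the first word of that line (for example `ERR` or `WRONGTYPE`) is an error prefix that clients can use to classify the error; the rest is free-form text. A client that wants to depend on as little as possible can look only at that prefix. The helper below is a hypothetical sketch (`SplitErrorPrefix` is not part of any client library) and assumes the leading `-` and trailing CRLF have already been stripped.

```cpp
#include <string>
#include <string_view>
#include <utility>

// Split an error payload into {prefix, message}. If there is no space, the
// whole payload is treated as the prefix and the message is empty.
std::pair<std::string, std::string> SplitErrorPrefix(std::string_view err) {
  const auto space = err.find(' ');
  if (space == std::string_view::npos) return {std::string(err), std::string()};
  return {std::string(err.substr(0, space)), std::string(err.substr(space + 1))};
}
```

With this split, `ERR value is not an integer or out of range` and `ERR encounter non-integer characters` classify identically, which matches the argument that the exact wording matters less than the kind of error.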

Member

@tisonkun tisonkun left a comment

Comments inline. After a walk-through, I still have some concerns about the macro trick. The rest generally looks good to me.

@git-hulk
Member

git-hulk commented Oct 25, 2022

@PragmaTwice After looking through the PR, I am a bit worried that it will take most developers (including myself) some time to understand how to use it and how it works. Also, Redis command arguments come in only three types:

  • string
  • bool, like NX/EX/PX and so on
  • number(int/float) like TTL and score

So I'm wondering if we can simplify the parser API like below:

while(token = parser.next()) {
 switch tolower(token):
 case "ex":
    status = parser.expect<int>(&ttl)
 case "px":
    status = parser.expect<int64_t>(&ttl_ms)  
 ...
}

That way, users only need to care about the current token and what is expected next.

@PragmaTwice
Member Author

PragmaTwice commented Oct 25, 2022

> @PragmaTwice After looking through the PR, I am a bit worried that it will take most developers (including myself) some time to understand how to use it and how it works. Also, Redis command arguments come in only three types:
>
>   • string
>   • bool, like NX/EX/PX and so on
>   • number(int/float) like TTL and score
>
> So I'm wondering if we can simplify the parser API like below:
>
> while(token = parser.next()) {
>  switch tolower(token):
>  case "ex":
>     status = parser.expect<int>(&ttl)
>  case "px":
>     status = parser.expect<int64_t>(&ttl_ms)
>  ...
> }
>
> That way, users only need to care about the current token and what is expected next.

I think there are several problems the sample code would need to handle:

  • We cannot always move to the next token: for example, to parse (EX v1) | (PX v2) | v3, we first need to peek at the token (EX or PX) and only then advance; otherwise we may lose v3. For a parser, advancing unconditionally at every step severely limits what it can parse.
  • We need a way to propagate errors: this is where the sample code is idealized, since error handling needs to be abstracted.
  • We need a way to reject different flags from the same group: for example, to parse [EX a | PX b] | [X | Y], we need to reject inputs like EX v PX v, X Y, or EX v X PX v, and accept EX v EX v, EX v X, or Y PX v.
  • If we still need a pattern like this:
int v;
int real_v;
status = parser.expectInt(&v);
if(!status) return ...;
real_v = handle(v);

rather than what this PR provides:

auto real_v = handle(GET_OR_RET(parser.TakeInt()));

then I think it will be hard for us to use many of the abstractions that modern C++ provides.
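
To make the first point concrete, here is a small sketch of peek-before-consume for the grammar (EX v1) | (PX v2) | v3. Cursor, Parsed and ParseOne are hypothetical names used only for illustration; they are not part of this PR.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical token cursor that separates "look" from "consume".
struct Cursor {
  std::vector<std::string> tokens;
  std::size_t pos = 0;

  bool HasNext() const { return pos < tokens.size(); }
  const std::string& Peek() const { return tokens[pos]; }  // look, do not advance
  std::string Next() { return tokens[pos++]; }             // consume one token
};

struct Parsed {
  std::string kind;   // "EX", "PX" or "PLAIN"
  std::string value;  // v1, v2 or v3
};

// Grammar: (EX v1) | (PX v2) | v3
std::optional<Parsed> ParseOne(Cursor& c) {
  if (!c.HasNext()) return std::nullopt;
  if (c.Peek() == "EX" || c.Peek() == "PX") {
    std::string kind = c.Next();            // keyword recognized, safe to consume
    if (!c.HasNext()) return std::nullopt;  // keyword without a value
    return Parsed{kind, c.Next()};
  }
  // Nothing was consumed while testing for EX/PX, so v3 is still available.
  return Parsed{"PLAIN", c.Next()};
}
```

If the loop had called Next() before testing for the keywords, the plain-value branch would have to re-use an already consumed token, which is the loss of v3 described above.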

Simplifying code means building good abstractions, and of course a good abstraction has a learning cost, but I still feel that the current abstraction is intuitive:

  • parser.Good(): checks whether there are still elements left to parse
  • parser.EatICaseFlag(str, flag): matches a specific flag token and advances if the match succeeds. It can be learned from this example.
  • parser.TakeInt() or parser.TakeStr(): consumes the next token as an integer or a string
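
The three operations listed above can be sketched roughly as follows. MiniParser is a hypothetical, simplified stand-in written for this illustration: it reports failures with std::optional rather than the status types used in the PR, and the real CommandParser may differ in signatures and behavior. ParseExpireMs is likewise an assumed helper that parses an option tail of the form [EX seconds | PX milliseconds].

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <exception>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

class MiniParser {
 public:
  explicit MiniParser(std::vector<std::string> tokens) : tokens_(std::move(tokens)) {}

  // True while there are still tokens left to parse.
  bool Good() const { return pos_ < tokens_.size(); }

  // If the current token equals `name` (ignoring case) and no *other* flag of
  // the same group was seen before, record it in `flag`, advance, return true.
  // Otherwise leave the position untouched and return false.
  bool EatICaseFlag(std::string_view name, std::string_view& flag) {
    if (!Good() || !EqualICase(tokens_[pos_], name)) return false;
    if (!flag.empty() && flag != name) return false;
    flag = name;
    ++pos_;
    return true;
  }

  // Consume the current token as an integer; nullopt (no advance) on failure.
  std::optional<long long> TakeInt() {
    if (!Good()) return std::nullopt;
    const std::string& tok = tokens_[pos_];
    try {
      std::size_t used = 0;
      long long v = std::stoll(tok, &used);
      if (used != tok.size()) return std::nullopt;
      ++pos_;
      return v;
    } catch (const std::exception&) {
      return std::nullopt;
    }
  }

  // Consume the current token as a string.
  std::optional<std::string> TakeStr() {
    if (!Good()) return std::nullopt;
    return tokens_[pos_++];
  }

 private:
  static bool EqualICase(std::string_view a, std::string_view b) {
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(), [](unsigned char x, unsigned char y) {
             return std::tolower(x) == std::tolower(y);
           });
  }

  std::vector<std::string> tokens_;
  std::size_t pos_ = 0;
};

// Returns the TTL in milliseconds (0 if absent), or nullopt on a syntax error.
std::optional<long long> ParseExpireMs(std::vector<std::string> args) {
  MiniParser parser(std::move(args));
  std::string_view ttl_flag;  // remembers which of EX/PX has been used
  long long ttl_ms = 0;
  while (parser.Good()) {
    if (parser.EatICaseFlag("EX", ttl_flag)) {
      auto v = parser.TakeInt();
      if (!v) return std::nullopt;
      ttl_ms = *v * 1000;
    } else if (parser.EatICaseFlag("PX", ttl_flag)) {
      auto v = parser.TakeInt();
      if (!v) return std::nullopt;
      ttl_ms = *v;
    } else {
      return std::nullopt;  // unknown token or a conflicting flag
    }
  }
  return ttl_ms;
}
```

In this sketch, `EX 1 EX 2` is accepted and `EX 1 PX 5` is rejected, because the shared `ttl_flag` variable records which flag of the group was seen first.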

And I think there is a big question:

If we still want to extract each token and process it manually (handling every condition by hand), then I do not think we need a parsing framework that formalizes our command parsing procedure with parsing techniques.

@git-hulk
Member

git-hulk commented Oct 25, 2022

@PragmaTwice Thanks for your explanation.

> We cannot always move to the next token: for example, to parse (EX v1) | (PX v2) | v3, we first need to peek at the token (EX or PX) and only then advance; otherwise we may lose v3. For a parser, advancing unconditionally at every step severely limits what it can parse.

Yes, I got your point. What if we use the parser to iterate over all tokens instead of only flags? I will take the ZADD command as an example:

while(token = parser.next()) {
  case "NX":
    _flags = nx;
  case "INCR":
    _flags = incr;
  default:
    break;
}
while(parser.has_next()) {
   status = parser.expected<double>()
   parser.next()
   status = parser.expected<string>()
}

> We need a way to propagate errors: this is where the sample code is idealized, since error handling needs to be abstracted.

Yes, it's just a rough idea that I haven't thought through carefully.

> We need a way to reject different flags from the same group: for example, to parse [EX a | PX b] | [X | Y], we need to reject inputs like EX v PX v, X Y, or EX v X PX v, and accept EX v EX v, EX v X, or Y PX v.

In my opinion, the parser should only care about how to iterate and whether the type (or range) is right. Whether those flags are mutually exclusive is better handled outside the parser; otherwise the parser will become more and more complex.

> Simplifying code means building good abstractions, and of course a good abstraction has a learning cost

Agreed. What I am wondering is whether there is a more intuitive way to achieve this, so that developers can use it with a lower learning cost.

@PragmaTwice
Member Author

PragmaTwice commented Oct 25, 2022

> In my opinion, the parser should only care about how to iterate and whether the type (or range) is right. Whether those flags are mutually exclusive is better handled outside the parser; otherwise the parser will become more and more complex.

In this PR, I added only about 5 lines of code to solve this problem (it is very common in Redis commands, appearing in almost every command with an optional flag), and this simplified the code hugely by removing a great deal of duplicated code for this logic. So I do not think it is unnecessary in the parsing framework.

I think a parser should care about all of the parsing logic, because every piece of it determines whether the parser should advance or stay put.

@git-hulk
Member

> > In my opinion, the parser should only care about how to iterate and whether the type (or range) is right. Whether those flags are mutually exclusive is better handled outside the parser; otherwise the parser will become more and more complex.
>
> In this PR, I added only about 5 lines of code to solve this problem (it is very common in Redis commands, appearing in almost every command with an optional flag), and this simplified the code hugely by removing a great deal of duplicated code for this logic. So I do not think it is unnecessary in the parsing framework.
>
> I think a parser should care about all of the parsing logic, because every piece of it determines whether the parser should advance or stay put.

Yes, the parsing framework truly removes a lot of duplicated code. My question is whether we can reduce the learning cost if we expect all commands to depend on it. As for whether the parsing framework should care about every piece of logic, I have no solid argument right now, so I think we can leave it as it is.

@PragmaTwice
Member Author

PragmaTwice commented Oct 25, 2022

@git-hulk There is an example in the unit tests that parses a command with the syntax [ HELLO i1 v1 | HI v2 ] [X i2 | Y] (where i1 and i2 are integers and v1 and v2 are strings), and I think it demonstrates how to use the CommandParser well. From this example, the interface provided by the current framework can be understood quickly.

@git-hulk
Member

> ... command with the syntax [ HELLO i1 v1 | HI v2 ] [X i2 | Y] (where i1 and i2 are integers and v1 and v2 are strings), and I think it demonstrates how to use the CommandParser well. From this example, the interface provided by the current framework can be understood quickly.

@PragmaTwice Thank you! I'll take another pass.

@git-hulk
Member

To be honest, I still find it a bit hard to fully understand the implementation (maybe I should learn more about C++ templates), especially the part about the exclusive flag. I'm very happy to see this move forward if other folks feel good about it.

@PragmaTwice
Member Author

> To be honest, I still find it a bit hard to fully understand the implementation (maybe I should learn more about C++ templates), especially the part about the exclusive flag. I'm very happy to see this move forward if other folks feel good about it.

I think if the API is clear, intuitive, and easy enough to understand, then developers may not need to care about or understand the implementation details.
A classic Redis command parsing scenario is demonstrated in the code snippet below, and it shows that the parsing process is relatively intuitive.

https://github.com/apache/incubator-kvrocks/blob/85ae20ddff43bb71e8370b7a2b19c51c746b6871/tests/cppunit/command_parser_test.cc#L35-L50

@PragmaTwice
Member Author

Hi everyone, any new thoughts on this PR?

Member

@tisonkun tisonkun left a comment

I think it's good to go as long as you, @PragmaTwice, drive the development of the command parsing effort - perhaps the one filed as #794.

If anyone who later works on this domain has further thoughts, it's viable to make an enhancement proposal. This change is not a one-way decision.

Contributor

@torwig torwig left a comment

LGTM.
@PragmaTwice Thank you for your effort. Maybe later today I'll have a chance to use the new parser in action.

Member

@tanruixiang tanruixiang left a comment

LGTM. After understanding how to use it I think it is concise enough.

@PragmaTwice
Member Author

Thanks all. Merging...

@PragmaTwice PragmaTwice merged commit b83212e into apache:unstable Oct 30, 2022
5 participants