executor: Optimize slow log parsing's splitByColon function #54630
Signed-off-by: yibin <huyibin@pingcap.cn>
Hi @yibin87. Thanks for your PR. PRs from untrusted users cannot be marked as trusted with I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/cc @crazycs520
/cc @xzhangxian1008
// 1. Both field and value string contain only ANSI characters
// 2. value string may be surrounded by brackets, allowed brackets includes "[]" and "{}", like {key: value,{key: value}}
// "[]" can only be nested inside "[]"; "{}" can only be nested inside "{}"
// 3. value string can't contain ' ' character unless it is inside brackets
Does the original implementation also have these restrictions?
No, but the current slow log satisfies these restrictions, and its format doesn't seem likely to change frequently in the future. The previous implementation provided broader functionality.
pkg/executor/slow_query.go
Outdated
// splitByColon split a line like "field: value field: value..."
// Note:
// 1. Both field and value string contain only ANSI characters
That is too restrictive. For example, tidb_redact_log may output non-ANSI characters at some point.
I am wondering whether we could just use strings.Index(a, b) to mimic the original regexp.
I see. This rule can be removed for the current algorithm; I'll change it.
Simply using strings.Index(a, b) can't handle the cases where ": " appears inside "{}".
BTW, splitByColon is not the root entry point for slow log parsing; for logs that may contain non-ANSI characters, we can handle them separately if we do need to.
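To illustrate the point about ": " inside "{}", here is a minimal sketch. The helper `naiveFields` and the sample line are hypothetical and are not taken from the TiDB code base. It treats every ": " found by strings.Index as a field separator and therefore reports a bogus field from inside the braces.

```go
package main

import (
	"fmt"
	"strings"
)

// naiveFields is a hypothetical helper (not TiDB code). It treats every
// ": " in the line as a field separator and returns the token that
// precedes each one.
func naiveFields(line string) []string {
	var fields []string
	offset := 0
	for {
		i := strings.Index(line[offset:], ": ")
		if i < 0 {
			return fields
		}
		end := offset + i
		// The candidate field name starts after the previous space, if any.
		start := strings.LastIndexByte(line[:end], ' ') + 1
		fields = append(fields, line[start:end])
		offset = end + 2
	}
}

func main() {
	// Made-up sample line: the braces contain a ": " of their own.
	line := "Plan: {id: 1} Query_time: 0.5"
	fmt.Printf("%q\n", naiveFields(line)) // ["Plan" "{id" "Query_time"]
}
```

The second entry, "{id", is not a real field; a splitter has to track bracket depth to skip it.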
I checked that an ASCII byte can't be part of a valid non-ASCII UTF-8 character, since every byte of a multi-byte sequence has its high bit set (lead bytes start with 110/1110/11110, continuation bytes with 10). Thus both field and value strings can contain non-ASCII characters. Besides, using []rune would hurt performance a lot. So I'll keep the previous algorithm and just update the restriction comments:
For the field string, the first character should be an ASCII letter or digit. For the value string, whitespace can only appear inside "{}"/"[]".
return -1
}

func isLetterOrNumeric(b byte) bool {
Maybe we can put this function in pkg/util/stringutil file.
Let's leave it in the local file, since it may need to change for slow log parsing only.
/test check-dev2 |
@yibin87: Cannot trigger testing until a trusted user reviews the PR and leaves an In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
@yibin87: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
In response to a cherrypick label: new pull request created to branch |
What problem does this PR solve?
Issue Number: close #54538
Problem Summary:
What changed and how does it work?
Replace regexp matching with simple string comparison operations. In addition, ":=" inside "{}" was previously not handled correctly; this PR fixes that as well.
In a local manual test, performance improved about 10x (from 24s to 2.2s) for the following SQL:
SELECT Digest, Query, Conn_ID, (UNIX_TIMESTAMP(Time) + 0E0) AS timestamp, Query_time, Mem_max, Process_keys FROM `INFORMATION_SCHEMA`.`CLUSTER_SLOW_QUERY` WHERE Time BETWEEN FROM_UNIXTIME(1720471890) AND FROM_UNIXTIME(1720515091) ORDER BY Query_time DESC LIMIT 100;
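As a rough illustration of the "simple string comparison" approach, the following is a minimal sketch and not the code merged in this PR. The function `splitFieldsSketch` and the sample line are hypothetical. It assumes the restrictions discussed in the review: a value either starts with "{" or "[" and runs to the matching closing bracket of the same kind, or contains no space. A trailing field with no value is simply dropped in this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// splitFieldsSketch is a hypothetical, simplified splitter for lines like
// "field: value field: value...". It is an illustration only.
func splitFieldsSketch(line string) (fields, values []string) {
	i := 0
	for i < len(line) {
		// Skip separating spaces before the next field name.
		for i < len(line) && line[i] == ' ' {
			i++
		}
		j := strings.Index(line[i:], ": ")
		if j < 0 {
			break
		}
		field := line[i : i+j]
		i += j + 2
		start := i
		if i < len(line) && (line[i] == '{' || line[i] == '[') {
			// Bracketed value: scan to the matching closer, tracking
			// nesting of the same bracket kind.
			open, closer := line[i], byte('}')
			if open == '[' {
				closer = ']'
			}
			depth := 0
			for i < len(line) {
				c := line[i]
				i++
				if c == open {
					depth++
				} else if c == closer {
					depth--
					if depth == 0 {
						break
					}
				}
			}
		} else {
			// Plain value: runs until the next space.
			for i < len(line) && line[i] != ' ' {
				i++
			}
		}
		fields = append(fields, field)
		values = append(values, line[start:i])
	}
	return fields, values
}

func main() {
	// Made-up sample line with nested braces and a bracketed list.
	f, v := splitFieldsSketch(`Txn: {start: 1, info: {k: v}} Query_time: 0.5 Keys: [a b]`)
	for k := range f {
		fmt.Printf("%s => %s\n", f[k], v[k])
	}
}
```

Because the scan works on bytes and only compares against ASCII delimiters, it avoids both regexp matching and []rune conversion.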
Check List
Tests
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.