Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

executor: Optimize slow log parsing's splitByColon function #54630

Merged
merged 11 commits into from
Jul 16, 2024

Conversation

yibin87
Copy link
Contributor

@yibin87 yibin87 commented Jul 15, 2024

What problem does this PR solve?

Issue Number: close #54538

Problem Summary:

What changed and how does it work?

Replace regexp matching with simple string comparison operations. Besides, previously, ":=" inside "{}" are not handled correctly, fix it in this PR also.
In local mannual test, performance will improve about 10x, from 24s to 2.2s for the following sql:
SELECT Digest, Query, Conn_ID, (UNIX_TIMESTAMP(Time) + 0E0) AS timestamp, Query_time, Mem_max, Process_keys FROM `INFORMATION_SCHEMA`.`CLUSTER_SLOW_QUERY` WHERE Time BETWEEN FROM_UNIXTIME(1720471890) AND FROM_UNIXTIME(1720515091) ORDER BY Query_time DESC LIMIT 100;

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Signed-off-by: yibin <huyibin@pingcap.cn>
@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 15, 2024
Copy link

tiprow bot commented Jul 15, 2024

Hi @yibin87. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Signed-off-by: yibin <huyibin@pingcap.cn>
Signed-off-by: yibin <huyibin@pingcap.cn>
Signed-off-by: yibin <huyibin@pingcap.cn>
Signed-off-by: yibin <huyibin@pingcap.cn>
@yibin87 yibin87 changed the title executor,sessionctx: Optimize slow log parsing's splitByColon function executor: Optimize slow log parsing's splitByColon function Jul 15, 2024
@yibin87
Copy link
Contributor Author

yibin87 commented Jul 15, 2024

/cc @crazycs520

@ti-chi-bot ti-chi-bot bot requested a review from crazycs520 July 15, 2024 09:56
@yibin87
Copy link
Contributor Author

yibin87 commented Jul 16, 2024

/cc @xzhangxian1008

@ti-chi-bot ti-chi-bot bot requested a review from xzhangxian1008 July 16, 2024 01:07
// 1. Both field and value string contain only ANSI characters
// 2. value string may be surrounded by brackets, allowed brackets includes "[]" and "{}", like {key: value,{key: value}}
// "[]" can only be nested inside "[]"; "{}" can only be nested inside "{}"
// 3. value string can't contain ' ' character unless it is inside brackets
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is origin implementation also contains these restriction?

Copy link
Contributor Author

@yibin87 yibin87 Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, while current slow log satisfies these restrictions and it doesn't seem to be changed frequently in future. Previous implementation provides a broader functionality.


// splitByColon split a line like "field: value field: value..."
// Note:
// 1. Both field and value string contain only ANSI characters
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is too restricted. E.g. tidb_redact_log may output non-ansi characters later sometime.
I am thinking if we could just use strings.Index(a, b) to mimic the original regexp.

Copy link
Contributor Author

@yibin87 yibin87 Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, this rule can be removed for current algorithm, I'll change it.
Simply use strings.Index(a, b) can't solve the ": " inside "{}" cases.
BTW, splitByColon is not the root entry for slow log parsing, for logs that may contain non-ansi characters we can handle it separately if we doneed it.

Copy link
Contributor Author

@yibin87 yibin87 Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked that ascii characters can't be a part of a valid non-ascii utf8 character, since multiple byte charater always has 110/1110 prefix. Thus both field and value string can contain non ascii characters. Besides, using rune[] will affect performance a lot. So keep the previous algorithm, just update the restriction comments:
For field string, first character should be ascii letters or digits. For value string, whitespace can only be contained inside "{}"/"[]".

pkg/executor/slow_query_test.go Show resolved Hide resolved
return -1
}

func isLetterOrNumeric(b byte) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can put this function in pkg/util/stringutil file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just leave it in the local file, since it may be changed for slow log parsing only.

Signed-off-by: yibin <huyibin@pingcap.cn>
Signed-off-by: yibin <huyibin@pingcap.cn>
Signed-off-by: yibin <huyibin@pingcap.cn>
@yibin87 yibin87 requested a review from xhebox July 16, 2024 03:24
Signed-off-by: yibin <huyibin@pingcap.cn>
Signed-off-by: yibin <huyibin@pingcap.cn>
Signed-off-by: yibin <huyibin@pingcap.cn>
@yibin87
Copy link
Contributor Author

yibin87 commented Jul 16, 2024

/test check-dev2

Copy link

tiprow bot commented Jul 16, 2024

@yibin87: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/test check-dev2

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot ti-chi-bot bot added approved needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Jul 16, 2024
Copy link

tiprow bot commented Jul 16, 2024

@yibin87: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
fast_test_tiprow b349e31 link true /test fast_test_tiprow

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Copy link

ti-chi-bot bot commented Jul 16, 2024

@yibin87: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
idc-jenkins-ci-tidb/check_dev_2 b349e31 link unknown /test check-dev2

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@yibin87
Copy link
Contributor Author

yibin87 commented Jul 16, 2024

/test check-dev2

Copy link

tiprow bot commented Jul 16, 2024

@yibin87: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/test check-dev2

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot ti-chi-bot bot merged commit de3aba2 into pingcap:master Jul 16, 2024
20 of 23 checks passed
@ti-chi-bot ti-chi-bot bot added needs-cherry-pick-release-8.1 Should cherry pick this PR to release-8.1 branch. needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. needs-cherry-pick-release-7.1 Should cherry pick this PR to release-7.1 branch. needs-cherry-pick-release-6.5 Should cherry pick this PR to release-6.5 branch. needs-cherry-pick-release-6.1 Should cherry pick this PR to release-6.1 branch. labels Sep 2, 2024
@ti-chi-bot
Copy link
Member

In response to a cherrypick label: new pull request created to branch release-8.1: #55794.

@ti-chi-bot
Copy link
Member

In response to a cherrypick label: new pull request created to branch release-7.5: #55795.

@ti-chi-bot
Copy link
Member

In response to a cherrypick label: new pull request created to branch release-7.1: #55796.

@ti-chi-bot
Copy link
Member

In response to a cherrypick label: new pull request created to branch release-6.5: #55797.

@ti-chi-bot
Copy link
Member

In response to a cherrypick label: new pull request created to branch release-6.1: #55798.

@ti-chi-bot ti-chi-bot bot removed needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. needs-cherry-pick-release-8.1 Should cherry pick this PR to release-8.1 branch. needs-cherry-pick-release-7.1 Should cherry pick this PR to release-7.1 branch. needs-cherry-pick-release-6.5 Should cherry pick this PR to release-6.5 branch. needs-cherry-pick-release-6.1 Should cherry pick this PR to release-6.1 branch. labels Sep 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved lgtm release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Slow log parse performance improvement for splitByColon
4 participants