disttask: add metrics collection for dispatcher #47018

Merged 37 commits into pingcap:master on Sep 25, 2023

@JK1Zhang (Contributor) commented Sep 16, 2023

What problem does this PR solve?

Issue Number: close #47017

Problem Summary:

What is changed and how it works?

  • A line chart showing the change in the number of waiting/running/cancelling/reverting tasks over time
  • A line chart showing the waiting time of waiting tasks
  • A line chart showing the dispatch time of dispatching tasks
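
The first chart boils down to one gauge value per task state. A minimal sketch of that bookkeeping, using only the standard library; the `task` struct, the state strings, and `countByState` are hypothetical stand-ins for illustration, not the PR's actual types or the Prometheus client API:

```go
package main

import "fmt"

// task is a hypothetical stand-in for the framework's task record.
type task struct {
	ID    int64
	State string
}

// countByState returns how many tasks are in each state. In a real exporter,
// each entry would be written to a gauge labelled with that state.
func countByState(tasks []task) map[string]int {
	counts := make(map[string]int)
	for _, t := range tasks {
		counts[t.State]++
	}
	return counts
}

func main() {
	// Made-up sample data for illustration only.
	tasks := []task{{1, "pending"}, {2, "running"}, {3, "running"}, {4, "reverting"}}
	counts := countByState(tasks)
	for _, s := range []string{"pending", "running", "cancelling", "reverting"} {
		fmt.Printf("%s=%d\n", s, counts[s])
	}
}
```

States with no tasks read as zero from the map, which is what a line chart needs in order to show a series dropping back to zero rather than disappearing.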

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test

Manual test

1. Start TiDB in a playground cluster

cd tidb
make
tiup playground --db 1 --db.binpath ./bin/tidb-server

2. Connect with the mysql client

mysql --comments --host 127.0.0.1 --port 4000 -u root

Test commands:

drop database if exists test;
create database test;
use test;
set global tidb_enable_dist_task=1;
create table t(a bigint auto_random primary key) partition by hash(a) partitions 8;
insert into t values (), (), (), (), (), ();
insert into t values (), (), (), (), (), ();
insert into t values (), (), (), (), (), ();
insert into t values (), (), (), (), (), ();
create table t1(a bigint auto_random primary key);
insert into t1 values (), (), (), (), (), ();
insert into t1 values (), (), (), (), (), ();
insert into t1 values (), (), (), (), (), ();
insert into t1 values (), (), (), (), (), ();
create table t2(a bigint auto_random primary key) partition by hash(a) partitions 8;
insert into t2 values (), (), (), (), (), ();
insert into t2 values (), (), (), (), (), ();
insert into t2 values (), (), (), (), (), ();
insert into t2 values (), (), (), (), (), ();
create table t3(a bigint auto_random primary key);
insert into t3 values (), (), (), (), (), ();
insert into t3 values (), (), (), (), (), ();
insert into t3 values (), (), (), (), (), ();
insert into t3 values (), (), (), (), (), ();
split table t between (3) and (8646911284551352360) regions 50;
alter table t add index idx(a);
admin check index t idx;
split table t1 between (3) and (8646911284551352360) regions 50;
alter table t1 add index idx(a);
admin check index t1 idx;
split table t2 between (3) and (8646911284551352360) regions 50;
alter table t2 add index idx(a);
admin check index t2 idx;
split table t3 between (3) and (8646911284551352360) regions 50;
alter table t3 add index idx(a);
admin check index t3 idx;

3. Grafana result

[image: Grafana result]

Calculated in Grafana (dispatching finishes too quickly for data points to show up):
panel "task dispatching time", metric: time() - tidb_disttask_dispatcher_start_time{status="dispatching"}/1000000

Calculated in dispatcher_manager.go:
panel "task dispatching duration", metric: tidb_disttask_dispatcher_duration{status="dispatching"}

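
The Grafana expression above subtracts the task's start timestamp from the current time. As a rough sketch of the same arithmetic, assuming (from the division by 1000000) that the start-time gauge holds Unix microseconds; `elapsedSeconds` is a hypothetical helper, not part of the PR:

```go
package main

import (
	"fmt"
	"time"
)

// elapsedSeconds mirrors the panel expression time() - start_time/1000000.
// nowSec plays the role of PromQL's time() (seconds since the Unix epoch);
// startMicros is ASSUMED to be the task's start time in Unix microseconds.
func elapsedSeconds(nowSec float64, startMicros int64) float64 {
	return nowSec - float64(startMicros)/1e6
}

func main() {
	// Pretend a task started dispatching 1.5 seconds ago.
	start := time.Now().Add(-1500 * time.Millisecond)
	nowSec := float64(time.Now().UnixMicro()) / 1e6
	fmt.Printf("dispatching for about %.1f s\n", elapsedSeconds(nowSec, start.UnixMicro()))
}
```

In PromQL the division binds tighter than the subtraction, so the expression converts the start time to seconds first and then subtracts it from `time()`, as the helper does.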
Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-tests-checked release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 16, 2023
@ti-chi-bot ti-chi-bot bot added the needs-ok-to-test Indicates a PR created by contributors and need ORG member send '/ok-to-test' to start testing. label Sep 16, 2023
@okJiang okJiang changed the title disttask: add metrics collection for dispatcher [WIP]disttask: add metrics collection for dispatcher Sep 18, 2023
@ti-chi-bot ti-chi-bot bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 18, 2023

@okJiang (Member) left a comment:

Needs more enhancement.

disttask/framework/dispatcher/dispatcher_manager.go (outdated, resolved)
metrics/session.go (outdated, resolved)
metrics/session.go (outdated, resolved)
metrics/disttask.go (outdated, resolved)
metrics/disttask.go (outdated, resolved)
metrics/disttask.go (outdated, resolved)
metrics/disttask.go (outdated, resolved)
metrics/disttask.go (resolved)
disttask/framework/dispatcher/dispatcher_manager.go (outdated, resolved)
disttask/framework/dispatcher/dispatcher_manager.go (outdated, resolved)
JK1Zhang and others added 5 commits September 18, 2023 14:47
Co-authored-by: okJiang <jiangxianjie@pingcap.com>
Co-authored-by: okJiang <jiangxianjie@pingcap.com>
@ti-chi-bot ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 20, 2023
@ti-chi-bot ti-chi-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. do-not-merge/needs-tests-checked labels Sep 21, 2023
disttask/framework/dispatcher/dispatcher_manager.go (outdated, resolved)
disttask/framework/dispatcher/dispatcher_manager.go (outdated, resolved)
disttask/framework/dispatcher/dispatcher_manager.go (outdated, resolved)

if dm.checkConcurrencyOverflow(cnt) {
	break
}

// TODO: Consider getting these tasks, in addition to the task being worked on.
tasks, err := dm.taskMgr.GetGlobalTasksInStates(proto.TaskStatePending, proto.TaskStateRunning, proto.TaskStateReverting, proto.TaskStateCancelling)

Member:

How about initializing the metrics with these tasks when the DispatcherManager starts?

Member:

That would be more precise.

Contributor Author:

For DistDDLTaskStarttimeGauge, I think it's not necessary to initialize here because it doesn't change.
For DistDDLTaskGauge, should I go through all the tasks in the tidb_global_task table?
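
One way to read the suggestion to initialize the metrics at startup is to seed the per-state gauge once from the tasks already in storage. A minimal sketch under that assumption; `gaugeVec`, `trackedStates`, and `initTaskGauge` are hypothetical names for illustration and are not the PR's code or the Prometheus client API:

```go
package main

import "fmt"

// gaugeVec is a hypothetical, minimal stand-in for a labelled gauge.
type gaugeVec struct {
	values map[string]float64
}

func newGaugeVec() *gaugeVec { return &gaugeVec{values: make(map[string]float64)} }

// Set records the current value for one label.
func (g *gaugeVec) Set(label string, v float64) { g.values[label] = v }

// trackedStates lists the four states fetched in the reviewed code.
var trackedStates = []string{"pending", "running", "reverting", "cancelling"}

// initTaskGauge seeds the gauge once at startup: every tracked state is set
// to the number of stored tasks in that state, including an explicit zero.
func initTaskGauge(g *gaugeVec, storedStates []string) {
	counts := make(map[string]int)
	for _, s := range storedStates {
		counts[s]++
	}
	for _, s := range trackedStates {
		g.Set(s, float64(counts[s]))
	}
}

func main() {
	g := newGaugeVec()
	// States of tasks already in storage when the manager starts (made-up data).
	initTaskGauge(g, []string{"running", "pending", "running"})
	for _, s := range trackedStates {
		fmt.Printf("%s=%v\n", s, g.values[s])
	}
}
```

Seeding only the tracked states keeps the scan limited to active tasks, which may avoid walking every row of the task table; whether that fits the real dispatcher is a design question for the PR, not something this sketch settles.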

disttask/framework/storage/task_table.go (outdated, resolved)
metrics/disttask.go (outdated, resolved)
metrics/disttask.go (outdated, resolved)
metrics/disttask.go (outdated, resolved)
metrics/disttask.go (outdated, resolved)
@ti-chi-bot ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Sep 22, 2023

@ywqzzy (Contributor) commented Sep 22, 2023

/cc @ywqzzy

metrics/grafana/tidb.json (outdated, resolved)

@ywqzzy (Contributor) left a comment:

LGTM

Co-authored-by: EasonBall <592838129@qq.com>

@D3Hunter (Contributor) left a comment:

rest lgtm


// UpdateMetricsForDisptchTask update metrics when a task is added
func UpdateMetricsForDisptchTask(task *proto.Task) {
	DistTaskGauge.WithLabelValues(task.Type, WaitingStatus).Set(float64(300))

Contributor:

What does 300 mean?

Contributor Author:

It was useless; deleted.

metrics/disttask.go (outdated, resolved)
metrics/grafana/tidb.json (outdated, resolved)
metrics/grafana/tidb.json (outdated, resolved)
JK1Zhang and others added 4 commits September 25, 2023 21:07
Co-authored-by: EasonBall <592838129@qq.com>
Co-authored-by: D3Hunter <jujj603@gmail.com>
Co-authored-by: D3Hunter <jujj603@gmail.com>
metrics/grafana/tidb.json (outdated, resolved)

@Benjamin2037 (Collaborator) left a comment:

LGTM

@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Sep 25, 2023
@ti-chi-bot ti-chi-bot bot added the lgtm label Sep 25, 2023

@ti-chi-bot bot commented Sep 25, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Benjamin2037, okJiang

@ti-chi-bot ti-chi-bot bot added approved and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Sep 25, 2023

@ti-chi-bot bot commented Sep 25, 2023

[LGTM Timeline notifier]

Timeline:

  • 2023-09-25 13:44:40 UTC: ☑️ agreed by Benjamin2037.
  • 2023-09-25 13:57:37 UTC: ☑️ agreed by okJiang.

@ti-chi-bot ti-chi-bot bot merged commit 84fe6be into pingcap:master Sep 25, 2023
Labels
approved lgtm ok-to-test Indicates a PR is ready to be tested. release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

disttask: add metrics collection for dispatcher and scheduler
7 participants