[improve](routine load) add more metrics to observe the routine load job #48209
run buildall
TPC-H: Total hot run time: 31544 ms
TPC-DS: Total hot run time: 184681 ms
ClickBench: Total hot run time: 30.93 s

run buildall
TPC-H: Total hot run time: 31272 ms
TPC-DS: Total hot run time: 191523 ms
ClickBench: Total hot run time: 29.93 s

TeamCity be ut coverage result:

run buildall
TPC-H: Total hot run time: 31319 ms
TPC-DS: Total hot run time: 183975 ms
ClickBench: Total hot run time: 30.21 s

TeamCity be ut coverage result:

b9113ba to f0f61b8

run buildall
TPC-H: Total hot run time: 31400 ms
TPC-DS: Total hot run time: 183959 ms
ClickBench: Total hot run time: 30.81 s

TeamCity be ut coverage result:

f0f61b8 to 3e118f0

run buildall
TPC-H: Total hot run time: 31815 ms
liaoxin01
left a comment
LGTM
PR approved by at least one committer and no changes requested.

BE UT Coverage Report
Increment line coverage
Increment coverage report
dataroaring
left a comment
LGTM
…job (#48209)

### What problem does this PR solve?

related #48511

Add more metrics to observe the routine load job:

| Metrics | Module | Description |
| --- | --- | --- |
| routine_load_get_msg_latency | BE | Time to pull a Kafka message |
| routine_load_get_msg_count | BE | Number of Kafka message pulls |
| routine_load_consume_bytes | BE | Total data volume consumed from Kafka |
| routine_load_consume_rows | BE | Total number of rows consumed from Kafka |
| routine_load_task_execute_time | FE | Task execution time |
| routine_load_task_execute_count | FE | Task execution count |
| routine_load_get_meta_latency | FE | Latency of obtaining Kafka metadata |
| routine_load_get_meta_count | FE | Number of times Kafka metadata was obtained |
| routine_load_get_meta_fail_count | FE | Number of failures when obtaining metadata |
| routine_load_received_bytes | FE | Total data volume consumed |
| routine_load_received_rows | FE | Total number of rows consumed |
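The latency/count and bytes/rows pairs in the table are most informative as ratios. The Python sketch below shows one way to derive an average pull latency and an average row size from a scraped metrics page. It assumes the metrics are exposed in the Prometheus text format, that `routine_load_get_msg_latency` is a cumulative total (its unit is not stated in this PR), and that the exposed names may carry a prefix such as `doris_be_`; `summarize` and the sample values are hypothetical and for illustration only.

```python
from __future__ import annotations

import re
from typing import Optional

# One sample line of the Prometheus text format: name{labels} value [timestamp]
_SAMPLE = re.compile(r"^([A-Za-z_:][A-Za-z0-9_:]*)(\{[^}]*\})?\s+(\S+)")


def parse_samples(text: str) -> dict[str, float]:
    """Sum sample values per metric name, ignoring comments and label sets."""
    totals: dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match:
            name, value = match.group(1), float(match.group(3))
            totals[name] = totals.get(name, 0.0) + value
    return totals


def lookup(totals: dict[str, float], suffix: str) -> Optional[float]:
    """Value of the first metric whose name ends with `suffix` (prefix-agnostic)."""
    for name, value in totals.items():
        if name.endswith(suffix):
            return value
    return None


def ratio(totals: dict[str, float], numerator: str, denominator: str) -> Optional[float]:
    """numerator / denominator, or None if either is missing or the denominator is 0."""
    num, den = lookup(totals, numerator), lookup(totals, denominator)
    if num is None or not den:
        return None
    return num / den


def summarize(text: str) -> dict[str, Optional[float]]:
    totals = parse_samples(text)
    return {
        # Average time per Kafka pull, in whatever unit the latency metric uses.
        "avg_get_msg_latency": ratio(
            totals, "routine_load_get_msg_latency", "routine_load_get_msg_count"
        ),
        # Average size of a row consumed from Kafka, in bytes.
        "avg_bytes_per_row": ratio(
            totals, "routine_load_consume_bytes", "routine_load_consume_rows"
        ),
    }


# Hypothetical scrape; the values are invented for illustration.
SAMPLE = """\
# TYPE doris_be_routine_load_get_msg_latency counter
doris_be_routine_load_get_msg_latency 1500
doris_be_routine_load_get_msg_count 300
doris_be_routine_load_consume_bytes 2048000
doris_be_routine_load_consume_rows 16000
"""

print(summarize(SAMPLE))  # {'avg_get_msg_latency': 5.0, 'avg_bytes_per_row': 128.0}
```

Ratios of cumulative counters give a lifetime average; for a recent average, apply the same division to the differences between two consecutive scrapes.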
…#48963)

### What problem does this PR solve?

Part IV of #48511

doc apache/doris-website#2196

**Introduce routine load job statistic system table:**

```
mysql> show create table information_schema.routine_load_job\G
*************************** 1. row ***************************
       Table: routine_load_job
Create Table: CREATE TABLE `routine_load_job` (
  `JOB_ID` text NULL,
  `JOB_NAME` text NULL,
  `CREATE_TIME` text NULL,
  `PAUSE_TIME` text NULL,
  `END_TIME` text NULL,
  `DB_NAME` text NULL,
  `TABLE_NAME` text NULL,
  `STATE` text NULL,
  `CURRENT_TASK_NUM` text NULL,
  `JOB_PROPERTIES` text NULL,
  `DATA_SOURCE_PROPERTIES` text NULL,
  `CUSTOM_PROPERTIES` text NULL,
  `STATISTIC` text NULL,
  `PROGRESS` text NULL,
  `LAG` text NULL,
  `REASON_OF_STATE_CHANGED` text NULL,
  `ERROR_LOG_URLS` text NULL,
  `USER_NAME` text NULL,
  `CURRENT_ABORT_TASK_NUM` int NULL,
  `IS_ABNORMAL_PAUSE` boolean NULL
) ENGINE=SCHEMA;
1 row in set (0.00 sec)
```

**Exposing job statistics through SQL has several benefits:**

- Together with the metrics added in #48209, it can be used to roughly locate abnormal jobs when a Grafana alarm fires, for example with the following SQL:

```
SELECT JOB_NAME FROM information_schema.routine_load_job WHERE CURRENT_ABORT_TASK_NUM > 0 OR IS_ABNORMAL_PAUSE = TRUE;
```

- Users can run `select * from information_schema.routine_load_job` instead of `show routine load`. `show routine load` can only look up jobs by name, whereas SQL can locate jobs with arbitrary filters.
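The alarm-then-query workflow in the first bullet can be sketched in Python. This assumes the rows of `information_schema.routine_load_job` have already been fetched as dictionaries by some MySQL-protocol client (not shown), and that the alarm is driven by a counter such as `routine_load_get_meta_fail_count`; `counter_delta`, `is_abnormal` and `jobs_to_check` are hypothetical helper names. The filter mirrors the SQL predicate above, with NULL never matching, as in a WHERE clause.

```python
from typing import Any, Dict, Iterable, List


def counter_delta(prev: float, curr: float) -> float:
    """Increase of a cumulative counter between two scrapes.
    A drop is treated as a counter reset (e.g. a process restart)."""
    return curr if curr < prev else curr - prev


def is_abnormal(row: Dict[str, Any]) -> bool:
    """Python equivalent of: CURRENT_ABORT_TASK_NUM > 0 OR IS_ABNORMAL_PAUSE = TRUE.
    NULL (None) never matches, as in a SQL WHERE clause."""
    aborts = row.get("CURRENT_ABORT_TASK_NUM")
    paused = row.get("IS_ABNORMAL_PAUSE")
    # Clients may return a boolean column as True/False, 1/0 or a string; normalise it.
    paused_true = paused is not None and str(paused).lower() in ("1", "true")
    return (aborts is not None and int(aborts) > 0) or paused_true


def jobs_to_check(
    prev_fail: float, curr_fail: float, rows: Iterable[Dict[str, Any]]
) -> List[str]:
    """Return abnormal job names, but only if the failure counter grew since the last scrape."""
    if counter_delta(prev_fail, curr_fail) <= 0:
        return []
    return [row["JOB_NAME"] for row in rows if is_abnormal(row)]


# Hypothetical rows, as a client might return them.
rows = [
    {"JOB_NAME": "job_a", "CURRENT_ABORT_TASK_NUM": 2, "IS_ABNORMAL_PAUSE": False},
    {"JOB_NAME": "job_b", "CURRENT_ABORT_TASK_NUM": 0, "IS_ABNORMAL_PAUSE": True},
    {"JOB_NAME": "job_c", "CURRENT_ABORT_TASK_NUM": 0, "IS_ABNORMAL_PAUSE": 0},
    {"JOB_NAME": "job_d", "CURRENT_ABORT_TASK_NUM": None, "IS_ABNORMAL_PAUSE": None},
]

print(jobs_to_check(10, 12, rows))  # ['job_a', 'job_b']
```

In practice the filtering would normally stay in the SQL query itself; the Python version only makes the NULL handling and the counter-reset rule explicit.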
…#48963) (#49284) pick #48963
…#48963) (#49286) pick #48963
### What problem does this PR solve?

come from: #48209

### Release note

None

### Check List (For Author)

- Test <!-- At least one of them must be included. -->
  - [ ] Regression test
  - [ ] Unit Test
  - [ ] Manual test (add detailed scripts or steps below)
  - [ ] No need to test or manual test. Explain why:
    - [ ] This is a refactor/code format and no logic has been changed.
    - [ ] Previous test can cover this change.
    - [ ] No code files have been changed.
    - [ ] Other reason <!-- Add your reason? -->
- Behavior changed:
  - [ ] No.
  - [ ] Yes. <!-- Explain the behavior change -->
- Does this need documentation?
  - [ ] No.
  - [ ] Yes. <!-- Add document PR link here. eg: apache/doris-website#1214 -->

### Check List (For Reviewer who merges this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR should merge into -->
What problem does this PR solve?
related #48511
Add more metrics to observe the routine load job:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merges this PR)