jobsprofiler: add observability into per processor progress #100488
Labels
A-disaster-recovery
A-jobs
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-jobs
Comments
adityamaru added the C-enhancement and A-jobs labels on Apr 3, 2023
cc @cockroachdb/disaster-recovery
exalate-issue-sync (bot) added T-jobs and removed A-disaster-recovery, T-disaster-recovery labels on Apr 3, 2023
cc @cockroachdb/disaster-recovery
adityamaru added a commit to adityamaru/cockroach that referenced this issue on Apr 26, 2023
This change does a couple of things:
Previously, an ExportRequest of an empty span would not send a progress update over the progress channel to the coordinator of the job. Even though the Export span was empty, it is accounted for when calculating the total number of spans that need to be exported in the backup. So, not sending a progress update meant that the fraction progress persisted by the job would not be updated accurately in the face of empty spans.
This change also annotates each progress update with the NodeID and FlowID from which the progress update is being sent. Furthermore, we also populate the `CompletedFraction` map on each progress update with a mapping from the processor ID to the fraction of Export spans assigned to that processor that have been completed. These pieces of information will be persisted and used in a future commit to surface per-node, per-processor progress information.
Informs: cockroachdb#100488
Release note: None
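To make the accounting in this commit message concrete, here is a minimal sketch of a processor that reports progress after every export, including exports of empty spans. The `ProgressUpdate` and `spanTracker` types, their field names, and the channel wiring are assumptions made for illustration only; they are not the actual CockroachDB types.

```go
package main

import "fmt"

// ProgressUpdate is a hypothetical stand-in for the progress metadata that a
// backup processor sends to the job coordinator. It is not the real message.
type ProgressUpdate struct {
	NodeID int32
	FlowID string
	// CompletedFraction maps a processor ID to the fraction of the export
	// spans assigned to that processor that have been completed.
	CompletedFraction map[int32]float32
}

// spanTracker counts completed spans for a single processor (hypothetical).
type spanTracker struct {
	nodeID      int32
	flowID      string
	processorID int32
	total       int
	completed   int
}

// onSpanExported records one finished export and builds the update to send.
// It is called for every span, including empty ones, because empty spans are
// still counted in the total.
func (t *spanTracker) onSpanExported() ProgressUpdate {
	t.completed++
	frac := float32(1)
	if t.total > 0 {
		frac = float32(t.completed) / float32(t.total)
	}
	return ProgressUpdate{
		NodeID:            t.nodeID,
		FlowID:            t.flowID,
		CompletedFraction: map[int32]float32{t.processorID: frac},
	}
}

func main() {
	progCh := make(chan ProgressUpdate, 4)
	tracker := &spanTracker{nodeID: 1, flowID: "flow-a", processorID: 7, total: 4}

	// Bytes exported per span; two of the four spans are empty.
	for _, exportedBytes := range []int{1024, 0, 0, 512} {
		_ = exportedBytes // an empty span still produces a progress update
		progCh <- tracker.onSpanExported()
	}
	close(progCh)

	for u := range progCh {
		fmt.Printf("node=%d flow=%s fraction=%.2f\n", u.NodeID, u.FlowID, u.CompletedFraction[7])
	}
}
```

If the two empty spans were skipped, the last reported fraction in this sketch would stop at 0.50 even though all work was done, which is the inaccuracy the commit describes.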
adityamaru added a commit to adityamaru/cockroach that referenced this issue on Apr 26, 2023
This change teaches the coordinator node of the backup job to periodically persist the fraction progress per node and per processor on each node in the `system.job_info` table. This information will be surfaced in a future change via `SHOW JOB WITH EXECUTION DETAILS` to provide more observability into how much work each processor has remaining during a backup.
This change also introduces an `InfoKey` interface that defines methods that should be implemented by all info keys used in the `job_info` table. This change does not go so far as changing the parameter type of the various job info storage methods to use the `InfoKey` interface. This will be done in a future change.
Informs: cockroachdb#100488
Release note: None
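As a rough illustration of the coordinator-side bookkeeping described here, the sketch below keeps the latest fraction per (node, processor) pair and writes it out under a typed key. The commit message does not say which methods `InfoKey` declares, so the single `String()` method, the key format, and the in-memory `jobInfoStore` map standing in for `system.job_info` are all assumptions. A real coordinator would call `flush` on a timer; here it is called once by hand.

```go
package main

import (
	"fmt"
	"sort"
)

// InfoKey is a hypothetical version of the interface mentioned in the commit
// message. The real method set is not given there.
type InfoKey interface {
	String() string
}

// processorProgressKey identifies one processor on one node (hypothetical).
type processorProgressKey struct {
	NodeID      int32
	ProcessorID int32
}

// String renders the key in a made-up format used only by this sketch.
func (k processorProgressKey) String() string {
	return fmt.Sprintf("processor-progress/n%d/p%d", k.NodeID, k.ProcessorID)
}

// jobInfoStore is an in-memory stand-in for the system.job_info table.
type jobInfoStore map[string]float32

func (s jobInfoStore) write(k InfoKey, fraction float32) {
	s[k.String()] = fraction
}

// coordinator remembers the most recent fraction reported by each processor.
type coordinator struct {
	latest map[processorProgressKey]float32
}

// record merges one progress update from a node into the latest state.
func (c *coordinator) record(nodeID int32, completed map[int32]float32) {
	for procID, frac := range completed {
		c.latest[processorProgressKey{NodeID: nodeID, ProcessorID: procID}] = frac
	}
}

// flush persists the latest fraction for every known processor.
func (c *coordinator) flush(store jobInfoStore) {
	for k, frac := range c.latest {
		store.write(k, frac)
	}
}

func main() {
	c := &coordinator{latest: map[processorProgressKey]float32{}}
	c.record(1, map[int32]float32{0: 0.25})
	c.record(2, map[int32]float32{1: 0.75})
	c.record(1, map[int32]float32{0: 0.5}) // a newer update overwrites the older one

	store := jobInfoStore{}
	c.flush(store)

	keys := make([]string, 0, len(store))
	for k := range store {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s = %.2f\n", k, store[k])
	}
}
```

Keeping only the latest value per key means each periodic write is a small overwrite rather than an append, which fits a key-value style table.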
adityamaru added a commit to adityamaru/cockroach that referenced this issue on May 11, 2023
This change teaches `SHOW JOB <jobID> WITH EXECUTION DETAILS` when run against a backup job to annotate the latest DSP diagram stored for the backup with per-node, per-processor progress information. This annotated diagram is then re-serialized and stored as the most up-to-date DSP diagram for that job.
The backup job is the first of its kind to annotate a DSP diagram with per-node, per-proc progress information and so this change adds some logic to deserialize a flow diagram from a URL and annotate the diagram's processors with progress information.
Informs: cockroachdb#100488
Release note (sql change): `SHOW JOB WITH EXECUTION DETAILS` for a backup job will regenerate the DistSQL plan diagram with per-node, per-processor progress information. This will help better understand the state of a running backup job.
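The deserialize, annotate, re-serialize cycle described in this commit message can be sketched as follows. The `diagram` and `processor` types, the `progress: N%` detail string, and the use of plain JSON are assumptions chosen to keep the example self-contained; the real DistSQL diagram types and their URL encoding are different and are not reproduced here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// processor is a simplified, hypothetical diagram node.
type processor struct {
	NodeID      int32    `json:"nodeID"`
	ProcessorID int32    `json:"processorID"`
	Name        string   `json:"name"`
	Details     []string `json:"details,omitempty"`
}

// diagram is a simplified, hypothetical flow diagram.
type diagram struct {
	Processors []processor `json:"processors"`
}

// progressKey identifies one processor on one node.
type progressKey struct {
	NodeID      int32
	ProcessorID int32
}

// annotate decodes a serialized diagram, appends a progress detail to every
// processor that has a known fraction, and returns the re-encoded diagram.
func annotate(serialized []byte, progress map[progressKey]float32) ([]byte, error) {
	var d diagram
	if err := json.Unmarshal(serialized, &d); err != nil {
		return nil, err
	}
	for i := range d.Processors {
		p := &d.Processors[i]
		key := progressKey{NodeID: p.NodeID, ProcessorID: p.ProcessorID}
		if frac, ok := progress[key]; ok {
			p.Details = append(p.Details, fmt.Sprintf("progress: %.0f%%", frac*100))
		}
	}
	return json.Marshal(d)
}

func main() {
	stored := []byte(`{"processors":[` +
		`{"nodeID":1,"processorID":0,"name":"backupProcessor"},` +
		`{"nodeID":2,"processorID":1,"name":"backupProcessor"}]}`)

	// Only node 1, processor 0 has reported progress so far.
	progress := map[progressKey]float32{{NodeID: 1, ProcessorID: 0}: 0.5}

	out, err := annotate(stored, progress)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Processors with no recorded progress are left untouched, so a diagram can be annotated even when only some nodes have reported.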
craig bot pushed a commit that referenced this issue on Jun 15, 2023
103145: jobsprofiler: annotate backup DSP diagram with per-proc progress r=dt,yuzefovich a=adityamaru
This change teaches `SHOW JOB <jobID> WITH EXECUTION DETAILS` when run against a backup job to annotate the latest DSP diagram stored for the backup with per-node, per-processor progress information. This annotated diagram is then re-serialized and stored as the most up-to-date DSP diagram for that job.
The backup job is the first of its kind to annotate a DSP diagram with per-node, per-proc progress information and so this change adds some logic to deserialize a flow diagram from a URL and annotate the diagram's processors with progress information.
Informs: #100488
Release note (sql change): `SHOW JOB WITH EXECUTION DETAILS` for a backup job will regenerate the DistSQL plan diagram with per-node, per-processor progress information. This will help better understand the state of a running backup job.
104892: build: upgrade `golang.org/x/exp/typeparams` r=knz,healthy-pod a=rickystewart
Prepare to pull in the latest version of `honnef.co/go/tools`.
Release note: None
Epic: none
104997: build,ci: add a ci check to ensure SHA sums match r=rail a=rickystewart
`bazel fetch @distdir//:archives` fetches all the `distdir` files. This has the side effect of checking all SHA sums. In this way we're sure that nobody's mis-typed a SHA.
Epic: none
Release note: None
Co-authored-by: adityamaru <adityamaru@gmail.com>
Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com>
Closing as done for backup. If we decide this is useful for restore, we should open a new issue to track that.
This issue is to track an idea that @dt had with the recent work in #99522.
We now have the execution diagrams per job, which have every processor with its processor ID.
We have producer metas, sent by those processors, indicating their progress.
If we put the processor ID in the producer meta, and make sure it sends its local fraction complete in addition to the structured completion info (e.g. key spans), we could store a job info key for each processor ID, with its most recent progress. Then we could pull those keys when we're asked for the flow diagram, and augment that diagram with annotations showing the completion per processor.
Epic: CRDB-8964
Jira issue: CRDB-26461