jobsprofiler: add observability into per processor progress #100488

Closed
adityamaru opened this issue Apr 3, 2023 · 3 comments
Labels
A-disaster-recovery A-jobs C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-jobs

Comments

adityamaru commented Apr 3, 2023

This issue is to track an idea that @dt had with the recent work in #99522.

We now have the execution diagrams per job, which have every processor with its processor ID.
We have producer metas, sent by those processors, indicating their progress.
If we put the processor ID in the producer meta, and make sure it sends its local fraction complete in addition to the structured completion info (e.g. key spans), we could store a job info key for each processor ID, holding its most recent progress. Then we could pull those keys when we're asked for the flow diagram, and augment that diagram with annotations showing the completion per processor.
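
As a rough illustration of the idea above, here is a minimal Go sketch of keeping the most recent progress per processor under one key each. Everything in it is hypothetical: `procProgress`, `progressStore`, `infoKeyFor`, and the `example-proc-progress/` key prefix are made up for this sketch and are not CockroachDB types or key formats, and an in-memory map stands in for the job info storage.

```go
package main

import "fmt"

// procProgress is a hypothetical stand-in for the progress a processor
// would report in its producer metadata.
type procProgress struct {
	ProcessorID      int32
	FractionComplete float32
}

// infoKeyFor builds an illustrative info key for a processor ID.
// The prefix is invented for this sketch.
func infoKeyFor(procID int32) string {
	return fmt.Sprintf("example-proc-progress/%d", procID)
}

// progressStore keeps only the most recent fraction per info key.
// A map stands in for persistent storage here.
type progressStore map[string]float32

// record overwrites any earlier progress stored for the same processor.
func (s progressStore) record(p procProgress) {
	s[infoKeyFor(p.ProcessorID)] = p.FractionComplete
}

// latest returns the most recent fraction for a processor, if any.
func (s progressStore) latest(procID int32) (float32, bool) {
	f, ok := s[infoKeyFor(procID)]
	return f, ok
}

func main() {
	store := progressStore{}
	store.record(procProgress{ProcessorID: 1, FractionComplete: 0.25})
	store.record(procProgress{ProcessorID: 2, FractionComplete: 0.75})
	store.record(procProgress{ProcessorID: 1, FractionComplete: 0.5})

	// These values are what a diagram annotation would read back.
	for _, id := range []int32{1, 2} {
		f, _ := store.latest(id)
		fmt.Printf("processor %d: %.0f%% complete\n", id, f*100)
	}
}
```

Storing one key per processor means a later update simply overwrites the earlier one, so a reader only ever sees the latest value for each processor.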

Epic: CRDB-8964

Jira issue: CRDB-26461

@adityamaru adityamaru added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-jobs labels Apr 3, 2023
@blathers-crl blathers-crl bot added the T-jobs label Apr 3, 2023
blathers-crl bot commented Apr 3, 2023

cc @cockroachdb/disaster-recovery

adityamaru added a commit to adityamaru/cockroach that referenced this issue Apr 26, 2023
This change does a couple of things:

Previously, an ExportRequest of an empty span
would not send a progress update over the progress channel
to the coordinator of the job. Even though the Export span was
empty, it is accounted for when calculating the total number of
spans that need to be exported in the backup. So, not sending a
progress update meant that the fraction progress persisted by the
job would not be updated accurately in the face of empty spans.

This change also annotates each progress update with the NodeID
and FlowID from which the progress update is being sent. Furthermore,
we also populate the `CompletedFraction` map on each progress update
with a mapping from the processor ID to the fraction of Export spans
assigned to that processor that have been completed.

These pieces of information will be persisted and used in a future
commit to surface per node, per processor progress information.

Informs: cockroachdb#100488

Release note: None
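
To make the two points in the commit message above concrete, here is a simplified Go sketch, not the actual backup processor code. It assumes a hypothetical `spanTracker` that counts completed spans for one processor and emits a hypothetical `exportProgress` update after every span, including empty ones. Only the `CompletedFraction` map name and the NodeID/FlowID annotation come from the commit message; the struct layout and field types are assumptions.

```go
package main

import "fmt"

// exportProgress is a hypothetical, simplified progress update. The real
// update type in CockroachDB is not reproduced here.
type exportProgress struct {
	NodeID            int32
	FlowID            string
	CompletedFraction map[int32]float32 // processor ID -> fraction done
}

// spanTracker counts how many of a processor's assigned spans are done.
type spanTracker struct {
	nodeID, procID int32
	flowID         string
	total, done    int
}

// finishSpan must be called for every assigned span, even an empty one:
// empty spans are part of total, so skipping them would leave the
// fraction stuck below 1.
func (t *spanTracker) finishSpan() exportProgress {
	t.done++
	frac := float32(t.done) / float32(t.total)
	return exportProgress{
		NodeID:            t.nodeID,
		FlowID:            t.flowID,
		CompletedFraction: map[int32]float32{t.procID: frac},
	}
}

func main() {
	t := &spanTracker{nodeID: 1, procID: 3, flowID: "example-flow", total: 4}
	// Two of the four spans are empty (size 0) but still reported.
	for _, size := range []int{10, 0, 7, 0} {
		update := t.finishSpan()
		fmt.Printf("span size=%d -> fraction=%.2f\n",
			size, update.CompletedFraction[t.procID])
	}
}
```

If the two empty spans were not reported, this processor's fraction would stop at 0.50 even though all of its work was finished.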
adityamaru added a commit to adityamaru/cockroach that referenced this issue Apr 26, 2023
This change teaches the coordinator node of the backup job to
periodically persist the fraction progress per node and per processor
on each node in the `system.job_info` table. This information will
be surfaced in a future change via `SHOW JOB WITH EXECUTION DETAILS`
to provide more observability into how much work each processor has
remaining during a backup.

This change also introduces an `InfoKey` interface that defines methods
that should be implemented by all info keys used in the job_info table.
This change does not go so far as changing the parameter type of the various
job info storage methods to use the `InfoKey` interface. This will be done in
a future change.

Informs: cockroachdb#100488
Release note: None
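
The commit message above does not list the methods of the `InfoKey` interface, so the following Go sketch only illustrates the general pattern under stated assumptions. The single `Key() string` method, the `perNodeProcessorProgressKey` type, the key format, and the `writeInfo` helper are all hypothetical and should not be read as the CockroachDB definitions. The storage helper still takes a plain string, mirroring the note that the storage methods were not yet changed to accept the interface.

```go
package main

import "fmt"

// InfoKey is a hypothetical shape for an interface that every info key
// type would implement. The real method set may differ.
type InfoKey interface {
	Key() string
}

// perNodeProcessorProgressKey is an invented key type identifying the
// progress row for one processor on one node.
type perNodeProcessorProgressKey struct {
	NodeID      int32
	ProcessorID int32
}

// Key renders the key in a made-up format for this sketch.
func (k perNodeProcessorProgressKey) Key() string {
	return fmt.Sprintf("example-progress/node-%d/proc-%d", k.NodeID, k.ProcessorID)
}

// writeInfo mimics a storage method that accepts a plain string key.
// A map stands in for the job_info table.
func writeInfo(table map[string][]byte, key string, value []byte) {
	table[key] = value
}

func main() {
	table := map[string][]byte{}
	var k InfoKey = perNodeProcessorProgressKey{NodeID: 1, ProcessorID: 4}
	// Callers convert the typed key to a string at the storage boundary.
	writeInfo(table, k.Key(), []byte("0.50"))
	fmt.Printf("%s = %s\n", k.Key(), table[k.Key()])
}
```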
adityamaru added a commit to adityamaru/cockroach that referenced this issue May 10, 2023
adityamaru added a commit to adityamaru/cockroach that referenced this issue May 11, 2023
adityamaru added a commit to adityamaru/cockroach that referenced this issue May 11, 2023
This change teaches `SHOW JOB <jobID> WITH EXECUTION DETAILS`
when run against a backup job to annotate the latest DSP diagram
stored for the backup with per-node, per-processor progress
information. This annotated diagram is then re-serialized and
stored as the most up-to-date DSP diagram for that job.

The backup job is the first of its kind to annotate a DSP diagram
with per-node, per-proc progress information and so this change
adds some logic to deserialize a flow diagram from a URL and
annotate the diagram's processors with progress information.

Informs: cockroachdb#100488

Release note (sql change): `SHOW JOB WITH EXECUTION DETAILS` for
a backup job will regenerate the DistSQL plan diagram with per-node,
per-processor progress information. This will help better understand
the state of a running backup job.
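
The annotation step described in the commit message above can be pictured with the following Go sketch. It skips deserializing a diagram from a URL and re-serializing it, and works on a hypothetical in-memory `diagram` instead. The `diagram`, `processor`, and `procKey` types, the `annotate` function, and the `progress: NN.NN%` label format are all assumptions made for illustration.

```go
package main

import "fmt"

// processor is a hypothetical, simplified diagram node.
type processor struct {
	NodeID      int32
	ProcessorID int32
	Details     []string // free-form lines rendered inside the node
}

// diagram is a hypothetical in-memory form of a flow diagram.
type diagram struct {
	Processors []processor
}

// procKey identifies a processor on a node in the progress map.
type procKey struct {
	NodeID, ProcessorID int32
}

// annotate appends a progress line to every processor that has a stored
// fraction. Processors without progress are left unchanged.
func annotate(d *diagram, progress map[procKey]float32) {
	for i := range d.Processors {
		p := &d.Processors[i]
		if f, ok := progress[procKey{p.NodeID, p.ProcessorID}]; ok {
			p.Details = append(p.Details, fmt.Sprintf("progress: %.2f%%", f*100))
		}
	}
}

func main() {
	d := &diagram{Processors: []processor{
		{NodeID: 1, ProcessorID: 0},
		{NodeID: 2, ProcessorID: 1},
	}}
	annotate(d, map[procKey]float32{{NodeID: 1, ProcessorID: 0}: 0.5})
	for _, p := range d.Processors {
		fmt.Printf("n%d/p%d: %v\n", p.NodeID, p.ProcessorID, p.Details)
	}
}
```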
adityamaru added a commit to adityamaru/cockroach that referenced this issue May 16, 2023
craig bot pushed a commit that referenced this issue May 17, 2023
102308: jobs, backupccl: persist per node, per processor backup progress  r=dt a=adityamaru

Please refer to individual commits.

Informs: #100488
Release note: None

Co-authored-by: adityamaru <adityamaru@gmail.com>
adityamaru added a commit to adityamaru/cockroach that referenced this issue May 24, 2023
adityamaru added a commit to adityamaru/cockroach that referenced this issue Jun 13, 2023
adityamaru added a commit to adityamaru/cockroach that referenced this issue Jun 15, 2023
craig bot pushed a commit that referenced this issue Jun 15, 2023
103145: jobsprofiler: annotate backup DSP diagram with per-proc progress r=dt,yuzefovich a=adityamaru

104892: build: upgrade `golang.org/x/exp/typeparams` r=knz,healthy-pod a=rickystewart

Prepare to pull in the latest version of `honnef.co/go/tools`.

Release note: None
Epic: none

104997: build,ci: add a ci check to ensure SHA sums match r=rail a=rickystewart

`bazel fetch @distdir//:archives` fetches all the `distdir` files. This has the side effect of checking all SHA sums. In this way we're sure that nobody has mistyped a SHA.

Epic: none
Release note: None

Co-authored-by: adityamaru <adityamaru@gmail.com>
Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com>
adityamaru commented Oct 4, 2023

Closing as done for backup. If we decide this is useful for restore, we should open a new issue to track that.
