Skip to content
This repository has been archived by the owner on Jan 8, 2024. It is now read-only.

Add "WatchTask" operation to watch ODR launch tasks and expose more debug info #3306

Merged
merged 13 commits into from
May 2, 2022

Conversation

mitchellh
Copy link
Contributor

@mitchellh mitchellh commented May 2, 2022

Depends on: hashicorp/waypoint-plugin-sdk#71

This adds a new operation type WatchTask that runs on the static runner (the same place as StartTask and StopTask). The WatchTask operation is automatically created as part of the ODR wrapping process, resulting in a job dependency tree that looks like this for any job:

//           ┌────────────────┐
//           │   Start Task   │─────────────┐
//           └────────────────┘             │
//                    │                     │
//          ┌─────────┴─────────┐           │
//          ▼                   ▼           │
// ┌────────────────┐   ┌─ ── ── ── ── ──   │
// │   Watch Task   │   │      Job       │  │
// └────────────────┘   └ ── ── ── ── ── ┘  │
//          │                    │          │
//          └─────────┬──────────┘          │
//                    ▼                     │
//           ┌────────────────┐             │
//           │   Stop Task    │◀────────────┘
//           └────────────────┘

Why?

The primary motivator is pipelines but there are more general improvements as well.

For pipelines, this is going to let us execute non-Waypoint-runner tasks and (1) know when it exists and (2) stream the output back to the Waypoint server and track it with the job system. Currently, all tasks are run via a Waypoint runner so we lean on it doing the right things via runner APIs to let us know. This gives us a second source of information.

More generally, this exposes previously difficult-to-reach debug information. If the Waypoint runner failed to launch for any reason, it was very unclear what went wrong. With this, the watch task should still get the platform logs, so we should be able to see errors before or after the runner runs! This should dramatically limit the number of times Waypoint users/operators/maintainers need to reach into the platform directly to see "why isn't my task running".

Semantics

  • The watch task and source job (build, deploy, etc.) run concurrently.
  • The stop task depends on both watch and job so it will not stop until both are done.
  • The watch task is allowed to fail and it doesn't impact anything (therefore, we can merge this without implementing watchtask for all task launchers)

A Building Block

This PR doesn't expose the watch task information easily in any way. This PR is more about establishing the building block.

From here, we could discuss how we expose this information more readily in UIs, CLIs, etc.

TODO (doesn't block this PR)

This PR only implements task watching for Docker. We need to implement task watching for other platforms (esp K8S). But as noted above, if they are unimplemented, it doesn't change the current behavior -- watch tasks are allowed to fail currently.

Copy link
Member

@briancain briancain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty good to me! I am requesting changes because the Trigger HTTP handler expects to receive a job triple with the source job being the middle job and this breaks that assumption. It should be an easy fix to filter those jobs properly on the trigger http handler so it keeps working on main.

We should remember to add WatchTask to pb.Task and the state machine driver at some point too, perhaps in a future PR as we finish out WatchTask for the remaining builtin plugins.

// ▼ │
// ┌────────────────┐ │
// │ Stop Task │◀────────────┘
// └────────────────┘
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love having these directly in the code docs! ✨ 📈

internal/runner/operation_task.go Outdated Show resolved Hide resolved
@mitchellh
Copy link
Contributor Author

Fixed triggers with @briancain. They no longer require hardcoded offsets and are resilient to all future changes we'd make to how we wrap jobs. 😄

Copy link
Member

@briancain briancain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! 👍🏻

@@ -70,25 +70,27 @@ func (s *Service) queueJobMulti(
ctx context.Context,
req []*pb.QueueJobRequest,
) ([]*pb.QueueJobResponse, error) {
jobs := make([]*pb.Job, 0, len(req))
jobQueue := make([]*pb.Job, 0, len(req)*4)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume *4 is just a guess to so there isn't a slice resize operation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep.

Comment on lines +91 to +93
resp := make([]*pb.QueueJobResponse, len(jobIds))
for i, id := range jobIds {
resp[i] = &pb.QueueJobResponse{JobId: id}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great simplification of the logic to hide the ODR jobs.

Copy link
Contributor

@evanphx evanphx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One note that this will create a lot more pressure on the job log system because every "normal" job in an ODR context (ie the common context) will generate double the output, as WatchTask will now contribute to the job log storage as well.

That's not a big issue, but we should revise what our limits are on the logs is all.

@mitchellh
Copy link
Contributor Author

Yeah, the log requirement for jobs will roughly double (since they're mostly duplicated). A lot we can do to mitigate this if this ends up being a problem.

@mitchellh mitchellh merged commit 191584c into main May 2, 2022
@mitchellh mitchellh deleted the f-watchtask branch May 2, 2022 21:53
Copy link
Contributor

@izaaklauer izaaklauer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants