Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

debug: Add buildx debug command #2006

Merged
merged 7 commits into from
Oct 18, 2023
Merged

Conversation

ktock
Copy link
Collaborator

@ktock ktock commented Aug 17, 2023

This commit refactors the debugger flags by adding buildx debug command that can be used for launching the monitor.
This implements the following features proposed by @jedevc and @crazy-max (#1104 (comment))

  • buildx debug-shell would just become buildx debug shell or even just buildx debug.

  • buildx build would remain unchanged from the current user experience. We wouldn't have any debugging flags here, it's all about build (though we still should use the controller api).

  • buildx debug build would allow all the same options that are currently in build to be specified after the build component. However, we could add any generic debugging flags before the build component, such as --on, e.g.:

    $ buildx debug --on=error build . --target <target-in-dockerfile>
    

Examples

This launches a monitor after the build. /bin/sh will be invoked in the result image.

$ BUILDX_EXPERIMENTAL=1 /tmp/out/buildx debug --invoke /bin/sh build .

This launches a monitor after the build only when the build fails. /bin/sh will be invoked in the result image.

$ BUILDX_EXPERIMENTAL=1 /tmp/out/buildx debug --on=error --invoke=/bin/sh build .

The following immediately launches the monitor.

$ BUILDX_EXPERIMENTAL=1 /tmp/out/buildx debug

Notes

  • As the debugger-related flags are available only under buildx debug and this commit introduces buildx debug for launching the monitor directly, this commit removed buildx build --invoke=debug-shell flag. Please let me know if we shouldn't remove this.
  • I've implemented buildx debug shell for launching monitor but not buildx debug. Please let me know if we should have both of them. Implemented buildx debug based on the comment debug: Add buildx debug command #2006 (comment) .
  • As this PR focuses on basic features of buildx deubg, the following proposal isn't included in this PR.
    • That way, we split out the debugging flags from the build flags, and also allow integrating that more neatly with bake in the future:

      $ buildx debug --on=error bake <target>
      
    • buildx debug would allow "extra" args after a -- component, which would be run directly in the monitor, before stdin is connected. For example, to break on line 10, and drop into a shell:

      $ buildx debug build . --target <target-in-dockerfile> -- break 10\; exec sh
      

@ktock ktock force-pushed the debug-command branch 3 times, most recently from 80bb8e0 to 274c1ce Compare August 18, 2023 04:38
@ktock
Copy link
Collaborator Author

ktock commented Aug 18, 2023

It seems that we need to fix https://github.com/docker/docs as well to pass the CI?

docs-upstream / validate / run (pull_request) : https://github.com/docker/buildx/actions/runs/5898858287/job/16000556956?pr=2006#step:9:4159

 #23 238.2   Liquid Exception: undefined method `gsub' for nil:NilClass in engine/reference/commandline/buildx_debug-shell.md
#23 238.2                     ------------------------------------------------
#23 238.2       Jekyll 4.2.2   Please append `--trace` to the `build` command 
#23 238.2                      for any additional information or backtrace. 
#23 238.2                     ------------------------------------------------

@crazy-max
Copy link
Member

It seems that we need to fix https://github.com/docker/docs as well to pass the CI?

docs-upstream / validate / run (pull_request) : https://github.com/docker/buildx/actions/runs/5898858287/job/16000556956?pr=2006#step:9:4159

 #23 238.2   Liquid Exception: undefined method `gsub' for nil:NilClass in engine/reference/commandline/buildx_debug-shell.md
#23 238.2                     ------------------------------------------------
#23 238.2       Jekyll 4.2.2   Please append `--trace` to the `build` command 
#23 238.2                      for any additional information or backtrace. 
#23 238.2                     ------------------------------------------------

Ah right that's because we have a stub file upstream for debug-shell that is now removed: https://github.com/docker/docs/blob/main/engine/reference/commandline/buildx_debug-shell.md. We didn't expect commands to be removed when this was implemented tbh 😅

commands/root.go Outdated
@@ -87,8 +88,10 @@ func addCommands(cmd *cobra.Command, dockerCli command.Cli) {
imagetoolscmd.RootCmd(dockerCli, imagetoolscmd.RootOptions{Builder: &opts.builder}),
)
if isExperimental() {
cmd.AddCommand(debugcmd.RootCmd(dockerCli, func(dops debugcmd.DebugOptions) *cobra.Command {
return buildCmd(dockerCli, opts, dops)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are just debugging the build cmd so sounds good to have buildx debug defaulting to build atm but could be any root cmd in the future such as bake with buildx debug bake so we might consider passing cmd instead and have a "debug" interface for each command supporting debug. WDYT?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion! It sounds good to me. I've fixed the patch to pass *cobra.Command with the debugger interface to debugcmd.RootCmd. PTAL 🙏

@ktock ktock force-pushed the debug-command branch 2 times, most recently from 934e927 to 55691f3 Compare August 18, 2023 08:54
@jedevc jedevc self-requested a review August 18, 2023 10:28
var options control.ControlOptions
var progressMode string

cmd := &cobra.Command{
Use: "debug-shell",
Use: "shell",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, we could just have shell be the top-level command, so docker buildx debug drops you directly into the shell.

Then we don't need a separate command for it, and it kind of acts as debug is your portal to all debugging stuff. The top-level debug command needs to do something anyways, and to me this makes the most sense.

Also, if we drop shell, it means that we could potentially have every command after debug can be run without the debug - debug build corresponds to build, debug bake corresponds to bake, etc.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion! Fixed the patch to launch the debugger directly by buildx debug command.

@jedevc
Copy link
Collaborator

jedevc commented Aug 18, 2023

Aha, thanks @ktock 🎉 This looks like a perfect start, just one real comment, otherwise looks good!


// DebugConfig is a user-specified configuration for the debugger.
type DebugConfig struct {

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: empty line (also below)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Off-topic: is there any way we can get this linted / auto-formatted? I know that gofumpt has a rule for this, but there's some other rules in there that are a bit more controversial IMO. @crazy-max any ideas?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at gofumpt and wsl linters and don't see any rules for this case 🤔

Copy link
Collaborator

@jedevc jedevc Aug 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah they have rules like this for functions at least in "No empty lines around function bodies" but not for generic blocks

@crazy-max
Copy link
Member

@ktock Following @dvdksn comment in docker/docs#17947 (comment), I have disabled the docs-upstream workflow. Can you force push to check if it works?

@ktock
Copy link
Collaborator Author

ktock commented Aug 21, 2023

Thanks!

@@ -515,8 +515,7 @@ func buildCmd(dockerCli command.Cli, rootOpts *rootOptions) *cobra.Command {
flags.StringVar(&options.provenance, "provenance", "", `Shorthand for "--attest=type=provenance"`)

if isExperimental() {
flags.StringVar(&invokeFlag, "invoke", "", "Invoke a command after the build")
flags.SetAnnotation("invoke", "experimentalCLI", nil)
// TODO: move this to debug command if needed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

root, detach and server-config flags don't look necessary anymore for build command right? Looks to be already available in debug root cmd.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe? Technically a non debugged build can still run on the remote controller, so maybe best to leave it here for now.

Copy link
Collaborator

@jedevc jedevc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ktock!

This gets closer to what I was imagining 🎉

I wonder, (and maybe this is something to look into with the "extra" args after a -- component) if we should always drop into a shell when using a debug subcommand? To me, that would feel like a logical default.

PTAL @tonistiigi

@crazy-max crazy-max requested a review from tonistiigi August 24, 2023 10:40
@ktock ktock force-pushed the debug-command branch 2 times, most recently from e18f57d to da87cb4 Compare August 31, 2023 07:41
Copy link
Member

@tonistiigi tonistiigi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • When I run buildx debug (no arguments) and then exec sh I get
Process "i7xkjhw104urtoth9pboe0ug6" started. Press Ctrl-a-c to switch to that process.
(buildx) Switched IO
Switched IO

But there is no process, nor build context.

  • I'm not sure if this is specifically in this PR but enabling debugger breaks stacktraces. Without the opt-in I get.
Dockerfile:60
--------------------
  59 |     ARG TARGETPLATFORM
  60 | >>> RUN2 --mount=type=bind,target=. \
  61 | >>>   --mount=type=cache,target=/root/.cache \
  62 | >>>   --mount=type=cache,target=/go/pkg/mod \
  63 | >>>   --mount=type=bind,from=buildx-version,source=/buildx-version,target=/buildx-version <<EOT
  64 |       set -e
--------------------
ERROR: failed to solve: dockerfile parse error on line 60: unknown instruction: RUN2 (did you mean RUN?)

but with this only

ERROR: dockerfile parse error on line 60: unknown instruction: RUN2 (did you mean RUN?)

This is without any extra flags, just with BUILDX_EXPERIMANTAL=1 docker buildx debug build .

  • docker buildx debug --on=error build . with Dockerfile error I don't see any error printed at all, nor stacktrace. Only after I close the monitor I see the error printed.

  • I think --on should default to error on debug build. Atm. I don't see any differences between buildx build and buildx debug build.

  • Issuing rollback command always seems to return context canceled error.

Interactive container was restarted with process "4qa81kfpppzkoawwhh052bbn4". Press Ctrl-a-c to switch to the new container
(buildx) Switched IO
ERROR: failed to exec process: context canceled
ERROR: failed to exec process: context canceled
  • Follow-up. --on=error on a container error gets me in the container but there is no context of what happened, what was the last command etc. I think the help command or monitor messages should give context about what is the current interactive context (build result for specific target, error result from command) so there is context of what gets run on exec/reload/rollback.

  • Follow-up. We need ls command that would list the files in the current dir. If I get error like runc run failed: unable to start container process: exec: "sh": executable file not found in $PATH then I have no idea what I'm missing. This could also be exec of debug image what has the current mounts mounted somewhere.

ktock added 6 commits October 13, 2023 12:09
Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
@ktock
Copy link
Collaborator Author

ktock commented Oct 13, 2023

@tonistiigi Thanks for the comments!

there is no process, nor build context.

buildx debug launches a monitor but it's not attached to any session in buildx so no process is available. (attach command allows attaching to other sessions from the monitor)
Fixed to return explicit error in these cases (ded91da).

In the future, we should create a new session for these cases with user-configured "default" image.

enabling debugger breaks stacktraces

Fixed (8da8ee2)

I don't see any error printed at all, nor stacktrace

Fixed(0dd89f6)

I think --on should default to error on debug build. Atm.

Fixed (5a0e4c1).

Issuing rollback command always seems to return context canceled error.

This is because rollback cancels the previously running process and creates the new one.
This is actually not an error to be reported to the user so I've changed this to be a debug level log (6db8569).

The following-ups will be separated PRs.

Follow-up. --on=error on a container error gets me in the container but there is no context of what happened, what was the last command etc. I think the help command or monitor messages should give context about what is the current interactive context (build result for specific target, error result from command) so there is context of what gets run on exec/reload/rollback.
Follow-up. We need ls command that would list the files in the current dir. If I get error like runc run failed: unable to start container process: exec: "sh": executable file not found in $PATH then I have no idea what I'm missing. This could also be exec of debug image what has the current mounts mounted somewhere.

Copy link
Member

@tonistiigi tonistiigi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I think we can merge this now in the current state, but some more follow-ups:

  • Let's say I have two different types of errors. One is wrong Dockerfile command and another is process error.
Dockerfile:60
--------------------
  59 |     ARG TARGETPLATFORM
  60 | >>> RUN2 --mount=type=bind,target=. \
  61 | >>>   --mount=type=cache,target=/root/.cache \
  62 | >>>   --mount=type=cache,target=/go/pkg/mod \
  63 | >>>   --mount=type=bind,from=buildx-version,source=/buildx-version,target=/buildx-version <<EOT
  64 |       set -e
--------------------
ERROR: dockerfile parse error on line 60: unknown instruction: RUN2 (did you mean RUN?)
[+] Building 0.0s (0/0)                                                                                                                                                                                                         docker:desktop-linux
Launching interactive container. Press Ctrl-a-c to switch to monitor console
Interactive container was restarted with process "o5e8x1ty9nn2j93m62b8zmhdn". Press Ctrl-a-c to switch to the new container
Switched IO
------
Dockerfile:60
--------------------
  59 |     ARG TARGETPLATFORM
  60 | >>> RUN --mount=type=bind,target=. \
  61 | >>>   --mount=type=cache,target=/root/.cache \
  62 | >>>   --mount=type=cache,target=/go/pkg/mod \
  63 | >>>   --mount=type=bind,from=buildx-version,source=/buildx-version,target=/buildx-version <<EOT
  64 | >>>   set -e
  65 | >>>   xx-go2 --wrap
  66 | >>>   DESTDIR=/usr/bin VERSION=$(cat /buildx-version/version) REVISION=$(cat /buildx-version/revision) GO_EXTRA_LDFLAGS="-s -w" ./hack/build
  67 | >>>   xx-verify --static /usr/bin/docker-buildx
  68 | >>> EOT
  69 |
--------------------
ERROR: process "/bin/sh -c   set -e\n  xx-go2 --wrap\n  DESTDIR=/usr/bin VERSION=$(cat /buildx-version/version) REVISION=$(cat /buildx-version/revision) GO_EXTRA_LDFLAGS=\"-s -w\" ./hack/build\n  xx-verify --static /usr/bin/docker-buildx\n" did not complete successfully: exit code: 127
[+] Building 0.0s (0/0)                                                                                                                                                                                                         docker:desktop-linux
Launching interactive container. Press Ctrl-a-c to switch to monitor console
Interactive container was restarted with process "l3x31fm3i08owokoxwtazeuuw". Press Ctrl-a-c to switch to the new container
/ #

As expected only the second one is debuggable (only second one opens shell as well). But from the output they print same messages about interactive containers and switching IO. It should be more clear that these are different types of errors, why first one does not create execution context and what runs in the shell of second one.

  • There is an error in Dockerfile command (eg. change RUN to RUN2). I get an error. Now I run reload. It builds up to the Dockerfile error again but this time I don't see any error. It just shows:
(buildx) reload
[+] Building 0.6s (4/4) FINISHED                                                                                                                                                                                                docker:desktop-linux
 => [internal] load build definition from Dockerfile                                                                                                                                                                                            0.0s
 => => transferring dockerfile: 4.74kB                                                                                                                                                                                                          0.0s
 => [internal] load .dockerignore                                                                                                                                                                                                               0.0s
 => => transferring context: 45B                                                                                                                                                                                                                0.0s
 => resolve image config for docker.io/docker/dockerfile:1                                                                                                                                                                                      0.5s
 => CACHED docker-image://docker.io/docker/dockerfile:1@sha256:ac85f380a63b13dfcefa89046420e1781752bab202122f8f50032edf31be0021                                                                                                                 0.0s
Interactive container was restarted with process "p1zsu784bbr3orf8t1g5by9j7". Press Ctrl-a-c to switch to the new container
Switched IO
Switched IO
(buildx)
  • I run debug build . and get an error. I fix the error and issue reload. Now my build succeeds. I run some more interactive processes and eventually close the debugger. Now I get an error.
(buildx)
[+] Building 0.0s (0/0)                                                                                                                                                                                                         docker:desktop-linux
Dockerfile:60
--------------------
  59 |     ARG TARGETPLATFORM
  60 | >>> RUN2 --mount=type=bind,target=. \
  61 | >>>   --mount=type=cache,target=/root/.cache \
  62 | >>>   --mount=type=cache,target=/go/pkg/mod \
  63 | >>>   --mount=type=bind,from=buildx-version,source=/buildx-version,target=/buildx-version <<EOT
  64 |       set -e
--------------------
ERROR: dockerfile parse error on line 60: unknown instruction: RUN2 (did you mean RUN?)

What I think is the error from the very first build invocation.

  • (We can discuss this more in the issue before any actual work). When I reach an error I don't think there is a way to switch to the clean state before the failed command to try running it again. My options are to reload to rebuild from source or rollback that resets my local changes but I still seem to have the state that the command that failed generated before it errored.

@ktock
Copy link
Collaborator Author

ktock commented Oct 18, 2023

Thanks for the feedback.

The following will be fixed by #2086

  • There is an error in Dockerfile command (eg. change RUN to RUN2). I get an error. Now I run reload. It builds up to the Dockerfile error again but this time I don't see any error.
  • I run debug build . and get an error. I fix the error and issue reload. Now my build succeeds. I run some more interactive processes and eventually close the debugger. Now I get an error.

I think the following is already done by rollback --init

  • (We can discuss this more in the issue before any actual work). When I reach an error I don't think there is a way to switch to the clean state before the failed command to try running it again. My options are to reload to rebuild from source or rollback that resets my local changes but I still seem to have the state that the command that failed generated before it errored.

I've recorded Other following-up issues to #1104 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants