Allow gateway exec-ing into a failed solve with an exec op #1732
if !ok {
    return nil, errors.Errorf("unexpected Ref type: %T", m.Ref)
}

res, err := refProxy.Result(ctx)
wonder if this should be parallelized?
^
Done here: dfaf613
@hinshun Is this ready?
@tonistiigi Yes, though I did punt on the errors before
VertexError only contains digest though, not the proto definition.
Yeah, I understand clients don't always have access to the proto definition to look up the op via digest. Maybe introduce an
What is the difference between
@tonistiigi I rebased and added a commit that now returns exec errors for errors returned before the executor runs: 1bbc7c2
var err error
inputIDs, err = c.registerResultIDs(ee.Inputs...)
if err != nil {
    return err
In the edge case that registerResultIDs returns the "unexpected type for result" error, we will lose the original solve error here. I wonder if these should be errors.Wrap(solveErr, err.Error()) instead, to make sure the solve error is preserved? I am not sure if we would ever practically get the "unexpected type for result" error, though.
> I am not sure if we would ever practically get the "unexpected type for result" error, though.
I think that would be an implementation error in BuildKit?
I'm okay with the errors.Wrap approach, but I'm unsure it will be useful if the solve error is incomplete. You'll need a lot of safeguards on the client side for incomplete solve errors.
var err error
inputIDs, err = lbf.registerResultIDs(ee.Inputs...)
if err != nil {
    return err
same issue here, we will lose solveErr
client/build_test.go
Outdated
mounts = append(mounts, client.Mount{
    Selector: mnt.Selector,
    Dest:     mnt.Dest,
    ResultID: se.Solve.OutputIDs[mnt.Output],
Shouldn't this be mnt.Input?
Both work. They could use mnt.Input, but they would be the unmodified inputs. For this test, I was checking that I could access the mutated mounts rather than the original inputs.
In the LLB we have: echo %s > output && fail, so this test was to exec into a mount where echo %s > output succeeded but the fail failed the ExecOp overall.
The logic should be that both se.Solve.OutputIDs and se.Solve.InputIDs are indexed by mnt.Input. mnt.Output is a link to the next vertex. There is no need for an output to be defined on a mount in order to be able to debug its error state.
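The indexing proposed in this comment can be sketched as follows. This is only an illustration of the proposal: `mount`, `solveInfo`, and `mountIDs` are hypothetical types and names, a negative `Input` is assumed to mean "no input", and a later comment in this thread refers to MountID[1], so the layout the PR settled on may differ.

```go
package main

import "fmt"

// mount is a hypothetical, simplified mount description.
// Input is the index of the mount's input; a negative value means no input.
type mount struct {
	Dest  string
	Input int
}

// solveInfo is a hypothetical holder for the temporary IDs of a failed solve.
type solveInfo struct {
	InputIDs  []string // state before the op ran
	OutputIDs []string // state as mutated by the failed op
}

// mountIDs maps each mount destination to an ID, indexing both slices by the
// mount's Input as proposed above. mutated selects OutputIDs over InputIDs.
func mountIDs(info solveInfo, mounts []mount, mutated bool) (map[string]string, error) {
	ids := info.InputIDs
	if mutated {
		ids = info.OutputIDs
	}
	out := make(map[string]string, len(mounts))
	for _, m := range mounts {
		if m.Input < 0 {
			continue // no input to look up in this sketch
		}
		if m.Input >= len(ids) {
			return nil, fmt.Errorf("mount %s: no ID for input %d", m.Dest, m.Input)
		}
		out[m.Dest] = ids[m.Input]
	}
	return out, nil
}

func main() {
	info := solveInfo{
		InputIDs:  []string{"in-0", "in-1"},
		OutputIDs: []string{"out-0", "out-1"},
	}
	mounts := []mount{{Dest: "/", Input: 0}, {Dest: "/rw", Input: 1}}
	ids, err := mountIDs(info, mounts, true)
	fmt.Println(ids, err)
}
```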
op := se.Solve.Op
opExec, ok := se.Solve.Op.Op.(*pb.Op_Exec)
require.True(t, ok)
Check the count of items in OutputIDs and InputIDs, as well as some meta properties like Args.
    }
}

err = errdefs.WithExecError(err, inputRes, outputRes)
I was quite confused about FileOp returning ExecError. Maybe pick a better name if we want to keep this structure.
Exec as opposed to CacheMap, not the ExecOp. I'm not sure what's a better name... maybe ResultError?
Ok, I guess we don't need to block on that. It probably makes sense to rename the Exec function in Op as well to avoid confusion, but that can be a follow-up.
@hinshun Any update? Mainly on the ref indexing that I think needs changes as we discussed in slack. If we can get this over the line we could do
@tonistiigi Sorry, I have been on my week-long on-call at work. I don't expect the ref indexing to take too long to implement, but I think I'll need to clarify some of the edge cases with you tomorrow.
Signed-off-by: Edgar Lee <edgarl@netflix.com>
- Plumb default worker by adding GetDefault() to frontend.WorkerInfos
- To avoid cyclic dependency, refactor frontend.WorkerInfos to worker.Infos
- Refactor gateway.NewContainer to share code with llbsolver/ops/exec.go

Signed-off-by: Edgar Lee <edgarl@netflix.com>
frontend/gateway/container.go
Outdated
}
// if mount is based on input validate and load it
if m.Input != opspb.Empty {
    if int(m.Input) > len(refs) {
>=
Fixed now, but note this is an existing bug:
buildkit/solver/llbsolver/ops/exec.go
Line 242 in 9369d53
if int(m.Input) > len(inputs) {
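The off-by-one flagged in this thread is easy to reproduce in isolation: with `>` the check lets an index equal to the slice length through, and the following slice access panics. Below is a small sketch of the corrected `>=` check; `inputAt` is a hypothetical helper, not a BuildKit function.

```go
package main

import "fmt"

// inputAt returns inputs[idx], rejecting any index outside [0, len(inputs)).
// Using > instead of >= here would accept idx == len(inputs) and panic below.
func inputAt(inputs []string, idx int) (string, error) {
	if idx < 0 || idx >= len(inputs) {
		return "", fmt.Errorf("missing input %d (have %d)", idx, len(inputs))
	}
	return inputs[idx], nil
}

func main() {
	inputs := []string{"rootfs", "cache"}

	// idx == len(inputs) is the case the original check missed.
	_, err := inputAt(inputs, len(inputs))
	fmt.Println(err)
}
```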
}
mountable = active
p.Actives = append(p.Actives, active)
if m.Output != opspb.SkipOutput && ref != nil {
I wonder if having output in here is even supported. Maybe just error?
At least client protects against this
Line 307 in c700580
if !m.noOutput && !m.readonly && m.cacheID == "" && !m.tmpfs {
Hmm I haven't changed this conditional, it's the same before refactoring here:
buildkit/solver/llbsolver/ops/exec.go
Line 294 in 9369d53
if m.Output != pb.SkipOutput && ref != nil {
Ok, yeah I missed that input is cloned to be output here. Still a weird case but no changes needed for this PR.
if mountable == nil {
    continue
execOutputs := make([]solver.Result, len(e.op.Mounts))
for i, res := range results {
I still don't quite understand how this works. We are mapping only mounts that have an output index set, instead of all the mounts (which would then point to either the mutable ref or the same input if read-only). I think the test works because something always fills in the output index there, even if it is unused.
Addressed here: 1240dd7
"rootfs and readwrite mount",
llb.Image("busybox:latest").Run(
    llb.Shlexf(`sh -c "echo %s > /data && echo %s > /rw/data && fail"`, id, id),
    llb.AddMount("/rw", llb.Scratch().File(llb.Mkfile("foo", 0700, []byte(id)))),
If you set llb.ForceNoOutput() here, I think this test would not work.
Confirmed it doesn't work, thanks. The MountID[1] becomes scratch when setting llb.ForceNoOutput.
PR comments have been addressed. I noticed there were edge cases of some kind that the digest from
Looks great! Sorry I missed the scratch mounting via NewContainer, thanks for fixing that.
For: #1472 (comment)
- Adds Evaluate bool to the SolveRequest. This explicitly evaluates each ResultProxy, because they are currently lazy results (until returned, or until a call like ref.ReadFile is made). Otherwise, exec errors will only show up in the main solve request instead of inside runGatewayCB.
- Adds SolveError as a TypedErrorProto. Clients can use errors.As to extract it.
- Adds GetDefault() (worker.Worker, nil) to frontend.WorkerInfos, which plumbs down the default worker.
- Refactors frontend.WorkerInfos to worker.Infos because of the cyclic dependency.
- Refactors gateway.NewContainer to share the exact same code as solver/llbsolver/ops/exec.go; previously it was 90% duplicated. The existing version of gateway.NewContainer also didn't allow for scratch mounts, whereas solver/llbsolver/ops/exec.go did.

Example client code: https://gist.github.com/hinshun/e72f509121e022bc81ebba03fc2851c6

When a pb.ExecOp or pb.FileOp fails to be solved, its protobuf definition, along with its solved inputs and outputs, is wrapped in a typed error and passed to the lbfBridgeForwarder, which holds the temporary IDs for references.

The inputs and outputs have temporary UUIDs generated and are sent back as a TypedErrorProto, which allows them to be unmarshaled on the client side to reconstruct the arguments necessary to gateway exec back into a container process for the failed solve.