-
Notifications
You must be signed in to change notification settings - Fork 135
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add docker runtime for agent #730
Conversation
259e277
to
2e6111e
Compare
Codecov Report
@@ Coverage Diff @@
## main #730 +/- ##
==========================================
+ Coverage 55.32% 56.31% +0.99%
==========================================
Files 25 28 +3
Lines 1099 1243 +144
==========================================
+ Hits 608 700 +92
- Misses 475 511 +36
- Partials 16 32 +16
... and 1 file with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
bbeda78
to
b3b3d4a
Compare
@@ -0,0 +1,90 @@ | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to rename the folder where this file is created to utils
and keep the file name to ptr.go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We tend to avoid top level util packages and sign up to every package having at least some scoped purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🚀
b3b3d4a
to
076af5a
Compare
type DockerOption func(*Docker) | ||
|
||
// WithLogger returns an option to configure the logger on a Docker instance. | ||
func WithLogger(log logr.Logger) DockerOption { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've realized the logger is actually required because we log when a container couldn't be gracefully stopped. I'll tweak this when adding the other features as noted in the OP.
internal/agent/runtime/docker.go
Outdated
// Always try to remove the container on exit. | ||
defer func() { | ||
// We can't use the context passed to Run() as it may have been cancelled. | ||
err := d.client.ContainerRemove(context.Background(), create.ID, types.ContainerRemoveOptions{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we enable all the ContainerRemoveOptions
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type ContainerRemoveOptions struct {
RemoveVolumes bool
RemoveLinks bool
Force bool
}
- On Mac with Docker Desktop I've seen force leave objects lying on disk that couldn't be cleaned up by a following
docker system prune --volumes
(I had to manually remove files). I don't know how widespread that impact is, or how challenging it would be to figure out, so I'm not sure if we'd want that enabled, particularly given this is all in-memory and goes away on restart. Given action names are unique I don't foresee any conflicts with respect to launching more actions. - I don't think we want to remove volumes because we allow users to share data between actions via volumes.
- I don't see any issues with link removal.
With that in mind, what do you think about enabling link removal only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On container name uniqueness, this has made me realized the action ID is unique within the context of a workflow, but if a second workflow executes (i.e. the first didn't restart the machine) there could be a collision. I think we could fix that by committing to UUIDs for action IDs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, Chris.
You're right we probably shouldn't remove volumes.
Currently, the main use case for the agent is to run in Hook. The status-quo is to use Force: true
, so we know it works in Hook. I'm concerned about resource constrained machines where memory and disk space in Hook are an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, lets stick with the status quo and force delete. The volumes will likely be in-memory and cleaned on restart anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be addressed.
internal/agent/runtime/docker.go
Outdated
// Run satisfies agent.ContainerRuntime. | ||
func (d *Docker) Run(ctx context.Context, a workflow.Action) error { | ||
// We need the image to be available before we can create a container. | ||
image, err := d.client.ImagePull(ctx, a.Image, types.ImagePullOptions{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might think about adding retries here. a transient issue to the registry could cause a whole workflow to fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went with github.com/avast/retry-go as it seems quite modular and has all the bits you'd expect from a retry library.
internal/agent/runtime/docker.go
Outdated
fn(o) | ||
} | ||
|
||
if o.log.GetSink() == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you put the logr.Discard
into o := &Docker{}
, i dont think you'll need this check. Same with the client setting below. These will be the defaults and then only overwritten if a consumer passes either one of them in as a DockerOption
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks
6fbb52e
to
3db4cdf
Compare
internal/agent/runtime/docker.go
Outdated
// In rare cases this may create orphaned volumes that the Docker CLI won't clean up. | ||
opts := types.ContainerRemoveOptions{ | ||
Force: true, | ||
RemoveLinks: true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is causing issues in tests and its not clear why. I no longer think I understand what it does so I'm disabling; we can enable it at a later date.
42a3072
to
1cb2e5d
Compare
} | ||
|
||
// NewDocker creates a new Docker instance. | ||
func NewDocker(opts ...DockerOption) (*Docker, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: variables, types, and New functions go at the top of the file.
Env: toDockerEnv(a.Env), | ||
} | ||
|
||
failureFiles, err := internal.NewFailureFiles() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Coming back to this after a bit the interaction between a runtime implementation, the agent, the eventing system, and the action, etc being run in terms of failures feels difficult to follow with just the code. I think we should figure out a way to make this very understandable. docs, code comments, etc. I just want to be sure that all the different parts of the code are resilient and easy to understand how they fit into the whole failure notification system.
|
||
// Close closes all files tracked by f. | ||
func (f *FailureFiles) Close() error { | ||
os.Remove(f.reason.Name()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the model is going to be to not capture the errors here, maybe we should change the interface to not return an error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the name Close
actually doesn't feel accurate here. deleting files is different than closing them. is this satisfying an interface method?
return strings.TrimRight(message.String(), "\n"), nil | ||
} | ||
|
||
func (f *FailureFiles) ToError() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we inject a data structure for failure files to runtimes I suspect this method will be unnecessary so defering.
// TODO: Support all the other things on the action such as volumes. | ||
cfg := container.Config{ | ||
Image: a.Image, | ||
Env: toDockerEnv(a.Env), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we might want to set the failure file locations to be env vars for all containers. that way actions dont need to "know" where this location is and we can change it if needed. action writers just look up the env var and get the files.
In an effort to move this along we've decided to merge this. This comment serves as a summary of outstanding items:
|
1cb2e5d
to
8c2d736
Compare
The new agent requires a container runtime to execute workflow actions. This commit introduces a Docker runtime. Signed-off-by: Chris Doherty <chris.doherty4@gmail.com>
8c2d736
to
3e45b96
Compare
Summary
This PR introduces a Docker runtime that is usable by the agent to run actions. The code is intended to introduce a functional runtime, therefore it doesn't implement the full set of features available on actions such as volumes; the additional features will be implemented separately.
This PR also doesn't address debugability of actions - for example, action logs are discarded - though this may not be required if we continue with syslog and routing container logs to an external service.
Future work
The tests are integration tests that ensure the code is robust enough. Some functionality like ensuring volume configuration, environment variables, commands and args are all configured correctly would be better implemented as unit tests. A subsequent PR will look to introduce an interface so we can easily test these feature as they're added.