-
Notifications
You must be signed in to change notification settings - Fork 243
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Docker support for pipeline runs #201
Docker support for pipeline runs #201
Conversation
0ab2143
to
195b8f8
Compare
Codecov Report
@@ Coverage Diff @@
## master #201 +/- ##
==========================================
- Coverage 62% 61.09% -0.91%
==========================================
Files 49 49
Lines 4066 4259 +193
==========================================
+ Hits 2521 2602 +81
- Misses 1132 1214 +82
- Partials 413 443 +30
Continue to review full report at Codecov.
|
@Skarlso This is ready for review. I still have to work on the documentation part which might be helpful for the review 😄 |
Great job! I'll review it. 😊 |
item := iter.Next() | ||
|
||
if item == nil { | ||
break | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not put this check into the for?
for item := iter.Next(); item != nil {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! I will change that! 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That actually doesn't work. item := iter.Next()
is only executed once before loop iteration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true I means to say item := node; item != nil; item = iter.Next
.
Sorry, only picking through this slowly. But I'll get there. :) |
if err := s.memDBService.InsertDockerWorker(worker); err != nil { | ||
gaia.Cfg.Logger.Error("failed to cache docker worker in memdb", "error", err) | ||
if err := worker.KillDockerWorker(); err != nil { | ||
gaia.Cfg.Logger.Error("failed to kill docker worker", "error", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like simply logging this fact is weird. At the least we should wrap it and pass it up, don't you think? Or provide some context, I don't know. If we just log it, I feel like we simply don't care and might as well not log it. Especially since Gaia is very verbose now and this might be just missed. I might entirely miss this, unless I grep for it. And even then I might just not care since this already errorred out because of a different reason I'm trying to solve / fix which will probably fix the docker worker as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the least we should wrap it and pass it up, don't you think?
Can you provide an example? If you mean with "pass it up" returning the error then this is not possible. We are in the schedule function which is periodically triggered and doesn't allow a return parameter.
The reason why I added this output is only for debugging purposes. What would you expect to happen here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the scheduler never returns an error. I guess it's okay, it's already in a failing statement.
Related Docs PR: gaia-pipeline/docs#14 |
workers/docker/docker.go
Outdated
gaia.Cfg.Logger.Info("try to pull docker image and then try it again...") | ||
|
||
// Pull worker image | ||
_, err = cli.ImagePull(ctx, workerImage, types.ImagePullOptions{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's worth mentioning that right now this only supports public docker registry and no private registries. I guess we can add that functionality later on?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Yes, exactly. I wouldn't prioritize this yet since we can easily add this at a later point in time.
|
||
// KillDockerWorker disconnects the existing docker worker and | ||
// kills the docker container. | ||
func (w *Worker) KillDockerWorker() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not start the container with --rm
? I'm sure the api can do that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I know --rm
does not work with detached containers (e.g. -d
). Additionally, I have to stop the container anyway so that the container would be removed by the --rm
.
@michelvocks So I wanted to test this now, and I found two bugs... First, when I created the pipeline and enabled the docker thing on the pipeline creation ui, it wasn't enabled for the pipeline. The indicator on the pipeline view said disabled and in fact it did not run in a container. Once I enabled this though... It failed to run unfortunately by containously spewing this:
Also, this made me realise that there is no timeout on this thing. :D This can keep failing and retrying indefinitely. Not sure what's wrong because |
Let me try to reproduce this.
It should wait for the pull to finish but I will test that again. However, please keep in mind that we didn't release a new version (and also not a new docker image). That means the current Gaia version in the
Right. I thought about that too and I personally think that we maybe don't want a time out? e.g. try it as long as possible strategy. |
The problem with True, the image wouldn't have the new worker stuff, but that's not the error. :) The error is that the image is not found. Which is weird because the image IS there. Did you try pushing your image up somewhere and deleting it locally and trying to run it like that? |
1889c8c
to
2d0b508
Compare
I tried to reproduce this but it somehow works for me. Can you just give it another try? Maybe you clicked in the create pipeline UI on the cancel button? 😄
There was a bug in the image pull code. Should now work 👍
I'm wondering if this is actually a problem in general and should be addressed in a different PR. I mean this can also happen for normal pipeline runs, right? I think we should add a "context" to a pipeline run with a timeout which makes sure after a certain time a pipeline will be canceled? |
yo. :) I'll re-try as soon as possible! :) |
Agreed! |
going to try this now on a fresh laptop. :) |
eh, couldn't build the image...
|
These new versions worked:
How do you build your image? |
Okay, obviously you can't just build this thing. :D
Ah I see. You expected it to be built from the main folder. |
Oh buuu
|
It feels like every 2-3 month the newest Java version is already deprecated again and it is not possible to build the docker image again. I think we just have to regularly bump the version. Usually, the docker image build process is started by goreleaser that is why it is expected to be in the root folder. |
@michelvocks I had a play with this last night and I feel like it's doing what it's supposed to do. :D The building is clunky, true. :/ I have yet to find a good solution for that. We can create a ticket for that maybe and track it? So, LGTM! :) |
@michelvocks How about we ditch the pinning to a specific Java 8 open jdk version? I don't see any problems with trying to use whatever latest stable is in there, do you? So this, works perfectly fine:
\o/
|
@Skarlso Sounds good. I will do that in a separate PR. |
No description provided.