-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for the Docker Python SDK #2158
Conversation
…ed amongst all tasks. also some linting
@apierleoni It would be great if you could separate "companies who use luigi" into another PR - that one can be merged quickly. Also, looks like you've had some Travis errors; could you take a look and resolve? Thanks! |
@dlstadther the travis errors are not ours and apply to every PR. @Tarrasch has already acknowledged that they have yet to be fixed (see comment above: june 23) on separating the PR: we have no problem waiting for this one to be merged for our "companies with luigi" link to show up. We'd rather keep them in the same PR for simplicity and just wait for the review for it to be merged. |
@elipapa Those errors were resolved elsewhere. Your current errors are flake8. |
flake8 errors fixed now, all tests passed on travis |
tox.ini
Outdated
@@ -10,6 +10,7 @@ deps= | |||
moto<1.0 | |||
HTTPretty==0.8.10 | |||
nose<2.0 | |||
docker |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to specify a specific version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes thanks, well spotted!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there are no objections, I'm ok to merge this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @apierleoni thanks a lot for all this work! I'm using this DockerTask for quite some time and it is working fine overall so far. Even though I had to adapt some parts to make it work properly for our use case. I will start with this first small review because I think the test are not working correctly when they are fixed. See my comments (volumes
to binds
) in the tests. After that I will add some additional comments and suggestions, for some later improvements or extensions.
test/contrib/docker_runner_test.py
Outdated
image = "busybox" | ||
name = "MountLocalFileAsVolume" | ||
# volumes= {'/tmp/local_file_test': {'bind': local_file.name, 'mode': 'rw'}} | ||
volumes = [local_file.name + ':/tmp/local_file_test'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be renamed to binds
?
test/contrib/docker_runner_test.py
Outdated
dummyopt = luigi.Parameter() | ||
image = "busybox" | ||
name = "MountLocalFileAsVolumeWithParam" | ||
volumes = [local_file.name + ':/tmp/local_file_test'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be renamed to binds
?
test/contrib/docker_runner_test.py
Outdated
name = "MountLocalFileAsVolumeWithParamRedef" | ||
|
||
@property | ||
def volumes(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be renamed to binds
?
# derive volumes (ie. list of container destination paths) from | ||
# specified binds | ||
self._volumes = [b.split(':')[1] for b in self._binds] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One addition I had to make to get this properly working in our scenario was to include a custom instance attribute self._complete = False
and set this to True
when the container exited.
**self.container_options) | ||
self._client.start(container['Id']) | ||
|
||
exit_status = self._client.wait(container['Id']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+self._complete = True
I'm not sure anymore if I included this simply to True
on purpose or if this should be set in subject to the exit_status
.
In our use case we're chaining multiple DockerTask
consecutive so that each task depend on the previous one and its output. As I'm quite new to working with luigi I don't have that much deep knowledge and experience so far, but I think I had some problems with the luigi scheduler while the scheduler was trying to resolve and start the DockerTask
process chain properly. After including the previous changes and overwriting the complete
methods, the chain worked fine for me. As far as I remember correctly, I've even got the tests to run fine after renaming volumes
to binds
. Perhaps this is use case specific, but maybe someone with better knowledge of luigi can overlook this better.
class ImportProcess(DockerTask):
....
def complete(self):
return self.output().exists() or self._complete
...
Of course this could also be implemented in the DockerTask
class directly with maybe some modification like this:
class DockerTask(luigi.Task):
....
def complete(self):
return super(DockerTask, self).complete() or self._complete
I do have two more smaller additions, but in order to keep this PR simple, I will create a PR after this one will be merged. Thanks again for |
@apierleoni Where does this PR stand given @drobakowski 's review and comments? |
@drobakowski Regarding the second issue, I am not 100% sure I understood what you want to do with the @dlstadther |
some tests are failing for an error that looks related to boto non importing the google cloud platform module. this might not be limited to this PR |
merged with current master, still failing for boto |
@dlstadther |
I guess the codecov issue is related with this https://github.com/codecov/support/issues/180 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! I'm not concerned about that codecov thing. The tests pass, so let's merge and allow others the goodness of Luigi v1 DockerTask support!
Hey @apierleoni , do you have any insight into why two of the Docker tests started failing recently? It'd be much appreciated if you could check it out! Thanks |
@dlstadther sorry for the late reply, is it still an issue? I cannot find the failing tests. |
@apierleoni They've since been fixed! |
Description
Hi there, we did some work to run a data processing pipeline using Docker containers with Luigi using the Docker Python SDK and we would like to contribute it back.
Motivation and Context
These PR allows to use the Docker Python SDK to run Docker containers instead of launching them from a subprocess.
The main advantage over calling an external process is that all the communications done with the Docker daemon are through the API and not a subprocess, being able to handle exceptions properly and potentially connect to remote APIs (we did not test this scenario)
It also solve non trivial problems with the docker SDK like mounting a single file, and makes sure that there is enough space in the container to run the job by mounting a temporary directory from the host , avoiding the container size limit of 10Gb.
Have you tested this? If so, how?
Unit tests are included. But they require a Docker daemon to run. Not sure it is available in Travis.
We are currently running this in our staging environment and we are ready to run production pipelines with this contrib.