-
Notifications
You must be signed in to change notification settings - Fork 787
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a consistent way to specify container image #489
Comments
Good point! Do you think |
Yep, I was thinking
|
As I think more about it, maybe it is a good idea to borrow the terminology from k8s container spec? In theory, we may decide to support overriding |
Having a global setting
Seems the
Ignoring it seems a pretty good default until the
I'm also leaning towards using the same decorator. At the end it just specifies a name of the image regardless how the image is built.
IMO, it's not a big issue to have |
Just a second thought. Having a dedicated There is another observation though: Metaflow already has a separate As a third thought :) it could even make sense to put resources specification inside such |
We should move forward with A good question is whether we should allow |
For backward compatibility, we would have to ensure the correctness of |
Also, regarding |
I'm a fan of the image consistency across execution environments, but I'd vote to keep the local execution behavior decoupled. As an example where this could fall off the rails – if using a docker container as a dev environment already, it could become quite messy trying to control other local docker containers within another one. |
@russellbrooks good point on dev env in docker! Any thoughts on what the behavior should be in the local mode in the meanwhile? Ignore vs error out? |
Note: updated the proposal to use |
The proposal looks good to me! |
I like this proposal too but have a few questions:
|
@romain-intel Not sure if I understood your comment properly, but my expectation is that the priority order will be the reverse of your comment -
RE: warn/error - today |
I was thinking step decorator (overrides) flow decorator (overrides) global default. If you use both |
@oavdeev But then would you expect folks to not be able to execute |
Nice I didn't even realize you can do that! But should i be able to do Actually, even in your example, today.. if i have |
Currently, we pick the larger of resources specified via In terms of the actual implementation of Re: CLI - doesn't override what's specified in the code. |
@savingoyal , yes, sorry, I listed the list in the reverse order (ie: global image default overridden by flow specification overridden by etc) so I think we are saying the same thing. I think my point though is that we definitely need to think about these combinations a bit and be very clear on what is happening. Definitely agree with @oavdeev that if the code can't resolve things clearly, it's better to error out than to just "pick one". So, we should have a clear priority order and clear rules on what constitutes a conflict. |
Cool so to summarize: seems like the consensus is that it is not an error to have both |
Re: That brings me to the last open q: should If we had That makes me think that even before we have |
I would agree with that. i would also list the various priority and ordering (even if that is not 100% due to this proposal) so that it is clear in the future. I will also have to try what happens if you do Re the |
Interesting. Intuitively |
I see your point @oavdeev and I do see that |
Yeah I can see that too. Another thing that @savingoyal suggested to me is to tweak the original proposal like this (let's call it proposal v3):
...and just not add |
After some offline discussions, I've compiled a few usage scenarios/stories below. Anything else I'm missing? Container image management scenariosScenario 1. Uber-imageThere is one blessed image, maintained by your friendly ML platform team, blessed by security people. Most or all data science code uses the uber-image, so it is set via Metaflow configuration when the Flow is scheduled, so data scientists never have to think about what image to use. Scenario 2. Environment-specific uber-imageThis is similar to the Uber-image scenario, but complicated by the fact that Lambda requires a specially baked image that includes Lambda runtime. Also, maybe some tasks require a GPU and a special image as well. So you'd have multiple uber images, but the choice of the image is mostly based on compute env you're using, not the task itself. Scenario 3. An image per flowML engineers or Data Scientists build their own images for each flow, e.g. based on Scenario 4. An image per stepYou can imagine using Metaflow to compare multiple versions of the same ML library. Or maybe you borrowed one step from someone else's Flow, and they're using their own special container image, so you have to use it as well. In this case, users specify images for certain steps, but fall back to use either uber-image(1) or flow-image(3) otherwise. |
Also an updated proposal, v3.1:
In the end, the precedence is (using lambda as an example here, ⊱ means "overrides"): |
@oavdeev The proposal LGTM! |
Have a question regarding the |
Good question, I believe right now For the purposes of this thread, I'd leave the behavior as it is today. But it certainly worth the discussion if that's optimal. There even could be use cases for other decorator attributes (not |
Actually for naming consistency, I'd call it |
For posterity, updated naming, proposal v3.2:
In the end, the precedence is (using lambda as an example here, ⊱ means "overrides"): |
Shouldn't config value METAFLOW_CONTAINER_IMAGE override flow-level spec? The config could be also set as an envvar and envvar purpose was to change the default behaviour specified in the source. What do you think about this order?
Same rules are applied to other parameters, e.g. mem, cpu, iam_role, etc. |
Right, but the idea that envvar sets the default but whatever you specify in .py file takes precedence over that. Note that this is also how this works today for AWS Batch, In the use case you are describing, the recommendation would be to not specify the registry in the decorator neither in dev nor prod, and rely on config variables for that. In fact I think this use case is exactly why there is a separate setting for the registry in Metaflow in the first place: Netflix folks wanted users to be able to specify the image (e.g. version of python) but leave it to devops/platform team to set the registry to dev/prod. I agree there is indeed sometimes a legitimate need to override whatever is set in Python code, but changing the precedence here would require changing existing config behavior for Batch decorator, so I wouldn't go down this path. |
Hey guys, is this already in the public release? If so how could I use the decorator? |
Super interested in the described use case 3, where I can just use Poetry for package management and then point to a local dockerfile. Developing locally with the --environment=docker would be a beauty! |
Problem
Several existing and upcoming Metaflow plugins run your code in containers. They all allow you to specify a custom container image, but each does it in a slightly different way.
For AWS Batch, you can:
@batch(image='foo')
step decoratorMETAFLOW_BATCH_CONTAINER_IMAGE
@batch
use default public image (vanilla official python)For Argo you can
@argo(image='foo')
step decorator@argo_base(image='foo')
flow decoratorFor AWS Lambda, you can
@lambda(image='foo')
step decoratorMETAFLOW_LAMBDA_IMAGE_URI
Note that AWS Lambda is a bit special as it doesn't allow using public images. Also, the image has to be baked in a special way to include Lambda Runtime (it cannot be installed on the fly).
For
@kubernetes
, you can@kubernetes(image='foo')
step decoratorMETAFLOW_KUBERNETES_IMAGE_URI
Proposal v2
Unify all this by adding a new decorator,
@image(name="foo")
, and use it across all plugins that use containers. The new decorator can be used both at the step level and flow level.In addition to that, add a global setting called
METAFLOW_DEFAULT_IMAGE
to point to the global default image.Batch and other plugins would still support specifying the image directly for brevity (that is,
@batch(image='foo')
).Open question: what should
@image
do in local execution mode?Option A: Metaflow runs your step in docker.
Option B: Metaflow throws an error
Option C: Metaflow ignores it
Probably not going to implement (A) right away, so either (C) or (B)
Openquestion: should@awslambda
use the same decorator?Given that Lambda container images are special, and you cannot simply repurpose an AWS Batch or kubernetes image for AWS Lambda, would it be confusing to use the same decorator to specify the image?
Decision: use
@image
for Lambda tooDiscussion
How much of a problem is this?
Not a huge deal per se — I'd assume that the users aren't routinely switching back and forth between e.g. AWS Batch and k8s, so this inconsistency isn't really in your face as a user. However as Metaflow adds more plugins, this becomes more acute. You can see people using both Lambda and Batch, or both
@kubernetes
and@argo
.Would
@image
ever support any other params besides name= ?Probably not now. Hypothetically, you can imagine Metaflow doing other things like building container images for the user, in that case you could have something like
@image(dockerfile=...)
too. But there may be a better UX for that.You could also imagine allowing to override container entrypoint, for plugins that support that.
The text was updated successfully, but these errors were encountered: