Support for AWS Step functions #2

Closed
romain-intel opened this issue Dec 2, 2019 · 20 comments · Fixed by #202 or #211
Labels
enhancement New feature or request

Comments

@romain-intel
Contributor

Metaflow on AWS currently requires a human in the loop to execute flows, so flows cannot be scheduled automatically. Metaflow could be made to work with AWS Step Functions so that the orchestration of Metaflow steps is handled by AWS.

@romain-intel romain-intel added the enhancement New feature or request label Dec 2, 2019
@savingoyal savingoyal self-assigned this Dec 2, 2019
@gonzalodiaz

I just came across Metaflow and I'm thrilled to give it a try at my company.
Currently we are using Airflow on Kubernetes to schedule workflows. I would like to hear whether you have considered scheduling Metaflow flows on Airflow, and whether it would be possible to use K8s as the infrastructure to run the steps. Thanks!

@savingoyal
Collaborator

Hi @gonzalodiaz,
Thanks for giving Metaflow a try. We follow a plugin-based architecture, so it is indeed possible to schedule flows on Airflow and to use K8s as the compute substrate. This is something we would like to offer in the near future. We welcome feature requests, so please open one.

@thundergolfer

Is your team familiar with https://github.com/argoproj/argo? In theory you could compile your Flows down into Argo's workflow spec format (JSON/YAML) and then Argo could take care of execution.
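
To make the "compile down" idea concrete, here is a minimal sketch that turns a step dependency map into an Argo Workflow manifest using a DAG template. The helper `flow_to_argo_workflow`, the `run-step` template name, and the placeholder `echo` command are assumptions for illustration only; this is not part of Metaflow or Argo, and a real compiler would have to invoke the flow's actual step code and pass state between steps.

```python
import json


def flow_to_argo_workflow(name, image, steps):
    """Compile {step: [upstream steps]} into an Argo Workflow manifest (a dict).

    Hypothetical helper for illustration; not a Metaflow or Argo API.
    """
    tasks = []
    for step, parents in steps.items():
        task = {
            "name": step,
            "template": "run-step",
            "arguments": {"parameters": [{"name": "step", "value": step}]},
        }
        if parents:
            # Argo runs a DAG task only after its dependencies have succeeded.
            task["dependencies"] = list(parents)
        tasks.append(task)

    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": f"{name}-"},
        "spec": {
            "entrypoint": "main",
            "templates": [
                {"name": "main", "dag": {"tasks": tasks}},
                {
                    "name": "run-step",
                    "inputs": {"parameters": [{"name": "step"}]},
                    "container": {
                        "image": image,
                        # Placeholder command (assumption): a real compiler
                        # would run the flow's step here instead of echoing.
                        "command": ["sh", "-c", "echo running {{inputs.parameters.step}}"],
                    },
                },
            ],
        },
    }


if __name__ == "__main__":
    workflow = flow_to_argo_workflow(
        "demo-flow",
        "python:3.8",
        {"start": [], "train": ["start"], "end": ["train"]},
    )
    # Argo accepts JSON as well as YAML manifests.
    print(json.dumps(workflow, indent=2))
```

The resulting JSON could then be submitted to a cluster running Argo, which would take over execution order and retries.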

@savingoyal
Collaborator

savingoyal commented Dec 4, 2019

Thanks for the link. Yes, I am familiar with Argo but haven't looked at it in depth.

@impredicative

impredicative commented Dec 26, 2019

> Metaflow on AWS currently requires a human-in-the-loop to execute and cannot automatically be scheduled. Metaflow could be made to work with AWS Step functions to allow the orchestration of Metaflow steps to be done by AWS.

Given that Metaflow is evidently seriously lacking a scheduler, either Step Functions or, better yet, an open source component of Metaflow itself could fill the gap. Without a scheduler, it does seem to be an incomplete solution.

@NukaCody

For step function integration, is it possible to incorporate https://github.com/aws/aws-step-functions-data-science-sdk-python?

@impredicative

impredicative commented Jan 13, 2020

> For step function integration, is it possible to incorporate https://github.com/aws/aws-step-functions-data-science-sdk-python?

As an observer, I don't see any need for AWS Step Functions integration since Metaflow should be able to manage workflow steps directly. Why pay extra for Step Functions?

@hgahlot

hgahlot commented Jan 13, 2020

AWS Step Functions state machines need to be scheduled through CloudWatch; Step Functions does not have a built-in scheduler. However, CloudWatch has a direct integration with Step Functions. It might be better to look into how CloudWatch + Lambda could be leveraged to act as a scheduler for Metaflow, separate from Step Functions.

> Metaflow could be made to work with AWS Step functions to allow the orchestration of Metaflow steps to be done by AWS.

Metaflow is an orchestrator itself, so I think the only missing piece is the scheduling aspect. Using Step Functions as an orchestrator just because we need it to schedule Metaflow workflows is overkill, IMO.
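
As a rough illustration of the direct CloudWatch-to-Step Functions integration mentioned above, the sketch below builds the request parameters for a scheduled CloudWatch Events rule that targets a state machine. The helper `schedule_requests`, the rule name, and the ARNs are hypothetical placeholders; the dictionaries follow the shape of the CloudWatch Events `PutRule` and `PutTargets` requests, and nothing here calls AWS.

```python
def schedule_requests(rule_name, cron, state_machine_arn, role_arn):
    """Return (put_rule, put_targets) request parameters for a scheduled rule.

    Hypothetical helper for illustration. `cron` uses the six-field AWS
    format: minutes hours day-of-month month day-of-week year.
    """
    put_rule = {
        "Name": rule_name,
        "ScheduleExpression": f"cron({cron})",
        "State": "ENABLED",
    }
    put_targets = {
        "Rule": rule_name,
        "Targets": [
            {
                "Id": "1",
                "Arn": state_machine_arn,
                # Role that CloudWatch Events assumes to start the execution.
                "RoleArn": role_arn,
            }
        ],
    }
    return put_rule, put_targets


if __name__ == "__main__":
    # Placeholder ARNs (assumptions), daily at 10:00 UTC.
    rule, targets = schedule_requests(
        "demo-flow-daily",
        "0 10 * * ? *",
        "arn:aws:states:us-east-1:123456789012:stateMachine:DemoFlow",
        "arn:aws:iam::123456789012:role/demo-events-role",
    )
    print(rule)
    print(targets)
    # With boto3 installed and credentials configured, these could be sent as:
    #   events = boto3.client("events")
    #   events.put_rule(**rule)
    #   events.put_targets(**targets)
```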

@impredicative

impredicative commented Jan 13, 2020

Metaflow could, in principle, then manage those CloudWatch Events and Lambdas too, using a single combined job+schedule definition. This would be the simplest scheduler integration, assuming a scheduler cannot be built in or integrated into Metaflow directly. I would still prefer integrating an open source scheduler into Metaflow, though, to avoid relying on CloudWatch Events and Lambdas.

@steveash

Also, maybe check out Glue Workflows, which are a little more DAG-like than the Step Functions model: https://docs.aws.amazon.com/glue/latest/dg/workflows_overview.html

@kylejmcintyre

I'm confused about this statement:

> Netflix uses an internal DAG scheduler to orchestrate most modeling and ETL pipelines in production. Metaflow flows can be deployed to the production scheduler with a single command. A similar integration could be provided e.g. for AWS Step Functions (Github issue)

Is this saying that there isn't yet a way to schedule a flow to run in production, or that there's no DAG scheduler/executor to actually run a flow in a production setting? Thank you.

@savingoyal
Collaborator

@kylejmcintyre Internally, we export Metaflow flows to a DAG scheduler. A similar integration with AWS Step Functions is in the works.

@kylejmcintyre

Thanks for your reply, @savingoyal. Is what you do internally available to me as an open-source consumer? If so, is it currently considered a hidden/internal implementation detail that runs on my provisioned compute resources? Or is executing flows in a production setting not yet supported for folks outside of Netflix?

@savingoyal
Collaborator

savingoyal commented Mar 9, 2020

@kylejmcintyre Given that the DAG scheduler we use internally (Meson) is not an open-source project, we are currently working on an equivalent integration with AWS Step Functions to offer similar capabilities in Metaflow OSS.

@impredicative

Why is AWS Step Functions even needed then? It's just going to increase the bill by doing something that open source software can do for free. The actual hardware that is needed is provided by AWS Batch/EC2/ECS and similar services.

@savingoyal
Collaborator

@impredicative There are not many production-grade DAG schedulers (no SPOF, HA, scalable) with good adoption in the open-source community. AWS Step Functions offers the guarantees that we seek from a production-grade scheduler, and our integration can serve as a reference implementation for integrations with other schedulers.

@joe153

joe153 commented Apr 22, 2020

I am interested in using this project, but the obvious blocker is the scheduler. @savingoyal: do you have a rough timeline for when it could be available?

@impredicative

impredicative commented Apr 25, 2020

> @impredicative There are not very many production-grade DAG schedulers (no SPOF, HA, scalable) with good adoption in the open-source community. AWS Step Functions offers the guarantees that we seek from a production-grade scheduler and our integration can serve as a reference implementation for integrations with other schedulers.

As has been noted in this issue before, Step Functions uses CloudWatch Events for scheduling. Has this changed? If not, why is Step Functions being referred to as a scheduler? Does Metaflow then really still need Step Functions integration, or is it CloudWatch Events integration that it needs?

@savingoyal savingoyal linked a pull request May 15, 2020 that will close this issue
@savingoyal
Collaborator

savingoyal commented May 19, 2020

@impredicative AWS Step Functions is a scheduler in that it schedules tasks on AWS Batch. The state machine itself can be triggered using CloudWatch Events. Metaflow is not meant to be a replacement for a production-grade scheduler, and through our integrations we advocate that users publish their production workflows onto a production scheduler.
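
To illustrate what "schedules tasks on AWS Batch" can look like, here is a minimal sketch that compiles a linear list of steps into an Amazon States Language definition in which each state submits an AWS Batch job and waits for it to finish. The helper `linear_flow_to_state_machine`, the queue and job definition names, and the placeholder `echo` command are assumptions for illustration; this is not how Metaflow's integration is implemented, and it ignores branching entirely.

```python
import json


def linear_flow_to_state_machine(steps, job_queue, job_definition):
    """Compile an ordered list of step names into a state machine definition.

    Hypothetical helper for illustration; handles linear flows only.
    """
    if not steps:
        raise ValueError("at least one step is required")

    states = {}
    for i, step in enumerate(steps):
        state = {
            "Type": "Task",
            # Service integration that submits a Batch job and waits for it
            # to complete before moving to the next state.
            "Resource": "arn:aws:states:::batch:submitJob.sync",
            "Parameters": {
                "JobName": step,
                "JobQueue": job_queue,
                "JobDefinition": job_definition,
                # Placeholder command (assumption); a real integration
                # would run the step's code here.
                "ContainerOverrides": {"Command": ["echo", step]},
            },
        }
        if i + 1 < len(steps):
            state["Next"] = steps[i + 1]
        else:
            state["End"] = True
        states[step] = state

    return {"StartAt": steps[0], "States": states}


if __name__ == "__main__":
    definition = linear_flow_to_state_machine(
        ["start", "train", "end"],
        job_queue="demo-queue",          # placeholder name
        job_definition="demo-job-def",   # placeholder name
    )
    print(json.dumps(definition, indent=2))
```

A CloudWatch Events rule with a schedule expression could then start executions of such a state machine, which is the division of labor described above.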

@savingoyal savingoyal reopened this May 20, 2020
@savingoyal savingoyal linked a pull request May 20, 2020 that will close this issue
@savingoyal savingoyal linked a pull request Jun 4, 2020 that will close this issue
@savingoyal savingoyal removed a link to a pull request Jun 4, 2020
@savingoyal
Collaborator

This feature is now generally available. The launch blog post is here and the documentation is here.

sappier pushed a commit to sappier/metaflow that referenced this issue Feb 5, 2021
emattia pushed a commit to emattia/metaflow that referenced this issue Oct 11, 2023
10 participants