Composable Flows and Steps #245

Open
talebzeghmi opened this issue Jul 2, 2020 · 8 comments · May be fixed by #612

Comments

@talebzeghmi

talebzeghmi commented Jul 2, 2020

Large ML projects that span teams reuse pipelines and models (e.g., ensembles, feature engineering).

There are two aspects of reuse:

  1. Reuse a whole Flow, to be able to compose a Flow of other Flows.
  2. Reuse a step (imagine it to be a feature engineering step). Steps currently do not have input parameters or return values, which makes reuse more difficult.

A Use Case:

  • A large modeling project consists of logical steps (e.g., feature engineering, imputation, models, stack models, meta model, smoothing, validation). Each of those steps may be a flow, and each would reuse feature engineering transforms from other teams.
  • It may be cumbersome to create a Flow for every feature engineering transform, rather than simple functions (steps?) that are easily reused.
  • Each logical step could be developed by its own team of applied scientists.

related: #144

@crk-codaio
Contributor

We have been thinking about (1) [as graph composition] and hope to publish more details on our thoughts about it. cc @tuulos
For (2), you could still get the sharing, especially for feature engineering transforms, as a library of functions (instead of steps) that can simply be imported within your step. Some of our teams internal to Netflix use this route for sharing such business logic.

Also, for a relatively common collection of transformations, you could still use (1) if you also want to avoid repeating the step boilerplate.
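
A minimal sketch of the library-of-functions route, under stated assumptions: the transforms are plain Python functions that take inputs and return outputs, so any flow can import them. The module name `shared_features` and the functions `impute_mean`, `min_max_scale`, and `build_features` are hypothetical names chosen for illustration, not code from any team mentioned here.

```python
# shared_features.py (hypothetical module name): plain functions, no Metaflow dependency.
from typing import List, Optional


def impute_mean(values: List[Optional[float]]) -> List[float]:
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def min_max_scale(values: List[float]) -> List[float]:
    """Scale values linearly into [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def build_features(raw: List[Optional[float]]) -> List[float]:
    """Compose the transforms; this is the single call a step would make."""
    return min_max_scale(impute_mean(raw))
```

Inside a step, the call would then be a one-liner such as `self.features = build_features(self.raw)` after `from shared_features import build_features`, since Metaflow persists values assigned to `self` as artifacts.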

@dpatschke

@talebzeghmi Thank you for opening this issue! Your issue has articulated some of the exact metaflow architectural questions that our team has been having around productionizing/pipelining metaflow ... especially around the reusability of feature engineering code within multiple flows.

I don't want to have to copy and paste scikit-learn Transformer code to each new modeling flow especially when there is a lot of boilerplate/utility code that I've written around:

  1. leveraging pandas to protect against differing columns being passed in.
  2. pulling in a tagged 'production' model from a Run that is then reloaded for just the data 'transform' and not the 'fit' as well.
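
A rough sketch of the first kind of boilerplate, assuming the list of columns seen at fit time is stored alongside the model: `DataFrame.reindex` drops unexpected columns and adds missing ones. The helper name `align_columns` and the fill value of 0 are illustrative assumptions.

```python
import pandas as pd


def align_columns(df: pd.DataFrame, expected: list, fill_value=0) -> pd.DataFrame:
    """Return a copy of df with exactly the expected columns, in order.

    Columns missing from df are added and filled with fill_value;
    columns not in expected are dropped.
    """
    return df.reindex(columns=expected, fill_value=fill_value)


# Example: the incoming frame lacks "c" and carries an extra "z".
incoming = pd.DataFrame({"b": [1, 2], "a": [3, 4], "z": [9, 9]})
aligned = align_columns(incoming, expected=["a", "b", "c"])
```

Calling this before `transform` keeps a fitted Transformer from seeing a column layout it was not fitted on.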

@seeravikiran Thanks for the recommendations on structuring and code reusability to address some of the items presented in this issue. I will continue to investigate what that would look like on our end. In the meantime, I would like to point you to this post made on the metaflow community page that proposes a pretty interesting idea for the issue. I'm curious about your thoughts on this (or something like it).

@tuulos
Collaborator

tuulos commented Aug 7, 2020

As @seeravikiran pointed out above, we have plans for graph composition. Meanwhile, this form of subclassing is supported: #144 (comment)

@dpatschke

@tuulos Thanks for the response and the reference. This is extremely helpful and greatly appreciated!

@talebzeghmi
Author

@tuulos, would you be able to share an RFC-style document on how Metaflow would support composition? That way our Applied Scientists can give feedback on its usability before the code is written.

thank you!

@tuulos
Collaborator

tuulos commented Sep 2, 2020

@talebzeghmi yep, I have been writing a doc that I should be able to share this month. I will ping you when it is available. Thanks for your patience :)

@PertuyF

PertuyF commented Oct 23, 2023

Hello @tuulos, any news on the doc you were writing in 2020 regarding Metaflow composable flows?

@DonIvanCorleone

Hi @tuulos,

is there any progress on this topic? It would be extremely helpful for the business case we have right now :) Any feedback is appreciated.

Cheers
