Support Pythonic Inheritance of Flow Class #144

Closed
jensqin opened this issue Feb 27, 2020 · 2 comments

Comments

jensqin commented Feb 27, 2020

Sometimes the same steps, parameters, or functions are shared by multiple flows. It would be ideal if Metaflow supported standard Python class inheritance.

For example, running this script:

```python
from metaflow import FlowSpec, step


class BaseFlow(FlowSpec):
    @step
    def start(self):
        print("this is the start")
        self.next(self.step1)

    @step
    def step1(self):
        print("base step 1")
        self.next(self.end)

    @step
    def end(self):
        print("base step end.")


class SubFlow(BaseFlow):
    @step
    def step1(self):
        print("sub step 1")
        self.next(self.step2)

    @step
    def step2(self):
        print("sub step 2")
        self.next(self.end)


if __name__ == "__main__":
    SubFlow()
```

should ideally print:

```
this is the start
sub step 1
sub step 2
base step end.
```
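
The behaviour requested above is just ordinary Python method resolution applied to step transitions. The sketch below is a toy, Metaflow-free runner that follows `self.next(...)` transitions; `ToyFlow`, its `next`/`run` methods, and the `step` decorator here are hypothetical stand-ins written for illustration, not Metaflow's API.

```python
# Toy model of the requested semantics. ToyFlow, next(), run() and step()
# are hypothetical stand-ins, NOT Metaflow's API.


def step(func):
    """Tag a method as a step (toy decorator)."""
    func.is_step = True
    return func


class ToyFlow:
    def next(self, target):
        # Remember which bound method should run after the current step.
        self._next = target

    def run(self):
        current = self.start
        while current is not None:
            # Bound methods forward attribute lookups to the underlying function.
            if not getattr(current, "is_step", False):
                raise TypeError(f"{current.__name__} is not a @step")
            self._next = None
            current()
            current = self._next  # None after a step that never calls next()


class BaseFlow(ToyFlow):
    @step
    def start(self):
        print("this is the start")
        self.next(self.step1)

    @step
    def step1(self):
        print("base step 1")
        self.next(self.end)

    @step
    def end(self):
        print("base step end.")


class SubFlow(BaseFlow):
    @step
    def step1(self):  # overrides BaseFlow.step1 via normal method resolution
        print("sub step 1")
        self.next(self.step2)

    @step
    def step2(self):
        print("sub step 2")
        self.next(self.end)


if __name__ == "__main__":
    SubFlow().run()
```

Because `BaseFlow.start` calls `self.next(self.step1)` on a `SubFlow` instance, the lookup of `self.step1` resolves to `SubFlow.step1`, which yields exactly the four lines listed above.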

tuulos (Collaborator) commented Feb 28, 2020

Thanks @jensqin for the proposal. To be clear, the following kind of inheritance is already supported:

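```python
from metaflow import FlowSpec, step

class MyModel:
    def fit(self):
        # train() and self.data are placeholders from the original comment
        self.model = train(self.data)

class MyFlow(FlowSpec, MyModel):
    @step
    def start(self):
        self.fit()  # method inherited from the MyModel mix-in
        self.next(self.end)

    @step
    def end(self):
        pass
```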

However, the graph defined by a FlowSpec can't be spread across multiple classes, i.e., graph composition is not supported.
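
One way to picture why the graph can't span several classes: if a flow's graph were built by statically reading the source of a single class body (an assumption made for this sketch; the thread does not say how Metaflow builds its graph), then steps inherited from a parent class would simply be invisible. The sketch uses Python's standard `ast` module to list the `@step` methods written directly in one class; `steps_in_class_body` and `SOURCE` are hypothetical names.

```python
import ast
import textwrap
from typing import List

# Hypothetical flow source mirroring BaseFlow/SubFlow from the issue.
# ast.parse only builds a syntax tree, so FlowSpec and step need not exist.
SOURCE = textwrap.dedent('''
    class BaseFlow(FlowSpec):
        @step
        def start(self):
            self.next(self.step1)

        @step
        def step1(self):
            self.next(self.end)

        @step
        def end(self):
            pass

    class SubFlow(BaseFlow):
        @step
        def step1(self):
            self.next(self.step2)

        @step
        def step2(self):
            self.next(self.end)
''')


def steps_in_class_body(source: str, class_name: str) -> List[str]:
    """Names of @step-decorated methods defined directly in class_name's body."""
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            return [
                item.name
                for item in node.body
                if isinstance(item, ast.FunctionDef)
                and any(isinstance(d, ast.Name) and d.id == "step"
                        for d in item.decorator_list)
            ]
    raise LookupError(f"class {class_name} not found")


print(steps_in_class_body(SOURCE, "SubFlow"))  # ['step1', 'step2']
```

Under this assumption, a parser looking only at `SubFlow` sees `step1` and `step2` but not the inherited `start` and `end`, so it could not assemble a complete graph.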

Most likely we will start supporting graph composition by allowing separate subgraphs, so you will be able to do self.next(SubFlow). This mix-in style approach should be even more flexible than subclassing.
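
The `self.next(SubFlow)` idea can be illustrated with another toy runner in which a step hands control to a whole sub-flow and the parent resumes afterwards. This is only a sketch of the composition concept: the `then=` keyword, `ToyFlow`, `Preprocess`, and `TrainFlow` are all hypothetical and not part of Metaflow.

```python
# Toy sketch of graph composition. next(target, then=...) is a hypothetical
# signature invented for this sketch; it is not Metaflow's API.


class ToyFlow:
    def __init__(self, log=None):
        self.log = [] if log is None else log

    def next(self, target, then=None):
        self._next, self._then = target, then

    def run(self):
        current = self.start
        while current is not None:
            self._next, self._then = None, None
            current()
            target, then = self._next, self._then
            if isinstance(target, type) and issubclass(target, ToyFlow):
                # Run the sub-flow to completion with a shared log,
                # then resume the parent at `then`.
                target(self.log).run()
                current = then
            else:
                current = target
        return self.log


class Preprocess(ToyFlow):
    def start(self):
        self.log.append("preprocess.start")
        self.next(self.end)

    def end(self):
        self.log.append("preprocess.end")


class TrainFlow(ToyFlow):
    def start(self):
        self.log.append("train.start")
        self.next(Preprocess, then=self.fit)

    def fit(self):
        self.log.append("train.fit")
        self.next(self.end)

    def end(self):
        self.log.append("train.end")


print(TrainFlow().run())
# ['train.start', 'preprocess.start', 'preprocess.end', 'train.fit', 'train.end']
```

Unlike subclassing, `Preprocess` stays a complete flow on its own and can be plugged into any number of parents without editing its graph.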

jensqin closed this as completed Feb 28, 2020

jensqin (Author) commented Feb 28, 2020

Thank you for responding, @tuulos! Just a few thoughts:

  1. Personally, I don't like the current subclassing practice, mainly because, intuitively, a data science pipeline is not a subclass of a model. By 'inheritance' I meant reusing a flow as part of another flow, rather than sharing a fixture across different flows.

  2. I like the idea of graph composition very much. However, what if I also want some integrity for my subflow or child data pipeline? For example, suppose SubFlow is used by two data pipelines: when loading its artifacts, I would like to access only the runs that belong to the first pipeline. In other words, without a 'name' for the whole pipeline or the top-level class, how can I tell which pipeline a given SubFlow run was part of?
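
One way to answer the second question, assuming each SubFlow run records the name of the pipeline that launched it (for instance as a tag), is to filter runs on that marker. The sketch below uses plain Python data; `RunRecord`, `runs_for_parent`, and the `parent:<name>` tag convention are hypothetical and not Metaflow client objects.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical record of one SubFlow run (not a Metaflow client object)."""
    run_id: str
    tags: FrozenSet[str] = field(default_factory=frozenset)


def runs_for_parent(runs: List[RunRecord], parent: str) -> List[RunRecord]:
    """Keep only the runs tagged as launched by the given parent pipeline."""
    wanted = f"parent:{parent}"  # hypothetical tag convention
    return [r for r in runs if wanted in r.tags]


sub_runs = [
    RunRecord("1", frozenset({"parent:PipelineA"})),
    RunRecord("2", frozenset({"parent:PipelineB"})),
    RunRecord("3", frozenset({"parent:PipelineA"})),
]

print([r.run_id for r in runs_for_parent(sub_runs, "PipelineA")])  # ['1', '3']
```

The design choice is that the child never needs to know its parent's class; the parent only has to stamp each child run with its own name when launching it.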

I am just a newbie trying to understand machine learning workflows. I hope these questions don't bother you!
