Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: ✨ Change the Graph creation to allow subclassing #2086

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

simonwardjones
Copy link

Currently it is not possible to subclass a Flow as in the example snippet below.

This PR alters the graph construction that enables this flow subclassing. Instead of using the ast.NodeVisitor class to collect the steps and their ast.FunctionDef's. I have instead iterated on the dir(flow) and then extracted the function ast.FunctionDef's.

from metaflow import FlowSpec, step


class BaseFlow(FlowSpec):
    @step
    def start(self):
        print("this is the start")
        self.next(self.step1)

    @step
    def step1(self):
        print("base step 1")
        self.next(self.end)

    @step
    def end(self):
        print("base step end.")


class SubFlow(BaseFlow):
    @step
    def step1(self):
        print("sub step 1")
        self.next(self.step2)

    @step
    def step2(self):
        print("sub step 2")
        self.next(self.end)


if __name__ == "__main__":
    SubFlow()

As referenced in this issue: #144
and discussed in this issue: #245

I have not yet added tests for this PR as the approach to this might take some discussion. The current Test harness setup uses the FlowFormatter class to generate flows on the fly but all of these assume the flow directly inherits off the FlowSpec. How might we go about testing this kind of subclassing?

Note this PR was originally raised as #2035 but from the wrong branch

@nflx-mf-bot
Copy link
Collaborator

Testing[841] @ 0a964bd

@nflx-mf-bot
Copy link
Collaborator

Testing[841] @ 0a964bd had 52 FAILUREs.

@romain-intel
Copy link
Contributor

Alright, I finally got around to this and there is an issue that is minor but nonetheless:

The reason for this (I think) is that:

  • line number is always the same due to the new processing
  • so that causes the DAGNode to be compared (which does not work)
    Prior to this, the line number was properly computed and always not equal so we would never compare DAGNodes.

I suspect that this would make line-number incorrect so we should probably store line and file or something. This is used in the graph object and we actually rely on it as well for stuff like debug integration (see here: https://github.com/Netflix/metaflow-nflx-extensions/blob/1c534c567beb90de584db7cc2e2b624f08d8498d/metaflow_extensions/netflix_ext/cmd/debug/debug_script_generator.py#L395)

So we somehow need to get equivalent information (probably file and line number) and that should be ok.

@simonwardjones
Copy link
Author

simonwardjones commented Oct 24, 2024

Hi @romain-intel,

Previously it was the case that all steps were defined in the same module and the NodeVisitor was used to get the ast.FunctionDef for each "step". In this implementation the lineno for the step and for the tail of the step are stored.

In the implementation I used the inspect module to get the source code for each function in the dir(flow) and then call ast.parse on each function source to get the func_ast: ast.FunctionDef. However as you point out the func_ast.lineno in this case is relative only to the function and not the file. The image below shows line numbering lineno for the file and the func_ast.linenofor the step.

e.g.

from metaflow import FlowSpec, step # line 1
# line 2
...

# line 20
class BaseFlow(FlowSpec): # line 21
    @step # line 22 - line 1 in func_ast
    def start(self): # line 23 - line 2 in func_ast
        print("this is the start") # line 24 - line 3 in func_ast
        self.next(self.step1) # line 25 - line 4 in func_ast

To resolve the issue I updated the constructor of the DAGNode to take the source_file and the lineno of the function (in the example above this is line 22). There is one slight complication is that the line number we want is line 23 (the line with the def on it not the line where the decorators for the function start). To get this we can simply use lineno + func_ast.lineno - 1 which in words is the line number of the the start of function (including decorators) in the file + the number of lines between this and the def.

However by allowing subclassing of flows one step may be in a different file from another and hence we could still get a func_lineno clash which in turn would through the error you mentioned in https://github.com/Netflix/metaflow/blob/master/metaflow/cli.py#L173.

To resolve this I changed to iterate on the sorted_nodes of the graph based on traversing the Dag.

I also notice in the codebase we use the linenumber in a lot of LintWarn exceptions e.g.

if re.search("[^a-z0-9_]", node.name) or node.name[0] == "_":
            raise LintWarn(msg.format(node), node.func_lineno)

We might want to consider extending these warnings to log the source_file as well as the lineno to give more context. I am happy to add that in this PR but wanted to discuss with you first.

@romain-intel
Copy link
Contributor

OK, I like where this is going. Yes, I think you are right that we should indicate the file and line number. I think we also need to add the source file in the node_to_dict function so that we can pick it up in the debug integration. Looping @talsperre to check on this.

@simonwardjones
Copy link
Author

Hi @romain-intel, I am going on holiday for a couple of weeks now but I made a few more changes based on what you suggested:

  • Added source_file to the base MetaflowException class
  • Updated the lint warnings to include the node.source_file
  • Updated the cli method to display the filename in the warning as in the image below
  • Added the source_file to the node_to_dict method

image

Let me know what you think 😄

@simonwardjones
Copy link
Author

Hey @romain-intel , Just wondered if you had any time to get to this?

@romain-intel
Copy link
Contributor

Alright. Why are the tests not running here. I'll run them internally and check.

@nflx-mf-bot
Copy link
Collaborator

Testing[902] @ 3b1c1eb

@romain-intel
Copy link
Contributor

OK, the changes look good to me, I just need to probably make a few changes here: https://github.com/Netflix/metaflow-nflx-extensions/blob/main/metaflow_extensions/netflix_ext/cmd/debug/debug_script_generator.py#L409 (and probably a few other places) so that it doesn't break when this change merges. I should be able to test this out and make the requisite changes. @talsperre for viz as well.

@simonwardjones
Copy link
Author

OK, the changes look good to me, I just need to probably make a few changes here: https://github.com/Netflix/metaflow-nflx-extensions/blob/main/metaflow_extensions/netflix_ext/cmd/debug/debug_script_generator.py#L409 (and probably a few other places) so that it doesn't break when this change merges. I should be able to test this out and make the requisite changes. @talsperre for viz as well.

Great - thanks @romain-intel. Let me know if there are any outstanding tasks I can help with.

@nflx-mf-bot
Copy link
Collaborator

Testing[902] @ 8840bd2

@nflx-mf-bot
Copy link
Collaborator

Testing[902] @ 8840bd2 had 12 FAILUREs.

@simonwardjones
Copy link
Author

From the previous comment i can see that 12 failures happened somewhere. Is there anyway I can see these and fix them?

metaflow/graph.py Outdated Show resolved Hide resolved
Make compatible with configs.
@nflx-mf-bot
Copy link
Collaborator

Testing[3895224] @ bb83d0c

@romain-intel
Copy link
Contributor

Coincidentally, this is now going to be needed for configs too. I've made a tiny change to make it compatible and will merge as soon as the tests pass (or at least nothing unexpected happens). Yes, finally!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants