feat: ✨ Change the Graph creation to allow subclassing #2086

simonwardjones · 2024-10-08T11:00:52Z

Currently it is not possible to subclass a Flow as in the example snippet below.

This PR alters the graph construction that enables this flow subclassing. Instead of using the ast.NodeVisitor class to collect the steps and their ast.FunctionDef's. I have instead iterated on the dir(flow) and then extracted the function ast.FunctionDef's.

from metaflow import FlowSpec, step


class BaseFlow(FlowSpec):
    @step
    def start(self):
        print("this is the start")
        self.next(self.step1)

    @step
    def step1(self):
        print("base step 1")
        self.next(self.end)

    @step
    def end(self):
        print("base step end.")


class SubFlow(BaseFlow):
    @step
    def step1(self):
        print("sub step 1")
        self.next(self.step2)

    @step
    def step2(self):
        print("sub step 2")
        self.next(self.end)


if __name__ == "__main__":
    SubFlow()

As referenced in this issue: #144
and discussed in this issue: #245

I have not yet added tests for this PR as the approach to this might take some discussion. The current Test harness setup uses the FlowFormatter class to generate flows on the fly but all of these assume the flow directly inherits off the FlowSpec. How might we go about testing this kind of subclassing?

Note this PR was originally raised as #2035 but from the wrong branch

nflx-mf-bot · 2024-10-10T13:32:40Z

Testing[841] @ 0a964bd

nflx-mf-bot · 2024-10-10T16:29:17Z

Testing[841] @ 0a964bd had 52 FAILUREs.

romain-intel · 2024-10-23T21:55:14Z

Alright, I finally got around to this and there is an issue that is minor but nonetheless:

This line: https://github.com/Netflix/metaflow/blob/master/metaflow/cli.py#L173 breaks

The reason for this (I think) is that:

line number is always the same due to the new processing
so that causes the DAGNode to be compared (which does not work)
Prior to this, the line number was properly computed and always not equal so we would never compare DAGNodes.

I suspect that this would make line-number incorrect so we should probably store line and file or something. This is used in the graph object and we actually rely on it as well for stuff like debug integration (see here: https://github.com/Netflix/metaflow-nflx-extensions/blob/1c534c567beb90de584db7cc2e2b624f08d8498d/metaflow_extensions/netflix_ext/cmd/debug/debug_script_generator.py#L395)

So we somehow need to get equivalent information (probably file and line number) and that should be ok.

simonwardjones · 2024-10-24T13:13:26Z

Hi @romain-intel,

Previously it was the case that all steps were defined in the same module and the NodeVisitor was used to get the ast.FunctionDef for each "step". In this implementation the lineno for the step and for the tail of the step are stored.

In the implementation I used the inspect module to get the source code for each function in the dir(flow) and then call ast.parse on each function source to get the func_ast: ast.FunctionDef. However as you point out the func_ast.lineno in this case is relative only to the function and not the file. The image below shows line numbering lineno for the file and the func_ast.linenofor the step.

e.g.

from metaflow import FlowSpec, step # line 1
# line 2
...

# line 20
class BaseFlow(FlowSpec): # line 21
    @step # line 22 - line 1 in func_ast
    def start(self): # line 23 - line 2 in func_ast
        print("this is the start") # line 24 - line 3 in func_ast
        self.next(self.step1) # line 25 - line 4 in func_ast

To resolve the issue I updated the constructor of the DAGNode to take the source_file and the lineno of the function (in the example above this is line 22). There is one slight complication is that the line number we want is line 23 (the line with the def on it not the line where the decorators for the function start). To get this we can simply use lineno + func_ast.lineno - 1 which in words is the line number of the the start of function (including decorators) in the file + the number of lines between this and the def.

However by allowing subclassing of flows one step may be in a different file from another and hence we could still get a func_lineno clash which in turn would through the error you mentioned in https://github.com/Netflix/metaflow/blob/master/metaflow/cli.py#L173.

To resolve this I changed to iterate on the sorted_nodes of the graph based on traversing the Dag.

I also notice in the codebase we use the linenumber in a lot of LintWarn exceptions e.g.

if re.search("[^a-z0-9_]", node.name) or node.name[0] == "_":
            raise LintWarn(msg.format(node), node.func_lineno)

We might want to consider extending these warnings to log the source_file as well as the lineno to give more context. I am happy to add that in this PR but wanted to discuss with you first.

romain-intel · 2024-10-25T06:15:52Z

OK, I like where this is going. Yes, I think you are right that we should indicate the file and line number. I think we also need to add the source file in the node_to_dict function so that we can pick it up in the debug integration. Looping @talsperre to check on this.

simonwardjones · 2024-10-25T15:30:50Z

Hi @romain-intel, I am going on holiday for a couple of weeks now but I made a few more changes based on what you suggested:

Added source_file to the base MetaflowException class
Updated the lint warnings to include the node.source_file
Updated the cli method to display the filename in the warning as in the image below
Added the source_file to the node_to_dict method

Let me know what you think 😄

simonwardjones · 2024-11-12T09:11:18Z

Hey @romain-intel , Just wondered if you had any time to get to this?

romain-intel · 2024-12-02T08:13:17Z

Alright. Why are the tests not running here. I'll run them internally and check.

nflx-mf-bot · 2024-12-02T08:13:40Z

Testing[902] @ 3b1c1eb

romain-intel · 2024-12-02T08:21:51Z

OK, the changes look good to me, I just need to probably make a few changes here: https://github.com/Netflix/metaflow-nflx-extensions/blob/main/metaflow_extensions/netflix_ext/cmd/debug/debug_script_generator.py#L409 (and probably a few other places) so that it doesn't break when this change merges. I should be able to test this out and make the requisite changes. @talsperre for viz as well.

simonwardjones · 2024-12-02T15:35:41Z

OK, the changes look good to me, I just need to probably make a few changes here: https://github.com/Netflix/metaflow-nflx-extensions/blob/main/metaflow_extensions/netflix_ext/cmd/debug/debug_script_generator.py#L409 (and probably a few other places) so that it doesn't break when this change merges. I should be able to test this out and make the requisite changes. @talsperre for viz as well.

Great - thanks @romain-intel. Let me know if there are any outstanding tasks I can help with.

…ture/enable-subclassing-flows

nflx-mf-bot · 2024-12-03T10:41:54Z

Testing[902] @ 8840bd2

nflx-mf-bot · 2024-12-03T14:02:40Z

Testing[902] @ 8840bd2 had 12 FAILUREs.

simonwardjones · 2024-12-06T14:47:41Z

From the previous comment i can see that 12 failures happened somewhere. Is there anyway I can see these and fix them?

metaflow/graph.py

Make compatible with configs.

nflx-mf-bot · 2024-12-17T08:34:33Z

Testing[3895224] @ bb83d0c

romain-intel · 2024-12-17T08:35:46Z

Coincidentally, this is now going to be needed for configs too. I've made a tiny change to make it compatible and will merge as soon as the tests pass (or at least nothing unexpected happens). Yes, finally!

feat: ✨ Change the Graph creation to allow subclassing

0a964bd

saikonen added the testable label Oct 10, 2024

nflx-mf-bot removed the testable label Oct 10, 2024

romain-intel mentioned this pull request Oct 23, 2024

feat: ✨ Change the Graph creation to allow subclassing #2115

Open

simonwardjones added 2 commits October 24, 2024 10:23

Merge branch 'master' into feature/enable-subclassing-flows

74382e1

fix: 🐛 Fix the lineno for the function step

f0e60d9

simonwardjones added 2 commits October 24, 2024 14:13

fix: 🐛 Use sorted nodes in graph

95ef1d4

docs: 📝 Add source_file to the __str__

9315e08

simonwardjones added 4 commits October 25, 2024 16:25

feat: ✨ Show the file when printing exceptions anf fix node variable

49b6904

feat: ✨ Add source_file to the exception base class

82571bc

feat: ✨ Use sorted_nodes in graph __str__

c48ad28

feat: ✨ Update Linting to include source_file in exceptions raised

3b1c1eb

romain-intel added the testable label Dec 2, 2024

nflx-mf-bot removed the testable label Dec 2, 2024

Merge branch 'master' of https://github.com/Netflix/metaflow into fea…

8840bd2

…ture/enable-subclassing-flows

romain-intel added the testable label Dec 3, 2024

nflx-mf-bot removed the testable label Dec 3, 2024

romain-intel reviewed Dec 17, 2024

View reviewed changes

metaflow/graph.py Outdated Show resolved Hide resolved

Update metaflow/graph.py

bb83d0c

Make compatible with configs.

romain-intel added the testable label Dec 17, 2024

nflx-mf-bot removed the testable label Dec 17, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: ✨ Change the Graph creation to allow subclassing #2086

feat: ✨ Change the Graph creation to allow subclassing #2086

simonwardjones commented Oct 8, 2024

nflx-mf-bot commented Oct 10, 2024

nflx-mf-bot commented Oct 10, 2024

romain-intel commented Oct 23, 2024

simonwardjones commented Oct 24, 2024 •

edited

Loading

romain-intel commented Oct 25, 2024

simonwardjones commented Oct 25, 2024

simonwardjones commented Nov 12, 2024

romain-intel commented Dec 2, 2024

nflx-mf-bot commented Dec 2, 2024

romain-intel commented Dec 2, 2024

simonwardjones commented Dec 2, 2024

nflx-mf-bot commented Dec 3, 2024

nflx-mf-bot commented Dec 3, 2024

simonwardjones commented Dec 6, 2024

nflx-mf-bot commented Dec 17, 2024

romain-intel commented Dec 17, 2024

feat: ✨ Change the Graph creation to allow subclassing #2086

Are you sure you want to change the base?

feat: ✨ Change the Graph creation to allow subclassing #2086

Conversation

simonwardjones commented Oct 8, 2024

nflx-mf-bot commented Oct 10, 2024

nflx-mf-bot commented Oct 10, 2024

romain-intel commented Oct 23, 2024

simonwardjones commented Oct 24, 2024 • edited Loading

romain-intel commented Oct 25, 2024

simonwardjones commented Oct 25, 2024

simonwardjones commented Nov 12, 2024

romain-intel commented Dec 2, 2024

nflx-mf-bot commented Dec 2, 2024

romain-intel commented Dec 2, 2024

simonwardjones commented Dec 2, 2024

nflx-mf-bot commented Dec 3, 2024

nflx-mf-bot commented Dec 3, 2024

simonwardjones commented Dec 6, 2024

nflx-mf-bot commented Dec 17, 2024

romain-intel commented Dec 17, 2024

simonwardjones commented Oct 24, 2024 •

edited

Loading