-
Notifications
You must be signed in to change notification settings - Fork 779
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New flow CLI, SDK #612
base: master
Are you sure you want to change the base?
New flow CLI, SDK #612
Conversation
8d3155e
to
1c14fa3
Compare
I can't figure out what failed in "R tests on macos-latest"; it just says cd R/tests
Rscript run_tests.R |
00873e5
to
db0a601
Compare
Some updates here:
Celsius has been using this in production (mostly in Batch mode) since August 2021, and some roadblocks to upstreaming have been lifted since then (notably, it looks like Python2 support is on the way out, which I never attempted to support here. I also didn't support R initially, but do now). I've rebased this a few times since opening it, and generally the only issue has been adding the new "entrypoint" ( It would be nice to check in about interest in upstreaming any of this! One simple change I'm considering here is renaming the new package I introduce from |
Hey folks! Is there a path on when (whether?) this PR will be merged? I love the graph composition and the new decorator-enabled syntax. However, I'm not technically apt to contribute, though I could lend help in testing if needed. |
previously this was mimicked by passing an explicit `args` tuple, which would result in SystemExit being caught
This is a large change, factored into 4 sequential groups of commits, addressing pain points I hit as a new user/developer of Metaflow; work leading up to this has been discussed in Slack:
metaflow flow <file[:flow_name]>
)self.next
callspytest
tests undermetaflow/tests
(in addition to existing tests undertest/core
)Everything should be backwards-compatible; flows can optionally be defined using a new API under
metaflow.api
; existing Metaflow public members work as they always have.Examples
See
metaflow/api/README.md
@dsl
for a detailed write-up and examples, but here is a short overview:Support multiple flows per file
Using new (optional)
metaflow flow
CLI entrypoint:The flow name (
[:MyFlow]
above) can be omitted if there is just one flow in the file.Reduce flow-definition boilerplate
start
andend
steps,self.next
calls, and__main__
handler are often superfluous:Define graph structure using decorators, not
self.next
callsMixing graph structure logic (
self.next
calls) into step implementations is clumsy; they are two different levels of abstraction and should be better delineated/separated:(this resolves #604, I believe)
Compose flows via inheritance
Flows are modeled as Python classes, but currently there's no way to compose them (via inheritance, or otherwise). This PR implements a simple inheritance-based composition scheme:
(resolves #245)
Other methods of composing flows may be desirable, but this solves many use cases. Celsius has been using this in production since August 2021.
PR Status
d0
is a merge of all of them, and the 4 "checkpoints" described below start from it as a base.master
@ 53ab9d4)PR Structure
Since there are many commits, I've broken out 4 "checkpoints"; each one basically stands alone, and should be able to be reviewed/merged in sequence:
d0...d1
(minor): nits, mostly related to graph constructiond1...d2
(major): new CLI (metaflow flow <file>[:<flow>] …
), multiple flow definitions per file, multiple FlowSpecs can be run within one Python process, morepytest
testsd2...d3
(minor): morepytest
testsd3...d4
(major): new FlowSpec SDK: graph structure in decorators, hideself.next
calls, support flow composition via inheritanceLet me know if a different decomposition would make more sense, or if this isn't clear.
Invariants
I'll preserve these as updates are made to this PR:
d0
is a merge of open/unrelated PRs, which the subsequent branches are rebased on top of. Currently:[test]
pip extra #665d4
is the same as thedsl
branch that defines this PRd1
,d2
,d3
,d4
)d0...d1
: misc nits, mostly related to graph constructionIntroduces an
IS_STEP
constant as a start to formalizing the ad hoc API around certain function attrs (e.g.is_step
) indicating they are "steps". I build more on this later.This chunk could be folded into #666, or the subsequent
d1...d2
chunk.d1...d2
: support multiple flows per {file, process} + new CLI (metaflow flow <file>[:<flow>] …
)This chunk of commits makes flows behave more like the Python classes they are modeled as:
if __name__ == '__main__': …
handler no longer requiredNew CLI:
metaflow flow <file>[:<flow>] …
To support multiple flow definitions in one file, a new, optional CLI entrypoint is implemented:
as a drop-in replacement for / alternative to:
If there is just one flow in a file, the
:<flow>
portion can be omitted:Users can still use the old form (
python <file> …
and a__name__ == '__main__'
handler; all relevant changes are backwards-compatible), but Metaflow internally uses the new form everywhere.Expanded
Parameter
/ "main flow" global bookkeepingIn addition to supporting multiple flow definitions per file, this change allows multiple flows to be invoked in one Python process.
This requires tracking a (global, mutable) "main" flow (and corresponding
Parameter
s that should be fed toclick
parsing). See changes toparameters.py
.d2...d3
: morepytest
tests3 commits, each adding a new test file:
test_simple_foreach.py
: test a flow (defined in the same file) that squares some integers and then sums them.test_foreach.py
: verify graph structure and data outputs from the02-statistics/stats.py
tutorial flow.test_joins.py
: test several kinds of branching/joining flowsd3...d4
: new FlowSpec SDK: graph structure in decorators, hideself.next
calls, flow composition via inheritanceThis introduces the
metaflow.api
package, containing aFlowSpec
base-class and@step
decorator (which are drop-in replacements formetaflow.{FlowSpec,step}
) as well as new@foreach
and@join
decorators.This alternate
FlowSpec
class and decorators allow constructing flows in a more ergonomic fashion:self.next
calls in steps' function-bodies)self.next
calls are still synthesized+injected under the hood, but abstracted from usersself.next
calls).start
/end
steps are optionalstart
/end
steps are still synthesized under the hood, if not explicitly defined, for minimal disruption to lower layersFlow composition via inheritance
Addresses #245: flows can be "mixed-in" to factor out common sets of steps; see
metaflow/tests/test_inheritance.py
/metaflow/tests/flows/inherited_flows.py
.There are some rough edges:
start
/end
would collide; otherwise this should Just Work)However, it allowed us (Celsius) to combine 5 production flows that were previously being shelled out to separately in sequence (also allowing
resume
to work across them, which previously didn't work), so hopefully it is a good start. We've been using this in production since August 2021.Longer term, I'd like to come up with syntax for directly "injecting" another flow into the body of a FlowSpec class (supporting direct composition instead of relying on the class-inheritance mechanism), and/or supporting
@step
decorators on top-level Python functions (not just methods ofFlowSpec
classes).