guide: ML Pipelines (1): Defining Pipelines & Stages #3414
Some first reviews @iesahin ☝🏼
Thanks
Gatsby Cloud Build Report (dvc.org): 🎉 Your build was successful! See the Deploy preview here. Build time: 1m. Performance: Lighthouse report.
I think it's mergeable @dberenbaum but needs an approval.
Dismissing per #3414 (comment)
A stage represents individual data processes, including their input and
resulting output which can be combined to build detailed machine learning
pipelines.
Stages capture the commands, scripts, or code that a DVC pipeline executes,
capture -> represents (it was better after all I think)
that a DVC pipeline executes
It's like defining pipelines through pipelines. Can we do something like "that you would run as part of your project to get the result (e.g. `train.py`) ..."?
Done in afe2ca5. PTAL
```yaml
evaluate: ... # stage 3 definition
```

- Capture other useful metadata such as runtime
Capture and describe ...
or specify ...
I think "capture" is enough (high-level, more details in other sections/doc). But feel free to commit:
- Capture other useful metadata such as runtime
- Capture and describe other useful metadata such as runtime
Or
- Capture other useful metadata such as runtime
- Specify other useful metadata such as runtime
<admon type="info">

We call this file-based definition _codification_ (YAML format in our case). It
I think it's fine to have it in the use case, but not here: it's not practical and doesn't carry any useful information.
I hear you but the purpose of this sentence is to be able to get to the next one. I want to say that you can get on GitOps and that's enabled by the codification. Should we just throw the term "codify" somewhere in there without explaining it? Like
Codifying your pipeline with DVC has the added benefit of allowing you to develop pipelines on standard Git workflows (GitOps).
@jorgeorpinel it looks better, but I still don't like it tbh. I think the whole pipelines and defining pipelines section should be focused on the first section of the page (where we describe the process). I feel that formally describing the different types of outs, deps, and stages again doesn't make sense here (if only because it overlaps with a formal definition). We should probably provide some examples (actual pipeline files?), mention VS Code as an editor that supports schema definition, include things like Jupyter notebooks (how to make a pipeline out of one), etc. wdyt @dberenbaum @jorgeorpinel?
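As a rough illustration of what an actual pipeline file could look like (not taken from this PR), here is a minimal `dvc.yaml` sketch with the typical ML stage names. The script names, data paths, and the `train.epochs` parameter are all hypothetical; only the field names (`cmd`, `deps`, `params`, `outs`, `metrics`) come from the `dvc.yaml` format.

```yaml
# dvc.yaml (illustrative sketch; every script, path, and parameter name is an assumption)
stages:
  prepare:
    cmd: python src/prepare.py data/raw.csv # command the stage runs
    deps: # inputs; a change here invalidates the stage
      - src/prepare.py
      - data/raw.csv
    outs: # results produced by the command
      - data/prepared
  train:
    cmd: python src/train.py data/prepared model.pkl
    deps:
      - src/train.py
      - data/prepared # output of `prepare`, so `train` depends on it
    params: # granular dependency on a value in params.yaml
      - train.epochs
    outs:
      - model.pkl
  evaluate:
    cmd: python src/evaluate.py model.pkl
    deps:
      - src/evaluate.py
      - model.pkl
    metrics:
      - metrics.json:
          cache: false # keep the small metrics file in Git rather than the DVC cache
```

Because each stage lists the previous stage's output among its `deps`, the three entries form a dependency graph (`prepare` → `train` → `evaluate`) without the order being declared explicitly.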
Your proposal makes sense to me @shcheklein. I think we can update #2883 based on that, merge #3899 and this, and follow up on that as well as remaining topics for this guide (Reproduction, Operationalizing, Experimentation -- I have drafts of all these docs).
That one I imagined should be a separate page (Experimenting with/ Experimental Pipelines), but we could def. mention
Agree with @jorgeorpinel that the proposals from @shcheklein make sense but we can include them in future PRs. As long as this PR improves upon the current docs and there's nothing wrong/blocking in it, can we merge?
If there are no pending discussions, let's merge this.
* initial plan and some content * added content about stages * title and restyle fixes * added dag section * depend to -> depend on * added dependencies section * Update content/docs/user-guide/pipelines/index.md * Update content/docs/user-guide/pipelines/index.md * Restyled by prettier (#3532) Co-authored-by: Restyled.io <commits@restyled.io> * Update content/docs/user-guide/pipelines/index.md Co-authored-by: Jorge Orpinel <jorgeorpinel@users.noreply.github.com> * added pipelines to sidebar * updated the title * fixed formatting * updating for dvc.yaml first * fed -> used * dvc.yaml-first * editing to tell dvc.yaml first * minor fix * url dependency * dvc lock example * section titles for deps * section titles for outputs * reproduction -> running * adding hyperparameters section * added experiments section * adding url dependencies * added outputs section content * minor * added running pipelines content * moved outputs below running * removed plots section header * guide: Defining Data Pipelines * guide: split up Data Pipelines section * Update content/docs/command-reference/plots/templates.md * guide: Data Pipes -> ML Pipes per #3414 (review) * guide: oops, remove op-pipes file * guide: remoge ML Pipes intro per #3414 (review) * guide: mention both imports in Def ML Pipes rel #3414 (review) * guide: move DAG info from cmd ref per #3414 (review) * guide: move all info and links about DAG to ML Pipes section about Dependency Graphs * guide: point from some Stage links to ML Pipes section on Stages (Defining Pipes) * guide: delete Running ML Pipes (for now) * nav: remove future ML Pipes guides * guide: remove ML Pipes/ Experimental Pipes * roll back unrelated changes... 
* guide: roll back dvc.yaml page changes per #3414 (review) * guide: link ML Pipes/ Defining Stages to dvc.yaml/stages spec * guide: link deps and params tooltip to ML Pipes/ Stages guide sections * guide: links from dvc.yaml doc to ML Pipes/ Stages * guide: more links * guide: oops, remove unused files * remove unrelated change * guide: move stage definition details to ML Pipes from cmd ref * guide: move stage command details into ML Pipes re-link from existing places * ref: roll back unrelated changes * . * ref: few more links to dependency graph in guide * ref: reorg exp init to include simple usage example in Desc per #3414 (review) * concept: reintroduce DAG in more places per #3414 (review) * guide: pipelines are not ML-specific per #3414 (review) * guide: more details for params fields per #3414 (review) * one word * guide: restructure Def Pipes and fix links rel #3414 (review) * guide: rewrite Def Pipes intro per #3414 (review) * guide: move DAG up in Def Pipes per #3414 (review) * guide: inner link in Def Pipes * guide: fix link and typos * start: revert DAG changes per #3414 (review) * guide: use typical ML stage names per #3414 (review) * guide: better flow in Pipes index per #3414 (comment) * glossary: high-level def of Pipes per #3414 (review) * guide: move Stage command to dvc.yaml ref per #3414 (review) * guide: remove abc mention * guide: edits to Defining Pipes * guide: improve Param deps in Def Pipes and remove details from other places * guide: add Outputs to Def Pipes and other edits * guide: update dep, param and out tooltips * guide: separate params in Pipes vs Exps per #3414 (comment) * ref: move Stage commands section of dvc.yaml up per #3414 (review) * guide: update Def Pipes and DAG per #3414 (review) and others * params: more separation of content and create small params file section in dvc.yaml ref * concept: rehash params per #3414 (review) * guide: more holistic pipelining info per #3414 (review) * guide: Pipe edits * params: roll back 
changes for now... * Revert "params: roll back changes for now..." This reverts commit 23cd9c6. * guide: more edits to Pipelines per #3414 (review) and beyond * params: do not define as "simple values" per #3899 (review) and #3899 (comment) * ref: better params index intro per #3899 (review) and #3899 (review) * ref: mention param groups in dvc.yaml per #3899 (review) * params: DVC can pass them via templating/dict unpacking per #3899 (review) Co-authored-by: iterative <iterative@protagoras.local> Co-authored-by: Emre Şahin <github@emresult.com> Co-authored-by: restyled-io[bot] <32688539+restyled-io[bot]@users.noreply.github.com> Co-authored-by: Restyled.io <commits@restyled.io>
* initial plan and some content * added content about stages * title and restyle fixes * added dag section * depend to -> depend on * added dependencies section * Update content/docs/user-guide/pipelines/index.md * Update content/docs/user-guide/pipelines/index.md * Restyled by prettier (#3532) Co-authored-by: Restyled.io <commits@restyled.io> * Update content/docs/user-guide/pipelines/index.md Co-authored-by: Jorge Orpinel <jorgeorpinel@users.noreply.github.com> * added pipelines to sidebar * updated the title * fixed formatting * updating for dvc.yaml first * fed -> used * dvc.yaml-first * editing to tell dvc.yaml first * minor fix * url dependency * dvc lock example * section titles for deps * section titles for outputs * reproduction -> running * adding hyperparameters section * added experiments section * adding url dependencies * added outputs section content * minor * added running pipelines content * moved outputs below running * removed plots section header * guide: Defining Data Pipelines * guide: split up Data Pipelines section * Update content/docs/command-reference/plots/templates.md * guide: Data Pipes -> ML Pipes per #3414 (review) * guide: oops, remove op-pipes file * guide: remoge ML Pipes intro per #3414 (review) * guide: mention both imports in Def ML Pipes rel #3414 (review) * guide: move DAG info from cmd ref per #3414 (review) * guide: move all info and links about DAG to ML Pipes section about Dependency Graphs * guide: point from some Stage links to ML Pipes section on Stages (Defining Pipes) * guide: delete Running ML Pipes (for now) * nav: remove future ML Pipes guides * guide: remove ML Pipes/ Experimental Pipes * roll back unrelated changes... * guide: roll back dvc.yaml page changes per #3414 (review) * guide: link ML Pipes/ Defining Stages to dvc.yaml/stages spec * guide: some changes to start improving the dvc.yaml guide * guide: edit Stages section (dvc.yaml guide) and and move Stage entry spec to right after that. 
* guide: link deps and params tooltip to ML Pipes/ Stages guide sections * guide: links from dvc.yaml doc to ML Pipes/ Stages * guide: update dvc.yaml Templating spec * guide: a couple more admons for dvc.yaml doc * guide: more links * guide: oops, remove unused files * remove unrelated change * ref: update stage add/ run Descs * guide: move stage definition details to ML Pipes from cmd ref * guide: move stage definition details to ML Pipes from cmd ref * guide: move stage command details into ML Pipes re-link from existing places * guide; update Stage entries spec descs. * guide: admons for dvc.yaml page * guide: roll back wrong change * edits to dvc.yaml doc per #3730 (review) * ref: roll back unrelated changes * . * ref: few more links to dependency graph in guide * ref: refactor run, stage add, and repro a little * guide: drop old pipelines guide * ref: revert files one change moved to #4024 * pipelines: revert a bunch of files (for now) per #3789 (comment) * proper admons Co-authored-by: iterative <iterative@protagoras.local> Co-authored-by: Emre Şahin <github@emresult.com> Co-authored-by: restyled-io[bot] <32688539+restyled-io[bot]@users.noreply.github.com> Co-authored-by: Restyled.io <commits@restyled.io>
Related to #2883
`dvc.yaml` intro
`cmd`) details
`run`, `stage add`, and `repro` refs)
In review app: https://dvc-org-iesahin-ug-pipe-qykfpz.herokuapp.com/doc/user-guide/data-pipelines
`main` together).