guide: best-practices section #1748
imhardikj commented Sep 1, 2020 (edited by jorgeorpinel)
- Creates the Best Practices guide (per guide: add "Best Practices" #72).
- Also creates a Tips and Tricks doc (not in nav) for practices that ended up being too small. Should we dissolve that one into notes spread among other related docs?
## Experiments and tracking parameters

You can use DVC for tuning [parameters](doc/command-reference/params), improving
target [metrics](doc/command-reference/metrics) and visualizing the changes with
[plots](doc/command-reference/plots). As a first step, tune parameters in the
default `params.yaml` file and reproduce the pipeline:
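For illustration only, here is a minimal sketch of that first step. The parameter names and values are hypothetical, and the script works in a scratch directory rather than a real project. `dvc repro`, `dvc params diff`, and `dvc metrics diff` are existing DVC commands, but the pipeline they would act on is assumed. The guarded block is skipped unless DVC is installed and a `dvc.yaml` file is present.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch directory; in practice this is your DVC project root.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical starting parameters (names and values are made up).
cat > params.yaml <<'EOF'
train:
  epochs: 10
  lr: 0.01
EOF

# Tune one parameter (portable alternative to `sed -i`).
sed 's/epochs: 10/epochs: 20/' params.yaml > params.yaml.tmp
mv params.yaml.tmp params.yaml

# In a real project with a dvc.yaml pipeline, the next steps would be:
if command -v dvc >/dev/null 2>&1 && [ -f dvc.yaml ]; then
  dvc repro         # re-run stages affected by the changed parameters
  dvc params diff   # show parameter changes relative to the last commit
  dvc metrics diff  # show how the tracked metrics moved
fi
```

In the scratch directory there is no pipeline, so only the edit to `params.yaml` takes place. Run the guarded commands from the root of an existing DVC project to see the effect on metrics.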
Not bad! But per #1705 (review) a previous best practice about how to organize DVC project experiments with Git is needed. The 2 basic options are: as commits in a single branch (plus tags), and as multiple branches (one per experiment).
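To make the two options concrete, here is a minimal sketch with plain Git commands. The scratch repository, file contents, tag names, and branch names are all hypothetical and are not taken from the PR.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch repository standing in for a DVC project.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example User"

echo "lr: 0.01" > params.yaml
git add params.yaml
git commit -q -m "baseline experiment"
main=$(git rev-parse --abbrev-ref HEAD)

# Option 1: experiments as commits in a single branch, marked with tags.
git tag exp-baseline
echo "lr: 0.1" > params.yaml
git commit -q -am "experiment: higher learning rate"
git tag exp-high-lr

# Option 2: one branch per experiment, started from the baseline.
git checkout -q -b exp-low-lr exp-baseline
echo "lr: 0.001" > params.yaml
git commit -q -am "experiment: lower learning rate"
git checkout -q "$main"

# Compare two experiments by name.
git --no-pager diff exp-baseline exp-high-lr -- params.yaml
```

In a DVC project, running `dvc checkout` after `git checkout` brings the tracked data in the workspace in line with the selected revision.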
Hey @imhardikj no need to worry about this one, but please take note of all the improvements I made in my last commit, c30116b. Hopefully lots of that makes sense to you and you can learn from it. BTW, I fixed MANY basic grammar issues, and I really need to emphasize that we should avoid those going forward. Please review them for future reference. Thanks
## Matching source code to data

One of DVC's basic uses is to avoid a disconnection between
[revisions](https://git-scm.com/docs/revisions) of source code and
[versions](/doc/use-cases/versioning-data-and-model-files) of data. DVC replaces
large data files and directories, models, etc. with small
This isn't really a best practice, just an intro to data tracking with DVC. I guess it could stay here... Not sure 🤔
## Managing and sharing large data

Traditional or cloud storage can be used to store the project's data. You can
share the entire 147 GB of your ML project, with all of its data sources,
intermediate data files, and models with others by setting up DVC
[remote storage](doc/command-reference/remote) (optional).

This way you can share models trained in a GPU environment with colleagues who
don't have access to GPUs.
But what's the best practice?
## Tracking experiments with Git

If you are training different models on your data files in the same project,
using Git commits, tags, or branches makes it easy to manage the project.

<!-- TODO: needs much elaboration! -->
Ditto (in the TODO)
Going to close this as stale for now... But should be able to pick it up again in a week or 2.