-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create DVC
experiment on live.end
.
#366
Conversation
58ca881
to
802c503
Compare
7714f8b
to
965de7d
Compare
Codecov ReportBase: 96.63% // Head: 95.23% // Decreases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## main #366 +/- ##
==========================================
- Coverage 96.63% 95.23% -1.41%
==========================================
Files 36 37 +1
Lines 1811 1950 +139
Branches 160 151 -9
==========================================
+ Hits 1750 1857 +107
- Misses 35 66 +31
- Partials 26 27 +1
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
DVC
experiment on live.end
.
self._inside_dvc_exp = True | ||
else: | ||
# `Python Only` execution | ||
# TODO: How to handle `dvc repro` execution? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dberenbaum what do you think we should do if DVCLive is used inside a dvc pipeline that has been executed with dvc repro
?
I think we should skip the experiment creation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, let's skip it. We may still want to call make_dvcyaml
, but we can skip that also for now if it simplifies things.
assert load_yaml(live.dvc_file) == { | ||
"metrics": ["metrics.json"], | ||
"params": ["params.yaml"], | ||
"plots": ["plots"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow up: configure sklearn plots
|
||
|
||
def make_dvcyaml(live): | ||
if not os.path.exists(live.dvc_file): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will dvc.yaml
get deleted each time a new live instance is created? Maybe we need some option to configure whether to overwrite it. In some cases, I may log new stuff and want to overwrite, but in others I might have added some custom config that I want to keep.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will dvc.yaml get deleted each time a new live instance is created?
It should not be deleted and it should preserve the manual modifications from users.
In some cases, I may log new stuff and want to overwrite, but in others I might have added some custom config that I want to keep.
What would be overwritten? The current version is extremely simple as it doesn't configure sklearn plots (added a follow-up #371)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Not a blocker.
What would be overwritten?
As you say, it will include sklearn plots config, which seems like the most complicated scenario, so let's discuss there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should probably just write to the root of the repo, not to its own file by default. It is the dvc way, and this is against the principle of having one file to rule them all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 But the dvclive way is to create a self-contained folder 😄 . IMHO dvclive should not write to the user's root dvc.yaml
file, because it it makes it hard to share only the dvclive output, and it mixes machine-generated config with user-generated config. Also, if there is some conflict between the existing config and what dvclive generates, I would rather have dvclive run successfully and be able to later debug the conflicts between the files.
self._exp_name = random_exp_name( | ||
self._dvc_repo, self._baseline_rev | ||
) | ||
make_dvcyaml(self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not as sure that we should skip make_dvcyaml
, but we can follow up on this separate from this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another follow up: pass an experiment name to use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And another: consider whether to include the dvclive user script/notebook in the tracked files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And another: consider whether to include the dvclive user script/notebook in the tracked files.
Do you mean to include it in the include_tracked
list passed to experiments save or other thing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, whether to include it in include_tracked
. I'm not sure it's needed, so it was more of a discussion point than a request. There's something clean about the default being to only save the live dir.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great so far! Very cool to see this come to life.
Demo from @daavoo: exp-onboarding.mp4 |
Enable with `save_dvc_exp=True`. Defaults to `False`. Refactor `__init__` method. Split into private `_init_{component}` methods. Add `_` prefix to private properties. Use env vars from iterative/dvc#8630 to skip Studio `start` and `done` events. Use same env vars to skip creating DVC exp.
@@ -17,11 +17,16 @@ def __init__( | |||
dir: Optional[str] = None, # noqa pylint: disable=redefined-builtin | |||
resume: bool = False, | |||
report: Optional[str] = "auto", | |||
save_dvc_exp: bool = False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dberenbaum named the option like this. wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if it makes sense for the dvc.yaml
saving part and whether those should be coupled together, but I think it's fine for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I though your concerns applied to both the git ref and the dvc.yaml
creation
Closes #310
Closes #311
Requires iterative/dvc#8599 and iterative/dvc#8529