live integration: get rid of dvc from live #5466
035ed8a
to
c374c79
Compare
tests/func/experiments/conftest.py
Outdated
@@ -67,7 +67,8 @@ def exp_stage(tmp_dir, scm, dvc):

 @pytest.fixture
-def checkpoint_stage(tmp_dir, scm, dvc):
+def checkpoint_stage(tmp_dir, scm, dvc, mocker):
+    mocker.patch("dvc.stage.run.MonitorConfig.AWAIT", 0.01)
This cuts roughly 40% off the run time of test_checkpoints.py.
b5d73ba
to
49a9b64
Compare
dvc/stage/run.py
Outdated
def _checkpoint_run(stage, callback_func, done, proc, killed):
    """Run callback_func whenever checkpoint signal file is present."""
    signal_path = os.path.join(stage.repo.tmp_dir, CHECKPOINT_SIGNAL_FILE)
@dataclass
Looks like we will need a scheduler for the Runner soon. :)
I was just saying that all the logic in stage/run does not belong here.
We should think of decoupling the logic here and in repro into a Runner and a TaskScheduler, respectively. The Runner should support hooks (on_start/on_end/on_failure/every(1s), etc.) and actually run the scripts. The TaskScheduler would schedule stages to be run from the given DAG (which would help us parallelize in the future).
We'd likely come to it later though, just sharing my thoughts. But this also looks good for now.
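To make the proposed split concrete, here is a minimal sketch, not code from this PR. It assumes a hypothetical Runner that fires on_start/on_end/on_failure hooks around a subprocess, and a hypothetical TaskScheduler that derives a run order from a DAG using the standard-library graphlib (Python 3.9+). All class and method names are assumptions, and the every(1s) hook is left out for brevity.

```python
import subprocess
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

Hook = Callable[[str], None]


class Runner:
    """Hypothetical runner: executes one command and fires hooks around it."""

    def __init__(self) -> None:
        self._hooks: Dict[str, List[Hook]] = {
            "on_start": [],
            "on_end": [],
            "on_failure": [],
        }

    def add_hook(self, event: str, func: Hook) -> None:
        # Raises KeyError for unknown event names.
        self._hooks[event].append(func)

    def _fire(self, event: str, name: str) -> None:
        for func in self._hooks[event]:
            func(name)

    def run(self, name: str, cmd: List[str]) -> int:
        self._fire("on_start", name)
        returncode = subprocess.run(cmd, check=False).returncode
        # A non-zero exit code counts as a failure.
        self._fire("on_failure" if returncode else "on_end", name)
        return returncode


class TaskScheduler:
    """Hypothetical scheduler: orders stages so dependencies run first."""

    def __init__(self, dag: Dict[str, Set[str]]) -> None:
        # Maps each stage name to the set of stages it depends on.
        self._dag = dag

    def order(self) -> List[str]:
        return list(TopologicalSorter(self._dag).static_order())
```

With this shape, the checkpoint and live monitors could be registered as hooks instead of living in stage/run, and a parallel scheduler could later replace static_order() with TopologicalSorter's get_ready()/done() loop.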
@skshetry That sounds like a great idea!
tests/func/test_live.py
Outdated
 from copy import deepcopy
 from textwrap import dedent

 import pytest
 from funcy import first

 from dvc import stage as stage_module
-from dvc.exceptions import MetricsError
+from dvc.exceptions import MetricDoesNotExistError, MetricsError

 LIVE_SCRITP = dedent(
Can the spelling be fixed here and where it's used below as long as other changes are being made?
-LIVE_SCRITP = dedent(
+LIVE_SCRIPT = dedent(
I want to make sure I understand what's happening here. You are not only making dvclive independent from dvc, but also keeping dvclive-like html output functionality implemented independently within dvc? Or do I have that completely wrong @pared ?
@dberenbaum You got it right. |
Thanks, @pared! So how does dvclive generate the html summary if no |
@dberenbaum Without |
Got it. Thanks, @pared! It seems like |
@dberenbaum |
dvc/stage/run.py
Outdated
@dataclass
class MonitorConfig:
    name: str
    stage: "Stage"  # type: ignore[name-defined]  # noqa: F821
This doesn't need to be ignored, you just need:
from typing import TYPE_CHECKING
if TYPE_CHECKING:
    from dvc.stage import Stage
...
stage: "Stage"
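Put together, the suggestion could look like the following self-contained sketch. It assumes only the two MonitorConfig fields visible in the diff above. Because the import sits under TYPE_CHECKING, it is seen by type checkers and linters but never executed at runtime, so the string annotation needs no ignore comments and cannot cause a circular import.

```python
from dataclasses import dataclass
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by static type checkers, never at runtime.
    from dvc.stage import Stage


@dataclass
class MonitorConfig:
    name: str
    # The quoted annotation is not resolved at runtime, so "Stage"
    # does not have to be importable when this module is loaded.
    stage: "Stage"
```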
dvc/stage/run.py
Outdated
-    _kill(proc)
-    killed.set()
+    _kill(config.proc)
+    config.killed.set()
 finally:
     logger.debug("Remove checkpoint signal file")
This message should probably be generalized now
Also feels like the top level _... methods here should go into the base Monitor as class methods now (w/the checkpoint callback one in CheckpointMonitor)
dvc/stage/run.py
Outdated
        super().__exit__(exc_type, exc_val, exc_tb)


def _monitor_loop(config: MonitorConfig):
One general design question: does it make sense for us to have separate threads watching for each of the checkpoint signal files and the dvclive signal files? Or should we have one monitor thread that checks for multiple signals (and runs their callbacks) sequentially, like:
while True:
    for signal_path, task in signals_to_watch:
        if os.path.exists(signal_path):
            try:
                task()
            finally:
                remove(signal_path)
I guess the real question is: would running both callbacks at the same time in parallel threads potentially cause any issues?
When the user creates the checkpoint signal, their code is supposed to block so that we know the user won't be touching the DVC workspace until we signal them to continue (by removing the signal file). This way we know that we can do whatever is needed in the workspace to create the checkpoint commits safely, and I'm assuming that dvclive works the same way?
In the event that a user creates both a checkpoint and dvclive signal file at the same time, I'm not sure that it is safe for dvclive to be doing anything while the workspace may be in the state where we are moving HEAD around and creating git commits during checkpoint creation
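The user-side half of the handshake described above can be sketched as follows. This is only an illustration under stated assumptions: the function name, polling interval, and timeout are hypothetical and not taken from dvc or dvclive. The point is that the caller creates the signal file and then stays out of the workspace until the monitor deletes it.

```python
import os
import time


def signal_and_wait(
    signal_path: str, poll_interval: float = 0.01, timeout: float = 10.0
) -> None:
    """Create the signal file, then block until the monitor removes it."""
    # Creating the file tells the monitor that the workspace is ready.
    with open(signal_path, "w"):
        pass

    deadline = time.monotonic() + timeout
    # While the file exists, the monitor may still be working in the
    # workspace (e.g. creating commits), so the caller must not proceed.
    while os.path.exists(signal_path):
        if time.monotonic() > deadline:
            raise TimeoutError(f"signal file was not removed: {signal_path}")
        time.sleep(poll_interval)
```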
@pmrowla Your proposal makes more sense; I'll introduce the necessary changes.
Redesigned so that we have a single monitor with multiple tasks provided as needed, depending on the stage content. Now Monitor and its tasks are starting to look more and more like what @skshetry proposed.
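For readers following along, a single-monitor design of this kind might look roughly like the sketch below. It is not the code merged in this PR: the function name, the stop event, and the interval are assumptions. One thread polls every signal file in turn, so the checkpoint and live callbacks can never run concurrently.

```python
import os
import threading
from typing import Callable, Iterable, Tuple


def monitor_loop(
    signals_to_watch: Iterable[Tuple[str, Callable[[], None]]],
    done: threading.Event,
    interval: float = 0.01,
) -> None:
    """Poll each signal file and run its task, one at a time."""
    signals = list(signals_to_watch)
    while not done.is_set():
        for signal_path, task in signals:
            if os.path.exists(signal_path):
                try:
                    task()
                finally:
                    # Removing the file unblocks the user's script,
                    # even if the task raised.
                    os.remove(signal_path)
        # Sleep between polls, but wake up early once `done` is set.
        done.wait(interval)
```

Because the tasks run sequentially inside one loop, a live summary update cannot start while a checkpoint commit is still moving HEAD around.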
ae1a368
to
e7bf95e
Compare
LGTM
Sorry, changed the status on this, but I don't have any more comments.
Thanks for review! After discussing with @efiop we decided that I will give it a try and try to prepare |
@pytest.mark.parametrize("summary", (True, False))
def test_export_config_tmp(tmp_dir, dvc, mocker, summary, report):
Remove the test_export_config_tmp test: now that live is online, test_export_config tests the same things.
Extracted monitor logic out of |
def create_summary(out):
    from dvc.utils.html import write

    assert out.live and out.live["html"]

    metrics, plots = out.repo.live.show(str(out.path_info))

    html_path = out.path_info.with_suffix(".html")
    write(html_path, plots, metrics)
    logger.info(f"\nfile://{os.path.abspath(html_path)}")
It doesn't look like there's much to this HTML summary (either here or in https://github.com/iterative/dvc/blob/master/dvc/utils/html.py). Would it make sense to build out summary capability in dvclive? That would allow for complete dvclive functionality without dvc.
Also, could the summary output format be abstracted in dvclive so that non-HTML outputs could be built? For example, a matplotlib output that auto-updates, or even a very basic cli output. Different output types could enable users to see realtime updates for model performance (similar to a progress bar) without opening a separate page that needs to be manually refreshed.
@dberenbaum
This question has a few levels.
"Would it make sense to build out summary capability in dvclive? That would allow for complete dvclive functionality without dvc"
I guess it would make sense, but it would mean copying not only the html module and this particular function, but also quite a lot of code from metrics and plots (live.show is actually calling them), so all the functionality related to vega-js, the templates for example, would probably need to be copied too. Also, having it in dvc allows us to use the dvc live command to visualize the live outputs. It's not too flashy as of today, but AFAIK it is supposed to be iterated upon in the future.
"non-HTML outputs could be built?"
Probably, but that leaves us with a lot of questions related to the plots themselves: how would it work on different OSes? What about a pipeline running on a server that we connect to via SSH? Also, what do we do with the plots command? Do we want to keep the HTML functionality? The plots were designed this way so that we would not have to support our own plotting framework and could rely on something that probably all users have: a browser. I think visualizing live using different means will be hard both to maintain and to explain to users.
Including these features in dvclive doesn't need to impact dvc at all. No need to block this PR or the release.
For users to be able to get value out of dvclive without dvc, could we build a basic summary capability in dvclive? I was wondering whether we could adapt what's currently in https://github.com/iterative/dvc/blob/master/dvc/utils/html.py as a starting point, but it doesn't need to be if there's a simpler way. Looking at that file, it looked like the data structures to feed to HTML.write() are reasonably simple, and the data is already being collected by dvclive.
Thoughts? I can open a new issue to discuss if it's worthwhile.
What is the intended user experience for including dvclive in a dvc stage? |
@dberenbaum I guess the main point is that if you use |
Is this being documented? I had completely missed this until now. I see it's in the help for
for _ in epochs:
    train()
    metrics = evaluate()
    with open("metrics.json", "w") as f:
        json.dump(metrics, f)
    for k, v in metrics.items():
        dvclive.log(k, v)
    dvclive.next_step()
    make_checkpoint()
EDIT: Looks like this already works like I'm requesting. Awesome! Sorry, just playing around with this now. In that case, we just need to document. |
I have followed the Contributing to DVC checklist.
If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
Thank you for the contribution - we'll try to review it as soon as possible.
Important
Tests are failing because dvclive with the corresponding change needs to be released. If the change is accepted, please do not merge: I need to synchronize the dvclive release with the dvc release to make the time window of incompatible versions as small as possible.