live integration: get rid of dvc from live #5466

pared · 2021-02-15T17:00:56Z

❗ I have followed the Contributing to DVC checklist.
📖 If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

Important

Tests are failing because the dvclive release containing the corresponding change has not been published yet. If this change is accepted, please do not merge it: I need to synchronize the dvclive and dvc releases to keep the window of incompatible versions as small as possible.

pared · 2021-02-19T11:29:20Z

tests/func/experiments/conftest.py

@@ -67,7 +67,8 @@ def exp_stage(tmp_dir, scm, dvc):


 @pytest.fixture
-def checkpoint_stage(tmp_dir, scm, dvc):
+def checkpoint_stage(tmp_dir, scm, dvc, mocker):
+    mocker.patch("dvc.stage.run.MonitorConfig.AWAIT", 0.01)


This cuts the run time of test_checkpoints.py by roughly 40%.
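
For illustration, here is a minimal sketch of why patching the polling interval shortens the tests. The `MonitorConfig` class below is a hypothetical stand-in (its 0.2 s default and the `wait_for` helper are assumptions, not code from dvc), and `unittest.mock.patch.object` plays the role of the `mocker.patch` call in the fixture.

```python
import time
from unittest import mock


class MonitorConfig:
    """Hypothetical stand-in for the config class patched in the fixture."""

    AWAIT = 0.2  # assumed default: seconds to sleep between polls


def wait_for(predicate, config=MonitorConfig):
    """Poll until predicate() is true, sleeping AWAIT seconds between checks."""
    polls = 0
    while not predicate():
        time.sleep(config.AWAIT)
        polls += 1
    return polls


def timed_wait():
    """Time a wait that needs two sleeps before the predicate turns true."""
    answers = iter([False, False, True])
    started = time.monotonic()
    wait_for(lambda: next(answers))
    return time.monotonic() - started


# Inside the patch, every sleep lasts 0.01 s instead of 0.2 s.
with mock.patch.object(MonitorConfig, "AWAIT", 0.01):
    patched = timed_wait()

# The original value is restored when the context manager exits.
restored = MonitorConfig.AWAIT
```

Because the attribute is looked up on each iteration, the patch takes effect without touching the polling code itself.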

skshetry · 2021-02-19T13:08:14Z

dvc/stage/run.py

-def _checkpoint_run(stage, callback_func, done, proc, killed):
-    """Run callback_func whenever checkpoint signal file is present."""
-    signal_path = os.path.join(stage.repo.tmp_dir, CHECKPOINT_SIGNAL_FILE)
+@dataclass


Looks like we will need a scheduler for the Runner soon. :)

I was just saying that all the logic in the stage/run does not belong here.

We should think about decoupling the logic here and in repro into a Runner and a TaskScheduler, respectively. The Runner should support hooks (on_start/on_end/on_failure/every(1s), etc.) and actually run the scripts.
The TaskScheduler would schedule stages to be run from the given DAG (which would help us parallelize in the future).

We'd likely come to it later though, just sharing my thoughts 🙂 . But this also looks good for now.
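
A rough sketch of how that split could look, under stated assumptions: `Runner`, `TaskScheduler`, the hook registry and the `dag`/`cmds` mappings are all hypothetical names invented for this illustration and do not exist in DVC. The periodic `every(1s)` hook is left out for brevity, and `graphlib` requires Python 3.9+.

```python
import subprocess
import sys
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Callable, Dict, List, Set


class Runner:
    """Hypothetical runner: executes one command and fires lifecycle hooks."""

    def __init__(self) -> None:
        self.hooks: Dict[str, List[Callable[[str], None]]] = {
            "on_start": [],
            "on_end": [],
            "on_failure": [],
        }

    def add_hook(self, event: str, func: Callable[[str], None]) -> None:
        self.hooks[event].append(func)

    def _fire(self, event: str, name: str) -> None:
        for func in self.hooks[event]:
            func(name)

    def run(self, name: str, cmd: List[str]) -> int:
        self._fire("on_start", name)
        returncode = subprocess.run(cmd).returncode
        self._fire("on_end" if returncode == 0 else "on_failure", name)
        return returncode


class TaskScheduler:
    """Hypothetical scheduler: runs stages in dependency order."""

    def __init__(self, dag: Dict[str, Set[str]], cmds: Dict[str, List[str]]):
        self.dag = dag  # stage name -> names of the stages it depends on
        self.cmds = cmds  # stage name -> command to execute

    def run_all(self, runner: Runner) -> List[str]:
        planned = list(TopologicalSorter(self.dag).static_order())
        for name in planned:
            if runner.run(name, self.cmds[name]) != 0:
                break  # stop at the first failing stage
        return planned


events: List[str] = []
runner = Runner()
runner.add_hook("on_start", lambda name: events.append(f"start:{name}"))
runner.add_hook("on_end", lambda name: events.append(f"end:{name}"))

noop = [sys.executable, "-c", "pass"]  # a command that always succeeds
scheduler = TaskScheduler(
    dag={"train": {"prepare"}, "prepare": set()},
    cmds={"prepare": noop, "train": noop},
)
order = scheduler.run_all(runner)
```

Keeping the hooks on the Runner and the ordering on the scheduler means a parallel scheduler could later reuse the same Runner unchanged.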

@skshetry That sounds like a great idea!

dberenbaum · 2021-02-19T22:26:54Z

tests/func/test_live.py

 from copy import deepcopy
 from textwrap import dedent

 import pytest
 from funcy import first

 from dvc import stage as stage_module
-from dvc.exceptions import MetricsError
+from dvc.exceptions import MetricDoesNotExistError, MetricsError

 LIVE_SCRITP = dedent(


Can the spelling be fixed here and where it's used below as long as other changes are being made?

Suggested change

-LIVE_SCRITP = dedent(
+LIVE_SCRIPT = dedent(

dberenbaum

I want to make sure I understand what's happening here. You are not only making dvclive independent from dvc, but also keeping dvclive-like html output functionality implemented independently within dvc? Or do I have that completely wrong @pared ?

pared · 2021-02-22T09:57:44Z

@dberenbaum You got it right.
We want to make live as lightweight as possible in its basic installation, so instead of adding dvc as a dependency, this change introduces checkpoint-like behaviour.
If live finds a .dvc directory, it creates a kind of semaphore file under the proper path, and dvc reacts to it: when the file is present, dvc prepares the summary and deletes the file, then waits for the next one and repeats.
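
The handshake described above can be sketched roughly as follows. This is an illustration, not the actual implementation: the file name, `request_summary` and `watch` are hypothetical, and the logger blocking until the file disappears is an assumption borrowed from how checkpoints are described in this thread.

```python
import os
import tempfile
import threading
import time

SIGNAL_NAME = "LIVE_SIGNAL"  # hypothetical file name, not the one dvc uses


def request_summary(signal_path: str, poll: float = 0.01) -> None:
    """Logger side: create the signal file, then block until it is removed."""
    with open(signal_path, "w"):
        pass
    while os.path.exists(signal_path):
        time.sleep(poll)


def watch(signal_path, make_summary, done, poll=0.01):
    """Monitor side: on each signal, build the summary and delete the file."""
    while not done.is_set():
        if os.path.exists(signal_path):
            try:
                make_summary()
            finally:
                os.remove(signal_path)
        time.sleep(poll)


summaries = []
done = threading.Event()
with tempfile.TemporaryDirectory() as tmp_dir:
    signal_path = os.path.join(tmp_dir, SIGNAL_NAME)
    monitor = threading.Thread(
        target=watch,
        args=(signal_path, lambda: summaries.append(len(summaries)), done),
    )
    monitor.start()
    for _ in range(3):  # three training steps, each asking for a summary
        request_summary(signal_path)
    done.set()
    monitor.join()
```

Neither side imports the other; the file on disk is the only shared contract, which is what lets the logger ship without dvc as a dependency.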

dberenbaum · 2021-02-22T15:40:17Z

Thanks, @pared! So how does dvclive generate the html summary if no .dvc dir is found?

pared · 2021-02-22T16:21:10Z

@dberenbaum Without dvc, dvclive is unable to produce a summary. It can work only as a logger in that case.

dberenbaum · 2021-02-22T19:40:19Z

Got it. Thanks, @pared! It seems like live without dvc is not a particularly likely regular scenario, since the functionality will be limited to logging metrics to a file, which users can do pretty easily themselves. However, maybe it's a better code design to keep them separate?

pared · 2021-02-22T22:42:33Z

@dberenbaum
So the reasoning behind the removal was to give potential users a logger that comes with no strings attached, yet still allows close integration with dvc. While using it without dvc definitely does not sound like a popular scenario, that is the main reason for keeping the dependencies as few as possible.

pmrowla · 2021-02-23T04:56:49Z

dvc/stage/run.py

+@dataclass
+class MonitorConfig:
+    name: str
+    stage: "Stage"  # type: ignore[name-defined] # noqa: F821


This doesn't need to be ignored, you just need

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from dvc.stage import Stage

...
    stage: "Stage"

pmrowla · 2021-02-23T05:00:59Z

dvc/stage/run.py

-                _kill(proc)
-                killed.set()
+                _kill(config.proc)
+                config.killed.set()
            finally:
                logger.debug("Remove checkpoint signal file")


This message should probably be generalized now

Also feels like the top level _... methods here should go into the base Monitor as class methods now (w/the checkpoint callback one in CheckpointMonitor)

pmrowla · 2021-02-23T05:12:03Z

dvc/stage/run.py

+        super().__exit__(exc_type, exc_val, exc_tb)
+
+
+def _monitor_loop(config: MonitorConfig):


One general design question, does it make sense for us to have separate threads watching for each of the checkpoint signal files and the dvclive signal files? Or should we have one monitor thread that checks for multiple signals (and runs their callbacks) sequentially like

while True:
    for signal_path, task in signals_to_watch:
        if os.path.exists(signal_path):
            try:
                task()
            finally:
                remove(signal_path)

I guess the real question is would running both callbacks at the same time in parallel threads potentially cause any issues?

When the user creates the checkpoint signal, their code is supposed to block so that we know the user won't be touching the DVC workspace until we signal them to continue (by removing the signal file). This way we know that we can do whatever is needed in the workspace to create the checkpoint commits safely, and I'm assuming that dvclive works the same way?

In the event that a user creates both a checkpoint and dvclive signal file at the same time, I'm not sure that it is safe for dvclive to be doing anything while the workspace may be in the state where we are moving HEAD around and creating git commits during checkpoint creation

@pmrowla Your proposal makes more sense, I'll introduce the necessary changes.

Redesigned so that we have a single monitor with multiple tasks provided as needed, depending on the stage content. Now Monitor and Tasks are starting to look more and more similar to what @skshetry proposes.
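
A sketch of what such a single monitor could look like, written as a context manager in the spirit of the "convert monitors to class context managers" commit. `Task` and `Monitor` here are hypothetical names for illustration and are not the classes added in this PR.

```python
import os
import tempfile
import threading
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    """Hypothetical pairing of a signal file with the callback it triggers."""

    signal_path: str
    callback: Callable[[], None]


class Monitor:
    """One background thread that serves every task sequentially."""

    def __init__(self, tasks: List[Task], interval: float = 0.01) -> None:
        self.tasks = tasks
        self.interval = interval
        self.done = threading.Event()
        self.thread = threading.Thread(target=self._loop)

    def _loop(self) -> None:
        # Event.wait doubles as the sleep between polls.
        while not self.done.wait(self.interval):
            self._poll()
        self._poll()  # final pass so that late signals are not lost

    def _poll(self) -> None:
        for task in self.tasks:
            if os.path.exists(task.signal_path):
                try:
                    task.callback()
                finally:
                    os.remove(task.signal_path)

    def __enter__(self) -> "Monitor":
        self.thread.start()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb) -> None:
        self.done.set()
        self.thread.join()


calls: List[str] = []
with tempfile.TemporaryDirectory() as tmp_dir:
    tasks = [
        Task(os.path.join(tmp_dir, "checkpoint"), lambda: calls.append("checkpoint")),
        Task(os.path.join(tmp_dir, "live"), lambda: calls.append("live")),
    ]
    with Monitor(tasks):
        for task in tasks:  # raise both signals at nearly the same time
            open(task.signal_path, "w").close()
    leftover = os.listdir(tmp_dir)
```

Since all callbacks run on one thread, the live callback can never overlap with checkpoint creation, which addresses the concern about the workspace changing underneath it.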

pmrowla

LGTM

dberenbaum

Sorry, changed the status on this, but I don't have any more comments.

pared · 2021-02-24T20:46:47Z

Thanks for the review! After discussing with @efiop, we decided that I will try to prepare the TaskScheduler and Runner in this change.

skshetry · 2021-02-26T14:51:38Z

@pared, I'll share some libraries that I was looking at for inspiration, they might be useful: invoke, doit and prefect (mostly for experiment executors, generalizing it for repro).

pared · 2021-02-26T15:26:58Z

@skshetry thanks, though it was a misunderstanding between me and @efiop: in this change we will just try to move the Monitor logic out of cmd_run. So, defining the Scheduler and Runner will not be in the scope of this change.

pared · 2021-03-01T14:25:59Z

tests/func/test_live.py

 @pytest.mark.parametrize("summary", (True, False))
-def test_export_config_tmp(tmp_dir, dvc, mocker, summary, report):


Remove the test_export_config_tmp test: now that live is online, test_export_config tests the same things.

pared · 2021-03-01T16:14:49Z

Extracted the monitor logic out of stage/run.
Looking at the problems that I will face releasing this small change, and looking at this change, I came to the conclusion that it would be better to have those tests inside dvclive. That leads to the need to reuse fixtures between projects. A similar problem in dvcx was solved by copying the fixtures. It seems that the official way of handling this kind of issue is to create a pytest plugin. I think we should prepare the plugin. It could already be used in dvcx, dvclive and potentially in dvc-bench, once we find the capacity to get back to making the benchmarks pytest-like.

dberenbaum · 2021-03-01T16:33:23Z

dvc/repo/live.py

+def create_summary(out):
+    from dvc.utils.html import write
+
+    assert out.live and out.live["html"]
+
+    metrics, plots = out.repo.live.show(str(out.path_info))
+
+    html_path = out.path_info.with_suffix(".html")
+    write(html_path, plots, metrics)
+    logger.info(f"\nfile://{os.path.abspath(html_path)}")


It doesn't look like there's much to this HTML summary (either here or in https://github.com/iterative/dvc/blob/master/dvc/utils/html.py). Would it make sense to build out summary capability in dvclive? That would allow for complete dvclive functionality without dvc.

Also, could the summary output format be abstracted in dvclive so that non-HTML outputs could be built? For example, a matplotlib output that auto-updates, or even a very basic cli output. Different output types could enable users to see realtime updates for model performance (similar to a progress bar) without opening a separate page that needs to be manually refreshed.
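
To make the idea concrete, one possible shape for such an abstraction is sketched below. Everything here is hypothetical: dvclive defines no `SummaryRenderer`, and the `History` structure is only an assumed stand-in for the metrics it collects.

```python
from abc import ABC, abstractmethod
from typing import Dict, List

History = Dict[str, List[float]]  # assumed shape: metric name -> value per step


class SummaryRenderer(ABC):
    """Hypothetical interface; dvclive does not define this class."""

    @abstractmethod
    def render(self, history: History) -> str:
        ...


class HtmlRenderer(SummaryRenderer):
    """Renders the latest value of each metric as a small HTML table."""

    def render(self, history: History) -> str:
        rows = "".join(
            f"<tr><td>{name}</td><td>{values[-1]}</td></tr>"
            for name, values in history.items()
        )
        return f"<table>{rows}</table>"


class CliRenderer(SummaryRenderer):
    """Renders the latest value of each metric as one text line per metric."""

    def render(self, history: History) -> str:
        return "\n".join(
            f"{name}: {values[-1]} (step {len(values) - 1})"
            for name, values in history.items()
        )


history: History = {"loss": [0.9, 0.5], "acc": [0.6, 0.8]}
cli_text = CliRenderer().render(history)
html_text = HtmlRenderer().render(history)
```

A matplotlib or auto-refreshing output would then be one more subclass, without changes to the logging code.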

@dberenbaum
This question has a few levels.

Would it make sense to build out summary capability in dvclive? That would allow for complete dvclive functionality without dvc

I guess it would make sense, but it would mean copying not only the html module and this particular function, but also quite a lot of code from metrics and plots (live.show actually calls them), so all the functionality related to vega-js, the templates for example, would probably need to be copied too. Also, having it in dvc allows us to use the dvc live command to visualize the live outputs. It's not too flashy as of today but, AFAIK, it is supposed to be iterated upon in the future.

non-HTML outputs could be built ?

Probably, but that leaves us with a lot of questions related to the plots themselves: how would it work on different OSes? What about a pipeline running on a server that we connect to via SSH? Also, what do we do with the plots command? Do we want to keep the HTML functionality? The plots were made this way so that we would not have to support our own plotting framework and could rely on something that probably all users share: a browser. I think visualizing live by different means will be hard both to maintain and to explain to users.

Including these features in dvclive doesn't need to impact dvc at all. No need to block this PR or the release.

For users to be able to get value out of dvclive without dvc, could we build a basic summary capability in dvclive? I was wondering whether we could adapt what's currently in https://github.com/iterative/dvc/blob/master/dvc/utils/html.py as a starting point, but it doesn't need to be if there's a simpler way. Looking at that file, it looked like the data structures to feed to HTML.write() are reasonably simple, and the data is already being collected by dvclive.

Thoughts? I can open a new issue to discuss if it's worthwhile.

dberenbaum · 2021-03-01T17:25:15Z

What is the intended user experience for including dvclive in a dvc stage?

pared · 2021-03-01T21:18:45Z

@dberenbaum I guess the main point is that if you use dvclive in your code, you don't need to specify the live logs directory inside the code. So, if you run dvc run ... --live logs, the information that logs is supposed to be the live logs dir will be passed to the code. And if the code itself does not call the init method, the provided directory will be used.
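
The precedence described above could be sketched like this. The comment does not say how the directory reaches the code, so passing it through an environment variable is an assumption, and both `EXAMPLE_LIVE_DIR` and `resolve_live_dir` are hypothetical names.

```python
import os
from typing import Mapping, Optional

ENV_LIVE_DIR = "EXAMPLE_LIVE_DIR"  # hypothetical variable name


def resolve_live_dir(
    explicit: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """An explicit init() argument wins; otherwise use what the runner passed."""
    if explicit is not None:
        return explicit
    if ENV_LIVE_DIR in env:
        return env[ENV_LIVE_DIR]
    raise RuntimeError("no live logs directory configured")


# The runner provided "logs" and the code did not call init explicitly.
from_runner = resolve_live_dir(env={ENV_LIVE_DIR: "logs"})

# The code called init with its own directory, which takes precedence.
from_code = resolve_live_dir("custom", env={ENV_LIVE_DIR: "logs"})
```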

dberenbaum · 2021-03-01T21:50:35Z

if you use dvclive in your code, you don't need to specify live logs directory inside the code...

Is this being documented? I had completely missed this until now 😄 . I see it's in the help for dvc stage add and dvc run, but I don't see it in the command reference docs or anywhere else. CC @jorgeorpinel

~~As another feature request for the future, would it be possible to simplify how to write code when wanting to use both checkpoints and dvclive? Right now, I'm writing code like:~~

for _ in epochs:
    train()
    metrics = evaluate()
    with open("metrics.json", "w") as f:
        json.dump(metrics, f)
    for k, v in metrics.items():
        dvclive.log(k, v)
    dvclive.next_step()
    make_checkpoint()

~~This seems redundant. If I use a dvclive callback, I'm not even sure how I would write out the metrics or define the checkpoints for dvc since my code wouldn't iterate through each epoch.~~

EDIT: Looks like this already works like I'm requesting. Awesome! Sorry, just playing around with this now. In that case, we just need to document.

pared mentioned this pull request Feb 16, 2021

dvc integration: get rid of dvc iterative/dvclive#44

Merged

pared force-pushed the live_no_dvc branch from 1c36de1 to 1a205bb Compare February 17, 2021 00:31

This was referenced Feb 17, 2021

2.0 docs iterative/dvc.org#2026

Closed

docs: fix README and create documentation entry for dvclive iterative/dvclive#45

Closed

pared force-pushed the live_no_dvc branch 4 times, most recently from 035ed8a to c374c79 Compare February 19, 2021 11:22

pared commented Feb 19, 2021

pared force-pushed the live_no_dvc branch 2 times, most recently from b5d73ba to 49a9b64 Compare February 19, 2021 11:44

skshetry reviewed Feb 19, 2021

pared requested review from pmrowla and a team February 19, 2021 13:53

pared changed the title ~~[WIP] live integration: get rid of dvc from live~~ live integration: get rid of dvc from live Feb 19, 2021

dberenbaum reviewed Feb 19, 2021

pared force-pushed the live_no_dvc branch from 49a9b64 to 436de2e Compare February 22, 2021 16:27

pmrowla reviewed Feb 23, 2021

pared force-pushed the live_no_dvc branch 2 times, most recently from ae1a368 to e7bf95e Compare February 24, 2021 00:14

pared requested a review from pmrowla February 24, 2021 00:20

pmrowla approved these changes Feb 24, 2021

pared force-pushed the live_no_dvc branch from e7bf95e to 96d8fce Compare February 24, 2021 10:47

dberenbaum marked this pull request as ready for review February 24, 2021 19:52

dberenbaum approved these changes Feb 24, 2021

pared added 4 commits March 1, 2021 12:37

live integration: get rid of dvc from live

9fa1c1f

live: remove from api, add test for html generation during run

f31abf8

stage: run: convert monitors to class context managers

cd7e2de

run: make monitors run in single thread

a53fffe

pared changed the title ~~live integration: get rid of dvc from live~~ [WIP] live integration: get rid of dvc from live Mar 1, 2021

pared force-pushed the live_no_dvc branch from 96d8fce to c7f058e Compare March 1, 2021 12:18

run: move monitor logic out of run

5d7a401

pared force-pushed the live_no_dvc branch from c7f058e to 5d7a401 Compare March 1, 2021 13:41

pared commented Mar 1, 2021

pared changed the title ~~[WIP] live integration: get rid of dvc from live~~ live integration: get rid of dvc from live Mar 1, 2021

pared requested a review from efiop March 1, 2021 16:15

dberenbaum reviewed Mar 1, 2021

efiop requested a review from pmrowla March 2, 2021 14:13

pmrowla approved these changes Mar 2, 2021

pmrowla merged commit 4a8cb80 into iterative:master Mar 2, 2021
