-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: top level params and metrics #8529
Conversation
dvc/repo/params/show.py
Outdated
rel_param_path = os.path.join(relpath, *repo.fs.path.parts(fs_path)) | ||
if not repo.fs.isfile(fs_path): | ||
if repo.fs.isfile(rel_param_path): | ||
fs_path = rel_param_path | ||
else: | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't understand how paths are supposed to work. :(
Codecov ReportBase: 94.13% // Head: 93.86% // Decreases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## main #8529 +/- ##
==========================================
- Coverage 94.13% 93.86% -0.28%
==========================================
Files 435 435
Lines 33335 33459 +124
Branches 4690 4711 +21
==========================================
+ Hits 31379 31405 +26
- Misses 1528 1609 +81
- Partials 428 445 +17
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
Looks good so far!
Let's leave it out for the initial implementation. Is the implementation connected to top-level plots? It would be good to have similar behavior between all the top-level keys. |
I duplicated the logic that’s in the plots. But we need to unify them which I’ll work next (which is the biggest part of this task).
This should unify them, and also fix the plots bug. |
@skshetry What's the status here? |
Still working on the above two things, should have something by end of next week. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the "Python Only" experiments, which I believe is currently the main driver for this, IMO the most critical TODOs are:
bug with top-level plots that dvcfiles are not discovered unless they have one stage
.- Collecting dvc.yaml files from other locations
However, both can be worked around on the DVCLive side, so it would be great to get a minimal version of this P.R. merged as soon as possible to unblock.
II think it's ok if there are many pending follow-ups, as long as they don't break already existing user workflows.
We can not document this until follow-ups are done and only use it "hidden" in the DVCLive context in the meantime.
Agreed @daavoo. Anything we can get merged would be helpful. |
qq - does it imply moving the logic into the Index? |
yes, they will be moved to the Index, at least that's the plan. |
This should be ready for review. Although this works, this is a temporary solution. Overall I am not happy at all with the changes here. I'll work on internal refactoring on index/loaders/dvcfile-api next. |
@@ -502,3 +502,104 @@ def is_out_or_ignored(root, directory): | |||
|
|||
def collect_repo(self, onerror: Callable[[str, Exception], None] = None): | |||
return list(self._collect_repo(onerror)) | |||
|
|||
def _load_file(self, path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicated from load_file
above that also provides params/metrics/plots. Will refactor in successive PRs. This method serves _collect_all_from_repo
below.
stages.params_data, | ||
) | ||
|
||
def _collect_all_from_repo( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bad naming I know, also duplicates logic from _collect_repo()
. Again, will be refactored in successive PRs.
self.metrics_data = self.data.get("metrics", []) | ||
self.params_data = self.data.get("params", []) | ||
self.plots_data = self.data.get("plots", {}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The whole PR is standing on these 3 lines. It feels weird to use StageLoader
to get other sections, but this was a quick-and-easy solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to rearchitect here.
def _top_metrics(self): | ||
self._collect() | ||
return self._metrics | ||
|
||
@cached_property | ||
def _top_plots(self): | ||
self._collect() | ||
return self._plots | ||
|
||
@cached_property | ||
def _top_params(self): | ||
self._collect() | ||
return self._params | ||
|
||
def _collect(self): | ||
if "stages" in self.__dict__: | ||
return self.stages |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These all might be temporary, we probably should unify top-level metrics and stage-level metrics. Hence, keeping them as private.
metrics, | ||
plots, | ||
params, | ||
) = self.stage_collector._collect_all_from_repo(onerror=onerror) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Collection needs to be brought inside Index
from Repo.stage
. This seemed to require more change than I thought, as we have lots of dependencies on stage.collect()
, which probably also needs to be changed/refactored.
Thank you @skshetry! Let's be clear that this is a prototype right now. We wanted to get something merged so we can start using it internally, but it's understood that it still needs a lot of polishing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did some QA for the DVCLive use case and works as expected both on DVC UI and VSCode extension, including the handling of errors on collection.
Didn't look into the code deeply as per the comments of expected reactors soon.
Opened for discussion. Currently, only top-level metrics/params are supported that are lists of strings.
dvc.yaml
files are only discovered when they are at the root of the repo or if they contain at least one stage inside (this was the bug with top-level plots that dvcfiles are not discovered unless they have one stage).TODOs
dvc.yaml
files from other locations