Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: top level params and metrics #8529

Merged
merged 4 commits into from
Dec 1, 2022
Merged

Conversation

skshetry
Copy link
Member

@skshetry skshetry commented Nov 4, 2022

Opened for discussion. Currently, only top-level metrics/params are supported that are lists of strings.
dvc.yaml files are only discovered when they are at the root of the repo or if they contain at least one stage inside (this was the bug with top-level plots that dvcfiles are not discovered unless they have one stage).

TODOs

  • Collecting dvc.yaml files from other locations
  • Collecting top-level metrics/params/plots while collecting stages.
  • Finalizing schema/format
  • Fighting with path manipulation (it's seriously messed up ☢️).
  • Fighting with error collection handler
  • Templating?

Comment on lines 122 to 127
rel_param_path = os.path.join(relpath, *repo.fs.path.parts(fs_path))
if not repo.fs.isfile(fs_path):
if repo.fs.isfile(rel_param_path):
fs_path = rel_param_path
else:
continue
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't understand how paths are supposed to work. :(

@codecov
Copy link

codecov bot commented Nov 4, 2022

Codecov Report

Base: 94.13% // Head: 93.86% // Decreases project coverage by -0.27% ⚠️

Coverage data is based on head (8715d73) compared to base (aa2e830).
Patch coverage: 98.41% of modified lines in pull request are covered.

❗ Current head 8715d73 differs from pull request most recent head 3683530. Consider uploading reports for the commit 3683530 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8529      +/-   ##
==========================================
- Coverage   94.13%   93.86%   -0.28%     
==========================================
  Files         435      435              
  Lines       33335    33459     +124     
  Branches     4690     4711      +21     
==========================================
+ Hits        31379    31405      +26     
- Misses       1528     1609      +81     
- Partials      428      445      +17     
Impacted Files Coverage Δ
dvc/schema.py 100.00% <ø> (ø)
dvc/repo/stage.py 91.44% <96.29%> (-1.11%) ⬇️
dvc/repo/index.py 95.60% <100.00%> (+0.46%) ⬆️
dvc/repo/metrics/show.py 95.34% <100.00%> (+0.47%) ⬆️
dvc/repo/params/show.py 91.91% <100.00%> (+0.71%) ⬆️
dvc/stage/loader.py 99.24% <100.00%> (+0.03%) ⬆️
tests/func/metrics/test_diff.py 100.00% <100.00%> (ø)
tests/func/params/test_diff.py 100.00% <100.00%> (ø)
tests/unit/command/test_status.py 100.00% <100.00%> (ø)
tests/func/test_unprotect.py 78.57% <0.00%> (-21.43%) ⬇️
... and 19 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@dberenbaum
Copy link
Collaborator

Looks good so far!

  • Templating?

Let's leave it out for the initial implementation.

Is the implementation connected to top-level plots? It would be good to have similar behavior between all the top-level keys.

@skshetry
Copy link
Member Author

skshetry commented Nov 5, 2022

Is the implementation connected to top-level plots? It would be good to have similar behavior between all the top-level keys.

I duplicated the logic that’s in the plots. But we need to unify them which I’ll work next (which is the biggest part of this task).

  • Collecting dvc.yaml files from other locations
  • Collecting top-level metrics/params/plots while collecting stages

This should unify them, and also fix the plots bug.

@dberenbaum
Copy link
Collaborator

@skshetry What's the status here?

@skshetry
Copy link
Member Author

Still working on the above two things, should have something by end of next week.

Copy link
Contributor

@daavoo daavoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dberenbaum @skshetry

For the "Python Only" experiments, which I believe is currently the main driver for this, IMO the most critical TODOs are:

  • bug with top-level plots that dvcfiles are not discovered unless they have one stage.
  • Collecting dvc.yaml files from other locations

However, both can be worked around on the DVCLive side, so it would be great to get a minimal version of this P.R. merged as soon as possible to unblock.

II think it's ok if there are many pending follow-ups, as long as they don't break already existing user workflows.
We can not document this until follow-ups are done and only use it "hidden" in the DVCLive context in the meantime.

@dberenbaum
Copy link
Collaborator

Agreed @daavoo. Anything we can get merged would be helpful.

@shcheklein
Copy link
Member

Collecting top-level metrics/params/plots while collecting stages

qq - does it imply moving the logic into the Index?

@skshetry
Copy link
Member Author

Collecting top-level metrics/params/plots while collecting stages

qq - does it imply moving the logic into the Index?

yes, they will be moved to the Index, at least that's the plan.

@skshetry skshetry changed the title wip: top level params and metrics feat: top level params and metrics Dec 1, 2022
@skshetry
Copy link
Member Author

skshetry commented Dec 1, 2022

This should be ready for review. Although this works, this is a temporary solution. Overall I am not happy at all with the changes here.

I'll work on internal refactoring on index/loaders/dvcfile-api next.

@@ -502,3 +502,104 @@ def is_out_or_ignored(root, directory):

def collect_repo(self, onerror: Callable[[str, Exception], None] = None):
return list(self._collect_repo(onerror))

def _load_file(self, path):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicated from load_file above that also provides params/metrics/plots. Will refactor in successive PRs. This method serves _collect_all_from_repo below.

stages.params_data,
)

def _collect_all_from_repo(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bad naming I know, also duplicates logic from _collect_repo(). Again, will be refactored in successive PRs.

Comment on lines +27 to +29
self.metrics_data = self.data.get("metrics", [])
self.params_data = self.data.get("params", [])
self.plots_data = self.data.get("plots", {})
Copy link
Member Author

@skshetry skshetry Dec 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole PR is standing on these 3 lines. It feels weird to use StageLoader to get other sections, but this was a quick-and-easy solution.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to rearchitect here.

Comment on lines +83 to +99
def _top_metrics(self):
self._collect()
return self._metrics

@cached_property
def _top_plots(self):
self._collect()
return self._plots

@cached_property
def _top_params(self):
self._collect()
return self._params

def _collect(self):
if "stages" in self.__dict__:
return self.stages
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These all might be temporary, we probably should unify top-level metrics and stage-level metrics. Hence, keeping them as private.

metrics,
plots,
params,
) = self.stage_collector._collect_all_from_repo(onerror=onerror)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Collection needs to be brought inside Index from Repo.stage. This seemed to require more change than I thought, as we have lots of dependencies on stage.collect(), which probably also needs to be changed/refactored.

@dberenbaum
Copy link
Collaborator

Thank you @skshetry!

Let's be clear that this is a prototype right now. We wanted to get something merged so we can start using it internally, but it's understood that it still needs a lot of polishing.

Copy link
Contributor

@daavoo daavoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some QA for the DVCLive use case and works as expected both on DVC UI and VSCode extension, including the handling of errors on collection.

Didn't look into the code deeply as per the comments of expected reactors soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants