-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
plots: fix plots nested configuration #10320
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #10320 +/- ##
==========================================
+ Coverage 90.58% 90.59% +0.01%
==========================================
Files 499 500 +1
Lines 38547 38589 +42
Branches 5567 5572 +5
==========================================
+ Hits 34916 34958 +42
Misses 2986 2986
Partials 645 645 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a test
tests/func/plots/test_collect.py
Outdated
from dvc.repo.plots import Plots | ||
|
||
|
||
def test_plots_definitions_works_with_nested_plots(tmp_dir, scm, dvc): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is good.
[C] A few things from this test for me:
- breadcrumbs would be helpful, i.e. naming the variables to reflect what we are trying to test here. My understanding is that we are verifying that a plot definition from a sub-directory is not being overridden by its parent.
- This would be a good test to have in
tests/integration/plots/test_plots.py
and ensure that nothing untoward happens to the data later in the system. For example, doesstep
get correctly assigned tox
when nox
is present? - If I remember correctly the
plot_id
can be a string like "acc vs loss". I don't think that information breaks the solution but I would like to see that tested as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Q] Does this work for y
as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review Matt :)
I tried to take your comments into account, let me know if we understood each others, or if I get you wrong.
Yes it works for y
as well, I added a test condition to show you that ;)
dvc/repo/plots/__init__.py
Outdated
# sort by key length to process the most general paths first | ||
# then override them with more specific ones. | ||
for plot_id, plot_props in sorted( | ||
definitions.items(), key=lambda item: len(item[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[C] Whilst this is an elegant solution the chances of it coming back to bite us is (IMO) high because we are mutating data inside this dense loop. In my experience explicit is better than implicit and explicitly removing child paths from parent definitions would be a better way to go. I will leave this up to you.
# sort by key length to process the most general paths first # then override them with more specific ones.
I would change this to sort by key length to process the parent paths first and override definitions with more specific (child) definitions later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll let you compare the two implementations. While I agree in general explicit >> implicit, in that specific case I needed to introduce a nasty little logic. Let me know what is the more clear between the two options (implicit + comment vs explicit) ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of things to consider but overall very good. Thanks.
dvc/repo/plots/__init__.py
Outdated
@@ -519,7 +524,7 @@ def unpack_if_dir(fs, path, props: dict[str, str], onerror: Optional[Callable] = | |||
|
|||
if "data" in unpacked: | |||
for subpath in unpacked["data"]: | |||
result["data"].update({subpath: props}) | |||
result["data"].update({subpath: {} | props}) # avoid shared reference |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use .copy()
to make your intentions clearer for doing a shallow copy. No need of a comment.
result["data"].update({subpath: {} | props}) # avoid shared reference | |
result["data"].update({subpath: props.copy()}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is shallow copy needed at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion :)
Yeah we need to copy this props
reference. Otherwise I ended up with cases like this:
architecture:
metrics/
subdir/
wonderful_plot.tsv
super_plot.tsv
dvc.yaml
plots:
metrics:
x: step
metrics/subdir:
x: timestamp
The first call to this function set all the plot configurations for the metrics
folder (wonderful and super plots). Which is what we want. The second call, you expect that it only changes the configs for the content of metrics/subdir
(in that case super_plot.tsv
. It does so 🎉 , but as he mutate the same dictionary reference (props
), the wonderful_plot
configuration ends up changes to x: timestamp
too. 😿
When I was working on system metrics on DVClive I did some tests:
I had some .tsv files organized this way:
I also wanted the plots for
epochs.tsv
andloss.tsv
to have step number as the x-axis andcpu.tsv
andram.tsv
to have timestamp as the x-axis.So I did this in the dvc.yaml:
It felt like a natural extension of the documentation, but I realized this didn't work.
Two things were going on:
1 - a mutable reference to a dict was messing with the configuration of the plot (fix in this PR)
2 - we collect plots in the order mentioned in dvc.yaml. In that case, the x-axis for the folder
dvclive/plots/metrics
overrides the values set by a subdirectory. To avoid that, I now sort the paths from the shortest to the longest.It works wonderfully on my demo project, but I need to add some unit tests.
❗ I have followed the Contributing to DVC checklist.
📖 If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
Thank you for the contribution - we'll try to review it as soon as possible. 🙏