plots diff: don't make one plot break all the others #9025
Comments
That doesn't sound right |
I think it's intentional -- in the happy path where the underlying data structure doesn't change, it's common to want to propagate a new axis label or template to all prior revisions. |
We merge all of the plot props, so it's not just limited to workspace dvc.yaml file it seems. Line 19 in 37761a2
|
So it merges all revisions into a single set of properties, preferring the more recent revisions when they conflict, right? |
It does not merge in terms of recency, but in the order of the revisions passed, so the revision on the left side is preferred (except that the arguments are usually passed in order from latest to oldest, so in that case it will prefer recent revisions). |
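To make the ordering described above concrete, here is a minimal sketch of a left-preferring merge. It is not DVC's actual code: `merge_props`, the property names, and the revision labels are all hypothetical and only illustrate "first revision passed wins on conflict".

```python
from typing import Any


def merge_props(
    revs: list[str], props_by_rev: dict[str, dict[str, Any]]
) -> dict[str, Any]:
    """Merge plot properties, preferring revisions that appear earlier in `revs`.

    Hypothetical helper for illustration; not part of DVC.
    """
    merged: dict[str, Any] = {}
    for rev in revs:
        for key, value in props_by_rev.get(rev, {}).items():
            # setdefault keeps the value from the leftmost revision that set it.
            merged.setdefault(key, value)
    return merged


# Assumed example data: two revisions that disagree on "x".
props = {
    "workspace": {"x": "step", "template": "linear"},
    "HEAD~1": {"x": "epoch", "y": "loss"},
}
merged = merge_props(["workspace", "HEAD~1"], props)
```

With this ordering, `merged["x"]` comes from `workspace`, while `y` is still picked up from `HEAD~1` because the workspace does not define it. Reversing the revision list flips which `x` wins, which is why the result depends on argument order rather than on recency.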
@pmrowla mentioned that it was discussed whether we should be merging all the plots properties. I looked through some of the old issues and PRs and can't find a good rationale for why we do this. It does seem like it would be much simpler to use a single set of properties (either the leftmost revision or workspace in cases where it's implied). Seems like that would resolve some related issues like #9188 (since it would not try to render the old custom template) and #7193. |
I would say, let's agree on expected behavior and implement it. To clarify, when I said:
I meant that it didn't sound like what I thought the code was doing (because of #9025 (comment)), not that I was strongly against the idea of having a single source of truth for properties. |
This comment is also related to #8786. The extension caches revision data (images + datapoints) which means that we should only call
It seems like revision data is actually mutable depending on the combination of revisions passed to
So after a fair bit of deliberation triggered by the ongoing work with errors I think that the extension should only attempt to display templates/images that are available in the workspace's Even doing this we will still get edge cases where the workspace's If that means every time we call Ideally, we would get Please LMK what you think. |
I would say that even when |
@pmrowla Makes sense and is simpler, although I have a couple hesitations:
I'm still not sure those outweigh the simplicity of always using the workspace but wanted to raise them for discussion. |
Thanks for the link to https://github.com/iterative/studio/issues/2574. Took me a while but I think I got up to speed. In a way, Studio is lucky in that it doesn't have to deal with the workspace which makes the situation less dynamic. If we are going to continue to attempt to cache revisions in the extension then we need some kind of spine to hold everything together. As the extension is meant for the local/dev experience I would expect that to be the workspace as opposed to HEAD. One alternative that I see (vs using workspace + caching) is to decide on the implementation of mutable revisions for DVC and have the extension call Note: depending on the implementation for iterative/vscode-dvc#1966 we could end up having to try and compare plots between branches in the extension as well.
We could encourage |
Let's move forward with always using the workspace config as @pmrowla suggested. @mattseddon WDYT about trying to always call |
what would be the reason for this, folks? I would still expect a substantial difference for this, no? |
@skshetry Do you have capacity to take on always using the workspace config this sprint?
My reason would be to avoid premature optimization. I have nothing against prioritizing it if it's needed, but I figured we could wait to see the performance first. If it's just as easy to use the existing caching we have, that's great, but I was under the impression from #9025 (comment) that it was broken and would require work to incorporate the new changes. |
Discussed at length in the VS Code planning call this morning. The problem boils down to this: Changes made to a template can lead to changes in the pre-processed datapoints that we receive from plots diff. This makes the datapoints mutable as opposed to immutable. We have the same problem with errors being mutable. We can start by always calling plots diff with the workspace revision and invalidating the cache whenever any |
A few questions:
If we collect config from the workspace, that means the config and hence the plots data might change, no? Unless we calculate hash based on that workspace config. |
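One way to picture the "hash based on that workspace config" idea is a cache keyed on both the revision and a digest of the config used to render it. This is only a sketch under assumptions: `PlotCache` and `config_digest` are hypothetical names, and neither DVC nor the extension is known to work this way.

```python
import hashlib
import json
from typing import Any, Optional


def config_digest(config: dict[str, Any]) -> str:
    """Stable digest of a plot config (assumes the config is JSON-serializable)."""
    payload = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class PlotCache:
    """Hypothetical cache keyed on (revision, config digest)."""

    def __init__(self) -> None:
        self._data: dict[tuple[str, str], Any] = {}

    def put(self, rev: str, config: dict[str, Any], datapoints: Any) -> None:
        self._data[(rev, config_digest(config))] = datapoints

    def get(self, rev: str, config: dict[str, Any]) -> Optional[Any]:
        # A changed config produces a different key, so stale entries are never hit.
        return self._data.get((rev, config_digest(config)))


cache = PlotCache()
cache.put("main", {"template": "linear", "x": "step"}, [1, 2, 3])
hit = cache.get("main", {"x": "step", "template": "linear"})
miss = cache.get("main", {"template": "smooth", "x": "step"})
```

Here `hit` returns the stored datapoints regardless of key order in the config, and `miss` is `None` because the template changed. The trade-off is that old entries are never evicted in this sketch; a real cache would need some cleanup policy.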
Not sure I follow. AFAIK the only config for images is the path.
I thought that using the workspace config would be a small change that would improve usability and reduce errors, but I agree that it's not a blocker/critical on the DVC CLI side. @mattseddon How much would it help you to know that the config is always defined by the workspace?
Yep, I think that's why @mattseddon proposed that he will invalidate the VS Code cache when |
In an attempt to put a bit more structure to this and clarify the thread for myself a bit and make sure that we are all on the same page :) We have three components:
When we talk about applying something from workspace I think we need to clarify this in regards to all three components to have a full picture from the product perspective. I think @skshetry was specifically asking about definitions (in case of images we don't have a spec yet). And from the product perspective the question is - do we collect definitions across all commits/revisions/whatever or not? Even if the answer for specs is to take the left-most, or the workspace, the question is valid for definitions I think. A user scenario - I have a branch where I added a new directory with images. I added a branch into the table of experiments. I ask the extension or Studio to plot it. And I don't see it if we literally apply definitions and specs from the workspace. It might be confusing. |
I had another discussion about this today with @shcheklein. We decided that we will try dropping the cache data altogether from the extension to see whether or not the performance is acceptable. If performance is not acceptable then we can always try caching revision combinations. My plan is to always call I think this means we can shelve the idea of always applying the workspace config. Some questions that I have on the DVC implementation:
|
In iterative/vscode-dvc#3222, one mis-configured plot for one revision breaks the entire dvc plots diff command. In #5984, we made it so that an invalid dvc.yaml doesn't break other revisions in dvc plots diff. However, even if the dvc.yaml is valid, a single revision or plot within one revision can break dvc plots diff if there are other errors encountered while rendering one of the plots.
To do:
Related: #7787
Edit: The problem occurs because the workspace dvc.yaml config is applied to all other revisions, and if the data structure has changed, it may be impossible to render an old revision with the current dvc.yaml config.
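The behavior asked for in the title can be sketched as per-plot error isolation: render each (revision, plot) pair independently and collect failures instead of letting the first exception abort everything. This is an illustration only, not how dvc plots diff is implemented; `render_all`, the `render` callback, and the sample data are hypothetical.

```python
from typing import Any, Callable

PlotKey = tuple[str, str]  # (revision, plot id); hypothetical key shape


def render_all(
    plots: dict[PlotKey, Any], render: Callable[[Any], Any]
) -> tuple[dict[PlotKey, Any], dict[PlotKey, str]]:
    """Render every plot, recording errors per plot instead of raising."""
    rendered: dict[PlotKey, Any] = {}
    errors: dict[PlotKey, str] = {}
    for key, data in plots.items():
        try:
            rendered[key] = render(data)
        except Exception as exc:  # isolate the failure to this one plot
            errors[key] = f"{type(exc).__name__}: {exc}"
    return rendered, errors


# Assumed scenario: the old revision lacks the "loss" field the current config expects.
plots = {
    ("workspace", "loss.json"): [{"step": 0, "loss": 0.9}, {"step": 1, "loss": 0.5}],
    ("HEAD~5", "loss.json"): [{"step": 0, "error": 0.9}],
}
rendered, errors = render_all(plots, lambda rows: [row["loss"] for row in rows])
```

In this example the workspace plot still renders, and the failure for `HEAD~5` is reported alongside it rather than taking the whole result down.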