How to compare plots with different definitions or specs #3676
Comments
We always give the workspace the top priority and then order revisions based on their timestamp.
@mattseddon Sounds good, thanks. Any thoughts about what priority to give running experiments in the queue/tmp dir?
this is only if workspace is selected, right?
I would give them a lower priority than the workspace (if selected) but higher than all completed experiments.
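To make the proposed ordering concrete, here is a minimal sketch, not the actual VS Code or DVC implementation. The `Revision` type, the three priority constants, and the "newest first" tie-break within each class are assumptions made for illustration; the thread only says the workspace comes first, running experiments next, and completed revisions are ordered by timestamp.

```python
from dataclasses import dataclass

# Hypothetical priority classes (lower value sorts first).
WORKSPACE, RUNNING, COMPLETED = 0, 1, 2


@dataclass(frozen=True)
class Revision:
    name: str
    kind: int         # one of WORKSPACE, RUNNING, COMPLETED
    timestamp: float  # e.g. seconds since the epoch


def order_revisions(selected):
    """Workspace first, then running experiments, then completed revisions.

    Within each class, newer revisions come first (an assumption).
    """
    return sorted(selected, key=lambda r: (r.kind, -r.timestamp))


revs = [
    Revision("exp-a", COMPLETED, 100.0),
    Revision("exp-running", RUNNING, 50.0),
    Revision("main", COMPLETED, 200.0),
    Revision("workspace", WORKSPACE, 0.0),
]
ordered_names = [r.name for r in order_revisions(revs)]
print(ordered_names)
```

If the workspace is not selected, it is simply absent from the input and the remaining revisions keep the same relative order.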
Yes
@mattseddon Does this make sense?
Can we expand on this please, how that would be different @dberenbaum from what you suggest?
It means that plots won't be stable, right? Depending on whether the workspace is selected, for example, we'll see a different result for all other commits? Not saying it's wrong, trying to wrap my mind around it once again.
How does DVC do it? Specifically, how does it merge specs (properties) if they differ? What does it mean to merge plot definitions?
By most recent do you mean by the timestamp of an experiment/commit? Makes sense to me though. This is what I would expect. A superset of all the available plots.
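One way to read "a superset of all the available plots" is sketched below. This is an illustration under stated assumptions, not how DVC or VS Code merges definitions: the function name `merge_plot_definitions` is hypothetical, plot IDs are simplified, and the specs are trimmed versions of the ones in the example repo. Revisions are passed in priority order, and for each plot ID the spec of the highest-priority revision that defines it wins.

```python
def merge_plot_definitions(ordered_revisions):
    """Merge plot definitions across revisions.

    ordered_revisions: list of (rev_name, {plot_id: spec}) pairs,
    highest priority first. Returns {plot_id: (spec, source_rev)},
    covering every plot ID seen in any revision.
    """
    merged = {}
    for rev_name, plots in ordered_revisions:
        for plot_id, spec in plots.items():
            # Keep the first (highest-priority) spec seen for this ID.
            merged.setdefault(plot_id, (spec, rev_name))
    return merged


# Illustrative data only; IDs and specs are simplified.
selected = [
    ("workspace", {
        "Accuracy": {"x": "step", "y_label": "Accuracy"},
        "Loss": {"x": "step", "y_label": "Loss"},
    }),
    ("41fe94d", {
        "Accuracy": {"x": "step", "y_label": "Accuracy"},
        "Loss": {"x": "step", "y_label": "Accuracy"},
        "plots/metrics": {"x": "step"},
    }),
]
merged = merge_plot_definitions(selected)
for plot_id, (spec, source) in sorted(merged.items()):
    print(plot_id, spec, "from", source)
```

With this rule, deselecting the workspace changes which spec is used for `Loss`, which is the stability concern raised above.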
Studio will create an additional plot for every different plot spec. For example, if I change the
Correct, that's the downside vs the Studio approach. I can't think of any way to satisfy all the "requirements" I listed above in one approach unfortunately.
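For contrast, the Studio behavior described above (an additional plot for every different plot spec) could be sketched as grouping revisions by plot ID and spec together. This is only a rough model under assumptions: `group_by_spec` is a hypothetical helper, and the specs are trimmed versions of the `Accuracy` and `Loss` definitions from the two most recent commits in the example.

```python
import json
from collections import defaultdict


def group_by_spec(revisions):
    """Group revisions so that each distinct (plot_id, spec) pair gets its own plot.

    revisions: list of (rev_name, {plot_id: spec}) pairs.
    Returns {(plot_id, canonical_spec_json): [rev_name, ...]}.
    """
    groups = defaultdict(list)
    for rev_name, plots in revisions:
        for plot_id, spec in plots.items():
            # Serialize with sorted keys so equal specs compare equal.
            key = (plot_id, json.dumps(spec, sort_keys=True))
            groups[key].append(rev_name)
    return dict(groups)


revisions = [
    ("2f52e2b", {
        "Accuracy": {"x": "step", "y_label": "Accuracy"},
        "Loss": {"x": "step", "y_label": "Loss"},
    }),
    ("41fe94d", {
        "Accuracy": {"x": "step", "y_label": "Accuracy"},
        "Loss": {"x": "step", "y_label": "Accuracy"},
    }),
]
groups = group_by_spec(revisions)
for (plot_id, spec_json), revs in groups.items():
    print(plot_id, spec_json, revs)
```

Here `Accuracy` stays a single plot shared by both commits, while `Loss` splits into two plots because its `y_label` changed. The plots stay stable regardless of selection, at the cost of no longer comparing those revisions on one chart.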
I think it's easiest to refer to iterative/dvc#9025 (comment).
👍 I think so. Still trying to work through it myself, but this seems reasonable to me.
So, the difference between what you suggest and DVC would be that we show in DVC some older plots with some older specs?
Can we please outline the logic using this basic example, just to be sure that we are on the same page?
okay, how about VS Code now and DVC now and Studio now?
Here's an example based on https://github.com/iterative/lstm_seq2seq/tree/plots. The changes over the last 3 commits:
full log
$ git log -p
commit 2f52e2b57de94e8c6f302f6b1ad4b2863076aa92 (HEAD -> plots)
Author: dberenbaum <dave@iterative.ai>
Date: Thu Apr 20 13:47:16 2023 -0400
clean up plots
diff --git a/dvc.yaml b/dvc.yaml
index dfee14b..52c4100 100644
--- a/dvc.yaml
+++ b/dvc.yaml
@@ -36,4 +36,4 @@ plots:
y:
results/plots/metrics/train/epoch/loss.tsv: loss
results/plots/metrics/val/loss.tsv: loss
- y_label: Accuracy
+ y_label: Loss
diff --git a/results/dvc.yaml b/results/dvc.yaml
index 8f5a7e9..4930788 100644
--- a/results/dvc.yaml
+++ b/results/dvc.yaml
@@ -1,5 +1,2 @@
metrics:
- metrics.json
-plots:
-- plots/metrics:
- x: step
commit 41fe94db95799844746b55d4af983b662e678c94
Author: dberenbaum <dave@iterative.ai>
Date: Thu Apr 20 13:44:22 2023 -0400
add top-level plots
diff --git a/dvc.yaml b/dvc.yaml
index bfd4e38..dfee14b 100644
--- a/dvc.yaml
+++ b/dvc.yaml
@@ -24,3 +24,16 @@ stages:
- results/plots:
cache: false
persist: true
+plots:
+- Accuracy:
+ x: step
+ y:
+ results/plots/metrics/train/epoch/acc.tsv: acc
+ results/plots/metrics/val/acc.tsv: acc
+ y_label: Accuracy
+- Loss:
+ x: step
+ y:
+ results/plots/metrics/train/epoch/loss.tsv: loss
+ results/plots/metrics/val/loss.tsv: loss
+ y_label: Accuracy
commit 67ba6c070c75e81b5e8ca0e35070e1016ff00109 (origin/main, origin/HEAD, main)
Author: Olivaw[bot] <olivaw@iterative.ai>
Date: Thu Apr 13 14:22:16 2023 +0000
... Plots in VS Code: Notes:
Plots in DVC CLI: In DVC CLI, I can manipulate the order to get different results. For example, by reversing the order to Notes:
Plots in Studio: Notes:
I think the VS Code logic is reasonable and we should mostly try to keep it as is. If we agree, I see a few action points:
@dberenbaum since you have it in your head, could you please translate it into that W, commit A, p1, p2, p3 model please? It would be way easier for me personally to wrap my mind around this then. Or do you feel it's not possible, we would miss some details?
I think it's not totally possible. There's no workspace in Studio, for example, and I think it's more clear to show a working example than for me to take your hypothetical and explain how I think it would look in each product. Happy to do anything else to summarize the example better so you don't have to spend a lot of time figuring out the context.
@dberenbaum I agree on the direction for DVC, VS Code - and it feels we don't need that much, it's already done the way it should be?
I'm not sure I understand this. Is there a possibility that it's just hidden because the data is exactly the same? I would deprioritize touching Studio for now tbh. Let's create a ticket and get there when we have a bit more time. Feels like it can be involved?
Yes, I think we only truly need the first action point above.
No, I have verified that it's not the case. It's a separate issue already tracked with DVC, so if VS Code continues to rely on the DVC logic, we can track it in #7913. I think I would need it as a user since my plots will evolve over time, but we can wait until we hear from users about it.
Yup, will do. Edit: Let's not even open an issue yet. There's enough noise in Studio.
Perhaps we can tackle both of these things on the DVC side at the same time? I'd still be happy to get involved.
We can discuss it during sprint meetings tomorrow and I'll let you know.
Getting the datapoints for this should be easy, but not sure yet how to get the spec for live experiments. Need to also consider how to collect live image data.
I think this part expands the scope too much, so let's cut it and strictly focus on incorporating live results from experiments running in the queue.
Opened iterative/dvc#9369, so I think we can close this one.
Related: iterative/dvc#9025
We need to decide how to compare plots across the workspace, queue temp dirs, branches, and other revisions when the plots have different IDs/definitions or properties/specs. Let's decide what works for VS Code, and then we can decide whether this makes sense in DVC CLI.
Users may select and deselect experiments in the UI, and the plots should remain stable (like the colors):
At the same time, the plots should also be flexible:
Thoughts on this approach?
Questions:
For reference, how it works today in each product:
dvc plots diff rev1 rev2 ...
.dvc plots diff ...
How do we decide the order @mattseddon? VS Code also overrides properties like color to keep them stable.