Allow Multi File Select on Plot Wizard #4748
@@ -110,5 +151,5 @@ export const pickPlotConfiguration = async (): Promise<
    return
  }

- return { ...templateAndFields, dataFile: file }
+ return { ...templateAndFields }
Avoid too many `return` statements within this function.
This function looks like it's doing many things too. It might be good to break it down into subfunctions like `validateExtensions`, `validateFilesData`... It would also clear this error.
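A minimal sketch of what that split could look like. The `LoadedFile` and `Validator` types and the `validateFiles` runner are hypothetical and not taken from the extension; only the names `validateExtensions` and `validateFilesData` come from the comment above. Each validator returns an error message or `undefined`, so the caller needs a single early return instead of one per check.

```typescript
// Hypothetical shapes for illustration; not the extension's real types.
type LoadedFile = { file: string; data: unknown }
type Validator = (files: LoadedFile[]) => string | undefined

// Returns an error message if the files do not all share one extension.
const validateExtensions: Validator = files => {
  const exts = new Set(files.map(({ file }) => file.split('.').pop()))
  return exts.size > 1 ? 'Files must be of the same type.' : undefined
}

// Returns an error message naming the first file that failed to load.
const validateFilesData: Validator = files => {
  const failed = files.find(({ data }) => data === undefined)
  return failed ? `Failed to parse ${failed.file}.` : undefined
}

// Runs the validators in order and reports the first failure.
const validateFiles = (
  files: LoadedFile[],
  validators: Validator[] = [validateExtensions, validateFilesData]
): string | undefined => {
  for (const validate of validators) {
    const error = validate(files)
    if (error) {
      return error
    }
  }
  return undefined
}
```

With this shape, the picker would call `validateFiles` once and show a toast only if it returns a message.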
Yep, we could also branch the logic depending on the number of files selected.
Looks great @julieg18! Some thoughts on what else we could do:
We could, though we might want to keep in mind that each of these options would involve adding an extra quick pick. If we have too many quick picks, it could lead to plot creation being tedious.
Yes, a single x and single y field. I wasn't 100% sure if every plot type would work with multiple entries so I decided to keep it as single for now. We could allow multiple entries in a follow-up :)
extension/src/fileSystem/index.ts
Outdated
@@ -214,21 +214,38 @@ const loadYamlAsDoc = (
  }
}

const getPlotYamlObj = (cwd: string, plot: PlotConfigData) => {
  const { x, y, template } = plot
  const usesSingleFile = x.file === y.file
We check for multiple files being used and create the plot object accordingly:
plots:
  # two files
  - scatter_plot:
      template: scatter
      x:
        props.json: acc
      y:
        values.json: prob
  # single file
  - probs.json:
      x: actual
      y: prob
[Q] Do we need this split? Can we make the creation of `plots` entries a bit more opinionated from the wizard and reduce complexity? Otherwise, when we come to add in custom titles we will have to complicate the differentiation further.
From our demo `dvc.yaml`:
- Loss:
    x: step
    y:
      training/plots/metrics/train/loss.tsv: loss
      training/plots/metrics/test/loss.tsv: loss
    y_label: loss
- Confusion matrix:
    template: confusion
    x: actual
    y:
      training/plots/sklearn/confusion_matrix.json: predicted
- hist.csv:
    x: preds
    y: digit
    template: bar_horizontal
    title: Histogram of Predictions
Maybe we want to stick to the first two entry types.
Updated the two plot types to be more similar:
plots:
  # two files
  - scatter_plot:
      template: scatter
      x:
        props.json: acc
      y:
        values.json: prob
  # single file
  - simple_plot:
      template: simple
      x: actual
      y:
        probs.json: prob
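For illustration, the two entry shapes above could be produced by one builder along these lines. This is a sketch under assumptions: only `x.file`, `y.file`, and `template` appear in the diff, so the `key` property, the `PlotConfig` and `PlotEntry` types, and the `buildPlotEntry` name are hypothetical.

```typescript
// Assumed shape: `key` is a guess at how the chosen field name is stored.
type FieldRef = { file: string; key: string }
type PlotConfig = { template: string; x: FieldRef; y: FieldRef }

type PlotEntry = {
  template: string
  x: string | Record<string, string>
  y: Record<string, string>
}

// Builds one named entry for the `plots` list in dvc.yaml.
const buildPlotEntry = (
  name: string,
  plot: PlotConfig
): Record<string, PlotEntry> => {
  const { x, y, template } = plot
  const usesSingleFile = x.file === y.file
  return {
    [name]: {
      template,
      // With one file, x is a bare field name; the file is named under y.
      x: usesSingleFile ? x.key : { [x.file]: x.key },
      y: { [y.file]: y.key }
    }
  }
}
```

Because `y` always maps a file to a field, the only branch left is whether `x` needs its own file mapping.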
  'Failed to parse the requested file. Does the file contain data and follow the DVC plot guidelines for [JSON/YAML](https://dvc.org/doc/command-reference/plots/show#example-hierarchical-data) or [CSV/TSV](https://dvc.org/doc/command-reference/plots/show#example-tabular-data) files?'
)
if (fileExts.size > 1) {
  return Toast.showError('Files must of the same type.')
Our error handling has gotten more complex with the addition of multiple files.
Original:
- Check if file can be parsed.
- Check if file holds at least two field options
- Go on to plot template picker
Multiple Files:
- Check if all files have the same extension
- Check if all files can be parsed
- Check if all files contain at least one field
- Check if we have at least two fields total among all files
- Go on to plot template picker
We could also filter out invalid files instead of failing entirely, but I chose to fail since I thought it would be less confusing. Any ideas for simplifying the logic are welcome!
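The two field checks in the multi-file list (at least one field per file, at least two fields in total) could be sketched as a single function. The `FileFields` type and the `validateFields` name are hypothetical, and the messages are placeholders rather than the extension's actual toast text.

```typescript
// Hypothetical input: each file paired with the field names found in it.
type FileFields = { file: string; fields: string[] }

// Returns an error message for the first failed check, or undefined if
// the selection can proceed to the template picker.
const validateFields = (files: FileFields[]): string | undefined => {
  // Every file must contribute at least one field.
  const empty = files.find(({ fields }) => fields.length === 0)
  if (empty) {
    return `${empty.file} does not contain any fields.`
  }

  // A plot needs an x and a y, so two fields are required overall.
  const total = files.reduce((sum, { fields }) => sum + fields.length, 0)
  if (total < 2) {
    return 'The selected files must contain at least two fields in total.'
  }

  return undefined
}
```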
IMO we should be checking for things that make DVC fail and providing as much information as possible up front. Remember the main reason we're doing this is for onboarding to plots/simplification of the process.
And (as previously stated) if we fail the process because of a single file then we need to let the user know which file made the process fail and why.
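One way to surface that information, sketched under assumptions: instead of stopping at the first bad file, collect a reason per failing file and join them into one message. The `Checked` and `Failure` types, the function names, and the reason strings are all hypothetical.

```typescript
// Hypothetical shapes; `fields` stands in for whatever the wizard extracts.
type Checked = { file: string; data: unknown; fields: string[] }
type Failure = { file: string; reason: string }

// Records why each invalid file was rejected, rather than stopping early.
const collectFailures = (files: Checked[]): Failure[] => {
  const failures: Failure[] = []
  for (const { file, data, fields } of files) {
    if (data === undefined) {
      failures.push({ file, reason: 'could not be parsed' })
    } else if (fields.length === 0) {
      failures.push({ file, reason: 'contains no fields' })
    }
  }
  return failures
}

// Joins the failures into a single message suitable for one toast.
const formatFailures = (failures: Failure[]): string =>
  failures.map(({ file, reason }) => `${file} ${reason}`).join('; ')
```

Reporting every failure at once saves the user from fixing one file only to hit the next error on the following attempt.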
mockedPickFile.mockResolvedValueOnce('file.csv')
mockedLoadDataFile.mockReturnValueOnce(undefined)
it('should show a toast message if the files are not the same data type', async () => {
  mockedPickFiles.mockResolvedValueOnce(['file.json', 'file.csv'])
[Q] Is this a dvc constraint?
Yes, DVC needs the files to be the same type.
for (const { file, data } of dataArr) {
  const fields = getFieldsFromValue(data)

  if (fields.length === 0) {
[Q] Should we do this check after we've collected all of the keys? If not I think we need to specify which of the files was the one that caused the process to reject.
Makes sense! I can adjust the toast message to mention which file is failing.
* simplify plot object
* simplify resourcePicker
Alright, resolved some of the comments. What's left:
- Improve `pickPlotConfiguration` by breaking it into subfunctions and branching logic (Allow Multi File Select on Plot Wizard #4748 (comment))
- Add check for file array lengths (Allow Multi File Select on Plot Wizard #4748 (comment))
- State which files are causing failures in Toast messages (Allow Multi File Select on Plot Wizard #4748 (comment))
Taking care of these in #4770, but I think this PR is good enough to merge as a first iteration :)
Code Climate has analyzed commit a651084 and detected 2 issues on this pull request. Here's the issue category breakdown:
The test coverage on the diff in this pull request is 94.5% (85% is the threshold). This pull request will bring the total coverage in the repository to 95.1% (0.0% change). View more on Code Climate.
Demo
https://github.com/iterative/vscode-dvc/assets/43496356/6a98c334-b10e-4c48-8ad2-40178a9d62e9
Screen.Recording.2023-10-05.at.7.24.21.AM.mov
Part of #4654