Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow Multi File Select on Plot Wizard #4748

Merged
merged 13 commits into from
Oct 5, 2023

Conversation

julieg18
Copy link
Contributor

@julieg18 julieg18 commented Oct 2, 2023

@julieg18 julieg18 added the product PR that affects product label Oct 2, 2023
@julieg18 julieg18 self-assigned this Oct 2, 2023
@@ -110,5 +151,5 @@ export const pickPlotConfiguration = async (): Promise<
return
}

return { ...templateAndFields, dataFile: file }
return { ...templateAndFields }
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid too many return statements within this function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function looks like it's doing many things too. It might be good to break it down into subfunctions like validateExtensions, validateFilesData... It would also clear this error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, we could also branch the logic depending on the number of files selected.

@dberenbaum
Copy link
Contributor

dberenbaum commented Oct 2, 2023

Looks great @julieg18!

Some thoughts on what else we could do:

@julieg18
Copy link
Contributor Author

julieg18 commented Oct 2, 2023

Can we add title (you could skip inferring a name for the plot if we do this), x_label, and y_label (see https://dvc.org/doc/user-guide/project-structure/dvcyaml-files#available-configuration-fields)?

We could, though we might want to keep in mind that each of these options would involve adding an extra quick pick. If we have too many quick picks, it could lead to plot creation being tedious.

Does it allow for a single x and a single y right now? Most of the value is likely when we allow multiple entries for each.

Yes, a single x and single y field. I wasn't 100% sure if every plot type would work with multiple entries so I decided to keep it as single for now. We could allow multiple entries in a followup :)

extension/src/vscode/resourcePicker.ts Show resolved Hide resolved
@@ -214,21 +214,38 @@ const loadYamlAsDoc = (
}
}

const getPlotYamlObj = (cwd: string, plot: PlotConfigData) => {
const { x, y, template } = plot
const usesSingleFile = x.file === y.file
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We check for multiple files being used and create the plot object accordingly:

plots:
  # two files
  - scatter_plot:
      template: scatter
      x:
        props.json: acc
      y:
        values.json: prob
  # single files
  - probs.json:
      x: actual
      y: prob

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Q] Do we need this split? Can we make the creation of plots entries a bit more opinionated from the wizard and reduce complexity? Otherwise, when we come to add in custom titles we will have to complicate the differentiation further.

From our demo dvc.yaml:

  - Loss:
      x: step
      y:
        training/plots/metrics/train/loss.tsv: loss
        training/plots/metrics/test/loss.tsv: loss
      y_label: loss
  - Confusion matrix:
      template: confusion
      x: actual
      y:
        training/plots/sklearn/confusion_matrix.json: predicted
  - hist.csv:
      x: preds
      y: digit
      template: bar_horizontal
      title: Histogram of Predictions

Maybe we want to stick to the first two entry types.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the two plot types to be more similar:

plots:
  # two files
  - scatter_plot:
      template: scatter
      x:
        props.json: acc
      y:
        values.json: prob
  # single files
  - simple_plot:
      template: simple
      x: actual
      y:
        probs.json: prob

'Failed to parse the requested file. Does the file contain data and follow the DVC plot guidelines for [JSON/YAML](https://dvc.org/doc/command-reference/plots/show#example-hierarchical-data) or [CSV/TSV](https://dvc.org/doc/command-reference/plots/show#example-tabular-data) files?'
)
if (fileExts.size > 1) {
return Toast.showError('Files must of the same type.')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our error handling has gotten more complex with the addition of multiple files.

Original:

  1. Check if file can be parsed.
  2. Check if file holds at least two field options
  3. Go on to plot template picker

Multiple Files:

  1. Check if all files have the same extension
  2. Check if all files can be parsed
  3. Check if all files contain at least one field
  4. Check if we have at least two fields total among all files
  5. Go on to plot template picker

We could also filter out invalid files instead of failing entirely but I chose to just fail since I thought it would be less confusing. Any ideas for simplifying the logic are welcome!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, all the files need to have an array of the same length but I chose not to check for that. DVC fails with this error:

image

We could add a check for it though if we think it would be needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we should be checking for things that make DVC fail and providing as much information as possible up front. Remember the main reason we're doing this is for onboarding to plots/simplification of the process.

And (as previously stated) if we fail the process because of a single file then we need to let the user know which file made the process fail and why.

extension/src/pipeline/quickPick.ts Outdated Show resolved Hide resolved
@julieg18 julieg18 marked this pull request as ready for review October 3, 2023 15:25
extension/src/vscode/resourcePicker.ts Show resolved Hide resolved
extension/src/pipeline/quickPick.ts Outdated Show resolved Hide resolved
@@ -110,5 +151,5 @@ export const pickPlotConfiguration = async (): Promise<
return
}

return { ...templateAndFields, dataFile: file }
return { ...templateAndFields }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function looks like it's doing many things too. It might be good to break it down into subfunctions like validateExtensions, validateFilesData... It would also clear this error.

@mattseddon
Copy link
Member

image

It is not obvious from the workflow that multiple files can be selected for the y-axis. I think we should update this text. If the multi-select is shift+click then that needs to be stated.

mockedPickFile.mockResolvedValueOnce('file.csv')
mockedLoadDataFile.mockReturnValueOnce(undefined)
it('should show a toast message if the files are not the same data type', async () => {
mockedPickFiles.mockResolvedValueOnce(['file.json', 'file.csv'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Q] Is this a dvc constraint?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, DVC needs the files to be the same type.

for (const { file, data } of dataArr) {
const fields = getFieldsFromValue(data)

if (fields.length === 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Q] Should we do this check after we've collected all of the keys? If not I think we need to specify which of the files was the one that caused the process to reject.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Q] Should we do this check after we've collected all of the keys? If not I think we need to specify which of the files was the one that caused the process to reject.

Makes sense! I can adjust the toast message to mention which file is failing.

'Failed to parse the requested file. Does the file contain data and follow the DVC plot guidelines for [JSON/YAML](https://dvc.org/doc/command-reference/plots/show#example-hierarchical-data) or [CSV/TSV](https://dvc.org/doc/command-reference/plots/show#example-tabular-data) files?'
)
if (fileExts.size > 1) {
return Toast.showError('Files must of the same type.')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we should be checking for things that make DVC fail and providing as much information as possible up front. Remember the main reason we're doing this is for onboarding to plots/simplification of the process.

And (as previously stated) if we fail the process because of a single file then we need to let the user know which file made the process fail and why.

@@ -110,5 +151,5 @@ export const pickPlotConfiguration = async (): Promise<
return
}

return { ...templateAndFields, dataFile: file }
return { ...templateAndFields }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, we could also branch the logic depending on the number of files selected.

* simplify plot object
* simplify resourcePicker
Copy link
Contributor Author

@julieg18 julieg18 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, resolved some of the comments. What's left:

Taking care of these in #4770, but I think this pr is good enough to merge as a first iteration :)

@julieg18 julieg18 enabled auto-merge (squash) October 5, 2023 12:40
@codeclimate
Copy link

codeclimate bot commented Oct 5, 2023

Code Climate has analyzed commit a651084 and detected 2 issues on this pull request.

Here's the issue category breakdown:

Category Count
Complexity 2

The test coverage on the diff in this pull request is 94.5% (85% is the threshold).

This pull request will bring the total coverage in the repository to 95.1% (0.0% change).

View more on Code Climate.

@julieg18 julieg18 merged commit 2ea74b3 into main Oct 5, 2023
3 checks passed
@julieg18 julieg18 deleted the allow-multi-file-select-on-plot-wizard branch October 5, 2023 13:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
product PR that affects product
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants