exp show: Include data files. #6434

Closed
daavoo opened this issue Aug 16, 2021 · 14 comments · Fixed by #7089
Labels: A: experiments, diff/show, discussion, feature request, ui

Comments

@daavoo
Contributor

daavoo commented Aug 16, 2021

If a user modifies data files in between dvc exp run calls, the change is not reflected in the table generated by dvc exp show.

In Studio, users can view information about the data files associated with each experiment. I think it would be nice to include similar information in the table generated by dvc exp show.

Ideally, the feature would have the same options available for metrics and params (i.e. --include-files/--exclude-files).

@daavoo daavoo added A: experiments Related to dvc exp feature request Requesting a new feature ui user interface / interaction labels Aug 16, 2021
@shcheklein
Member

+1 on this. It'll be important for VS Code. On a separate note, we haven't touched source files in Studio yet, but it would be great to have a signal in the table from which you can see that there are changes to the source files between experiments.

@pmrowla
Contributor

pmrowla commented Aug 17, 2021

What should the table actually show? I don't think actually including a list of changed files in the CLI table is feasible due to the size constraints.

So should it just be a true/false column to note that there is a change to data files (so users know they should check the changes with dvc diff)? Or maybe columns for added/modified/removed with counts that would match the summary totals you get from dvc diff?
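
To make the two options concrete, here is a minimal sketch of how a diff could be collapsed into either a true/false flag or added/modified/removed counts. The input shape (a mapping from change type to a list of entries) and the names `summarize_diff` and `has_data_changes` are assumptions for illustration, not the actual dvc diff output format or DVC API.

```python
# Hypothetical change types; the real dvc diff categories may differ.
CHANGE_TYPES = ("added", "modified", "removed")


def summarize_diff(diff: dict) -> dict:
    """Count the entries listed under each change type (missing types count as 0)."""
    return {kind: len(diff.get(kind, [])) for kind in CHANGE_TYPES}


def has_data_changes(diff: dict) -> bool:
    """True/false column: is there any change at all versus the baseline?"""
    return any(summarize_diff(diff).values())


# Made-up example data, only to show the shape of the result.
example = {
    "added": [{"path": "data/new.csv"}],
    "modified": [{"path": "data/train.csv"}, {"path": "data/test.csv"}],
    "removed": [],
}

print(summarize_diff(example))    # {'added': 1, 'modified': 2, 'removed': 0}
print(has_data_changes(example))  # True
```

Either result fits in a fixed number of table columns regardless of how many files changed, which is the size constraint mentioned above.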

@karajan1001
Contributor

But if we use added/modified/removed, another question comes up. These notions only represent the status compared to the default/workspace version. Unlike with numbers, it is hard to tell the differences between two modified experiments.

@pmrowla
Contributor

pmrowla commented Aug 17, 2021

For experiments in the table, it would have to be the diff between the experiment and the original baseline git commit (not the diff of one experiment to another experiment).

To get the difference between two experiments you would just have to use dvc diff exp1 exp2

@daavoo
Contributor Author

daavoo commented Aug 19, 2021

> +1 on this. It'll be important for VS Code. On a separate note, we haven't touched source files in Studio yet, but it would be great to have a signal in the table from which you can see that there are changes to the source files between experiments.

Any source files or just source files that are deps of a stage?

@dberenbaum
Collaborator

What is meaningful info to show about data files? Just whether they have changed from the baseline commit? Size? Number of files? Maybe there is some tag/label/metadata that would be helpful?

Also, what files should be included (dependencies, outputs, dvc-tracked files, all files)?

> Ideally, the feature would have the same options available for metrics and params (i.e. --include-files/--exclude-files).

If we add data files to the dvc exp show table, it will become even more important to provide easy ways to narrow the table. Adding more info is not universally desirable since the table can become too busy. See:

@dberenbaum dberenbaum added the discussion requires active participation to reach a conclusion label Aug 19, 2021
@shcheklein
Member

> Any source files or just source files that are deps of a stage?

I would say just a binary signal that some sources have changed is a good start.

> Also, what files should be included (dependencies, outputs, dvc-tracked files, all files)?

My take (from Studio experience): inputs (datasets) are the most important part. Less so intermediate files, etc. We can start with datasets only if we want to keep the table sane.

> What is meaningful info to show about data files? Just whether they have changed from the baseline commit? Size? Number of files? Maybe there is some tag/label/metadata that would be helpful?

Good question. I think some signal that data has changed is enough (we highlight values in Studio with a special dot icon or something like this). As for values, we could do size to start. In Studio users can switch between those.
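
A rough sketch of that idea (size as the value, plus a changed-from-baseline flag per dataset) could look like the following. The `DataFile` record and `data_columns` helper are hypothetical names for illustration, and comparing hashes is an assumption about how "changed" would be decided.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    """Hypothetical record for one tracked dataset."""
    md5: str
    size: int  # bytes


def data_columns(baseline: dict, experiment: dict) -> dict:
    """For each dataset in the experiment, report its size and whether it
    differs from the baseline commit (new files count as changed)."""
    columns = {}
    for path, entry in experiment.items():
        base = baseline.get(path)
        changed = base is None or base.md5 != entry.md5
        columns[path] = {"size": entry.size, "changed": changed}
    return columns


# Made-up example: one dataset untouched, one modified.
baseline = {
    "data/train.csv": DataFile(md5="aaa", size=100),
    "data/test.csv": DataFile(md5="bbb", size=40),
}
experiment = {
    "data/train.csv": DataFile(md5="ccc", size=120),
    "data/test.csv": DataFile(md5="bbb", size=40),
}

print(data_columns(baseline, experiment))
```

The `changed` flag is what a UI could render as a highlight or dot next to the size value.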

@karajan1001
Contributor

Showing the same thing as in dvc status is the first step.

@dberenbaum
Collaborator

Another idea here is to encourage use of the desc field and show that in the table.

@daavoo
Contributor Author

daavoo commented Sep 7, 2021

> Another idea here is to encourage use of the desc field and show that in the table.

In addition, we could show the rev (or other selectable property, like in Studio iterative/studio-support#16)

@daavoo daavoo changed the title dvc exp show: Include data files. exp show: Include data files. Sep 7, 2021
@daavoo
Contributor Author

daavoo commented Sep 7, 2021

> Another idea here is to encourage use of the desc field and show that in the table.

Related #6548

@karajan1001
Contributor

> Another idea here is to encourage use of the desc field and show that in the table.

But the problem is that, since we don't change the desc after the .dvc file has been generated, it stays the same across all revisions, so it can't be used for any comparison.

@dberenbaum
Collaborator

I was thinking that users might update the description when they make changes, like cats -> cats and dogs. Maybe it would be better to track the last modified commit and show the commit message 🤷 .

@karajan1001
Contributor

karajan1001 commented Sep 10, 2021

> I was thinking that users might update the description when they make changes, like cats -> cats and dogs. Maybe it would be better to track the last modified commit and show the commit message 🤷 .

So support only .dvc files, not pipeline-generated ones? That makes sense: only data added with dvc add needs a name, while data generated by a pipeline can be represented by the parameters of its stages.

@daavoo daavoo added the diff/show Related to the diff/show feature label Oct 13, 2021
daavoo added a commit that referenced this issue Dec 3, 2021
Use `repo.index.deps` to collect dependencies associated with each experiment.

Closes #6434
@daavoo daavoo added this to DVC Dec 7, 2021
@daavoo daavoo moved this to Backlog in DVC Dec 7, 2021
@daavoo daavoo moved this from Backlog to In Progress in DVC Dec 7, 2021
@daavoo daavoo self-assigned this Dec 9, 2021
daavoo added a commit that referenced this issue Dec 16, 2021
Removed `--include-metrics` / `--include-params` and
`--exclude-metrics` / `--exclude-params`.

Removed `--no-timestamp`. Can be done by `--drop Created`.

`--drop` and `--keep` operate directly on the table columns.

`--keep` does not perform any filtering. It's only used
to specify columns to keep despite the other filtering.

For example `--only-changed --keep foo` will prevent
`foo` from being removed by `--only-changed`.

Another example, `--drop train.* --keep train.dropout`
will remove all columns matching `train.*` except for
`train.dropout`.

Closes #7079
Pre-requisite for #6434
Pre-requisite for #7080
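
The `--drop` / `--keep` interaction described in this commit message can be sketched roughly as below. This is only an illustration under assumptions: `filter_columns` is a hypothetical helper, and treating the patterns as regular expressions matched against the full column name is a guess, not necessarily how DVC implements it.

```python
import re


def filter_columns(columns, drop=(), keep=()):
    """Remove columns matching any `drop` pattern, unless they also
    match a `keep` pattern. `keep` alone never removes anything."""

    def matches(name, patterns):
        return any(re.fullmatch(pattern, name) for pattern in patterns)

    return [
        name
        for name in columns
        if matches(name, keep) or not matches(name, drop)
    ]


# Made-up table columns for illustration.
columns = ["Experiment", "Created", "train.lr", "train.dropout", "acc"]

# Mirrors `--drop train.* --keep train.dropout` from the message above.
print(filter_columns(columns, drop=["train.*"], keep=["train.dropout"]))
# ['Experiment', 'Created', 'train.dropout', 'acc']

# Mirrors `--drop Created` as the replacement for `--no-timestamp`.
print(filter_columns(columns, drop=["Created"]))
# ['Experiment', 'train.lr', 'train.dropout', 'acc']
```

In this sketch `keep` acts purely as an exemption list, which matches the statement that `--keep` does not perform any filtering by itself.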
@daavoo daavoo moved this from Todo to In Progress in DVC Dec 21, 2021
@daavoo daavoo moved this from In Progress to Review In Progress in DVC Dec 21, 2021
daavoo added a commit that referenced this issue Jan 3, 2022
Removed `--include-metrics` / `--include-params` and
`--exclude-metrics` / `--exclude-params`.

Removed `--no-timestamp` . Can be done by `--drop Created`.

`--drop` and `--keep` operate directly on the table columns.

`--keep` does not perform any filtering. It's only used
to specify columns to keep despite the other filtering.

For example `--only-changed --keep foo` will prevent
`foo` from being removed by `--only-changed`.

Another example, `--drop train.* --keep train.dropout`
will remove all columns matching `train.*` except for
`train.dropout`.

Closes #7079
Closes #7080
Pre-requisite for #6434
Repository owner moved this from Review In Progress to Done in DVC Jan 26, 2022

5 participants