exp show: --queued and --failed #7986

dberenbaum · 2022-07-07T14:48:02Z

dvc exp show should only include successfully completed experiments by default but should have options to include queued/failed items from the queue. (edit: exp show --queued/--failed)

Originally posted by @dberenbaum in #7592 (comment)

dberenbaum · 2022-07-07T14:49:37Z

Need to coordinate with VS Code on these changes cc @mattseddon

mattseddon · 2022-07-13T20:56:02Z

Related to iterative/vscode-dvc#1995 & iterative/vscode-dvc#1996.

dberenbaum · 2022-07-14T18:52:11Z

Given the discussion of dvc exp being the primary interface over dvc queue, it might make more sense to include all of these by default and make the flags --hide-queued/--hide-failed.

karajan1001 · 2022-09-07T12:46:48Z

@mattseddon I'm working on this now, but I found that parts of the data format still need to be decided.

The current JSON output of exp show has no flag for failed experiments.

    "a4dedc07945e645a3b5ca0ede77330908688ea47": {
      "data": {
        "timestamp": "2022-08-26T15:55:17",
        "params": {
          "params.yaml": {
            "data": {
              "interval": 1
            }
          }
        },
        "deps": {},
        "outs": {
          "output.txt": {
            "hash": "c4ca4238a0b923820dcc509a6f75849b",
            "size": 1,
            "nfiles": null,
            "use_cache": true,
            "is_data_source": false
          },
          "out.txt": {
            "hash": null,
            "size": null,
            "nfiles": null,
            "use_cache": true,
            "is_data_source": false
          }
        },
        "queued": true,
        "running": false,
        "executor": null
      }
    }

We need a new flag for it. Which do you prefer: "success": false or "failed": true? Some other things to note:

The default value needs to be compatible with old versions.
For failed experiments, you should also disregard the info in outs.
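
For illustration, here is a minimal Python sketch of how a consumer of the JSON might handle these two points. It assumes the key ends up being called "failed" (the name is still undecided at this point) and that output from older versions simply lacks the key. The helpers `is_failed` and `usable_outs` are hypothetical and not part of DVC.

```python
from typing import Any, Dict


def is_failed(entry: Dict[str, Any]) -> bool:
    """Return the hypothetical "failed" flag, defaulting to False when absent."""
    # Output from older versions has no such key, so a missing flag
    # is read as "not failed".
    return bool(entry.get("data", {}).get("failed", False))


def usable_outs(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Return the outs of an entry, or an empty dict when the experiment failed."""
    if is_failed(entry):
        # The outs of a failed experiment should be disregarded.
        return {}
    return entry.get("data", {}).get("outs", {})


# An entry in the old format (no flag) and one carrying the assumed flag.
old_entry = {"data": {"queued": True, "running": False, "outs": {"out.txt": {"hash": None}}}}
failed_entry = {"data": {"failed": True, "outs": {"out.txt": {"hash": None}}}}
```

Defaulting to False keeps old output readable without changes, which is the compatibility concern raised above.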

dberenbaum · 2022-09-07T19:22:27Z

Do we need separate fields for each state, or are they all mutually exclusive and we should have a single state field?

karajan1001 · 2022-09-08T02:43:28Z

> Do we need separate fields for each state, or are they all mutually exclusive and we should have a single state field?

Either is fine with me, but a change will cause more trouble for those who parse the JSON themselves; inside DVC, the parsing always changes together with the JSON format. That is why I'm asking Matt for advice.

mattseddon · 2022-09-08T05:30:58Z

Is completed/failed a boolean result? Can there be a partial failure? If it is a boolean then I'm happy with succeeded or failed. Will the error message be nested somewhere within the dict? We will want to display as much information to the user as possible 👍🏻.

karajan1001 · 2022-09-08T06:59:46Z

> Is completed/failed a boolean result? Can there be a partial failure? If it is a boolean then I'm happy with succeeded or failed. Will the error message be nested somewhere within the dict? We will want to display as much information to the user as possible 👍🏻.

The error message can be obtained through dvc queue logs, which reads the stored output of the executor. It's not hard for me to read the message from this file and paste it into the JSON; the only question is what to put into the JSON. Is the last line of the output enough? Based on the current discussion, the final JSON format would be something like:

"sha_rev": {
  "data": {
    "timestamp": "%Y-%m-%dT%H:%M:%S",
    "params": {
      ...
    },
    "error_msg": "", (if added)
    "deps": {
      ...
    },
    "outs": {
      ...
    },
    "queued": bool,
    "running": bool,
    "success": bool,
    "executor": ...
  }
}

And what do you think about using a single "status" field instead of three boolean ones? With separate booleans you might get a confusing result with more than one true (that would be a bug, of course, but it is still possible).
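
To make the concern concrete, here is a small Python sketch that collapses the three booleans into one value and rejects contradictory combinations. The `Status` names and the `status_from_flags` helper are hypothetical; nothing here is an agreed format.

```python
from enum import Enum
from typing import Any, Dict


class Status(str, Enum):
    # Hypothetical values; the thread has not settled on concrete names.
    QUEUED = "queued"
    RUNNING = "running"
    FAILED = "failed"
    SUCCESS = "success"


def status_from_flags(data: Dict[str, Any]) -> Status:
    """Collapse the boolean flags into one status, rejecting contradictions."""
    active = [
        status
        for status, flag in (
            (Status.QUEUED, data.get("queued", False)),
            (Status.RUNNING, data.get("running", False)),
            # Only an explicit "success": false counts as failed.
            (Status.FAILED, data.get("success") is False),
        )
        if flag
    ]
    if len(active) > 1:
        # More than one state at once is exactly the confusing case.
        raise ValueError(f"contradictory flags: {[s.value for s in active]}")
    return active[0] if active else Status.SUCCESS
```

With a single field the contradictory case cannot be represented at all, so this check would only be needed while the booleans still exist.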

mattseddon · 2022-09-15T02:08:02Z

The format that we expect for a record that is completely broken is:

"sha_rev": {
  "error": {
    "msg": error_msg,
    "type": ""
  },
  ...
  "success": false
}

Where the error key replaces data.

However, if we know what the params/deps are we could simply put the error message into the missing metrics/outs files.

{
  "data": {
    "timestamp": "2022-09-15T11:07:57",
    "params": {
      "params.yaml": {
        "data": {
          "lr": 0.006,
          "weight_decay": 0,
          "epochs": 15
        }
      }
    },
    "deps": {
      "data/MNIST": {
        ...
      },
      ...
    },
    ...
    "metrics": {
      "metric_file": {
        "error": {
          "msg": error_msg,
          "type": error_type
        }
      }
    },
  }
}

The second option is IMO more beneficial to users because they'd want to know which of the queued items have failed so that they can retry them. If we omit the param info and only show failed records they would have to reverse engineer/work out which ones have failed.
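
As a rough illustration of the two shapes described above, the following Python sketch collects error messages from either one. It assumes only the keys shown in the examples ("error", "data", "metrics", "outs", "msg"); the `collect_errors` helper and the `<record>` label are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def collect_errors(entry: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (location, message) pairs for every error found in one entry."""
    # First shape: the whole record is broken and "error" replaces "data".
    if "error" in entry:
        return [("<record>", str(entry["error"].get("msg", "")))]

    # Second shape: "data" is present and errors are attached per file.
    errors: List[Tuple[str, str]] = []
    data = entry.get("data", {})
    for section in ("metrics", "outs"):
        for path, value in data.get(section, {}).items():
            if isinstance(value, dict) and "error" in value:
                message = str(value["error"].get("msg", ""))
                errors.append((f"{section}:{path}", message))
    return errors


# One example per shape, with made-up messages.
broken = {"error": {"msg": "boom", "type": ""}}
partial = {
    "data": {
        "params": {"params.yaml": {"data": {"lr": 0.006}}},
        "metrics": {"metric_file": {"error": {"msg": "missing", "type": "FileNotFoundError"}}},
    }
}
```

In the second shape the params stay available next to the error, which is what lets a user see which queued item failed.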

> And what do you think about using a single "status" field instead of three boolean ones? With separate booleans you might get a confusing result with more than one true (that would be a bug, of course, but it is still possible).

I would be fine with merging the running/queued/success/pending fields into a single status field. This is a breaking change though, so we would need to coordinate on the release. Anyone using an old version of the extension with a new version of the CLI is going to have a weird experience.

karajan1001 · 2022-09-15T09:30:53Z

> The second option is IMO more beneficial to users because they'd want to know which of the queued items have failed so that they can retry them. If we omit the param info and only show failed records they would have to reverse engineer/work out which ones have failed.

This is an execution failure rather than a malformed metric_file. The error message is probably a call stack trace. So I think

"sha_rev": {
  "error": {
    "msg": error_msg,
    "type": ""
  },
  ...
  "success": false
}

Makes more sense.

> I would be fine with merging the running/queued/success/pending fields into a single status field. This is a breaking change though, so we would need to coordinate on the release. Anyone using an old version of the extension with a new version of the CLI is going to have a weird experience.

Another choice is to keep both

    "queued": bool,
    "running": bool,

and status for a time. The front end can first check whether the status field exists and, if it does not, fall back to the old parsing method. The old API can then be removed one or two versions later.
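
A minimal Python sketch of that fallback on the reading side, assuming the new field is called "status" and holds a string. The `read_status` helper and the "finished" value are hypothetical.

```python
from typing import Any, Dict


def read_status(data: Dict[str, Any]) -> str:
    """Prefer the new "status" field and fall back to the legacy booleans."""
    status = data.get("status")
    if status is not None:
        # New format: trust the single field and ignore the booleans.
        return str(status)
    # Legacy format: only "queued" and "running" exist.
    if data.get("running"):
        return "running"
    if data.get("queued"):
        return "queued"
    # Neither flag is set; this sketch treats that as finished.
    return "finished"
```

Once the booleans are removed, only the first branch would remain.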

fix: #7986
1. Add two new flags `--hide-queued` and `--hide-failed` to `exp show`
2. Allow `exp show` to show failed experiments.
3. Add unit test for the failed experiments shown.
4. Add name support for failed exp
5. Add error msg to the `exp show` output

dberenbaum added the A: experiments Related to dvc exp label Jul 7, 2022

dberenbaum added this to DVC Jul 7, 2022

dberenbaum moved this to Backlog in DVC Jul 7, 2022

dberenbaum moved this from Backlog to Todo in DVC Jul 7, 2022

mattseddon mentioned this issue Jul 15, 2022

Show experiments queue info iterative/vscode-dvc#1995

Closed

karajan1001 moved this from Todo to In Progress in DVC Sep 6, 2022

karajan1001 self-assigned this Sep 6, 2022

karajan1001 mentioned this issue Sep 19, 2022

exp show :Add --hide-queued and --hide-failed flag #8318

Merged

2 tasks

karajan1001 moved this from In Progress to Review In Progress in DVC Sep 27, 2022

karajan1001 closed this as completed in #8318 Oct 6, 2022

Repository owner moved this from Review In Progress to Done in DVC Oct 6, 2022
