Epic: improve experiment logs #9442

Closed
5 of 8 tasks
dberenbaum opened this issue May 11, 2023 · 1 comment
Labels
A: task-queue Related to task queue. epic Umbrella issue (high level). Include here: list of tasks/PRs, details, Qs, timelines, etc p2-medium Medium priority, should be done, but less important

Comments

@dberenbaum
Collaborator

dberenbaum commented May 11, 2023

Summary / Background

Provide robust logging for experiment runs.

Scope

When running any experiment, save logs of the output, errors, hardware usage, time, etc. These logs should be retrievable anytime and anywhere for any experiment, including sharing between users and products (DVC, VS Code, Studio).

Assumptions

  • Only for pipeline execution (not about dvclive-only experiments)

Open Questions

  • How do we share the logs?
  • Should we share live log updates to Studio?

Blockers / Dependencies

  • Can we make it a joint effort with VS Code and Studio teams? Seems like it would be powerful in Studio for workflows like cloud experiments.

General Approach

We already have dvc queue logs. For sharing, we could add dvc queue push/pull or support dvc push/pull --logs.

Steps

Phase 1: Make logging work for all experiments

Phase 2: Expand and share logs

  • queue: track logs #8483
  • Time each stage took to execute
  • Hardware type and usage: number of CPUs/GPUs and their utilization, plus memory
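
To make the Phase 2 items more concrete, here is a minimal sketch of capturing a stage's output, errors, and wall-clock time in one place. It is not DVC code: `run_stage_with_log`, the per-stage `.log` file layout, and the returned fields are all hypothetical names chosen for illustration, and only the CPU count is recorded (GPU and memory usage would need additional tooling).

```python
import os
import subprocess
import sys
import tempfile
import time
from pathlib import Path


def run_stage_with_log(name, cmd, log_dir):
    """Run one stage command and save its output plus basic run metadata.

    Hypothetical helper for illustration only; not part of DVC.
    """
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    log_path = log_dir / f"{name}.log"  # assumed layout: one log file per stage

    start = time.perf_counter()
    with open(log_path, "w", encoding="utf-8") as log_file:
        # Send stderr to the same file as stdout so both end up in one log.
        proc = subprocess.run(
            cmd, stdout=log_file, stderr=subprocess.STDOUT, check=False
        )
    elapsed = time.perf_counter() - start

    return {
        "stage": name,
        "returncode": proc.returncode,
        "seconds": elapsed,            # wall-clock time the stage took
        "cpu_count": os.cpu_count(),   # hardware type only, not utilization
        "log": str(log_path),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        info = run_stage_with_log(
            "train", [sys.executable, "-c", "print('epoch 1 done')"], tmp
        )
        print(info["stage"], info["returncode"], Path(info["log"]).read_text())
```

Keeping the metadata as a plain dictionary next to the log file is just one option; it would make it easy to serialize alongside the logs if they are later pushed or pulled.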

Timelines

TBD (not yet prioritized)

@dberenbaum dberenbaum added epic Umbrella issue (high level). Include here: list of tasks/PRs, details, Qs, timelines, etc p2-medium Medium priority, should be done, but less important A: task-queue Related to task queue. labels May 11, 2023
@dberenbaum
Collaborator Author

As discussed in #9425, the current dvc queue logs command won't make sense if we want to capture logs for non-queued experiments. Now that we have dropped checkpoints, do we still need a separate queue command, or can we merge it with exp?

Looking through the current queue commands:

  • start: could be in exp run --run-all or exp start
  • stop: could be in exp stop
  • status: is it needed? if so, can it be in exp status?
  • logs: could be in exp logs
  • remove: is it needed? this also might depend on whether/how we plan to preserve the logs; some info could be auto-deleted on exp clean
  • kill: could be in exp kill
