queue: track logs #8483
Comments
How would these logs/metadata be shared with other tools (like Studio)? Would they need to be pushed to DVC remotes? Or would they need to be stored entirely in git? The main issue here that I can see is that the existing log-capturing behavior is entirely separate from the DVC experiment. The logs that are available through the queue commands are the output of an entire encapsulated In order to actually capture this and track it in git/dvc, you would really need to generate the entire experiment (including multiple potential checkpoint commits), and then generate an additional new commit that contains just the log/metadata changes. This workflow also seems a bit strange because for any other non- We could maybe implement this using something like
I think it should be pushed to DVC remotes and just tracked with its own
Good point. Another option would be to generate the logs only within the stage run so they can be captured as part of the exp commit. I think it would be fine to exclude or keep separate logs of celery operations and dvc boilerplate for setup/teardown of experiments.
So maybe
I would prefer that we avoid it so we don't have to get even deeper into explaining esoteric git features.
If we are looking to track individual stage timings, I agree it makes sense for logs to be kept at the stage level as well (and then the log/metadata files can essentially be handled as a per-stage output in dvc.lock).
Right, stage-level outputs make more sense. In that case, DVC/VS Code/Studio could merge logs and/or times to provide info for the "full" experiment.
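To make the merging idea concrete, here is a minimal sketch of how a tool might combine per-stage records into experiment-level info. Everything in it is an assumption for illustration: the record fields (`stage`, `start_time`, `end_time`, `returncode`) and the `summarize_experiment` helper are hypothetical and not an existing DVC format or API.

```python
def summarize_experiment(stage_records):
    """Merge hypothetical per-stage records into one experiment-level summary.

    Each record is assumed to be a dict with the keys
    "stage", "start_time", "end_time" and "returncode".
    """
    if not stage_records:
        raise ValueError("no stage records to merge")

    # Order stages by when they started so the summary reads chronologically.
    ordered = sorted(stage_records, key=lambda r: r["start_time"])
    failed = [r["stage"] for r in ordered if r["returncode"] != 0]

    return {
        "start_time": ordered[0]["start_time"],
        "end_time": max(r["end_time"] for r in ordered),
        # Per-stage wall-clock time in seconds.
        "stage_seconds": {
            r["stage"]: r["end_time"] - r["start_time"] for r in ordered
        },
        "failed_stages": failed,
        # The experiment counts as failed if any stage failed.
        "returncode": 1 if failed else 0,
    }
```

With this shape, each stage keeps its own log/metadata output, and the "full" experiment view is derived on demand rather than stored separately.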
dvc queue saves info into .dvc/tmp/exps, including stdout, stderr, and structured JSON output with info like time and return code. This is useful info, but it is only retrievable through queue-specific commands and is treated like temporary data.
Instead, each experiment can use DVC to track its own logs and keep them somewhere like .dvc/logs with an associated .dvc/logs.dvc file. This would enable the logs to be saved and shared as part of the experiment. Return code, start/end time, and any other info that we decide to collect in the future can also be included. The logs and this metadata can be used by Studio, VS Code, and any other experiment tracking interface.
For non-queued experiments, there are currently no logs, but maybe we will eventually treat all experiments as queued?
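As a rough illustration of the kind of data proposed above, the sketch below runs a command and writes its stdout, stderr, and a small JSON metadata file into a log directory. It uses only the Python standard library; the `run_and_log` helper, the file names (`stdout.log`, `stderr.log`, `metadata.json`), and the metadata fields are assumptions made for this example, not how DVC currently stores queue output.

```python
import json
import subprocess
import sys
import tempfile
import time
from pathlib import Path


def run_and_log(cmd, log_dir):
    """Run cmd, saving stdout/stderr and JSON metadata under log_dir.

    File names and metadata keys are hypothetical, chosen for illustration.
    """
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)

    start = time.time()
    with open(log_dir / "stdout.log", "wb") as out, \
            open(log_dir / "stderr.log", "wb") as err:
        # check=False: a failing command is recorded, not raised.
        proc = subprocess.run(cmd, stdout=out, stderr=err, check=False)
    end = time.time()

    meta = {
        "cmd": list(cmd),
        "returncode": proc.returncode,
        "start_time": start,
        "end_time": end,
    }
    (log_dir / "metadata.json").write_text(json.dumps(meta, indent=2))
    return meta


if __name__ == "__main__":
    # Demo in a throwaway directory that mimics the proposed .dvc/logs layout.
    with tempfile.TemporaryDirectory() as tmp:
        meta = run_and_log(
            [sys.executable, "-c", "print('hello')"],
            Path(tmp) / ".dvc" / "logs",
        )
        print(meta["returncode"])
```

A directory written this way is plain files, so it could then be tracked and pushed like any other DVC-managed data, which is the sharing model the proposal describes.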
Related: #7160