More consistent trace names. #1825

Merged
merged 3 commits into from
Oct 15, 2024
5 changes: 4 additions & 1 deletion torchtune/training/_profiler.py
@@ -6,6 +6,7 @@


import os
+ import socket
import time
from functools import partial
from pathlib import Path
@@ -98,7 +99,9 @@ def trace_handler(
# Use tensorboard trace handler rather than directly exporting chrome traces since
# tensorboard doesn't seem to be able to parse traces with prof.export_chrome_trace
exporter = tensorboard_trace_handler(
- curr_trace_dir, worker_name=f"rank{rank}", use_gzip=True
+ curr_trace_dir,
+ worker_name=f"rank{rank}_" + f"{socket.gethostname()}_{os.getpid()}",
Contributor

Sorry noob question on this choice of worker_name: if I am launching a bunch of runs with profiling on the same host and not keeping track of the pid when I launch, does this actually solve the problem? Like why not instead allow the manual specification of an output filename or something?

Contributor Author

@ebsmothers We can probably add a named argument. But I was talking about a solution that works "out of the box". If we add something like experiment_name: str = "", it probably won't usually be set unless we actually require it. Let me update the PR and see if we can do better.

+ use_gzip=True,
)
exporter(prof)
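
To make the trade-off discussed in the review thread concrete, here is a minimal sketch of how an optional named argument could sit on top of the automatic default. The helper `make_worker_name` and its `experiment_name` parameter are hypothetical and are not part of torchtune or of this PR; only the default naming scheme (`rank`, hostname, pid) is taken from the diff above.

```python
import os
import socket


def make_worker_name(rank: int, experiment_name: str = "") -> str:
    """Hypothetical helper, not part of torchtune.

    Builds the same default name as the diff above and, if the caller
    supplies one, prefixes it with a user-chosen experiment name.
    """
    # Default from the PR: rank, hostname, and pid of the current process.
    default = f"rank{rank}_{socket.gethostname()}_{os.getpid()}"
    # Assumption: an empty experiment_name means "use the default only".
    if experiment_name:
        return f"{experiment_name}_{default}"
    return default
```

The returned string could then be passed as `worker_name` to `tensorboard_trace_handler`. With this shape, runs launched without an explicit name still get distinct names per process on the same host, while a user who wants to find a particular run later can label it.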
