Add profiling to dataloader next()
#12124
Conversation
Hi @akashkw!
We cannot profile next() like this, as it would not cover prefetched batches or inter-batch parallelism.
I recently had a call with @daniellepintz where I went over possible solutions. Here are some code pointers.
The concrete lines to profile:
https://github.com/PyTorchLightning/pytorch-lightning/blob/a0655611de460f1659b46150cda256d2e3fa974e/pytorch_lightning/utilities/fetching.py#L267
Option 1:
Pass a profiler reference to the data fetchers (perhaps also the stage and dataloader index for more fine-grained profiling), then use these directly inside the fetching functions.
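A minimal sketch of option 1, assuming a simplified fetcher; the real `AbstractDataFetcher` API in fetching.py differs, and `ProfiledFetcher` plus its action-name format are hypothetical:

```python
# Hypothetical sketch of option 1: the fetcher holds a profiler reference
# (plus stage and dataloader index) and profiles the actual `next()` call.
from contextlib import nullcontext
from typing import Any, Iterable, Iterator, Optional


class ProfiledFetcher:
    def __init__(self, profiler=None, stage: str = "train", dataloader_idx: int = 0) -> None:
        self.profiler = profiler  # e.g. the trainer's profiler, or None
        self.stage = stage
        self.dataloader_idx = dataloader_idx
        self.iterator: Optional[Iterator] = None

    def setup(self, dataloader: Iterable) -> None:
        self.iterator = iter(dataloader)

    def fetch(self) -> Any:
        # Wrap only the `next()` call, so the time spent waiting on the
        # dataloader is attributed to data fetching.
        action = f"[{self.stage}] dataloader_idx_{self.dataloader_idx}.next"
        ctx = self.profiler.profile(action) if self.profiler is not None else nullcontext()
        with ctx:
            return next(self.iterator)
```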
Option 2:
Inject the profiler usage through callables that could be part of these hooks:
https://github.com/PyTorchLightning/pytorch-lightning/blob/a0655611de460f1659b46150cda256d2e3fa974e/pytorch_lightning/utilities/fetching.py#L61-L65
with a pattern similar to the one used in this PR to inject logic around the optimizer.step() (see 4cc05b2).
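A minimal sketch of option 2, assuming hypothetical `on_fetch_start`/`on_fetch_end` callables on the fetcher (the actual hooks at the lines linked above may be named differently); `profiler.start`/`profiler.stop` mirror the pytorch_lightning profiler interface:

```python
# Hypothetical sketch of option 2: the fetcher exposes no-op callables that
# the trainer can replace, so the fetcher never holds a trainer reference.
from typing import Any, Iterable


class HookedFetcher:
    def __init__(self) -> None:
        # No-op defaults; callers may swap in profiling callables.
        self.on_fetch_start = lambda: None
        self.on_fetch_end = lambda: None

    def setup(self, dataloader: Iterable) -> None:
        self.iterator = iter(dataloader)

    def fetch(self) -> Any:
        self.on_fetch_start()
        try:
            return next(self.iterator)
        finally:
            self.on_fetch_end()


def install_profiling(fetcher: HookedFetcher, profiler, action: str) -> None:
    # The profiler stays owned by the trainer; only these two callables
    # are handed to the fetcher.
    fetcher.on_fetch_start = lambda: profiler.start(action)
    fetcher.on_fetch_end = lambda: profiler.stop(action)
```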
@carmocca do you have any preference between these two approaches?
I prefer option (2) so the fetching does not need to keep a reference to the trainer (the profiler is owned by the trainer).
Good progress!
```diff
@@ -485,3 +486,88 @@ def validation_step(self, batch, batch_idx):
 assert dm.count_called_on_before_batch_transfer == 4
 assert dm.count_called_transfer_batch_to_device == 4
 assert dm.count_called_on_after_batch_transfer == 4

+@RunIf(skip_windows=True)  # TODO: all durations are 0 on Windows
```
I'm unsure if this is a known issue with the profiler on Windows or a bug...
We should open an issue. It just uses the time module, nothing fancy.
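For reference, a small self-contained probe (not from this PR) showing how coarse a platform's clocks can be, which would explain fast fetches rounding down to zero durations on Windows:

```python
# Measure the smallest observable nonzero tick of each stdlib clock.
# On Windows, time.time() historically advances in ~15.6 ms steps, so a
# fast `next()` call can report a duration of 0.
import time


def min_tick(clock) -> float:
    start = clock()
    while True:
        delta = clock() - start
        if delta > 0:
            return delta


for name in ("time", "monotonic", "perf_counter"):
    print(f"time.{name}: {min_tick(getattr(time, name)):.9f} s")
```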
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
What does this PR do?
Adds profiling to next(dataloader_iter) in the train, eval, and prediction loops.
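A hedged usage sketch (MyModel and MyDataModule are placeholders, and the exact action names recorded may differ):

```python
# Enable the simple profiler so dataloader `next()` timings appear in the
# summary printed after fit. `MyModel` and `MyDataModule` are placeholders
# for a LightningModule and LightningDataModule defined elsewhere.
import pytorch_lightning as pl

trainer = pl.Trainer(profiler="simple", max_epochs=1)
trainer.fit(MyModel(), datamodule=MyDataModule())
# The profiler report should now include an entry for fetching the next
# batch from the dataloader iterator.
```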
Does your PR introduce any breaking changes? If yes, please list them.
Before submitting
PR review
Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the Review guidelines. In short, see the following bullet list:
Did you have fun?
Make sure you had fun coding 🙃