1/n Generalize internal checks for Accelerator in Trainer - remove trainer._device_type #11001
Conversation
pytorch_lightning/trainer/trainer.py (Outdated)
```diff
        rank_zero_info(
            f"GPU available: {torch.cuda.is_available()}, used: {isinstance(self.accelerator, GPUAccelerator)}"
        )

        num_tpu_cores = (
-           self.tpu_cores if self.tpu_cores is not None and self._device_type == _AcceleratorType.TPU else 0
+           self.tpu_cores if self.tpu_cores is not None and isinstance(self.accelerator, TPUAccelerator) else 0
        )
        rank_zero_info(f"TPU available: {_TPU_AVAILABLE}, using: {num_tpu_cores} TPU cores")

        num_ipus = self.ipus if self.ipus is not None else 0
        rank_zero_info(f"IPU available: {_IPU_AVAILABLE}, using: {num_ipus} IPUs")

-       if torch.cuda.is_available() and self._device_type != _AcceleratorType.GPU:
+       if torch.cuda.is_available() and isinstance(self.accelerator, GPUAccelerator):
            rank_zero_warn(
                "GPU available but not used. Set the gpus flag in your trainer `Trainer(gpus=1)` or script `--gpus=1`.",
                category=PossibleUserWarning,
            )

-       if _TPU_AVAILABLE and self._device_type != _AcceleratorType.TPU:
+       if _TPU_AVAILABLE and isinstance(self.accelerator, TPUAccelerator):
            rank_zero_warn(
                "TPU available but not used. Set the `tpu_cores` flag in your trainer"
                " `Trainer(tpu_cores=8)` or script `--tpu_cores=8`."
            )

-       if (
-           _IPU_AVAILABLE
-           and self._device_type != _AcceleratorType.IPU
-           and not isinstance(self.accelerator, IPUAccelerator)
-       ):
+       if _IPU_AVAILABLE and not isinstance(self.accelerator, IPUAccelerator):
```
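For readers skimming the hunk: the PR replaces the trainer's internal `_device_type` enum comparisons with `isinstance` checks against the accelerator object. A minimal standalone sketch of that pattern follows; the `device_summary` helper is hypothetical, written only to illustrate the check, and assumes the accelerator classes are importable from `pytorch_lightning.accelerators` as in the 1.5.x layout.

```python
from pytorch_lightning.accelerators import GPUAccelerator, IPUAccelerator, TPUAccelerator


def device_summary(trainer) -> str:
    """Hypothetical helper: describe the device by inspecting the accelerator instance."""
    # New style: ask the accelerator object directly instead of comparing
    # trainer._device_type against an _AcceleratorType enum value.
    if isinstance(trainer.accelerator, TPUAccelerator):
        return "running on TPU"
    if isinstance(trainer.accelerator, IPUAccelerator):
        return "running on IPU"
    if isinstance(trainer.accelerator, GPUAccelerator):
        return "running on GPU"
    return "running on CPU"
```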
I find this logging pretty verbose. For instance, I don't want to see "TPU available: False, using: 0 TPU cores" on every run. This is only useful if there are TPUs available and I'm not using them. I assume that if there are no TPUs available but I have specified them, then an exception would be thrown before this is called.
Fair point. What about when users pass `(accelerator="auto", devices="x")`?
Can we simply lower the logging level to maybe debug here? I also find it too verbose most of the time.

@kaushikb11 same goes here. Usually people know what kind of machine they submitted their job on.
Totally agree it's too verbose. Also, why do we have `_log_device_info()` with all the warnings in the trainer? Should we move `_log_device_info` to the end of the accelerator_connector init, or relocate the warnings to the relevant selection functions in the accelerator connector?
Created issue #11014 to discuss this; it will be addressed in a different PR.
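One possible shape of the fix discussed above (a sketch of the idea only; #11014 may settle on something different): route the routine availability lines through the module logger at DEBUG level and keep only the actionable "available but not used" case visible. The helper name and signature below are made up for illustration.

```python
import logging

import torch

log = logging.getLogger(__name__)


def log_gpu_availability(using_gpu: bool) -> None:
    # Routine availability info is demoted to DEBUG so it no longer prints on every run.
    log.debug("GPU available: %s, used: %s", torch.cuda.is_available(), using_gpu)
    # The actionable case (hardware present but unused) stays loud.
    if torch.cuda.is_available() and not using_gpu:
        log.warning("GPU available but not used. Set the `gpus` flag, e.g. `Trainer(gpus=1)`.")
```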
pytorch_lightning/trainer/trainer.py (Outdated)
```diff
        )
        rank_zero_info(f"TPU available: {_TPU_AVAILABLE}, using: {num_tpu_cores} TPU cores")

        num_ipus = self.ipus if self.ipus is not None else 0
        rank_zero_info(f"IPU available: {_IPU_AVAILABLE}, using: {num_ipus} IPUs")

-       if torch.cuda.is_available() and self._device_type != _AcceleratorType.GPU:
+       if torch.cuda.is_available() and isinstance(self.accelerator, GPUAccelerator):
```
This is incorrect: the prior code was checking whether the GPU accelerator was not being used. But I don't think we need to check the accelerator instance at all. Why can't we check the `gpus` flag directly?

Same as above, I don't get why we have to check `TPUAccelerator` instead of `tpu_cores`.
With people reaching agreement on the gpus/ipus/tpu_cores vs. devices question in issue #10410 (Future of gpus/ipus/tpu_cores with respect to devices), I think checking the accelerator and the device count will be better than checking tpu_cores/gpus.
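If the accelerator-instance checks are kept rather than the raw `gpus`/`tpu_cores` flags, the warning branches presumably need the negation the reviewer points out. A standalone sketch follows; the `warn_on_unused_hardware` function is hypothetical, and in the PR this logic lives in `Trainer._log_device_info` and uses `rank_zero_warn` rather than `warnings.warn`.

```python
import warnings

import torch

from pytorch_lightning.accelerators import GPUAccelerator, TPUAccelerator


def warn_on_unused_hardware(accelerator, tpu_available: bool) -> None:
    # The warning should fire when hardware is available but the matching
    # accelerator is NOT the one in use, hence `not isinstance(...)`.
    if torch.cuda.is_available() and not isinstance(accelerator, GPUAccelerator):
        warnings.warn("GPU available but not used. Set the gpus flag, e.g. `Trainer(gpus=1)` or `--gpus=1`.")
    if tpu_available and not isinstance(accelerator, TPUAccelerator):
        warnings.warn("TPU available but not used. Set the `tpu_cores` flag, e.g. `Trainer(tpu_cores=8)`.")
```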
Co-authored-by: thomas chaton <thomas@grid.ai>
force-pushed from d636bd7 to 3861805
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. If you need further help see our docs: https://pytorch-lightning.readthedocs.io/en/latest/generated/CONTRIBUTING.html#pull-request or ask the assistance of a core contributor here or on Slack. Thank you for your contributions.
This pull request is going to be closed. Please feel free to reopen it or create a new one from the current master.
What does this PR do?
#11002 could be a follow-up to this.
Part of #10821
Does your PR introduce any breaking changes? If yes, please list them.
Before submitting
PR review
Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the Review guidelines.
Did you have fun?
Make sure you had fun coding 🙃