
1/n Generalize internal checks for Accelerator in Trainer - remove trainer._device_type #11001

Closed
wants to merge 4 commits into from

Conversation

four4fish
Contributor

@four4fish four4fish commented Dec 9, 2021

What does this PR do?

#11002 could be a follow-up to this PR.

Part of #10821
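The change generalizes internal device checks: instead of comparing the private `trainer._device_type` enum, the Trainer asks the accelerator instance directly. A minimal illustrative sketch of that pattern (not code from this PR), assuming the public accelerator classes from `pytorch_lightning.accelerators` and the Trainer's `accelerator` attribute:

```python
# Illustrative sketch only; GPUAccelerator/TPUAccelerator are the public
# accelerator classes and `trainer.accelerator` is the Trainer's accelerator.
from pytorch_lightning.accelerators import GPUAccelerator, TPUAccelerator


def uses_gpu(trainer) -> bool:
    # Before: trainer._device_type == _AcceleratorType.GPU
    return isinstance(trainer.accelerator, GPUAccelerator)


def uses_tpu(trainer) -> bool:
    # Before: trainer._device_type == _AcceleratorType.TPU
    return isinstance(trainer.accelerator, TPUAccelerator)
```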

Does your PR introduce any breaking changes? If yes, please list them.

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

Comment on lines 1653 to 1652

      rank_zero_info(
          f"GPU available: {torch.cuda.is_available()}, used: {isinstance(self.accelerator, GPUAccelerator)}"
      )

      num_tpu_cores = (
-         self.tpu_cores if self.tpu_cores is not None and self._device_type == _AcceleratorType.TPU else 0
+         self.tpu_cores if self.tpu_cores is not None and isinstance(self.accelerator, TPUAccelerator) else 0
      )
      rank_zero_info(f"TPU available: {_TPU_AVAILABLE}, using: {num_tpu_cores} TPU cores")

      num_ipus = self.ipus if self.ipus is not None else 0
      rank_zero_info(f"IPU available: {_IPU_AVAILABLE}, using: {num_ipus} IPUs")

-     if torch.cuda.is_available() and self._device_type != _AcceleratorType.GPU:
+     if torch.cuda.is_available() and isinstance(self.accelerator, GPUAccelerator):
          rank_zero_warn(
              "GPU available but not used. Set the gpus flag in your trainer `Trainer(gpus=1)` or script `--gpus=1`.",
              category=PossibleUserWarning,
          )

-     if _TPU_AVAILABLE and self._device_type != _AcceleratorType.TPU:
+     if _TPU_AVAILABLE and isinstance(self.accelerator, TPUAccelerator):
          rank_zero_warn(
              "TPU available but not used. Set the `tpu_cores` flag in your trainer"
              " `Trainer(tpu_cores=8)` or script `--tpu_cores=8`."
          )

-     if (
-         _IPU_AVAILABLE
-         and self._device_type != _AcceleratorType.IPU
-         and not isinstance(self.accelerator, IPUAccelerator)
-     ):
+     if _IPU_AVAILABLE and not isinstance(self.accelerator, IPUAccelerator):
Contributor

I find this logging pretty verbose. For instance, I don't want to see "TPU available: False, using: 0 TPU cores" on every run. This is only useful if there are TPUs available and I'm not using them. I assume that if there are no TPUs available but I have specified them, then an exception would be thrown before this is called.

Contributor

Fair point. What about when users pass (accelerator="auto", devices="x")?
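For illustration, the scenario referred to above, where Lightning picks the accelerator and the user only sets the device count (a hedged sketch; exact flag behaviour depends on the Lightning version):

```python
from pytorch_lightning import Trainer

# With accelerator="auto" the user may not know up front whether a GPU/TPU was
# selected, so the device summary log is arguably still useful in this case.
trainer = Trainer(accelerator="auto", devices=2)
```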

Member

Can we simply lower the logging level here, maybe to debug? I also find it too verbose most of the time.

@kaushikb11 same goes here. Usually people know what kind of machine they submitted their job to.
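A sketch of the suggestion above, assuming a rank-zero debug helper is available next to `rank_zero_info`/`rank_zero_warn` (if not, logging the same message at DEBUG level would achieve the same effect):

```python
import torch
from pytorch_lightning.utilities import rank_zero_debug, rank_zero_warn

gpu_available = torch.cuda.is_available()
gpu_in_use = False  # stand-in for isinstance(self.accelerator, GPUAccelerator)

# Plain status lines demoted to debug so they no longer print on every run ...
rank_zero_debug(f"GPU available: {gpu_available}")

# ... while actionable misconfiguration warnings stay visible.
if gpu_available and not gpu_in_use:
    rank_zero_warn("GPU available but not used. Set the gpus flag, e.g. `Trainer(gpus=1)`.")
```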

Contributor Author

Totally agree it's too verbose. Also, why do we have _log_device_info() with all these warnings in the Trainer at all? Should we move _log_device_info to the end of the accelerator_connector init, or relocate the warnings into the relevant selection functions in the accelerator connector?

Contributor Author

Created issue #11014 to discuss this and will address it in a different PR.

      )
      rank_zero_info(f"TPU available: {_TPU_AVAILABLE}, using: {num_tpu_cores} TPU cores")

      num_ipus = self.ipus if self.ipus is not None else 0
      rank_zero_info(f"IPU available: {_IPU_AVAILABLE}, using: {num_ipus} IPUs")

-     if torch.cuda.is_available() and self._device_type != _AcceleratorType.GPU:
+     if torch.cuda.is_available() and isinstance(self.accelerator, GPUAccelerator):
Contributor

This is incorrect: the prior code was checking that the GPU accelerator was not being used. But I don't think we need to check the accelerator instance at all. Why can't we check the gpus flag directly?

Same as above, I don't get why we have to check TPUAccelerator instead of tpu_cores.
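To make the inversion concrete, a small sketch (names borrowed from the diff above; `accelerator` stands in for `self.accelerator`):

```python
import torch
from pytorch_lightning.accelerators import GPUAccelerator


def should_warn_gpu_unused(accelerator) -> bool:
    # Prior behaviour: warn when a GPU is present but the GPU accelerator is
    # *not* in use. The isinstance check in the PR drops the `not`, which
    # inverts the condition.
    return torch.cuda.is_available() and not isinstance(accelerator, GPUAccelerator)
```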

Contributor Author

Once people reach agreement on the gpus/tpu_cores-style flags vs. devices in issue #10410 (Future of gpus/ipus/tpu_cores with respect to devices), I think checking the accelerator type and device count will be better than checking tpu_cores/gpus.

@four4fish four4fish changed the title from "Generalize internal checks for Accelerator in Trainer" to "1/n Generalize internal checks for Accelerator in Trainer - remove trainer._device_type" Dec 9, 2021
pytorch_lightning/trainer/trainer.py (outdated review threads, resolved)
@stale

stale bot commented Dec 25, 2021

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. If you need further help see our docs: https://pytorch-lightning.readthedocs.io/en/latest/generated/CONTRIBUTING.html#pull-request or ask the assistance of a core contributor here or on Slack. Thank you for your contributions.

@stale stale bot added the "won't fix" label (This will not be worked on) Dec 25, 2021
@stale

stale bot commented Jan 3, 2022

This pull request is going to be closed. Please feel free to reopen it or create a new one from the current master.

@stale stale bot closed this Jan 3, 2022
@awaelchli awaelchli reopened this Jan 4, 2022
@awaelchli awaelchli added this to the 1.6 milestone Jan 4, 2022
@stale

stale bot commented Jan 9, 2022

This pull request is going to be closed. Please feel free to reopen it or create a new one from the current master.

@stale stale bot closed this Jan 9, 2022
Labels
accelerator, won't fix (This will not be worked on)