Merge remote-tracking branch 'origin/master' into feat/inference_mode

Lightning-AI · Apr 12, 2022 · 7e4c52c
2 parents 2762802 + 6fcb590

Showing 29 changed files with 213 additions and 528 deletions.
diff --git a/.azure-pipelines/gpu-tests.yml b/.azure-pipelines/gpu-tests.yml
@@ -52,9 +52,10 @@ jobs:
 
     - bash: |
         python -c "fname = 'requirements/extra.txt' ; lines = [line for line in open(fname).readlines() if 'horovod' not in line] ; open(fname, 'w').writelines(lines)"
-        pip install fairscale==0.4.5
+        pip install fairscale>=0.4.5
         pip install deepspeed>=0.6.0
-        pip install bagua-cuda102==0.9.0
+        CUDA_VERSION_MM=$(python -c "import torch ; print(''.join(map(str, torch.version.cuda.split('.')[:2])))")
+        pip install "bagua-cuda$CUDA_VERSION_MM>=0.9.0"
         pip install . --requirement requirements/devel.txt
         pip list
       displayName: 'Install dependencies'
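
The `CUDA_VERSION_MM` line in the hunk above builds the Bagua package suffix from the major and minor parts of the CUDA version reported by `torch.version.cuda`. A minimal sketch of that string manipulation follows. The helper names `cuda_version_mm` and `bagua_requirement` are hypothetical, and a literal version string stands in for `torch.version.cuda` so the snippet runs without PyTorch installed.

```python
def cuda_version_mm(cuda_version: str) -> str:
    """Join the major and minor parts of a CUDA version string ('10.2' -> '102')."""
    return "".join(cuda_version.split(".")[:2])


def bagua_requirement(cuda_version: str, min_version: str = "0.9.0") -> str:
    """Build a pip requirement string in the shape used by the pipeline step."""
    return f"bagua-cuda{cuda_version_mm(cuda_version)}>={min_version}"


print(bagua_requirement("10.2"))    # bagua-cuda102>=0.9.0
print(bagua_requirement("11.3.1"))  # bagua-cuda113>=0.9.0
```

The pipeline quotes the Bagua requirement because a POSIX shell treats an unquoted `>` as output redirection, so a specifier such as `>=0.9.0` only reaches pip when the argument is quoted.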

diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
@@ -25,14 +25,14 @@ Fixes #\<issue_number>
 - [ ] Did you write any **new necessary tests**? (not for typos and docs)
 - [ ] Did you verify new and **existing tests pass** locally with your changes?
 - [ ] Did you list all the **breaking changes** introduced by this pull request?
-- [ ] Did you **update the [CHANGELOG](https://github.com/PyTorchLightning/pytorch-lightning/blob/master/CHANGELOG.md)**? (not for typos, docs, test updates, or internal minor changes/refactorings)
+- [ ] Did you **update the [CHANGELOG](https://github.com/PyTorchLightning/pytorch-lightning/blob/master/CHANGELOG.md)**? (not for typos, docs, test updates, or minor internal changes/refactors)
 
 <!-- In the CHANGELOG, separate each item in the unreleased section by a blank line to reduce collisions -->
 
 ## PR review
 
 Anyone in the community is welcome to review the PR.
-Before you start reviewing make sure you have read [Review guidelines](https://github.com/PyTorchLightning/pytorch-lightning/wiki/Review-guidelines). In short, see the following bullet-list:
+Before you start reviewing, make sure you have read the [review guidelines](https://github.com/PyTorchLightning/pytorch-lightning/wiki/Review-guidelines). In short, see the following bullet-list:
 
 - [ ] Is this pull request ready for review? (if not, please submit in draft mode)
 - [ ] Check that all items from **Before submitting** are resolved

diff --git a/.github/workflows/ci_dockers.yml b/.github/workflows/ci_dockers.yml
@@ -73,9 +73,10 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        # the config used in '.azure-pipelines/gpu-tests.yml'
-        python_version: ["3.7"]
-        pytorch_version: ["1.8"]
+        include:
+          # the config used in '.azure-pipelines/gpu-tests.yml'
+          - {python_version: "3.7", pytorch_version: "1.8"}
+          - {python_version: "3.9", pytorch_version: "1.10"}
     steps:
       - name: Checkout
         uses: actions/checkout@v2
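
The switch from two separate matrix axes to an `include` list is presumably there so that the second configuration is an explicit pair and not part of a cross product. A rough sketch of the difference in plain Python, with the expansion modelled by `itertools.product` (an illustration, not the GitHub Actions implementation):

```python
from itertools import product

python_versions = ["3.7", "3.9"]
pytorch_versions = ["1.8", "1.10"]

# Listing both values on each axis would expand to every combination (4 jobs).
cross_product = [
    {"python_version": py, "pytorch_version": pt}
    for py, pt in product(python_versions, pytorch_versions)
]

# `include` entries, as in the hunk above, name each job explicitly (2 jobs).
explicit_pairs = [
    {"python_version": "3.7", "pytorch_version": "1.8"},
    {"python_version": "3.9", "pytorch_version": "1.10"},
]

print(len(cross_product), len(explicit_pairs))  # 4 2
```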

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -88,6 +88,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Removed the deprecated `terminate_on_nan` argument from the `Trainer` constructor ([#12553](https://github.com/PyTorchLightning/pytorch-lightning/pull/12553))
 
 
+- Remove deprecated `pytorch_lightning.callbacks.progress.progress` ([#12658](https://github.com/PyTorchLightning/pytorch-lightning/pull/12658))
+
+
 - Removed the deprecated `train_transforms` argument from the `LightningDataModule` constructor([#12662](https://github.com/PyTorchLightning/pytorch-lightning/pull/12662))
 
 
@@ -100,8 +103,17 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Removed deprecated `GPUStatsMonitor` callback ([#12554](https://github.com/PyTorchLightning/pytorch-lightning/pull/12554))
 
 
+- Removed support for passing strategy names or strategy instances to the accelerator Trainer argument ([#12696](https://github.com/PyTorchLightning/pytorch-lightning/pull/12696))
+
+
+- Removed support for passing strategy names or strategy instances to the plugins Trainer argument ([#12700](https://github.com/PyTorchLightning/pytorch-lightning/pull/12700))
+
+
 ### Fixed
 
+- Run main progress bar updates independent of val progress bar updates in `TQDMProgressBar` ([#12563](https://github.com/PyTorchLightning/pytorch-lightning/pull/12563))
+
+
 - Avoid calling `average_parameters` multiple times per optimizer step ([#12452](https://github.com/PyTorchLightning/pytorch-lightning/pull/12452))
 
 
@@ -117,7 +129,7 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Don't raise a warning when `nn.Module` is not saved under hparams ([#12669](https://github.com/PyTorchLightning/pytorch-lightning/pull/12669))
 
 
--
+- Raise `MisconfigurationException` when the accelerator is available but the user passes invalid `([]/0/"0")` values to the `devices` flag ([#12708](https://github.com/PyTorchLightning/pytorch-lightning/pull/12708))
 
 
 ## [1.6.0] - 2022-03-29

diff --git a/dockers/README.md b/dockers/README.md
@@ -14,10 +14,10 @@ or with specific arguments
 ```bash
 git clone <git-repository>
 docker image build \
-    -t pytorch-lightning:base-cuda-py3.7-pt1.8 \
+    -t pytorch-lightning:base-cuda-py3.9-pt1.10 \
     -f dockers/base-cuda/Dockerfile \
-    --build-arg PYTHON_VERSION=3.7 \
-    --build-arg PYTORCH_VERSION=1.8 \
+    --build-arg PYTHON_VERSION=3.9 \
+    --build-arg PYTORCH_VERSION=1.10 \
     .
 ```
 

diff --git a/dockers/base-conda/Dockerfile b/dockers/base-conda/Dockerfile
@@ -147,12 +147,33 @@ RUN \
     pip install --no-cache-dir --global-option="--cuda_ext" https://github.com/NVIDIA/apex/archive/refs/heads/master.zip && \
     python -c "from apex import amp"
 
+RUN \
+    # install FairScale
+    pip install fairscale==0.4.5 && \
+    python -c "import fairscale; print(fairscale.__version__)"
+
+RUN \
+    # install DeepSpeed
+    pip install deepspeed==0.6.0 && \
+    python -c "import deepspeed; print(deepspeed.__version__)"
+
+RUN \
+    # install Bagua
+    CUDA_VERSION_MM=$(python -c "print(''.join('$CUDA_VERSION'.split('.')[:2]))") && \
+    pip install "bagua-cuda$CUDA_VERSION_MM==0.9.0" && \
+    python -c "import bagua_core; bagua_core.install_deps()" && \
+    python -c "import bagua; print(bagua.__version__)"
+
+COPY requirements/check-avail-extras.py check-avail-extras.py
+COPY requirements/check-avail-strategies.py check-avail-strategies.py
+
 RUN \
     # Show what we have
     pip --version && \
     conda info && \
     pip list && \
     python -c "import sys; ver = sys.version_info ; assert f'{ver.major}.{ver.minor}' == '$PYTHON_VERSION', ver" && \
     python -c "import torch; assert torch.__version__.startswith('$PYTORCH_VERSION'), torch.__version__" && \
-    python -c "import horovod.torch" && \
-    python -c "from horovod.torch import nccl_built; nccl_built()"
+    python check-avail-extras.py && \
+    python check-avail-strategies.py && \
+    rm check-avail-*.py
diff --git a/dockers/base-cuda/Dockerfile b/dockers/base-cuda/Dockerfile
@@ -76,7 +76,6 @@ RUN \
     pip install -q fire && \
     # Disable cache \
     CUDA_VERSION_MM=$(python -c "print(''.join('$CUDA_VERSION'.split('.')[:2]))") && \
-    export BAGUA_CUDA_VERSION=$CUDA_VERSION_MM && \
     pip config set global.cache-dir false && \
     # set particular PyTorch version
     python ./requirements/adjust-versions.py requirements.txt ${PYTORCH_VERSION} && \
@@ -138,14 +137,25 @@ RUN \
 
 RUN \
     # install DeepSpeed
-    pip install deepspeed==0.5.7 && \
+    pip install deepspeed==0.6.0 && \
     python -c "import deepspeed; print(deepspeed.__version__)"
 
+RUN \
+    # install Bagua
+    CUDA_VERSION_MM=$(python -c "print(''.join('$CUDA_VERSION'.split('.')[:2]))") && \
+    pip install "bagua-cuda$CUDA_VERSION_MM==0.9.0" && \
+    python -c "import bagua_core; bagua_core.install_deps()" && \
+    python -c "import bagua; print(bagua.__version__)"
+
+COPY requirements/check-avail-extras.py check-avail-extras.py
+COPY requirements/check-avail-strategies.py check-avail-strategies.py
+
 RUN \
     # Show what we have
     pip --version && \
     pip list && \
     python -c "import sys; ver = sys.version_info ; assert f'{ver.major}.{ver.minor}' == '$PYTHON_VERSION', ver" && \
     python -c "import torch; assert torch.__version__.startswith('$PYTORCH_VERSION'), torch.__version__" && \
-    python -c "import horovod.torch" && \
-    python -c "from horovod.torch import nccl_built; nccl_built()"
+    python check-avail-extras.py && \
+    python check-avail-strategies.py && \
+    rm check-avail-*.py
diff --git a/pytorch_lightning/callbacks/__init__.py b/pytorch_lightning/callbacks/__init__.py
@@ -21,7 +21,7 @@
 from pytorch_lightning.callbacks.model_checkpoint import ModelCheckpoint
 from pytorch_lightning.callbacks.model_summary import ModelSummary
 from pytorch_lightning.callbacks.prediction_writer import BasePredictionWriter
-from pytorch_lightning.callbacks.progress import ProgressBar, ProgressBarBase, RichProgressBar, TQDMProgressBar
+from pytorch_lightning.callbacks.progress import ProgressBarBase, RichProgressBar, TQDMProgressBar
 from pytorch_lightning.callbacks.pruning import ModelPruning
 from pytorch_lightning.callbacks.quantization import QuantizationAwareTraining
 from pytorch_lightning.callbacks.rich_model_summary import RichModelSummary
@@ -43,7 +43,6 @@
     "ModelPruning",
     "ModelSummary",
     "BasePredictionWriter",
-    "ProgressBar",
     "ProgressBarBase",
     "QuantizationAwareTraining",
     "RichModelSummary",

diff --git a/pytorch_lightning/callbacks/progress/__init__.py b/pytorch_lightning/callbacks/progress/__init__.py
@@ -19,6 +19,5 @@
 
 """
 from pytorch_lightning.callbacks.progress.base import ProgressBarBase  # noqa: F401
-from pytorch_lightning.callbacks.progress.progress import ProgressBar  # noqa: F401
 from pytorch_lightning.callbacks.progress.rich_progress import RichProgressBar  # noqa: F401
 from pytorch_lightning.callbacks.progress.tqdm_progress import TQDMProgressBar  # noqa: F401
diff --git a/pytorch_lightning/callbacks/progress/progress.py b/pytorch_lightning/callbacks/progress/progress.py
diff --git a/pytorch_lightning/callbacks/progress/rich_progress.py b/pytorch_lightning/callbacks/progress/rich_progress.py
@@ -193,7 +193,7 @@ class RichProgressBarTheme:
 
 
 class RichProgressBar(ProgressBarBase):
-    """Create a progress bar with `rich text formatting <https://github.com/willmcgugan/rich>`_.
+    """Create a progress bar with `rich text formatting <https://github.com/Textualize/rich>`_.
 
     Install it with pip:
 

diff --git a/pytorch_lightning/callbacks/progress/tqdm_progress.py b/pytorch_lightning/callbacks/progress/tqdm_progress.py
@@ -263,8 +263,9 @@ def on_train_epoch_start(self, trainer: "pl.Trainer", *_: Any) -> None:
         self.main_progress_bar.set_description(f"Epoch {trainer.current_epoch}")
 
     def on_train_batch_end(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule", *_: Any) -> None:
-        if self._should_update(self.train_batch_idx, self.total_train_batches):
-            _update_n(self.main_progress_bar, self.train_batch_idx + self._val_processed)
+        current = self.train_batch_idx + self._val_processed
+        if self._should_update(current, self.main_progress_bar.total):
+            _update_n(self.main_progress_bar, current)
             self.main_progress_bar.set_postfix(self.get_metrics(trainer, pl_module))
 
     def on_train_epoch_end(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule") -> None:
@@ -289,10 +290,12 @@ def on_validation_batch_start(
         self.val_progress_bar.set_description(f"{desc} DataLoader {dataloader_idx}")
 
     def on_validation_batch_end(self, trainer: "pl.Trainer", *_: Any) -> None:
-        if self._should_update(self.val_batch_idx, self.total_val_batches_current_dataloader):
+        if self._should_update(self.val_batch_idx, self.val_progress_bar.total):
             _update_n(self.val_progress_bar, self.val_batch_idx)
-            if trainer.state.fn == "fit":
-                _update_n(self.main_progress_bar, self.train_batch_idx + self._val_processed)
+
+        current = self.train_batch_idx + self._val_processed
+        if trainer.state.fn == "fit" and self._should_update(current, self.main_progress_bar.total):
+            _update_n(self.main_progress_bar, current)
 
     def on_validation_end(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule") -> None:
         if self._main_progress_bar is not None and trainer.state.fn == "fit":
@@ -313,7 +316,7 @@ def on_test_batch_start(
         self.test_progress_bar.set_description(f"{self.test_description} DataLoader {dataloader_idx}")
 
     def on_test_batch_end(self, *_: Any) -> None:
-        if self._should_update(self.test_batch_idx, self.total_test_batches_current_dataloader):
+        if self._should_update(self.test_batch_idx, self.test_progress_bar.total):
             _update_n(self.test_progress_bar, self.test_batch_idx)
 
     def on_test_end(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule") -> None:
@@ -333,7 +336,7 @@ def on_predict_batch_start(
         self.predict_progress_bar.set_description(f"{self.predict_description} DataLoader {dataloader_idx}")
 
     def on_predict_batch_end(self, *_: Any) -> None:
-        if self._should_update(self.predict_batch_idx, self.total_predict_batches_current_dataloader):
+        if self._should_update(self.predict_batch_idx, self.predict_progress_bar.total):
             _update_n(self.predict_progress_bar, self.predict_batch_idx)
 
     def on_predict_end(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule") -> None:
@@ -356,8 +359,8 @@ def print(self, *args: Any, sep: str = " ", **kwargs: Any) -> None:
             s = sep.join(map(str, args))
             active_progress_bar.write(s, **kwargs)
 
-    def _should_update(self, current: int, total: Union[int, float]) -> bool:
-        return self.refresh_rate > 0 and (current % self.refresh_rate == 0 or current == total)
+    def _should_update(self, current: int, total: int) -> bool:
+        return self.is_enabled and (current % self.refresh_rate == 0 or current == total)
 
     @staticmethod
     def _resolve_refresh_rate(refresh_rate: int) -> int:
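
The `_should_update` check in the hunks above can be illustrated outside the callback. The sketch below is a standalone function and not the `TQDMProgressBar` method. It assumes that `is_enabled` amounts to `refresh_rate > 0`, and the name `should_update` and the sample numbers are made up for the example.

```python
def should_update(current: int, total: int, refresh_rate: int) -> bool:
    """Refresh every `refresh_rate` steps and always on the final step."""
    enabled = refresh_rate > 0  # assumption: stand-in for `self.is_enabled`
    # `and` short-circuits, so the modulo is never evaluated when refresh_rate is 0.
    return enabled and (current % refresh_rate == 0 or current == total)


total = 10
updates = [step for step in range(1, total + 1) if should_update(step, total, refresh_rate=4)]
print(updates)  # [4, 8, 10]
```

In the diff, the main bar passes `train_batch_idx + _val_processed` as `current` and its own `total`, so the final-step check compares the combined count against the combined total.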

diff --git a/pytorch_lightning/callbacks/rich_model_summary.py b/pytorch_lightning/callbacks/rich_model_summary.py
@@ -25,7 +25,7 @@
 class RichModelSummary(ModelSummary):
     r"""
     Generates a summary of all layers in a :class:`~pytorch_lightning.core.lightning.LightningModule`
-    with `rich text formatting <https://github.com/willmcgugan/rich>`_.
+    with `rich text formatting <https://github.com/Textualize/rich>`_.
 
     Install it with pip:
 

diff --git a/pytorch_lightning/core/memory.py b/pytorch_lightning/core/memory.py
@@ -14,10 +14,10 @@
 from pytorch_lightning.utilities import rank_zero_deprecation
 
 rank_zero_deprecation(
-    "`pytorch_lightning.core.memory.get_memory_profile` and"
-    " `pytorch_lightning.core.memory.get_gpu_memory_map` have been moved"
-    " to `pytorch_lightning.utilities.memory` since v1.5 and will be removed in v1.7."
+    "`pytorch_lightning.core.memory.LayerSummary` and"
+    " `pytorch_lightning.core.memory.ModelSummary` have been moved"
+    " to `pytorch_lightning.utilities.model_summary` since v1.5 and will be removed in v1.7."
 )
 
-# To support backward compatibility as get_memory_profile and get_gpu_memory_map have been moved
-from pytorch_lightning.utilities.memory import get_gpu_memory_map, get_memory_profile  # noqa: E402, F401 # isort: skip
+# To support backward compatibility as LayerSummary and ModelSummary have been moved
+from pytorch_lightning.utilities.model_summary import LayerSummary, ModelSummary  # noqa: E402, F401 # isort: skip
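
The module above follows a common backward-compatibility pattern: warn when the old location is used, then re-export the names from the new one. A minimal sketch of the warning half, using the standard-library `warnings` module in place of `rank_zero_deprecation`. The function `warn_moved` is hypothetical and not part of the Lightning API.

```python
import warnings


def warn_moved(old_path: str, new_path: str, removal: str) -> str:
    """Emit a DeprecationWarning that points users at the new import location."""
    message = f"`{old_path}` has been moved to `{new_path}` and will be removed in v{removal}."
    warnings.warn(message, DeprecationWarning, stacklevel=2)
    return message


# Capture the warning so the example can show what was emitted.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_moved(
        "pytorch_lightning.core.memory.ModelSummary",
        "pytorch_lightning.utilities.model_summary",
        "1.7",
    )

print(caught[0].category.__name__)  # DeprecationWarning
```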
diff --git a/pytorch_lightning/lite/lite.py b/pytorch_lightning/lite/lite.py
@@ -80,7 +80,7 @@ def __init__(
     ) -> None:
         self._check_accelerator_support(accelerator)
         self._check_strategy_support(strategy)
-        gpu_ids, tpu_cores = _parse_devices(gpus=gpus, auto_select_gpus=False, tpu_cores=tpu_cores)
+        _, tpu_cores = _parse_devices(gpus=gpus, auto_select_gpus=False, tpu_cores=tpu_cores)
         self._accelerator_connector = AcceleratorConnector(
             num_processes=None,
             devices=devices,
@@ -89,7 +89,6 @@ def __init__(
             accelerator=accelerator,
             strategy=strategy,
             gpus=gpus,
-            gpu_ids=gpu_ids,
             num_nodes=num_nodes,
             sync_batchnorm=False,  # TODO: add support?
             benchmark=False,