Commit

Merge remote-tracking branch 'origin/master' into feat/inference_mode
rohitgr7 committed Apr 12, 2022
2 parents 2762802 + 6fcb590 commit 7e4c52c
Showing 29 changed files with 213 additions and 528 deletions.
5 changes: 3 additions & 2 deletions .azure-pipelines/gpu-tests.yml
@@ -52,9 +52,10 @@ jobs:
- bash: |
python -c "fname = 'requirements/extra.txt' ; lines = [line for line in open(fname).readlines() if 'horovod' not in line] ; open(fname, 'w').writelines(lines)"
pip install fairscale==0.4.5
pip install "fairscale>=0.4.5"
pip install "deepspeed>=0.6.0"
pip install bagua-cuda102==0.9.0
CUDA_VERSION_MM=$(python -c "import torch ; print(''.join(map(str, torch.version.cuda.split('.')[:2])))")
pip install "bagua-cuda$CUDA_VERSION_MM>=0.9.0"
pip install . --requirement requirements/devel.txt
pip list
displayName: 'Install dependencies'
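The pipeline and both Dockerfiles now derive the Bagua package name from the CUDA major and minor version instead of hard-coding `bagua-cuda102`. A minimal Python sketch of that string manipulation, assuming a version string shaped like `torch.version.cuda` or the `$CUDA_VERSION` build variable (for example `"10.2"` or `"11.3.1"`); the helper names are hypothetical:

```python
def cuda_major_minor(cuda_version: str) -> str:
    """Concatenate the first two components of a CUDA version, e.g. "11.3.1" -> "113"."""
    # Same expression as the inline one-liners: ''.join(version.split('.')[:2])
    return "".join(cuda_version.split(".")[:2])


def bagua_requirement(cuda_version: str, spec: str = ">=0.9.0") -> str:
    """Build the pip requirement for the CUDA-specific Bagua package (hypothetical helper)."""
    # The Azure pipeline uses ">=0.9.0"; the Dockerfiles pin "==0.9.0".
    return f"bagua-cuda{cuda_major_minor(cuda_version)}{spec}"


if __name__ == "__main__":
    for version in ("10.2", "11.3.1"):
        print(version, "->", bagua_requirement(version))
```

Quoting the resulting requirement in the shell, as the pipeline does, keeps `>=` from being parsed as an output redirection.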
4 changes: 2 additions & 2 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -25,14 +25,14 @@ Fixes #\<issue_number>
- [ ] Did you write any **new necessary tests**? (not for typos and docs)
- [ ] Did you verify new and **existing tests pass** locally with your changes?
- [ ] Did you list all the **breaking changes** introduced by this pull request?
- [ ] Did you **update the [CHANGELOG](https://github.com/PyTorchLightning/pytorch-lightning/blob/master/CHANGELOG.md)**? (not for typos, docs, test updates, or internal minor changes/refactorings)
- [ ] Did you **update the [CHANGELOG](https://github.com/PyTorchLightning/pytorch-lightning/blob/master/CHANGELOG.md)**? (not for typos, docs, test updates, or minor internal changes/refactors)

<!-- In the CHANGELOG, separate each item in the unreleased section by a blank line to reduce collisions -->

## PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing make sure you have read [Review guidelines](https://github.com/PyTorchLightning/pytorch-lightning/wiki/Review-guidelines). In short, see the following bullet-list:
Before you start reviewing, make sure you have read the [review guidelines](https://github.com/PyTorchLightning/pytorch-lightning/wiki/Review-guidelines). In short, see the following bullet-list:

- [ ] Is this pull request ready for review? (if not, please submit in draft mode)
- [ ] Check that all items from **Before submitting** are resolved
7 changes: 4 additions & 3 deletions .github/workflows/ci_dockers.yml
@@ -73,9 +73,10 @@ jobs:
strategy:
fail-fast: false
matrix:
# the config used in '.azure-pipelines/gpu-tests.yml'
python_version: ["3.7"]
pytorch_version: ["1.8"]
include:
# the config used in '.azure-pipelines/gpu-tests.yml'
- {python_version: "3.7", pytorch_version: "1.8"}
- {python_version: "3.9", pytorch_version: "1.10"}
steps:
- name: Checkout
uses: actions/checkout@v2
14 changes: 13 additions & 1 deletion CHANGELOG.md
@@ -88,6 +88,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
- Removed the deprecated `terminate_on_nan` argument from the `Trainer` constructor ([#12553](https://github.com/PyTorchLightning/pytorch-lightning/pull/12553))


- Removed the deprecated `pytorch_lightning.callbacks.progress.progress` ([#12658](https://github.com/PyTorchLightning/pytorch-lightning/pull/12658))


- Removed the deprecated `train_transforms` argument from the `LightningDataModule` constructor ([#12662](https://github.com/PyTorchLightning/pytorch-lightning/pull/12662))


@@ -100,8 +103,17 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
- Removed deprecated `GPUStatsMonitor` callback ([#12554](https://github.com/PyTorchLightning/pytorch-lightning/pull/12554))


- Removed support for passing strategy names or strategy instances to the accelerator Trainer argument ([#12696](https://github.com/PyTorchLightning/pytorch-lightning/pull/12696))


- Removed support for passing strategy names or strategy instances to the plugins Trainer argument ([#12700](https://github.com/PyTorchLightning/pytorch-lightning/pull/12700))


### Fixed

- Run main progress bar updates independent of val progress bar updates in `TQDMProgressBar` ([#12563](https://github.com/PyTorchLightning/pytorch-lightning/pull/12563))


- Avoid calling `average_parameters` multiple times per optimizer step ([#12452](https://github.com/PyTorchLightning/pytorch-lightning/pull/12452))


@@ -117,7 +129,7 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
- Don't raise a warning when `nn.Module` is not saved under hparams ([#12669](https://github.com/PyTorchLightning/pytorch-lightning/pull/12669))


-
- Raise `MisconfigurationException` when the accelerator is available but the user passes invalid `([]/0/"0")` values to the `devices` flag ([#12708](https://github.com/PyTorchLightning/pytorch-lightning/pull/12708))


## [1.6.0] - 2022-03-29
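The new `devices` check rejects values that select no device while an accelerator is available. A minimal sketch of that rule, assuming the three values listed in the changelog entry are the only ones rejected; `validate_devices` and `MisconfigurationError` are hypothetical stand-ins, not Lightning's implementation:

```python
from typing import List, Union

DevicesArg = Union[int, str, List[int]]


class MisconfigurationError(ValueError):
    """Hypothetical stand-in for Lightning's MisconfigurationException."""


# The three values the changelog entry lists as invalid.
_NO_DEVICE_VALUES = ([], 0, "0")


def validate_devices(devices: DevicesArg, accelerator_available: bool) -> None:
    """Reject `devices` values that select no device when an accelerator exists (sketch)."""
    if accelerator_available and devices in _NO_DEVICE_VALUES:
        raise MisconfigurationError(
            f"`devices={devices!r}` selects no device; pass a positive count or a non-empty list."
        )
```

With this sketch, `validate_devices(2, True)` passes silently, while `validate_devices("0", True)` raises.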
6 changes: 3 additions & 3 deletions dockers/README.md
@@ -14,10 +14,10 @@ or with specific arguments
```bash
git clone <git-repository>
docker image build \
-t pytorch-lightning:base-cuda-py3.7-pt1.8 \
-t pytorch-lightning:base-cuda-py3.9-pt1.10 \
-f dockers/base-cuda/Dockerfile \
--build-arg PYTHON_VERSION=3.7 \
--build-arg PYTORCH_VERSION=1.8 \
--build-arg PYTHON_VERSION=3.9 \
--build-arg PYTORCH_VERSION=1.10 \
.
```

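The README command is parameterized by the same Python/PyTorch pairs that the `ci_dockers.yml` matrix now lists under `include`. A sketch that renders that command for each `include` entry, assuming the tag naming shown in the README; `MATRIX_INCLUDE` and `build_command` are illustrative names, and the script only prints commands without calling Docker:

```python
import shlex
from typing import Dict, List

# The two configurations from the `include` list in .github/workflows/ci_dockers.yml
MATRIX_INCLUDE: List[Dict[str, str]] = [
    {"python_version": "3.7", "pytorch_version": "1.8"},
    {"python_version": "3.9", "pytorch_version": "1.10"},
]


def build_command(python_version: str, pytorch_version: str) -> List[str]:
    """Return the argv for the `docker image build` call shown in dockers/README.md."""
    tag = f"pytorch-lightning:base-cuda-py{python_version}-pt{pytorch_version}"
    return [
        "docker", "image", "build",
        "-t", tag,
        "-f", "dockers/base-cuda/Dockerfile",
        "--build-arg", f"PYTHON_VERSION={python_version}",
        "--build-arg", f"PYTORCH_VERSION={pytorch_version}",
        ".",
    ]


if __name__ == "__main__":
    for combo in MATRIX_INCLUDE:
        # shlex.join (Python 3.8+) quotes each argument for copy-pasting into a shell
        print(shlex.join(build_command(**combo)))
```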
25 changes: 23 additions & 2 deletions dockers/base-conda/Dockerfile
@@ -147,12 +147,33 @@ RUN \
pip install --no-cache-dir --global-option="--cuda_ext" https://github.com/NVIDIA/apex/archive/refs/heads/master.zip && \
python -c "from apex import amp"

RUN \
# install FairScale
pip install fairscale==0.4.5 && \
python -c "import fairscale; print(fairscale.__version__)"

RUN \
# install DeepSpeed
pip install deepspeed==0.6.0 && \
python -c "import deepspeed; print(deepspeed.__version__)"

RUN \
# install Bagua
CUDA_VERSION_MM=$(python -c "print(''.join('$CUDA_VERSION'.split('.')[:2]))") && \
pip install "bagua-cuda$CUDA_VERSION_MM==0.9.0" && \
python -c "import bagua_core; bagua_core.install_deps()" && \
python -c "import bagua; print(bagua.__version__)"

COPY requirements/check-avail-extras.py check-avail-extras.py
COPY requirements/check-avail-strategies.py check-avail-strategies.py

RUN \
# Show what we have
pip --version && \
conda info && \
pip list && \
python -c "import sys; ver = sys.version_info ; assert f'{ver.major}.{ver.minor}' == '$PYTHON_VERSION', ver" && \
python -c "import torch; assert torch.__version__.startswith('$PYTORCH_VERSION'), torch.__version__" && \
python -c "import horovod.torch" && \
python -c "from horovod.torch import nccl_built; nccl_built()"
python check-avail-extras.py && \
python check-avail-strategies.py && \
rm check-avail-*.py
18 changes: 14 additions & 4 deletions dockers/base-cuda/Dockerfile
@@ -76,7 +76,6 @@ RUN \
pip install -q fire && \
# Disable cache \
CUDA_VERSION_MM=$(python -c "print(''.join('$CUDA_VERSION'.split('.')[:2]))") && \
export BAGUA_CUDA_VERSION=$CUDA_VERSION_MM && \
pip config set global.cache-dir false && \
# set particular PyTorch version
python ./requirements/adjust-versions.py requirements.txt ${PYTORCH_VERSION} && \
@@ -138,14 +137,25 @@ RUN \

RUN \
# install DeepSpeed
pip install deepspeed==0.5.7 && \
pip install deepspeed==0.6.0 && \
python -c "import deepspeed; print(deepspeed.__version__)"

RUN \
# install Bagua
CUDA_VERSION_MM=$(python -c "print(''.join('$CUDA_VERSION'.split('.')[:2]))") && \
pip install "bagua-cuda$CUDA_VERSION_MM==0.9.0" && \
python -c "import bagua_core; bagua_core.install_deps()" && \
python -c "import bagua; print(bagua.__version__)"

COPY requirements/check-avail-extras.py check-avail-extras.py
COPY requirements/check-avail-strategies.py check-avail-strategies.py

RUN \
# Show what we have
pip --version && \
pip list && \
python -c "import sys; ver = sys.version_info ; assert f'{ver.major}.{ver.minor}' == '$PYTHON_VERSION', ver" && \
python -c "import torch; assert torch.__version__.startswith('$PYTORCH_VERSION'), torch.__version__" && \
python -c "import horovod.torch" && \
python -c "from horovod.torch import nccl_built; nccl_built()"
python check-avail-extras.py && \
python check-avail-strategies.py && \
rm check-avail-*.py
3 changes: 1 addition & 2 deletions pytorch_lightning/callbacks/__init__.py
@@ -21,7 +21,7 @@
from pytorch_lightning.callbacks.model_checkpoint import ModelCheckpoint
from pytorch_lightning.callbacks.model_summary import ModelSummary
from pytorch_lightning.callbacks.prediction_writer import BasePredictionWriter
from pytorch_lightning.callbacks.progress import ProgressBar, ProgressBarBase, RichProgressBar, TQDMProgressBar
from pytorch_lightning.callbacks.progress import ProgressBarBase, RichProgressBar, TQDMProgressBar
from pytorch_lightning.callbacks.pruning import ModelPruning
from pytorch_lightning.callbacks.quantization import QuantizationAwareTraining
from pytorch_lightning.callbacks.rich_model_summary import RichModelSummary
@@ -43,7 +43,6 @@
"ModelPruning",
"ModelSummary",
"BasePredictionWriter",
"ProgressBar",
"ProgressBarBase",
"QuantizationAwareTraining",
"RichModelSummary",
1 change: 0 additions & 1 deletion pytorch_lightning/callbacks/progress/__init__.py
@@ -19,6 +19,5 @@
"""
from pytorch_lightning.callbacks.progress.base import ProgressBarBase # noqa: F401
from pytorch_lightning.callbacks.progress.progress import ProgressBar # noqa: F401
from pytorch_lightning.callbacks.progress.rich_progress import RichProgressBar # noqa: F401
from pytorch_lightning.callbacks.progress.tqdm_progress import TQDMProgressBar # noqa: F401
26 changes: 0 additions & 26 deletions pytorch_lightning/callbacks/progress/progress.py

This file was deleted.

2 changes: 1 addition & 1 deletion pytorch_lightning/callbacks/progress/rich_progress.py
@@ -193,7 +193,7 @@ class RichProgressBarTheme:


class RichProgressBar(ProgressBarBase):
"""Create a progress bar with `rich text formatting <https://github.com/willmcgugan/rich>`_.
"""Create a progress bar with `rich text formatting <https://github.com/Textualize/rich>`_.
Install it with pip:
21 changes: 12 additions & 9 deletions pytorch_lightning/callbacks/progress/tqdm_progress.py
@@ -263,8 +263,9 @@ def on_train_epoch_start(self, trainer: "pl.Trainer", *_: Any) -> None:
self.main_progress_bar.set_description(f"Epoch {trainer.current_epoch}")

def on_train_batch_end(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule", *_: Any) -> None:
if self._should_update(self.train_batch_idx, self.total_train_batches):
_update_n(self.main_progress_bar, self.train_batch_idx + self._val_processed)
current = self.train_batch_idx + self._val_processed
if self._should_update(current, self.main_progress_bar.total):
_update_n(self.main_progress_bar, current)
self.main_progress_bar.set_postfix(self.get_metrics(trainer, pl_module))

def on_train_epoch_end(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule") -> None:
@@ -289,10 +290,12 @@ def on_validation_batch_start(
self.val_progress_bar.set_description(f"{desc} DataLoader {dataloader_idx}")

def on_validation_batch_end(self, trainer: "pl.Trainer", *_: Any) -> None:
if self._should_update(self.val_batch_idx, self.total_val_batches_current_dataloader):
if self._should_update(self.val_batch_idx, self.val_progress_bar.total):
_update_n(self.val_progress_bar, self.val_batch_idx)
if trainer.state.fn == "fit":
_update_n(self.main_progress_bar, self.train_batch_idx + self._val_processed)

current = self.train_batch_idx + self._val_processed
if trainer.state.fn == "fit" and self._should_update(current, self.main_progress_bar.total):
_update_n(self.main_progress_bar, current)

def on_validation_end(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule") -> None:
if self._main_progress_bar is not None and trainer.state.fn == "fit":
@@ -313,7 +316,7 @@ def on_test_batch_start(
self.test_progress_bar.set_description(f"{self.test_description} DataLoader {dataloader_idx}")

def on_test_batch_end(self, *_: Any) -> None:
if self._should_update(self.test_batch_idx, self.total_test_batches_current_dataloader):
if self._should_update(self.test_batch_idx, self.test_progress_bar.total):
_update_n(self.test_progress_bar, self.test_batch_idx)

def on_test_end(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule") -> None:
@@ -333,7 +336,7 @@ def on_predict_batch_start(
self.predict_progress_bar.set_description(f"{self.predict_description} DataLoader {dataloader_idx}")

def on_predict_batch_end(self, *_: Any) -> None:
if self._should_update(self.predict_batch_idx, self.total_predict_batches_current_dataloader):
if self._should_update(self.predict_batch_idx, self.predict_progress_bar.total):
_update_n(self.predict_progress_bar, self.predict_batch_idx)

def on_predict_end(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule") -> None:
@@ -356,8 +359,8 @@ def print(self, *args: Any, sep: str = " ", **kwargs: Any) -> None:
s = sep.join(map(str, args))
active_progress_bar.write(s, **kwargs)

def _should_update(self, current: int, total: Union[int, float]) -> bool:
return self.refresh_rate > 0 and (current % self.refresh_rate == 0 or current == total)
def _should_update(self, current: int, total: int) -> bool:
return self.is_enabled and (current % self.refresh_rate == 0 or current == total)

@staticmethod
def _resolve_refresh_rate(refresh_rate: int) -> int:
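After this change, the main bar decides when to refresh from its own counter (`train_batch_idx + _val_processed`) and its own `total`, instead of refreshing only when the validation bar does. A minimal sketch of that rule, assuming a single validation run at the end of each training epoch; `should_update` and `main_bar_updates` are hypothetical helpers, not the callback's API:

```python
from typing import List


def should_update(current: int, total: int, refresh_rate: int, enabled: bool = True) -> bool:
    """Refresh on every `refresh_rate`-th step and always on the final step."""
    # Mirrors the updated TQDMProgressBar._should_update; `enabled` stands in for
    # `self.is_enabled`, assumed False when refresh_rate is 0 (so no modulo by zero).
    return enabled and (current % refresh_rate == 0 or current == total)


def main_bar_updates(train_batches: int, val_batches: int, refresh_rate: int) -> List[int]:
    """Counter values at which the main bar refreshes during one epoch (sketch)."""
    total = train_batches + val_batches  # the main bar counts train and val batches
    updates = []
    for train_idx in range(1, train_batches + 1):
        current = train_idx  # no validation batches processed yet this epoch
        if should_update(current, total, refresh_rate):
            updates.append(current)
    for val_idx in range(1, val_batches + 1):
        current = train_batches + val_idx  # train_batch_idx + _val_processed
        if should_update(current, total, refresh_rate):
            updates.append(current)
    return updates
```

For example, 5 training batches, 3 validation batches and `refresh_rate=3` give refreshes at counter values 3, 6 and 8: multiples of the refresh rate plus the final step.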
2 changes: 1 addition & 1 deletion pytorch_lightning/callbacks/rich_model_summary.py
@@ -25,7 +25,7 @@
class RichModelSummary(ModelSummary):
r"""
Generates a summary of all layers in a :class:`~pytorch_lightning.core.lightning.LightningModule`
with `rich text formatting <https://github.com/willmcgugan/rich>`_.
with `rich text formatting <https://github.com/Textualize/rich>`_.
Install it with pip:
10 changes: 5 additions & 5 deletions pytorch_lightning/core/memory.py
@@ -14,10 +14,10 @@
from pytorch_lightning.utilities import rank_zero_deprecation

rank_zero_deprecation(
"`pytorch_lightning.core.memory.get_memory_profile` and"
" `pytorch_lightning.core.memory.get_gpu_memory_map` have been moved"
" to `pytorch_lightning.utilities.memory` since v1.5 and will be removed in v1.7."
"`pytorch_lightning.core.memory.LayerSummary` and"
" `pytorch_lightning.core.memory.ModelSummary` have been moved"
" to `pytorch_lightning.utilities.model_summary` since v1.5 and will be removed in v1.7."
)

# To support backward compatibility as get_memory_profile and get_gpu_memory_map have been moved
from pytorch_lightning.utilities.memory import get_gpu_memory_map, get_memory_profile # noqa: E402, F401 # isort: skip
# To support backward compatibility as LayerSummary and ModelSummary have been moved
from pytorch_lightning.utilities.model_summary import LayerSummary, ModelSummary # noqa: E402, F401 # isort: skip
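`core/memory.py` is a backward-compatibility shim: it emits a deprecation message when imported, then re-exports the moved classes so old import paths keep working. A minimal sketch of that pattern, assuming plain `warnings.warn` with `DeprecationWarning` in place of Lightning's `rank_zero_deprecation` and module objects built with `types.ModuleType`; `make_compat_module` is a hypothetical helper:

```python
import types
import warnings
from typing import Iterable


def make_compat_module(
    old_name: str, new_module: types.ModuleType, names: Iterable[str], message: str
) -> types.ModuleType:
    """Build a stand-in for a moved module: warn once, then re-export `names` from `new_module`."""
    # Warn at creation time, like the module-level call in core/memory.py.
    warnings.warn(message, DeprecationWarning, stacklevel=2)
    shim = types.ModuleType(old_name)
    for name in names:
        # Re-export the same objects, so identity checks still hold for old imports.
        setattr(shim, name, getattr(new_module, name))
    return shim
```

Because the shim re-exports the original objects rather than copies, `isinstance` checks keep working for code that still imports from the old path.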
3 changes: 1 addition & 2 deletions pytorch_lightning/lite/lite.py
@@ -80,7 +80,7 @@ def __init__(
) -> None:
self._check_accelerator_support(accelerator)
self._check_strategy_support(strategy)
gpu_ids, tpu_cores = _parse_devices(gpus=gpus, auto_select_gpus=False, tpu_cores=tpu_cores)
_, tpu_cores = _parse_devices(gpus=gpus, auto_select_gpus=False, tpu_cores=tpu_cores)
self._accelerator_connector = AcceleratorConnector(
num_processes=None,
devices=devices,
@@ -89,7 +89,6 @@ def __init__(
accelerator=accelerator,
strategy=strategy,
gpus=gpus,
gpu_ids=gpu_ids,
num_nodes=num_nodes,
sync_batchnorm=False, # TODO: add support?
benchmark=False,