Lightning-AI
diff --git a/.github/workflows/ci_test-mnodes.yml b/.github/workflows/ci_test-mnodes.yml
Lines changed: 0 additions & 3 deletions
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 39 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 1 addition & 1 deletion
diff --git a/azure-pipelines.yml b/azure-pipelines.yml
Lines changed: 0 additions & 1 deletion
diff --git a/dockers/base-cuda/Dockerfile b/dockers/base-cuda/Dockerfile
Lines changed: 4 additions & 0 deletions
diff --git a/dockers/nvidia/Dockerfile b/dockers/nvidia/Dockerfile
Lines changed: 7 additions & 41 deletions
@@ -78,9 +78,6 @@ jobs:
     - name: Install dependencies
       run: |
         pip install awscli coverage
-        # todo
-        pip install git+https://${{ secrets.PL_GHOST_TOKEN }}@github.com/PyTorchLightning/lightning-dtrun.git@v0.0.3 -q --no-cache-dir
-        #pip install git+https://${{ secrets.PL_GHOST_TOKEN }}@github.com/PyTorchLightning/lightning-dtrun.git@mnodes -q --no-cache-dir
 
     - name: Configure AWS Credentials
       uses: aws-actions/configure-aws-credentials@v1
 
@@ -113,6 +113,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Changed profilers to save separate report files per state and rank ([#6621](https://github.com/PyTorchLightning/pytorch-lightning/pull/6621))
 
 
+- The trainer no longer tries to save a checkpoint on exception or run callbacks' `on_train_end` functions ([#6864](https://github.com/PyTorchLightning/pytorch-lightning/pull/6864))
+
+
 - Changed `PyTorchProfiler` to use `torch.autograd.profiler.record_function` to record functions ([#6349](https://github.com/PyTorchLightning/pytorch-lightning/pull/6349))
 
 
@@ -153,6 +156,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 
 ### Removed
 
+- Removed evaluation loop legacy returns for `*_epoch_end` hooks ([#6973](https://github.com/PyTorchLightning/pytorch-lightning/pull/6973))
+
+
 - Removed support for passing a bool value to `profiler` argument of Trainer ([#6164](https://github.com/PyTorchLightning/pytorch-lightning/pull/6164))
 
 
@@ -237,6 +243,36 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Fixed `--gpus` default for parser returned by `Trainer.add_argparse_args` ([#6898](https://github.com/PyTorchLightning/pytorch-lightning/pull/6898))
 
 
+- Fixed the pickle error checker to check for `pickle.PickleError`, so that all pickle errors are caught ([#6917](https://github.com/PyTorchLightning/pytorch-lightning/pull/6917))
+
+
+- Fixed `AttributeError` for `require_backward_grad_sync` when running manual optimization with sharded plugin ([#6915](https://github.com/PyTorchLightning/pytorch-lightning/pull/6915))
+
+
+- Fixed multi-GPU join for Horovod ([#6954](https://github.com/PyTorchLightning/pytorch-lightning/pull/6954))
+
+
+- Fixed a bug where `LightningModule.training_epoch_end` was called after the `on_train_epoch_end` hook ([#6969](https://github.com/PyTorchLightning/pytorch-lightning/pull/6969))
+
+
+- Fixed a bug where the outputs object passed to `LightningModule.training_epoch_end` was different from the object passed to the `on_train_epoch_end` hook ([#6969](https://github.com/PyTorchLightning/pytorch-lightning/pull/6969))
+
+
+- Fixed a bug where the outputs passed to `on_train_batch_end` would be lists even when using a single optimizer and no truncated backpropagation through time steps ([#6969](https://github.com/PyTorchLightning/pytorch-lightning/pull/6969))
+
+
+- Fixed `sync_dist` for TPUs ([#6950](https://github.com/PyTorchLightning/pytorch-lightning/pull/6950))
+
+
+- Fixed a bug in trainer error handling that would cause a hang in distributed training ([#6864](https://github.com/PyTorchLightning/pytorch-lightning/pull/6864))
+
+
+- Fixed `self.device` not returning the correct device in replicas of data-parallel ([#6414](https://github.com/PyTorchLightning/pytorch-lightning/pull/6414))
+
+
+- Fixed process rank not being available right away after `Trainer` instantiation ([#6941](https://github.com/PyTorchLightning/pytorch-lightning/pull/6941))
+
+
 ## [1.2.7] - 2021-04-06
 
 ### Fixed
@@ -249,6 +285,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Fixed a bug where `TensorBoardLogger` would give a warning and not log correctly to a symbolic link `save_dir` ([#6730](https://github.com/PyTorchLightning/pytorch-lightning/pull/6730))
 
 
+- Fixed a bug where `predict` could not be used when `progress_bar_refresh_rate=0` ([#6884](https://github.com/PyTorchLightning/pytorch-lightning/pull/6884))
+
+
 ## [1.2.6] - 2021-03-30
 
 ### Changed
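
Note on the #6917 changelog entry above: the diff does not show the checker itself, so the following is only a minimal sketch of what checking against `pickle.PickleError` can look like. The helper name `is_picklable` is hypothetical and is not taken from the PR. `pickle.PickleError` is the base class of both `pickle.PicklingError` and `pickle.UnpicklingError`; the extra `AttributeError` and `TypeError` are an assumption added here because some unpicklable objects raise those instead.

```python
import pickle
import threading


def is_picklable(obj) -> bool:
    """Return True if `obj` can be serialized with pickle (hypothetical helper)."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PickleError, AttributeError, TypeError):
        # PickleError covers PicklingError and UnpicklingError;
        # AttributeError/TypeError are caught as an extra assumption,
        # since e.g. thread locks raise TypeError when pickled.
        return False


plain_ok = is_picklable({"lr": 0.01})
lock_ok = is_picklable(threading.Lock())
```

Catching the base class instead of one subclass means a single `except` clause handles both serialization and deserialization failures.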
 
@@ -177,7 +177,7 @@ class LitAutoEncoder(pl.LightningModule):
         return embedding
 
     def training_step(self, batch, batch_idx):
-        # training_step defined the train loop. It is independent of forward
+        # training_step defines the train loop. It is independent of forward
         x, y = batch
         x = x.view(x.size(0), -1)
         z = self.encoder(x)
 
@@ -62,7 +62,6 @@ jobs:
         python -c "fname = 'requirements/extra.txt' ; lines = [line for line in open(fname).readlines() if 'fairscale' not in line] ; open(fname, 'w').writelines(lines)"
         python -c "fname = 'requirements/extra.txt' ; lines = [line for line in open(fname).readlines() if 'horovod' not in line] ; open(fname, 'w').writelines(lines)"
         pip install --requirement ./requirements/devel.txt --upgrade-strategy only-if-needed
-        pip install git+https://$(AUTH_TOKEN)@github.com/PyTorchLightning/lightning-dtrun.git@v0.0.2 --no-cache-dir
         pip list
       displayName: 'Install dependencies'
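
Note on the two `python -c` one-liners in this step: they rewrite `requirements/extra.txt` in place, dropping every line that mentions `fairscale` or `horovod`. A more readable equivalent might look like the sketch below. The function name `drop_requirement` and the sample file contents are illustrative assumptions, not part of the pipeline.

```python
import tempfile
from pathlib import Path


def drop_requirement(path: str, package: str) -> list:
    """Remove every line containing `package` from a requirements file, in place."""
    req_file = Path(path)
    lines = req_file.read_text().splitlines(keepends=True)
    # Same substring test as the one-liners: keep lines that do not mention the package
    kept = [line for line in lines if package not in line]
    req_file.write_text("".join(kept))
    return kept


# Demonstration on a throwaway file with made-up contents
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "extra.txt"
    demo.write_text("matplotlib>3.1\nhorovod>=0.21.2\nfairscale>=0.3.2\n")
    drop_requirement(str(demo), "fairscale")
    remaining = drop_requirement(str(demo), "horovod")
```

Like the originals, this matches on a plain substring, so a comment line that merely mentions the package name would be dropped as well.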
 
 
@@ -113,6 +113,10 @@ RUN \
     pip install --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./apex && \
     rm -rf apex
 
+RUN \
+    # install DeepSpeed
+    pip install "deepspeed>=0.3.14"
+
 RUN \
     # Show what we have
     pip --version && \
 
@@ -12,52 +12,17 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-FROM nvcr.io/nvidia/cuda:11.1.1-runtime-ubuntu20.04
+# https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel_21-03.html#rel_21-03
+FROM nvcr.io/nvidia/pytorch:20.12-py3
 
 MAINTAINER PyTorchLightning <https://github.com/PyTorchLightning>
 
 ARG LIGHTNING_VERSION=""
 
-SHELL ["/bin/bash", "-c"]
-# https://techoverflow.net/2019/05/18/how-to-fix-configuring-tzdata-interactive-input-when-building-docker-images/
-ENV \
-    DEBIAN_FRONTEND=noninteractive \
-    TZ=Europe/Prague \
-    PATH="$PATH:/root/.local/bin" \
-    CUDA_TOOLKIT_ROOT_DIR="/usr/local/cuda" \
-    MKL_THREADING_LAYER=GNU
-
-RUN apt-get update -qq && \
-    apt-get install -y --no-install-recommends \
-        build-essential \
-        python3 \
-        python3-distutils \
-        python3-dev \
-        pkg-config \
-        cmake \
-        git \
-        wget \
-        unzip \
-        ca-certificates \
-    && \
-
-# Cleaning
-    apt-get autoremove -y && \
-    apt-get clean && \
-    rm -rf /root/.cache && \
-    rm -rf /var/lib/apt/lists/* && \
-
-# Setup PIP
-    update-alternatives --install /usr/bin/python python /usr/bin/python3 1 && \
-    wget https://bootstrap.pypa.io/get-pip.py --progress=bar:force:noscroll --no-check-certificate && \
-    python get-pip.py && \
-    rm get-pip.py && \
-    pip --version
-
-COPY ./ /home/pytorch-lightning/
+COPY ./ /workspace/pytorch-lightning/
 
 RUN \
-    cd /home  && \
+    cd /workspace  && \
     mv pytorch-lightning/notebooks . && \
     mv pytorch-lightning/pl_examples . && \
     # replace by specific version if asked
@@ -71,9 +36,10 @@ RUN \
 
 # Installations
     python -c "fname = './pytorch-lightning/requirements/extra.txt' ; lines = [line for line in open(fname).readlines() if not line.startswith('horovod')] ; open(fname, 'w').writelines(lines)" && \
-    pip install -r ./pytorch-lightning/requirements/extra.txt -U --no-cache-dir && \
-    pip install -r ./pytorch-lightning/requirements/examples.txt -U --no-cache-dir && \
+    pip install -r ./pytorch-lightning/requirements/extra.txt --no-cache-dir --upgrade-strategy only-if-needed && \
+    pip install -r ./pytorch-lightning/requirements/examples.txt --no-cache-dir --upgrade-strategy only-if-needed && \
     pip install ./pytorch-lightning --no-cache-dir && \
+    pip install "Pillow>=8.1" "torchtext>=0.9.0" ipython[all] --no-cache-dir --upgrade-strategy only-if-needed && \
     rm -rf pytorch-lightning
 
 RUN python --version && \