Sync master/1.1.5 into release/1.2 [full merge, no squash] #5583

Merged 21 commits on Feb 4, 2021. Changes are shown from all commits.

Commits
03c861b
fix generate checkpoint (#5489)
Borda Jan 12, 2021
7a50c33
update tests with new auto_opt api (#5466)
rohitgr7 Jan 12, 2021
5bef046
[Docs] fix on_after_backward example (#5278)
rohitgr7 Jan 12, 2021
0ff4c56
fix typo in multi-gpu docs (#5402)
awaelchli Jan 13, 2021
c27d4d2
pipeline release CI (#5494)
Borda Jan 13, 2021
10c7dbe
Refactor setup_training and remove test_mode (#5388)
rohitgr7 Jan 13, 2021
a9a3760
add section & add testing ckpt 1.1.4 (#5495)
Borda Jan 14, 2021
7b42494
Fix visual progress bar bug / properly reset progress bar (#4579)
awaelchli Jan 14, 2021
f8abcf7
reconfigure mergify (#5499)
Borda Jan 14, 2021
43e926b
Fix Wrong exception message (#5492)
lacrosse91 Jan 14, 2021
a375240
Tensorboard Docu about Hyperparams saving (#5158)
Skyy93 Jan 15, 2021
293984b
fix reinit_schedulers with correct optimizer (#5519)
rohitgr7 Jan 15, 2021
ad4e25f
[bugfix] Fix signature mismatch in DDPCPUHPCAccelerator's model_to_de…
ananthsub Jan 16, 2021
f2229fd
Fix val_check_interval with fast_dev_run (#5540)
rohitgr7 Jan 18, 2021
b7920b1
Fix logging on_train_batch_end in a callback with multiple optimizers…
carmocca Jan 18, 2021
f72d939
Fix command line run for reinforce_learn_qnet in pl_examples (#5414)
sidhantls Jan 19, 2021
fd6b3ec
Drop greetings comment (#5563)
carmocca Jan 19, 2021
1d99530
Fix root node resolution in slurm environment
tobiasmaier Jan 19, 2021
2b67388
fix argparse conflicting options error (#5569)
sidhantls Jan 19, 2021
09b8dc7
Prepare 1.1.5 release (#5576)
carmocca Jan 19, 2021
a374168
Fix sync
rohitgr7 Feb 1, 2021
14 changes: 0 additions & 14 deletions .github/workflows/greetings.yml

This file was deleted.

37 changes: 27 additions & 10 deletions .github/workflows/release-pypi.yml
@@ -10,9 +10,8 @@ on: # Trigger the workflow on push or pull request, but only for the master branch

jobs:
# based on https://github.com/pypa/gh-action-pypi-publish
build-publish:
build-package:
runs-on: ubuntu-20.04

steps:
- uses: actions/checkout@v2
- uses: actions/setup-python@v2
@@ -28,6 +27,16 @@ jobs:
python setup.py sdist bdist_wheel
ls -lh dist/

- uses: actions/upload-artifact@v2
with:
name: pypi-packages
path: dist

publish-package:
runs-on: ubuntu-20.04
needs: build-package
steps:
- uses: actions/checkout@v2
- name: Upload to release
if: startsWith(github.event.ref, 'refs/tags') || github.event_name == 'release'
uses: svenstaro/upload-release-action@v2
@@ -62,6 +71,14 @@ jobs:
user: __token__
password: ${{ secrets.pypi_password }}

create-legacy-ckpt:
runs-on: ubuntu-20.04
needs: [build-package, publish-package]
steps:
- uses: actions/checkout@v2
- uses: actions/setup-python@v2
with:
python-version: 3.7
# Note: This uses an internal pip API and may not always work
# https://github.com/actions/cache/blob/master/examples.md#multiple-oss-in-a-workflow
- name: Cache pip
@@ -74,7 +91,6 @@
- name: Install dependencies
run: |
pip install -r requirements.txt --find-links https://download.pytorch.org/whl/cpu/torch_stable.html --quiet
pip install virtualenv
pip install awscli

- name: Configure AWS credentials
@@ -84,25 +100,26 @@
aws-secret-access-key: ${{ secrets.AWS_SECRET_KEY_ID }}
aws-region: us-east-1

- uses: actions/download-artifact@v2
with:
name: pypi-packages
path: dist

- name: Pull files from S3
run: |
aws s3 cp --recursive s3://pl-public-data/legacy/checkpoints/ legacy/checkpoints/ # --acl public-read
ls -l legacy/checkpoints/

- name: Generate checkpoint
if: startsWith(github.event.ref, 'refs/tags') || github.event_name == 'release'
# if: startsWith(github.event.ref, 'refs/tags') || github.event_name == 'release'
run: |
virtualenv vEnv --system-site-packages
source vEnv/bin/activate
pip install dist/*
ls -lh dist/
pip install dist/*.whl

pl_ver=$(python -c "import pytorch_lightning as pl ; print(pl.__version__)" 2>&1)
# generate checkpoint to this version
bash legacy/generate_checkpoints.sh $pl_ver

deactivate
rm -rf vEnv

- name: Push files to S3
run: |
aws s3 sync legacy/checkpoints/ s3://pl-public-data/legacy/checkpoints/
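The reworked workflow splits the old `build-publish` job into `build-package` and `publish-package`, handing the built wheel between them as an artifact, and adds a `create-legacy-ckpt` job that installs the freshly built wheel in a throwaway virtualenv, generates a checkpoint for the new `pytorch_lightning` version via `legacy/generate_checkpoints.sh`, and syncs it to S3. That script is not part of this diff; as a rough, hypothetical sketch of what producing such a backward-compatibility checkpoint involves:

```python
# Hypothetical stand-in for legacy/generate_checkpoints.sh (the real script
# is not shown here): train a tiny model for one epoch on random data and
# save a checkpoint under legacy/checkpoints/<version>/.
import os

import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset


class TinyModel(pl.LightningModule):
    """Minimal module whose checkpoint future versions must still load."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.cross_entropy(self(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


if __name__ == "__main__":
    out_dir = os.path.join("legacy", "checkpoints", pl.__version__)
    os.makedirs(out_dir, exist_ok=True)

    dataset = TensorDataset(torch.randn(64, 32), torch.randint(0, 2, (64,)))
    trainer = pl.Trainer(max_epochs=1, progress_bar_refresh_rate=0)
    trainer.fit(TinyModel(), DataLoader(dataset, batch_size=16))
    trainer.save_checkpoint(os.path.join(out_dir, "epoch=0.ckpt"))
```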
98 changes: 42 additions & 56 deletions .mergify.yml
@@ -12,59 +12,45 @@
# See the License for the specific language governing permissions and
# limitations under the License.

#pull_request_rules:
#
# - name: Automatic merge on approval
# conditions:
# - base=master
# # number of review approvals
# - "#approved-reviews-by>=3"
# # no waiting or assigned review
# - "#review-requested=0"
# # no requested changes from any reviewer
# - "#changes-requested-reviews-by=0"
# # this serves as an ALL-checks-must-pass gate, as we actually have around 40 tests in total
# - "#status-success>=54"
# # this is just in case, since we rely on GPU tests (note: redundant to the above)
# - status-success=continuous-integration/drone/pr
# - "status-success=ci/circleci: TPU-tests"
# # this is pattern-like; unfortunately it serves as `any(...)` (note: redundant to the above)
# #- "status-success~=^ci/circleci:"
# # no conflict with master branch
# - -conflict
# # was not closed yet
# - -closed
# # filter-out GH draft PRs
# - -draft
# actions:
# delete_head_branch: {}
# merge:
# # https://doc.mergify.io/merge-action.html#strict-merge
# # (on head branch) $ git merge --no-ff base
# # (on head branch) # Wait for CI to go green
# # (on head branch) # Squash all commits
# # (on base branch) $ git merge --ff head
# strict: true
# method: squash
# comment:
# message: Great job! =)
#
# - name: warn on conflicts
# conditions:
# - conflict
# # filter-out GH draft PRs
# - -draft
# actions:
# comment:
# message: This pull request is now in conflict... :(
#
# - name: add core reviewer
# conditions:
# # filter-out GH draft PRs
# - -draft
# # number of review approvals
# - "#approved-reviews-by<3"
# actions:
# request_reviews:
# teams:
# - core-contributors
pull_request_rules:

- name: warn on conflicts
conditions:
- conflict
- -draft # filter-out GH draft PRs
- -label="has conflicts"
actions:
# comment:
# message: This pull request is now in conflict... :(
label:
add: [ "has conflicts" ]

- name: resolved conflicts
conditions:
- -conflict
- label="has conflicts"
- -draft # filter-out GH draft PRs
- -merged # not merged yet
- -closed
actions:
label:
remove: [ "has conflicts" ]

- name: update PR
conditions:
- conflict
- -draft # filter-out GH draft PRs
- label="0:] Ready-To-Go"
actions:
update: {}

- name: add core reviewer
conditions:
- -conflict # skip if conflict
- -draft # filter-out GH draft PRs
- label="0:] Ready-To-Go"
- "#approved-reviews-by<3" # number of review approvals
actions:
request_reviews:
teams:
- core-contributors
12 changes: 11 additions & 1 deletion CHANGELOG.md
@@ -169,8 +169,17 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
- Fixed loading yaml ([#5619](https://github.com/PyTorchLightning/pytorch-lightning/pull/5619))


## [1.1.5] - 2021-01-19

## [1.1.4] - YYYY-MM-DD
### Fixed

- Fixed a visual bug in the progress bar display initialization ([#4579](https://github.com/PyTorchLightning/pytorch-lightning/pull/4579))
- Fixed logging `on_train_batch_end` in a callback with multiple optimizers ([#5521](https://github.com/PyTorchLightning/pytorch-lightning/pull/5521))
- Fixed `reinit_scheduler_properties` with correct optimizer ([#5519](https://github.com/PyTorchLightning/pytorch-lightning/pull/5519))
- Fixed `val_check_interval` with `fast_dev_run` ([#5540](https://github.com/PyTorchLightning/pytorch-lightning/pull/5540))


## [1.1.4] - 2021-01-12

### Added

@@ -186,6 +195,7 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
- Logging only on `not should_accumulate()` during training ([#5417](https://github.com/PyTorchLightning/pytorch-lightning/pull/5417))
- Resolve interpolation bug with Hydra ([#5406](https://github.com/PyTorchLightning/pytorch-lightning/pull/5406))
- Check environ before selecting a seed to prevent warning message ([#4743](https://github.com/PyTorchLightning/pytorch-lightning/pull/4743))
- Fixed signature mismatch in `model_to_device` of `DDPCPUHPCAccelerator` ([#5505](https://github.com/PyTorchLightning/pytorch-lightning/pull/5505))


## [1.1.3] - 2021-01-05
2 changes: 1 addition & 1 deletion docs/source/advanced/multi_gpu.rst
@@ -362,7 +362,7 @@ project module) you can use the following method:
.. code-block:: python

# train on 8 GPUs (same machine (ie: node))
trainer = Trainer(gpus=8, accelerator='ddp')
trainer = Trainer(gpus=8, accelerator='ddp_spawn')

We STRONGLY discourage this use because it has limitations (due to Python and PyTorch):

6 changes: 2 additions & 4 deletions pl_examples/domain_templates/reinforce_learn_Qnet.py
@@ -398,7 +398,7 @@ def add_model_specific_args(parent_parser): # pragma: no-cover
parser.add_argument("--sync_rate", type=int, default=10, help="how many frames do we update the target network")
parser.add_argument("--replay_size", type=int, default=1000, help="capacity of the replay buffer")
parser.add_argument(
"--warm_start_size",
"--warm_start_steps",
type=int,
default=1000,
help="how many samples do we use to fill our buffer at the start of training"
@@ -407,8 +407,6 @@ def add_model_specific_args(parent_parser): # pragma: no-cover
parser.add_argument("--eps_start", type=float, default=1.0, help="starting value of epsilon")
parser.add_argument("--eps_end", type=float, default=0.01, help="final value of epsilon")
parser.add_argument("--episode_length", type=int, default=200, help="max length of an episode")
parser.add_argument("--max_episode_reward", type=int, default=200, help="max episode reward in the environment")
parser.add_argument("--warm_start_steps", type=int, default=1000, help="max episode reward in the environment")
return parser


@@ -429,7 +427,7 @@ def main(args) -> None:
torch.manual_seed(0)
np.random.seed(0)

parser = argparse.ArgumentParser()
parser = argparse.ArgumentParser(add_help=False)
parser = DQNLightning.add_model_specific_args(parser)
args = parser.parse_args()

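The fix from #5569 creates the script-level parser with `add_help=False` before passing it to `add_model_specific_args` (the same change appears in `semantic_segmentation.py` below). The reason: that helper builds a new `ArgumentParser(parents=[parent_parser])`, and a child parser defines its own `-h/--help`, so a parent that also defines one makes argparse fail with a conflicting-option error. A minimal, standalone sketch of the pattern:

```python
import argparse

# The parent parser must opt out of -h/--help: ArgumentParser(parents=[...])
# copies the parent's options into the child, which already defines its own
# help option, so two of them would raise
# "argparse.ArgumentError: ... conflicting option strings: -h, --help".
parent = argparse.ArgumentParser(add_help=False)
parent.add_argument("--lr", type=float, default=1e-2)

child = argparse.ArgumentParser(parents=[parent])  # child supplies -h/--help
child.add_argument("--batch_size", type=int, default=16)

args = child.parse_args(["--lr", "0.1", "--batch_size", "8"])
print(args.lr, args.batch_size)  # 0.1 8
```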
2 changes: 1 addition & 1 deletion pl_examples/domain_templates/semantic_segmentation.py
@@ -289,7 +289,7 @@ def main(hparams: Namespace):

if __name__ == '__main__':
cli_lightning_logo()
parser = ArgumentParser()
parser = ArgumentParser(add_help=False)
parser = SegModel.add_model_specific_args(parser)
hparams = parser.parse_args()

5 changes: 5 additions & 0 deletions pytorch_lightning/accelerators/legacy/accelerator.py
@@ -51,6 +51,10 @@ def __init__(self,
def setup(self, model):
pass

def train(self):
self.trainer.setup_trainer(self.trainer.model)
return self.train_or_test()

def teardown(self):
# Ensure if necessary all processes are finished
self.barrier()
@@ -65,6 +69,7 @@ def train_or_test(self):
if self.trainer.testing:
results = self.trainer.run_test()
else:
self.trainer.train_loop.setup_training()
results = self.trainer.train()
return results

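These additions are one half of the `setup_training` refactor (#5388): the base `Accelerator` now owns the setup-then-dispatch sequence, so the per-backend copies of the "set up training routine" block can be deleted, as the hunks for the CPU and DDP accelerators below show. A simplified sketch of the resulting shape — not the actual Lightning classes:

```python
# Simplified sketch of the template-method pattern this refactor moves
# toward: the base class runs setup in one place before dispatching, so
# subclasses no longer each repeat it.
class Accelerator:
    def __init__(self, trainer):
        self.trainer = trainer

    def train(self):
        # setup happens once, here, before dispatching to fit or test
        self.trainer.setup_trainer(self.trainer.model)
        return self.train_or_test()

    def train_or_test(self):
        if self.trainer.testing:
            return self.trainer.run_test()
        return self.trainer.train()


class CPUAccelerator(Accelerator):
    # no train() override needed anymore -- compare the deletion in
    # cpu_accelerator.py below
    pass
```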
@@ -364,8 +364,8 @@ def set_distributed_mode(self):
_ddp = (DistributedType.DDP, DistributedType.DDP_SPAWN, DistributedType.DDP2)
if (self.trainer.num_nodes > 1 and self.trainer._distrib_type not in _ddp):
raise MisconfigurationException(
'DataParallel does not support num_nodes > 1. Switching to DistributedDataParallel for you. '
'To silence this warning set `accelerator="ddp"` or `accelerator="ddp2"`'
'DataParallel does not support num_nodes > 1. '
'To avoid this exception, set `accelerator="ddp"` or `accelerator="ddp2"`'
)

rank_zero_info(
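The old message promised that Lightning was "switching to DistributedDataParallel for you" while in fact raising an exception, so nothing was ever switched; the new wording states the constraint and the remedy. Assuming 1.1-era `Trainer` arguments, the invalid configuration looks roughly like this:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.utilities.exceptions import MisconfigurationException

# DataParallel cannot span nodes, so combining it with num_nodes > 1
# is rejected when the Trainer is constructed.
try:
    trainer = Trainer(num_nodes=2, accelerator="dp")
except MisconfigurationException as err:
    print(err)  # 'DataParallel does not support num_nodes > 1. ...'
```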
10 changes: 0 additions & 10 deletions pytorch_lightning/accelerators/legacy/cpu_accelerator.py
@@ -50,16 +50,6 @@ def setup(self, model):

self.trainer.model = model

def train(self):
model = self.trainer.model

# set up training routine
self.trainer.train_loop.setup_training(model)

# train or test
results = self.train_or_test()
return results

def _step(self, model_step: Callable, args):
if self.trainer.amp_backend == AMPType.NATIVE:
with torch.cuda.amp.autocast():
6 changes: 1 addition & 5 deletions pytorch_lightning/accelerators/legacy/ddp2_accelerator.py
@@ -186,9 +186,6 @@ def ddp_train(self, process_idx, mp_queue, model):

self.ddp_plugin.on_after_setup_optimizers(self.trainer)

# set model properties before going into wrapper
self.trainer.model_connector.copy_trainer_model_properties(model)

# 16-bit
model = self.trainer.precision_connector.connect(model)

@@ -198,8 +195,7 @@ def ddp_train(self, process_idx, mp_queue, model):
# allow user to configure ddp
model = self.configure_ddp(model, device_ids)

# set up training routine
self.trainer.train_loop.setup_training(model)
self.trainer.setup_trainer(model)

# train or test
results = self.train_or_test()
6 changes: 1 addition & 5 deletions pytorch_lightning/accelerators/legacy/ddp_accelerator.py
@@ -289,9 +289,6 @@ def ddp_train(self, process_idx, model):
# allow for lr schedulers as well
self.setup_optimizers(model)

# set model properties before going into wrapper
self.trainer.model_connector.copy_trainer_model_properties(model)

# 16-bit
model = self.trainer.precision_connector.connect(model)

@@ -301,9 +298,8 @@ def ddp_train(self, process_idx, model):
# allow user to configure ddp
model = self.configure_ddp(model, device_ids)

# set up training routine
self.barrier('ddp_setup')
self.trainer.train_loop.setup_training(model)
self.trainer.setup_trainer(model)

# train or test
results = self.train_or_test()
@@ -36,8 +36,7 @@ def __init__(self,
super().__init__(trainer, cluster_environment, ddp_plugin)
self.nickname = 'ddp_cpu'

def model_to_device(self, model, process_idx):
# Todo: required argument `process_idx` is not used
def model_to_device(self, model):
model.cpu()

def get_device_ids(self):
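This fix (#5505) aligns `model_to_device` with the single-argument form its callers use; with the extra required `process_idx` parameter, every base-class call raised a `TypeError`. A hypothetical reduction of the bug — not the actual Lightning classes:

```python
# An override that adds a required positional parameter breaks every
# existing call site in the base class.
class Base:
    def run(self, model):
        self.model_to_device(model)  # always called with a single argument

    def model_to_device(self, model):
        print("base: moved to device")


class Broken(Base):
    def model_to_device(self, model, process_idx):  # extra required argument
        print("never reached")


class Fixed(Base):
    def model_to_device(self, model):  # matches the call site again
        print("fixed: moved to device")


Fixed().run(object())       # prints "fixed: moved to device"
try:
    Broken().run(object())
except TypeError as err:    # missing 1 required positional argument: 'process_idx'
    print(err)
```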
@@ -142,9 +142,6 @@ def ddp_train(self, process_idx, mp_queue, model):

self.ddp_plugin.on_after_setup_optimizers(self.trainer)

# set model properties before going into wrapper
self.trainer.model_connector.copy_trainer_model_properties(model)

# 16-bit
model = self.trainer.precision_connector.connect(model)

@@ -154,8 +151,7 @@ def ddp_train(self, process_idx, mp_queue, model):
# allow user to configure ddp
model = self.configure_ddp(model, device_ids)

# set up training routine
self.trainer.train_loop.setup_training(model)
self.trainer.setup_trainer(model)

# train or test
results = self.train_or_test()