introduce default cluster environment for lightning-specific ddp (Lig…

…htning-AI#5915) * handle distributed_sampler_kwargs * move emptying cache to accelertor * fix a few tests * restoring the result from subprocess * fix queue.get() order for results * add missing "block_backward_sync" context manager * add missing "block_backward_sync" context manager * fix sync_batchnorm * fix supported gpu-ids for tuple * fix clip gradients and inf recursion * accelerator selection: added cluster_environment plugin * fix torchelastic test * fix reduce early stopping decision for DDP * fix tests: callbacks, conversion to lightning optimizer * fix lightning optimizer does not pickle * fix setting benchmark and deterministic option * fix slurm amp test * fix prepare_data test and determine node_rank * fix retrieving last path when testing * remove obsolete plugin argument * fix test: test_trainer_config * fix torchscript tests * fix trainer.model access * move properties * fix test_transfer_batch_hook * fix auto_select_gpus * fix omegaconf test * fix test that needs to simulate slurm ddp * add horovod plugin * fix test with named arguments * clean up whitespace * fix datamodules test * remove old accelerators * fix naming * move old plugins * move to plugins * create precision subpackage * create training_type subpackage * fix all new import errors * fix wrong arguments order passed to test * fix LR finder * Added sharded training type and amp plugin * Move clip grad to precision plugin * Added sharded spawn, select accelerators based on distributed_backend + enable custom fp16 plugin automatically * Fix import issue, attempting to fix tests * Fix initial test * Reflect hook logic from master, should wrap model after move to device * Optional state consolidation, since master has optimizers not wrapped * change attribute for instance test * reset optimizers optimizers are not used in main process, so state would be wrong. * legacy * imports in accel * legacy2 * trainer imports * fix import errors after rebase * move hook to new setup location * provide unwrapping logic * fix trainer callback system * added ddp2 implementation * fix imports .legacy * move plugins * restore legacy * drop test.py from root * add tpu accelerator and plugins * fixes * fix lightning optimizer merge * reset bugreportmodel * unwrapping * step routing forward * model access * unwrap * opt * integrate distrib_type * sync changes * sync * fixes * add forgotten generators * add missing logic * update * import * missed imports * import fixes * isort * mv f * changelog * format * move helper to parallel plugin * d * add world size * clean up * duplicate * activate ddp_sharded and tpu * set nvidia flags * remove unused colab var * use_tpu <-> on_tpu attrs * make some ddp_cpu and clusterplugin tests pass * Ref/accelerator connector (Lightning-AI#5742) * final cleanup Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * connector cleanup Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * trainer cleanup Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * accelerator cleanup + missing logic in accelerator connector Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * add missing changes to callbacks Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * reflect accelerator changes to lightning module Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * clean cluster envs Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * cleanup plugins Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * add broadcasting Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * yapf * remove plugin connector Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * plugins * manual optimization * update optimizer routing * add rank to torchelastic * fix memory mixed precision * setstate on trainer for pickling in ddp spawn * add predict method * add back commented accelerator code * adapt test for sync_batch_norm to new plugin * fix deprecated tests * fix ddp cpu choice when no num_processes are given * yapf format * skip a memory test that cannot pass anymore * fix pickle error in spawn plugin * x * avoid * x * fix cyclic import in docs build * add support for sharded * update typing * add sharded and sharded_spawn to distributed types * make unwrap model default * refactor LightningShardedDataParallel similar to LightningDistributedDataParallel * update sharded spawn to reflect changes * update sharded to reflect changes * Merge 1.1.5 changes * fix merge * fix merge * yapf isort * fix merge * yapf isort * fix indentation in test * copy over reinit scheduler implementation from dev1.2 * fix apex tracking calls with dev_debugger * reduce diff to dev1.2, clean up * fix trainer config test when gpus>0 and num_processes >0 and ddp_cpu * sort plugin tests legacy/new * fix error handling for amp on cpu * fix merge fix merge fix merge * [Feat] Resolve manual_backward (Lightning-AI#5837) * resolve manual_backward * resolve flake8 * update * resolve for ddp_spawn * resolve flake8 * resolve flake8 * resolve flake8 Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal> * fix tests/accelerator tests on cpu * [BugFix] Resolve manual optimization (Lightning-AI#5852) * resolve manual_optimization * update * update Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal> * Remove copy trainer parameters to happen earlier within the loop and add safe guard to get ref model (Lightning-AI#5856) * resovle a bug * Accelerator refactor sharded rpc (Lightning-AI#5854) * rpc branch * merge * update handling of rpc * make devices etc. Optional in RPC * set devices etc. later if necessary * remove devices from sequential * make devices optional in rpc * fix import * uncomment everything * fix cluster selection Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal> * resolve bug * fix assert in rpc test * resolve a test * fix docs compilation * accelerator refactor - fix for sharded parity test (Lightning-AI#5866) * fix memory issue with ddp_spawn * x x x x x x x x x * x * Remove DDP2 as this does not apply * Add missing pre optimizer hook to ensure lambda closure is called * fix apex docstring * [accelerator][BugFix] Resolve some test for 1 gpu (Lightning-AI#5863) * update * revert init * resolve a bug * update * resolve flake8 * update * update * update * revert init * resolve a bug * update * resolve flake8 * update * update * update * update * update * revert init * resolve a bug * update * resolve flake8 * update * update * update * revert init * update * resolve flake8 * update * update * update * update * update * all_gather * update * make plugins work, add misconfig for RPC * update * update * remove breaking test * resolve some tests * resolve flake8 * revert to ddp_spawn Co-authored-by: root <root@ip-172-31-88-60.ec2.internal> Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal> Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de> * yapf isort * resolve flake8 * fix apex doctests * fix apex doctests 2 * resolve docs * update drone * clean env * update * update * update * update * merge * Fix RPC related tests, clean out old API, update for new accelerator API [skip ci] (Lightning-AI#5881) * Fix RPC related tests, clean out old API, update for new accelerator API * Move tests out of legacy folder, update paths and names * Update test_remove_1-4.py * Expose properties for tpu cores/gpus/num_gpus * Add root GPU property * Move properties to properties.py * move tests that were previously in drone * Fix root GPU property (Lightning-AI#5908) * Move root GPU to property, remove horovod set as this is handled in horovod plugin, ensure we mock correctly to set GPU accelerator * Add missing tests back * fix best model path transfer when no checkpoint callback available * Fix setup hook order [wip] (Lightning-AI#5858) * Call trainer setup hook before accelerator setup * Add test case * add new test * typo * fix callback order in test Co-authored-by: tchaton <thomas@grid.ai> Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * rename ddp sequential -> rpc sequential for special test * revert * fix stupid merge problem * abstract the cluster plugins * default plugin * integrate default environment * fix property * adapt tests * adjust test * fix world size access * base cluster env * revert rebase errors * revert rebase errors * missing import * revert unrelated change * remove unused cluster local rank * remove unrelated changes * fix unrelated changes * fix pep8 * remove unused var * reset permissions * ypaf * test default environment * test torchelastic environment * world size as int * tests for slurm environment * changelog * test comments * remove unintended change * keep master port fixed after it is generated * test random master port * yapf * add missing default environment * move helper function * rename default environment * rename * rename * yapf * Update pytorch_lightning/plugins/environments/lightning_environment.py Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * Update CHANGELOG.md Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com> * spawn -> create Co-authored-by: justusschock <justus.schock@posteo.de> Co-authored-by: SeanNaren <sean@grid.ai> Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com> Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz> Co-authored-by: Justus Schock <justus.schock@rwth-aachen.de> Co-authored-by: chaton <thomas@grid.ai> Co-authored-by: Ubuntu <ubuntu@ip-172-31-88-60.ec2.internal> Co-authored-by: Sean Naren <sean.narenthiran@gmail.com> Co-authored-by: root <root@ip-172-31-88-60.ec2.internal> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
awaelchli · Apr 13, 2021 · 37ceb4c · 37ceb4c
1 parent 4b69a92
commit 37ceb4c
Show file tree

Hide file tree

Showing 14 changed files with 61 additions and 100 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -33,24 +33,6 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Added `checkpoint` parameter to callback's `on_save_checkpoint` hook ([#6072](https://github.com/PyTorchLightning/pytorch-lightning/pull/6072))
 
 
-- Added `RunningStage.SANITY_CHECKING` ([#4945](https://github.com/PyTorchLightning/pytorch-lightning/pull/4945))
-
-
-- Added `TrainerState.{FITTING,VALIDATING,TESTING,PREDICTING,TUNING}` ([#4945](https://github.com/PyTorchLightning/pytorch-lightning/pull/4945))
-
-
-- Added `Trainer.validate()` method to perform one evaluation epoch over the validation set ([#4948](https://github.com/PyTorchLightning/pytorch-lightning/pull/4948))
-
-
-- Added `LightningEnvironment` for Lightning-specific DDP ([#5915](https://github.com/PyTorchLightning/pytorch-lightning/pull/5915))
-
-
-- Added `teardown()` hook to LightningDataModule ([#4673](https://github.com/PyTorchLightning/pytorch-lightning/pull/4673))
-
-
-- Added `auto_insert_metric_name` parameter to `ModelCheckpoint` ([#6277](https://github.com/PyTorchLightning/pytorch-lightning/pull/6277))
-
-
 - Added arg to `self.log` that enables users to give custom names when dealing with multiple dataloaders ([#6274](https://github.com/PyTorchLightning/pytorch-lightning/pull/6274))
 
 

diff --git a/pytorch_lightning/plugins/environments/__init__.py b/pytorch_lightning/plugins/environments/__init__.py
@@ -12,5 +12,6 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 from pytorch_lightning.plugins.environments.cluster_environment import ClusterEnvironment # noqa: F401
+from pytorch_lightning.plugins.environments.lightning_environment import LightningEnvironment # noqa: F401
 from pytorch_lightning.plugins.environments.slurm_environment import SLURMEnvironment # noqa: F401
 from pytorch_lightning.plugins.environments.torchelastic_environment import TorchElasticEnvironment # noqa: F401
diff --git a/pytorch_lightning/plugins/environments/cluster_environment.py b/pytorch_lightning/plugins/environments/cluster_environment.py
@@ -12,18 +12,23 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 from abc import ABC, abstractmethod
+from typing import Optional
 
 
-class ClusterEnvironment:
+class ClusterEnvironment(ABC):
+ """ Specification of a cluster environment. """
 
- def __init__(self):
- self._world_size = None
+ @abstractmethod
+ def creates_children(self) -> bool:
+ """ Whether the environment creates the subprocesses or not. """
 
- def master_address(self):
- pass
+ @abstractmethod
+ def master_address(self) -> str:
+ """ The master address through which all processes connect and communicate. """
 
- def master_port(self):
- pass
+ @abstractmethod
+ def master_port(self) -> int:
+ """ An open and configured port in the master node through which all processes communicate. """
 
  @abstractmethod
  def world_size(self) -> int:
@@ -43,7 +48,8 @@ def set_global_rank(self, rank: int) -> None:
 
  @abstractmethod
  def local_rank(self) -> int:
- pass
+ """ The rank (index) of the currently running process inside of the current node. """
 
+ @abstractmethod
  def node_rank(self) -> int:
- pass
+ """ The rank (index) of the node on which the current process runs. """
diff --git a/pytorch_lightning/plugins/environments/slurm_environment.py b/pytorch_lightning/plugins/environments/slurm_environment.py
@@ -39,7 +39,7 @@ def master_address(self) -> str:
  log.debug(f"MASTER_ADDR: {os.environ['MASTER_ADDR']}")
  return root_node
 
- def master_port(self):
+ def master_port(self) -> int:
  # -----------------------
  # SLURM JOB = PORT number
  # -----------------------
@@ -64,7 +64,7 @@ def master_port(self):
 
  log.debug(f"MASTER_PORT: {os.environ['MASTER_PORT']}")
 
- return default_port
+ return int(default_port)
 
  def world_size(self) -> int:
  return int(os.environ["SLURM_NTASKS"])
@@ -81,10 +81,10 @@ def set_global_rank(self, rank: int) -> None:
  def local_rank(self) -> int:
  return int(os.environ['SLURM_LOCALID'])
 
- def node_rank(self):
+ def node_rank(self) -> int:
  return int(os.environ['SLURM_NODEID'])
 
- def resolve_root_node_address(self, root_node):
+ def resolve_root_node_address(self, root_node: str) -> str:
  if '[' in root_node:
  name, numbers = root_node.split('[', maxsplit=1)
  number = numbers.split(',', maxsplit=1)[0]

diff --git a/pytorch_lightning/plugins/environments/torchelastic_environment.py b/pytorch_lightning/plugins/environments/torchelastic_environment.py
@@ -14,6 +14,7 @@
 
 import logging
 import os
+from typing import Optional
 
 from pytorch_lightning.plugins.environments.cluster_environment import ClusterEnvironment
 from pytorch_lightning.utilities import rank_zero_warn
@@ -29,25 +30,29 @@ def is_using_torchelastic() -> bool:
  required_env_vars = ("RANK", "GROUP_RANK", "LOCAL_RANK", "LOCAL_WORLD_SIZE")
  return all(v in os.environ for v in required_env_vars)
 
- def master_address(self):
+ def creates_children(self) -> bool:
+ return True
+
+ def master_address(self) -> str:
  if "MASTER_ADDR" not in os.environ:
  rank_zero_warn("MASTER_ADDR environment variable is not defined. Set as localhost")
  os.environ["MASTER_ADDR"] = "127.0.0.1"
  log.debug(f"MASTER_ADDR: {os.environ['MASTER_ADDR']}")
  master_address = os.environ.get('MASTER_ADDR')
  return master_address
 
- def master_port(self):
+ def master_port(self) -> int:
  if "MASTER_PORT" not in os.environ:
  rank_zero_warn("MASTER_PORT environment variable is not defined. Set as 12910")
  os.environ["MASTER_PORT"] = "12910"
  log.debug(f"MASTER_PORT: {os.environ['MASTER_PORT']}")
 
- port = os.environ.get('MASTER_PORT')
+ port = int(os.environ.get('MASTER_PORT'))
  return port
 
- def world_size(self):
- return os.environ.get('WORLD_SIZE')
+ def world_size(self) -> Optional[int]:
+ world_size = os.environ.get('WORLD_SIZE')
+ return int(world_size) if world_size is not None else world_size
 
  def set_world_size(self, size: int) -> None:
  log.debug("TorchElasticEnvironment.set_world_size was called, but setting world size is not allowed. Ignored.")

diff --git a/pytorch_lightning/plugins/training_type/ddp.py b/pytorch_lightning/plugins/training_type/ddp.py
@@ -30,20 +30,14 @@
 from pytorch_lightning.plugins.environments.cluster_environment import ClusterEnvironment
 from pytorch_lightning.plugins.training_type.parallel import ParallelPlugin
 from pytorch_lightning.utilities import _HYDRA_AVAILABLE, _TORCH_GREATER_EQUAL_1_7, rank_zero_warn
-from pytorch_lightning.utilities.distributed import (
- find_free_network_port,
- rank_zero_only,
- ReduceOp,
- sync_ddp_if_available,
-)
+from pytorch_lightning.utilities.distributed import rank_zero_only, ReduceOp, sync_ddp_if_available
 from pytorch_lightning.utilities.exceptions import MisconfigurationException
 from pytorch_lightning.utilities.seed import seed_everything
 
 if _HYDRA_AVAILABLE:
  from hydra.core.hydra_config import HydraConfig
  from hydra.utils import get_original_cwd, to_absolute_path
 
-
 log = logging.getLogger(__name__)
 
 
@@ -90,8 +84,7 @@ def setup(self, model):
  self._model = model
 
  # start the other scripts
- # TODO: refactor and let generic cluster env hold the information about who spawns the processes
- if os.environ.get("PL_IN_DDP_SUBPROCESS", "0") != "1":
+ if not self.cluster_environment.creates_children() and os.environ.get("PL_IN_DDP_SUBPROCESS", "0") != "1":
  self._call_children_scripts()
 
  # set the task idx
@@ -105,15 +98,12 @@ def _call_children_scripts(self):
  self._has_spawned_children = True
 
  # DDP Environment variables
- os.environ["MASTER_ADDR"] = os.environ.get("MASTER_ADDR", "127.0.0.1")
- os.environ["MASTER_PORT"] = os.environ.get("MASTER_PORT", str(find_free_network_port()))
+ os.environ["MASTER_ADDR"] = self.cluster_environment.master_address()
+ os.environ["MASTER_PORT"] = str(self.cluster_environment.master_port())
 
  # allow the user to pass the node rank
- node_rank = "0"
- node_rank = os.environ.get("NODE_RANK", node_rank)
- node_rank = os.environ.get("GROUP_RANK", node_rank)
- os.environ["NODE_RANK"] = node_rank
- os.environ["LOCAL_RANK"] = "0"
+ os.environ["NODE_RANK"] = str(self.cluster_environment.node_rank())
+ os.environ["LOCAL_RANK"] = str(self.cluster_environment.local_rank())
 
  # when user is using hydra find the absolute path
  path_lib = os.path.abspath if not _HYDRA_AVAILABLE else to_absolute_path

diff --git a/pytorch_lightning/plugins/training_type/ddp_spawn.py b/pytorch_lightning/plugins/training_type/ddp_spawn.py
@@ -30,13 +30,7 @@
 from pytorch_lightning.utilities import _TORCH_GREATER_EQUAL_1_7
 from pytorch_lightning.utilities.cloud_io import atomic_save
 from pytorch_lightning.utilities.cloud_io import load as pl_load
-from pytorch_lightning.utilities.distributed import (
- find_free_network_port,
- rank_zero_only,
- rank_zero_warn,
- ReduceOp,
- sync_ddp_if_available,
-)
+from pytorch_lightning.utilities.distributed import rank_zero_only, rank_zero_warn, ReduceOp, sync_ddp_if_available
 from pytorch_lightning.utilities.seed import seed_everything
 
 log = logging.getLogger(__name__)
@@ -89,7 +83,7 @@ def distributed_sampler_kwargs(self):
  def setup(self, model):
  self._model = model
 
- os.environ["MASTER_PORT"] = os.environ.get("MASTER_PORT", str(find_free_network_port()))
+ os.environ["MASTER_PORT"] = str(self.cluster_environment.master_port())
 
  # pass in a state q
  smp = mp.get_context("spawn")

diff --git a/pytorch_lightning/plugins/training_type/parallel.py b/pytorch_lightning/plugins/training_type/parallel.py
@@ -37,13 +37,6 @@ def __init__(
  self.parallel_devices = parallel_devices
  self.cluster_environment = cluster_environment
 
- @property
- def cluster_local_rank(self):
- try:
- return self.cluster_environment.local_rank()
- except KeyError:
- return 0
-
  @property
  @abstractmethod
  def root_device(self):

diff --git a/pytorch_lightning/trainer/connectors/accelerator_connector.py b/pytorch_lightning/trainer/connectors/accelerator_connector.py
@@ -42,7 +42,12 @@
  TPUSpawnPlugin,
  TrainingTypePlugin,
 )
-from pytorch_lightning.plugins.environments import ClusterEnvironment, SLURMEnvironment, TorchElasticEnvironment
+from pytorch_lightning.plugins.environments import (
+ ClusterEnvironment,
+ LightningEnvironment,
+ SLURMEnvironment,
+ TorchElasticEnvironment,
+)
 from pytorch_lightning.tuner.auto_gpu_select import pick_multiple_gpus
 from pytorch_lightning.utilities import (
  _APEX_AVAILABLE,
@@ -467,17 +472,10 @@ def select_cluster_environment(self) -> ClusterEnvironment:
  return self._cluster_environment
  if self.is_slurm_managing_tasks:
  env = SLURMEnvironment()
- # TODO: decouple DDP from SLURM
- # refactor and let generic cluster env hold the information about who spawns the processes
- os.environ["PL_IN_DDP_SUBPROCESS"] = "1"
  elif TorchElasticEnvironment.is_using_torchelastic():
  env = TorchElasticEnvironment()
- # TODO: decouple DDP from TE
- # refactor and let generic cluster env hold the information about who spawns the processes
- os.environ["PL_IN_DDP_SUBPROCESS"] = "1"
  else:
- # TODO: maybe introduce a DefaultEnvironment?
- env = TorchElasticEnvironment()
+ env = LightningEnvironment()
  return env
 
  def set_distributed_mode(self, distributed_backend: Optional[str] = None):

diff --git a/pytorch_lightning/utilities/distributed.py b/pytorch_lightning/utilities/distributed.py
@@ -82,21 +82,6 @@ def _debug(*args, **kwargs):
 rank_zero_deprecation = partial(rank_zero_warn, category=DeprecationWarning)
 
 
-def find_free_network_port() -> int:
- """
- Finds a free port on localhost.
- It is useful in single-node training when we don't want to connect to a real master node but
- have to set the `MASTER_PORT` environment variable.
- """
- import socket
- s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
- s.bind(("", 0))
- s.listen(1)
- port = s.getsockname()[1]
- s.close()
- return port
-
-
 def gather_all_tensors(result: Union[torch.Tensor], group: Optional[Any] = None):
  """
  Function to gather all tensors from several ddp processes onto a list that

diff --git a/tests/accelerators/test_accelerator_connector.py b/tests/accelerators/test_accelerator_connector.py
@@ -33,8 +33,8 @@
  PrecisionPlugin,
  SingleDevicePlugin,
 )
-from pytorch_lightning.plugins.environments import ClusterEnvironment, SLURMEnvironment, TorchElasticEnvironment
-from pytorch_lightning.utilities import _DEEPSPEED_AVAILABLE
+from pytorch_lightning.plugins.environments import LightningEnvironment, SLURMEnvironment, TorchElasticEnvironment
+from pytorch_lightning.utilities.exceptions import MisconfigurationException
 from tests.helpers.boring_model import BoringModel
 
 
@@ -54,7 +54,7 @@ def test_accelerator_choice_ddp_cpu(tmpdir):
  )
  assert isinstance(trainer.accelerator, CPUAccelerator)
  assert isinstance(trainer.training_type_plugin, DDPSpawnPlugin)
- assert isinstance(trainer.training_type_plugin.cluster_environment, TorchElasticEnvironment)
+ assert isinstance(trainer.training_type_plugin.cluster_environment, LightningEnvironment)
 
 
 @mock.patch.dict(os.environ, {"CUDA_VISIBLE_DEVICES": "0,1"})
@@ -68,7 +68,7 @@ def test_accelerator_choice_ddp(cuda_available_mock, device_count_mock):
  )
  assert isinstance(trainer.accelerator, GPUAccelerator)
  assert isinstance(trainer.training_type_plugin, DDPPlugin)
- assert isinstance(trainer.training_type_plugin.cluster_environment, TorchElasticEnvironment)
+ assert isinstance(trainer.training_type_plugin.cluster_environment, LightningEnvironment)
 
 
 @mock.patch.dict(os.environ, {"CUDA_VISIBLE_DEVICES": "0,1"})
@@ -82,7 +82,7 @@ def test_accelerator_choice_ddp_spawn(cuda_available_mock, device_count_mock):
  )
  assert isinstance(trainer.accelerator, GPUAccelerator)
  assert isinstance(trainer.training_type_plugin, DDPSpawnPlugin)
- assert isinstance(trainer.training_type_plugin.cluster_environment, TorchElasticEnvironment)
+ assert isinstance(trainer.training_type_plugin.cluster_environment, LightningEnvironment)
 
 
 @pytest.mark.skipif(torch.cuda.device_count() < 2, reason="test requires multi-GPU machine")
@@ -321,7 +321,7 @@ def test_accelerator_choice_ddp_cpu_custom_cluster(device_count_mock):
  Test that we choose the custom cluster even when SLURM or TE flags are around
  """
 
- class CustomCluster(ClusterEnvironment):
+ class CustomCluster(LightningEnvironment):
 
  def master_address(self):
  return 'asdf'

diff --git a/tests/deprecated_api/test_remove_1-4.py b/tests/deprecated_api/test_remove_1-4.py
@@ -25,7 +25,7 @@
 )
 from pytorch_lightning.overrides.distributed import LightningDistributedModule
 from pytorch_lightning.plugins import DDPSpawnPlugin
-from pytorch_lightning.plugins.environments import TorchElasticEnvironment
+from pytorch_lightning.plugins.environments import LightningEnvironment
 from tests.deprecated_api import _soft_unimport_module
 from tests.helpers import BoringModel
 
@@ -189,7 +189,7 @@ def test_v1_4_0_deprecated_lightning_distributed_data_parallel(tmpdir):
  plugins=[
  CustomDDPPlugin(
  parallel_devices=[torch.device("cuda", 0), torch.device("cuda", 1)],
- cluster_environment=TorchElasticEnvironment(),
+ cluster_environment=LightningEnvironment(),
  )
  ]
  )

diff --git a/tests/models/test_sync_batchnorm.py b/tests/models/test_sync_batchnorm.py
@@ -20,7 +20,7 @@
 
 from pytorch_lightning import LightningModule, seed_everything, Trainer
 from pytorch_lightning.plugins import DDPSpawnPlugin
-from pytorch_lightning.plugins.environments import TorchElasticEnvironment
+from pytorch_lightning.plugins.environments import LightningEnvironment
 from pytorch_lightning.trainer.states import TrainerState
 from pytorch_lightning.utilities import FLOAT16_EPSILON
 from tests.helpers.datamodules import MNISTDataModule
@@ -108,6 +108,13 @@ def test_sync_batchnorm_ddp(tmpdir):
  dm.setup(stage=None)
 
  model = SyncBNModule(gpu_count=2, bn_targets=bn_outputs)
+ ddp = DDPSpawnPlugin(
+ parallel_devices=[torch.device("cuda", 0), torch.device("cuda", 1)],
+ num_nodes=1,
+ sync_batchnorm=True,
+ cluster_environment=LightningEnvironment(),
+ find_unused_parameters=True
+ )
 
  trainer = Trainer(
  gpus=2,

diff --git a/tests/plugins/environments/__init__.py b/tests/plugins/environments/__init__.py