
Returning None from training_step with multi GPU DDP training #5243


@iamkucuk

🐛 Bug

Returning None from training_step during multi-GPU DDP training freezes the run without raising an exception.

To Reproduce

Start multi-GPU DDP training with a training_step function that sometimes returns None.

Example training_step function:

    def training_step(self, batch, batch_idx):
        data, target = batch
        model_outputs = self.forward(data)
        loss = calc_loss(model_outputs, target)

        # Skip the batch when the loss is NaN (and randomly, to make the hang easy to hit).
        if torch.isnan(loss) or random.random() < .05:
            return None

        return loss

Example trainer:

    trainer = Trainer(
        gpus=2,
        distributed_backend="ddp",
    )

Expected behavior

Training should continue, skipping the current batch, as pointed out here.

Environment

No specific environment is needed to reproduce this bug.

Additional context

This issue was mentioned in #4956, but without specifics.

Note: While this issue is being investigated, help with a workaround would be greatly appreciated!
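
Until this is handled inside Lightning, one possible workaround is sketched below. This is only a sketch based on the reproduction above: `calc_loss` is the helper from that snippet, and `torch.distributed` is assumed to already be initialized by the DDP backend. The idea is to make the skip decision identical on every rank and to return a zero-valued loss instead of None, so no rank ever drops out of the gradient all-reduce:

    import random

    import torch
    import torch.distributed as dist

    def training_step(self, batch, batch_idx):
        data, target = batch
        model_outputs = self.forward(data)
        loss = calc_loss(model_outputs, target)

        # Each rank decides locally whether it wants to skip this batch.
        should_skip = bool(torch.isnan(loss)) or random.random() < .05
        skip_flag = torch.tensor(float(should_skip), device=loss.device)

        # Agree on a single decision across all ranks: if any rank wants to skip,
        # every rank skips. Without this, ranks diverge at the gradient
        # synchronization and the job hangs.
        if dist.is_available() and dist.is_initialized():
            dist.all_reduce(skip_flag, op=dist.ReduceOp.MAX)

        if skip_flag.item() > 0:
            # Return a zero loss that still touches every parameter, so DDP's
            # reducer receives (zero) gradients on every rank and stays in sync.
            return sum(p.sum() for p in self.parameters()) * 0.0

        return loss

Note that the optimizer still takes a step with zero gradients on skipped batches, which is not a strict no-op for optimizers with momentum or weight decay; whether that is acceptable depends on the use case.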


Labels

distributed (Generic distributed-related topic), feature (Is an improvement or enhancement), help wanted (Open to be worked on), priority: 1 (Medium priority task)
