Warning: NaN or Inf found in input tensor when running DeepSpeedExamples/BingBertSquad.

Hi DeepSpeed team,

I am running DeepSpeedExamples/BingBertSquad on a machine with 2 GPUs. I followed the instructions at [https://www.deepspeed.ai/tutorials/bert-finetuning/](https://www.deepspeed.ai/tutorials/bert-finetuning/) and can reproduce the results when I run `run_squad_baseline.sh`.

However, after I changed the `deepspeed_bsz24_config.json` file, training gave me the following warning and the loss was always `nan`. Switching back to the original config file gave the same result.
> [INFO] [deepspeed_utils.py:118:_handle_overflow] rank 0 detected overflow nan in tensor 0:0 shape torch.Size([30528, 1024]) | 3/29324 [00:00<2:33:39,  3.18it/s]
> [2020-08-20 14:38:13,808] [INFO] [zero_optimizer_stage1.py:621:step] [deepspeed] OVERFLOW! Skipping step. Attempted loss scale: 4294967296, reducing to 2147483648.0
> Warning: NaN or Inf found in input tensor.
> Warning: NaN or Inf found in input tensor.
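
For context, the "OVERFLOW! Skipping step" line appears to come from dynamic loss scaling: when a gradient contains NaN or Inf, the optimizer step is skipped and the loss scale is reduced (here from 4294967296 to 2147483648.0, i.e. halved). A minimal sketch of that behavior, assuming a plain halving rule; `next_loss_scale` and its parameters are hypothetical names for illustration, not DeepSpeed's actual code:

```python
import math


def next_loss_scale(scale, grads, factor=2.0, min_scale=1.0):
    """Hypothetical helper (not a DeepSpeed API).

    Returns (new_scale, skip_step). If any gradient value is NaN or Inf,
    the step is skipped and the scale is divided by `factor`, but never
    reduced below `min_scale`.
    """
    overflow = any(math.isnan(g) or math.isinf(g) for g in grads)
    if overflow:
        return max(scale / factor, min_scale), True
    return scale, False


# Reproduce the numbers from the log: one overflow at scale 2**32.
scale, skipped = next_loss_scale(4294967296.0, [0.1, float("nan")])
print(scale, skipped)  # 2147483648.0 True
```

Under this rule a few skipped steps at the start of training would be harmless on their own; the problem reported here is that the loss itself stays at `nan`.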

The config file is like this:

> {
  "train_batch_size": 12,
  "train_micro_batch_size_per_gpu": 3,
  "steps_per_print": 10,
  "optimizer": {
    "type": "Adam",
    "params": {
      "lr": 3e-5,
      "weight_decay": 0.0,
      "bias_correction": false
    }
  },
  "gradient_clipping": 1.0,
  "fp16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 1
  }
}
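
As a sanity check on the batch-size fields, the sketch below assumes the relation described in the DeepSpeed configuration docs: `train_batch_size` = `train_micro_batch_size_per_gpu` × gradient accumulation steps × number of GPUs. The helper `implied_grad_accum_steps` is a hypothetical name used only for illustration, not a DeepSpeed API.

```python
import json

# Only the two batch-size fields from the config above are needed here.
config_text = """
{
  "train_batch_size": 12,
  "train_micro_batch_size_per_gpu": 3
}
"""


def implied_grad_accum_steps(config, num_gpus):
    """Hypothetical helper: derive gradient accumulation steps from the
    batch-size fields, raising if they cannot be reconciled."""
    total = config["train_batch_size"]
    per_step = config["train_micro_batch_size_per_gpu"] * num_gpus
    if total % per_step != 0:
        raise ValueError(
            f"train_batch_size={total} is not a multiple of "
            f"micro batch x GPUs = {per_step}"
        )
    return total // per_step


config = json.loads(config_text)
print(implied_grad_accum_steps(config, num_gpus=2))  # 2
```

With 2 GPUs the values are consistent (2 accumulation steps), so the batch-size settings by themselves do not look like the source of the NaN.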


Could you help me fix it?
Thanks! 

Tony

Warning: NaN or Inf found in input tensor when running DeepSpeedExamples/BingBertSquad. #324