[BUG] Incorrect type check in engine.py for CPU training  #3837

@learning-chip

Description

Describe the bug

The type check inside the function split_half_float_double_sparse in runtime/engine.py does not recognize CPU tensors: it checks for the string "torch.cpu.FloatTensor", while the type string of an actual CPU tensor is "torch.FloatTensor" (there is no "cpu" component in the name).

https://github.com/microsoft/DeepSpeed/blob/fc9e1ee00e673b2ce2a433c4e34c5440e76c9f3e/deepspeed/runtime/engine.py#L115-L124
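A minimal sketch of how the bucketing/assertion logic could be fixed, assuming the check is a simple membership test on tensor type strings. This operates on plain strings (not real tensors) so it runs without torch installed; the function name and list contents here are illustrative, not the exact DeepSpeed code:

```python
# Type strings reported by tensor.type(): CUDA tensors include "cuda",
# but CPU tensors have NO device component (e.g. "torch.FloatTensor").
SUPPORTED_TYPES = [
    "torch.cuda.HalfTensor", "torch.HalfTensor",
    "torch.cuda.FloatTensor", "torch.FloatTensor",
    "torch.cuda.DoubleTensor", "torch.DoubleTensor",
    "torch.cuda.BFloat16Tensor", "torch.BFloat16Tensor",
]

def split_by_grad_type(type_strings):
    """Group gradients (represented here by their type strings) into
    per-dtype buckets, asserting each type is supported."""
    buckets = {}
    for t in type_strings:
        assert t in SUPPORTED_TYPES, \
            f"attempting to reduce an unsupported grad type: {t}"
        buckets.setdefault(t, []).append(t)
    return buckets
```

With "torch.FloatTensor" in the supported list (instead of the nonexistent "torch.cpu.FloatTensor"), CPU gradients pass the assertion.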

To Reproduce

Enable the Intel CPU backend of DeepSpeed:

pip install torch==1.13.1+cpu
pip install deepspeed==0.9.5
pip install intel_extension_for_pytorch==1.13+cpu -f https://developer.intel.com/ipex-whl-stable-cpu
pip install oneccl_bind_pt==1.13+cpu -f https://developer.intel.com/ipex-whl-stable-cpu
# also need to build oneCCL itself

Modify small_model_debugging/test_model.py to work on CPU:

  • set "torch_adam": True in config_dict to skip op builder, and set "enabled": False for "fp16"
  • set dtype=torch.float for train_data to avoid half precision

Running the script leads to the error AssertionError: attempting to reduce an unsupported grad type: torch.FloatTensor. Removing the assertion (lines 123-124 in DeepSpeed 0.9.5) lets it train fine.
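The failure mode can be reproduced without torch by comparing the string a CPU tensor actually reports against the string the buggy check looks for (the list below is a reduced illustration, not the full list from engine.py):

```python
# What the buggy check looks for vs. what torch actually reports.
buggy_supported = ["torch.cuda.FloatTensor", "torch.cpu.FloatTensor"]
fixed_supported = ["torch.cuda.FloatTensor", "torch.FloatTensor"]

# tensor.type() for a CPU float tensor returns "torch.FloatTensor".
cpu_type = "torch.FloatTensor"

print(cpu_type in buggy_supported)  # False -> the assertion fires
print(cpu_type in fixed_supported)  # True  -> CPU training proceeds
```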

Labels: bug (Something isn't working), training
