Closed
Describe the bug
The type check inside the function split_half_float_double_sparse in runtime/engine.py does not recognize CPU tensors, because it checks for the string "torch.cpu.FloatTensor" while the type of a CPU tensor is "torch.FloatTensor" (no "cpu").
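The mismatch can be sketched in plain Python. This is only a model of the string comparison described above, not DeepSpeed's actual source: the helper names are hypothetical, and the assumption is that the supported type strings are built by always inserting the device name.

```python
# Sketch only: models the string comparison described in this report,
# not DeepSpeed's actual source. Helper names are hypothetical.
KINDS = ("Half", "Float", "Double")


def reported_supported_types(device_name):
    # Assumed construction: the device name is always inserted into the string.
    return [f"torch.{device_name}.{kind}Tensor" for kind in KINDS]


def adjusted_supported_types(device_name):
    # Hypothetical fix: CPU tensor type strings carry no device component.
    prefix = "torch" if device_name == "cpu" else f"torch.{device_name}"
    return [f"{prefix}.{kind}Tensor" for kind in KINDS]


# Type string of a float32 CPU tensor, as quoted in the report.
cpu_grad_type = "torch.FloatTensor"

print(cpu_grad_type in reported_supported_types("cpu"))   # False
print(cpu_grad_type in adjusted_supported_types("cpu"))   # True
print("torch.cuda.FloatTensor" in adjusted_supported_types("cuda"))  # True
```

Special-casing "cpu" is just one possible way to make the check pass; it leaves the strings for other devices unchanged.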
To Reproduce
Enable the Intel CPU backend of DeepSpeed:
pip install torch==1.13.1+cpu
pip install deepspeed==0.9.5
pip install intel_extension_for_pytorch==1.13+cpu -f https://developer.intel.com/ipex-whl-stable-cpu
pip install oneccl_bind_pt==1.13+cpu -f https://developer.intel.com/ipex-whl-stable-cpu
# also need to build oneCCL itself
Modify small_model_debugging/test_model.py to work on CPU:
- set "torch_adam": True in config_dict to skip op builder, and set "enabled": False for "fp16"
- set dtype=torch.float for train_data to avoid half precision
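For reference, the config changes might look roughly like the fragment below. It is illustrative only: the real config_dict in test_model.py has other entries, the "lr" value is a placeholder, and placing "torch_adam" under the optimizer "params" is an assumption about where that key lives.

```python
# Illustrative fragment only; the real config_dict in test_model.py
# contains other entries that are omitted here.
config_dict = {
    "optimizer": {
        "type": "Adam",
        "params": {
            "lr": 1e-3,          # placeholder value, not taken from the script
            "torch_adam": True,  # use torch's Adam so the op builder is skipped
        },
    },
    "fp16": {"enabled": False},  # disable half precision for the CPU run
}

# train_data is then created with dtype=torch.float instead of a half dtype.
```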
Running the script fails with AssertionError: attempting to reduce an unsupported grad type: torch.FloatTensor. After removing the assertion (line 123~124 in DS 0.9.5), it trains fine.