[CPU] Skip CPU support unimplemented error#3633
tjruwase merged 21 commits into deepspeedai:master
Conversation
Force-pushed from 5838577 to 7e534ab
I found that if no tests are collected, pytest returns exit code 5 (the correct exit code here should be 0). I updated the command for running the UTs so that pytest exits correctly.
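The exit-code handling described above can be sketched as a small shell helper. This is an illustrative sketch, not the PR's actual workflow script; the helper name and the example test path are assumptions.

```shell
#!/bin/sh
# pytest exits with code 5 when it collects no tests, which would fail CI
# even though nothing is actually wrong. Map 5 to 0 and pass every other
# exit code through unchanged. (Helper name is illustrative.)
normalize_pytest_exit() {
  code="$1"
  if [ "$code" -eq 5 ]; then
    echo 0
  else
    echo "$code"
  fi
}

# Usage in a CI step (illustrative test path):
#   pytest unit/inference; rc=$?
#   exit "$(normalize_pytest_exit "$rc")"
```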
Force-pushed from 7e534ab to 7ea1c8b
accelerator/cuda_accelerator.py (outdated):

```python
        return False

    def supported_dtypes(self):
        return [torch.float, torch.half]
```
CUDA supports torch.bfloat16 as well, right?
Is there any reason we cannot use the data types defined here
https://pytorch.org/docs/stable/tensors.html#data-types
Hi tjruwase, I am sorry for the late reply. CUDA supports torch.bfloat16 as well. I will update my PR later. Thanks~
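The dtype lists being discussed can be sketched as below. This is a hypothetical illustration using the torch dtype *names* as strings so the snippet has no torch dependency; the real DeepSpeed accelerator method returns actual torch dtype objects, and the CPU list here is an assumption.

```python
# Hypothetical sketch of per-accelerator supported dtypes.
class CudaAccelerator:
    def supported_dtypes(self):
        # Per review feedback: CUDA supports bfloat16 too, not just fp32/fp16.
        return ["torch.float", "torch.half", "torch.bfloat16"]

class CpuAccelerator:
    def supported_dtypes(self):
        # CPU list shown for contrast (assumption for illustration only).
        return ["torch.float", "torch.bfloat16"]
```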
@Yejing-Lai, is this PR ready for review? Thanks!
It's ready. The UTs run correctly in my local env. Please review. Thanks~
@Yejing-Lai, PR looks good. Please resolve conflict to enable merge. Thanks.
Hi @tjruwase, can you re-run the CI? I think both check failures are due to a network issue.
Revert "remove skip FusedAdamBuilder; add suported_dtypes"
Revert "remove unused parameters"
Revert "enable zero stage 1 and 2 for synchronized accelerator (a.k.a. CPU)"
Revert "use cpu adam to implement fused adam"
Revert "fused adam can build"
Hi @tjruwase, I am sorry I had to revert the FusedAdam commits. Our FusedAdam needs further modification, and we will submit another PR for it after this PR merges. Thanks for your review/merge~
@Yejing-Lai, no worries. Thanks for the update. Please ping me when you are ready to move forward with this PR. Thanks!
Hi @tjruwase, this PR is ready for merge. Please review. Thanks~
This PR adds CPU inference UT support: we skip CPU-support-unimplemented errors and update the CPU inference workflow.
Skip logic:
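A minimal sketch of the skip logic this PR describes, assuming the intent is: if an op fails because CPU support is unimplemented, skip the test rather than fail it. The helper name and error-message patterns are assumptions, not the PR's exact code; stdlib `unittest.SkipTest` is used here so the snippet is dependency-free, while the actual CI uses pytest, where `pytest.skip` plays the same role.

```python
import unittest

# Substrings (assumed) that mark an error as "unimplemented on CPU".
UNIMPLEMENTED_PATTERNS = ("not implemented", "unsupported")

def run_or_skip(fn, *args, **kwargs):
    """Run fn; convert CPU-support-unimplemented errors into a test skip."""
    try:
        return fn(*args, **kwargs)
    except (NotImplementedError, RuntimeError) as err:
        msg = str(err).lower()
        if any(p in msg for p in UNIMPLEMENTED_PATTERNS):
            raise unittest.SkipTest(f"CPU support unimplemented: {err}")
        raise  # unrelated failures still fail the test
```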