This repository has been archived by the owner on Oct 9, 2024. It is now read-only.

Wrong prediction from "bloom-deepspeed-inference-int8" #10

Closed
zomux opened this issue Sep 19, 2022 · 9 comments

Comments


zomux commented Sep 19, 2022

I'm running bloom-deepspeed-inference-int8 with the following command on an 8 × 40GB A100 machine.

deepspeed --num_gpus 8 xxx.py --name microsoft/bloom-deepspeed-inference-int8 --dtype int8 --batch_size 8

I get generation results, but they contain a lot of repetition, which is not the case with the accelerate-based BLOOM int8 implementation.

Generate args {'max_new_tokens': 100, 'do_sample': False}
------------------------------------------------------------
in=DeepSpeed is a machine learning framework
out=DeepSpeed is a machine learning framework for deep deep deep deep deep ["deep" repeated for the remainder of the 100 generated tokens]

------------------------------------------------------------
in=He is working on
out=He is working on a new album, and is also working on a new album with his band, and is also working on a new album with his band, and is also working on a new album with his band, and is working on a new album, and is working on a new album,

------------------------------------------------------------
in=He has a
out=He has a lot of money.
He has a lot of money.
He has a lot of money.
He has a lot of money.
He has a lot of money.
He has a

------------------------------------------------------------
in=He got all
out=He got all the way to the top of the mountain, and he was so very very very very very very very very very very very very very very

------------------------------------------------------------
in=Everyone is happy and I can
out=Everyone is happy and I can see that. I am happy too. I am happy too. I am happy too.

------------------------------------------------------------
in=The new movie that got Oscar this year
out=The new movie that got Oscar this year is a movie about a movie about a movie about a movie about

------------------------------------------------------------
in=In the far far distance from our galaxy,
out=In the far far distance from our galaxy, there is a a a a a a a a a galaxy

------------------------------------------------------------
in=Peace is the only way
out=Peace is the only way to live. We must be peaceful and live in
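The degenerate loops in the samples above ("deep deep deep…", "He has a lot of money." repeated) can be flagged programmatically with a simple repeated-n-gram check. A minimal sketch for triaging outputs like these (the helper names and threshold are my own, not from DeepSpeed or transformers):

```python
from collections import Counter

def max_ngram_repeat(text: str, n: int = 3) -> int:
    """Return the highest occurrence count of any word n-gram in `text`."""
    words = text.split()
    if len(words) < n:
        return 0
    grams = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return max(grams.values())

def looks_degenerate(text: str, n: int = 3, threshold: int = 4) -> bool:
    """Heuristic: flag output whose most frequent n-gram repeats too often."""
    return max_ngram_repeat(text, n) >= threshold

print(looks_degenerate("He has a lot of money. " * 5))    # True
print(looks_degenerate("Peace is the only way to live."))  # False
```

A check like this makes it easy to scan a batch of generations for the broken-kernel symptom instead of eyeballing each output.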

zomux commented Sep 19, 2022

I also got the error "probability tensor contains either inf, nan or element < 0" with a longer prompt.

mayank31398 (Collaborator)

@zomux can you try updating to the latest deepspeed (0.7.3)?
microsoft/DeepSpeed#2217 (comment)
This issue was mentioned before and is now fixed.


zomux commented Sep 19, 2022

@mayank31398 I'm running DeepSpeed from the latest GitHub checkout:

➜  pip list | grep deepspeed
deepspeed                     0.7.3+15923810

Thanks for the pointer, I will check the discussion there.


zomux commented Sep 20, 2022

I'm also getting the "CUDA error: an illegal memory access was encountered" error with a slightly longer prompt, same as microsoft/DeepSpeed#2217 (comment).

Is it possible that the checkpoints in https://huggingface.co/microsoft/bloom-deepspeed-inference-int8/tree/main are produced before that fix was merged?

mayank31398 (Collaborator)

The checkpoints don't have anything to do with this.

mayank31398 (Collaborator)

Try using the `ds-inference/support-large-token-length` branch in DeepSpeed. This is still a WIP.


zomux commented Sep 20, 2022

Awesome, thanks, I'll check it out. Let me know if you want more details for reproducing this problem.


zomux commented Sep 20, 2022

@mayank31398 Thanks for the pointers. I think my issue is solved after putting the different pieces together. Thanks!


zomux commented Sep 20, 2022

Resolving this issue.

@zomux zomux closed this as completed Sep 20, 2022