I think the issue is that the input is not shared with the second GPU. I have a similar issue with microsoft/bloom-deepspeed-inference-int8: if I repeat the input X times (X = number of GPUs), inference proceeds and produces the output.
Instead, I tried bigscience/bloom based on bloom-accelerate-inference.py, and it works well with interactive input.
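One way to address this (not from the thread itself, just a hedged sketch) is to read the prompt only on rank 0 and broadcast it to the other ranks, so that a single typed input drives every DeepSpeed process. This assumes the process group has already been initialized (the `deepspeed` launcher does this); `read_prompt_on_rank0` is an illustrative name:

```python
import torch.distributed as dist

def read_prompt_on_rank0() -> str:
    """Read a line of input on rank 0 and share it with all other ranks."""
    obj = [None]
    if dist.get_rank() == 0:
        obj[0] = input("You: ")
    # broadcast_object_list pickles the object on src and unpickles it on
    # every other rank, so all processes see the same prompt.
    dist.broadcast_object_list(obj, src=0)
    return obj[0]
```

Calling this in the chat loop on every rank would let one typed prompt reach all GPUs, instead of having to repeat it once per process.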
I am trying to create a simple chatbot using the bloom-7b1 model (I may use bigger models later), based on bloom-ds-zero-inference.py.
Here is my code:
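(The original code block was not preserved. Below is a minimal sketch of what such a script might look like, assuming DeepSpeed ZeRO stage 3 and the `transformers` pattern used by bloom-ds-zero-inference.py; the names and config values here are illustrative, not the author's code:)

```python
import deepspeed
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.deepspeed import HfDeepSpeedConfig

model_name = "bigscience/bloom-7b1"

ds_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 3},
    "train_micro_batch_size_per_gpu": 1,
}

# Keeping this object alive tells from_pretrained to shard the weights
# across GPUs at load time (zero.Init) instead of materializing the
# whole model on a single device.
dschf = HfDeepSpeedConfig(ds_config)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# Wrap the model with DeepSpeed; ZeRO-3 shards the parameters across ranks.
ds_engine = deepspeed.initialize(model=model, config_params=ds_config)[0]
ds_engine.module.eval()

while True:
    # With more than one process, each rank blocks on its own input() call,
    # which is the multi-GPU behavior discussed in this issue.
    prompt = input("You: ")
    if prompt.strip().lower() in ("quit", "exit"):
        break
    inputs = tokenizer(prompt, return_tensors="pt").to(torch.cuda.current_device())
    with torch.no_grad():
        outputs = ds_engine.module.generate(**inputs, max_new_tokens=100)
    print("Bot:", tokenizer.decode(outputs[0], skip_special_tokens=True))
```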
I have not yet applied any post-processing to the output. This works fine if I run it with a single GPU, but not when I run it with both GPUs.
I am using two Tesla V100 GPUs, with deepspeed==0.9.2, torch==1.14.0a0+410ce96, and Python 3.8.10.