Description
Describe the bug
The code in the section:
https://www.deepspeed.ai/tutorials/inference-tutorial/#initializing-for-inference
appears to be incorrect.
Specifically, the first code block of that section reads:

```python
model = ds_engine.module
output = model('Input String')
```

It should probably be something like:

```python
model = ds_engine.module
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(f"{pipe('Hello')}")
```

(or something similar). The code in
https://www.deepspeed.ai/tutorials/inference-tutorial/#end-to-end-gpt-neo-27b-inference
is closer to what's required.
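For reference, a minimal end-to-end sketch of what the tutorial section presumably intends, following the pattern of the GPT-Neo example linked above. The model name, dtype, and generation parameters here are illustrative choices, not taken from the tutorial, and the script requires GPUs plus the `deepspeed` and `transformers` packages:

```python
# Sketch only; assumes a CUDA GPU and installed deepspeed/transformers.
import deepspeed
import torch
from transformers import pipeline

# Illustrative model choice; the tutorial's end-to-end example uses
# EleutherAI/gpt-neo-2.7B.
pipe = pipeline(
    "text-generation",
    model="EleutherAI/gpt-neo-2.7B",
    device=torch.cuda.current_device(),
)

# Wrap the underlying model with the DeepSpeed inference engine and
# put the engine back into the pipeline, as the end-to-end example does.
pipe.model = deepspeed.init_inference(
    pipe.model,
    tensor_parallel={"tp_size": 1},  # single-GPU here; adjust for tensor parallelism
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)

print(pipe("Hello", max_new_tokens=20))
```

This keeps the tokenizer handling inside the `pipeline`, which is what the bare `model('Input String')` call in the current docs glosses over.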
To Reproduce
I attempted to use the code sample as the basis for my own script.
Expected behavior
Code sample to work
ds_report output
```
torch install path ............... ['/azureml-envs/azureml_28a03b9f9b7c5f401fdd179f0d3ee4d8/lib/python3.10/site-packages/torch']
torch version .................... 2.6.0+cu124
deepspeed install path ........... ['/azureml-envs/azureml_28a03b9f9b7c5f401fdd179f0d3ee4d8/lib/python3.10/site-packages/deepspeed']
deepspeed info ................... 0.16.4, unknown, unknown
torch cuda version ............... 12.4
torch hip version ................ None
nvcc version ..................... 12.1
deepspeed wheel compiled w. ...... torch 0.0, cuda 0.0
shared memory (/dev/shm) size .... 2.00 GB
```
System info:
- OS: Ubuntu
- GPU count and types: single machine with 4x A100
- Python version: 3.10
Additional context
Running in AzureML.