Status: Closed
Labels: bug (Something isn't working)
Description
Hi vLLM dev team,
Is vLLM supposed to work with MPT-30B? I tried loading it on AWS SageMaker using an ml.g5.12xlarge and even an ml.g5.48xlarge instance:
```python
from vllm import LLM, SamplingParams

llm = LLM(model="mosaicml/mpt-30b")
```

However, in both cases I run into this error:
```
OutOfMemoryError: CUDA out of memory. Tried to allocate 294.00 MiB (GPU 0; 22.19 GiB total capacity; 21.35 GiB already allocated; 46.50 MiB free; 21.35 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
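For context, the trace shows the weights being loaded onto a single ~24 GiB A10G, which cannot hold MPT-30B (roughly 60 GB in fp16). A minimal sketch of what I would expect to work instead, assuming a vLLM build with MPT support: shard the model across all four GPUs of the g5.12xlarge with vLLM's `tensor_parallel_size` argument (the prompt and sampling settings below are placeholders for illustration):

```python
from vllm import LLM, SamplingParams

# Shard MPT-30B across the 4 A10G GPUs of a g5.12xlarge so the
# ~60 GB of fp16 weights fit in aggregate GPU memory.
llm = LLM(model="mosaicml/mpt-30b", tensor_parallel_size=4)

params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["What is MPT-30B?"], params)
print(outputs[0].outputs[0].text)
```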