Installation
Sleep mode requires the compiled module vllm_ascend.vllm_ascend_C. The default value of the environment variable that controls its compilation differs between v0.7.3 and v0.8.x+, so the installation process differs slightly between them. Follow the latest official installation guide step by step; every sentence of the guide is there for a reason.
For v0.7.3
If you are building from source, run export COMPILE_CUSTOM_KERNELS=1 manually before installing, so that the custom kernels are compiled during installation.
For v0.8.x+
The environment variable COMPILE_CUSTOM_KERNELS is set to 1 by default when building from source.
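A quick way to see whether the compiled module made it into your installation is to ask Python whether it can locate it. The snippet below is a minimal sketch using only the standard library; the helper name has_module is made up for illustration and is not part of vllm-ascend.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if the module `name` can be located.

    Hypothetical helper, not part of vllm-ascend. For a dotted name,
    find_spec imports the parent package but not the submodule itself.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised when the parent package is missing or fails to import.
        return False


if has_module("vllm_ascend.vllm_ascend_C"):
    print("vllm_ascend.vllm_ascend_C found")
else:
    print("vllm_ascend.vllm_ascend_C not found; "
          "check COMPILE_CUSTOM_KERNELS and rebuild from source")
```

If the module is reported as missing on v0.7.3, the most likely cause given the notes above is that COMPILE_CUSTOM_KERNELS was not exported before the build.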
Usage
from vllm import LLM
llm = LLM("Qwen/Qwen2.5-0.5B-Instruct", enable_sleep_mode=True)
# NPU HBM usage will significantly decrease
# Equivalent to calling .sleep()
llm.sleep(level=1)
# Restore from sleep state
llm.wake_up()
The usage is quite simple.
Important Notes:
- The level parameter defaults to 1 when using sleep()
- Passing values other than 1 to level will keep model weights on NPU while only discarding the KV cache
Verify
To check that sleep mode works in your environment, run pytest -sv tests/singlecard/test_camem.py on the main branch. The file contains two test cases: test_basic_camem checks that CaMemAllocator functions normally, and test_end_to_end checks that .sleep() frees most of the HBM usage and that .wake_up() restores the model and produces the same output as before sleeping.
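For a quick manual check outside pytest, the same round trip can be sketched in a few lines. This is not the code of test_end_to_end, only an illustration of the idea. The helper names, the prompt, and the sampling settings are arbitrary choices, and it assumes greedy decoding (temperature=0) so that the two generations are comparable.

```python
def outputs_match_after_sleep(llm, prompt, sampling_params, level=1):
    """Generate, sleep, wake up, generate again; True if the text is unchanged.

    Hypothetical helper: `llm` is any object exposing generate(), sleep()
    and wake_up() the way vllm.LLM does.
    """
    before = llm.generate(prompt, sampling_params)[0].outputs[0].text
    llm.sleep(level=level)  # HBM usage is expected to drop here
    llm.wake_up()           # restore from the sleep state
    after = llm.generate(prompt, sampling_params)[0].outputs[0].text
    return before == after


def run_on_npu():
    # Imported lazily so the helper above can be reused without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM("Qwen/Qwen2.5-0.5B-Instruct", enable_sleep_mode=True)
    # Greedy decoding with a short output; both values are illustrative.
    params = SamplingParams(temperature=0, max_tokens=10)
    assert outputs_match_after_sleep(llm, "How are you?", params)
```

Call run_on_npu() on a machine with an NPU. It does not measure HBM usage, so the pytest file remains the better check for the memory side.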
FAQs
Error libruntime.so undefined symbol during compilation
#14 45.82 ImportError: /usr/local/Ascend/ascend-toolkit/latest/lib64/libruntime.so:
#14 45.82 undefined symbol:
#14 45.82 _ZN12ErrorManager19ATCReportErrMessageESsRKSt6vectorISsSaISsEES4_
As discussed in #661, the problem is caused by the wrong order of entries in LD_LIBRARY_PATH. Initialize the CANN environment with:
source /usr/local/Ascend/ascend-toolkit/set_env.sh && \
source /usr/local/Ascend/nnal/atb/set_env.sh && \
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/Ascend/ascend-toolkit/latest/`uname -i`-linux/devlib
Note the third line: it appends the devlib directory to the end of LD_LIBRARY_PATH, and setting the path in this order solves the problem.
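If the error persists, it can help to inspect the order of entries actually present in the build environment. The sketch below assumes that the point of the third line is for devlib to be searched after every other directory; that reading is an assumption drawn from the commands above, not something stated by CANN. The function name is made up for illustration.

```python
import os


def devlib_searched_last(ld_library_path: str) -> bool:
    """Return True if a devlib entry exists and no other entry follows it.

    Hypothetical check; assumes ':' separated entries as on Linux.
    """
    entries = [p for p in ld_library_path.split(":") if p]
    seen_devlib = False
    for entry in entries:
        is_devlib = entry.rstrip("/").endswith("devlib")
        if seen_devlib and not is_devlib:
            return False  # a regular directory follows a devlib one
        seen_devlib = seen_devlib or is_devlib
    return seen_devlib


print(devlib_searched_last(os.environ.get("LD_LIBRARY_PATH", "")))
```

A result of False means either that no devlib entry is present or that another directory is listed after it, in which case re-running the three commands above in a fresh shell is the first thing to try.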