
[Bug] error when serving glm4-9b-chat-1m #2522

Closed

YanShuang17 opened this issue Sep 26, 2024 · 1 comment

@YanShuang17
Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

An error is raised when trying to deploy glm4-9b-chat-1m locally.

Reproduction

CUDA_VISIBLE_DEVICES=2 lmdeploy serve api_server \
    --server-port 6066 \
    --backend turbomind \
    --log-level INFO \
    --model-name glm-4-9b-chat-1m \
    --session-len 256000 \
    --tp 1 \
    --max-batch-size 1 \
    --cache-max-entry-count 0.9 \
    --max-prefill-token-num 8192 \
    --model-format hf \
    --quant-policy 0 \
    --max-prefill-iters 1 \
    /juicefs-algorithm/models/nlp/huggingface/THUDM/glm-4-9b-chat-1m/
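
For context, had the server come up, it would expose an OpenAI-compatible API on the configured port. A minimal smoke test sketch (the model name must match --model-name above; the api_key is a placeholder since lmdeploy does not require one by default):

from openai import OpenAI

# Point the client at lmdeploy's OpenAI-compatible endpoint (--server-port 6066).
client = OpenAI(base_url='http://localhost:6066/v1', api_key='none')
resp = client.chat.completions.create(
    model='glm-4-9b-chat-1m',
    messages=[{'role': 'user', 'content': 'hello'}])
print(resp.choices[0].message.content)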

Environment

Package                   Version
------------------------- -----------
accelerate                0.34.2
addict                    2.4.0
aiohappyeyeballs          2.4.0
aiohttp                   3.10.6
aiosignal                 1.3.1
annotated-types           0.7.0
anyio                     4.6.0
async-timeout             4.0.3
attrs                     24.2.0
certifi                   2024.8.30
charset-normalizer        3.3.2
click                     8.1.7
cloudpickle               3.0.0
datasets                  3.0.1
dill                      0.3.8
diskcache                 5.6.3
distro                    1.9.0
einops                    0.8.0
exceptiongroup            1.2.2
fastapi                   0.115.0
filelock                  3.16.1
fire                      0.6.0
frozenlist                1.4.1
fsspec                    2024.6.1
h11                       0.14.0
httpcore                  1.0.5
httpx                     0.27.2
huggingface-hub           0.25.1
idna                      3.10
importlib_metadata        8.5.0
interegular               0.3.3
Jinja2                    3.1.4
jiter                     0.5.0
jsonschema                4.23.0
jsonschema-specifications 2023.12.1
lark                      1.2.2
llvmlite                  0.43.0
lmdeploy                  0.6.0
markdown-it-py            3.0.0
MarkupSafe                2.1.5
mdurl                     0.1.2
mmengine-lite             0.10.5
mpmath                    1.3.0
multidict                 6.1.0
multiprocess              0.70.16
nest-asyncio              1.6.0
networkx                  3.3
numba                     0.60.0
numpy                     1.26.4
nvidia-cublas-cu12        12.1.3.1
nvidia-cuda-cupti-cu12    12.1.105
nvidia-cuda-nvrtc-cu12    12.1.105
nvidia-cuda-runtime-cu12  12.1.105
nvidia-cudnn-cu12         8.9.2.26
nvidia-cufft-cu12         11.0.2.54
nvidia-curand-cu12        10.3.2.106
nvidia-cusolver-cu12      11.4.5.107
nvidia-cusparse-cu12      12.1.0.106
nvidia-nccl-cu12          2.20.5
nvidia-nvjitlink-cu12     12.6.68
nvidia-nvtx-cu12          12.1.105
openai                    1.48.0
outlines                  0.0.46
packaging                 24.1
pandas                    2.2.3
peft                      0.11.1
pillow                    10.4.0
pip                       22.0.2
platformdirs              4.3.6
protobuf                  5.28.2
psutil                    6.0.0
pyairports                2.1.1
pyarrow                   17.0.0
pycountry                 24.6.1
pydantic                  2.9.2
pydantic_core             2.23.4
Pygments                  2.18.0
pynvml                    11.5.3
python-dateutil           2.9.0.post0
pytz                      2024.2
PyYAML                    6.0.2
referencing               0.35.1
regex                     2024.9.11
requests                  2.32.3
rich                      13.8.1
rpds-py                   0.20.0
safetensors               0.4.5
sentencepiece             0.2.0
setuptools                59.6.0
shortuuid                 1.0.13
six                       1.16.0
sniffio                   1.3.1
starlette                 0.38.6
sympy                     1.13.3
termcolor                 2.4.0
tiktoken                  0.7.0
tokenizers                0.20.0
tomli                     2.0.1
torch                     2.3.1
torchvision               0.18.1
tqdm                      4.66.5
transformers              4.45.0
triton                    2.3.1
typing_extensions         4.12.2
tzdata                    2024.2
urllib3                   2.2.3
uvicorn                   0.30.6
wheel                     0.37.1
xxhash                    3.5.0
yapf                      0.40.2
yarl                      1.12.1
zipp                      3.20.2

Error traceback

...
2024-09-26 19:08:25,316 - lmdeploy - INFO - updated backend_config=TurbomindEngineConfig(model_format='hf', tp=1, session_len=256000, max_batch_size=1, cache_max_entry_count=0.9, cache_chunk_size=-1, cache_block_seq_len=64, enable_prefix_caching=False, quant_policy=0, rope_scaling_factor=0.0, use_logn_attn=False, download_dir=None, revision=None, max_prefill_token_num=8192, num_tokens_per_iter=8192, max_prefill_iters=32)
Traceback (most recent call last):
  File "/data/shuang_yan/lmdeploy-env/bin/lmdeploy", line 8, in <module>
    sys.exit(run())
  File "/data/shuang_yan/lmdeploy-env/lib/python3.10/site-packages/lmdeploy/cli/entrypoint.py", line 36, in run
    args.run(args)
  File "/data/shuang_yan/lmdeploy-env/lib/python3.10/site-packages/lmdeploy/cli/serve.py", line 287, in api_server
    run_api_server(args.model_path,
  File "/data/shuang_yan/lmdeploy-env/lib/python3.10/site-packages/lmdeploy/serve/openai/api_server.py", line 1007, in serve
    VariableInterface.async_engine = pipeline_class(
  File "/data/shuang_yan/lmdeploy-env/lib/python3.10/site-packages/lmdeploy/serve/async_engine.py", line 169, in __init__
    self.stop_words = _stop_words(self.chat_template.stop_words,
  File "/data/shuang_yan/lmdeploy-env/lib/python3.10/site-packages/lmdeploy/utils.py", line 170, in _stop_words
    stop_indexes += tokenizer.indexes_containing_token(stop_word)
  File "/data/shuang_yan/lmdeploy-env/lib/python3.10/site-packages/lmdeploy/tokenizer.py", line 661, in indexes_containing_token
    encoded = self.encode(token, add_bos=False)
  File "/data/shuang_yan/lmdeploy-env/lib/python3.10/site-packages/lmdeploy/tokenizer.py", line 600, in encode
    return self.model.encode(s, add_bos, add_special_tokens, **kwargs)
  File "/data/shuang_yan/lmdeploy-env/lib/python3.10/site-packages/lmdeploy/tokenizer.py", line 531, in encode
    return super(ChatGLM4Tokenizer, self).encode(s,
  File "/data/shuang_yan/lmdeploy-env/lib/python3.10/site-packages/lmdeploy/tokenizer.py", line 366, in encode
    encoded = self.model.encode(s,
  File "/data/shuang_yan/lmdeploy-env/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2791, in encode
    encoded_inputs = self.encode_plus(
  File "/data/shuang_yan/lmdeploy-env/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3210, in encode_plus
    return self._encode_plus(
  File "/data/shuang_yan/lmdeploy-env/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 801, in _encode_plus
    return self.prepare_for_model(
  File "/data/shuang_yan/lmdeploy-env/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3706, in prepare_for_model
    encoded_inputs = self.pad(
  File "/data/shuang_yan/lmdeploy-env/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3508, in pad
    encoded_inputs = self._pad(
TypeError: ChatGLM4Tokenizer._pad() got an unexpected keyword argument 'padding_side'
[TM][INFO] [InternalThreadEntry] stop requested.
[TM][INFO] [OutputThreadEntry] stop requested.
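
Judging from the traceback and the installed transformers 4.45.0, the failure is a signature mismatch: transformers 4.45 began forwarding a padding_side keyword down to the tokenizer's _pad() override, which the custom ChatGLM4Tokenizer shipped with the model does not accept. A minimal sketch that should reproduce the same TypeError outside lmdeploy, assuming the model directory provides the custom tokenizer via trust_remote_code:

from transformers import AutoTokenizer

# Loading the tokenizer pulls in the model repo's custom ChatGLM4Tokenizer.
tok = AutoTokenizer.from_pretrained(
    '/juicefs-algorithm/models/nlp/huggingface/THUDM/glm-4-9b-chat-1m/',
    trust_remote_code=True)

# encode() on a slow tokenizer walks encode -> encode_plus -> prepare_for_model
# -> pad -> _pad, the same path as the traceback above; on transformers 4.45.0
# this should raise: TypeError: ChatGLM4Tokenizer._pad() got an unexpected
# keyword argument 'padding_side'.
tok.encode('<|user|>', add_special_tokens=False)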
@lvhan028
Collaborator

#2520 addresses this issue.
You can downgrade transformers as a workaround, or wait for the lmdeploy release later this week.
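
A hedged note on the workaround: the padding_side keyword appears to have entered the _pad() call path in the transformers 4.45 series, so pinning to the previous series should avoid it, e.g.:

pip install "transformers==4.44.2"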
