
[Bug] error when serving glm4-9b-chat-1m #2522

Closed

YanShuang17 opened this issue Sep 26, 2024 · 1 comment

@YanShuang17
Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

An error is raised when trying to deploy glm4-9b-chat-1m locally.

Reproduction

CUDA_VISIBLE_DEVICES=2 lmdeploy serve api_server \
    --server-port 6066 \
    --backend turbomind \
    --log-level INFO \
    --model-name glm-4-9b-chat-1m \
    --session-len 256000 \
    --tp 1 \
    --max-batch-size 1 \
    --cache-max-entry-count 0.9 \
    --max-prefill-token-num 8192 \
    --model-format hf \
    --quant-policy 0 \
    --max-prefill-iters 1 \
    /juicefs-algorithm/models/nlp/huggingface/THUDM/glm-4-9b-chat-1m/
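
For context, had the server come up, it would expose an OpenAI-compatible API on the configured port. A minimal smoke test sketch (the model name must match --model-name above; the api_key is a placeholder since lmdeploy does not require one by default):

from openai import OpenAI

# Point the client at lmdeploy's OpenAI-compatible endpoint (--server-port 6066).
client = OpenAI(base_url='http://localhost:6066/v1', api_key='none')
resp = client.chat.completions.create(
    model='glm-4-9b-chat-1m',
    messages=[{'role': 'user', 'content': 'hello'}])
print(resp.choices[0].message.content)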

Environment

Package                   Version
------------------------- -----------
accelerate                0.34.2
addict                    2.4.0
aiohappyeyeballs          2.4.0
aiohttp                   3.10.6
aiosignal                 1.3.1
annotated-types           0.7.0
anyio                     4.6.0
async-timeout             4.0.3
attrs                     24.2.0
certifi                   2024.8.30
charset-normalizer        3.3.2
click                     8.1.7
cloudpickle               3.0.0
datasets                  3.0.1
dill                      0.3.8
diskcache                 5.6.3
distro                    1.9.0
einops                    0.8.0
exceptiongroup            1.2.2
fastapi                   0.115.0
filelock                  3.16.1
fire                      0.6.0
frozenlist                1.4.1
fsspec                    2024.6.1
h11                       0.14.0
httpcore                  1.0.5
httpx                     0.27.2
huggingface-hub           0.25.1
idna                      3.10
importlib_metadata        8.5.0
interegular               0.3.3
Jinja2                    3.1.4
jiter                     0.5.0
jsonschema                4.23.0
jsonschema-specifications 2023.12.1
lark                      1.2.2
llvmlite                  0.43.0
lmdeploy                  0.6.0
markdown-it-py            3.0.0
MarkupSafe                2.1.5
mdurl                     0.1.2
mmengine-lite             0.10.5
mpmath                    1.3.0
multidict                 6.1.0
multiprocess              0.70.16
nest-asyncio              1.6.0
networkx                  3.3
numba                     0.60.0
numpy                     1.26.4
nvidia-cublas-cu12        12.1.3.1
nvidia-cuda-cupti-cu12    12.1.105
nvidia-cuda-nvrtc-cu12    12.1.105
nvidia-cuda-runtime-cu12  12.1.105
nvidia-cudnn-cu12         8.9.2.26
nvidia-cufft-cu12         11.0.2.54
nvidia-curand-cu12        10.3.2.106
nvidia-cusolver-cu12      11.4.5.107
nvidia-cusparse-cu12      12.1.0.106
nvidia-nccl-cu12          2.20.5
nvidia-nvjitlink-cu12     12.6.68
nvidia-nvtx-cu12          12.1.105
openai                    1.48.0
outlines                  0.0.46
packaging                 24.1
pandas                    2.2.3
peft                      0.11.1
pillow                    10.4.0
pip                       22.0.2
platformdirs              4.3.6
protobuf                  5.28.2
psutil                    6.0.0
pyairports                2.1.1
pyarrow                   17.0.0
pycountry                 24.6.1
pydantic                  2.9.2
pydantic_core             2.23.4
Pygments                  2.18.0
pynvml                    11.5.3
python-dateutil           2.9.0.post0
pytz                      2024.2
PyYAML                    6.0.2
referencing               0.35.1
regex                     2024.9.11
requests                  2.32.3
rich                      13.8.1
rpds-py                   0.20.0
safetensors               0.4.5
sentencepiece             0.2.0
setuptools                59.6.0
shortuuid                 1.0.13
six                       1.16.0
sniffio                   1.3.1
starlette                 0.38.6
sympy                     1.13.3
termcolor                 2.4.0
tiktoken                  0.7.0
tokenizers                0.20.0
tomli                     2.0.1
torch                     2.3.1
torchvision               0.18.1
tqdm                      4.66.5
transformers              4.45.0
triton                    2.3.1
typing_extensions         4.12.2
tzdata                    2024.2
urllib3                   2.2.3
uvicorn                   0.30.6
wheel                     0.37.1
xxhash                    3.5.0
yapf                      0.40.2
yarl                      1.12.1
zipp                      3.20.2

Error traceback

...
2024-09-26 19:08:25,316 - lmdeploy - INFO - updated backend_config=TurbomindEngineConfig(model_format='hf', tp=1, session_len=256000, max_batch_size=1, cache_max_entry_count=0.9, cache_chunk_size=-1, cache_block_seq_len=64, enable_prefix_caching=False, quant_policy=0, rope_scaling_factor=0.0, use_logn_attn=False, download_dir=None, revision=None, max_prefill_token_num=8192, num_tokens_per_iter=8192, max_prefill_iters=32)
Traceback (most recent call last):
  File "/data/shuang_yan/lmdeploy-env/bin/lmdeploy", line 8, in <module>
    sys.exit(run())
  File "/data/shuang_yan/lmdeploy-env/lib/python3.10/site-packages/lmdeploy/cli/entrypoint.py", line 36, in run
    args.run(args)
  File "/data/shuang_yan/lmdeploy-env/lib/python3.10/site-packages/lmdeploy/cli/serve.py", line 287, in api_server
    run_api_server(args.model_path,
  File "/data/shuang_yan/lmdeploy-env/lib/python3.10/site-packages/lmdeploy/serve/openai/api_server.py", line 1007, in serve
    VariableInterface.async_engine = pipeline_class(
  File "/data/shuang_yan/lmdeploy-env/lib/python3.10/site-packages/lmdeploy/serve/async_engine.py", line 169, in __init__
    self.stop_words = _stop_words(self.chat_template.stop_words,
  File "/data/shuang_yan/lmdeploy-env/lib/python3.10/site-packages/lmdeploy/utils.py", line 170, in _stop_words
    stop_indexes += tokenizer.indexes_containing_token(stop_word)
  File "/data/shuang_yan/lmdeploy-env/lib/python3.10/site-packages/lmdeploy/tokenizer.py", line 661, in indexes_containing_token
    encoded = self.encode(token, add_bos=False)
  File "/data/shuang_yan/lmdeploy-env/lib/python3.10/site-packages/lmdeploy/tokenizer.py", line 600, in encode
    return self.model.encode(s, add_bos, add_special_tokens, **kwargs)
  File "/data/shuang_yan/lmdeploy-env/lib/python3.10/site-packages/lmdeploy/tokenizer.py", line 531, in encode
    return super(ChatGLM4Tokenizer, self).encode(s,
  File "/data/shuang_yan/lmdeploy-env/lib/python3.10/site-packages/lmdeploy/tokenizer.py", line 366, in encode
    encoded = self.model.encode(s,
  File "/data/shuang_yan/lmdeploy-env/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2791, in encode
    encoded_inputs = self.encode_plus(
  File "/data/shuang_yan/lmdeploy-env/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3210, in encode_plus
    return self._encode_plus(
  File "/data/shuang_yan/lmdeploy-env/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 801, in _encode_plus
    return self.prepare_for_model(
  File "/data/shuang_yan/lmdeploy-env/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3706, in prepare_for_model
    encoded_inputs = self.pad(
  File "/data/shuang_yan/lmdeploy-env/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3508, in pad
    encoded_inputs = self._pad(
TypeError: ChatGLM4Tokenizer._pad() got an unexpected keyword argument 'padding_side'
[TM][INFO] [InternalThreadEntry] stop requested.
[TM][INFO] [OutputThreadEntry] stop requested.
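
Judging from the traceback and the installed transformers 4.45.0, the failure is a signature mismatch: transformers 4.45 began forwarding a padding_side keyword down to the tokenizer's _pad() override, which the custom ChatGLM4Tokenizer shipped with the model does not accept. A minimal sketch that should reproduce the same TypeError outside lmdeploy, assuming the model directory provides the custom tokenizer via trust_remote_code:

from transformers import AutoTokenizer

# Loading the tokenizer pulls in the model repo's custom ChatGLM4Tokenizer.
tok = AutoTokenizer.from_pretrained(
    '/juicefs-algorithm/models/nlp/huggingface/THUDM/glm-4-9b-chat-1m/',
    trust_remote_code=True)

# encode() on a slow tokenizer walks encode -> encode_plus -> prepare_for_model
# -> pad -> _pad, the same path as the traceback above; on transformers 4.45.0
# this should raise: TypeError: ChatGLM4Tokenizer._pad() got an unexpected
# keyword argument 'padding_side'.
tok.encode('<|user|>', add_special_tokens=False)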
@lvhan028
Collaborator

#2520 addresses this issue.
You can downgrade transformers as a workaround, or wait for the lmdeploy release later this week.
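
A hedged note on the workaround: the padding_side keyword appears to have entered the _pad() call path in the transformers 4.45 series, so pinning to the previous series should avoid it, e.g.:

pip install "transformers==4.44.2"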
