Unable to start Yi-34B-Chat-4bits via FastChat #187
Sorry, I can't reproduce the problem described above on my side.
Starting with the following command lets me call the model, but the model produces no output:
lm-sys/FastChat#2723
I can load the model with python -m fastchat.serve.cli --model-path Yi-34B-Chat-4bits, but errors appear as soon as I enter a question: (py311) [root@gpu-server models]# python -m fastchat.serve.cli --model-path Yi-34B-Chat-4bits/
Not only does it fail to run, the variables don't match up either; I think we'll have to wait for FastChat to update.
Try the main branch, or wait for FastChat's next release.
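Trying the main branch usually means installing FastChat straight from the repository instead of the last PyPI release. A minimal sketch, assuming a pip-based setup (the model_worker extra matches FastChat's documented install options; verify against the repo's README for your version):

```shell
# Install FastChat from the main branch, with the extras needed to run a model worker.
pip3 install --upgrade "fschat[model_worker] @ git+https://github.com/lm-sys/FastChat.git"
```

After upgrading, re-run the same serve command to check whether the Yi conversation template and loader fixes have landed.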
root@2d5b8b709f6b:~/llmodel0922start# python3 -m fastchat.serve.multi_model_worker --model-path /data/Yi-34B-Chat-4bits/Yi-34B-Chat-4bits --model-names Yi-34B-Chat-4bits --host 0.0.0.0
2023-11-25 09:56:03 | INFO | model_worker | args: Namespace(host='0.0.0.0', port=21002, worker_address='http://localhost:21002', controller_address='http://localhost:21001', revision='main', device='cuda', gpus=None, num_gpus=1, max_gpu_memory=None, load_8bit=False, cpu_offloading=False, gptq_ckpt=None, gptq_wbits=16, gptq_groupsize=-1, gptq_act_order=False, awq_ckpt=None, awq_wbits=16, awq_groupsize=-1, model_path=['/data/Yi-34B-Chat-4bits/Yi-34B-Chat-4bits'], model_names=[['Yi-34B-Chat-4bits']], limit_worker_concurrency=5, stream_interval=2, no_register=False)
2023-11-25 09:56:03 | INFO | model_worker | Loading the model ['Yi-34B-Chat-4bits'] on worker c5b518c4 ...
You are using the legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request available at huggingface/transformers#24565
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Then memory keeps growing; after waiting ten minutes the GPU shows no load and system RAM has climbed past 40 GB. Which setting is wrong?
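One detail worth checking in the logged args above: awq_wbits=16 and gptq_wbits=16 are FastChat's defaults, which suggests the worker is loading the checkpoint as full-precision weights into CPU RAM rather than as a 4-bit model, consistent with memory growing past 40 GB with no GPU load. A hedged sketch of launching the same worker with FastChat's AWQ flags explicitly set (whether these flags are honored for this checkpoint depends on the FastChat version, and --awq-ckpt pointing at the quantized weights file may also be required):

```shell
# Same worker command as above, but telling FastChat the weights are 4-bit AWQ.
python3 -m fastchat.serve.multi_model_worker \
  --model-path /data/Yi-34B-Chat-4bits/Yi-34B-Chat-4bits \
  --model-names Yi-34B-Chat-4bits \
  --host 0.0.0.0 \
  --awq-wbits 4 \
  --awq-groupsize 128
```

If the flags take effect, the INFO line printed at startup should show awq_wbits=4 instead of 16, and the weights should land on the GPU instead of filling system RAM.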