I followed the installation instructions, but an error occurred when I ran the offline benchmarking script, even though I have installed sentencepiece. How can I solve it? Thanks!
INFO 05-16 17:10:41 llm_engine.py:90] Initializing an LLM engine with config: model='./qserve_checkpoints/Llama-3-8B-QServe', tokenizer='./qserve_checkpoints/Llama-3-8B-QServe', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=8192, download_dir=None, load_format=auto, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=int8, device_config=cuda, ifb_config=False, seed=0)
Traceback (most recent call last):
File "/workspace/llm/qserve/qserve_benchmark.py", line 128, in <module>
main(args)
File "/workspace/llm/qserve/qserve_benchmark.py", line 95, in main
engine = initialize_engine(args)
File "/workspace/llm/qserve/qserve_benchmark.py", line 73, in initialize_engine
return LLMEngine.from_engine_args(engine_args)
File "/workspace/llm/qserve/qserve/engine/llm_engine.py", line 241, in from_engine_args
engine = cls(
File "/workspace/llm/qserve/qserve/engine/llm_engine.py", line 131, in __init__
self._init_tokenizer()
File "/workspace/llm/qserve/qserve/engine/llm_engine.py", line 194, in _init_tokenizer
self.tokenizer = get_tokenizer(self.model_config.tokenizer, **init_kwargs)
File "/workspace/llm/qserve/qserve/utils/tokenizer.py", line 50, in get_tokenizer
raise e
File "/workspace/llm/qserve/qserve/utils/tokenizer.py", line 27, in get_tokenizer
tokenizer = AutoTokenizer.from_pretrained(
File "/root/anaconda3/envs/QServe/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 862, in from_pretrained
return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
File "/root/anaconda3/envs/QServe/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2089, in from_pretrained
return cls._from_pretrained(
File "/root/anaconda3/envs/QServe/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2311, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
File "/root/anaconda3/envs/QServe/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 120, in __init__
raise ValueError(
ValueError: Couldn't instantiate the backend tokenizer from one of:
(1) a tokenizers library serialization file,
(2) a slow tokenizer instance to convert or
(3) an equivalent slow tokenizer class to instantiate and convert.
You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
Hi @Rudin6, thanks for your interest in QServe! Could you provide more information about this error? For example, which versions of transformers and tokenizers are you using? We are using tokenizers 0.15.1.
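One way to collect the requested versions is a short script run with the same interpreter that runs the benchmark (the `QServe` conda environment in the traceback above). This is only a sketch: the helper name `installed_versions` is made up for illustration, and including `sentencepiece` in the list is an assumption based on the error message, which suggests checking that it is installed in that same environment.

```python
from importlib import metadata


def installed_versions(packages=("transformers", "tokenizers", "sentencepiece")):
    """Return {package: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not visible to this interpreter/environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

If `sentencepiece` shows up as not installed here, it was probably installed into a different environment than the one running `qserve_benchmark.py`.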