
[Bugfix][Neuron] Fix soft prompt method error in NeuronExecutor #6313

Merged
WoosukKwon merged 1 commit into main from fix-neuron on Jul 10, 2024

Conversation

@WoosukKwon (Collaborator)

Fixes #6269

However, I'm still not sure how #4645 passed the Neuron CI test.

@WoosukKwon (Collaborator, Author)

@liangfu #6269 might mean that the neuron CI is not working correctly. Could you please take a look?

@areanddee

@WoosukKwon Thanks for the prompt response to my issue #6269! When the PR is approved, can you please follow up with a procedure to update my install to run the patched vLLM on Neuron systems? I urgently need this for a project I am working on.

@liangfu (Contributor) commented Jul 10, 2024

Thanks for the fix. The current Neuron CI only tests online inference; offline inference is not currently tested for the Neuron backend.
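For reference, the untested path is the offline one, which looks roughly like this. This is a minimal sketch using the standard vLLM offline API; the model name and prompt are illustrative, taken from the example above.

```python
# Sketch of the offline-inference path the Neuron CI does not currently cover.
# Assumes a Neuron-enabled vLLM install; model and prompt are illustrative.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=32)
for output in llm.generate(["Hello, my name is"], params):
    print(output.outputs[0].text)
```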

@areanddee

> Thanks for the fix. The current Neuron CI only tests online inference; offline inference is not currently tested for the Neuron backend.

Well, the online inference also appears to be broken:

python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m
WARNING 07-10 22:25:24 _custom_ops.py:14] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
INFO 07-10 22:25:29 api_server.py:206] vLLM API server version 0.5.1
INFO 07-10 22:25:29 api_server.py:207] args: Namespace(host=None, port=8000, uvicorn_log_level='info', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, lora_modules=None, chat_template=None, response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_path=None, middleware=[], model='facebook/opt-125m', tokenizer=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, download_dir=None, load_format='auto', dtype='auto', kv_cache_dtype='auto', quantization_param_path=None, max_model_len=None, guided_decoding_backend='outlines', distributed_executor_backend=None, worker_use_ray=False, pipeline_parallel_size=1, tensor_parallel_size=1, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=16, enable_prefix_caching=False, disable_sliding_window=False, use_v2_block_manager=False, num_lookahead_slots=0, seed=0, swap_space=4, gpu_memory_utilization=0.9, num_gpu_blocks_override=None, max_num_batched_tokens=None, max_num_seqs=256, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, enforce_eager=False, max_context_len_to_capture=None, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, enable_lora=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=False, device='auto', scheduler_delay_factor=0.0, enable_chunked_prefill=False, speculative_model=None, num_speculative_tokens=None, speculative_draft_tensor_parallel_size=None, speculative_max_model_len=None, speculative_disable_by_batch_size=None, ngram_prompt_lookup_max=None, ngram_prompt_lookup_min=None, spec_decoding_acceptance_method='rejection_sampler', typical_acceptance_sampler_posterior_threshold=None, typical_acceptance_sampler_posterior_alpha=None, model_loader_extra_config=None, preemption_mode=None, served_model_name=None, qlora_adapter_name_or_path=None, otlp_traces_endpoint=None, engine_use_ray=False, disable_log_requests=False, max_log_len=None)
INFO 07-10 22:25:31 llm_engine.py:169] Initializing an LLM engine (v0.5.1) with config: model='facebook/opt-125m', speculative_config=None, tokenizer='facebook/opt-125m', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=2048, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cpu, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None), seed=0, served_model_name=facebook/opt-125m, use_v2_block_manager=False, enable_prefix_caching=False)
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 216, in
engine = AsyncLLMEngine.from_engine_args(
File "/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 431, in from_engine_args
engine = cls(
File "/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 360, in init
self.engine = self._init_engine(*args, **kwargs)
File "/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 507, in _init_engine
return engine_class(*args, **kwargs)
File "/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 243, in init
self.model_executor = executor_class(
File "/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 128, in init
super().init(model_config, cache_config, parallel_config,
File "/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 42, in init
self._init_executor()
File "/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.10/site-packages/vllm/executor/neuron_executor.py", line 21, in _init_executor
self._init_worker()
File "/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.10/site-packages/vllm/executor/neuron_executor.py", line 26, in _init_worker
self.driver_worker = NeuronWorker(
TypeError: Can't instantiate abstract class NeuronWorker with abstract method execute_worker

@WoosukKwon (Collaborator, Author)

@liangfu As @areanddee pointed out, the error happens when NeuronExecutor is initialized, because NeuronWorker does not implement the abstract method execute_worker. The error should occur for both the offline and online entry points.
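For readers unfamiliar with the failure mode, here is a minimal self-contained illustration (hypothetical class names, not vLLM code): a subclass of abc.ABC cannot be instantiated until every @abstractmethod is overridden, which is exactly what the missing execute_worker implementation triggered.

```python
# Minimal reproduction of the failure mode (hypothetical names, not vLLM code).
from abc import ABC, abstractmethod

class WorkerBase(ABC):
    @abstractmethod
    def execute_worker(self):
        ...

class BrokenWorker(WorkerBase):
    pass  # execute_worker not overridden

class FixedWorker(WorkerBase):
    def execute_worker(self):
        return "ok"

FixedWorker()  # instantiates fine
try:
    BrokenWorker()
except TypeError as e:
    # TypeError: Can't instantiate abstract class BrokenWorker with abstract method execute_worker
    print(e)
```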

@WoosukKwon merged commit 997df46 into main on Jul 10, 2024 (70 of 71 checks passed)
@WoosukKwon deleted the fix-neuron branch on July 10, 2024 at 23:39
@areanddee

Saw that the patch in #6313 was merged to main, so I did a git pull origin main to update to the latest. The behavior seen in #6269 is still present. I am posting this here because it was speculated that #6313 would fix #6269 as well.
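One possible explanation, suggested by the site-packages paths in the traceback above: a pip-installed copy of vLLM can shadow a freshly pulled source tree unless it was installed in editable mode. A quick check of which copy the interpreter actually imports (plain Python; nothing here is specific to this PR):

```python
# Confirm which vLLM copy the interpreter resolves; a stale site-packages
# install will shadow a freshly pulled source checkout.
import vllm

print(vllm.__version__)  # should reflect a build that includes this fix
print(vllm.__file__)     # a site-packages path here means the git pull did not take effect
```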

adityagoel14 pushed a commit to adityagoel14/vllm-torchrun-test that referenced this pull request Jul 11, 2024
dtrifiro pushed a commit to opendatahub-io/vllm that referenced this pull request Jul 17, 2024
Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024
Successfully merging this pull request may close these issues.

[Bug]: TypeError: Can't instantiate abstract class NeuronWorker with abstract method execute_worker (#6269)