
[Bugfix][Neuron] Fix soft prompt method error in NeuronExecutor #6313

Merged
WoosukKwon merged 1 commit into main from fix-neuron on Jul 10, 2024

Conversation

@WoosukKwon (Collaborator)

Fixes #6269

However, I'm still not sure how #4645 passed the Neuron CI test.

@WoosukKwon (Collaborator, Author)

@liangfu #6269 might mean that the neuron CI is not working correctly. Could you please take a look?

@areanddee

@WoosukKwon Thanks for the prompt response to my issue #6269! When the PR is approved, can you please follow up with a procedure to update my install to run the patched vLLM on Neuron systems? I urgently need this for a project I am working on.

@liangfu (Contributor) commented Jul 10, 2024

Thanks for the fix. The current Neuron CI only tests online inference; offline inference is not currently tested for the Neuron backend.
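For reference, the untested path is the offline one, which looks roughly like this. This is a minimal sketch using the standard vLLM offline API; the model name and prompt are illustrative, taken from the example above.

```python
# Sketch of the offline-inference path the Neuron CI does not currently cover.
# Assumes a Neuron-enabled vLLM install; model and prompt are illustrative.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=32)
for output in llm.generate(["Hello, my name is"], params):
    print(output.outputs[0].text)
```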

@areanddee

> Thanks for the fix. The current Neuron CI only tests online inference; offline inference is not currently tested for the Neuron backend.

Well, the online inference also appears to be broken:

python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m
WARNING 07-10 22:25:24 _custom_ops.py:14] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
INFO 07-10 22:25:29 api_server.py:206] vLLM API server version 0.5.1
INFO 07-10 22:25:29 api_server.py:207] args: Namespace(host=None, port=8000, uvicorn_log_level='info', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, lora_modules=None, chat_template=None, response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_path=None, middleware=[], model='facebook/opt-125m', tokenizer=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, download_dir=None, load_format='auto', dtype='auto', kv_cache_dtype='auto', quantization_param_path=None, max_model_len=None, guided_decoding_backend='outlines', distributed_executor_backend=None, worker_use_ray=False, pipeline_parallel_size=1, tensor_parallel_size=1, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=16, enable_prefix_caching=False, disable_sliding_window=False, use_v2_block_manager=False, num_lookahead_slots=0, seed=0, swap_space=4, gpu_memory_utilization=0.9, num_gpu_blocks_override=None, max_num_batched_tokens=None, max_num_seqs=256, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, enforce_eager=False, max_context_len_to_capture=None, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, enable_lora=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=False, device='auto', scheduler_delay_factor=0.0, enable_chunked_prefill=False, speculative_model=None, num_speculative_tokens=None, speculative_draft_tensor_parallel_size=None, speculative_max_model_len=None, speculative_disable_by_batch_size=None, ngram_prompt_lookup_max=None, ngram_prompt_lookup_min=None, spec_decoding_acceptance_method='rejection_sampler', typical_acceptance_sampler_posterior_threshold=None, typical_acceptance_sampler_posterior_alpha=None, model_loader_extra_config=None, preemption_mode=None, served_model_name=None, qlora_adapter_name_or_path=None, otlp_traces_endpoint=None, engine_use_ray=False, disable_log_requests=False, max_log_len=None)
INFO 07-10 22:25:31 llm_engine.py:169] Initializing an LLM engine (v0.5.1) with config: model='facebook/opt-125m', speculative_config=None, tokenizer='facebook/opt-125m', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=2048, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cpu, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None), seed=0, served_model_name=facebook/opt-125m, use_v2_block_manager=False, enable_prefix_caching=False)
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 216, in
engine = AsyncLLMEngine.from_engine_args(
File "/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 431, in from_engine_args
engine = cls(
File "/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 360, in init
self.engine = self._init_engine(*args, **kwargs)
File "/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 507, in _init_engine
return engine_class(*args, **kwargs)
File "/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 243, in init
self.model_executor = executor_class(
File "/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 128, in init
super().init(model_config, cache_config, parallel_config,
File "/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 42, in init
self._init_executor()
File "/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.10/site-packages/vllm/executor/neuron_executor.py", line 21, in _init_executor
self._init_worker()
File "/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.10/site-packages/vllm/executor/neuron_executor.py", line 26, in _init_worker
self.driver_worker = NeuronWorker(
TypeError: Can't instantiate abstract class NeuronWorker with abstract method execute_worker

@WoosukKwon (Collaborator, Author)

@liangfu As @areanddee pointed out, the error happens when NeuronExecutor is initialized, because NeuronWorker does not implement the abstract method execute_worker. The error should occur for both the offline and online entry points.
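For readers unfamiliar with the failure mode, here is a minimal self-contained illustration (hypothetical class names, not vLLM code): a subclass of abc.ABC cannot be instantiated until every @abstractmethod is overridden, which is exactly what the missing execute_worker implementation triggered.

```python
# Minimal reproduction of the failure mode (hypothetical names, not vLLM code).
from abc import ABC, abstractmethod

class WorkerBase(ABC):
    @abstractmethod
    def execute_worker(self):
        ...

class BrokenWorker(WorkerBase):
    pass  # execute_worker not overridden

class FixedWorker(WorkerBase):
    def execute_worker(self):
        return "ok"

FixedWorker()  # instantiates fine
try:
    BrokenWorker()
except TypeError as e:
    # TypeError: Can't instantiate abstract class BrokenWorker with abstract method execute_worker
    print(e)
```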

@WoosukKwon merged commit 997df46 into main on Jul 10, 2024 (70 of 71 checks passed)
@WoosukKwon deleted the fix-neuron branch on July 10, 2024 at 23:39
@areanddee

Saw that the patch in #6313 was merged to main, so I did a git pull origin main to update to the latest. The behavior seen in #6269 is still present. I am posting this here because it was speculated that #6313 would fix #6269 as well.
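One possible explanation, suggested by the site-packages paths in the traceback above: a pip-installed copy of vLLM can shadow a freshly pulled source tree unless it was installed in editable mode. A quick check of which copy the interpreter actually imports (plain Python; nothing here is specific to this PR):

```python
# Confirm which vLLM copy the interpreter resolves; a stale site-packages
# install will shadow a freshly pulled source checkout.
import vllm

print(vllm.__version__)  # should reflect a build that includes this fix
print(vllm.__file__)     # a site-packages path here means the git pull did not take effect
```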

adityagoel14 pushed a commit to adityagoel14/vllm-torchrun-test that referenced this pull request Jul 11, 2024
dtrifiro pushed a commit to opendatahub-io/vllm that referenced this pull request Jul 17, 2024
Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024
Successfully merging this pull request may close these issues.

[Bug]: TypeError: Can't instantiate abstract class NeuronWorker with abstract method execute_worker (#6269)