BUG: Erroring out on Mac m1 with qwen-chat #328
Comments
Currently the qwen-chat model does not support quantization on macOS, so please set the quantization to none. What's more, the qwen-chat model requires a sufficiently recent macOS version.
Thanks for the info. I have macOS 13, but I'm running the apps in a Docker container, and setting the quantization attribute to none didn't help either.
Is your Docker container an Ubuntu system? If so, the model can only run on the CPU, but the CUDA device is used by default on Linux, which results in an error. (Automatic device selection based on the environment will be implemented in the next version.) It is recommended that you build a conda environment directly on the Mac to use xinference, so that the model can use the Mac's native device.
Yes, that's correct. I have an Ubuntu container on a Mac M1. Looking forward to the device selection feature.
@padamshrestha Hi, this issue has been resolved by #322 and #331, and the fix is now available in the latest release v0.1.3!
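For reference, a minimal sketch of launching the model with quantization disabled through the xinference Python client, as suggested above. The import path, endpoint, and `launch_model` parameters are assumptions based on the v0.1.x API; adjust them to your installed version.

```python
# Minimal sketch, assuming the xinference RESTful client API of the v0.1.x releases.
# The endpoint and parameter names below are assumptions; check your installed version.
from xinference.client import Client

client = Client("http://127.0.0.1:9997")  # default endpoint printed at startup

# Launch qwen-chat without quantization, as recommended in the comments above.
model_uid = client.launch_model(
    model_name="qwen-chat",
    model_format="pytorch",
    model_size_in_billions=7,
    quantization="none",
)
print(f"Launched model with uid: {model_uid}")
```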
Describe the bug
Erroring out on Mac m1 with qwen-chat
Upon running with:
qwen-chat, pytorch, 7, 4-bit
root@b75edd526665:/app# xinference
INFO:xinference:Xinference successfully started. Endpoint: http://127.0.0.1:9997
INFO:xinference.core.supervisor:Worker 127.0.0.1:20355 has been added successfully
INFO:xinference.deploy.worker:Xinference worker successfully started.
Fetching 31 files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 31/31 [00:00<00:00, 73.92steps/s]
/usr/local/lib/python3.8/dist-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
'NoneType' object has no attribute 'cadam32bit_grad_fp32'
INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdguicm39
INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdguicm39/_remote_module_non_scriptable.py
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/gradio/routes.py", line 442, in run_predict
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.8/dist-packages/gradio/blocks.py", line 1392, in process_api
result = await self.call_function(
File "/usr/local/lib/python3.8/dist-packages/gradio/blocks.py", line 1097, in call_function
prediction = await anyio.to_thread.run_sync(
File "/usr/local/lib/python3.8/dist-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/usr/local/lib/python3.8/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.8/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/usr/local/lib/python3.8/dist-packages/gradio/utils.py", line 703, in wrapper
response = f(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/gradio/utils.py", line 703, in wrapper
response = f(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/xinference/core/gradio.py", line 333, in select_model
model_uid = self._create_model(
File "/usr/local/lib/python3.8/dist-packages/xinference/core/gradio.py", line 60, in _create_model
return self._api.launch_model(
File "/usr/local/lib/python3.8/dist-packages/xinference/core/api.py", line 110, in launch_model
return self._isolation.call(_launch_model())
File "/usr/local/lib/python3.8/dist-packages/xinference/isolation.py", line 44, in call
return fut.result()
File "/usr/lib/python3.8/concurrent/futures/_base.py", line 444, in result
return self.__get_result()
File "/usr/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
File "/usr/local/lib/python3.8/dist-packages/xinference/core/api.py", line 100, in _launch_model
await supervisor_ref.launch_builtin_model(
File "xoscar/core.pyx", line 288, in __pyx_actor_method_wrapper
File "xoscar/core.pyx", line 422, in _handle_actor_result
File "xoscar/core.pyx", line 465, in _run_actor_async_generator
File "xoscar/core.pyx", line 466, in xoscar.core._BaseActor._run_actor_async_generator
File "xoscar/core.pyx", line 471, in xoscar.core._BaseActor._run_actor_async_generator
File "/usr/local/lib/python3.8/dist-packages/xinference/core/supervisor.py", line 165, in launch_builtin_model
model_ref = yield worker_ref.launch_builtin_model(
File "xoscar/core.pyx", line 476, in xoscar.core._BaseActor._run_actor_async_generator
File "xoscar/core.pyx", line 396, in _handle_actor_result
File "xoscar/core.pyx", line 284, in __pyx_actor_method_wrapper
File "xoscar/core.pyx", line 287, in xoscar.core.__pyx_actor_method_wrapper
File "/usr/local/lib/python3.8/dist-packages/xinference/core/utils.py", line 25, in wrapped
ret = await func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/xinference/core/worker.py", line 183, in launch_builtin_model
await model_ref.load()
File "/usr/local/lib/python3.8/dist-packages/xoscar/backends/context.py", line 227, in send
return self._process_result_message(result)
File "/usr/local/lib/python3.8/dist-packages/xoscar/backends/context.py", line 102, in _process_result_message
raise message.as_instanceof_cause()
File "/usr/local/lib/python3.8/dist-packages/xoscar/backends/pool.py", line 657, in send
result = await self._run_coro(message.message_id, coro)
File "/usr/local/lib/python3.8/dist-packages/xoscar/backends/pool.py", line 368, in _run_coro
return await coro
File "/usr/local/lib/python3.8/dist-packages/xoscar/api.py", line 306, in on_receive
return await super().on_receive(message) # type: ignore
File "xoscar/core.pyx", line 545, in on_receive
File "xoscar/core.pyx", line 515, in xoscar.core._BaseActor.on_receive
File "xoscar/core.pyx", line 516, in xoscar.core._BaseActor.on_receive
File "xoscar/core.pyx", line 519, in xoscar.core._BaseActor.on_receive
File "/usr/local/lib/python3.8/dist-packages/xinference/core/model.py", line 86, in load
self._model.load()
File "/usr/local/lib/python3.8/dist-packages/xinference/model/llm/pytorch/core.py", line 189, in load
self._model, self._tokenizer = self._load_model(kwargs)
File "/usr/local/lib/python3.8/dist-packages/xinference/model/llm/pytorch/core.py", line 126, in _load_model
model = AutoModelForCausalLM.from_pretrained(
File "/usr/local/lib/python3.8/dist-packages/transformers/models/auto/auto_factory.py", line 488, in from_pretrained
return model_class.from_pretrained(
File "/usr/local/lib/python3.8/dist-packages/transformers/modeling_utils.py", line 2842, in from_pretrained
raise ValueError(
ValueError: [address=127.0.0.1:42575, pid=12626]
Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit
the quantized model. If you want to dispatch the model on the CPU or the disk while keeping
these modules in 32-bit, you need to set load_in_8bit_fp32_cpu_offload=True and pass a custom
device_map to from_pretrained. Check
https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu
for more details.
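For context, a sketch of the CPU-offload workaround the error message points to, following the linked Hugging Face docs. The model id and the module names in device_map are illustrative, and this path still needs a CUDA-capable GPU for the quantized layers, so it does not apply to the Docker-on-M1 setup in this issue.

```python
# Sketch of the CPU-offload workaround described in the error message and the linked
# Hugging Face docs. Module names in device_map are illustrative; this still requires
# a CUDA GPU for the 8-bit layers, so it is not a fix for the Docker-on-M1 setup.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_enable_fp32_cpu_offload=True,  # keep CPU-offloaded modules in fp32
)

device_map = {
    "transformer.word_embeddings": 0,             # on GPU 0
    "transformer.word_embeddings_layernorm": 0,
    "lm_head": "cpu",                             # offloaded to CPU in 32-bit
    "transformer.h": 0,
    "transformer.ln_f": 0,
}

model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-1b7",        # example model used in the linked docs
    device_map=device_map,
    quantization_config=quantization_config,
)
```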