Issue with app spin-up #15

Open

AvivSham opened this issue Nov 7, 2024 · 2 comments

AvivSham commented Nov 7, 2024

Hi all,
How are you?
Thank you for your amazing work!
We followed the README instructions and managed to build both the front and back ends. However, when we tried to spin up the backend by running docker-compose build, we encountered the following error, which relates to tabbyapi:

tabbyapi                 | Traceback (most recent call last):
tabbyapi                 |   File "/app/main.py", line 171, in <module>
tabbyapi                 |     entrypoint()
tabbyapi                 |   File "/app/main.py", line 167, in entrypoint
tabbyapi                 |     asyncio.run(entrypoint_async())
tabbyapi                 |   File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
tabbyapi                 |     return loop.run_until_complete(main)
tabbyapi                 |   File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
tabbyapi                 |     return future.result()
tabbyapi                 |   File "/app/main.py", line 76, in entrypoint_async
tabbyapi                 |     await model.load_model(model_path.resolve(), **config.model)
tabbyapi                 |   File "/app/common/model.py", line 100, in load_model
tabbyapi                 |     async for _ in load_model_gen(model_path, **kwargs):
tabbyapi                 |   File "/app/common/model.py", line 70, in load_model_gen
tabbyapi                 |     container = ExllamaV2Container(model_path.resolve(), False, **kwargs)
tabbyapi                 |   File "/app/backends/exllamav2/model.py", line 127, in __init__
tabbyapi                 |     self.config.prepare()
tabbyapi                 |   File "/usr/local/lib/python3.10/dist-packages/exllamav2/config.py", line 326, in prepare
tabbyapi                 |     f = STFile.open(st_file, fast = self.fasttensors, keymap = self.arch.keymap)
tabbyapi                 |   File "/usr/local/lib/python3.10/dist-packages/exllamav2/fasttensors.py", line 129, in open
tabbyapi                 |     return STFile(filename, fast, keymap)
tabbyapi                 |   File "/usr/local/lib/python3.10/dist-packages/exllamav2/fasttensors.py", line 70, in __init__
tabbyapi                 |     self.read_dict()
tabbyapi                 |   File "/usr/local/lib/python3.10/dist-packages/exllamav2/fasttensors.py", line 143, in read_dict
tabbyapi                 |     header_json = fp.read(header_size)
tabbyapi                 | MemoryError
tabbyapi exited with code 1

We are running with the following setup:
OS: Ubuntu
GPU: NVIDIA A10G (the GPU is recognized by the container)

Thank you for helping!

@nguyenhoangthuan99 (Collaborator) commented

Hi @AvivSham, it looks like you hit an error while loading the model. Can you try disabling fasttensors by updating this field to false, then run docker-compose up again?

https://github.com/homebrewltd/ichigo-demo/blob/f973834f372f08bc3c99a26f31bf6f7db8776480/docker/tabbyapi/config.yml#L97
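
For reference, a minimal excerpt of what that change in docker/tabbyapi/config.yml would look like (the fasttensors key is the field at the linked line; the surrounding model: section is assumed from TabbyAPI's sample config layout):

    model:
      # Assumption: fasttensors sits under the model section, as in
      # TabbyAPI's sample config. Setting it to false makes the loader
      # read the safetensors files through the regular code path.
      fasttensors: false

Then recreate the containers with docker-compose up so the new config is picked up.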

@AvivSham (Author) commented

Hi @nguyenhoangthuan99,
We tried your suggestion and ended up with the same error (see the trace below). We also watched the GPU/CPU load while running docker-compose up, but we did not spot anything unusual.

[+] Running 3/0
 ⠿ Container tabbyapi                 Created                                                                                    0.0s
 ⠿ Container docker-whisper-speech-1  Created                                                                                    0.0s
 ⠿ Container docker-fish-speech-1     Created                                                                                    0.0s
Attaching to docker-fish-speech-1, docker-whisper-speech-1, tabbyapi
docker-fish-speech-1     | INFO:     Uvicorn running on http://0.0.0.0:22311 (Press CTRL+C to quit)
docker-fish-speech-1     | INFO:     Started parent process [1]
docker-fish-speech-1     | 2024-11-11 09:49:30.241 | INFO     | api:<module>:425 - Loading Llama model...
docker-fish-speech-1     | 2024-11-11 09:49:30.282 | INFO     | api:<module>:425 - Loading Llama model...
tabbyapi                 | INFO:     ExllamaV2 version: 0.2.1
tabbyapi                 | WARNING:  Disabling authentication makes your instance vulnerable. Set the
tabbyapi                 | `disable_auth` flag to False in config.yml if you want to share this instance
tabbyapi                 | with others.
tabbyapi                 | INFO:     Generation logging is enabled for: prompts, generation params
tabbyapi                 | Traceback (most recent call last):
tabbyapi                 |   File "/app/main.py", line 171, in <module>
tabbyapi                 |     entrypoint()
tabbyapi                 |   File "/app/main.py", line 167, in entrypoint
tabbyapi                 |     asyncio.run(entrypoint_async())
tabbyapi                 |   File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
tabbyapi                 |     return loop.run_until_complete(main)
tabbyapi                 |   File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
tabbyapi                 |     return future.result()
tabbyapi                 |   File "/app/main.py", line 76, in entrypoint_async
tabbyapi                 |     await model.load_model(model_path.resolve(), **config.model)
tabbyapi                 |   File "/app/common/model.py", line 100, in load_model
tabbyapi                 |     async for _ in load_model_gen(model_path, **kwargs):
tabbyapi                 |   File "/app/common/model.py", line 70, in load_model_gen
tabbyapi                 |     container = ExllamaV2Container(model_path.resolve(), False, **kwargs)
tabbyapi                 |   File "/app/backends/exllamav2/model.py", line 127, in __init__
tabbyapi                 |     self.config.prepare()
tabbyapi                 |   File "/usr/local/lib/python3.10/dist-packages/exllamav2/config.py", line 326, in prepare
tabbyapi                 |     f = STFile.open(st_file, fast = self.fasttensors, keymap = self.arch.keymap)
tabbyapi                 |   File "/usr/local/lib/python3.10/dist-packages/exllamav2/fasttensors.py", line 129, in open
tabbyapi                 |     return STFile(filename, fast, keymap)
tabbyapi                 |   File "/usr/local/lib/python3.10/dist-packages/exllamav2/fasttensors.py", line 70, in __init__
tabbyapi                 |     self.read_dict()
tabbyapi                 |   File "/usr/local/lib/python3.10/dist-packages/exllamav2/fasttensors.py", line 143, in read_dict
tabbyapi                 |     header_json = fp.read(header_size)
tabbyapi                 | MemoryError
docker-fish-speech-1     | 2024-11-11 09:49:31.387 | INFO     | api:<module>:425 - Loading Llama model...
tabbyapi exited with code 1
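
One more thing we can try on our side: since the MemoryError happens while reading the safetensors header (fp.read(header_size)), a bogus header size from a corrupt or partially downloaded shard (e.g. a Git LFS pointer file instead of the real weights) would produce exactly this. Below is a minimal sketch to sanity-check the shards; it assumes only the standard safetensors layout (an 8-byte little-endian header length followed by a JSON header), and the path argument is a placeholder for the model files inside the container:

    import json
    import struct
    import sys

    def check_safetensors_header(path):
        # A safetensors file starts with an 8-byte little-endian unsigned
        # integer giving the length of the JSON header that follows.
        with open(path, "rb") as fp:
            (header_size,) = struct.unpack("<Q", fp.read(8))
            print(f"{path}: header_size = {header_size} bytes")
            # A header larger than ~100 MB almost certainly means the file
            # is corrupt or truncated (hypothetical threshold, not from tabbyapi).
            if header_size > 100 * 1024 * 1024:
                print("header size looks bogus -- file may be corrupt/truncated")
                return
            header = json.loads(fp.read(header_size))
            print(f"header parsed OK, {len(header)} entries")

    if __name__ == "__main__":
        check_safetensors_header(sys.argv[1])

Running it against each .safetensors file in the model directory should tell us whether the weights on disk are intact.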

WDYT?
