Supports accepting network requests, listening on specific ports and running GPTQ models on multiple GPUs #87

Arondight · 2024-03-22T09:50:23Z

When a GPTQ model is run on multiple GPUs, memory is allocated only on the first GPU, so loading fails with an error once no more memory can be allocated there. This PR fixes that problem. It also allows the app to accept network requests and listen on a specific port, which is necessary because the deployment environment often has no graphical interface.
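As a rough illustration of how the three options named in the commits (`--listen`, `--port`, `--gptq_gpu_memory`) could fit together, here is a minimal sketch. It is not the code from this PR. The flag semantics are assumptions: `--listen` is treated as a boolean switch, and `--gptq_gpu_memory` is assumed to take a comma-separated list of per-GPU limits in GiB. The helper names `build_parser`, `parse_gpu_memory`, and `resolve_host` are hypothetical. The resulting dictionary follows the `{device_index: "NGiB"}` shape commonly used for `max_memory` maps when spreading a model across devices.

```python
import argparse
from typing import Dict, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the three options (semantics assumed, not taken from the PR)."""
    parser = argparse.ArgumentParser(description="Hypothetical app.py launcher options")
    # Assumption: --listen is a switch that makes the server reachable from other hosts.
    parser.add_argument("--listen", action="store_true",
                        help="accept requests from the network instead of localhost only")
    # Assumption: --port is an integer; None means 'let the server pick its default'.
    parser.add_argument("--port", type=int, default=None,
                        help="port to listen on")
    # Assumption: comma-separated per-GPU memory limits in GiB, e.g. "10,12".
    parser.add_argument("--gptq_gpu_memory", type=str, default=None,
                        help="per-GPU memory limits in GiB, comma-separated")
    return parser


def parse_gpu_memory(spec: Optional[str]) -> Optional[Dict[int, str]]:
    """Turn a spec like '10,12' into a per-device memory map keyed by GPU index."""
    if not spec:
        return None
    max_memory: Dict[int, str] = {}
    for index, part in enumerate(spec.split(",")):
        gib = int(part.strip())
        if gib <= 0:
            raise ValueError(f"memory limit for GPU {index} must be positive, got {gib}")
        max_memory[index] = f"{gib}GiB"
    return max_memory


def resolve_host(listen: bool) -> str:
    """Bind to all interfaces when listening on the network, otherwise to loopback."""
    return "0.0.0.0" if listen else "127.0.0.1"
```

Under these assumptions, an invocation such as `python app.py --listen --port 7860 --gptq_gpu_memory 10,12` would bind to all interfaces on port 7860 and cap the first two GPUs at 10 GiB and 12 GiB, so the loader has an explicit budget for each device instead of filling only the first one.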


Arondight added 3 commits March 22, 2024 17:34

Add venv directory to git ignore list

38066de

Add --listen and --port to app.py

ada6471

Add --gptq_gpu_memory to app.py to support running GPTQ models on multiple GPUs

0b7b3d3
