unsloth with vllm in 8/4 bits #253
Comments
@quancore I'm not sure if vLLM allows serving in 4 or 8 bits!
@danielhanchen I think it does: vllm-project/vllm#1155
Looks like they only support AWQ quantization, not bitsandbytes.
@patleeman Oh yes, AWQ is great - I'm assuming you want to quantize it to AWQ?
@patleeman @danielhanchen Well yes, maybe we should support AWQ so we can use QLoRA models with vLLM?
Hello there. I am also interested in using an 8/4-bit model trained with Unsloth in vLLM. Currently it works fine in 16 bits, but that requires too much VRAM. Is there a way to quantize a model trained with Unsloth using AWQ or GPTQ?
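For anyone with the same question, here is a minimal sketch of one possible route: merge the QLoRA adapter to a 16-bit checkpoint with Unsloth, then quantize it with the AutoAWQ package. AutoAWQ is my assumption rather than anything confirmed in this thread, all paths are placeholders, and the exact Unsloth save method may differ across versions.

```python
# Sketch only: merge an Unsloth QLoRA adapter to 16-bit, then quantize to 4-bit AWQ.
# Assumes `pip install autoawq`; all paths are placeholders.
from unsloth import FastLanguageModel
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# 1) Load the QLoRA checkpoint and merge the adapter into a full 16-bit model.
model, tokenizer = FastLanguageModel.from_pretrained("my-qlora-checkpoint")
model.save_pretrained_merged("merged-16bit", tokenizer, save_method="merged_16bit")

# 2) Quantize the merged model to 4-bit AWQ and save it for serving.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}
awq_model = AutoAWQForCausalLM.from_pretrained("merged-16bit")
awq_tokenizer = AutoTokenizer.from_pretrained("merged-16bit")
awq_model.quantize(awq_tokenizer, quant_config=quant_config)
awq_model.save_quantized("merged-awq-4bit")
awq_tokenizer.save_pretrained("merged-awq-4bit")
```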
Whoops, this missed me - yep, having an option to convert it to AWQ is interesting.
That would be amazing - is this a feature you are planning on adding in the near future?
Yep for a future release!
I'm down to volunteer to work on this, if you're accepting community contributions. (I have to do this for my day job anyway, so it might be nice to contribute to the library.)
@amir-in-a-cynch do you plan to do it?
I'll take a stab at it tomorrow and Wednesday. Not sure if it'll end up being a clean integration with this library's API (since it adds a dependency), but in the worst case we should be able to put together an example notebook for the docs on how to do it.
@amir-in-a-cynch Great, keep me posted.
I think vLLM's 8-bit export path is through AWQ - you can also enable float8 support (if your GPU supports it).
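As an unofficial illustration of that, the sketch below loads an already AWQ-quantized checkpoint with vLLM's Python API; the model path is a placeholder, and swapping `quantization="awq"` for `"fp8"` only applies on GPUs with float8 support.

```python
# Sketch: serve an AWQ-quantized checkpoint with vLLM (path is a placeholder).
from vllm import LLM, SamplingParams

llm = LLM(model="merged-awq-4bit", quantization="awq")
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```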
@amir-in-a-cynch @danielhanchen Is there any update on this feature? Would be great to be able to use Unsloth quantized models with vLLM.
Actually, I think vLLM added 4-bit quants - I need to check it out - I'll make some script for this!
Unsloth: AttributeError: Model Qwen2ForCausalLM does not support BitsAndBytes quantization yet.
@frei-x Oh, it should work now hopefully? Please update Unsloth! Sorry for the delay as well!
@danielhanchen Does this mean the latest version has support for vLLM with 4/8 bits? Btw, amazing work here :)
@nandagopal1992 I'm pretty certain vLLM can load 4-bit bitsandbytes models now.
Now supported! :) Let us know if you still have any issues.
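For anyone landing here later, a minimal sketch of loading a 4-bit bitsandbytes checkpoint with vLLM; the model name is illustrative, and the exact flags have changed between vLLM releases, so check the vLLM docs for your version.

```python
# Sketch: load a 4-bit bitsandbytes checkpoint with vLLM.
# The model name is a placeholder; flags vary across vLLM versions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="unsloth/llama-3-8b-bnb-4bit",  # illustrative 4-bit bnb checkpoint
    quantization="bitsandbytes",
    load_format="bitsandbytes",
)
print(llm.generate(["Hello"], SamplingParams(max_tokens=16))[0].outputs[0].text)
```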
I have trained a QLoRA model with Unsloth and I want to serve it with vLLM, but I have not found a way to serve the model in 8/4 bits.