
Add pixart-sigma test to image example #247

Merged
merged 5 commits into main from sigma-xl on Jul 18, 2024

Conversation

@dacorvo (Collaborator) commented Jul 18, 2024

What does this PR do?

This adds a simple example of quantizing a PixArt-Sigma diffusers pipeline.

Both the text_encoder and transformer models of the pipeline are quantized.

This pull request also fixes #231.
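
For readers landing here, a minimal sketch of what the example does, assuming the PixArt-alpha/PixArt-Sigma-XL-2-1024-MS checkpoint and the quantize/freeze helpers from optimum-quanto (the actual script is examples/vision/text-to-image/quantize_pixart_sigma.py):

```python
import torch
from diffusers import PixArtSigmaPipeline
from optimum.quanto import freeze, qint8, quantize

# Load the pipeline in fp16 (checkpoint name is an assumption; adjust as needed).
pipeline = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS", torch_dtype=torch.float16
)

# Quantize the weights of both submodels, then freeze to replace the float
# weights with their quantized counterparts.
for submodel in (pipeline.text_encoder, pipeline.transformer):
    quantize(submodel, weights=qint8)
    freeze(submodel)

pipeline.to("cuda")
image = pipeline(prompt="A lighthouse on a cliff at dusk").images[0]
image.save("pixart-sigma-qint8.png")
```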

dacorvo merged commit 28df7f1 into main on Jul 18, 2024
12 checks passed
dacorvo deleted the sigma-xl branch on July 18, 2024 at 16:04
@dacorvo (Collaborator, Author) commented Jul 18, 2024

Here is an example image generated with int4 weights on a T4 (device memory usage is only 3 GB instead of 12 GB).

pixart-sigma-dtype@fp16-qtype@int4
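
(For reference, one way to reproduce the memory figure above; `pipeline` is the quantized pipeline from the example, and the prompt is just a placeholder.)

```python
import torch

# Measure the peak device memory allocated by PyTorch during one generation.
torch.cuda.reset_peak_memory_stats()
image = pipeline(prompt="A lighthouse on a cliff at dusk").images[0]
print(f"peak device memory: {torch.cuda.max_memory_allocated() / 1e9:.1f} GB")
```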

@sayakpaul (Member)

Ah nice. With int4, there's a drastic performance drop.

Can we serialize and deserialize the weights too?

@dacorvo (Collaborator, Author) commented Jul 19, 2024

I have helpers for transformers LLM models that can help, but I haven't written the equivalent for pipelines:

https://github.com/huggingface/optimum-quanto/blob/main/README.md#llm-models

Ideally, if we were able to load each quantized submodel individually and pass the list to some pipeline creation method, it would work.
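
In the meantime, a per-submodel sketch using the generic state-dict helpers (quantization_map / requantize) from the README above; the file names and checkpoint id are illustrative, and the same pattern would apply to the text_encoder:

```python
import json

import torch
from diffusers import PixArtSigmaPipeline, PixArtTransformer2DModel
from optimum.quanto import quantization_map, requantize
from safetensors.torch import load_file, save_file

# Save one quantized submodel (here the transformer) plus its quantization map.
save_file(pipeline.transformer.state_dict(), "transformer.safetensors")
with open("transformer_qmap.json", "w") as f:
    json.dump(quantization_map(pipeline.transformer), f)

# Later: rebuild the submodel on the meta device and requantize it from disk.
with torch.device("meta"):
    transformer = PixArtTransformer2DModel.from_config(
        PixArtTransformer2DModel.load_config(
            "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS", subfolder="transformer"
        )
    )
state_dict = load_file("transformer.safetensors")
with open("transformer_qmap.json") as f:
    qmap = json.load(f)
requantize(transformer, state_dict, qmap, device=torch.device("cuda"))

# Pass the reloaded submodel back to the pipeline constructor.
pipeline = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS",
    transformer=transformer,
    torch_dtype=torch.float16,
)
```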

@sayakpaul (Member)

Yes, loading individual modules should be more than enough. Thanks!

@sayakpaul (Member)

@dacorvo the example states that int4 won't work with CUDA. Does that still apply?

@sayakpaul (Member) commented Jul 22, 2024

Also, the int4 option in the example is failing on the DGX:

Traceback (most recent call last):
  File "/home/sayak/optimum-quanto/examples/vision/text-to-image/quantize_pixart_sigma.py", line 78, in <module>
    image = pipeline(
  File "/home/sayak/.pyenv/versions/diffusers/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/sayak/diffusers/src/diffusers/pipelines/pixart_alpha/pipeline_pixart_sigma.py", line 834, in __call__
    noise_pred = self.transformer(
  File "/home/sayak/.pyenv/versions/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sayak/.pyenv/versions/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sayak/diffusers/src/diffusers/models/transformers/pixart_transformer_2d.py", line 321, in forward
    hidden_states = self.proj_out(hidden_states)
  File "/home/sayak/.pyenv/versions/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sayak/.pyenv/versions/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sayak/optimum-quanto/optimum/quanto/nn/qlinear.py", line 45, in forward
    return torch.nn.functional.linear(input, self.qweight, bias=self.bias)
  File "/home/sayak/optimum-quanto/optimum/quanto/tensor/qtensor.py", line 90, in __torch_function__
    return qfunc(*args, **kwargs)
  File "/home/sayak/optimum-quanto/optimum/quanto/tensor/qtensor_func.py", line 152, in linear
    return QTensorLinear.apply(input, other, bias)
  File "/home/sayak/.pyenv/versions/diffusers/lib/python3.10/site-packages/torch/autograd/function.py", line 598, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/sayak/optimum-quanto/optimum/quanto/tensor/qtensor_func.py", line 130, in forward
    output = output + bias
RuntimeError: CUDA error: invalid configuration argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

I am on PyTorch 2.3.1. The CUDA version is 12.2.

Is there anything I am missing here? It doesn't seem to be a problem on Colab, though.

Could it be because of the driver versions?

DGX: Driver Version: 535.129.03
Colab: Driver Version: 535.104.05

@dacorvo (Collaborator, Author) commented Jul 22, 2024

Yes, I am tracking this in #248. As you can see in the issue, the workaround is to exclude the final projection layer from quantization.
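
For anyone hitting the same error, the workaround looks roughly like this (a sketch assuming the exclude argument of quantize; proj_out is the layer name from the traceback above):

```python
from optimum.quanto import freeze, qint4, quantize

# Quantize the transformer weights to int4 but skip the final projection,
# which currently triggers the CUDA error tracked in #248.
quantize(pipeline.transformer, weights=qint4, exclude="proj_out")
freeze(pipeline.transformer)
```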

@sayakpaul (Member)

Ah thank you!

Successfully merging this pull request may close these issues.

fp8 leads to black images (numerical instabilities) for transformer diffusion models