Integrating with autograd, adding static kernel routing #26
Conversation
```diff
 auto output_sizes = input_sizes.vec();
 output_sizes.pop_back();
 output_sizes.push_back(-1);
-auto output = flat_output.view(output_sizes);
+auto output = flat_output.reshape(output_sizes).clone();
```
Can you please briefly describe why this needs clone on top of reshape?
Without the clone, autograd raises:

RuntimeError: Output 0 of _QuantizedMatmulBackward is a view and is being modified inplace. This view was created inside a custom Function (or because an input was returned as-is) and the autograd logic to handle view+inplace would override the custom backward associated with the custom Function, leading to incorrect gradients. This behavior is forbidden. You can fix this by cloning the output of the custom Function.
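For context, here is a minimal sketch that reproduces this class of error; the Function, names, and shapes below are hypothetical stand-ins for the PR's actual quantized matmul, but the pattern is the same: a custom Function whose forward returns a view, followed by an in-place op on that output.

```python
import torch

class MatmulLike(torch.autograd.Function):
    """Hypothetical stand-in for the PR's quantized matmul Function."""

    @staticmethod
    def forward(ctx, x, w):
        ctx.save_for_backward(x, w)
        flat = x.reshape(-1, x.shape[-1]) @ w   # collapse batch dims: [N, out]
        out_sizes = list(x.shape[:-1]) + [-1]   # restore batch dims, infer last
        # BUG: view() makes the Function's output a view of `flat`;
        # the fix (as in the diff above) is `flat.reshape(out_sizes).clone()`.
        return flat.view(out_sizes)

    @staticmethod
    def backward(ctx, grad_out):
        x, w = ctx.saved_tensors
        g = grad_out.reshape(-1, w.shape[-1])
        grad_x = (g @ w.t()).view_as(x)              # dL/dx
        grad_w = x.reshape(-1, x.shape[-1]).t() @ g  # dL/dw
        return grad_x, grad_w

x = torch.randn(2, 3, 4, requires_grad=True)
w = torch.randn(4, 5, requires_grad=True)
y = MatmulLike.apply(x, w)
y.add_(1.0)  # in-place op on the view output -> the RuntimeError quoted above
```

With reshape(...).clone() in the forward, the output owns its memory instead of aliasing an intermediate, so downstream in-place ops no longer trip autograd's view+inplace check.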
I added .clone() because the machine told me to.
In Machine we trust!
LGTM! Please see the minor comments above.
Co-authored-by: justheuristic <justheuristic@gmail.com>
This PR aims to:
- integrate the quantized matmul op with autograd (a custom Function with its own backward);
- add static kernel routing.