Implementing matrix multiplication based on lookup tables #1851
vlasenkoalexey asked this question in Q&A (unanswered)
Replies: 1 comment 2 replies
-
Did you manage to do it? I was interested in it but didn't find anything other than this.
-
The idea is simple: for quantized networks that use int8 or int4 weights, instead of doing the matrix multiplication directly we can replace the per-element products with table lookups. The lookup table would not be too big: for int4xbfloat16 it is just 16x65536 ≈ 1M entries, and 16x that (≈ 16M entries) for int8xbfloat16. There are a couple of publications proposing a similar idea with promising results: https://arxiv.org/pdf/2206.09557.pdf and https://arxiv.org/pdf/2005.09904.pdf
The benefit is that GPUs don't natively support int4xbfloat16 or int8xbfloat16 matmuls, so an efficiently implemented lookup-table kernel might outperform the usual approach of dequantizing the weights and running a regular matmul.
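To make the table size concrete, here is one way the int4xbfloat16 table could be built (a rough PyTorch/NumPy sketch of my understanding, not code from the papers; the int8xbfloat16 case is identical with 256 weight values instead of 16):

```python
import numpy as np
import torch

# All 2^16 possible bfloat16 bit patterns, reinterpreted as bfloat16 values.
bits = np.arange(2**16, dtype=np.uint16)
acts = torch.from_numpy(bits.view(np.int16)).view(torch.bfloat16).to(torch.float32)

# All 16 possible int4 weight values.
weights = torch.arange(-8, 8, dtype=torch.float32)

# lut[w, a] holds the product of weight value w and activation bit pattern a.
lut = weights[:, None] * acts[None, :]   # shape (16, 65536)
# Some bit patterns decode to NaN/Inf; a real kernel would only index those
# entries if the activation itself were NaN/Inf.
```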
Sample code for the int8xint8 case could look roughly like the following NumPy reference (just to illustrate the lookup logic, not an efficient kernel):
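```python
import numpy as np

# Table with one entry per (int8, int8) pair: lut[a + 128, b + 128] = a * b.
vals = np.arange(-128, 128, dtype=np.int32)
lut = np.outer(vals, vals)               # 256 x 256 entries

def lut_matmul(A, B):
    """Multiply int8 matrices A [M, K] and B [K, N] using table lookups only."""
    M, K = A.shape
    _, N = B.shape
    Ai = A.astype(np.int32) + 128        # shift to non-negative table indices
    Bi = B.astype(np.int32) + 128
    C = np.zeros((M, N), dtype=np.int32)
    for i in range(M):
        for j in range(N):
            # K multiplications replaced by K gathers from the table
            C[i, j] = lut[Ai[i, :], Bi[:, j]].sum()
    return C

rng = np.random.default_rng(0)
A = rng.integers(-128, 128, size=(8, 16), dtype=np.int8)
B = rng.integers(-128, 128, size=(16, 4), dtype=np.int8)
assert np.array_equal(lut_matmul(A, B), A.astype(np.int32) @ B.astype(np.int32))
```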
I tried to hack the matmul example, but realized that to make it work I need element-wise access to the tensors (to get A[i,:] and B[:,j] and use their values as table indices), which isn't directly supported in Triton, and I can't figure out how to express this with blocks.
Any suggestions on how to code this in Triton?
Would it be possible to write an efficient implementation for this approach?
Any pointers are appreciated.