I have fine-tuned Facebook's NLLB model on my custom dataset for language translation. Could you provide guidance on how to apply SpQR to this fine-tuned model? Specifically, I am interested in post-training quantization methods.
Thanks in advance, and great work implementing SpQR!
Hello!
Sorry for the late answer. Unfortunately, we have not tried the SpQR technique on encoder-decoder models. While this is speculative on my part, I believe that since SpQR (like GPTQ) quantizes one layer at a time, the encoder part of an encoder-decoder model would require only minimal changes to be compatible with SpQR, such as adjusting module names and possibly the activation caching (see this code snippet: https://github.com/Vahe1994/SpQR/blob/1c27ed6294d31f8f508ef02f95fb2bac0337d0a6/main.py#L114C46-L114C47).

The decoder, however, would also need to store the final output of the encoder, because the linear layers in the decoder blocks (for example, the cross-attention projections) need it to compute their inputs and outputs. Once you have a layer's inputs, outputs, and weights, you can run the SpQR engine on that layer. The main part that requires modification is therefore main.py, where you need to retrieve the inputs and outputs of each linear layer you want to quantize.
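A minimal sketch of the "retrieve the input and output of each linear layer" step, assuming PyTorch. `ToyEncoderDecoder` and `capture_linear_io` are hypothetical names made up for illustration, not part of the SpQR code. The toy decoder's `cross_k`/`cross_v` projections consume the encoder output, which is exactly why the encoder activations must be available when calibrating decoder layers. In Hugging Face NLLB checkpoints (M2M100 architecture), the analogous modules should be `encoder_attn.k_proj` and `encoder_attn.v_proj` in each decoder layer, but please verify the names on your model. Note this sketch captures full-precision activations in a single forward pass; a faithful port of main.py would instead process layers in order, feed each quantized layer's output to the next, and pass the cached encoder output into every decoder layer.

```python
import torch
import torch.nn as nn


class ToyEncoderDecoder(nn.Module):
    """Hypothetical stand-in for an encoder-decoder model such as NLLB.

    Only the data flow matters: the decoder's cross-attention projections
    (cross_k / cross_v) take the *encoder output* as their input.
    """

    def __init__(self, d_model: int = 8):
        super().__init__()
        self.enc_fc = nn.Linear(d_model, d_model)
        self.dec_q = nn.Linear(d_model, d_model)
        self.cross_k = nn.Linear(d_model, d_model)
        self.cross_v = nn.Linear(d_model, d_model)
        self.dec_out = nn.Linear(d_model, d_model)

    def forward(self, src: torch.Tensor, tgt: torch.Tensor) -> torch.Tensor:
        enc = torch.relu(self.enc_fc(src))           # encoder output, (B, S, D)
        q = self.dec_q(tgt)                          # decoder queries, (B, T, D)
        k, v = self.cross_k(enc), self.cross_v(enc)  # keys/values from encoder
        scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
        return self.dec_out(torch.softmax(scores, dim=-1) @ v)


def capture_linear_io(model: nn.Module, run_calibration) -> dict:
    """Record (input, output) pairs of every nn.Linear during calibration."""
    captured, handles = {}, []
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            # Bind `name` as a default argument to avoid late-binding bugs.
            def hook(mod, inputs, output, name=name):
                # nn.Linear receives a single positional input tensor.
                captured.setdefault(name, []).append(
                    (inputs[0].detach(), output.detach()))
            handles.append(module.register_forward_hook(hook))
    try:
        with torch.no_grad():
            run_calibration(model)
    finally:
        for h in handles:  # always clean up the hooks
            h.remove()
    return captured


# Usage: one calibration batch of random "source" and "target" activations.
torch.manual_seed(0)
model = ToyEncoderDecoder().eval()
src, tgt = torch.randn(2, 5, 8), torch.randn(2, 3, 8)
io = capture_linear_io(model, lambda m: m(src, tgt))
x, y = io["cross_k"][0]  # input is the encoder output, not decoder states
# (x, y) plus model.cross_k.weight is what a per-layer quantizer consumes.
```

Each captured list holds one entry per forward call, so running several calibration batches simply appends more pairs per layer.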