The following 2 matrix multiplication calls still remain in FP16 precision:

Was wondering: if we quantize those on-the-fly, would there be any benefit? The quantization can be done with an extra ggml_cpy() call, before the ggml_mul_mat() call.

See if this speeds up the computation and how it affects perplexity.
Performance stats on 7B indicate that the F16 matrix multiplications account for just 2% of the total processing time.
The PoC in #1103 does not show any significant performance gains, so closing this for now.