Gemma uses `head_dim=256`, which is not enabled in the pip wheels by default. We should compile kernels for `head_dim=256` and adjust some kernel parameters for best performance in this case.
As mentioned in #130, the kernels for `head_dim=256` are not compiled
by default. This PR exposes these attention kernels in the pip wheels and
adds unit tests/benchmarks for `head_dim=256`.
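A minimal sketch of the kind of check such a unit test could perform, comparing the `head_dim=256` decode kernel against a plain PyTorch reference. The `flashinfer.single_decode_with_kv_cache` call and the tensor layouts are assumptions about the Python bindings, not taken from this PR; adjust names and shapes to the actual API.

```python
# Hypothetical sanity check for head_dim=256 decode attention (not from the PR).
import torch
import flashinfer  # assumes a pip wheel built with head_dim=256 kernels

num_qo_heads, num_kv_heads, kv_len, head_dim = 8, 8, 1024, 256

q = torch.randn(num_qo_heads, head_dim, dtype=torch.float16, device="cuda")
k = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.float16, device="cuda")
v = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.float16, device="cuda")

# FlashInfer single-request decode attention (assumed API and NHD layout).
out = flashinfer.single_decode_with_kv_cache(q, k, v)

# Reference: plain softmax attention computed in fp32.
scale = 1.0 / (head_dim ** 0.5)
logits = torch.einsum("hd,nhd->hn", q.float(), k.float()) * scale
probs = torch.softmax(logits, dim=-1)
ref = torch.einsum("hn,nhd->hd", probs, v.float())

torch.testing.assert_close(out.float(), ref, rtol=1e-2, atol=1e-2)
print("head_dim=256 decode output matches the reference")
```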