feat: enable head_dim=256 for attention kernels #132
Merged