Commit
Update shared_32x16_to_ldmatrix_32x16_layout to be injective
The previous version mapped the 512 input indices of a `(32,16)` array to only 128 output indices. This wasn't caught earlier because the bijectivity assertion was only triggered for TE schedules.
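
The kind of failure described here can be reproduced with a brute-force check, sketched below in plain Python. Both index maps are hypothetical stand-ins chosen for illustration: `lossy_map` collapses 512 inputs to 128 outputs, and `permuting_map` is one possible injective alternative. Neither is the actual body of `shared_32x16_to_ldmatrix_32x16_layout`, and the helper names are assumptions rather than TVM APIs.

```python
from itertools import product


def count_distinct_outputs(index_map, shape):
    """Apply index_map to every index in `shape` and count the unique results."""
    all_indices = product(*(range(extent) for extent in shape))
    return len({index_map(*idx) for idx in all_indices})


def is_injective(index_map, shape):
    """A map is injective on `shape` if no two input indices share an output."""
    total = 1
    for extent in shape:
        total *= extent
    return count_distinct_outputs(index_map, shape) == total


# Hypothetical lossy map: drops the high bits of j, so 4 columns share an output.
def lossy_map(i, j):
    return (i, j % 4)


# Hypothetical injective map: every bit of (i, j) survives in the output pair.
def permuting_map(i, j):
    return ((i % 8) * 4 + j // 4, (i // 8) * 4 + j % 4)


shape = (32, 16)
print(count_distinct_outputs(lossy_map, shape))      # 32 * 4 = 128 distinct outputs
print(is_injective(lossy_map, shape))                # False
print(is_injective(permuting_map, shape))            # True
```

Enumerating all 512 indices is cheap at this size, so a check like this could run for any layout function regardless of how the schedule was constructed.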