AVX512 quantization (cast from float to uint8) returns wrong results #17800

Open
Flamefire opened this issue Oct 1, 2024 · 0 comments
I originally reported this issue at tensorflow/tensorflow#49944.

The problem is that the values returned by the float-to-uint8 cast are always in the wrong order. The affected code is:

#ifdef EIGEN_VECTORIZE_AVX512BW
  return _mm512_packs_epi16(_mm512_packs_epi32(a_int, b_int),
                            _mm512_packs_epi32(c_int, d_int));
#else
  Packet8i ab_int16_low = _mm256_permute4x64_epi64(
      _mm256_packs_epi32(_mm512_castsi512_si256(a_int),
                         _mm512_castsi512_si256(b_int)),
      _MM_SHUFFLE(0, 2, 1, 3));
  Packet8i cd_int16_low = _mm256_permute4x64_epi64(
      _mm256_packs_epi32(_mm512_castsi512_si256(c_int),
                         _mm512_castsi512_si256(d_int)),
      _MM_SHUFFLE(0, 2, 1, 3));
  Packet8i ab_int16_high = _mm256_permute4x64_epi64(
      _mm256_packs_epi32(_mm512_extracti32x8_epi32(a_int, 1),
                         _mm512_extracti32x8_epi32(b_int, 1)),
      _MM_SHUFFLE(0, 2, 1, 3));
  Packet8i cd_int16_high = _mm256_permute4x64_epi64(
      _mm256_packs_epi32(_mm512_extracti32x8_epi32(c_int, 1),
                         _mm512_extracti32x8_epi32(d_int, 1)),
      _MM_SHUFFLE(0, 2, 1, 3));
  Packet8i abcd_int8_low = _mm256_permute4x64_epi64(
      _mm256_packs_epi16(ab_int16_low, cd_int16_low), _MM_SHUFFLE(0, 2, 1, 3));
  Packet8i abcd_int8_high =
      _mm256_permute4x64_epi64(_mm256_packs_epi16(ab_int16_high, cd_int16_high),
                               _MM_SHUFFLE(0, 2, 1, 3));
  return _mm512_inserti32x8(_mm512_castsi256_si512(abcd_int8_low),
                            abcd_int8_high, 1);
#endif

This is easiest to see with a test whose inputs should produce the ordered sequence 0 to 119 (120 numbers). The actual output is shown below, broken into lines of 8.

AVX512BW path:

  0   1   2   3  16  17  18  19
 32  33  34  35  48  49  50  51
  4   5   6   7  20  21  22  23
 36  37  38  39  52  53  54  55
  8   9  10  11  24  25  26  27
 40  41  42  43  56  57  58  59
 12  13  14  15  28  29  30  31
 44  45  46  47  60  61  62  63
 64  65  66  67  68  69  70  71
 72  73  74  75  76  77  78  79
 80  81  82  83  84  85  86  87
 88  89  90  91  92  93  94  95
 96  97  98  99 100 101 102 103
104 105 106 107 108 109 110 111
112 113 114 115 116 117 118 119
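
The order of the first 64 values in the AVX512BW output can be reproduced without AVX-512 hardware. The following is a minimal scalar sketch, assuming the four input registers hold 0..15, 16..31, 32..47 and 48..63; `pack_lanes` and `model_avx512bw_path` are hypothetical names used only for illustration, and saturation is ignored because the test values are small. It models the documented behaviour of `_mm512_packs_epi32` and `_mm512_packs_epi16`, which pack within each 128-bit lane rather than across the whole register.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar model of a lane-wise pack: within every 128-bit lane, the output
// holds that lane's elements of `a` followed by that lane's elements of `b`.
// Saturation is not modelled.
std::vector<int> pack_lanes(const std::vector<int>& a,
                            const std::vector<int>& b,
                            std::size_t lanes) {
    assert(a.size() == b.size() && a.size() % lanes == 0);
    const std::size_t per_lane = a.size() / lanes;
    std::vector<int> out;
    out.reserve(a.size() * 2);
    for (std::size_t lane = 0; lane < lanes; ++lane) {
        for (std::size_t i = 0; i < per_lane; ++i) out.push_back(a[lane * per_lane + i]);
        for (std::size_t i = 0; i < per_lane; ++i) out.push_back(b[lane * per_lane + i]);
    }
    return out;
}

// Hypothetical helper: models the AVX512BW return statement for the
// assumed inputs a = 0..15, b = 16..31, c = 32..47, d = 48..63.
std::vector<int> model_avx512bw_path() {
    std::vector<int> a(16), b(16), c(16), d(16);
    for (int i = 0; i < 16; ++i) {
        a[i] = i;
        b[i] = 16 + i;
        c[i] = 32 + i;
        d[i] = 48 + i;
    }
    const std::size_t lanes = 4;  // a 512-bit register has four 128-bit lanes
    return pack_lanes(pack_lanes(a, b, lanes), pack_lanes(c, d, lanes), lanes);
}
```

Under these assumptions `model_avx512bw_path()` begins with 0 1 2 3 16 17 18 19 32 33 34 35 48 49 50 51, matching the first two lines above. The values from 64 onward are not covered by this model.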

Fallback:

 36  37  38  39  32  33  34  35
 52  53  54  55  48  49  50  51
  4   5   6   7   0   1   2   3
 20  21  22  23  16  17  18  19
 44  45  46  47  40  41  42  43
 60  61  62  63  56  57  58  59
 12  13  14  15   8   9  10  11
 28  29  30  31  24  25  26  27
 64  65  66  67  68  69  70  71
 72  73  74  75  76  77  78  79
 80  81  82  83  84  85  86  87
 88  89  90  91  92  93  94  95
 96  97  98  99 100 101 102 103
104 105 106 107 108 109 110 111
112 113 114 115 116 117 118 119
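
The fallback order can be modelled the same way. The sketch below is a scalar approximation under the same assumed inputs (0..15, 16..31, 32..47, 48..63); all function names are hypothetical and saturation is ignored. It relies on two documented facts: `_MM_SHUFFLE(z, y, x, w)` evaluates to `(z << 6) | (y << 4) | (x << 2) | w`, and `_mm256_permute4x64_epi64` takes output quarter `i` from input quarter `(imm >> 2*i) & 3`. With `_MM_SHUFFLE(0, 2, 1, 3)` the selectors are 3, 1, 2, 0, so the outer two 64-bit quarters are swapped and the middle two stay in place.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;

// Same value as the _MM_SHUFFLE(z, y, x, w) macro.
constexpr int shuffle_imm(int z, int y, int x, int w) {
    return (z << 6) | (y << 4) | (x << 2) | w;
}

// Scalar model of _mm256_permute4x64_epi64: output quarter i is input
// quarter (imm >> 2*i) & 3.
Vec permute_quarters(const Vec& v, int imm) {
    assert(v.size() % 4 == 0);
    const std::size_t q = v.size() / 4;
    Vec out;
    for (int i = 0; i < 4; ++i) {
        const std::size_t src = static_cast<std::size_t>((imm >> (2 * i)) & 3);
        out.insert(out.end(), v.begin() + src * q, v.begin() + (src + 1) * q);
    }
    return out;
}

// Scalar model of a 256-bit pack (two 128-bit lanes): per lane, a's half
// comes first, then b's half. Saturation is not modelled.
Vec pack256(const Vec& a, const Vec& b) {
    assert(a.size() == b.size() && a.size() % 2 == 0);
    const std::size_t h = a.size() / 2;
    Vec out(a.begin(), a.begin() + h);
    out.insert(out.end(), b.begin(), b.begin() + h);
    out.insert(out.end(), a.begin() + h, a.end());
    out.insert(out.end(), b.begin() + h, b.end());
    return out;
}

// Hypothetical helper: models the #else branch for the assumed inputs.
Vec model_fallback_path() {
    const int imm = shuffle_imm(0, 2, 1, 3);
    Vec result;
    for (int half = 0; half < 2; ++half) {  // low 256 bits, then high 256 bits
        Vec part[4];                         // halves of a, b, c, d
        for (int r = 0; r < 4; ++r)
            for (int i = 0; i < 8; ++i) part[r].push_back(16 * r + 8 * half + i);
        const Vec ab = permute_quarters(pack256(part[0], part[1]), imm);
        const Vec cd = permute_quarters(pack256(part[2], part[3]), imm);
        const Vec abcd = permute_quarters(pack256(ab, cd), imm);
        result.insert(result.end(), abcd.begin(), abcd.end());
    }
    return result;
}
```

Under these assumptions `model_fallback_path()` starts with 36 37 38 39 32 33 34 35 52 53 54 55 48 49 50 51, matching the first two lines of the fallback output above.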

See tensorflow/tensorflow#49944 (comment) for a solution to this specific code, although other code paths are likely affected too.
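
For illustration only, and not necessarily the approach taken in the linked comment: one way to restore the order on the AVX512BW path is a 32-bit-granular permute after the two packs, for example with `_mm512_permutexvar_epi32`. The scalar sketch below checks that the index vector 0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15 would do this, assuming inputs 0..63 split across four registers. `pack_lanes`, `permute_dwords` and `model_reordered_pack` are hypothetical names, and saturation is ignored.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar model of a lane-wise pack: per 128-bit lane, a's elements then b's.
std::vector<int> pack_lanes(const std::vector<int>& a,
                            const std::vector<int>& b,
                            std::size_t lanes) {
    assert(a.size() == b.size() && a.size() % lanes == 0);
    const std::size_t per_lane = a.size() / lanes;
    std::vector<int> out;
    for (std::size_t lane = 0; lane < lanes; ++lane) {
        for (std::size_t i = 0; i < per_lane; ++i) out.push_back(a[lane * per_lane + i]);
        for (std::size_t i = 0; i < per_lane; ++i) out.push_back(b[lane * per_lane + i]);
    }
    return out;
}

// Scalar model of a dword permute: output dword i is input dword idx[i].
// Each dword holds four consecutive 8-bit results.
std::vector<int> permute_dwords(const std::vector<int>& bytes,
                                const std::array<int, 16>& idx) {
    assert(bytes.size() == 64);
    std::vector<int> out;
    for (int src : idx)
        for (int k = 0; k < 4; ++k) out.push_back(bytes[4 * src + k]);
    return out;
}

// Hypothetical helper: lane-wise packs followed by the reordering permute.
std::vector<int> model_reordered_pack() {
    std::vector<int> a(16), b(16), c(16), d(16);
    for (int i = 0; i < 16; ++i) {
        a[i] = i;
        b[i] = 16 + i;
        c[i] = 32 + i;
        d[i] = 48 + i;
    }
    const std::vector<int> packed =
        pack_lanes(pack_lanes(a, b, 4), pack_lanes(c, d, 4), 4);
    // Assumed index vector: gathers lane L, position p into dword 4*p + L.
    const std::array<int, 16> idx = {0, 4, 8, 12, 1, 5, 9, 13,
                                     2, 6, 10, 14, 3, 7, 11, 15};
    return permute_dwords(packed, idx);
}
```

In this model the permuted result is the ordered sequence 0..63. Whether the same index vector is appropriate for the other affected code paths has not been checked here.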
