Commit b085142
PR tensorflow#6599: Fp8 Fast Accumulation support for cublasLt
Imported from GitHub PR openxla/xla#6599
The FP8 cublasLt matmul uses fast accumulation when the precision of both operands is DEFAULT; otherwise it falls back to high-precision accumulation. Issue: openxla/xla#6168
This PR is closely related to Flax PR-.
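The selection rule in the description can be sketched as follows. This is a minimal illustration, not the code from this PR: the `Precision` enum and the `use_fast_accumulation` helper are hypothetical names, and the three precision levels are assumed to follow XLA's DEFAULT/HIGH/HIGHEST convention.

```python
from enum import Enum


class Precision(Enum):
    """Operand precision levels (assumed to mirror DEFAULT/HIGH/HIGHEST)."""
    DEFAULT = 0
    HIGH = 1
    HIGHEST = 2


def use_fast_accumulation(lhs: Precision, rhs: Precision) -> bool:
    """Hypothetical helper: pick fast accumulation for an FP8 matmul.

    Fast accumulation is chosen only when BOTH operands request DEFAULT
    precision; any other combination falls back to high-precision
    accumulation.
    """
    return lhs is Precision.DEFAULT and rhs is Precision.DEFAULT


if __name__ == "__main__":
    # Enumerate every operand-precision pair and show the chosen mode.
    for lhs in Precision:
        for rhs in Precision:
            mode = "fast" if use_fast_accumulation(lhs, rhs) else "high-precision"
            print(f"{lhs.name:8} x {rhs.name:8} -> {mode} accumulation")
```

Under this rule, raising the precision of either operand above DEFAULT is enough to opt out of fast accumulation.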
Copybara import of the project:
--
a4140da8ca08cd2d4796a7b8f032827867a361bc by shuw <shuw@nvidia.com>:
Add FP8 fast accumulation support for cublasLt.
--
96845683cc4b1e7b947bc919fbf97d8865abeac9 by shuw <shuw@nvidia.com>:
Improve based on review #1
--
e906d7620780d2cf1fe8433c933648dcb98dc61d by shuw <shuw@nvidia.com>:
Improve based on review #2
Merging this change closes tensorflow#6599
PiperOrigin-RevId: 578948593
1 parent f2bed49 commit b085142
3 files changed (+44, -3) under third_party/xla/xla:
- service/gpu/tests
- stream_executor/cuda
File 1 of 3, 32 additions & 0 deletions: new lines 6943-6974 added.
File 2 of 3, 11 additions & 2 deletions: old line 194 replaced by new lines 194-195; new lines 214-215 added; new lines 321-326 added; old line 322 replaced by new line 331.
File 3 of 3, 1 addition & 1 deletion: line 73 replaced.