[matrix_transpose_naive]
GPU: 0.00571667+-0.000932589 s
GPU: 2934.79 millions/s
[matrix_transpose_local_bad_banks]
GPU: 0.005+-0 s
GPU: 3355.44 millions/s
[matrix_transpose_local_good_banks]
GPU: 0.0049+-0.0003 s
GPU: 3423.92 millions/s
[naive, ts=4]
GPU: 0.0798333+-0.000687184 s
GPU: 25.0522 GFlops
Average difference: 0.000196008%
[naive, ts=8]
GPU: 0.0451667+-0.000372678 s
GPU: 44.2804 GFlops
Average difference: 0.000196008%
[naive, ts=16]
GPU: 0.0263333+-0.000471405 s
GPU: 75.9494 GFlops
Average difference: 0.000196008%
[local, ts=4]
GPU: 0.0453333+-0.000471405 s
GPU: 44.1176 GFlops
Average difference: 0.000196008%
[local, ts=8]
GPU: 0.0173333+-0.000471405 s
GPU: 115.385 GFlops
Average difference: 0.000196008%
[local, ts=16]
GPU: 0.0221667+-0.000372678 s
GPU: 90.2256 GFlops
Average difference: 0.000196008%
[local wpt, ts=4, wpt=2]
GPU: 0.072+-9.31323e-10 s
GPU: 27.7778 GFlops
Average difference: 0.000196008%
[local wpt, ts=4, wpt=4]
GPU: 0.0911667+-0.000372678 s
GPU: 21.9378 GFlops
Average difference: 0.000196008%
[local wpt, ts=8, wpt=2]
GPU: 0.017+-0 s
GPU: 117.647 GFlops
Average difference: 0.000196008%
[local wpt, ts=8, wpt=4]
GPU: 0.019+-0 s
GPU: 105.263 GFlops
Average difference: 0.000196008%
[local wpt, ts=8, wpt=8]
GPU: 0.033+-0 s
GPU: 60.6061 GFlops
Average difference: 0.000196008%
[local wpt, ts=16, wpt=2]
GPU: 0.0176667+-0.000471405 s
GPU: 113.208 GFlops
Average difference: 0.000196008%
[local wpt, ts=16, wpt=4]
GPU: 0.017+-0 s
GPU: 117.647 GFlops
Average difference: 0.000196008%
[local wpt, ts=16, wpt=8]
GPU: 0.017+-0 s
GPU: 117.647 GFlops
Average difference: 0.000196008%
[local wpt, ts=16, wpt=16]
GPU: 0.0166667+-0.000471405 s
GPU: 120 GFlops
Average difference: 0.000196008%
Conclusions:
Matrix transposition shows a speedup of about 14%.
Matrix multiplication shows a speedup of about 20% when comparing the first version with the second, and a speedup on the order of 100-250%.
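
The percentages above can be rechecked from the logged throughput. The sketch below is only an illustration: the numbers are copied from the benchmark output, `gain_percent` is a hypothetical helper name, and it assumes that "first version" means the naive kernel and "second version" means the local-memory kernel at the same `ts`.

```python
# Throughput figures copied from the benchmark output above.
transpose_mps = {  # millions of elements per second
    "naive": 2934.79,
    "local_bad_banks": 3355.44,
    "local_good_banks": 3423.92,
}
matmul_gflops = {  # keyed by tile size (ts)
    4: {"naive": 25.0522, "local": 44.1176},
    8: {"naive": 44.2804, "local": 115.385},
    16: {"naive": 75.9494, "local": 90.2256},
}


def gain_percent(baseline, improved):
    """Relative throughput gain of `improved` over `baseline`, in percent."""
    return (improved / baseline - 1.0) * 100.0


# Transpose: each local-memory variant compared against the naive kernel.
transpose_gains = {
    name: gain_percent(transpose_mps["naive"], value)
    for name, value in transpose_mps.items()
    if name != "naive"
}

# Matmul: local vs. naive at the same tile size
# (assumption: these are the "first" and "second" versions).
matmul_gains = {
    ts: gain_percent(row["naive"], row["local"])
    for ts, row in matmul_gflops.items()
}

for name, gain in transpose_gains.items():
    print(f"transpose {name}: +{gain:.1f}%")
for ts, gain in matmul_gains.items():
    print(f"matmul local vs naive, ts={ts}: +{gain:.1f}%")
```

Under that reading, the transpose gain is roughly 14% for the bad-banks variant and roughly 17% for the good-banks variant, while the naive-to-local matmul gain depends strongly on the tile size: roughly 19% at ts=16 and roughly 160% at ts=8.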