Significant performance drop when using miopen-opencl vs miopen-hip #65

Closed
artyom-beilis opened this issue Nov 23, 2018 · 3 comments

Comments

@artyom-beilis

I found that there is sometimes a difference of up to two orders of magnitude between miopen-opencl and miopen-hip:

  • With a strided convolution, for example Input 16:256:14:14, Output 16:512:7:7, Kernel=3, Stride=2, Pad=1, the forward convolution takes 2.75 ms with miopen-hip but 19.16 ms with miopen-opencl.
  • With kernel size = 1, for example Input 16:256:14:14, Output 16:512:14:14, Kernel=1, Stride=1, Pad=0, miopen-hip takes 1.26 ms while miopen-opencl takes 195 ms (!).
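For reference, the shapes and ratios quoted above can be checked with a short Python sketch. It assumes the standard convolution output-size formula without dilation; the helper names `conv_output_size` and `slowdown` are made up for illustration and are not part of MIOpen.

```python
def conv_output_size(in_size, kernel, stride, pad):
    """Spatial output size of a convolution (no dilation assumed)."""
    return (in_size + 2 * pad - kernel) // stride + 1


def slowdown(slow_ms, fast_ms):
    """How many times slower the slow measurement is than the fast one."""
    return slow_ms / fast_ms


# (label, input size, kernel, stride, pad, hip ms, opencl ms), timings as reported above
cases = [
    ("3x3, stride 2, pad 1", 14, 3, 2, 1, 2.75, 19.16),
    ("1x1, stride 1, pad 0", 14, 1, 1, 0, 1.26, 195.0),
]

for label, in_size, kernel, stride, pad, hip_ms, opencl_ms in cases:
    out = conv_output_size(in_size, kernel, stride, pad)
    ratio = slowdown(opencl_ms, hip_ms)
    print(f"{label}: {in_size}x{in_size} -> {out}x{out}, opencl/hip = {ratio:.1f}x")
```

By this arithmetic the strided case is roughly 7x slower with OpenCL, and only the 1x1 case reaches the two orders of magnitude mentioned above.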

Setup:

CPU: Intel(R) Core(TM) i5-6600 CPU @ 3.30GHz
GPU: rx560 /gfx803 16 compute units 1196MHz clock
ROCm: 1.9
miopen - latest git.

@artyom-beilis
Author

Additional information I noticed:

While most measurements with MIOpen-opencl return ~190 ms, once in a while it gives the same short execution time as MIOpen-hip, i.e. ~1.3 ms, which suggests the problem is somehow related to timing. Note that I do a warm-up before starting the actual measurements, and the selected algorithm is always (1) miopenConvolutionFwdAlgoDirect, regardless of whether the measurement comes out short or long.

Artyom
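One way to make this kind of bimodal behaviour visible is to record every iteration instead of an average. The following is only a sketch of such a harness, not the benchmark actually used here: `time_iterations` and `split_by_threshold` are hypothetical helpers, and `fn` is assumed to block until the GPU work has finished (otherwise only the launch overhead is measured).

```python
import time


def time_iterations(fn, warmup=5, iters=20):
    """Call fn() `warmup` times untimed, then return per-call times in ms."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()  # assumption: fn() returns only after the work is complete
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def split_by_threshold(samples_ms, threshold_ms):
    """Separate samples into (fast, slow) lists around a chosen threshold."""
    fast = [s for s in samples_ms if s < threshold_ms]
    slow = [s for s in samples_ms if s >= threshold_ms]
    return fast, slow
```

With a threshold somewhere between the two observed modes (for example 10 ms), the sizes of the two lists show how often the short execution time occurs.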

@atamazov
Contributor

atamazov commented Nov 26, 2018

(1) Do you have miopengemm installed? AFAIR MIOpen-opencl uses it for GEMM algorithms while MIOpen-hip uses rocBLAS. The absence of miopengemm may degrade the performance of MIOpen-opencl for convolutions where the GEMM algorithm is optimal.

(2) We do not supply a performance database for devices with 16 compute units. As your device has 16 CUs, you need to auto-tune your MIOpen installation(s). More details in https://github.com/ROCmSoftwarePlatform/MIOpen/blob/master/doc/src/perfdatabase.md

(3) If the above does not resolve the issue you are seeing, please attach console logs from both MIOpen-opencl and MIOpen-hip obtained with export MIOPEN_LOG_LEVEL=5.
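As a rough sketch of collecting such logs, the Python helper below runs a benchmark command with MIOPEN_LOG_LEVEL set in its environment and writes the console output to a file. Only the MIOPEN_LOG_LEVEL=5 setting comes from this thread; the function name `run_with_miopen_logging`, the binary names, and the log file names are placeholders.

```python
import os
import subprocess
import sys


def run_with_miopen_logging(cmd, log_path, level="5"):
    """Run cmd with MIOPEN_LOG_LEVEL set and save stdout+stderr to log_path."""
    env = dict(os.environ, MIOPEN_LOG_LEVEL=level)
    with open(log_path, "w") as log:
        proc = subprocess.run(cmd, env=env, stdout=log, stderr=subprocess.STDOUT)
    return proc.returncode


# Hypothetical usage, one run per backend build (binary names are placeholders):
#   run_with_miopen_logging(["./conv_bench_opencl"], "miopen-opencl.log")
#   run_with_miopen_logging(["./conv_bench_hip"], "miopen-hip.log")
```

Keeping one log file per backend makes it easier to attach both and compare them side by side.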

@dagamayank
Contributor

Closing this issue due to inactivity. Also, differences between HIP and OpenCL are expected. We are working to resolve this, but there is no clear ETA at this time.

cderb added a commit that referenced this issue Nov 21, 2022
49e3e3a62 clang format
db80b1777 update to using TestPerfCfgParams for pdb validity checks
e48a4fd3a format
a4f85842c exception for non-tunable solvers in params check
d58c42bbd Check params at end of perf tuning (#70)
1a3b47c7b Return status for failed compile commands (#69)
d59962752 out_layout -> in_layout
6ba7a8f3f Rename conv_mode to mode (#64)
513a3da1b [bg/LWPTUNA-173] (#65)
e05dcb421 perf db validation fix (#68)
260d9465d Add INT8 as a data_type v2 (#67)
b6a5b2a77 sync with fin folder in miopen (#62)
0e03399ec prep for Palamida scan (#63)
e6bd05c33 Performance db testing (#61)
30d699b9e Perf Eval Update (#60)
3535b948c PerfCompile and PerfEval changes (#59)
de79468d2 remove unneccessary solution check, add check for previously modified kernel names (#56)
6924286a2 miopen hash update (#55)
530399575 Refactor googletest infra to align with MIOpen (#53)
71c50d146 Datatype fix for BN (#57)
8abe2f5c6 Perf Eval updates, Add find info (#51)
e1c1ef0f5 filter find compile by solver input (#54)
722feea66 sp/chk precomp kernel 264 (#41)
b9aba2034 Batch norm find compile (#50)
359f3da80 Fix missing link directives in fin binary (#48)
a4020c1ba Cache Miss Fixes (#46)
2ec7ef44d Enable google test and compiling fin in the CI (#47)
8b6b453bc Applicability support for batch norm (#45)
44323aae9 Perf compile/eval for fin (#42)
ebd9aa6bd update member name (#43)
d6d798efe add cu count (#39)
8e1989a9f Add find option for selecting only dynamic solvers (#38)
0e164bf66 setting json version (#37)
f3f7fed18 Remove function redefinition (#36)
e1de51a58 Performance DB de-serialize test (#34)
043cdcdaa Layout support in Fin (#33)
3a1d58236 Hotfix (#32)
ee3f0d543 4.4 Tuning Bugfixes (#31)
832dbe234 Tunability Reporting (#27)
a564a229f include gfx90a_110 (#28)

git-subtree-dir: fin
git-subtree-split: 49e3e3a62a7cc54adacbeea95680d35f9a4685de