[TESTS][Vega][WORKAROUND] test_conv_embed_db fails with ROCm 4.4 tuning #1161

Closed
2 tasks
junliume opened this issue Sep 18, 2021 · 5 comments · Fixed by #1164

Comments

@junliume
Contributor

junliume commented Sep 18, 2021

[Symptom]
test_conv_embed_db fails with ROCm 4.4 tuning on the following stage:
http://micimaster.amd.com/blue/organizations/jenkins/MLLibs%2FMIOpen/detail/tuning-rocm-4.4-fordev/12/pipeline
http://micimaster.amd.com/blue/organizations/jenkins/MLLibs%2FMIOpen/detail/tuning-rocm-4.4-fordev/13/pipeline
http://micimaster.amd.com/blue/organizations/jenkins/MLLibs%2FMIOpen/detail/tuning-rocm-4.4-fordev/14/pipeline

  • Fp32 Hip Debug Embedded Vega20

[Root Cause]
Not yet known; the failure is not reproducible in local tests.

[Plan]
Workaround to unblock performance tuning updates

  • test_conv_embed_db DISABLED on all VEGA platforms
@atamazov
Contributor

atamazov commented Sep 18, 2021

It seems like this issue is an "extension" of #874.

The CI logs show that some configs for ConvHipImplicitGemmV4R4Fwd are missing from the perf-db. The proper workaround should only disable this solver for the failing test cases instead of disabling the whole test (see #875 for example)

@junliume
Contributor Author

> It seems like this issue is an "extension" of #874.
>
> The CI logs show that some configs for ConvHipImplicitGemmV4R4Fwd are missing from perf-db. The proper workaround should only disable this solver for the failing test cases instead of disabling the whole test (see #875 for example)

We anticipate getting this issue fixed within the next two weeks.

@atamazov
Contributor

Note that if ConvHipImplicitGemmV4R4Fwd configs are really missing from perf-db (i.e. this issue is not specific to the "embedded" configuration of the library), then we most likely have a performance drop. The reason: CI logs show that this Solver is the fastest one, and tuning can make it faster.

@junliume
Contributor Author

> Note that if ConvHipImplicitGemmV4R4Fwd configs are really missing from perf-db (i.e. this issue is not specific to the "embedded" configuration of the library), then we most likely have performance drop. Why: CI logs show that this Solver is the fastest one, and tuning can make it faster.

@JehandadKhan, hopefully we can fix this issue with the upcoming 4.5 tuning?

@JehandadKhan
Contributor

@atamazov and @junliume, I did tune the solver and add those entries to the db; however, I was unable to fix the issue when the test runs in the CI. I need to investigate further and figure out whether I am doing the right thing.

The tuning PR going through staging will tell us whether there is a perf regression elsewhere; if one is discovered, we can cover it there.
