[GPU] Use onednn impl for dynamic gemm #27212

Merged

Conversation

Lyamin-Roman
Contributor

Details:

  • Performance improvement for LoRA

@Lyamin-Roman Lyamin-Roman added the category: GPU OpenVINO GPU plugin label Oct 23, 2024
@Lyamin-Roman Lyamin-Roman requested review from a team as code owners October 23, 2024 18:42
OV_GPU_GET_INSTANCE_OCL(gemm, shape_types::dynamic_shape)
OV_GPU_GET_INSTANCE_OCL(gemm, shape_types::dynamic_shape,
    [](const program_node& node) {
        return !node.can_use(impl_types::onednn);
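
The lambda in the diff above acts as a validator: the dynamic-shape OCL gemm is only offered when the node cannot use oneDNN. A minimal, self-contained sketch of that selection pattern follows. All names here (`ImplType`, `Node`, `ImplEntry`, `select_impl`, `make_gemm_registry`) are hypothetical stand-ins for illustration and are not the OpenVINO GPU plugin's actual types or registry.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins; NOT the OpenVINO definitions.
enum class ImplType { ocl, onednn };

struct Node {
    bool onednn_supported;  // assumed flag: can oneDNN handle this node?
    bool can_use(ImplType t) const {
        return t == ImplType::onednn ? onednn_supported : true;
    }
};

struct ImplEntry {
    ImplType type;
    std::string name;
    // Optional predicate; an empty function means "always eligible".
    std::function<bool(const Node&)> validator;
};

// Returns the first registered entry whose validator accepts the node,
// or nullptr if no entry matches.
const ImplEntry* select_impl(const std::vector<ImplEntry>& registry, const Node& node) {
    for (const auto& entry : registry) {
        if (!entry.validator || entry.validator(node)) {
            return &entry;
        }
    }
    return nullptr;
}

// Builds an illustrative registry for gemm with two candidates.
std::vector<ImplEntry> make_gemm_registry() {
    return {
        // Dynamic OCL gemm is a fallback only, mirroring the lambda in the diff.
        {ImplType::ocl, "gemm_ocl_dynamic",
         [](const Node& n) { return !n.can_use(ImplType::onednn); }},
        {ImplType::onednn, "gemm_onednn",
         [](const Node& n) { return n.can_use(ImplType::onednn); }},
    };
}
```

With this sketch, `select_impl(make_gemm_registry(), Node{true})` picks the oneDNN entry, while `Node{false}` falls back to the dynamic OCL entry.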
Contributor

Does oneDNN support efficient kernel caching for gemm? If not, this could cause runtime kernel recompilation and reduce performance in some cases. This change probably requires a wider performance check.
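
To illustrate the concern, here is a minimal sketch of a shape-keyed kernel cache. It assumes a kernel is specialized per (M, N, K) shape and that compilation is the expensive step. `GemmShape`, `Kernel`, and `KernelCache` are hypothetical names, and this is not a description of how oneDNN caches kernels internally.

```cpp
#include <cstddef>
#include <map>
#include <tuple>

// Hypothetical shape key (M, N, K) for a gemm; illustrative only.
using GemmShape = std::tuple<std::size_t, std::size_t, std::size_t>;

// Placeholder for a compiled kernel handle.
struct Kernel {
    int id;
};

class KernelCache {
public:
    // Returns the cached kernel for the shape, "compiling" it on first use.
    const Kernel& get(const GemmShape& shape) {
        auto it = cache_.find(shape);
        if (it == cache_.end()) {
            ++compile_count_;  // stands in for an expensive runtime compilation
            it = cache_.emplace(shape, Kernel{compile_count_}).first;
        }
        return it->second;
    }

    // Number of compilations performed so far.
    int compile_count() const { return compile_count_; }

private:
    std::map<GemmShape, Kernel> cache_;
    int compile_count_ = 0;
};
```

In this sketch a repeated shape is served from the cache, while every previously unseen shape pays the compilation cost once. With dynamic shapes, the number of distinct shapes seen at runtime therefore determines how often recompilation happens.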

Contributor Author

Checked on LNL with qwen2 and llama3; no performance drops were detected (with and without SDPA).

@Lyamin-Roman Lyamin-Roman force-pushed the use_gemm_onednn_impl branch 3 times, most recently from cb044a5 to 6025088 Compare October 31, 2024 00:32
@vladimir-paramuzov vladimir-paramuzov added this pull request to the merge queue Oct 31, 2024
Merged via the queue into openvinotoolkit:master with commit a6a113c Oct 31, 2024
150 checks passed
@Lyamin-Roman Lyamin-Roman deleted the use_gemm_onednn_impl branch October 31, 2024 11:56