
Conversation

@fadara01 (Contributor) commented on Oct 24, 2025

[cpu][fix] Fix onednn_mm crash on consecutive matmuls with same M,K,N and different dtype

Makes the weight dtype part of the cache key (`ClassMatmulCacheKey`) so that two onednn_mm calls with the same src/weight dimensions but different dtypes are no longer mapped to the same `dnnl::matmul` primitive; a sketch follows.
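For illustration, a minimal sketch of the keying scheme after this change, reconstructed from the snippets quoted in the review below. The member names match those snippets; the exact type of b_type (assumed here to be dnnl::memory::data_type) and the surrounding file layout are assumptions:

struct ClassMatmulCacheKey {
  dnnl_dim_t b_n_size;
  dnnl_dim_t b_k_size;
  dnnl::memory::data_type b_type;  // new: weight dtype is now part of the key

  bool operator==(const ClassMatmulCacheKey& other) const {
    // Without the b_type comparison, an f32 and a bf16 matmul with the same
    // N and K resolved to the same cached dnnl::matmul primitive.
    return b_n_size == other.b_n_size && b_k_size == other.b_k_size &&
           b_type == other.b_type;
  }
};

namespace std {
template <>
struct hash<ClassMatmulCacheKey> {
  size_t operator()(const ClassMatmulCacheKey& val) const {
    return hash<dnnl_dim_t>()(val.b_n_size) ^
           hash<dnnl_dim_t>()(val.b_k_size) ^
           hash<int>()(static_cast<int>(val.b_type));
  }
};
}  // namespace std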

Fixes: #27465

Purpose

Fixes: #27465

Test Plan

Reproducer in #27465

Test Result

Reproducer in #27465 passes



@fadara01 requested a review from bigPYJ1151 as a code owner on October 24, 2025 10:22
@gemini-code-assist (bot) left a comment

Code Review

This pull request correctly addresses a crash in onednn_mm that occurs during consecutive matrix multiplications with identical dimensions but different data types. The fix, which incorporates the weight dtype into the ClassMatmulCacheKey, is sound and properly implemented across the cache key definition, hash function, and equality operator. My review includes two suggestions to further improve the code: one enhances the robustness of the hash function to minimize collisions and maintain performance, and the other improves code clarity by renaming a shadowed variable.

Comment on lines 190 to 191
return hash<dnnl_dim_t>()(val.b_n_size) ^ hash<dnnl_dim_t>()(val.b_k_size) ^
hash<int>()(static_cast<int>(val.b_type));
@gemini-code-assist (bot):

Severity: high

Using XOR (^) to combine hash values can lead to a higher rate of collisions, which can degrade the performance of the hash map. A more robust approach is to mix the bits more thoroughly, for example with a polynomial rolling hash; this is common practice, as in Java's hashCode implementation. The same principle applies to the other hash specializations in this file.

Suggested change
return hash<dnnl_dim_t>()(val.b_n_size) ^ hash<dnnl_dim_t>()(val.b_k_size) ^
hash<int>()(static_cast<int>(val.b_type));
return (std::hash<dnnl_dim_t>()(val.b_n_size) * 31 +
        std::hash<dnnl_dim_t>()(val.b_k_size)) * 31 +
       std::hash<int>()(static_cast<int>(val.b_type));
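For comparison, a boost-style hash_combine achieves similar bit mixing; this is an illustrative sketch, not part of the suggested diff:

#include <cstddef>
#include <functional>

// Folds each field's hash into a running seed; the constant and shifts
// spread entropy across the bits, reducing collisions versus plain XOR.
template <typename T>
inline void hash_combine(std::size_t& seed, const T& v) {
  seed ^= std::hash<T>()(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Hypothetical usage inside the hash specialization:
//   std::size_t seed = 0;
//   hash_combine(seed, val.b_n_size);
//   hash_combine(seed, val.b_k_size);
//   hash_combine(seed, static_cast<int>(val.b_type));
//   return seed;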

@fadara01 (Author) replied:

This is something for another PR; let's keep the scope of this PR clear.

Comment on lines 497 to 498
ClassMatmulCacheKey key = {.b_n_size = b_n_size_, .b_k_size = b_k_size_, .b_type = b_type_};
m_size_cache_ = get_matul_class_primitive_cache(key, primitive_cache_size_);
@gemini-code-assist (bot):

Severity: high

The local variable key declared at line 497 shadows the function parameter key. This is confusing and error-prone, as it's not immediately obvious which key is being used in subsequent calls (get_matul_class_primitive_cache vs m_size_cache_->get_or_create). It's better to use a more descriptive name for the local cache key to avoid shadowing.

Suggested change
ClassMatmulCacheKey key = {.b_n_size = b_n_size_, .b_k_size = b_k_size_, .b_type = b_type_};
m_size_cache_ = get_matul_class_primitive_cache(key, primitive_cache_size_);
ClassMatmulCacheKey class_key = {.b_n_size = b_n_size_, .b_k_size = b_k_size_, .b_type = b_type_};
m_size_cache_ = get_matul_class_primitive_cache(class_key, primitive_cache_size_);
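For reference, this is the pattern -Wshadow flags; a standalone illustration with hypothetical names, unrelated to the vLLM sources:

int lookup(int key) {   // parameter `key`
  if (key > 0) {
    int key = 42;       // legal C++, but shadows the parameter;
                        // gcc/clang warn here under -Wshadow
    return key;         // refers to the inner `key` (42), not the parameter
  }
  return key;           // refers to the parameter
}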

@fadara01 (Author) replied:
Good point.

@fadara01 force-pushed the fix_onednn_mm_crash branch from 953eb68 to fa9f0e6 on October 24, 2025 10:24
@fadara01 (Author) commented:
@bigPYJ1151 this should fix the crash reported in #27465 and #27244

@fadara01 force-pushed the fix_onednn_mm_crash branch from fa9f0e6 to b3ae6d4 on October 24, 2025 10:35
… and different dtype

Makes weight dtype part of the cache key for `ClassMatmulCacheKey` to avoid having 2 onednn_mm(s)
with same src/weight dimensions and different dtypes mapped to the same dnnl::matmul primitive

Fixes: vllm-project#27465

Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
@bigPYJ1151 (Member) left a comment:

Thanks for the fix :)

@bigPYJ1151 enabled auto-merge (squash) on October 24, 2025 12:12
@github-actions (bot) added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Oct 24, 2025
@bigPYJ1151 merged commit 2080b05 into vllm-project:main on Oct 24, 2025
21 checks passed
kingsmad pushed a commit to kingsmad/vllm that referenced this pull request Oct 25, 2025
… and different dtype (vllm-project#27472)

Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
rohin-garg pushed a commit to rohin-garg/vllm that referenced this pull request Oct 25, 2025
… and different dtype (vllm-project#27472)

Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
… and different dtype (vllm-project#27472)

Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>

Labels

ready (ONLY add when PR is ready to merge/full CI is needed)


Development

Successfully merging this pull request may close these issues.

[Bug]: onednn_mm crashes on consecutive bf16, f32 matmuls with same M,K,N

2 participants