[Dlight][CPU] Add CPU Backend Support for GEMV Optimization #17663
Conversation
This PR adds Dlight CPU support with optimized GEMV scheduling,
including pattern detection, loop tiling, vectorization, and parallel
execution. It improves maintainability by refining target checks,
reduction handling, and scheduling logic.
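For illustration only, here is a minimal sketch of the kind of CPU GEMV schedule the description refers to, written against a toy PrimFunc. The shapes, the split factor of 8, and the loop order are assumptions made for this example, not the PR's actual scheduling rule:

    from tvm import tir
    from tvm.script import tir as T

    # Toy GEMV kernel: C[i] = sum_k A[i, k] * B[k]; shapes are illustrative only.
    @T.prim_func
    def gemv(a: T.handle, b: T.handle, c: T.handle):
        A = T.match_buffer(a, (4096, 4096), "float32")
        B = T.match_buffer(b, (4096,), "float32")
        C = T.match_buffer(c, (4096,), "float32")
        for i, k in T.grid(4096, 4096):
            with T.block("C"):
                vi, vk = T.axis.remap("SR", [i, k])
                with T.init():
                    C[vi] = T.float32(0)
                C[vi] = C[vi] + A[vi, vk] * B[vk]

    sch = tir.Schedule(gemv)
    block = sch.get_block("C")
    i, k = sch.get_loops(block)
    # Tile the spatial axis: the outer part runs across CPU threads,
    # the inner 8-wide slice becomes a SIMD vector once moved innermost.
    i_outer, i_inner = sch.split(i, factors=[None, 8])
    sch.reorder(i_outer, k, i_inner)
    sch.parallel(i_outer)
    sch.vectorize(i_inner)
    print(sch.mod)

The same primitives (split, reorder, parallel, vectorize) are roughly what a dlight rule applies automatically once it detects a GEMV-like block; the concrete factors here are placeholders.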
CPU: AMD Ryzen 9 7950X 16-Core Processor
MODEL: Qwen2-0.5B-q4f16_1-MLC
Prompt: What is the meaning of life?
Results:
Baseline:
prompt_tokens=27 completion_tokens=235 total_tokens=262
extra={'prompt_tokens': 27, 'completion_tokens': 235,
'prefill_tokens': 27, 'decode_tokens': 234, 'jump_forward_tokens': 0,
'prefill_tokens_per_s': 0.9777329325367138,
'decode_tokens_per_s': 0.558195154052001,
'end_to_end_latency_s': 446.823128383, 'ttft_s': 27.614902906,
'inter_token_latency_s': 1.9013750143957446}
Optimized:
usage: prompt_tokens=27 completion_tokens=227 total_tokens=254
extra={'prompt_tokens': 27, 'completion_tokens': 227,
'prefill_tokens': 27, 'decode_tokens': 226, 'jump_forward_tokens': 0,
'prefill_tokens_per_s': 1.0010420333327994,
'decode_tokens_per_s': 2.9349053824023454,
'end_to_end_latency_s': 103.976080401, 'ttft_s': 26.971894387,
'inter_token_latency_s': 0.4580444070528635}
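In short, the numbers above amount to roughly a 5.3x improvement in decode throughput (0.56 to 2.93 tokens/s) and about a 4.3x reduction in end-to-end latency (446.8 s to 104.0 s) on this model and CPU.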
Force-pushed 6bb3b44 to 34b4466
Force-pushed 34b4466 to e09b152
Also cc @HongHongHongL
python/tvm/dlight/cpu/gemv.py (outdated excerpt)

        return buffer_store.value.b


    def is_gemv(sch: tir.Schedule, block_info: BlockInfo) -> Optional[List[tir.Buffer]]:
Can we reuse gpu's util functions?
I'm saying that we could create a folder named something like "analysis" or "utils" under the dlight folder, shared by the different backends.
I agree this is a good suggestion; dlight.analysis sounds right.
Hi @Hzfengsy, I've created an analysis folder so that the CPU and GPU backends reuse the shared GEMV logic. Could you recheck it? Thanks.
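As a point of reference, here is a deliberately simplified sketch of what a shared GEMV detector along the lines of is_gemv might check. It takes a BlockRV directly instead of dlight's BlockInfo wrapper, and it illustrates the idea rather than reproducing the code in this PR:

    from typing import List, Optional

    from tvm import tir

    def is_gemv_sketch(
        sch: tir.Schedule, block: tir.schedule.BlockRV
    ) -> Optional[List[tir.Buffer]]:
        """Very rough GEMV check: spatial + reduction iterators, one write,
        and at least two read buffers (the matrix and the vector)."""
        stmt = sch.get(block)
        iter_types = {iv.iter_type for iv in stmt.iter_vars}
        if tir.IterVar.DataPar not in iter_types:
            return None
        if tir.IterVar.CommReduce not in iter_types:
            return None
        if len(stmt.writes) != 1 or len(stmt.reads) < 2:
            return None
        # Return the read buffers as the candidate GEMV operands.
        return [region.buffer for region in stmt.reads]

The real helper also inspects how each buffer is indexed to tell the matrix apart from the vector; that part is omitted here for brevity.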
python/tvm/dlight/cpu/gemv.py (outdated excerpt)

        return ret if 0 < len(ret) < len(block_stmt.reads) else None


    def normalize(  # pylint: disable=too-many-locals, use-a-generator
Maybe we can reuse this one as well
Force-pushed abc8ad4 to 33b406b
cc @Hzfengsy for another look
python/tvm/dlight/cpu/gemv.py (outdated excerpt)

        if not isinstance(func, tir.PrimFunc) or not self.is_target_available(target):
            return None
        sch = tir.Schedule(func)
        sch = tir.Schedule(func)
duplicated
python/tvm/dlight/cpu/utils.py (outdated excerpt)

        return loop.extent.value if isinstance(loop.extent, tir.IntImm) else loop.extent


    def auto_vectorize(sch: tir.Schedule, loop: tir.schedule.LoopRV, max_vec: int):
Is there any reason to keep another CPU copy? There is an identical file at python/tvm/dlight/gpu/utils.py.
Hi @Hzfengsy, I've removed the duplicated definition and file. Could you take a look? Thanks.
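For readers who did not open the file, here is a hedged sketch of what an auto_vectorize-style helper like the one referenced in the excerpt above typically does: split a loop by a capped vector length and vectorize the inner part. This is an illustration, not the dlight implementation:

    from tvm import tir

    def auto_vectorize_sketch(sch: tir.Schedule, loop: tir.schedule.LoopRV, max_vec: int) -> None:
        """Vectorize `loop` with a factor of at most `max_vec` when its extent is
        a compile-time constant; leave dynamically shaped loops untouched."""
        extent = sch.get(loop).extent
        if not isinstance(extent, tir.IntImm):
            return  # dynamic extent: skip vectorization in this sketch
        vec = min(max_vec, int(extent.value))
        if vec <= 1:
            return
        _, inner = sch.split(loop, factors=[None, vec])
        sch.vectorize(inner)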
Thanks @mengshyu!
Merged as [Dlight][CPU] Add CPU Backend Support for GEMV Optimization (#17663), squashing the following commits:
* [Dlight][CPU] Add CPU Backend Support for GEMV Optimization
* lint
* Add unit test
* Refactor analysis and scheduling utilities
* lint
* Fix duplicated schedule creation and utils.py