[SplitLogicalObjFifo] Fix split-logicalobjfifo pass to analyse unique producers/consumers ObjFifos #1060

Open
wants to merge 20 commits into
base: main
from

Conversation

Abhishek-Varma
Contributor

@Abhishek-Varma Abhishek-Varma commented Jan 27, 2025

To decide the split factor, we were basing the inference solely on the number of available columns.
Because of this, for a 4x8 array with the new pipeline, we were splitting the L2 buffers as follows :-

  1. LHS - 8 times.
  2. RHS - 8 times.
  3. OUT - 8 times.

As a result, the tiles being assigned were :-

  1. LHS : (0,0) -> (7,0)
  2. RHS : (0,0) -> (7,0)
  3. OUT : (0,0) -> (7,0)

This exhausts the available DMA channels. Refer to this thread for the discussion.

The L2 buffer split should be :-

  1. LHS : (0,0) -> (3,0)
  2. RHS : (0,0) -> (7,0)
  3. OUT : (0,0) -> (7,0)

This way, when the tiles are assigned later on, the expected number of tile assignments for LHS/RHS/OUT matches the number of corresponding L2 buffers.

This PR analyses the number of unique producer/consumer ObjFifos of the ObjFifo being split, so that the split factor no longer depends on the number of columns alone.

e2e CI tests for Matmul, both with and without ukernel, via the pack-peel-4-level-tiling pipeline targeting a 4x8 array on Strix have been added.

Signed-off-by: Abhishek Varma <abhvarma@amd.com>

@Abhishek-Varma Abhishek-Varma changed the title from "[DO NOT REVIEW] Fix split-logicalobjfifo pass to analyse unique L2<->L1 DMAs" to "[SplitLogicalObjFifo] Fix split-logicalobjfifo pass to analyse unique L2<->L1 DMAs" Jan 27, 2025
@Abhishek-Varma Abhishek-Varma marked this pull request as ready for review January 27, 2025 17:37
@Abhishek-Varma Abhishek-Varma force-pushed the avarma_fix_split_lof_for_new_pipeline branch from def3a83 to 0904a94 Compare January 29, 2025 07:01
@Abhishek-Varma Abhishek-Varma requested a review from jtuyls January 29, 2025 11:06
@Abhishek-Varma Abhishek-Varma force-pushed the avarma_fix_split_lof_for_new_pipeline branch from 26ef401 to 1703271 Compare January 29, 2025 11:33
@Abhishek-Varma Abhishek-Varma requested a review from jtuyls January 29, 2025 14:48
@Abhishek-Varma Abhishek-Varma force-pushed the avarma_fix_split_lof_for_new_pipeline branch from 531ee50 to da1c62d Compare January 29, 2025 14:49
@Abhishek-Varma Abhishek-Varma force-pushed the avarma_fix_split_lof_for_new_pipeline branch from 2e57be9 to 8c3f762 Compare January 29, 2025 15:54
@Abhishek-Varma Abhishek-Varma requested a review from jtuyls January 29, 2025 15:54
Contributor

@yzhang93 yzhang93 left a comment

This revision is much more concise and cleaner. I have more comments on the test.

Comment on lines +526 to +527
#executable_target_amdaie_pdi_fb = #hal.executable.target<"amd-aie", "amdaie-pdi-fb", {num_cols = 8 : i32, num_rows = 4 : i32, target_device = "npu4", ukernels = "none"}>
#translation = #iree_codegen.translation_info<pipeline = Custom>
Contributor

I don't understand why this test works: your description says num_cols=2 and num_rows=1, while here it's num_cols=8 and num_rows=4.

Contributor Author

That's the intention: how much to split depends not only on the number of available columns but also on the number of unique L2<->L1 pairs, so we use the gcd of the two (and in some cases we might need to take the gcd of the source/target sizes).

"to keep the test case concise it demonstrates a similar splitting strategy for 1 row and 2 columns." means that the actual compute takes place on 1 row and 2 columns. I could change the target to num_cols=2 and num_rows=1, but that'd defeat the purpose.

I've updated the comments.
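
The gcd-based rule described in this reply can be sketched as follows. This is only an illustration under stated assumptions, not the code in the pass: `inferSplitFactor` is a hypothetical helper name, and the endpoint counts used in the usage note are assumed values chosen to reproduce the split listed in the PR description.

```cpp
#include <cstdint>
#include <numeric>  // std::gcd (C++17)

// Hypothetical helper, not the pass's actual API.
// Picks a split factor that evenly divides both the number of available
// columns and the number of unique producer/consumer ObjFifos attached to
// the ObjFifo being split.
int64_t inferSplitFactor(int64_t numCols, int64_t numUniqueEndpoints) {
  return std::gcd(numCols, numUniqueEndpoints);
}
```

With 8 available columns, `inferSplitFactor(8, 4)` gives 4 and `inferSplitFactor(8, 8)` gives 8. If LHS is assumed to have 4 unique endpoints and RHS/OUT 8 each, that matches the intended LHS/RHS/OUT split of 4/8/8. With only 2 unique endpoints, as in the concise test case, `inferSplitFactor(8, 2)` gives 2 even though the target declares num_cols = 8.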

@Abhishek-Varma Abhishek-Varma force-pushed the avarma_fix_split_lof_for_new_pipeline branch from 3594447 to 42868e6 Compare January 30, 2025 06:30
@Abhishek-Varma Abhishek-Varma requested a review from jtuyls January 30, 2025 10:23
Collaborator

@jtuyls jtuyls left a comment

LGTM, just one nit and also update the PR title to producers/consumers instead of L1/L2.

…SplitLogicalObjFifos.cpp

Co-authored-by: Jorn Tuyls <jtuyls@users.noreply.github.com>
@Abhishek-Varma Abhishek-Varma changed the title from "[SplitLogicalObjFifo] Fix split-logicalobjfifo pass to analyse unique L2<->L1 DMAs" to "[SplitLogicalObjFifo] Fix split-logicalobjfifo pass to analyse unique producers/consumers DMAs" Jan 30, 2025
@Abhishek-Varma Abhishek-Varma changed the title from "[SplitLogicalObjFifo] Fix split-logicalobjfifo pass to analyse unique producers/consumers DMAs" to "[SplitLogicalObjFifo] Fix split-logicalobjfifo pass to analyse unique producers/consumers ObjFifos" Jan 30, 2025

3 participants