@LeiWang1999
Member

CPU support added a skip-thread-binding flag to the LayoutInference pass. The flag is currently derived from buffer allocations to detect whether the target is a CUDA-like device, which is inefficient and may introduce bugs.

This pull request changes format.sh and src/transform/layout_inference.cc: it removes clang-tidy support and refactors the layout inference logic. The most important changes are:

Removal of clang-tidy support:

  • format.sh: Removed the entire section related to clang-tidy checks, including the installation check, function definitions, and script logic for running clang-tidy on files.

Refactoring layout inference logic:

  • src/transform/layout_inference.cc: Replaced the AllocateCollector class with ThreadBindingCollector to focus on collecting thread bindings instead of memory allocations. Removed functions related to shared memory and local fragment checks.
  • src/transform/layout_inference.cc: Updated the LayoutInference function to use ThreadBindingCollector and simplified the logic to determine if thread partitioning should be skipped based on the presence of thread bindings.
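The idea behind the refactor can be illustrated with a minimal, self-contained sketch. It does not use the real TVM or tilelang classes: `Stmt`, `MakeStmt`, and `ShouldSkipThreadPartition` are hypothetical stand-ins, and the collector below only mirrors the name `ThreadBindingCollector` from this PR, not its actual implementation. The assumption is that a thread binding appears in the IR as an attribute statement keyed `"thread_extent"`, and that thread partitioning is skipped when no such binding is found.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Toy stand-in for an IR statement (NOT the real TVM/tilelang node types).
struct Stmt {
  std::string attr_key;    // "thread_extent" marks a thread binding; empty otherwise
  std::string thread_tag;  // e.g. "threadIdx.x"; only meaningful for thread bindings
  std::vector<std::shared_ptr<Stmt>> body;  // nested statements
};

// Hypothetical helper for building toy statements.
std::shared_ptr<Stmt> MakeStmt(std::string key, std::string tag,
                               std::vector<std::shared_ptr<Stmt>> body = {}) {
  auto stmt = std::make_shared<Stmt>();
  stmt->attr_key = std::move(key);
  stmt->thread_tag = std::move(tag);
  stmt->body = std::move(body);
  return stmt;
}

// Sketch of a collector that records thread bindings instead of allocations.
class ThreadBindingCollector {
 public:
  // Recursively walk the statement tree and record every thread binding tag.
  void Visit(const Stmt& stmt) {
    if (stmt.attr_key == "thread_extent") {
      assert(!stmt.thread_tag.empty());  // a binding must name its thread axis
      thread_tags_.insert(stmt.thread_tag);
    }
    for (const auto& child : stmt.body) {
      Visit(*child);
    }
  }

  const std::unordered_set<std::string>& thread_tags() const { return thread_tags_; }

 private:
  std::unordered_set<std::string> thread_tags_;
};

// Assumed decision rule: with no thread bindings (e.g. a CPU kernel),
// thread partitioning is skipped.
bool ShouldSkipThreadPartition(const Stmt& root) {
  ThreadBindingCollector collector;
  collector.Visit(root);
  return collector.thread_tags().empty();
}
```

In this sketch, a tree containing a `"thread_extent"` node tagged `threadIdx.x` makes `ShouldSkipThreadPartition` return `false`, while a tree with no such node returns `true`. The decision depends only on the bindings actually present in the IR, not on which memory scopes the buffers happen to use.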

@LeiWang1999 LeiWang1999 merged commit 579c3a4 into main Jan 23, 2025
5 checks passed
@LeiWang1999 LeiWang1999 deleted the fix_cuda_thread_binding branch February 12, 2025 06:11