Fix Device Consistency in Autotuner Threads and Add Manual Profiler Check #481
#PR: Fix Device Consistency in Autotuner Threads and Add Manual Profiler Check
Summary
This pull request makes two improvements, one to tilelang.autotuner and one to the profiler:
Problem 1: Device Inconsistency in Autotuner
When using ThreadPoolExecutor for parallel compilation in the autotuner, each worker thread might use a different CUDA device than the main thread. This inconsistency can lead to:
Tensors created by tilelang.utils.tensor.get_tensor_supply in each worker thread are placed on cuda:0 instead of on the main thread's torch.cuda.current_device().
Solution 1: Thread Device Synchronization
I've modified the autotuner to explicitly set the CUDA device in each worker thread to match the main thread:
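A minimal sketch of this pattern, not the actual tilelang code: the helper name `compile_all` and its parameters `compile_fn`, `get_device`, and `set_device` are hypothetical and introduced only for illustration. The idea is to read the device once in the main thread and set it again at the start of every worker task, since the current CUDA device is per-thread state.

```python
from concurrent.futures import ThreadPoolExecutor


def compile_all(configs, compile_fn, get_device, set_device, max_workers=4):
    """Run compile_fn over configs in worker threads pinned to the caller's device.

    All names here are illustrative assumptions, not tilelang's API.
    """
    # Capture the device once, in the calling (main) thread, before any worker starts.
    main_device = get_device()

    def worker(config):
        # A fresh worker thread starts on the default device, so set the
        # captured device explicitly before doing any device-dependent work.
        set_device(main_device)
        return compile_fn(config)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as configs.
        return list(pool.map(worker, configs))
```

With PyTorch, one would presumably pass `torch.cuda.current_device` as `get_device` and `torch.cuda.set_device` as `set_device`. Injecting the two callables keeps the sketch independent of any GPU library.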
Problem 2: Limited Profiler Debugging Options
The current profiler offers no direct way to inspect results manually, which makes it difficult to compare ref_out and lib_out by hand.
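As an illustration of the kind of manual comparison meant here, the sketch below summarizes the difference between two outputs. The function name `summarize_diff` and the `atol` parameter are hypothetical, and plain Python sequences stand in for the tensors (with PyTorch, one could call `.flatten().tolist()` on each output first).

```python
def summarize_diff(lib_out, ref_out, atol=1e-2):
    """Return (max absolute difference, count of elements differing by more than atol).

    Illustrative helper only; it is not part of the tilelang profiler.
    """
    if len(lib_out) != len(ref_out):
        raise ValueError("outputs must have the same number of elements")
    # Element-wise absolute differences between kernel output and reference.
    diffs = [abs(a - b) for a, b in zip(lib_out, ref_out)]
    max_diff = max(diffs, default=0.0)
    # Count how many elements fall outside the absolute tolerance.
    mismatches = sum(d > atol for d in diffs)
    return max_diff, mismatches
```

Reporting the largest deviation together with the number of mismatching elements helps distinguish a systematic error from a handful of outliers.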
Solution 2: Manual Profiler Check
I've added a new manual_check_prog feature to the profiler that lets developers check the difference between ref_out and lib_out manually. This gives developers finer control over the inspection process.
Implementation Details
Device Consistency Changes (tilelang/autotuner/__init__.py):
Manual Profiler Check (tilelang/profiler/__init__.py):