[Language] Support T.annotate_l2_hit_ratio via cudaStreamSetAttribute (#539)
* Refactor OptimizeForTarget function by removing redundant buffer allocation step and cleaning up code
* Removed the PlanAndUpdateBufferAllocationLocation step from the OptimizeForTarget function to streamline the optimization process.
* Cleaned up unnecessary whitespace in the function for improved readability.
* Enhanced the overall clarity and maintainability of the code.
* Refactor AllocateNode handling in vectorize_loop.cc
* Simplified the VisitStmt_ method for AllocateNode by removing the complex extent mutation logic.
* Streamlined the allocation process to directly call the base class method, enhancing code clarity and maintainability.
* Improved overall readability by eliminating unnecessary comments and code related to extent handling.
* Remove `tl_kernel.c`, which held the unused backward kernel implementation and its error handling functions.
* Add buffer allocation planning step in OptimizeForTarget function
* Introduced the PlanAndUpdateBufferAllocationLocation step to the OptimizeForTarget function, enhancing the optimization process.
* This addition improves the overall efficiency of buffer allocation during the target optimization phase, ensuring better resource management.
* Update submodule TVM to latest commit db50d4e, ensuring alignment with upstream changes.
* Add L2 persistent annotation support and related functionality
* Introduced a new file `lower_l2_persistent_annotation.cc` to handle the lowering of L2 persistent annotations.
* Added functions to annotate L2 hit ratios for buffers, ensuring compatibility with global buffer requirements.
* Updated the `LowerAndLegalize` function to include the new L2 persistent map lowering step.
* Enhanced CUDA driver with a function to retrieve the maximum size of the persisting L2 cache.
* Modified the `TLCUDASourceWrapper` class to integrate L2 persistent map handling during kernel launches.
Together, these changes let a kernel annotate buffers with an L2 hit ratio, which the generated launch code applies through `cudaStreamSetAttribute`.
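As a rough illustration of the launch-time handling described above, the sketch below shows how a Python source wrapper might emit the CUDA host code that sets an access policy window on a stream. The helper `emit_l2_persistent_setup` and its parameters are hypothetical and are not taken from this commit. Only the CUDA runtime names in the emitted text (`cudaStreamAttrValue`, `accessPolicyWindow`, `cudaAccessPropertyPersisting`, `cudaAccessPropertyStreaming`, `cudaStreamSetAttribute`, `cudaStreamAttributeAccessPolicyWindow`) are real API identifiers. Clamping the window to the maximum persisting L2 size is an assumption about how the queried limit would be used.

```python
def emit_l2_persistent_setup(buffer_name, hit_ratio, size_in_bytes,
                             max_persisting_l2_bytes, stream="stream"):
    """Return CUDA host code that marks a global buffer as L2-persisting.

    Hypothetical helper for illustration only; the wrapper in the actual
    commit may be structured differently.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be in [0, 1]")
    if size_in_bytes <= 0:
        raise ValueError("size_in_bytes must be positive")

    # Assumption: the window is clamped to the device's maximum
    # persisting L2 cache size, as reported by the driver query.
    num_bytes = min(size_in_bytes, max_persisting_l2_bytes)

    lines = [
        "cudaStreamAttrValue stream_attribute;",
        f"stream_attribute.accessPolicyWindow.base_ptr = (void*)({buffer_name});",
        f"stream_attribute.accessPolicyWindow.num_bytes = {num_bytes};",
        f"stream_attribute.accessPolicyWindow.hitRatio = {float(hit_ratio)!r}f;",
        # Accesses that hit the window persist; the rest are streamed.
        "stream_attribute.accessPolicyWindow.hitProp = cudaAccessPropertyPersisting;",
        "stream_attribute.accessPolicyWindow.missProp = cudaAccessPropertyStreaming;",
        f"cudaStreamSetAttribute({stream}, "
        "cudaStreamAttributeAccessPolicyWindow, &stream_attribute);",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # 1 MiB buffer "A", 80% hit ratio, 512 KiB persisting L2 limit.
    print(emit_l2_persistent_setup("A", 0.8, 1 << 20, 1 << 19))
```

The emitted snippet would sit before the kernel launch in the generated host code. A real wrapper would presumably also reset the stream attribute after the launch so later kernels are unaffected.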
* lint fix