Skip to content

Conversation

@LeiWang1999
Copy link
Member

@LeiWang1999 LeiWang1999 commented Nov 22, 2025

This pull request refactors the kernel caching logic in tilelang/autotuner/param.py to improve atomicity, consistency, and backend support for saving and loading compiled kernels and related artifacts. It introduces safe file writing methods, aligns file naming conventions with the cache system, and enhances support for multiple execution backends. These changes help prevent data corruption, simplify cache management, and make backend-specific handling more robust.

Atomic and Safe File Operations

  • Added _safe_write_file and _safe_write_executable static methods to ensure atomic file writes, preventing partial writes and improving reliability during concurrent operations.
  • Refactored all file write operations in save_to_disk and kernel save/load routines to use these atomic methods for configs, functions, latencies, and parameters. [1] [2]

Consistent File Naming and Backend Support

  • Standardized file names for kernel artifacts (device_kernel.cu, host_kernel.cu, executable.so, etc.) to match the conventions in cache/kernel_cache.py, and made kernel library file selection backend-aware. [1] [2] [3]
  • Improved logic for saving and loading backend-specific files (e.g., NVRTC's .cubin and .py, TVM FFI's .so), including correct handling of host/device sources and executable export. [1] [2]

Kernel Load/Save API Improvements

  • Updated _load_kernel_from_disk and related calls to support new file naming, backend resolution, and compile flags, ensuring correct reconstruction of JITKernel objects from disk. [1] [2] [3]
  • Added normalization and resolution of target/backend in load_from_disk for robust handling of different kernel compilation scenarios.

These changes make the autotuner's disk caching more reliable, backend-aware, and maintainable.

Summary by CodeRabbit

  • Bug Fixes

    • Improved kernel persistence with atomic write operations to prevent data corruption.
    • Enhanced loading logic with target normalization and backend resolution for more reliable kernel recovery.
  • Improvements

    • Strengthened support for multiple execution backends with consistent artifact management.

✏️ Tip: You can customize this high-level summary in your review settings.

@github-actions
Copy link

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 22, 2025

Walkthrough

The pull request refactors kernel persistence in the autotuner by introducing file-path constants, atomic write utilities, and reworked save/load logic to separately manage device and host kernels with backend-specific handling for nvrtc and tvm_ffi.

Changes

Cohort / File(s) Summary
File-path constants & atomic I/O utilities
tilelang/autotuner/param.py
Added file-path constants (DEVICE_KERNEL_PATH, HOST_KERNEL_PATH, EXECUTABLE_PATH, KERNEL_CUBIN_PATH, KERNEL_PY_PATH) and static methods (_load_binary, _safe_write_file, _safe_write_executable) for safe kernel artifact management.
Kernel persistence refactoring
tilelang/autotuner/param.py
Reworked save_to_disk and load_from_disk methods to use atomic writes, handle device/host kernel sources separately, resolve execution backend, and normalize target. Extended _load_kernel_from_disk signature with compile_flags parameter; added backend-specific pathways for nvrtc (.cubin + .py helper) and tvm_ffi (executable).

Sequence Diagram

sequenceDiagram
    participant User
    participant AutotuneResult
    participant FileSystem
    participant Backend

    rect rgb(230, 245, 255)
    Note over AutotuneResult,FileSystem: Save Flow (New)
    User->>AutotuneResult: save_to_disk()
    AutotuneResult->>AutotuneResult: _safe_write_file() for best_config
    AutotuneResult->>AutotuneResult: _safe_write_file() for function & latency
    alt nvrtc backend
        AutotuneResult->>AutotuneResult: _safe_write_file() for device_kernel.cu
        AutotuneResult->>AutotuneResult: _safe_write_file() for host_kernel.cu
        AutotuneResult->>FileSystem: Save kernel.cubin (binary)
        AutotuneResult->>FileSystem: Save kernel.py (helper)
    else tvm_ffi backend
        AutotuneResult->>AutotuneResult: _safe_write_file() for kernel sources
        AutotuneResult->>AutotuneResult: _safe_write_executable() for executable.so
    end
    AutotuneResult->>AutotuneResult: _safe_write_file() for parameters
    end

    rect rgb(245, 230, 255)
    Note over User,Backend: Load Flow (New)
    User->>AutotuneResult: load_from_disk()
    AutotuneResult->>AutotuneResult: Normalize target
    AutotuneResult->>Backend: Resolve execution backend
    AutotuneResult->>AutotuneResult: _load_kernel_from_disk(backend, target, compile_flags)
    AutotuneResult->>FileSystem: Load device_kernel.cu + host_kernel.cu
    AutotuneResult->>FileSystem: Load parameters
    alt nvrtc backend
        AutotuneResult->>FileSystem: Load kernel.cubin
    else tvm_ffi backend
        AutotuneResult->>FileSystem: Load executable.so
    end
    AutotuneResult-->>User: Return AutotuneResult instance
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Atomic write utilities: Verify correctness of _safe_write_file and _safe_write_executable implementations and their proper use in save pathways.
  • Backend-specific kernel persistence: Confirm that nvrtc and tvm_ffi pathways correctly differentiate between device/host kernel sources and produce expected artifact files.
  • Target normalization & backend resolution: Validate that load_from_disk correctly normalizes target and resolves the execution backend before kernel loading.
  • Method signature extensions: Ensure compile_flags parameter is properly threaded through _load_kernel_from_disk call chain and that all call sites are updated.

Poem

🐰 Kernels saved with atomic care,
Device and host now form a pair,
Backend paths diverge with grace,
nvrtc's cubin finds its place,
Safe writes guard each precious byte—
Persistence done just right! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Title check ❓ Inconclusive The title '[Bugfix] Fix autotune cache' is vague and generic, using the term 'Fix autotune cache' without explaining the specific nature of the fix (atomicity, file naming standardization, backend support, etc.). Consider a more specific title that describes the main technical improvement, such as '[Refactor] Improve kernel cache atomicity and backend support' or '[Bugfix] Add atomic writes and standardize kernel artifact file naming'.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
tilelang/autotuner/param.py (1)

302-316: Consider checking source file existence upfront.

The existence check at line 315 only verifies kernel_lib_path and params_path, but line 349 requires both device_kernel_source and host_kernel_source to be loaded. If source files are missing, the method will silently return None even though the kernel library exists.

Consider adding source files to the existence check for clearer early detection:

-        if not all([os.path.exists(file) for file in (kernel_lib_path, params_path)]):
+        required_files = [device_kernel_path, host_kernel_path, kernel_lib_path, params_path]
+        if not all([os.path.exists(file) for file in required_files]):
+            if verbose:
+                missing = [f for f in required_files if not os.path.exists(f)]
+                logger.debug(f"Cannot load kernel from {cache_path}: missing files {missing}")
             return None
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 470eb74 and cb3894a.

📒 Files selected for processing (1)
  • tilelang/autotuner/param.py (9 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tilelang/autotuner/param.py (4)
tilelang/cache/kernel_cache.py (3)
  • _load_binary (244-247)
  • _safe_write_file (250-257)
  • _safe_write_executable (260-263)
tilelang/jit/kernel.py (5)
  • export_library (613-630)
  • kernel_source (606-607)
  • params (602-603)
  • from_database (143-183)
  • out_idx (598-599)
tilelang/utils/target.py (1)
  • determine_target (62-123)
tilelang/jit/execution_backend.py (1)
  • resolve_execution_backend (62-100)
🪛 Ruff (0.14.5)
tilelang/autotuner/param.py

204-204: Do not catch blind exception: Exception

(BLE001)


219-219: Do not catch blind exception: Exception

(BLE001)


328-328: Do not catch blind exception: Exception

(BLE001)


337-337: Do not catch blind exception: Exception

(BLE001)


346-346: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Quick Lint
🔇 Additional comments (12)
tilelang/autotuner/param.py (12)

27-34: LGTM! Constants properly aligned with cache layer.

The new file path constants standardize kernel artifact naming across the codebase and provide clear backend-specific identifiers.


153-157: LGTM! Binary loader implementation is correct.

The implementation matches the reference in kernel_cache.py and correctly reads binary files.


159-168: LGTM! Atomic write implementation is robust.

The use of temporary files with os.replace ensures atomic writes and prevents partial file corruption during concurrent operations. The explicit makedirs for the temp directory adds good defensive programming.


170-176: LGTM! Executable export implementation is correct.

The method safely exports TVM executables using atomic file operations, correctly matching the pattern established in kernel_cache.py.


196-220: LGTM! Device and host kernel source saving is well-implemented.

The atomic writes and backend-specific host source selection (TVM FFI uses get_host_source(), others use get_kernel_source()) correctly align with the framework's architecture.


261-268: LGTM! Parameter serialization uses atomic writes.

The kernel parameters are now safely persisted using atomic write operations.


270-278: LGTM! Signature correctly extended with compile_flags.

The compile_flags parameter addition aligns with the JITKernel.from_database signature and enables proper kernel reconstruction from cache.


318-348: LGTM! Source loading correctly handles device and host kernels.

The separate loading of device and host kernel sources with appropriate error handling enables proper reconstruction of backend-specific kernels.


349-364: LGTM! Kernel reconstruction correctly requires all artifacts.

The strict check ensuring both source files and parameters are loaded is appropriate, as JITKernel.from_database requires all these inputs.


366-395: LGTM! All config and metadata writes are now atomic.

Converting all file writes in save_to_disk to use _safe_write_file prevents partial writes and improves reliability during concurrent autotuning operations.


403-409: LGTM! Target normalization and backend resolution are correct.

The explicit normalization of target and resolution of execution backend ensure kernels are loaded with the correct backend configuration.


437-437: Verify whether compile_flags should be persisted to cache.

Line 437 passes None for compile_flags with a comment noting they are "not tracked here". If compile_flags affect kernel compilation behavior, not persisting them could cause kernels loaded from cache to behave differently than originally compiled.

Run this script to check if compile_flags are used during kernel compilation:

If compile_flags significantly affect kernel behavior, consider persisting them alongside other kernel metadata.

Comment on lines +222 to +257
# Save kernel library (backend-specific)
try:
kernel_lib_path = os.path.join(cache_path, KERNEL_LIB_PATH)
src_lib_path = kernel.adapter.libpath
if verbose:
logger.debug(f"Saving kernel library to file: {kernel_lib_path}")
shutil.copy(src_lib_path, kernel_lib_path)
if kernel.execution_backend == "nvrtc":
kernel_lib_file = KERNEL_CUBIN_PATH
elif kernel.execution_backend == "tvm_ffi":
kernel_lib_file = EXECUTABLE_PATH
else:
kernel_lib_file = KERNEL_LIB_PATH

kernel_lib_path = os.path.join(cache_path, kernel_lib_file)

if kernel.execution_backend == "nvrtc":
# Save cubin and python helper file
src_lib_path = kernel.adapter.libpath
kernel_py_path = os.path.join(cache_path, KERNEL_PY_PATH)
py_src_path = src_lib_path.replace(".cubin", ".py")
if verbose:
logger.debug(f"Saving kernel nvrtc python code to file: {kernel_py_path}")
self._safe_write_file(kernel_py_path, "wb",
lambda f: f.write(self._load_binary(py_src_path)))
if verbose:
logger.debug(f"Saving kernel library to file: {kernel_lib_path}")
self._safe_write_file(kernel_lib_path, "wb",
lambda f: f.write(self._load_binary(src_lib_path)))
elif kernel.execution_backend == "tvm_ffi":
executable = kernel.adapter.executable
if verbose:
logger.debug(f"Saving kernel executable to file: {kernel_lib_path}")
self._safe_write_executable(executable, kernel_lib_path)
else:
src_lib_path = kernel.adapter.libpath
if verbose:
logger.debug(f"Saving kernel library to file: {kernel_lib_path}")
self._safe_write_file(kernel_lib_path, "wb",
lambda f: f.write(self._load_binary(src_lib_path)))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Verify nvrtc Python helper file exists before reading.

The nvrtc path derives the Python helper file path using string replacement:

py_src_path = src_lib_path.replace(".cubin", ".py")

Two concerns:

  1. String replacement could match .cubin anywhere in the path, not just the extension
  2. No verification that py_src_path exists before calling _load_binary(py_src_path) at line 241

Consider this more robust approach:

-                src_lib_path = kernel.adapter.libpath
-                kernel_py_path = os.path.join(cache_path, KERNEL_PY_PATH)
-                py_src_path = src_lib_path.replace(".cubin", ".py")
+                src_lib_path = kernel.adapter.libpath
+                kernel_py_path = os.path.join(cache_path, KERNEL_PY_PATH)
+                # Use splitext + join to avoid matching .cubin in directory names
+                base_path = os.path.splitext(src_lib_path)[0]
+                py_src_path = base_path + ".py"
+                
+                # Verify py file exists before reading
+                if not os.path.exists(py_src_path):
+                    logger.warning(f"Python helper file not found: {py_src_path}")
+                else:

Comment on lines +429 to +439
kernel = cls._load_kernel_from_disk(
cls,
path,
norm_target,
compile_args.target_host,
compile_args.out_idx,
resolved_backend,
compile_args.pass_configs,
None, # compile_flags not tracked here
func,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Clarify the method invocation pattern for _load_kernel_from_disk.

At line 430, cls is passed as the first argument to cls._load_kernel_from_disk(cls, ...). However, _load_kernel_from_disk is defined as an instance method with self as the first parameter (line 270). This works because the method doesn't actually use any instance state, but it creates confusion.

Consider converting _load_kernel_from_disk to a static method for clarity:

+    @staticmethod
     def _load_kernel_from_disk(
-        self,
         cache_path: Path,
         target: str | Target = "auto",

Then call it as:

-        kernel = cls._load_kernel_from_disk(
-            cls,
+        kernel = AutotuneResult._load_kernel_from_disk(
             path,

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In tilelang/autotuner/param.py around lines 429-439, the call
cls._load_kernel_from_disk(cls, ...) passes cls as the first argument to a
method currently defined as an instance method (self) which is confusing;
convert _load_kernel_from_disk to a staticmethod (add @staticmethod above its
definition and remove the self parameter or rename it to match actual args) so
it does not expect an instance, then update the call site to
cls._load_kernel_from_disk(path, norm_target, compile_args.target_host,
compile_args.out_idx, resolved_backend, compile_args.pass_configs, None, func)
(i.e., stop passing cls as the first arg) and ensure any internal uses that
referenced self are adjusted to use explicit parameters or class-level
references as needed.

@LeiWang1999 LeiWang1999 merged commit 721baed into tile-ai:main Nov 22, 2025
7 checks passed
@SiriusNEO SiriusNEO mentioned this pull request Nov 24, 2025
2 tasks
RubiaCx pushed a commit to RubiaCx/tilelang that referenced this pull request Nov 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant