[Bugfix] Fix autotune cache #1315

LeiWang1999 · 2025-11-22T10:44:41Z

This pull request refactors the kernel caching logic in tilelang/autotuner/param.py to improve atomicity, consistency, and backend support for saving and loading compiled kernels and related artifacts. It introduces safe file writing methods, aligns file naming conventions with the cache system, and enhances support for multiple execution backends. These changes help prevent data corruption, simplify cache management, and make backend-specific handling more robust.

Atomic and Safe File Operations

Added _safe_write_file and _safe_write_executable static methods to ensure atomic file writes, preventing partial writes and improving reliability during concurrent operations.
Refactored all file write operations in save_to_disk and kernel save/load routines to use these atomic methods for configs, functions, latencies, and parameters. [1] [2]

Consistent File Naming and Backend Support

Standardized file names for kernel artifacts (device_kernel.cu, host_kernel.cu, executable.so, etc.) to match the conventions in cache/kernel_cache.py, and made kernel library file selection backend-aware. [1] [2] [3]
Improved logic for saving and loading backend-specific files (e.g., NVRTC's .cubin and .py, TVM FFI's .so), including correct handling of host/device sources and executable export. [1] [2]

Kernel Load/Save API Improvements

Updated _load_kernel_from_disk and related calls to support new file naming, backend resolution, and compile flags, ensuring correct reconstruction of JITKernel objects from disk. [1] [2] [3]
Added normalization and resolution of target/backend in load_from_disk for robust handling of different kernel compilation scenarios.

These changes make the autotuner's disk caching more reliable, backend-aware, and maintainable.

Summary by CodeRabbit

Bug Fixes
- Improved kernel persistence with atomic write operations to prevent data corruption.
- Enhanced loading logic with target normalization and backend resolution for more reliable kernel recovery.
Improvements
- Strengthened support for multiple execution backends with consistent artifact management.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

github-actions · 2025-11-22T10:44:50Z

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

coderabbitai · 2025-11-22T10:44:51Z

Walkthrough

The pull request refactors kernel persistence in the autotuner by introducing file-path constants, atomic write utilities, and reworked save/load logic to separately manage device and host kernels with backend-specific handling for nvrtc and tvm_ffi.

Changes

Cohort / File(s)	Summary
File-path constants & atomic I/O utilities `tilelang/autotuner/param.py`	Added file-path constants (DEVICE_KERNEL_PATH, HOST_KERNEL_PATH, EXECUTABLE_PATH, KERNEL_CUBIN_PATH, KERNEL_PY_PATH) and static methods (_load_binary, _safe_write_file, _safe_write_executable) for safe kernel artifact management.
Kernel persistence refactoring `tilelang/autotuner/param.py`	Reworked save_to_disk and load_from_disk methods to use atomic writes, handle device/host kernel sources separately, resolve execution backend, and normalize target. Extended _load_kernel_from_disk signature with compile_flags parameter; added backend-specific pathways for nvrtc (.cubin + .py helper) and tvm_ffi (executable).

Sequence Diagram

sequenceDiagram
    participant User
    participant AutotuneResult
    participant FileSystem
    participant Backend

    rect rgb(230, 245, 255)
    Note over AutotuneResult,FileSystem: Save Flow (New)
    User->>AutotuneResult: save_to_disk()
    AutotuneResult->>AutotuneResult: _safe_write_file() for best_config
    AutotuneResult->>AutotuneResult: _safe_write_file() for function & latency
    alt nvrtc backend
        AutotuneResult->>AutotuneResult: _safe_write_file() for device_kernel.cu
        AutotuneResult->>AutotuneResult: _safe_write_file() for host_kernel.cu
        AutotuneResult->>FileSystem: Save kernel.cubin (binary)
        AutotuneResult->>FileSystem: Save kernel.py (helper)
    else tvm_ffi backend
        AutotuneResult->>AutotuneResult: _safe_write_file() for kernel sources
        AutotuneResult->>AutotuneResult: _safe_write_executable() for executable.so
    end
    AutotuneResult->>AutotuneResult: _safe_write_file() for parameters
    end

    rect rgb(245, 230, 255)
    Note over User,Backend: Load Flow (New)
    User->>AutotuneResult: load_from_disk()
    AutotuneResult->>AutotuneResult: Normalize target
    AutotuneResult->>Backend: Resolve execution backend
    AutotuneResult->>AutotuneResult: _load_kernel_from_disk(backend, target, compile_flags)
    AutotuneResult->>FileSystem: Load device_kernel.cu + host_kernel.cu
    AutotuneResult->>FileSystem: Load parameters
    alt nvrtc backend
        AutotuneResult->>FileSystem: Load kernel.cubin
    else tvm_ffi backend
        AutotuneResult->>FileSystem: Load executable.so
    end
    AutotuneResult-->>User: Return AutotuneResult instance
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Atomic write utilities: Verify correctness of _safe_write_file and _safe_write_executable implementations and their proper use in save pathways.
Backend-specific kernel persistence: Confirm that nvrtc and tvm_ffi pathways correctly differentiate between device/host kernel sources and produce expected artifact files.
Target normalization & backend resolution: Validate that load_from_disk correctly normalizes target and resolves the execution backend before kernel loading.
Method signature extensions: Ensure compile_flags parameter is properly threaded through _load_kernel_from_disk call chain and that all call sites are updated.

Poem

🐰 Kernels saved with atomic care,
Device and host now form a pair,
Backend paths diverge with grace,
nvrtc's cubin finds its place,
Safe writes guard each precious byte—
Persistence done just right! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Title check	❓ Inconclusive	The title '[Bugfix] Fix autotune cache' is vague and generic, using the term 'Fix autotune cache' without explaining the specific nature of the fix (atomicity, file naming standardization, backend support, etc.).	Consider a more specific title that describes the main technical improvement, such as '[Refactor] Improve kernel cache atomicity and backend support' or '[Bugfix] Add atomic writes and standardize kernel artifact file naming'.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

tilelang/autotuner/param.py (1)
302-316: Consider checking source file existence upfront.

The existence check at line 315 only verifies kernel_lib_path and params_path, but line 349 requires both device_kernel_source and host_kernel_source to be loaded. If source files are missing, the method will silently return None even though the kernel library exists.

Consider adding source files to the existence check for clearer early detection:
-        if not all([os.path.exists(file) for file in (kernel_lib_path, params_path)]):
+        required_files = [device_kernel_path, host_kernel_path, kernel_lib_path, params_path]
+        if not all([os.path.exists(file) for file in required_files]):
+            if verbose:
+                missing = [f for f in required_files if not os.path.exists(f)]
+                logger.debug(f"Cannot load kernel from {cache_path}: missing files {missing}")
             return None

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 470eb74 and cb3894a.

📒 Files selected for processing (1)

tilelang/autotuner/param.py (9 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

tilelang/autotuner/param.py (4)

tilelang/cache/kernel_cache.py (3)

_load_binary (244-247)

_safe_write_file (250-257)

_safe_write_executable (260-263)

tilelang/jit/kernel.py (5)

export_library (613-630)

kernel_source (606-607)

params (602-603)

from_database (143-183)

out_idx (598-599)

tilelang/utils/target.py (1)

determine_target (62-123)

tilelang/jit/execution_backend.py (1)

resolve_execution_backend (62-100)

🪛 Ruff (0.14.5)

tilelang/autotuner/param.py

204-204: Do not catch blind exception: Exception

(BLE001)

219-219: Do not catch blind exception: Exception

(BLE001)

328-328: Do not catch blind exception: Exception

(BLE001)

337-337: Do not catch blind exception: Exception

(BLE001)

346-346: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Quick Lint

🔇 Additional comments (12)

tilelang/autotuner/param.py (12)

27-34: LGTM! Constants properly aligned with cache layer.

The new file path constants standardize kernel artifact naming across the codebase and provide clear backend-specific identifiers.

153-157: LGTM! Binary loader implementation is correct.

The implementation matches the reference in kernel_cache.py and correctly reads binary files.

159-168: LGTM! Atomic write implementation is robust.

The use of temporary files with os.replace ensures atomic writes and prevents partial file corruption during concurrent operations. The explicit makedirs for the temp directory adds good defensive programming.

170-176: LGTM! Executable export implementation is correct.

The method safely exports TVM executables using atomic file operations, correctly matching the pattern established in kernel_cache.py.

196-220: LGTM! Device and host kernel source saving is well-implemented.

The atomic writes and backend-specific host source selection (TVM FFI uses get_host_source(), others use get_kernel_source()) correctly align with the framework's architecture.

261-268: LGTM! Parameter serialization uses atomic writes.

The kernel parameters are now safely persisted using atomic write operations.

270-278: LGTM! Signature correctly extended with compile_flags.

The compile_flags parameter addition aligns with the JITKernel.from_database signature and enables proper kernel reconstruction from cache.

318-348: LGTM! Source loading correctly handles device and host kernels.

The separate loading of device and host kernel sources with appropriate error handling enables proper reconstruction of backend-specific kernels.

349-364: LGTM! Kernel reconstruction correctly requires all artifacts.

The strict check ensuring both source files and parameters are loaded is appropriate, as JITKernel.from_database requires all these inputs.

366-395: LGTM! All config and metadata writes are now atomic.

Converting all file writes in save_to_disk to use _safe_write_file prevents partial writes and improves reliability during concurrent autotuning operations.

403-409: LGTM! Target normalization and backend resolution are correct.

The explicit normalization of target and resolution of execution backend ensure kernels are loaded with the correct backend configuration.

437-437: Verify whether compile_flags should be persisted to cache.

Line 437 passes None for compile_flags with a comment noting they are "not tracked here". If compile_flags affect kernel compilation behavior, not persisting them could cause kernels loaded from cache to behave differently than originally compiled.

Run this script to check if compile_flags are used during kernel compilation:

If compile_flags significantly affect kernel behavior, consider persisting them alongside other kernel metadata.

coderabbitai · 2025-11-22T10:50:04Z

tilelang/autotuner/param.py

+        # Save kernel library (backend-specific)
        try:
-            kernel_lib_path = os.path.join(cache_path, KERNEL_LIB_PATH)
-            src_lib_path = kernel.adapter.libpath
-            if verbose:
-                logger.debug(f"Saving kernel library to file: {kernel_lib_path}")
-            shutil.copy(src_lib_path, kernel_lib_path)
+            if kernel.execution_backend == "nvrtc":
+                kernel_lib_file = KERNEL_CUBIN_PATH
+            elif kernel.execution_backend == "tvm_ffi":
+                kernel_lib_file = EXECUTABLE_PATH
+            else:
+                kernel_lib_file = KERNEL_LIB_PATH
+
+            kernel_lib_path = os.path.join(cache_path, kernel_lib_file)
+
+            if kernel.execution_backend == "nvrtc":
+                # Save cubin and python helper file
+                src_lib_path = kernel.adapter.libpath
+                kernel_py_path = os.path.join(cache_path, KERNEL_PY_PATH)
+                py_src_path = src_lib_path.replace(".cubin", ".py")
+                if verbose:
+                    logger.debug(f"Saving kernel nvrtc python code to file: {kernel_py_path}")
+                self._safe_write_file(kernel_py_path, "wb",
+                                      lambda f: f.write(self._load_binary(py_src_path)))
+                if verbose:
+                    logger.debug(f"Saving kernel library to file: {kernel_lib_path}")
+                self._safe_write_file(kernel_lib_path, "wb",
+                                      lambda f: f.write(self._load_binary(src_lib_path)))
+            elif kernel.execution_backend == "tvm_ffi":
+                executable = kernel.adapter.executable
+                if verbose:
+                    logger.debug(f"Saving kernel executable to file: {kernel_lib_path}")
+                self._safe_write_executable(executable, kernel_lib_path)
+            else:
+                src_lib_path = kernel.adapter.libpath
+                if verbose:
+                    logger.debug(f"Saving kernel library to file: {kernel_lib_path}")
+                self._safe_write_file(kernel_lib_path, "wb",
+                                      lambda f: f.write(self._load_binary(src_lib_path)))
+


⚠️ Potential issue | 🟡 Minor

Verify nvrtc Python helper file exists before reading.

The nvrtc path derives the Python helper file path using string replacement:

py_src_path = src_lib_path.replace(".cubin", ".py")

Two concerns:

String replacement could match .cubin anywhere in the path, not just the extension

No verification that py_src_path exists before calling _load_binary(py_src_path) at line 241

Consider this more robust approach:

- src_lib_path = kernel.adapter.libpath - kernel_py_path = os.path.join(cache_path, KERNEL_PY_PATH) - py_src_path = src_lib_path.replace(".cubin", ".py") + src_lib_path = kernel.adapter.libpath + kernel_py_path = os.path.join(cache_path, KERNEL_PY_PATH) + # Use splitext + join to avoid matching .cubin in directory names + base_path = os.path.splitext(src_lib_path)[0] + py_src_path = base_path + ".py" + + # Verify py file exists before reading + if not os.path.exists(py_src_path): + logger.warning(f"Python helper file not found: {py_src_path}") + else:

coderabbitai · 2025-11-22T10:50:04Z

tilelang/autotuner/param.py

+        kernel = cls._load_kernel_from_disk(
+            cls,
+            path,
+            norm_target,
+            compile_args.target_host,
+            compile_args.out_idx,
+            resolved_backend,
+            compile_args.pass_configs,
+            None,  # compile_flags not tracked here
+            func,
+        )


🛠️ Refactor suggestion | 🟠 Major

Clarify the method invocation pattern for _load_kernel_from_disk.

At line 430, cls is passed as the first argument to cls._load_kernel_from_disk(cls, ...). However, _load_kernel_from_disk is defined as an instance method with self as the first parameter (line 270). This works because the method doesn't actually use any instance state, but it creates confusion.

Consider converting _load_kernel_from_disk to a static method for clarity:

+ @staticmethod def _load_kernel_from_disk( - self, cache_path: Path, target: str | Target = "auto",

Then call it as:

- kernel = cls._load_kernel_from_disk( - cls, + kernel = AutotuneResult._load_kernel_from_disk( path,

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents

In tilelang/autotuner/param.py around lines 429-439, the call cls._load_kernel_from_disk(cls, ...) passes cls as the first argument to a method currently defined as an instance method (self) which is confusing; convert _load_kernel_from_disk to a staticmethod (add @staticmethod above its definition and remove the self parameter or rename it to match actual args) so it does not expect an instance, then update the call site to cls._load_kernel_from_disk(path, norm_target, compile_args.target_host, compile_args.out_idx, resolved_backend, compile_args.pass_configs, None, func) (i.e., stop passing cls as the first arg) and ensure any internal uses that referenced self are adjusted to use explicit parameters or class-level references as needed.

fix autotume_cache

cb3894a

coderabbitai bot reviewed Nov 22, 2025

View reviewed changes

LeiWang1999 merged commit 721baed into tile-ai:main Nov 22, 2025
7 checks passed

SiriusNEO mentioned this pull request Nov 24, 2025

[BUG] Autotuning is broken #1313

Closed

2 tasks

RubiaCx pushed a commit to RubiaCx/tilelang that referenced this pull request Nov 24, 2025

[Bugfix] Fix autotune cache (tile-ai#1315)

6783c3d

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Bugfix] Fix autotune cache #1315

[Bugfix] Fix autotune cache #1315

Uh oh!

LeiWang1999 commented Nov 22, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

github-actions bot commented Nov 22, 2025

Uh oh!

coderabbitai bot commented Nov 22, 2025 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Nov 22, 2025

Uh oh!

coderabbitai bot Nov 22, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

[Bugfix] Fix autotune cache #1315

[Bugfix] Fix autotune cache #1315

Uh oh!

Conversation

LeiWang1999 commented Nov 22, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

github-actions bot commented Nov 22, 2025

Uh oh!

coderabbitai bot commented Nov 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Poem

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Nov 22, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Nov 22, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

LeiWang1999 commented Nov 22, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 22, 2025 •

edited

Loading