
Conversation


@yyttt6 yyttt6 commented Oct 12, 2025

Summary by CodeRabbit

  • Bug Fixes
    • Fixed GPU synchronization issues that could cause runtime interruptions during benchmarking on PyTorch versions older than 2.6.0.
  • Compatibility
    • Improved synchronization behavior to maintain correct operation across PyTorch/CUDA versions.
  • Performance/Profiling
    • More consistent and reliable benchmark timing across environments, improving result accuracy with no user action required.

@github-actions

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀


coderabbitai bot commented Oct 12, 2025

Walkthrough

Replaces a post-invocation synchronization call in tilelang/profiler/bench.py: torch.accelerator.synchronize() was changed to torch.cuda.synchronize().

Changes

  • Profiler synchronization change (tilelang/profiler/bench.py): Replace the torch.accelerator.synchronize() call with torch.cuda.synchronize() after the profiled function invocation.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant B as Bench
  participant T as PyTorch
  participant C as CUDA

  B->>T: Run profiled function
  Note right of T: Post-call synchronization
  T->>C: torch.cuda.synchronize()
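For context, the sketch below is not code from this PR and the function name is illustrative; it shows why a post-invocation synchronize matters for benchmark timing on a CUDA device: kernel launches are asynchronous, so without a final sync a host-side timer only measures launch overhead, not execution time.

import time
import torch

def naive_time(fn, warmup=3, iters=10):
    """Wall-clock timing of a GPU callable; the syncs keep the measurement honest."""
    for _ in range(warmup):
        fn()
    torch.cuda.synchronize()   # drain all queued work before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    torch.cuda.synchronize()   # wait for the queued kernels to actually finish
    return (time.perf_counter() - start) / iters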

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Poem

I twitch my nose at a tiny swap,
Accelerator out, CUDA hops on top.
One-line shuffle, timing kept neat—
Benchmarks hum with steady beat. 🐇✨

Pre-merge checks

✅ Passed checks (3 passed)
  • Description Check (Passed): check skipped because CodeRabbit’s high-level summary is enabled.
  • Docstring Coverage (Passed): docstring coverage is 100.00%, above the required threshold of 80.00%.
  • Title Check (Passed): the title clearly summarizes the primary change, indicating a bugfix that replaces torch.accelerator.synchronize() with torch.cuda.synchronize(), matching the diff and giving reviewers scanning the history sufficient context.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fc41463 and 08314c6.

📒 Files selected for processing (1)
  • tilelang/profiler/bench.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build-test-metal

from typing import Callable, List, Literal, Optional, Union

import torch
from packaging import version

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

Import looks good, but verify dependency.

The packaging library is commonly used for version parsing and the import is correct.

Run the following script to verify that packaging is listed as a dependency:


🏁 Script executed:

#!/bin/bash
# Check if packaging is listed in project dependencies
fd -t f 'requirements.*\.txt|setup\.py|setup\.cfg|pyproject\.toml' --exec grep -l 'packaging'

Length of output: 123


🏁 Script executed:

#!/bin/bash
grep -R "packaging" setup.py pyproject.toml requirements*.txt || true

Length of output: 224


Ensure device-agnostic synchronization and API compatibility

  • Line 161 still uses torch.cuda.synchronize() without a version or device check.
  • do_bench hardcodes "cuda" and uses torch.cuda.Event; adapt for MPS and other targets.
  • Confirm that torch.accelerator.synchronize() exists in your PyTorch ≥2.6.0 build.
🤖 Prompt for AI Agents
tilelang/profiler/bench.py lines 8-8: the file currently imports
packaging.version but later uses hardcoded torch.cuda synchronization and CUDA
events; update to be device-agnostic and API-safe by adding a small helper that
takes a torch.device (or device string) and synchronizes safely: first, if torch
has torch.accelerator.synchronize (check hasattr and ensure torch version ≥2.6
via packaging.version.parse(torch.__version__)), call it; else if device.type ==
"cuda" call torch.cuda.synchronize(); else if device.type == "mps" call
torch.mps.synchronize() only if available (hasattr(torch, "mps") and
hasattr(torch.mps, "synchronize")), otherwise no-op. Modify do_bench to accept a
device parameter (or infer torch.device(device)), use torch.cuda.Event only when
device.type == "cuda" and fall back to time.perf_counter for other devices while
surrounding timing with the helper synchronize calls; ensure all calls guarded
by hasattr/version checks to avoid AttributeError on builds without
accelerator/mps support.
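
A minimal sketch of the helper that prompt describes; _synchronize, the device parameter, and the module-level flag are illustrative names under the prompt's assumptions, not code from this PR:

from packaging import version
import torch

_HAS_ACCELERATOR_API = (
    version.parse(torch.__version__) >= version.parse("2.6.0")
    and hasattr(torch, "accelerator")
)

def _synchronize(device: torch.device) -> None:
    """Synchronize `device`, degrading gracefully on builds without torch.accelerator."""
    if _HAS_ACCELERATOR_API:
        torch.accelerator.synchronize()
    elif device.type == "cuda":
        torch.cuda.synchronize()
    elif device.type == "mps" and hasattr(torch, "mps") and hasattr(torch.mps, "synchronize"):
        torch.mps.synchronize()
    # any other device type: no-op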

Comment on lines 101 to 104
if version.parse(torch.__version__) < version.parse("2.6.0"):
    torch.cuda.synchronize()
else:
    torch.accelerator.synchronize()

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

Inconsistent synchronization and missing device type check.

The version-conditional synchronization has several issues:

  1. Inconsistency: Line 161 in _bench_with_cuda_events() still uses torch.cuda.synchronize() without the version check, creating inconsistent behavior across the benchmarking workflow.

  2. Missing device type check: The synchronization logic doesn't consider the device type. Lines 58-60 detect whether CUDA or MPS is available, but this new logic at lines 101-104 always calls torch.cuda.synchronize() for versions < 2.6.0, even on non-CUDA devices like MPS.

  3. API verification needed: Need to confirm that torch.accelerator.synchronize() is the correct API for PyTorch 2.6.0+ and that version 2.6.0 is the correct cutoff.

To verify the correct API and version, please search for the official PyTorch documentation:

To check for other synchronization calls that need updating, run:

Consider refactoring to:

  1. Update line 161 to use the same version-conditional logic
  2. Consider the device type (IS_CUDA) in the synchronization logic
  3. Extract the synchronization logic into a helper function to ensure consistency

Example refactor:

def _synchronize():
    """Synchronize the current device."""
    if version.parse(torch.__version__) < version.parse("2.6.0"):
        if IS_CUDA:
            torch.cuda.synchronize()
        else:
            torch.mps.synchronize()
    else:
        torch.accelerator.synchronize()

Then use _synchronize() at both lines 102 and 161.


🏁 Script executed:

#!/bin/bash
# Find all torch.cuda.synchronize() calls in the file
rg -n 'torch\.cuda\.synchronize\(\)' tilelang/profiler/bench.py

Length of output: 69


🌐 Web query:

What is the correct API for device synchronization in PyTorch 2.6.0 and later? Is it torch.accelerator.synchronize()?

💡 Result:

Yes — starting in PyTorch 2.6 the accelerator‑agnostic API is torch.accelerator.synchronize(device=None). Use that for device synchronization; older device‑specific calls (e.g. torch.cuda.synchronize or torch.cpu.synchronize) still exist for their backends. [1][2][3]

Sources:
[1] PyTorch 2.6 torch.accelerator.synchronize docs.
[2] PyTorch 2.6 torch.accelerator overview.
[3] PyTorch torch.cuda.synchronize docs.


Unify device synchronization into a helper
Both torch.cuda.synchronize() calls at lines 102 and 161 need the same version-conditional, device-aware logic. For PyTorch <2.6.0, call torch.cuda.synchronize() if IS_CUDA else torch.mps.synchronize(); for >=2.6.0, use torch.accelerator.synchronize(). Extract this into a single _synchronize() function and replace both calls with _synchronize().

🤖 Prompt for AI Agents
In tilelang/profiler/bench.py around lines 101-104 and also where the second
sync occurs near line 161, both places duplicate the same PyTorch version- and
device-aware synchronization logic; extract this into a single helper function
_synchronize() that checks torch.__version__ (using version.parse) and calls
torch.cuda.synchronize() when PyTorch <2.6.0 and IS_CUDA is true, calls
torch.mps.synchronize() when PyTorch <2.6.0 and not IS_CUDA, and calls
torch.accelerator.synchronize() for PyTorch >=2.6.0; add the helper near the top
of the file (ensuring version, torch and IS_CUDA are in scope), then replace
both original conditional blocks with a simple call to _synchronize().
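
A sketch of how such a helper could be wired into a do_bench-style loop, reusing the _synchronize sketch above and the time.perf_counter fallback the prompt mentions; the do_bench signature shown here is an assumption for illustration, not the actual bench.py interface:

import time
import torch

def do_bench(fn, device="cuda", iters=100):
    device = torch.device(device)
    _synchronize(device)                      # drain pending work before timing
    if device.type == "cuda":
        start_evt = torch.cuda.Event(enable_timing=True)
        end_evt = torch.cuda.Event(enable_timing=True)
        start_evt.record()
        for _ in range(iters):
            fn()
        end_evt.record()
        _synchronize(device)                  # ensure both events have completed
        return start_evt.elapsed_time(end_evt) / iters        # per-call time in ms
    # non-CUDA devices: host-side timer bracketed by synchronizations
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    _synchronize(device)
    return (time.perf_counter() - start) * 1e3 / iters         # per-call time in ms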

@LeiWang1999 LeiWang1999 changed the title from "[Refactor]: Add support for torch version lower than 2.6.0" to "[Bugfix] Fallback torch.accelerator.synchronize() to torch.cuda.synchronize()" on Oct 12, 2025
@LeiWang1999 LeiWang1999 merged commit 4a229dd into tile-ai:main Oct 12, 2025
6 of 8 checks passed
RubiaCx pushed a commit to RubiaCx/tilelang that referenced this pull request Nov 24, 2025
…nchronize()` (tile-ai#987)

* [Refactor]:Add support for torch version lower than 2.6.0

* update