[To merge AFTER flashinfer-ci changes updated] Reduce test time by moving compilation off-line #2089

kahyunnam · 2025-11-14T01:13:14Z

📌 Description

Download flashinfer-cubin and flashinfer-jit-cache to avoid compilation. (If a JIT kernel is not included in flashinfer-jit-cache, it will still be JIT-compiled at test runtime. We could set `export FLASHINFER_DISABLE_JIT=1` to avoid this, but then many tests that use JIT kernels not found in flashinfer-jit-cache would be "skipped".)
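A minimal sketch of how that trade-off could be made an explicit opt-in in the test script. Only `FLASHINFER_DISABLE_JIT=1` comes from the description; the `STRICT_PRECOMPILED` toggle and the `configure_jit_policy` function are hypothetical, and the exact skip behavior is as described above, not verified here.

```bash
#!/usr/bin/env bash
# Sketch only: STRICT_PRECOMPILED and configure_jit_policy are hypothetical.
# Default (0): leave FLASHINFER_DISABLE_JIT untouched, so kernels missing from
# flashinfer-jit-cache may be JIT-compiled at test time.
# Opt-in (1): disable JIT, accepting that tests needing uncached kernels skip.
configure_jit_policy() {
  if [[ "${STRICT_PRECOMPILED:-0}" == 1 ]]; then
    export FLASHINFER_DISABLE_JIT=1
    echo "JIT disabled: tests needing uncached JIT kernels will be skipped"
  else
    echo "Leaving FLASHINFER_DISABLE_JIT as-is; uncached kernels may be JIT-compiled at test time"
  fi
}
```

Keeping the default branch hands-off means a developer who already exported `FLASHINFER_DISABLE_JIT` locally is not overridden.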

🔍 Related Issues

The issue was discussed on Slack: "Ideally, we would move that compilation off-line which would reduce test time & make kernel hang detection much easier."

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

I have installed pre-commit by running `pip install pre-commit` (or used your preferred method).
I have installed the hooks with `pre-commit install`.
I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

Tests have been added or updated as needed.
All tests are passing (unittest, etc.).

Summary by CodeRabbit

Chores
- Improved kernel installation to detect CUDA and attempt installing matching precompiled kernel packages.
- If precompiled kernel packages are missing, emit a warning and continue instead of aborting.
- Install local package sources during setup and verify installation by displaying configuration.
- Log detected CUDA version and only perform these runtime steps when not in dry-run mode.

coderabbitai · 2025-11-14T01:13:20Z

Walkthrough

Adds a non-dry-run initialization block to scripts/task_test_blackwell_kernels.sh that prints CUDA_VERSION, sets per-version distribution dirs under dist/${CUDA_VERSION}, attempts to install flashinfer-cubin and flashinfer-jit-cache wheels from those dirs (emitting warnings and continuing if missing), installs local Python sources, and verifies via python -m flashinfer show-config in /tmp.

Changes

Cohort / File(s)	Change Summary
Kernel install script `scripts/task_test_blackwell_kernels.sh`	Adds non-dry-run initialization: prints `CUDA_VERSION`; defines `DIST_CUBIN_DIR` and `DIST_JIT_CACHE_DIR` under `dist/${CUDA_VERSION}`; attempts to install `flashinfer-cubin` from `dist/${CUDA_VERSION}/cubin` wheels and now emits a warning and continues if none found (previously would exit); attempts to install `flashinfer-jit-cache` from `dist/${CUDA_VERSION}/jit-cache` wheels and now emits a warning and continues if none found (previously would exit); installs local Python sources with `pip install -e . -v --no-deps`; verifies installation by running `python -m flashinfer show-config` in `/tmp`.
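The initialization flow summarized above can be sketched as follows. This is a minimal reconstruction from the walkthrough, not the script's actual code: `DRY_RUN`, `CUDA_VERSION`, `DIST_CUBIN_DIR`, `DIST_JIT_CACHE_DIR`, the `dist/${CUDA_VERSION}/{cubin,jit-cache}` layout and the three commands come from the summary, while the helper names `install_prebuilt_wheels` and `setup_kernels`, the `CUDA_VERSION` guard, and the message text are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch reconstructed from the walkthrough; helper names are hypothetical.

# Install every wheel found in a directory; warn and continue if there are none.
install_prebuilt_wheels() {
  local label="$1" dir="$2"
  local wheels=() w
  for w in "$dir"/*.whl; do
    # An unmatched glob expands to the literal pattern, so check existence.
    if [[ -e "$w" ]]; then
      wheels+=("$w")
    fi
  done
  if (( ${#wheels[@]} == 0 )); then
    echo "WARNING: no ${label} wheels in ${dir}; continuing (kernels may be JIT-compiled)" >&2
    return 0
  fi
  echo "Installing ${label} from ${dir} ..."
  pip install -q "${wheels[@]}"
}

setup_kernels() {
  # Only perform the runtime installation steps when DRY_RUN is unset/empty.
  if [[ -n "${DRY_RUN:-}" ]]; then
    echo "DRY_RUN set; skipping kernel installation"
    return 0
  fi
  # Guard added in this sketch: an empty CUDA_VERSION would yield dist//cubin.
  if [[ -z "${CUDA_VERSION:-}" ]]; then
    echo "ERROR: CUDA_VERSION is not set" >&2
    return 1
  fi
  echo "CUDA_VERSION: ${CUDA_VERSION}"
  local DIST_CUBIN_DIR="dist/${CUDA_VERSION}/cubin"
  local DIST_JIT_CACHE_DIR="dist/${CUDA_VERSION}/jit-cache"
  install_prebuilt_wheels flashinfer-cubin "$DIST_CUBIN_DIR" || return 1
  install_prebuilt_wheels flashinfer-jit-cache "$DIST_JIT_CACHE_DIR" || return 1
  # Install the local Python sources, then verify from outside the repo.
  pip install -e . -v --no-deps || return 1
  (cd /tmp && python -m flashinfer show-config)
}
```

Missing wheel directories only produce a warning, which matches the "warn and continue instead of aborting" behavior described in the summary; a failing `pip` or `show-config` still propagates a non-zero status.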

Sequence Diagram(s)

sequenceDiagram
    participant Script as task_test_blackwell_kernels.sh
    participant Env as Environment
    participant FS as Filesystem (dist/)
    participant Pip as pip
    participant Python as python

    rect rgb(230,240,255)
    Note over Script: Start (only if DRY_RUN unset)
    Script->>Env: Check DRY_RUN
    alt DRY_RUN is not set
        Script->>Env: Read & echo CUDA_VERSION
        Script->>Script: Set DIST_CUBIN_DIR = dist/${CUDA_VERSION}/cubin
        Script->>Script: Set DIST_JIT_CACHE_DIR = dist/${CUDA_VERSION}/jit-cache
    else DRY_RUN set
        Script->>Env: Exit / skip install
    end
    end

    rect rgb(220,255,230)
    Note over Script,FS: Attempt install of prebuilt wheels (warn if missing)
    Script->>FS: Check DIST_CUBIN_DIR for wheels
    alt cubin wheels present
        Script->>Pip: pip install <wheels from DIST_CUBIN_DIR>
        Pip-->>Script: success
    else missing
        Script-->>Script: emit warning and continue
    end
    Script->>FS: Check DIST_JIT_CACHE_DIR for wheels
    alt jit-cache wheels present
        Script->>Pip: pip install <wheels from DIST_JIT_CACHE_DIR>
        Pip-->>Script: success
    else missing
        Script-->>Script: emit warning and continue
    end
    end

    rect rgb(255,245,220)
    Note over Script,Python: Install & verify local package
    Script->>Pip: pip install -e . -v --no-deps
    Pip-->>Script: installation complete
    Script->>Python: (cd /tmp && python -m flashinfer show-config)
    Python-->>Script: configuration output / verification
    end

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Single-file change with small control-flow and messaging adjustments.
Areas to check:
- Warning messages and exit/non-exit behavior when wheels are missing.
- Correctness of DIST_* path construction and pip install commands.
- DRY_RUN gating and the verification command execution context (/tmp).

Poem

🐰 I sniff the CUDA version bright and clear,
I look for cubins and, if none, I cheer,
I pip-install my roots and then I test,
A rabbit hops on — warnings are no pest! 🥕✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly describes the main change: moving compilation offline to reduce test time, which aligns with the PR's core objective.
Description check	✅ Passed	The description includes the required template sections (Description, Related Issues, Pre-commit Checks, Tests) and provides sufficient detail about the changes and motivation.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7a27eac and 0a7f5c7.

📒 Files selected for processing (1)

scripts/task_test_blackwell_kernels.sh (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

scripts/task_test_blackwell_kernels.sh

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Deploy Docs


kahyunnam · 2025-11-14T01:18:15Z

/bot run

flashinfer-bot · 2025-11-14T01:18:55Z

GitLab MR !137 has been created, and the CI pipeline #38459095 is currently running. I'll report back once the pipeline job completes.

scripts/task_test_blackwell_kernels.sh

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

scripts/task_test_blackwell_kernels.sh (2)
41-50: Inconsistent verbosity flags in sequential pip installations.

Lines 43 and 45 use -q (quiet) for kernel installations, while line 49 uses -v (verbose) for local source installation. This inconsistency makes it unclear whether the verbosity change is intentional and may make output harder to parse in CI logs.

Standardize the verbosity flags across all pip installations in this initialization block:
  # Install precompiled kernels
  echo "Installing flashinfer-cubin from PyPI/index..."
  pip install -q flashinfer-cubin
  echo "Installing flashinfer-jit-cache for ${CUDA_STREAM} from https://flashinfer.ai/whl/${CUDA_STREAM} ..."
  pip install -q --extra-index-url "https://flashinfer.ai/whl/${CUDA_STREAM}" flashinfer-jit-cache
  echo ""

  # Install local python sources
- pip install -e . -v --no-deps
+ pip install -e . -q --no-deps
Alternatively, if verbose output is intentional for debugging local installs, add a comment explaining the choice.
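If one verbosity setting is preferred, another option is to route every install in the block through a single helper so the flag cannot drift between lines. A minimal sketch; `INSTALL_VERBOSITY` and `pip_install` are hypothetical names, not part of the script (a `PIP_`-prefixed name is avoided because pip reads `PIP_*` environment variables as its own configuration).

```bash
#!/usr/bin/env bash
# Hypothetical knob: defaults to -q; set INSTALL_VERBOSITY=-v when debugging
# local installs, or INSTALL_VERBOSITY= (empty) for pip's default output.
INSTALL_VERBOSITY="${INSTALL_VERBOSITY--q}"

# Every pip install in the initialization block goes through this helper,
# so kernel installs and the local-source install share one verbosity flag.
pip_install() {
  # ${VAR:+"$VAR"} expands to nothing when the flag is empty.
  pip install ${INSTALL_VERBOSITY:+"$INSTALL_VERBOSITY"} "$@"
}
```

Usage would be `pip_install flashinfer-cubin` and `pip_install -e . --no-deps`; switching the whole block to verbose output for a debugging run is then a single environment variable.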

41-50: Verify that the custom PyPI index URL for flashinfer-jit-cache is reliable.

The script hardcodes the index URL https://flashinfer.ai/whl/${CUDA_STREAM} and expects it to always be available and contain the flashinfer-jit-cache package for the detected CUDA stream. If this URL becomes unavailable or if a CUDA stream version is not published, the pip install will fail and halt all subsequent tests.

Add error handling and diagnostics to surface issues clearly:
  echo "Installing flashinfer-jit-cache for ${CUDA_STREAM} from https://flashinfer.ai/whl/${CUDA_STREAM} ..."
- pip install -q --extra-index-url "https://flashinfer.ai/whl/${CUDA_STREAM}" flashinfer-jit-cache
+ if ! pip install -q --extra-index-url "https://flashinfer.ai/whl/${CUDA_STREAM}" flashinfer-jit-cache; then
+     echo "❌ ERROR: Failed to install flashinfer-jit-cache for CUDA stream ${CUDA_STREAM}"
+     echo "   Index URL: https://flashinfer.ai/whl/${CUDA_STREAM}"
+     exit 1
+ fi
Can you confirm that the custom index URL is stable and that all supported CUDA streams (cu128, cu129, cu130) are consistently published with the corresponding flashinfer-jit-cache package?
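The stream check and the failure policy can also be folded into the script itself. A hedged sketch, assuming `CUDA_STREAM` must be one of the streams named in the comment (cu128, cu129, cu130) and using a hypothetical `REQUIRE_JIT_CACHE` toggle to choose between the comment's fail-fast behavior and warn-and-continue; `install_jit_cache` is likewise a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: CUDA_STREAM values and the index URL pattern come from the review
# comment; REQUIRE_JIT_CACHE and install_jit_cache are hypothetical.
install_jit_cache() {
  local stream="${CUDA_STREAM:-}"
  local index="https://flashinfer.ai/whl/${stream}"

  # Fail early on a stream the comment does not list as supported.
  case "$stream" in
    cu128|cu129|cu130) ;;
    *)
      echo "ERROR: unsupported CUDA stream '${stream}'" >&2
      return 2
      ;;
  esac

  if pip install -q --extra-index-url "$index" flashinfer-jit-cache; then
    return 0
  fi

  # Print the stream and index URL so the failure is diagnosable from CI logs.
  echo "ERROR: failed to install flashinfer-jit-cache for CUDA stream ${stream}" >&2
  echo "   Index URL: ${index}" >&2
  if [[ "${REQUIRE_JIT_CACHE:-0}" == 1 ]]; then
    return 1   # fail fast, as the review comment suggests
  fi
  echo "WARNING: continuing; uncached kernels will be JIT-compiled at test time" >&2
  return 0
}
```

A caller would write `install_jit_cache || exit 1`, so the policy lives in one function and the distinct return codes (1 for an install failure, 2 for an unsupported stream) stay visible in CI.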

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 54101e9 and 375ca18.

📒 Files selected for processing (1)

scripts/task_test_blackwell_kernels.sh (1 hunks)

🔇 Additional comments (1)

scripts/task_test_blackwell_kernels.sh (1)

52-55: Verify that python -m flashinfer show-config is an appropriate verification step.

The verification runs python -m flashinfer show-config to confirm successful installation. However, this assumes:
- The show-config subcommand exists in the flashinfer module
- The command is idempotent and doesn't modify the environment
- The command completes quickly without external dependencies

If this command fails (e.g., due to missing dependencies, invalid environment, or a transient issue), the entire test run is aborted before any tests can run, which may be overly strict for a verification step.

Can you confirm:
- That python -m flashinfer show-config is a lightweight, read-only command that verifies the installation without side effects?
- What the expected output is and whether it should be validated beyond the exit code?
- Whether a failed verification should block all tests or only warn/skip?
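One way to make the block-versus-warn choice explicit is sketched below. It keeps the walkthrough's command, run from `/tmp` so that the repository checkout is not the current directory that `python -m` puts on `sys.path`, and checks only the exit status (no output validation). The `verify_install` function and the `STRICT_VERIFY` toggle (default: block) are assumptions, not part of the PR.

```bash
#!/usr/bin/env bash
# Sketch: verify_install and STRICT_VERIFY are hypothetical; the command is
# the one from the walkthrough. Only the exit status is checked.
verify_install() {
  local out
  # Run from /tmp so the repo checkout is not the cwd that `python -m`
  # prepends to sys.path.
  if out=$(cd /tmp && python -m flashinfer show-config 2>&1); then
    printf '%s\n' "$out"
    return 0
  fi
  printf '%s\n' "$out" >&2
  if [[ "${STRICT_VERIFY:-1}" == 1 ]]; then
    echo "ERROR: flashinfer show-config failed; aborting before tests" >&2
    return 1
  fi
  echo "WARNING: flashinfer show-config failed; continuing with tests" >&2
  return 0
}
```

Declaring `local out` on its own line keeps the command substitution's exit status from being masked by `local`, which always succeeds.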

scripts/task_test_blackwell_kernels.sh

flashinfer-bot · 2025-11-14T05:13:43Z

[FAILED] Pipeline #38459095: 3/17 passed


scripts/task_test_blackwell_kernels.sh

bkryu

Thanks @kahyunnam! Left a comment about the behavior when the jit-cache & cubin wheels are not found.

scripts/task_test_blackwell_kernels.sh

update blackwell script to download cubins and jit cache

375ca18

kahyunnam force-pushed the knam/unit-testing-move-compilation-offline branch from c9c6768 to 375ca18 November 14, 2025 01:16

kahyunnam self-assigned this Nov 14, 2025

kahyunnam marked this pull request as ready for review November 14, 2025 01:18

kahyunnam requested review from bkryu, nvmbreughe, yongwww and yzh119 as code owners November 14, 2025 01:18

yzh119 reviewed Nov 14, 2025

scripts/task_test_blackwell_kernels.sh (outdated, resolved)

coderabbitai bot reviewed Nov 14, 2025

scripts/task_test_blackwell_kernels.sh (outdated, resolved)

kahyunnam closed this Nov 14, 2025

Fix to use artifacts from previous pipeline step for the precompiled kernels

7a27eac

kahyunnam reopened this Nov 14, 2025

kahyunnam changed the title ~~Reduce test time by moving compilation off-line~~ [To merge AFTER flashinfer-ci changes updated] Reduce test time by moving compilation off-line Nov 14, 2025

bkryu reviewed Nov 15, 2025

scripts/task_test_blackwell_kernels.sh (outdated, resolved)

bkryu reviewed Nov 15, 2025

scripts/task_test_blackwell_kernels.sh (outdated, resolved)

fix to print warning and continue if no jit-cache/cubin available

0a7f5c7
