Skip to content

Conversation

@kavin-sai-krishna
Copy link
Contributor

This PR adds support for bucketize op which is used in many vision models like Phi4, SmolVLM etc.,

@tqchen
Copy link
Member

tqchen commented Jun 5, 2025

for operators like this one, we also need legalization rule to know how to lower them. We don;t want to be end up in a situation where we have the ops but canot lower/compile them. cc @tlopex

@kavin-sai-krishna
Copy link
Contributor Author

for operators like this one, we also need legalization rule to know how to lower them

@tqchen I came across the PyTorch implementation of this operation and noticed that they used searchsorted. Following that approach, I’ve used topi.searchsorted to lower the operation. I also tested the implementation numerically with boundaries = [0, 2, 4, 6, 8, 10] and input = [-1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], and the results appear to be correct.

@tqchen
Copy link
Member

tqchen commented Jun 5, 2025

Thanks, would be good to go through the checklist below. Some checklist for adding a new op.

  • C0: Can the operator be decomposed into smaller ops (if so, then it may not be necessary to introduce the high-level op). It is helpful to reuse existing ops when possible so we don't have to introduce C1/C2, otherwise we need C1/C2
  • C1: Introduce legalization rules so the op at least compiles on CPU
  • C2: Make sure the op compiles on CUDA

@tqchen
Copy link
Member

tqchen commented Jun 5, 2025

I think the main question is how to make sure it runs on cuda

@kavin-sai-krishna
Copy link
Contributor Author

@tqchen Thank you. I understood the high-level idea you suggested, but I have a few specific questions regarding the design choices:

  • Q1: What’s the difference between decomposing an op using Relax ops vs. TOPI ops vs. TIR?
    How does the abstraction level impact performance or correctness?

  • Q2: If the op compiles on CUDA, is numerical verification still required?

  • Q3: Are there nightly tests that check numerical correctness?
    I ask because I found a case (fmod) where the op ran but didn't match PyTorch output.

@tqchen
Copy link
Member

tqchen commented Jun 5, 2025

Q1: What’s the difference between decomposing an op using Relax ops vs. TOPI ops vs. TIR?

Given this is relax importer, we can chose either options as long as the correctness match. When possible, if we can decompose via relax then legalize, it gives most opportunities for possible choice of lowering path. We should aim to reduce total number of core relax ops

Q2/ Q3

yes ideally we should have a nightly test validating the correctness

We can add such tests to

https://github.com/apache/tvm/tree/main/tests/python/nightly

nightly/relax/test_relax_op_numeric.py

@kavin-sai-krishna
Copy link
Contributor Author

@tqchen Thanks for your response. I'll make sure the checklists are satisfied. But I'm not sure what i should do if C2 is not met.

@kavin-sai-krishna
Copy link
Contributor Author

@tqchen I've updated the op to compile and run on CUDA as you requested. Can you please review it.

@tlopex
Copy link
Member

tlopex commented Jun 28, 2025

Please resolve the conflicts so that we can merge it:)
@kavin-sai-krishna

@kavin-sai-krishna
Copy link
Contributor Author

@tlopex I've resolved the conflicts. Can you please take a look?

Copy link
Member

@tlopex tlopex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks!

@tlopex tlopex merged commit 9eb8b30 into apache:main Jun 30, 2025
10 checks passed
ShiboXing pushed a commit to ShiboXing/tvm that referenced this pull request Aug 10, 2025
* add support for bucketize

* fix lint issue

* Fix lint issue

* Add GPU code for bucketize

* Resolve merge conflict

* Fix lint issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants