-
Notifications
You must be signed in to change notification settings - Fork 332
Add int2 and longlong4 pack functions #1129
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughTwo small packing helpers were added to the CUDA common header: Changes
Sequence Diagram(s)sequenceDiagram
participant Caller
participant make_int2 as make_int2 / make_int
participant make_longlong4 as make_longlong4 / make_int/int2
Note over Caller,make_int2: Packing 8 signed chars -> int2
Caller->>make_int2: call with x0..x3, y0..y3
make_int2-->>make_int2: compose laneA = make_int(x0,x1,x2,x3)
make_int2-->>make_int2: compose laneB = make_int(y0,y1,y2,y3)
make_int2-->>Caller: return int2(laneA, laneB)
Note over Caller,make_longlong4: Packing 8 ints -> longlong4
Caller->>make_longlong4: call with x0,x1,y0,y1,z0,z1,w0,w1
make_longlong4-->>make_longlong4: build int2 a = make_int(x0,x1,x2?,x3?)
make_longlong4-->>make_longlong4: build int2 b = make_int(y0,y1,...)
make_longlong4-->>make_longlong4: assemble longlong4 via pointer/assignment of int2 lanes
make_longlong4-->>Caller: return longlong4
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/tl_templates/cuda/common.h (1)
127-136: Consider a helper function for consistency.The implementation uses type-punning via pointer casting (
*((int2*)&result.x)), which technically violates strict aliasing rules. While this pattern works in CUDA device code, it's inconsistent with the existingmake_int4function (lines 113-125), which directly assigns to fields using a helper.Consider introducing a
make_longlonghelper for consistency:+// Pack two int values. +TL_DEVICE long long make_longlong(int x0, int x1) { + long long result; + *((int2*)&result) = make_int2(x0, x1); + return result; +} + // Pack eight int values. TL_DEVICE longlong4 make_longlong4(int x0, int x1, int y0, int y1, int z0, int z1, int w0, int w1) { longlong4 result; - *((int2*)&result.x) = make_int2(x0, x1); - *((int2*)&result.y) = make_int2(y0, y1); - *((int2*)&result.z) = make_int2(z0, z1); - *((int2*)&result.w) = make_int2(w0, w1); + result.x = make_longlong(x0, x1); + result.y = make_longlong(y0, y1); + result.z = make_longlong(z0, z1); + result.w = make_longlong(w0, w1); return result; }This approach maintains the same packing logic while matching the pattern used throughout the file.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/tl_templates/cuda/common.h(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Quick Lint
🔇 Additional comments (2)
src/tl_templates/cuda/common.h (2)
96-100: LGTM! Clear documentation.The comment clarification improves readability without affecting functionality.
102-110: LGTM! Consistent pattern and good code reuse.The implementation correctly packs eight signed chars into an int2 by reusing the existing
make_inthelper, maintaining consistency with themake_int4pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/tl_templates/cuda/common.h (2)
102-110: Consider renaming to avoid overload confusion with CUDA's built-inmake_int2.The implementation correctly packs 8 signed chars into an
int2by reusingmake_intfor each lane. However, CUDA already provides a built-inmake_int2(int x, int y)function. While these are technically valid overloads (different signatures), this creates naming ambiguity:
- In
make_longlong4(line 131-134), calls likemake_int2(x0, x1)withintarguments resolve to the CUDA built-in, not this custom function.- Developers may be confused about which
make_int2is being invoked in different contexts.Consider either:
- Renaming to
make_int2_from_charsor similar to clarify intent, or- Adding a comment explaining the overload and when each version is used.
127-136: Logic is correct, but consider documenting the reliance on CUDA's built-inmake_int2.The implementation correctly packs 8 ints into a
longlong4:
- The pointer cast
*((int2 *)&result.x)is safe becauselong long(64 bits) andint2(2×32 bits) have matching sizes.- Calls to
make_int2(x0, x1)correctly resolve to CUDA's built-inmake_int2(int, int), not the custom 8-parameter version defined above.However, note the design inconsistency:
make_int,make_int4, and the new custommake_int2all pack signed chars, whilemake_longlong4packs ints. This breaks the established pattern and may confuse maintainers who expect allmake_*functions to pack the same base type.Consider adding a comment clarifying that this function uses CUDA's built-in
make_int2for composition, or documenting why this function packs a different type than the others.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/tl_templates/cuda/common.h(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
- GitHub Check: Test for Python 3.12 with ROCm-6.3 (on self-hosted-amd)
🔇 Additional comments (1)
src/tl_templates/cuda/common.h (1)
96-96: LGTM - Minor style consistency improvement.The added period makes the comment consistent with other function comments in the file.
* Remove an incorrect check * add fp8 pack function * code lint * minor fix * minor fix * minor fix * Minor fix * Minor fix * add pack function * code lint * code lint
Summary by CodeRabbit