-
Notifications
You must be signed in to change notification settings - Fork 333
[Feat] Add fast sine and cosine definitions in CUDA templates #865
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -18,6 +18,8 @@ using int4_t = int4; | |
| #define hexp cutlass::fast_exp | ||
| #define hlog cutlass::fast_log | ||
| #define hsqrt cutlass::fast_sqrt | ||
| #define hsin cutlass::fast_sin | ||
| #define hcos cutlass::fast_cos | ||
|
Comment on lines
+21
to
+22
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧩 Analysis chainAdds fast trig macros consistently; verify CUTLASS support for all target types/versions. hsin/hcos align with existing hexp/hlog/hsqrt patterns. Please confirm that cutlass::fast_sin / cutlass::fast_cos are available (and device-callable) for your targeted CUTLASS/CUDA versions and support the input types you pass (e.g., half_t, bfloat16_t, float). If bfloat16_t isn’t covered (similar to the custom bfloat16_t fast_exp wrapper below), consider adding analogous wrappers or fallbacks. Run the following to check for potential collisions and usage sites: Optionally, verify availability in docs: 🏁 Script executed: #!/bin/bash
# 1) Find any other definitions/usages of hsin/hcos to avoid macro conflicts.
rg -n -C2 -P '(^\s*#\s*define\s+hsin\b)|\bhsin\s*\('
rg -n -C2 -P '(^\s*#\s*define\s+hcos\b)|\bhcos\s*\('
# 2) Confirm CUTLASS fast trig references in the repo (sanity check).
rg -n -C2 -P '\bcutlass::fast_(sin|cos)\b'
# 3) Optional: locate bfloat16 sin/cos usage that might require wrappers.
rg -n -C2 -P '\b(bfloat16_t|half_t)\b.*\b(hsin|hcos)\s*\('Length of output: 1025 🌐 Web query: 💡 Result: Short answer: No — CUTLASS’s fast_math.h does not expose cutlass::fast_sin / cutlass::fast_cos. The fast_math header in CUTLASS implements a set of “fast_…” helpers (e.g. fast_tanh, fast_exp, etc.) with overloads for float / double and some device half support, but you won’t find cutlass::fast_sin/fast_cos there. The header and related GitHub issues show the library provides fast variants for operations like tanh (including float/double/half overloads) but not sin/cos or bfloat16 fast trig helpers. [1][2] Sources:
References: Remove/guard hsin/hcos — cutlass::fast_sin / cutlass::fast_cos not provided CUTLASS’s fast_math.h does not expose cutlass::fast_sin/cutlass::fast_cos; the macros in src/tl_templates/cuda/common.h (lines 21–22) will reference undefined symbols and likely break builds.
src/tl_templates/cuda/common.h:21-22 🤖 Prompt for AI Agents |
||
| #define htanh cutlass::fast_tanh | ||
| #define hpow powf | ||
|
|
||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While using macros is consistent with the existing definitions in this file, it's generally safer to use
inlinefunctions in C++. This avoids potential issues with macro expansion, improves type safety, and respects namespaces. Since the project uses C++17 features (likestd::is_same_v), you can useautofunction parameters (a C++14 feature) to create generic wrappers concisely. This would be a good opportunity to start migrating away from macros for these function aliases.