chore(deps): update dependency com_github_nvidia_cutlass to v3.5.1 (#838)

This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [com_github_nvidia_cutlass](https://togithub.com/NVIDIA/cutlass) | http_archive | patch | `v3.5.0` -> `v3.5.1` |

---

### Release Notes

<details>
<summary>NVIDIA/cutlass (com_github_nvidia_cutlass)</summary>

### [`v3.5.1`](https://togithub.com/NVIDIA/cutlass/releases/tag/v3.5.1):
CUTLASS 3.5.1

[Compare
Source](https://togithub.com/NVIDIA/cutlass/compare/v3.5.0...v3.5.1)

- [Minimal SM90 WGMMA + TMA GEMM example in 100 lines of
code](./examples/cute/tutorial/wgmma_sm90.cu).
- [Exposure of L2 `cache_hint`s in TMA copy
atoms](./include/cute/arch/copy_sm90\_tma.hpp#L48)
- Exposure of raster order and tile swizzle extent in [CUTLASS library
profiler](./media/docs/profiler.md#GEMM), and
[example
48](./examples/48\_hopper_warp_specialized_gemm/48\_hopper_warp_specialized_gemm.cu).
- [TMA store based and EVT supported
epilogues](./include/cutlass/epilogue/collective/sm90\_epilogue_array_tma_warpspecialized.hpp)
for [Hopper pointer array batched
kernels](./test/unit/gemm/device/sm90\_gemm_f16\_f16\_f16\_tensor_op_f32\_ptr_array.cu).
- A new [`GemmSparseUniversal` API for CUTLASS 2.x Ampere
kernels](./include/cutlass/gemm/device/gemm_sparse_universal.h) to
enable serial and parallel split-k for sparse tensor cores and new tiny
tile sizes to better support LLM inference.
- [CUDA host adapter](./include/cutlass/cuda_host_adapter.hpp)
extensions to support TMA descriptor construction driver APIs.
- Inclusion of more [Hopper fprop, dgrad, and wgrad convolution kernels
in CUTLASS library and profiler](./python/cutlass_library/generator.py).
-   Support for residual add (beta != 0) in convolution kernels.
- A new convolution
[epilogue](./examples/16\_ampere_tensorop_conv2dfprop/ampere_tensorop_conv2dfprop.cu#L269)
for CUTLASS 2.x to support non-packed NHWC output.
- A refactor of [include files throughout CUTLASS core
directories](./include/cutlass/gemm/collective/collective_mma_decl.hpp)
to reduce circular dependencies and [tests to guard against
them](./test/self_contained_includes/CMakeLists.txt).
- [A guide for setting up VSCode to work well with
CUTLASS](./media/docs/ide_setup.md) and [expanded code style
guide](./media/docs/programming_guidelines.md).
-   Better support for MSVC as a host compiler.
- Many performance optimizations, improvements, and bug fixes including
fixes for FlashAttention-2.
-   Optimal code generation with CUDA toolkit versions 12.4 and 12.5u1.
-   NOTICE:
    - Upcoming CUTLASS 3.6 release will include a breaking refactor to the
      CUTLASS 3.x convolution `kernel::ConvUniversal` API to bring it in line
      with `gemm::GemmUniversal`. After this, the 3.x convolution API will no
      longer be considered a beta API.
    - Upcoming CUTLASS 3.6 release will include a breaking refactor to the
      Hopper TMA pointer array batched epilogue in order to support grouped
      GEMMs.

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined),
Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR was generated by [Mend Renovate](https://mend.io/renovate/).
View the [repository job
log](https://developer.mend.io/github/secretflow/spu).

<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzOC41Ni4wIiwidXBkYXRlZEluVmVyIjoiMzguNTYuMCIsInRhcmdldEJyYW5jaCI6Im1haW4iLCJsYWJlbHMiOlsiZGVwZW5kZW5jaWVzIl19-->

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
renovate[bot] authored Sep 14, 2024
1 parent e09b8cc commit 3acc89c
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions bazel/repositories.bzl
@@ -242,10 +242,10 @@ def _com_github_nvidia_cutlass():
     maybe(
         http_archive,
         name = "com_github_nvidia_cutlass",
-        strip_prefix = "cutlass-3.5.0",
+        strip_prefix = "cutlass-3.5.1",
         urls = [
-            "https://github.com/NVIDIA/cutlass/archive/refs/tags/v3.5.0.tar.gz",
+            "https://github.com/NVIDIA/cutlass/archive/refs/tags/v3.5.1.tar.gz",
         ],
-        sha256 = "ef6af8526e3ad04f9827f35ee57eec555d09447f70a0ad0cf684a2e426ccbcb6",
+        sha256 = "20b7247cda2d257cbf8ba59ba3ca40a9211c4da61a9c9913e32b33a2c5883a36",
         build_file = "@spulib//bazel:nvidia_cutlass.BUILD",
     )
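
The three fields changed in this hunk (`strip_prefix`, the archive URL, and `sha256`) have to move together for every version bump. A reviewer who wants to check such a change by hand could use something like the Python sketch below. It is not part of this commit or of Renovate; the helper names `cutlass_archive_fields` and `sha256_of_file` are hypothetical, and the URL and prefix patterns are simply the ones visible in the diff above.

```python
import hashlib
import re


def cutlass_archive_fields(tag):
    """Derive the strip_prefix and URL that the diff above uses for a release tag.

    Hypothetical helper: the patterns are copied from the hunk, not from any
    official CUTLASS or Bazel API.
    """
    if not re.fullmatch(r"v\d+\.\d+\.\d+", tag):
        raise ValueError(f"unexpected tag format: {tag!r}")
    version = tag[1:]  # "v3.5.1" -> "3.5.1"
    return {
        "strip_prefix": f"cutlass-{version}",
        "url": f"https://github.com/NVIDIA/cutlass/archive/refs/tags/{tag}.tar.gz",
    }


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a local file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

After downloading the tarball from the derived URL, comparing `sha256_of_file(<local tarball path>)` with the `sha256` value in `bazel/repositories.bzl` is one way to confirm that the pinned checksum matches the archive Bazel will fetch.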
