cutlass update #1

Merged
5 commits merged into denghuilu:master on Jun 25, 2020

Conversation

denghuilu
Owner

No description provided.

kerrmudgeon and others added 5 commits April 7, 2020 13:51
CUTLASS 2.1 contributes:
- BLAS-style host-side API added to CUTLASS Library
- Planar Complex GEMM kernels targeting Volta and Turing Tensor Cores
- Minor enhancements and bug fixes
#82)

#70 updated only the documentation. This commit applies the same Python version bump to the CMake configuration.
Adds support for NVIDIA Ampere Architecture features. CUDA 11 Toolkit recommended.
…100)

- Updated mma_sm80.h to avoid perf penalty due to reinterpret_cast<>.
- Enhancement to CUTLASS Utility Library's HostTensorPlanarComplex template to support copy-in and copy-out
- Added test_examples target to build and test all CUTLASS examples
- Minor edits to documentation to point to GTC 2020 webinar
* Updated documentation of fused GEMM example and removed UNITY BUILD batch size. The default batch size when unity build is enabled tends to be favorable.
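
For readers unfamiliar with the "planar complex" layout named in the CUTLASS 2.1 commit above: it stores the real and imaginary parts of a complex matrix in two separate planes rather than interleaved. The following is a minimal host-side sketch of a GEMM over that layout. It is not CUTLASS code; `PlanarComplexMatrix` and `planar_complex_gemm` are hypothetical names invented for illustration, and the row-major `float` storage is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container (NOT a CUTLASS type): a row-major complex matrix
// stored as two separate planes, one for real parts and one for imaginary parts.
struct PlanarComplexMatrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<float> real;  // plane of real parts
    std::vector<float> imag;  // plane of imaginary parts

    PlanarComplexMatrix(std::size_t r, std::size_t c)
        : rows(r), cols(c), real(r * c, 0.0f), imag(r * c, 0.0f) {}

    // Row-major linear index of element (i, j).
    std::size_t index(std::size_t i, std::size_t j) const { return i * cols + j; }
};

// Naive reference GEMM, D = A * B, with complex arithmetic carried out
// on the separate real and imaginary planes.
PlanarComplexMatrix planar_complex_gemm(const PlanarComplexMatrix& a,
                                        const PlanarComplexMatrix& b) {
    assert(a.cols == b.rows);
    PlanarComplexMatrix d(a.rows, b.cols);
    for (std::size_t i = 0; i < a.rows; ++i) {
        for (std::size_t j = 0; j < b.cols; ++j) {
            float acc_real = 0.0f;
            float acc_imag = 0.0f;
            for (std::size_t k = 0; k < a.cols; ++k) {
                const float ar = a.real[a.index(i, k)];
                const float ai = a.imag[a.index(i, k)];
                const float br = b.real[b.index(k, j)];
                const float bi = b.imag[b.index(k, j)];
                // (ar + i*ai) * (br + i*bi)
                acc_real += ar * br - ai * bi;
                acc_imag += ar * bi + ai * br;
            }
            d.real[d.index(i, j)] = acc_real;
            d.imag[d.index(i, j)] = acc_imag;
        }
    }
    return d;
}
```

Keeping the planes separate means each inner-loop term is an ordinary real multiply-add, which is one plausible reason the layout suits real-valued matrix units; the actual CUTLASS kernels should be consulted for how they exploit it.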
@denghuilu denghuilu merged commit 0a6b59b into denghuilu:master Jun 25, 2020
3 participants