Jing's contribution: prototype of mixed precision gemm FP16/BF16xint4 GEMM #1762

Open
wants to merge 74 commits into base: develop


aosewski
Collaborator

Proposed changes

Prototype of Mixed Precision GEMM FP16/BF16xint4. Contribution from PR #1572

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

  • I have added tests relevant to the introduced functionality, and the unit tests are passing locally
  • I have added inline documentation which helps the maintainers understand the motivation
  • I have removed the stale documentation which is no longer relevant after this pull request
  • (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
  • I have run clang-format on all changed files
  • Any dependent changes have been merged

Discussion

This is just a prototype; as is, it needs some polishing and further improvement. In particular, there is a need to:

  • Move PackSize into the custom data type as an attribute
  • Add is_pack_type to the type traits
  • Refactor coordinate and copy function for sub-byte data type
  • Refactor dynamic/static buffer/tensor for sub-byte data type

These additional changes will probably be added in the next PR.
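
As a rough illustration of the first two items, the sketch below shows one way a packed type could carry PackSize as a member and how an is_pack_type trait could detect it. This is a minimal sketch under stated assumptions, not the code from this PR: `packed_i4`, `pack_size`, and `storage_index` are hypothetical names, and the layout (two 4-bit values per byte) is assumed for illustration only.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for a packed 4-bit type (not the library's pk_i4_t):
// assumed to hold two int4 values in one byte.
struct packed_i4
{
    static constexpr int PackSize = 2; // logical elements per stored object
    std::uint8_t data;
};

// Primary template: a type without a PackSize member is not a pack type.
template <typename T, typename = void>
struct is_pack_type : std::false_type
{
};

// Specialization: selected only when T::PackSize exists; true if it exceeds 1.
template <typename T>
struct is_pack_type<T, std::void_t<decltype(T::PackSize)>>
    : std::bool_constant<(T::PackSize > 1)>
{
};

// Number of logical elements per stored object; 1 for ordinary types.
template <typename T>
constexpr int pack_size()
{
    if constexpr(is_pack_type<T>::value)
        return T::PackSize;
    else
        return 1;
}

// Map a logical element index to the index of the object that stores it.
template <typename T>
constexpr long storage_index(long element_index)
{
    return element_index / pack_size<T>();
}

static_assert(is_pack_type<packed_i4>::value, "packed_i4 should be a pack type");
static_assert(!is_pack_type<float>::value, "float should not be a pack type");
```

With the pack size attached to the type, generic copy and coordinate code could query `pack_size<T>()` instead of special-casing each sub-byte type. The sketch requires C++17.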

zjing14 and others added 12 commits December 16, 2024 14:51
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
…ration.hpp

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
Contributor

spolifroni-amd left a comment


There doesn't seem to be anything to do for docs.

@zjing14
Contributor

zjing14 commented Dec 19, 2024

@aosewski Could you post the logs of the CI failure?

@mtgu0705

@zjing14, please find the failed CI log attached:

[2024-12-18T16:17:18.860Z] In file included from /var/jenkins/workspace/MLLIBS_composable_kernel_PR-1762/test/gemm/instance/gemm_f16_nt_instance.cpp:9:
[2024-12-18T16:17:18.861Z] In file included from /var/jenkins/workspace/MLLIBS_composable_kernel_PR-1762/include/ck/tensor_operation/gpu/device/impl/device_gemm_xdl_cshuffle.hpp:15:
[2024-12-18T16:17:18.861Z] In file included from /var/jenkins/workspace/MLLIBS_composable_kernel_PR-1762/include/ck/tensor_operation/gpu/grid/gridwise_gemm_xdl_cshuffle_v1.hpp:13:
[2024-12-18T16:17:18.861Z] In file included from /var/jenkins/workspace/MLLIBS_composable_kernel_PR-1762/include/ck/tensor_operation/gpu/block/thread_group_tensor_slice_transfer_v4r1.hpp:10:
[2024-12-18T16:17:18.861Z] /var/jenkins/workspace/MLLIBS_composable_kernel_PR-1762/include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v3r1.hpp:378:27: error: static assertion failed due to requirement 'is_same_v<_Float16, ck::pk_i4_t>': transpose is not allowed for pk_i4_t
[2024-12-18T16:17:18.861Z] 378 | static_assert(is_same_v<remove_cvref_t, pk_i4_t>,
[2024-12-18T16:17:18.861Z] | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[2024-12-18T16:17:18.861Z] /var/jenkins/workspace/MLLIBS_composable_kernel_PR-1762/include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v3r1.hpp:460:9: note: in instantiation of function template specialization
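
The reported requirement, 'is_same_v<_Float16, ck::pk_i4_t>', has the shape of a check that demands the element type be the packed type, which can never hold for an FP16 instance. The sketch below contrasts that shape with a negated check that rejects only the packed type. It is a reading of the log and not a confirmed diagnosis: `pk_i4_stub`, `passes_positive_check`, `passes_negated_check`, and `transpose_path` are hypothetical names, and `float` stands in for `_Float16` to keep the example portable.

```cpp
#include <type_traits>

// Hypothetical stand-in for the packed int4 type named in the log.
struct pk_i4_stub
{
    unsigned char data;
};

// Strip references and cv-qualifiers before comparing types.
template <typename T>
using bare_t = std::remove_cv_t<std::remove_reference_t<T>>;

// Shape of the reported requirement: true only if T IS the packed type,
// so every other element type fails the assertion.
template <typename T>
constexpr bool passes_positive_check = std::is_same_v<bare_t<T>, pk_i4_stub>;

// Negated form: true for every type EXCEPT the packed type.
template <typename T>
constexpr bool passes_negated_check = !std::is_same_v<bare_t<T>, pk_i4_stub>;

// A code path guarded by the negated form compiles for ordinary types
// and is rejected at compile time only for the packed type.
template <typename T>
constexpr bool transpose_path()
{
    static_assert(passes_negated_check<T>, "transpose is not allowed for the packed type");
    return true;
}

static_assert(transpose_path<float>(), "ordinary element types are accepted");
```

Instantiating `transpose_path<pk_i4_stub>()` would trigger the static assertion, while `transpose_path<float>()` compiles.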

@mtgu0705

@zjing14 I have also attached the full log file:
failed.log.txt

Labels
CI - Testing enhancement New feature or request external contribution Code contribution from users community..
5 participants