Full documentation for hipTensor is available at rocm.docs.amd.com/projects/hiptensor.
- Added benchmarking suites for contraction, permutation, and reduction. YAML files are categorized into bench and validation folders for organization
- Added emulation test suites for contraction, permutation, and reduction
- Support has been added for changing the default data layout using the
HIPTENSOR_DEFAULT_STRIDES_COL_MAJOR
environment variable
- Used
GPU_TARGETS
instead ofAMDGPU_TARGETS
incmakelists.txt
- Optimized the hyper-parameter selection algorithm for permutation
- For CMake bug workaround, set
CMAKE_NO_BUILTIN_CHRPATH
whenBUILD_OFFLOAD_COMPRESS
is unset
- Added support for tensor reduction, including APIs, CPU reference, unit tests, and documentation
- ASAN builds only support xnack+ targets.
- ASAN builds use
-mcmodel=large
to accommodate library sizes greater than 2GB. - Updated the permute backend to accommodate changes to element-wise operations.
- Updated the actor-critic implementation.
- Split kernel instances to improve build times
- Fixed a bug in randomized tensor input data generation.
- Fixed the default strides calculation to be in column major order.
- Fixed a small memory leak by properly destroying HIP event objects in tests.
- Default strides calculations now follow column-major convention.
- Various documentation formatting updates and fixes.
- Added support for tensor permutation of ranks of 2, 3, 4, 5 and 6
- Added tests for tensor permutation of ranks of 2, 3, 4, 5 and 6
- Added support for tensor contraction of M6N6K6: M, N, K up to rank 6
- Added tests for tensor contraction of M6N6K6: M, N, K up to rank 6
- Added new test YAML parsing to support sequential parameters ordering
- Documentation updates for installation, programmer's guide and API reference
- Prefer amd-llvm-devel package before system LLVM library
- Preferred compilers changed to CC=amdclang CXX=amdclang++
- Updated actor-critic selection for new contraction kernel additions
- Fixed LLVM parsing crash
- Fixed memory consumption issue in complex kernels
- Work-around implemented for compiler crash during debug build
- Allow random modes ordering for tensor contractions
- API support for permutation of rank 4 tensors: f16 and f32
- New datatype support in contractions of rank 4: f16, bf16, complex f32, complex f64
- Added scale and bilinear contraction samples and tests for new supported data types
- Added permutation samples and tests for f16, f32 types
- Fixed bug in contraction calculation with data type f32
- Architecture support for gfx940, gfx941, and gfx942
- Client tests configuration parameters now support YAML file input format
- Doxygen now treats warnings as errors
- Client tests output redirections now behave accordingly
- Removed dependency static library deployment
- Security issues for documentation
- Compile issues in debug mode
- Corrected soft link for ROCm deployment
- Initial prototype enablement of hipTensor library that supports tensor operations
- Kernel selection support for Default and Actor-Critic algorithms
- API support for:
- Definition and contraction of rank 4 tensors
- Contextual logging and output redirection
- Kernel selection caching
- Data type support for f32 and f64
- Architecture support for gfx908 and gfx90a