cudnn frontend v1.9 release notes
New API
Enhancements to flash attention API
- SDPA_attributes and SDPA_bprop_attributes now accept a score_mod function through the set_score_mod and set_score_mod_bprop APIs. The function accepts a custom chain of pointwise operations which operate on the attention score matrix. Some common functors such as causal mask, sliding window mask, and soft capping have been added to the headers as a reference. More usage examples have been added to the fprop and bprop samples; a soft-capping sketch is also shown after this list.
- Added support for THD format and sliding window mask.
- Added support for THD format and bottom right causal mask.
- Added support for bottom right causal masking with sliding window mask.
- Added new parameters, set_max_total_seq_len_q/set_max_total_seq_len_kv, on the SDPA bprop node. They help reduce the workspace size required when running with the THD format; a short sketch is shown after this list.
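As an illustration of the score_mod hook, below is a minimal soft-capping sketch (score becomes cap * tanh(score / cap)) built from the graph's pointwise operations. The callback signature, the wrapper function add_softcap_score_mod, and the scalar cap/inv_cap tensors are assumptions made for illustration; the shipped headers and samples are the authoritative reference for the exact types.

```cpp
#include <memory>
#include <cudnn_frontend.h>

namespace fe = cudnn_frontend;

// Hypothetical helper: register a soft-capping score_mod on an existing
// SDPA_attributes object. The callback is assumed to receive the graph and the
// attention score tensor and to return the modified score tensor.
void add_softcap_score_mod(fe::graph::SDPA_attributes& sdpa_attributes) {
    auto softcap = [](std::shared_ptr<fe::graph::Graph> graph,
                      std::shared_ptr<fe::graph::Tensor_attributes> score)
        -> std::shared_ptr<fe::graph::Tensor_attributes> {
        // Scalar pass-by-value tensors holding 1/cap and cap; their values are
        // supplied through the variant pack at execution time.
        auto inv_cap = graph->tensor(fe::graph::Tensor_attributes()
                                         .set_name("inv_cap")
                                         .set_dim({1, 1, 1, 1})
                                         .set_stride({1, 1, 1, 1})
                                         .set_is_pass_by_value(true)
                                         .set_data_type(fe::DataType_t::FLOAT));
        auto cap = graph->tensor(fe::graph::Tensor_attributes()
                                     .set_name("cap")
                                     .set_dim({1, 1, 1, 1})
                                     .set_stride({1, 1, 1, 1})
                                     .set_is_pass_by_value(true)
                                     .set_data_type(fe::DataType_t::FLOAT));

        // Custom chain of pointwise operations on the attention score matrix.
        auto scaled = graph->pointwise(score, inv_cap,
            fe::graph::Pointwise_attributes().set_mode(fe::PointwiseMode_t::MUL));
        auto squashed = graph->pointwise(scaled,
            fe::graph::Pointwise_attributes().set_mode(fe::PointwiseMode_t::TANH_FWD));
        return graph->pointwise(squashed, cap,
            fe::graph::Pointwise_attributes().set_mode(fe::PointwiseMode_t::MUL));
    };

    sdpa_attributes.set_score_mod(softcap);
    // The corresponding backward modifier would be registered with
    // set_score_mod_bprop on the bprop attributes.
}
```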
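And a minimal sketch of the new bprop hint, assuming an already configured SDPA bprop attributes object named sdpa_bprop_attributes; the token counts are illustrative.

```cpp
// With the THD (packed, variable sequence length) layout, passing the true
// total token counts lets the bprop node size its workspace for the packed
// totals rather than the dense worst case (batch * max_seq_len).
int64_t max_total_tokens_q  = 8 * 1024;  // illustrative: sum of Q sequence lengths
int64_t max_total_tokens_kv = 8 * 1024;  // illustrative: sum of K/V sequence lengths

sdpa_bprop_attributes.set_max_total_seq_len_q(max_total_tokens_q);
sdpa_bprop_attributes.set_max_total_seq_len_kv(max_total_tokens_kv);
```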
Improvements
- Allow creation of serialized json for dgrad, wgrad, and resample operations; a round-trip sketch is shown after this list.
- Added more diagnostic messages when the compiled version of cudnn does not match the run-time version of cudnn.
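A rough round-trip sketch for the serialization improvement. It assumes the serialize/deserialize entry points on the graph and pre-existing graph and handle objects; with this release, graphs containing dgrad, wgrad, or resample nodes should also be accepted by this path.

```cpp
#include <vector>
#include <cudnn_frontend.h>

namespace fe = cudnn_frontend;

// `graph` is a built fe::graph::Graph that may now contain dgrad, wgrad, or
// resample nodes; `handle` is a valid cudnnHandle_t (both assumed to exist).
std::vector<uint8_t> blob;
if (graph.serialize(blob).is_good()) {
    fe::graph::Graph restored;
    if (restored.deserialize(handle, blob).is_good()) {
        // `restored` can be executed with the usual variant pack and workspace,
        // without rebuilding plans from scratch.
    }
}
```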
Bug fixes
- Fixed an issue where log messages contained unparseable data at the end.
- Fixed an issue where building the python pip wheel would hang.
- Fixed native cuda graph creation for SDPA with alibi masks.
New samples
- Added a new sample for Layernorm with dynamic shapes and a kernel cache to showcase reduced plan build time when using the kernel cache.
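For context, a configuration sketch of what the sample exercises. The set_dynamic_shape_enabled / set_kernel_cache calls and the KernelCache type follow the dynamic-shape support from earlier 1.x releases; the loop and shapes are illustrative, and the LayerNorm graph construction itself is elided.

```cpp
#include <memory>
#include <cudnn_frontend.h>

namespace fe = cudnn_frontend;

// One kernel cache shared by graphs built for different input shapes.
auto kernel_cache = std::make_shared<fe::KernelCache>();

for (int64_t batch : {8, 16, 32}) {        // illustrative dynamic dimension
    fe::graph::Graph graph;
    graph.set_dynamic_shape_enabled(true); // mark the graph as dynamic-shape
    graph.set_kernel_cache(kernel_cache);  // reuse kernels across shapes

    // ... declare the LayerNorm inputs with `batch` in their dimensions, add the
    //     layernorm node, then validate/build as in the shipped sample ...
    // After the first shape is built, subsequent builds can hit the kernel cache
    // and skip recompilation, which is the plan-build-time saving the sample
    // measures.
}
```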