v1.8.0 release

@Anerudhan Anerudhan released this 23 Oct 18:44
936021b

cudnn frontend v1.8 release:

New API

Paged Attention API

The SDPA forward operation now supports paged attention on cudnn 9.5.0 and later when the appropriate page table descriptors are set. SDPA_attributes now accepts set_paged_attention_k_table and set_paged_attention_v_table to pass in these descriptors. Refer to the cpp samples and python samples for usage, and see the docs for more API details. Paged attention allows more efficient memory usage by storing the K/V caches in non-contiguous memory and using page tables to reconstruct them. For more information, refer to the cudnn_graph library documentation and the Paged Attention paper.
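The page-table idea can be illustrated with a toy, pure-Python sketch. This is not cudnn frontend code: the names `k_pool`, `page_table`, `gather_paged` and the block size are hypothetical, chosen only to show how a logical K cache is reconstructed from non-contiguous pages.

```python
# Toy illustration of paged K/V storage. NOT the cudnn frontend API;
# all names and sizes here are hypothetical.
BLOCK_SIZE = 2  # tokens stored per page (assumed value)

# Physical pool of pages. Each page holds BLOCK_SIZE token vectors.
# Pages that belong to one sequence do not have to be adjacent.
k_pool = [
    [[9.0, 9.0], [9.0, 9.0]],  # page 0: owned by some other sequence
    [[3.0, 3.1], [4.0, 4.1]],  # page 1: tokens 2..3 of our sequence
    [[1.0, 1.1], [2.0, 2.1]],  # page 2: tokens 0..1 of our sequence
    [[5.0, 5.1], [0.0, 0.0]],  # page 3: token 4, then unused padding
]

# Page table for our sequence: logical block index -> physical page index.
page_table = [2, 1, 3]
seq_len = 5  # number of valid tokens in the sequence


def gather_paged(pool, table, length, block_size=BLOCK_SIZE):
    """Rebuild the logical, contiguous cache from the paged pool."""
    rows = []
    for token in range(length):
        page = table[token // block_size]   # which physical page
        offset = token % block_size         # position inside that page
        rows.append(pool[page][offset])
    return rows


k_contiguous = gather_paged(k_pool, page_table, seq_len)
```

In the real operation the lookup happens inside the SDPA kernel, so the contiguous copy built here is never materialized; the sketch only shows the indexing.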

CUDA Graph API

The cudnn graph now lets the user directly build a native cuda_graph for a given sub_graph (requires cudnn 9.5.0). There are two APIs:

  • populate_cuda_graph: adds the cudnn nodes to the empty cuda_graph provided as input.
  • update_cuda_graph: updates the populated cuda graph with the necessary data pointers.

See docs and backend documentation for more details.

Enhancements

  • Kernel cache for dynamic shapes is now supported in python. Added a sample to showcase usage.

  • graph.deselect_engines(str: ) now has a python equivalent through pybind11.

  • graph.tensor(...) can now accept int64_t scalars directly (previously limited to int32_t, float and fp16 data types).

  • fp8 sdpa attention now allows dropout and padding mask. Requires cudnn 9.5.0 and above.

  • More enhancements to pointwise output stride inference (for broadcast operations). For non-unary operands, the broadcasted tensor can now be at either IN_0 or IN_1.

  • SDPA backward operation now allows d up to 256 for Hopper. Requires cudnn 9.5.0 and above.
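The pointwise stride-inference enhancement above can be pictured with a small sketch. It assumes the output inherits the layout of whichever input already has the full output shape, whether that input is IN_0 or IN_1. That rule and the helper names `broadcast_shape` and `infer_output_stride` are assumptions made for illustration, not the frontend's actual implementation.

```python
# Illustrative sketch only; the rule and helper names are assumptions,
# not the cudnn frontend implementation.

def broadcast_shape(a, b):
    """Broadcast two equal-rank shapes, where a dim of 1 stretches."""
    if len(a) != len(b):
        raise ValueError("this sketch expects equal-rank shapes")
    out = []
    for x, y in zip(a, b):
        if x == y or y == 1:
            out.append(x)
        elif x == 1:
            out.append(y)
        else:
            raise ValueError(f"cannot broadcast {x} with {y}")
    return out


def packed_strides(dims):
    """Row-major (fully packed) strides for the given dims."""
    strides = [1] * len(dims)
    for i in range(len(dims) - 2, -1, -1):
        strides[i] = strides[i + 1] * dims[i + 1]
    return strides


def infer_output_stride(in0_dim, in0_stride, in1_dim, in1_stride):
    """Return (out_dim, out_stride) for a binary pointwise op."""
    out_dim = broadcast_shape(in0_dim, in1_dim)
    # Copy the layout of the operand that is NOT broadcast,
    # regardless of whether it sits at IN_0 or IN_1.
    if list(in0_dim) == out_dim:
        return out_dim, list(in0_stride)
    if list(in1_dim) == out_dim:
        return out_dim, list(in1_stride)
    # Both operands are broadcast: fall back to a packed layout.
    return out_dim, packed_strides(out_dim)


# A [1, 4, 1, 1] bias at IN_0 and an NHWC-strided [2, 4, 3, 3] tensor at IN_1.
bias_dim, bias_stride = [1, 4, 1, 1], [4, 1, 4, 4]
x_dim, x_stride = [2, 4, 3, 3], [36, 1, 12, 4]

out_dim, out_stride = infer_output_stride(bias_dim, bias_stride, x_dim, x_stride)
```

Swapping the two operands gives the same output dims and strides, which is the symmetry the enhancement describes.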

Bug fixes

  • Fixed an issue when querying cudnnGetLastErrorString() from the backend. The error_t object will now have a more meaningful message.

  • Fixed build issues seen with clang-19 compiler.

  • Fixed an issue where a graph with bias in sdpa_bprop was assumed to always have a dbias.