Conversation

@tianmu-li commented Apr 22, 2025

Calling the op directly prevents neural-compressor from patching the FP8 module correctly, since its patching logic relies on a ModuleFusedSDPA module being present: https://github.com/intel/neural-compressor/blob/master/neural_compressor/torch/algorithms/fp8_quant/utils/patched_module_restore_registry.py#L128
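
For context, here is a minimal sketch of the kind of module-level wrapper the patcher matches on. This is not the actual PR diff; the `habana_frameworks.torch.hpex.kernels.FusedSDPA` import, its `apply` signature, and the CPU fallback are assumptions based on the Gaudi stack.

```python
# Sketch only: wrap the fused SDPA op in an nn.Module so neural-compressor's
# fp8_quant patching registry can find it by module type instead of having to
# intercept a direct function call.
import torch
import torch.nn as nn

try:
    # Assumed import path for the Gaudi fused SDPA kernel.
    from habana_frameworks.torch.hpex.kernels import FusedSDPA
except ImportError:
    FusedSDPA = None


class ModuleFusedSDPA(nn.Module):
    """nn.Module wrapper around the fused SDPA kernel.

    The FP8 patcher replaces submodules of this class with its patched
    variant, so the kernel must be reachable as a submodule rather than
    called directly as a function.
    """

    def __init__(self, fused_sdpa_kernel=FusedSDPA):
        super().__init__()
        self._kernel = fused_sdpa_kernel

    def forward(self, query, key, value, attn_mask=None, dropout_p=0.0,
                is_causal=False, scale=None):
        if self._kernel is not None:
            # Delegate to the fused HPU kernel (signature assumed).
            return self._kernel.apply(query, key, value, attn_mask,
                                      dropout_p, is_causal, scale)
        # Fallback for environments without the HPU kernel (e.g. CPU).
        return torch.nn.functional.scaled_dot_product_attention(
            query, key, value, attn_mask=attn_mask, dropout_p=dropout_p,
            is_causal=is_causal, scale=scale)
```

An attention layer would then hold something like `self.fused_sdpa = ModuleFusedSDPA(FusedSDPA)` and call it in `forward`, letting the quantizer swap that submodule for its patched FP8 version.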

@madamczyk-intel commented Apr 23, 2025

Hi. Thanks for the PR. This is already covered in #1086 and #1087, which are waiting to be merged.
