CUDA: fix numerical issue in tile FA kernel #16540
Conversation
The code changes themselves look fine and I think it's a good idea to do this, even if the underlying precision issue lies elsewhere, which I did not verify.
I definitely did observe issues with the numerical range when I debugged this; there seems to be an additional bug that specifically occurs
ba62ea9
to
7dcee04
Compare
* origin/master: (32 commits)
  metal : FA support F32 K and V and head size = 32 (ggml-org#16531)
  graph : support cacheless embeddings with FA and iSWA (ggml-org#16528)
  opencl: fix build targeting CL 2 (ggml-org#16554)
  CUDA: fix numerical issues in tile FA kernel (ggml-org#16540)
  ggml : fix build broken with -march=armv9-a on MacOS (ggml-org#16520)
  CANN: fix CPU memory leak in CANN backend (ggml-org#16549)
  fix: add remark plugin to render raw HTML as literal text (ggml-org#16505)
  metal: add support for opt_step_sgd (ggml-org#16539)
  ggml : fix scalar path for computing norm (ggml-org#16558)
  CANN: Update several operators to support FP16 data format (ggml-org#16251)
  metal : add opt_step_adamw and op_sum (ggml-org#16529)
  webui: remove client-side context pre-check and rely on backend for limits (ggml-org#16506)
  [SYCL] fix UT fault cases: count-equal, argsort, pad OPs (ggml-org#16521)
  ci : add Vulkan on Ubuntu with default packages build (ggml-org#16532)
  common : handle unicode during partial json parsing (ggml-org#16526)
  common : update presets (ggml-org#16504)
  ggml : Fix FP16 ELU positive branch (ggml-org#16519)
  hparams : add check for layer index in is_recurrent (ggml-org#16511)
  ggml: Correct SVE implementation in ggml_vec_dot_f16_unroll (ggml-org#16518)
  CUDA: faster tile FA, add oob checks, more HSs (ggml-org#16492)
  ...
This reverts commit 7049736.
Fixes the issue described in #16528 (comment).
The problem, as far as I can tell, is numerical issues in the rescaling of the VKQ accumulators with the inverse of the KQ sum at the end of the kernel. The input values in `test-backend-ops` and the models I tested did not provoke this issue, so I did not detect it in #16492. The fix is simply to use FP32 arithmetic for this step; the impact on performance is negligible since it is only done once per CUDA block.
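A minimal sketch of why the FP16 rescaling can blow up, using NumPy half-precision as a stand-in for the CUDA kernel (the variable names and values here are hypothetical, chosen only to illustrate the failure mode): when the KQ (softmax) sum is small, its inverse exceeds the FP16 maximum of ~65504, so the rescaled accumulator becomes inf/nan. Doing the same inversion in FP32 stays comfortably in range.

```python
import numpy as np

# Hypothetical values, not taken from the actual kernel:
kq_sum = np.float16(1e-5)   # small softmax denominator (KQ sum)
vkq    = np.float16(0.25)   # a VKQ accumulator value to be rescaled

# FP16 arithmetic: 1 / 1e-5 = 1e5 exceeds the FP16 maximum (~65504),
# so the inverse overflows to infinity
inv_f16 = np.float16(1.0) / kq_sum
print(np.isinf(inv_f16))            # True -> rescaled output is inf/nan

# FP32 arithmetic (the approach taken by the fix): the inverse is well
# within range, and the extra cost is paid only once per CUDA block
inv_f32 = np.float32(1.0) / np.float32(kq_sum)
print(np.isfinite(np.float32(vkq) * inv_f32))  # True
```

The same reasoning explains why `test-backend-ops` missed it: its inputs keep the KQ sum large enough that the FP16 inverse never overflows.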