
Conversation

@oscarkey
Contributor

Port https://github.com/PriorLabs/TabPFN-private/pull/46 to public:

  • remove flash attention package use
  • chunk scaled_dot_product_attention

This might mean that people who installed their own flash attention package get slower inference. The difference should be negligible on recent PyTorch versions, but would be large for anyone on PyTorch 2.1, where FlashAttention-2 isn't built into scaled_dot_product_attention. But probably no one is doing this?
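For context, a minimal sketch of what chunking scaled_dot_product_attention over the batch dimension could look like; the function name, chunk size, and tensor layout are illustrative assumptions, not the actual TabPFN implementation:

```python
import torch
import torch.nn.functional as F


def chunked_sdpa(q, k, v, chunk_size=2**15):
    """Hypothetical helper: apply scaled_dot_product_attention in slices along
    the batch dimension, to stay below CUDA kernel limits on very large batches.

    Assumes q, k, v are shaped (batch, heads, seq, head_dim).
    """
    batch_size = q.shape[0]
    if batch_size <= chunk_size:
        # Small (or empty) batches go straight through to the fused kernel.
        return F.scaled_dot_product_attention(q, k, v)
    chunks = [
        F.scaled_dot_product_attention(
            q[i : i + chunk_size], k[i : i + chunk_size], v[i : i + chunk_size]
        )
        for i in range(0, batch_size, chunk_size)
    ]
    return torch.cat(chunks, dim=0)
```

Since torch 2.2, scaled_dot_product_attention dispatches to a built-in FlashAttention-2 backend on supported GPUs, which is why dropping the external flash-attn package should be close to a no-op there.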

@oscarkey oscarkey requested a review from LeoGrin October 28, 2025 13:33
@oscarkey oscarkey requested a review from a team as a code owner October 28, 2025 13:33
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request removes the dependency on the flash-attn package and introduces a chunking mechanism for scaled_dot_product_attention to work around a CUDA kernel limitation with large batch sizes. This is a good simplification that improves robustness. The changes are well-implemented and include new tests. I've identified one edge case where an empty batch would cause a runtime error and have suggested a fix along with an additional test case to cover it. Otherwise, the changes look solid.
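If useful, the empty-batch edge case could be covered by a test along these lines (a sketch only: `chunked_sdpa` is the illustrative helper from the sketch above, and the actual function and fix in this PR may look different):

```python
import pytest
import torch
import torch.nn.functional as F


@pytest.mark.parametrize("batch_size", [0, 1, 5])
def test_chunked_sdpa_matches_unchunked(batch_size):
    # Hypothetical test: empty and small batches should yield outputs of the
    # expected shape, and chunking should not change the result.
    q = torch.randn(batch_size, 4, 8, 16)  # (batch, heads, seq, head_dim)
    out = chunked_sdpa(q, q, q, chunk_size=2)
    assert out.shape == q.shape
    expected = F.scaled_dot_product_attention(q, q, q)
    assert torch.allclose(out, expected, atol=1e-6)
```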

Collaborator

@LeoGrin LeoGrin left a comment


LGTM thanks!

> This might mean that people who installed their own flash attention package get slower inference. The difference should be negligible on recent PyTorch versions, but would be large for anyone on PyTorch 2.1, where FlashAttention-2 isn't built into scaled_dot_product_attention. But probably no one is doing this?

Yes, that seems fine to me. I'm wondering if we want to force torch >= 2.2. On the one hand it's quite recent (Jan 2024); on the other hand, people using torch 2.1 will have a very bad experience 🤔

@oscarkey
Contributor Author

I would be in favour of forcing 2.2; for now I've created https://linear.app/priorlabs/issue/RES-813/drop-torch-21warn-if-used-on-big-datasets to track it.
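For reference, a minimal sketch of the "warn if used on big datasets" option from that ticket; the helper name, threshold, and the choice to warn rather than raise are assumptions for illustration:

```python
import warnings

import torch
from packaging.version import Version


def warn_if_slow_attention(n_samples: int, threshold: int = 10_000) -> None:
    """Hypothetical check: torch < 2.2 lacks the built-in FlashAttention-2
    backend for scaled_dot_product_attention, so large datasets may be slow."""
    if Version(torch.__version__) < Version("2.2") and n_samples > threshold:
        warnings.warn(
            "Running on torch < 2.2 without FlashAttention-2: inference on "
            "large datasets may be significantly slower. Consider upgrading "
            "to torch >= 2.2.",
            stacklevel=2,
        )
```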

@oscarkey oscarkey merged commit 93f8673 into main Oct 28, 2025
10 checks passed
@oscarkey oscarkey deleted the ok-sync-attention branch October 28, 2025 14:18
oscarkey pushed a commit that referenced this pull request Nov 12, 2025
Co-authored-by: mirror-bot <mirror-bot@users.noreply.github.com>