Prevent unbounded growth of sparse tensor in add operation #36030
Conversation
💊 Build failures summary and remediations

As of commit cfa142e (more details on the Dr. CI page):

🕵️ 2 new failures recognized by patterns

The following build failures do not appear to be due to upstream breakages (reran 1 job to discount flakiness):
seems like a reasonable heuristic. Does CPU sparse tensor add have this problem too?
I wasn't getting the same growth with CPU sparse tensors. Looking at the code though:

pytorch/aten/src/ATen/native/sparse/SparseTensorMath.cpp, lines 510 to 512 in 82d58ed

However, it's only triggered for sparse tensors with non-contiguous index or value tensors. The CPU add for contiguous index & value tensors seems to do a full addition.
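To illustrate the distinction being discussed, here is a hedged sketch in plain Python (not the actual ATen code): a concatenation-style add just appends index/value pairs, so duplicate indices accumulate, while a "full" (coalescing) add merges duplicates, so nnz stays bounded by the number of distinct indices.

```python
# Illustrative model of two ways a sparse add can handle duplicate
# indices. This is pure Python, not PyTorch's implementation; a sparse
# vector is modeled as a list of (index, value) pairs.

def add_by_concatenation(a, b):
    """Append b's entries onto a's; nnz grows by len(b) on every call
    and duplicate indices are kept (an uncoalesced result)."""
    return a + b

def add_coalesced(a, b):
    """Sum values at duplicate indices; nnz is bounded by the number
    of distinct indices (a coalesced result)."""
    out = {}
    for idx, val in a + b:
        out[idx] = out.get(idx, 0) + val
    return sorted(out.items())

x = [(0, 1.0), (2, 3.0)]
y = [(0, 2.0), (5, 4.0)]

print(add_by_concatenation(x, y))  # 4 stored entries, index 0 appears twice
print(add_coalesced(x, y))         # [(0, 3.0), (2, 3.0), (5, 4.0)]
```

With repeated concatenation-style adds, the duplicate entries are what lets the stored size grow without bound even though the logical number of nonzeros does not.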
@zou3519 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Oh, it's the uncoalesced addition situation. I'm not fundamentally opposed to some sort of heuristic here, but I want to explore a few other options first. It sounds like part of the problem is that CUDA and CPU don't have equivalent coalescing behavior. Do you think it would be difficult for CUDA to be made to behave the same way as CPU? (I could believe this is hard, due to CUDA's execution model, but it would be helpful if you could confirm.)
I looked over the heuristic and I think it's pretty good.
Dismissing my review based on @ezyang's request to explore other options first
The CPU implementation does a variation of merging two sorted lists. A similar thing could be done for coalesced inputs in CUDA using
One thought could be to …
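The "variation of merging two sorted lists" mentioned above can be sketched as follows (an illustrative pure-Python model assuming 1-D indices and coalesced inputs; the real CPU code operates on index/value tensors):

```python
def merge_sorted_sparse(a, b):
    """Merge two coalesced sparse vectors, each given as a list of
    (index, value) pairs sorted by index, summing values where the
    indices are equal -- a standard sorted-list merge."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        ia, va = a[i]
        ib, vb = b[j]
        if ia < ib:
            out.append((ia, va))
            i += 1
        elif ib < ia:
            out.append((ib, vb))
            j += 1
        else:
            # Equal indices: sum the values, advance both cursors.
            out.append((ia, va + vb))
            i += 1
            j += 1
    # One input is exhausted; the other's tail is already sorted.
    out.extend(a[i:])
    out.extend(b[j:])
    return out

print(merge_sorted_sparse([(0, 1.0), (2, 3.0)], [(0, 2.0), (5, 4.0)]))
# [(0, 3.0), (2, 3.0), (5, 4.0)]
```

Because both inputs are already sorted, the merge is a single linear pass and the output is coalesced, so nnz never exceeds the number of distinct indices.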
Sorry, landing this fell through the cracks. Could you rebase the PR, @peterbell10, just to get test signal again, and then I'll land this once the tests look fine?
Only triggered when sparse tensor indices or values are non-contiguous
@zou3519 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Test failures look unrelated.
Fixes #34964
Sparse CUDA add was implemented by just concatenating the indices and values of the two tensors. If called repeatedly in a tight loop, this lets nnz grow unbounded. In the worst case of x.add_(x), it grows exponentially.
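The exponential worst case can be seen with a tiny model of the concatenation-based add (a hypothetical sketch tracking only nnz, not a PyTorch reproduction): each x.add_(x) appends x's own entries onto itself, doubling the stored element count.

```python
# Model of x.add_(x) when add_ is implemented as index/value
# concatenation: every call doubles nnz. Illustrative only; the
# initial nnz of 10 is an arbitrary assumption.
nnz = 10
history = [nnz]
for _ in range(5):
    nnz += nnz  # concatenating x's entries onto x itself
    history.append(nnz)
print(history)  # [10, 20, 40, 80, 160, 320] -- doubling each iteration
```

After k in-place adds, nnz is 10 * 2**k even though the set of distinct indices, and hence the coalesced nnz, never changed.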