
Prevent unbounded growth of sparse tensor in add operation #36030

Closed
peterbell10 wants to merge 3 commits

Conversation

peterbell10
Collaborator

Fixes #34964

Sparse CUDA add was implemented by just concatenating the indices and values of the two tensors. If called repeatedly in a tight loop, this lets nnz grow without bound; in the worst case of x.add_(x) it grows exponentially.
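A rough reproduction sketch (not part of the PR; the tensor size and loop count are illustrative, and a CUDA build is assumed) that makes the growth visible by watching _nnz():

```python
import torch

# Small sparse COO tensor on the GPU (CUDA build assumed).
indices = torch.tensor([[0, 1, 1], [2, 0, 2]], device="cuda")
values = torch.tensor([3.0, 4.0, 5.0], device="cuda")
x = torch.sparse_coo_tensor(indices, values, (2, 3))

# Each add concatenates indices/values, so the stored entry count
# roughly doubles per iteration (x.add_(x) is the in-place worst case
# described above) even though the tensor only has three distinct
# nonzero locations.
for step in range(5):
    x = x + x
    print(step, x._nnz())

# Coalescing sums duplicate entries and restores nnz to 3.
print(x.coalesce()._nnz())
```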

@dr-ci

dr-ci bot commented Apr 4, 2020

💊 Build failures summary and remediations

As of commit cfa142e (more details on the Dr. CI page):


  • 2/2 failures introduced in this PR

🕵️ 2 new failures recognized by patterns

The following build failures do not appear to be due to upstream breakages (reran 1 job to discount flakiness):

See CircleCI build pytorch_linux_bionic_py3_6_clang9_test (1/2)

Step: "Test" (full log | pattern match details | 🔁 rerun) <confirmed not flaky by 2 failures>

Apr 30 18:58:23 [E request_callback_impl.cpp:99] Received error while processing request type 15: size mismatch, m1: [3 x 3], m2: [6 x 6] at /var/lib/jenkins/workspace/aten/src/TH/generic/THTensorMath.cpp:41
Apr 30 18:58:20   test_debug_info (__main__.DistAutogradTestWithSpawn) ... skip (0.004s) 
Apr 30 18:58:21   test_dist_autograd_profiling (__main__.DistAutogradTestWithSpawn) ... ok (1.124s) 
Apr 30 18:58:22   test_embedding_bag_with_no_grad_tensors (__main__.DistAutogradTestWithSpawn) ... [W pybind_utils.h:712] Warning: Using sparse tensors in TorchScript is experimental. Many optimization pathways have not been thoroughly tested with sparse tensors. Please include the fact that the network is running sparse tensors in any bug reports submitted. (function operator()) 
Apr 30 18:58:22 ok (1.323s) 
Apr 30 18:58:23   test_error_in_context (__main__.DistAutogradTestWithSpawn) ... [E request_callback_impl.cpp:99] Received error while processing request type 15: size mismatch, m1: [3 x 3], m2: [6 x 6] at /var/lib/jenkins/workspace/aten/src/TH/generic/THTensorMath.cpp:41 
Apr 30 18:58:23 ok (1.122s) 
Apr 30 18:58:24   test_grad_copy_sparse_indices_extra_ref (__main__.DistAutogradTestWithSpawn) ... [W pybind_utils.h:712] Warning: Using sparse tensors in TorchScript is experimental. Many optimization pathways have not been thoroughly tested with sparse tensors. Please include the fact that the network is running sparse tensors in any bug reports submitted. (function operator()) 
Apr 30 18:58:24 /opt/conda/lib/python3.6/site-packages/torch/nn/functional.py:1850: UserWarning: Argument order of nn.functional.embedding_bag was changed. Usage `embedding_bag(weight, input, ...)` is deprecated, and should now be `embedding_bag(input, weight, ...)`. 
Apr 30 18:58:24   warnings.warn("Argument order of nn.functional.embedding_bag was changed. " 

See CircleCI build pytorch_linux_xenial_py3_clang5_mobile_custom_build_static (2/2)

Step: "Build" (full log | pattern match details | 🔁 rerun) <confirmed not flaky by 2 failures>

error pulling image configuration: Get https://prod-us-east-1-starport-layer-bucket.s3.us-east-1.amazonaws.com/307d-308535385114-c0b158d2-8c22-ad64-0178-fb03bd1a4b33/a0352ba9-620b-4257-a272-ddd51dadf83c?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20200430T180323Z&X-Amz-SignedHeaders=host&X-Amz-Expires=3599&X-Amz-Credential=AKIAI7KZ4NTCV2EWBNUQ%2F20200430%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Signature=9f404b1baa536bf5b9fd85ac8dd9f08d9f95ae8b65bf3c921e47ddbdbb3fbabe: EOF
DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-asan:8fcf46ef-4a34-480b-a8ee-b0a30a4d3e59 

This comment was automatically generated by Dr. CI.

@zou3519 added the triaged label (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module) on Apr 6, 2020
zou3519
zou3519 previously approved these changes Apr 6, 2020
Contributor

seems like a reasonable heuristic. Does CPU sparse tensor add have this problem too?

@peterbell10
Collaborator Author

I wasn't getting the same growth with CPU sparse tensors. Looking at the code though, add_out_sparse_non_contiguous also just concatenates the values,

LongTensor r_indices = at::cat({t._indices(), src._indices()}, 1);
Tensor r_values = at::cat({t_values, s_values}, 0).to(r.scalar_type());
alias_into_sparse(r, r_indices, r_values);

However, it's only triggered for sparse tensors with non-contiguous indices or value tensors. The CPU add for contiguous index & value tensors seems to do full addition.
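A rough Python-level check of that distinction (illustrative only; the exact nnz values follow from the behaviour described above, and the CUDA branch is skipped if no GPU is available):

```python
import torch

i = torch.tensor([[0, 1], [1, 0]])
v = torch.tensor([1.0, 2.0])
a = torch.sparse_coo_tensor(i, v, (2, 2)).coalesce()

# CPU add of coalesced (contiguous) inputs performs a real addition,
# so the result keeps 2 stored entries.
print((a + a)._nnz())

# The CUDA kernel concatenates instead, leaving duplicated indices (4 entries).
if torch.cuda.is_available():
    b = a.cuda()
    print((b + b)._nnz())
```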

@facebook-github-bot
Contributor

@zou3519 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ezyang
Contributor

ezyang commented Apr 6, 2020

Oh, it's the uncoalesced addition situation. I'm not fundamentally opposed to some sort of heuristic here, but I want to explore a few other options first. It sounds like part of the problem is that CUDA and CPU don't have equivalent coalescing behavior. Do you think it would be difficult for CUDA to be made to behave the same way as CPU? (I could believe this is hard, due to CUDA's model, but it would be helpful if you could confirm.)

@ezyang
Contributor

ezyang commented Apr 6, 2020

I looked over the heuristic and I think it's pretty good.

@zou3519 zou3519 dismissed their stale review April 6, 2020 20:24

Dismissing my review based on @ezyang's request to explore other options first

@peterbell10
Collaborator Author

The CPU implementation does a variation of merging two sorted lists. A similar thing could be done for coalesced inputs in CUDA using thrust::merge_by_key and then coalescing adjacent values. However, this needs multiple passes to do the final coalescing step. I had started implementing this but noticed this comment:

We deliberately choose to simply concat the indices and values tensors rather than merging them. This removes the need to synchronously fetch nnz at the end of the operation, at the cost of having a non-coalesced result. This trade-off is preferable for the common use-case of gradient accumulation.

One thought could be to concat for small nnz, where synchronisation would be more costly than the kernel itself, but do a full merge for large nnz, avoiding the memory growth issue. Although, that's not really much different from this PR, except that it does a merge_by_key instead of a sort_by_key.
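A rough Python-level sketch of that kind of threshold heuristic, for illustration only (the actual PR operates in the ATen CUDA kernels, and the dense-element-count cutoff below is an assumption rather than the PR's exact threshold):

```python
import math
import torch

def add_with_bounded_nnz(t, s):
    # Concatenate indices/values, mirroring the uncoalesced CUDA add.
    indices = torch.cat([t._indices(), s._indices()], dim=1)
    values = torch.cat([t._values(), s._values()], dim=0)
    r = torch.sparse_coo_tensor(indices, values, t.shape)

    # Assumed heuristic: once the uncoalesced result stores more entries
    # than the dense tensor would have elements, pay for a coalesce so
    # nnz cannot keep growing across repeated accumulations.
    if r._nnz() > math.prod(t.shape):
        r = r.coalesce()
    return r
```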

@peterbell10
Collaborator Author

peterbell10 commented Apr 30, 2020

@ezyang, @zou3519 is there anything left to do here?

@zou3519
Contributor

zou3519 commented Apr 30, 2020

Sorry, landing this fell through the cracks. Could you rebase the PR, @peterbell10, just to get test signal again, and then I'll land this once the tests look fine?

@facebook-github-bot
Contributor

@zou3519 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@zou3519
Contributor

zou3519 commented May 1, 2020

Test failures look unrelated

@facebook-github-bot
Contributor

@zou3519 merged this pull request in 675b3fc.

Labels
Merged, open source, triaged

Successfully merging this pull request may close these issues.

Gradient update of a sparse matrix results in a memory leak
5 participants