count_nonzero #23907


Closed

botcs opened this issue Aug 6, 2019 · 4 comments
Labels
enhancement Not as big of a feature, but technically not a bug. Should be easy to fix
module: numpy Related to numpy support, and also numpy compatibility of our operators
module: performance Issues related to performance, either of kernel code or framework glue
triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments


botcs commented Aug 6, 2019

🚀 Feature

An efficient implementation for counting nonzero elements

Pitch

In some situations (e.g. MaskRCNN) you don't need the exact positions of the nonzero elements, only their count, and the method is called quite frequently. So far, any workaround is faster than retrieving the indices of the elements and taking their length.

Some may want a differentiable count of these values, which effectively rules out the current nonzero method.
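For illustration, here is a minimal sketch of one possible differentiable surrogate (the soft_count_nonzero helper and its temperature parameter are hypothetical, not an existing PyTorch API):

import torch

def soft_count_nonzero(x, temperature=100.0):
    # Smooth approximation of the 0/1 indicator [x_i != 0]:
    # sigmoid(temperature * |x|) is exactly 0.5 at zero, so the
    # rescaling below maps zeros to ~0 and large |x| to ~1,
    # while keeping the whole expression differentiable.
    return (2 * torch.sigmoid(temperature * x.abs()) - 1).sum()

x = torch.randn(10, requires_grad=True)
count = soft_count_nonzero(x)
count.backward()  # gradients flow, unlike len(torch.nonzero(x))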

Related links:
It was previously mentioned on Discuss [1, 2] and in #14848 and #15190.

Alternatives

import torch
x = torch.randint(2, [20, 1080, 1920]) # e.g. 20 binary mask maps in int64

[len(torch.nonzero(x[i])) for i in range(len(x))]  # 311 ms ± 11.5 ms per loop (mean ± std. dev. of 10 runs, 10 loops each)
torch.sum(x != 0, dim=[1,2])                       # 136 ms ± 13.6 ms per loop (mean ± std. dev. of 10 runs, 10 loops each)
torch.sum(x.clamp(0, 1), dim=[1,2])                # 49.9 ms ± 5.34 ms per loop (mean ± std. dev. of 10 runs, 10 loops each)

On a 1080Ti, these times are 10.9 ms, 3.54 ms, and 2.68 ms respectively (measured with torch.cuda.synchronize before and after each operation).
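For reference, a minimal sketch of how such GPU timings can be taken (the time_cuda helper is ours, not part of PyTorch; any harness with explicit synchronization would do):

import time
import torch

def time_cuda(fn, repeats=100):
    # Flush pending kernels so they don't pollute the measurement,
    # then time `repeats` launches and synchronize again before stopping.
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / repeats

x = torch.randint(2, [20, 1080, 1920], device="cuda")
print(time_cuda(lambda: torch.sum(x != 0, dim=[1, 2])))
print(time_cuda(lambda: torch.sum(x.clamp(0, 1), dim=[1, 2])))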

Additional context

A few other non-trivial things popped up while I was digging into what the fastest approach is:

  1. torch.clamp_max() and torch.clamp_min() are roughly 5x slower than torch.clamp(): timed on x, 261 ms ± 78 ms and 202 ms ± 17.8 ms vs. 47.1 ms ± 437 µs (see the sketch after this list).
  2. There's no significant difference between using the uint8 and int64 dtypes.
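For concreteness, our reading is that the three timings in point 1 correspond to the following individual calls on the same x (an assumption; the original benchmark script is not shown):

torch.clamp_max(x, 1)  # 261 ms ± 78 ms
torch.clamp_min(x, 0)  # 202 ms ± 17.8 ms
torch.clamp(x, 0, 1)   # 47.1 ms ± 437 µs, single call with both bounds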

Thanks @gchanan for asking me to report this issue; I hope this will help others.

cc @VitalyFedyunin @ngimel @mruberry

@ailzhang ailzhang added module: operators module: performance Issues related to performance, either of kernel code or framework glue triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module labels Aug 7, 2019
@gchanan gchanan added the enhancement Not as big of a feature, but technically not a bug. Should be easy to fix label Oct 24, 2019
mruberry (Collaborator) commented Oct 24, 2019

This is a great and natural idea, but...

  1. It seems straightforward to create your own (suboptimal) version of this function, like in your examples.
  2. Even if there were a version of torch.count_nonzero() that ran instantly, how much would end-to-end MaskRCNN training time be reduced?

PyTorch is designed to let you build functions like count_nonzero. Will users see an impactful benefit from PyTorch providing an implementation instead?

botcs (Author) commented Oct 26, 2019

Thanks for the questions @mruberry.

  1. I doubt that it is intuitive; the fact that I was able to come up with multiple implementations with the same effect but very different efficiency is good evidence that it is not straightforward.
  2. This problem scales. When an image can be inferred in approx. 1 s (an upper bound for MaskRCNN), a 261 ms per-image count_nonzero adds up over 100k evaluation images: 0.261 s × 100,000 ≈ 26,100 s, i.e. about 7.25 extra hours.

> PyTorch is designed to let you build functions like count_nonzero. Will users see an impactful benefit from PyTorch providing an implementation instead?

I really appreciate the flexibility and expressivity of PyTorch, but while browsing code I have encountered many makeshift count-nonzero implementations, each needing an extra line of explanation in the comments. Also, NumPy has a built-in (numpy.count_nonzero).

mruberry (Collaborator) commented

@botcs You make a great case.

@mruberry mruberry added the module: numpy Related to numpy support, and also numpy compatibility of our operators label Apr 27, 2020
mruberry (Collaborator) commented

We have added torch.count_nonzero, which I believe addresses this issue. See: https://pytorch.org/docs/master/generated/torch.count_nonzero.html?highlight=count_nonzero#torch.count_nonzero.
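For the original example, the new operator replaces the workarounds directly; per the linked docs it also accepts an optional dim argument:

import torch

x = torch.randint(2, [20, 1080, 1920])  # same binary masks as above

torch.count_nonzero(x)              # scalar: count over the whole tensor
torch.count_nonzero(x, dim=(1, 2))  # per-mask counts, shape [20]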
