Euclidean distance transform: fix bad choice of block parameters #393

Merged

Contributor

@grlee77 grlee77 commented Aug 22, 2022

closes #392

This PR fixes #392 and also makes the function friendlier for use with user-provided block_params. In general, most users should not provide that argument, but it can be used to compare different settings for performance optimization. When block_params is provided by the user, the implementation now automatically pads the shape to an appropriate least common multiple of the warp_size and the m1, m2 and m3 block parameters.
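
As a rough illustration of the padding described above, the sketch below rounds an axis length up to the next multiple of the least common multiple of the warp size and the three block parameters. The helper name `padded_length` and the default `warp_size=32` are assumptions made for this example and are not taken from the PR's code.

```python
import math
from functools import reduce


def _lcm(a, b):
    # Least common multiple of two positive integers.
    return abs(b * (a // math.gcd(a, b)))


def padded_length(size, block_params, warp_size=32):
    """Round `size` up to a multiple of lcm(warp_size, m1, m2, m3).

    Hypothetical helper, for illustration only.
    """
    m1, m2, m3 = block_params
    multiple = reduce(_lcm, (warp_size, m1, m2, m3))
    # Ceiling division, then scale back up to the padded length.
    return multiple * ((size + multiple - 1) // multiple)


print(padded_length(1000, (8, 8, 2)))  # lcm is 32, so 1000 pads to 1024
print(padded_length(1000, (8, 8, 3)))  # lcm is 96, so 1000 pads to 1056
```

With parameters that are not powers of two, such as m3 = 3, the required multiple grows quickly, which is one reason the default block_params=None is the recommended path.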

More extensive unit tests over a range of image sizes and block_params settings are now implemented.

@grlee77 grlee77 added bug Something isn't working non-breaking Introduces a non-breaking change labels Aug 22, 2022
@grlee77 grlee77 added this to the v22.08.01 milestone Aug 22, 2022
@grlee77 grlee77 requested a review from a team as a code owner August 22, 2022 13:56
@grlee77 grlee77 changed the title fix bad choice of m1 parameter for larger 2D image sizes Euclidean distance transform: fix bad choice of block parameters Aug 22, 2022
Member

@jakirkham jakirkham left a comment

Thanks Greg! 🙏

Had a couple comments below

Comment on lines +19 to +20
    def _lcm(a, b):
        return abs(b * (a // math.gcd(a, b)))
Member

NumPy also has an lcm implementation FWIW

Contributor Author

Okay, I didn't know that. This way is not quite as fast as math.lcm for scalar a, b but still faster than numpy.lcm in that case.

Comment on lines +22 to +23
@functools.lru_cache()
def lcm(*args):
Member

Curious what the motivation is for caching here

Contributor Author

@grlee77 grlee77 Sep 1, 2022

Using math.lcm directly in Python 3.9+ is much faster (~150 ns vs. 2.8 µs for this fallback), so the cache was just to avoid that overhead prior to kernel launch. I think in practice it is not very important, and I can remove it if you prefer.

I figured the (m1, m2, m3) block parameters would usually be the same from call to call. This function is not used at all when block_params is left at the default of None, though.
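
For readers following the caching discussion, here is a minimal sketch of a cached, variadic LCM that prefers the built-in math.lcm when it exists and otherwise reduces over the two-argument fallback quoted earlier. Only the decorator and the `def lcm(*args)` line appear in the review; the function body below is an assumption, not the PR's actual code.

```python
import functools
import math


def _lcm(a, b):
    # Two-argument fallback, as in the snippet under review.
    return abs(b * (a // math.gcd(a, b)))


@functools.lru_cache()
def lcm(*args):
    """Variadic LCM; results are memoized per argument tuple."""
    if hasattr(math, "lcm"):
        # Python 3.9+ ships a fast built-in implementation.
        return math.lcm(*args)
    # Assumed fallback body: fold the pairwise LCM over all arguments.
    return functools.reduce(_lcm, args)


print(lcm(32, 8, 12))  # 96
```

Because lru_cache keys on the argument tuple, repeated calls with the same (warp_size, m1, m2, m3) skip the computation entirely, which is the overhead the author mentions wanting to avoid before kernel launch.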

    m1, m2, m3 = block_params
    if any(p < 1 for p in block_params):
        raise ValueError("(m1, m2, m3) in blockparams must be >= 1")
    m1, m2, m3 = map(int, block_params)
Member

Are these allowed to be floats or something? Should we do a type check?

Contributor Author

No, they definitely should be integers. I am not sure why I had that explicit map call. I can just remove it.

Contributor Author

For non-integer m1, m2, m3 the user will get an error on kernel launch, as CUDA will not accept a non-integer block/grid size. I think that is fine. In general the recommendation is to use the default block_params=None and the automated m1, m2, m3 that get chosen in that case.
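
For comparison, the explicit type check raised in the review could look roughly like the sketch below. This is an alternative that the PR chose not to adopt (non-integer values are left to fail at kernel launch); the helper name `validate_block_params` is hypothetical.

```python
import operator


def validate_block_params(block_params):
    """Return (m1, m2, m3) as ints, rejecting floats and values < 1.

    Hypothetical helper; not part of the PR.
    """
    if len(block_params) != 3:
        raise ValueError("block_params must have length 3: (m1, m2, m3)")
    # operator.index accepts integer-like objects (e.g. NumPy integers)
    # and raises TypeError for float, str, etc.
    params = tuple(operator.index(p) for p in block_params)
    if any(p < 1 for p in params):
        raise ValueError("(m1, m2, m3) in block_params must be >= 1")
    return params


print(validate_block_params((8, 8, 2)))  # (8, 8, 2)
```

Unlike `map(int, block_params)`, which would silently truncate 2.5 to 2, operator.index refuses lossy conversions, so a bad value surfaces before any kernel is launched.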

Contributor

@gigony gigony left a comment

Looks good to me besides John's comments and a minor question.

    # m2 must also be a power of two
    m2 = 2**math.floor(math.log2(m2))
    if padded_size % m2 != 0:
        raise RuntimeError("error in setting default m2")

    # should be <= 64. texture size must be a multiple of m3
    # should be <= 64. image size must be a multiple of m3
Contributor

Do we need a statement for checking the value is <=64 somewhere?

Contributor Author

I think that was an outdated/stray comment and will just remove it. The check should be relative to "block_size" as in the check shortly below that:

    if m3 > padded_size // block_size:
        raise ValueError("m3 too large. must be <= arr.shape[1] // 32")

I will update that string to use block_size instead of 32 in case we change block_size in the future.
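
A sketch of what that updated check might look like, with the message built from block_size rather than a hard-coded 32. The function wrapper, its name `check_m3`, and the exact message wording are assumptions for illustration; only the comparison itself comes from the snippet above.

```python
def check_m3(m3, padded_size, block_size=32):
    """Raise if m3 exceeds padded_size // block_size (hypothetical helper)."""
    limit = padded_size // block_size
    if m3 > limit:
        raise ValueError(
            f"m3 too large. must be <= padded_size // {block_size} "
            f"(= {limit}), got {m3}"
        )


check_m3(2, padded_size=1024)  # passes silently, since 2 <= 1024 // 32
```

Formatting the limit into the message keeps the error accurate if block_size is changed later.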

Contributor Author

grlee77 commented Sep 1, 2022

Thanks for reviewing, I think I have addressed the comments.

@jakirkham jakirkham changed the base branch from branch-22.10 to branch-22.08 September 1, 2022 15:58
@jakirkham jakirkham requested a review from a team as a code owner September 1, 2022 15:58
In case of user-provided block_params, automatically pad the shape to an
appropriate multiple.
@raydouglass raydouglass merged commit 0b4f061 into rapidsai:branch-22.08 Sep 1, 2022
Successfully merging this pull request may close these issues.

[BUG] distance_transform_edt gives bad output for 2D images > 1024 in size on either axis
4 participants