Euclidean distance transform: fix bad choice of block parameters #393
Conversation
Thanks Greg! 🙏
Had a couple of comments below.
def _lcm(a, b):
    return abs(b * (a // math.gcd(a, b)))
NumPy also has an lcm implementation, FWIW.
Okay, I didn't know that. This way is not quite as fast as math.lcm for scalar a, b, but it is still faster than numpy.lcm in that case.
@functools.lru_cache()
def lcm(*args):
Curious what the motivation is for caching here
When using math.lcm directly in Python 3.9+ it is much faster (~150 ns vs. 2.8 µs for this fallback), so the cache was just to avoid that overhead prior to kernel launch. I think in practice it is not very important and can remove it if you prefer.
I figured the (m1, m2, m3) block parameters would usually be the same across calls. This function is not used at all when block_params is left at the default of None, though.
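One hedged sketch of what a cached, variadic lcm helper along these lines could look like; using functools.reduce to fold the pairwise LCM is an assumption for illustration, not necessarily the actual implementation:

```python
import functools
import math

@functools.lru_cache()
def lcm(*args):
    # fold the pairwise LCM over all arguments; the cache means repeated
    # (m1, m2, m3) tuples skip recomputation before kernel launch
    return functools.reduce(lambda a, b: abs(b * (a // math.gcd(a, b))), args)

print(lcm(4, 6, 10))  # 60
```

Note that lru_cache requires the arguments to be hashable, which holds here since the block parameters are plain integers.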
m1, m2, m3 = block_params
if any(p < 1 for p in block_params):
    raise ValueError("(m1, m2, m3) in blockparams must be >= 1")
m1, m2, m3 = map(int, block_params)
Are these allowed to be floats or something? Should we do a type check?
No, they definitely should be integers. I am not sure why I had that explicit map call. I can just remove it.
For non-integer m1, m2, m3 the user will get an error on kernel launch, as CUDA will not accept non-integer block/grid sizes. I think that is fine. In general the recommendation is to use the default block_params=None and the automated m1, m2, m3 that get chosen in that case.
Looks good to me besides John's comments and a minor question.
# m2 must also be a power of two
m2 = 2**math.floor(math.log2(m2))
if padded_size % m2 != 0:
    raise RuntimeError("error in setting default m2")

-# should be <= 64. texture size must be a multiple of m3
+# should be <= 64. image size must be a multiple of m3
Do we need a statement for checking the value is <= 64 somewhere?
I think that was an outdated/stray comment and will just remove it. The check should be relative to block_size, as in the check shortly below that:

if m3 > padded_size // block_size:
    raise ValueError("m3 too large. must be <= arr.shape[1] // 32")

I will update that string to use block_size instead of 32 in case we change block_size in the future.
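A rough, standalone illustration of the two default-parameter checks being discussed; the block_size and padded_size values here are made-up stand-ins for the real variables, not taken from the implementation:

```python
import math

block_size = 32    # stand-in for the implementation's block size
padded_size = 256  # stand-in for the padded image width

# round a candidate m2 down to the nearest power of two
m2 = 2 ** math.floor(math.log2(24))  # 24 -> 16
if padded_size % m2 != 0:
    raise RuntimeError("error in setting default m2")

# m3 must satisfy m3 <= padded_size // block_size
m3 = 4
if m3 > padded_size // block_size:
    raise ValueError(f"m3 too large. must be <= arr.shape[1] // {block_size}")
```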
Thanks for reviewing, I think I have addressed the comments.
In case of user-provided block_params, automatically pad the shape to an appropriate multiple.
closes #392
This PR fixes #392 and also makes it more friendly for use with user-provided block_params. In general, most users should not be providing that argument, but it can be used to compare different settings for performance optimization. In the case of user-provided block_params, the implementation now automatically pads the shape to an appropriate least common multiple of the warp_size and the m1, m2 and m3 block parameters.

More extensive unit tests over a range of image sizes and block_params settings are now implemented.
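The padding behaviour described above can be sketched as follows; warp_size=32 and the helper name pad_size are assumptions for illustration (and math.lcm requires Python 3.9+):

```python
import math

def pad_size(size, warp_size, m1, m2, m3):
    # pad `size` up to the next multiple of lcm(warp_size, m1, m2, m3)
    multiple = math.lcm(warp_size, m1, m2, m3)
    return math.ceil(size / multiple) * multiple

print(pad_size(250, 32, 1, 4, 2))  # 256: next multiple of lcm(32, 1, 4, 2) = 32
```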