Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

naturalsize np.int32 multiplication overflow #217

Closed
Toprak2 opened this issue Nov 5, 2024 · 3 comments · Fixed by #218
Closed

naturalsize np.int32 multiplication overflow #217

Toprak2 opened this issue Nov 5, 2024 · 3 comments · Fixed by #218
Labels
bug Something isn't working

Comments

@Toprak2
Copy link
Contributor

Toprak2 commented Nov 5, 2024

What did you do?

I was using the torchio library, which relies on humanize to return the memory size of image arrays. The images I processed had dimensions 512x512x166, with each pixel being a 32-bit (4-byte) integer.

What did you expect to happen?

torchio calls the naturalsize function with the occupied memory size in bytes and sets binary=True. Manually calculating the expected value:
512×512×166×4÷(1024×1024)=166 MiB

So, I expected the function to return approximately 166 MiB.

What actually happened?

Instead, the returned value was -2 MiB, accompanied by this warning:

RuntimeWarning: overflow encountered in scalar multiply
ret: str = format % ((base * bytes_ / unit)) + s

This overflow occurs because the input to the function was of type np.int32 instead of Python's native int. Since np.int32 has a maximum value of 2^31−1, the multiplication of base and bytes_ results in an overflow.

Steps to Reproduce

import humanize
import numpy as np

print(humanize.naturalsize(512*512*166*4, binary=True))  
# Expected: 166.0 MiB
# Works as expected with Python’s built-in int type

print(humanize.naturalsize(np.int32(512*512*166*4), binary=True))
# Returns: -2.0 MiB
# RuntimeWarning: overflow encountered in scalar multiply ret: str = format % ((base * bytes_ / unit)) + s

Proposed Solutions

  1. Change the Order of Operations
    Adjusting the order of operations can avoid overflow. In the current line:
ret: str = format % ((base * bytes_ / unit)) + s

when bytes_ is np.int32, multiplying base and bytes_ produces an np.int32 result, which overflows before it’s divided by unit. By dividing either base or bytes_ by unit before the multiplication, each sub-operation remains a float:

ret: str = format % ((base * (bytes_ / unit))) + s
# or
ret: str = format % ((base / unit * bytes_)) + s
  1. Convert Input to Float
    Alternatively, cast value to float without checking its type. Currently, the casting applies only if value is a string:
# Current approach
if isinstance(value, str):
    bytes_ = float(value)
else:
    bytes_ = value

Updating it to cast all inputs to float could resolve the issue:

bytes_ = float(value)

I haven’t created a pull request since I’m unsure if you’d prefer developers to ensure input compatibility or handle this within the function.
Environment

OS: Windows 11
Python: 3.12.1
Humanize: 4.11.0
Numpy: 1.26.3
@hugovk hugovk added the bug Something isn't working label Nov 6, 2024
@hugovk
Copy link
Member

hugovk commented Nov 6, 2024

Thank you for the well explained report!

As you suggest, we could say that np.int32 inputs are not supported and the caller should make sure they're using an int, float or str, as documented, but the suggested fixes appear quite straightforward. I'll reserve the right to revert if it causes problems in the future :)

Let's go for option 1, because option 2 fails this test:

    def test_naturalsize(test_args: list[int] | list[int | bool], expected: str) -> None:
>       assert humanize.naturalsize(*test_args) == expected
E       AssertionError: assert '1000.0 ZB' == '1.0 YB'
E
E         - 1.0 YB
E         + 1000.0 ZB

Would you like to open a PR?

@hugovk hugovk changed the title naturalsize np.int32 Multiplication Overflow naturalsize np.int32 multiplication overflow Nov 6, 2024
@Toprak2
Copy link
Contributor Author

Toprak2 commented Nov 6, 2024

Thank you for reviewing the issue quickly.

I’ll proceed with implementing option 1 and will create a PR soon. If there are any additional tests you’d like me to add, please let me know.

@hugovk
Copy link
Member

hugovk commented Nov 6, 2024

Yeah, I was considering if we should add a test using NumPy, but np.int32 isn't really a supported input, and installing NumPy can be tricky during the Python pre-release phase when there's no wheel, so maybe not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants