Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: numpy 2.0 dividing np.uint8 by python int leads to OverflowError #25639

Closed
MatthewFlamm opened this issue Jan 19, 2024 · 10 comments · Fixed by #25662
Closed

BUG: numpy 2.0 dividing np.uint8 by python int leads to OverflowError #25639

MatthewFlamm opened this issue Jan 19, 2024 · 10 comments · Fixed by #25662

Comments

@MatthewFlamm
Copy link

MatthewFlamm commented Jan 19, 2024

Describe the issue:

When preparing for upgrading to numpy 2.0, I'm seeing differences in handling of division of numpy integer types with python integer types.

Reproduce the code example:

import numpy as np

a = np.uint8(15)

# results in a np.float64 in both versions (not shown explicitly in code)
print(a / 100)

# Gives the following error only in numpy 2.0
#   OverflowError: Python integer 256 out of bounds for uint8
print(a / 256)

Error message:

OverflowError: Python integer 256 out of bounds for uint8

Python and NumPy Versions:

2.0.0.dev0+git20240113.d2f60ff
3.10.13 (main, Jan 10 2024, 19:45:45) [GCC 9.4.0]

Runtime Environment:

[{'numpy_version': '2.0.0.dev0+git20240113.d2f60ff',
  'python': '3.10.13 (main, Jan 10 2024, 19:45:45) [GCC 9.4.0]',
  'uname': uname_result(system='Linux', node='codespaces-c4dd1c', release='6.2.0-1018-azure', version='#18~22.04.1-Ubuntu SMP Tue Nov 21 19:25:02 UTC 2023', machine='x86_64')},
 {'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
                      'found': ['SSSE3',
                                'SSE41',
                                'POPCNT',
                                'SSE42',
                                'AVX',
                                'F16C',
                                'FMA3',
                                'AVX2'],
                      'not_found': ['AVX512F',
                                    'AVX512CD',
                                    'AVX512_KNL',
                                    'AVX512_KNM',
                                    'AVX512_SKX',
                                    'AVX512_CLX',
                                    'AVX512_CNL',
                                    'AVX512_ICL']}},
 {'architecture': 'Zen',
  'filepath': '/workspaces/dotfiles/numpy_2/lib/python3.10/site-packages/numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so',
  'internal_api': 'openblas',
  'num_threads': 2,
  'prefix': 'libopenblas',
  'threading_layer': 'pthreads',
  'user_api': 'blas',
  'version': '0.3.23.dev'}]

Context for the issue:

Is this a planned change for numpy 2.0? I could not find any indication in the release notes or migration guide. If so, would the recommended workaround be to first cast a, in the code example, to float64 or a uint type that can handle larger values? Or perhaps convert the python int into a suitable numpy integer type?

@MatthewFlamm
Copy link
Author

In more testing, the same difference in version behavior occurs for the // operator and/or with signed numpy integer types.

@ngoldbaum
Copy link
Member

This is happening because we enabled NEP 50 behavior by default, see NEP 50 for more details: https://numpy.org/neps/nep-0050-scalar-promotion.html.

Going forward, we will no longer do value-based promotion for scalars. If you instead would like it to silently truncate, you could use a 0-d array. If you want neither, you need to make sure that both dtypes are big enough such that no overflow would happen.

@seberg
Copy link
Member

seberg commented Jan 19, 2024

That said: it would be nice to make it work for the first example (when the result is a float). That would require a careful promoter I guess.

@MatthewFlamm
Copy link
Author

Going forward, we will no longer do value-based promotion for scalars. If you instead would like it to silently truncate, you could use a 0-d array.

It isn't that obvious to me that there should be any value-based promotion needed here. I might expect this behavior using //, but a division involving two integers results in a float in Python. This is conspicuously missing in https://numpy.org/neps/nep-0050-scalar-promotion.html. Only addition and multiplication are discussed. Those are different than division in Python due to the change in type. For now since I haven't thought deeply about this, I do not feel strongly one way or the other, but I don't see any evidence to expect this is an intended effect.

Either way, I cannot find this change clearly called out in the release notes or migration guide. I suggest that this overall change, including all operator behavior, needs a much larger description there.

@seberg
Copy link
Member

seberg commented Jan 22, 2024

This is conspicuously missing in https://numpy.org/neps/nep-0050-scalar-promotion.html. Only addition and multiplication are discussed

It does say that this applies to operations "like" addition and multiplication, I can understand it is maybe confusing, but listing all affected functions also doesn't make sense (since basically everything is affected). Do you have a suggestion?

As to // it could work if the divisor is positive (or the first integer is signed), it would need quite special handling though, since I doubt we want to make to change int8 // int64 -> int8 even if that would work always.

So, while it would be great to generally just use floats for operations that are undefined on integers, I am unsure if I want to prioritize //. We do that for comparisons, but comparisons are very clear about always returning bools.

@MatthewFlamm
Copy link
Author

MatthewFlamm commented Jan 22, 2024

I am unsure if I want to prioritize //.

This makes sense to me since the operation is performed with two integers and outputs a new integer.

It does say that this applies to operations "like" addition and multiplication, I can understand it is maybe confusing, but listing all affected functions also doesn't make sense (since basically everything is affected). Do you have a suggestion?

After reading again, the line you might have quoted in the NEP helps (quoted below), although adding another example to np.multiply that isn't related to * would help more. I would suggest np.divide based on this discussion.

Note that the examples apply also to operations like multiplication, addition, comparisons, and their corresponding functions like np.multiply.

@MatthewFlamm
Copy link
Author

MatthewFlamm commented Jan 22, 2024

When I read the first figure in the NEP, I might expect this flow of promotion

np.uint8(128) / 256 -> np.uint8(128) / np.uint16(256) -> np.float64(0.5)

This works when being explicit about the type, i.e. np.uint8(128) / np.uint16(256), but not when using the python int.

Another observation is that one of the goals is that:

NumPy scalars and 0-D arrays should behave consistently with their N-D counterparts.

In my testing this is not true for / between np.uint(8) and np.array([8], dtype=np.uint8) in numpy 2.0. That is:

# a np.ndarray[np.float64] in both v1 and v2
a = np.array([15], dtype=np.uint8) / 256

# OverflowError only in v2
a = np.uint8(15) / 256

@seberg
Copy link
Member

seberg commented Jan 22, 2024

Aha, thanks for noticing that! That isn't intentional of course. Seems at least the divide ufunc (which arrays use for the / operator) works, so np.divide(np.uint8(15), 256) already goes to float64 as described (because division apparently was always defined special enough).

(There should be a few other ufuncs similar to division, although most of them are probably unary, and for unary ones there is a hack in place that should them work, IIRC.)

So we need to align the scalar division. Not quite sure how easy that is (the scalar code is confusing), although in the worst case we probably can make it punt to the ufunc (which is however much slower).

@seberg
Copy link
Member

seberg commented Jan 22, 2024

It's annoying. This works, by making things slow (it also makes float64(5) // 10**100 slow (i.e. using the full ufunc), but only when the integer doesn't fit int64. Likely that is just the solution:

diff --git a/numpy/_core/src/umath/scalarmath.c.src b/numpy/_core/src/umath/scalarmath.c.src
index 5b60ebff16..3cf551cb51 100644
--- a/numpy/_core/src/umath/scalarmath.c.src
+++ b/numpy/_core/src/umath/scalarmath.c.src
@@ -1291,6 +1291,18 @@ static PyObject *
              */
 #if defined(IS_longdouble) || defined(IS_clongdouble)
             Py_RETURN_NOTIMPLEMENTED;
+#endif
+        case CONVERT_PYSCALAR:
+            /*
+             * Python float, int, complex.  For true-divide, we fall through
+             * to the ufunc (we reach here for integers mainly).
+             * TODO: This is slow, but makes `uint(8) / 10` work.
+             */
+#ifndef IS_true_divide
+            if (@NAME@_setitem(other, (char *)&other_val, NULL) < 0) {
+                return NULL;
+            }
+            break;
 #endif
         case PROMOTION_REQUIRED:
             /*
@@ -1302,11 +1314,6 @@ static PyObject *
              *       correctly.  (e.g. `uint8 * int8` cannot warn).
              */
             return PyGenericArrType_Type.tp_as_number->nb_@oper@(a,b);
-        case CONVERT_PYSCALAR:
-            if (@NAME@_setitem(other, (char *)&other_val, NULL) < 0) {
-                return NULL;
-            }
-            break;
         default:
             assert(0);  /* error was checked already, impossible to reach */
             return NULL;

@ngoldbaum
Copy link
Member

ngoldbaum commented Jan 25, 2024

There is a PR to add a section on this to the migration guide, if you would like to comment, see #25683. Thanks for the prompting to add it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants