Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-121115: Skip __index__ in PyLong_AsNativeBytes by default #121118

Merged
merged 4 commits into from
Jun 28, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 11 additions & 4 deletions Doc/c-api/long.rst
Original file line number Diff line number Diff line change
Expand Up @@ -405,14 +405,13 @@ distinguished from a number. Use :c:func:`PyErr_Occurred` to disambiguate.

Passing zero to *n_bytes* will return the size of a buffer that would
be large enough to hold the value. This may be larger than technically
necessary, but not unreasonably so.
necessary, but not unreasonably so. If *n_bytes=0*, *buffer* may be
``NULL``.

.. note::

Passing *n_bytes=0* to this function is not an accurate way to determine
the bit length of a value.

If *n_bytes=0*, *buffer* may be ``NULL``.
the bit length of the value.

To get at the entire Python value of an unknown size, the function can be
called twice: first to determine the buffer size, then to fill it::
Expand Down Expand Up @@ -462,6 +461,7 @@ distinguished from a number. Use :c:func:`PyErr_Occurred` to disambiguate.
.. c:macro:: Py_ASNATIVEBYTES_NATIVE_ENDIAN ``3``
.. c:macro:: Py_ASNATIVEBYTES_UNSIGNED_BUFFER ``4``
.. c:macro:: Py_ASNATIVEBYTES_REJECT_NEGATIVE ``8``
.. c:macro:: Py_ASNATIVEBYTES_ALLOW_INDEX ``16``
============================================= ======

Specifying ``Py_ASNATIVEBYTES_NATIVE_ENDIAN`` will override any other endian
Expand All @@ -483,6 +483,13 @@ distinguished from a number. Use :c:func:`PyErr_Occurred` to disambiguate.
provided there is enough space for at least one sign bit, regardless of
whether ``Py_ASNATIVEBYTES_UNSIGNED_BUFFER`` was specified.

If ``Py_ASNATIVEBYTES_ALLOW_INDEX`` is specified and a non-integer value is
passed, its :meth:`~object.__index__` method will be called first. This may
result in Python code executing and other threads being allowed to run, which
could cause changes to other objects or values in use. When *flags* is
``-1``, this option is not set, and non-integer values will raise
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect it to be set by default.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default behaviour is defined as "most like C", and this is unlike C. Having it enabled by default also risks causing crashes, whereas having it not set only results in a TypeError.

But I initially started expecting it to be set by default too. It only took a bit of thinking about it to convince myself otherwise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "most like C" mean in context of conversion from Python integer to C integer? I expected it to be like most common converters PyLong_AsLong() and PyArg_Parse("i"), but seems I understand it wrong.

__index__ should be called by default, because the purpose of the __index__ method is that integer-like object should be able to substitute a real int in almost all cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "most like C" mean in context of conversion from Python integer to C integer?

It's deliberately vague, but a C cast is not going to trigger other code to run (unlike a C++ cast), and it's going to succeed even if it truncated the value and lost information. There are ways to detect this, and the existing functions force you to detect it, but AsNativeBytes deliberately allows you to bypass those if you know what you're doing.

__index__ should be called by default, because the purpose of the __index__ method is that integer-like object should be able to substitute a real int in almost all cases.

As a low level API, this can go in the "almost" situation. If this goes in, then Victor's PR with higher-level APIs for specific types should start passing this flag to get that behaviour.

As I said above, if you call this wrong and the default is to reenter Python, you'll get a hard crash. If you call this wrong and the default is to raise TypeError, you get a safe exception. We should choose the safe path for the default. (I think we should choose it for the other APIs as well, but am willing to concede different logic for high/low level APIs.)

Copy link
Member

@serhiy-storchaka serhiy-storchaka Jun 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is that users which will use this API for concrete C integer type will miss such detail and produce extensions that only work with int and subclasses, but not with other integer-like objects.

I have no objection if this is the intention.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's the intent. Their users will get a TypeError when it matters and can request a change.

:exc:`TypeError`.

.. note::

With the default *flags* (``-1``, or *UNSIGNED_BUFFER* without
Expand Down
7 changes: 5 additions & 2 deletions Include/cpython/longobject.h
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ PyAPI_FUNC(PyObject*) PyLong_FromUnicodeObject(PyObject *u, int base);
#define Py_ASNATIVEBYTES_NATIVE_ENDIAN 3
#define Py_ASNATIVEBYTES_UNSIGNED_BUFFER 4
#define Py_ASNATIVEBYTES_REJECT_NEGATIVE 8
#define Py_ASNATIVEBYTES_ALLOW_INDEX 16

/* PyLong_AsNativeBytes: Copy the integer value to a native variable.
buffer points to the first byte of the variable.
Expand All @@ -20,8 +21,10 @@ PyAPI_FUNC(PyObject*) PyLong_FromUnicodeObject(PyObject *u, int base);
* 2 - native endian
* 4 - unsigned destination (e.g. don't reject copying 255 into one byte)
* 8 - raise an exception for negative inputs
If flags is -1 (all bits set), native endian is used and value truncation
behaves most like C (allows negative inputs and allow MSB set).
* 16 - call __index__ on non-int types
If flags is -1 (all bits set), native endian is used, value truncation
behaves most like C (allows negative inputs and allow MSB set), and non-int
objects will raise a TypeError.
Big endian mode will write the most significant byte into the address
directly referenced by buffer; little endian will write the least significant
byte into that address.
Expand Down
11 changes: 9 additions & 2 deletions Lib/test/test_capi/test_long.py
Original file line number Diff line number Diff line change
Expand Up @@ -496,8 +496,9 @@ def test_long_asnativebytes(self):
"PyLong_AsNativeBytes(v, <unknown>, 0, -1)")
self.assertEqual(buffer, b"\x5a",
"buffer overwritten when it should not have been")
# Also check via the __index__ path
self.assertEqual(expect, asnativebytes(Index(v), buffer, 0, -1),
# Also check via the __index__ path.
# We pass Py_ASNATIVEBYTES_NATIVE_ENDIAN | ALLOW_INDEX
self.assertEqual(expect, asnativebytes(Index(v), buffer, 0, 3 | 16),
"PyLong_AsNativeBytes(Index(v), <unknown>, 0, -1)")
self.assertEqual(buffer, b"\x5a",
"buffer overwritten when it should not have been")
Expand Down Expand Up @@ -607,6 +608,12 @@ def test_long_asnativebytes(self):
with self.assertRaises(ValueError):
asnativebytes(-1, buffer, 0, 8)

# Ensure omitting Py_ASNATIVEBYTES_ALLOW_INDEX raises on __index__ value
with self.assertRaises(TypeError):
asnativebytes(Index(1), buffer, 0, -1)
with self.assertRaises(TypeError):
asnativebytes(Index(1), buffer, 0, 3)

# Check a few error conditions. These are validated in code, but are
# unspecified in docs, so if we make changes to the implementation, it's
# fine to just update these tests rather than preserve the behaviour.
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
:c:func:`PyLong_AsNativeBytes` no longer uses :meth:`~object.__index__`
methods by default. The ``Py_ASNATIVEBYTES_ALLOW_INDEX`` flag has been added
to allow it.
6 changes: 5 additions & 1 deletion Objects/longobject.c
Original file line number Diff line number Diff line change
Expand Up @@ -1128,13 +1128,17 @@ PyLong_AsNativeBytes(PyObject* vv, void* buffer, Py_ssize_t n, int flags)
if (PyLong_Check(vv)) {
v = (PyLongObject *)vv;
}
else {
else if (flags != -1 && (flags & Py_ASNATIVEBYTES_ALLOW_INDEX)) {
v = (PyLongObject *)_PyNumber_Index(vv);
if (v == NULL) {
return -1;
}
do_decref = 1;
}
else {
PyErr_Format(PyExc_TypeError, "expect int, got %T", vv);
return -1;
}

if ((flags != -1 && (flags & Py_ASNATIVEBYTES_REJECT_NEGATIVE))
&& _PyLong_IsNegative(v)) {
Expand Down
Loading