-
-
Notifications
You must be signed in to change notification settings - Fork 30.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-102471, PEP 757: Add PyLong import and export API #121339
Conversation
See also issue #111415 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just my 2c.
The gmpy2 code, used for benchmarking, can be found in my fork:
https://github.com/skirpichev/gmpy/tree/trying-py-import-export
Objects/longobject.c
Outdated
PyUnstable_Long_Export(PyObject *obj, PyUnstable_LongExport *long_export) | ||
{ | ||
if (!PyLong_Check(obj)) { | ||
PyErr_Format(PyExc_TypeError, "expect int, got %T", obj); | ||
return -1; | ||
} | ||
PyLongObject *self = (PyLongObject*)obj; | ||
|
||
long_export->obj = (PyLongObject*)Py_NewRef(obj); | ||
long_export->negative = _PyLong_IsNegative(self); | ||
long_export->ndigits = _PyLong_DigitCount(self); | ||
if (long_export->ndigits == 0) { | ||
long_export->ndigits = 1; | ||
} | ||
long_export->digits = self->long_value.ob_digit; | ||
return 0; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this mostly give a direct access to the PyLongObject - it almost as fast as using private stuff before.
Old code:
$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=10**2' 'mpz(x)'
1000000 loops, best of 20: 232 nsec per loop
$ python -m timeit -r11 -s 'from gmpy2 import mpz;x=10**100' 'mpz(x)'
500000 loops, best of 11: 500 nsec per loop
$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=10**1000' 'mpz(x)'
100000 loops, best of 20: 2.53 usec per loop
With proposed API:
$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=10**2' 'mpz(x)'
1000000 loops, best of 20: 258 nsec per loop
$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=10**100' 'mpz(x)'
500000 loops, best of 20: 528 nsec per loop
$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=10**1000' 'mpz(x)'
100000 loops, best of 20: 2.56 usec per loop
Objects/longobject.c
Outdated
PyObject* | ||
PyUnstable_Long_Import(int negative, size_t ndigits, Py_digit *digits) | ||
{ | ||
return (PyObject*)_PyLong_FromDigits(negative, ndigits, digits); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this is something I would like to avoid. This requires allocation of a temporary buffer and using memcpy. Can we offer a writable layout to use it's digits in the mpz_export directly?
Benchmarks for old code:
$ python -m timeit -r11 -s 'from gmpy2 import mpz;x=mpz(10**2)' 'int(x)'
2000000 loops, best of 11: 111 nsec per loop
$ python -m timeit -r11 -s 'from gmpy2 import mpz;x=mpz(10**100)' 'int(x)'
500000 loops, best of 11: 475 nsec per loop
$ python -m timeit -r11 -s 'from gmpy2 import mpz;x=mpz(10**1000)' 'int(x)'
100000 loops, best of 11: 2.39 usec per loop
With new API:
$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=mpz(10**2)' 'int(x)'
2000000 loops, best of 20: 111 nsec per loop
$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=mpz(10**100)' 'int(x)'
500000 loops, best of 20: 578 nsec per loop
$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=mpz(10**1000)' 'int(x)'
100000 loops, best of 20: 2.53 usec per loop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This requires allocation of a temporary buffer and using memcpy.
Right, PyLongObject has to manage its own memory.
Can we offer a writable layout to use it's digits in the mpz_export directly?
That sounds strange from the Python point of view and make the internals "less opaque". I would prefer to leak less implementation details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, PyLongObject has to manage its own memory.
I'm not trying to change that. More complete proposal: vstinner#4
gmpy2 patch: https://github.com/skirpichev/gmpy/tree/trying-py-import-export-v2
New benchmarks:
$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=mpz(10**2)' 'int(x)'
2000000 loops, best of 20: 111 nsec per loop
$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=mpz(10**100)' 'int(x)'
500000 loops, best of 20: 509 nsec per loop
$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=mpz(10**1000)' 'int(x)'
100000 loops, best of 20: 2.44 usec per loop
I would prefer to leak less implementation details.
I don't think this leak anything. It doesn't leak memory management details. PyLong_Import
will just do allocation memory. Writting digits will be job for mpz_export, as before.
Without this, it seems - there are noticeable performance regression for integers of intermediate range. Something up to 20% vs 7% on my branch.
Edit: currently, proposed PyUnstable_Long_ReleaseImport()
match PyUnstable_Long_ReleaseExport()
. Perhaps, it could be one function (say, PyUnstable_Long_ReleaseDigitArray()), but I unsure - maybe it puts some constraints on internals of the PyLongObject.
CC @tornaria, as Sage people might be interested in this feature. |
CC @oscarbenjamin, you may want this for python-flint |
I updated my PR:
|
Misc/NEWS.d/next/C API/2024-07-03-17-26-53.gh-issue-102471.XpmKYk.rst
Outdated
Show resolved
Hide resolved
Absolutely. Currently python-flint uses a hex-string as an intermediate format when converting between large int and |
@skirpichev: I added a PyLongWriter API similar to what @encukou proposed. Example: PyLongObject *
_PyLong_FromDigits(int negative, Py_ssize_t digit_count, digit *digits)
{
PyLongWriter *writer = PyLongWriter_Create();
if (writer == NULL) {
return NULL;
}
if (negative) {
PyLongWriter_SetSign(writer, -1);
}
Py_digit *writer_digits = PyLongWriter_AllocDigits(writer, digit_count);
if (writer_digits == NULL) {
goto error;
}
memcpy(writer_digits, digits, digit_count * sizeof(digit));
return (PyLongObject*)PyLongWriter_Finish(writer);
error:
PyLongWriter_Discard(writer);
return NULL;
} The >>> import _testcapi; _testcapi.pylong_import(0, [100, 0, 0]) is 100
True |
I mark the PR as a draft until we agree on the API. |
Yes, that looks better and should fix speed regression. I'll try to benchmark that, perhaps tomorrow. But cost is 5 (!) public functions and one new struct, additionally to |
I updated the PR to remove the |
My concern is to avoid the problem capi-workgroup/api-evolution#36 : avoid exposing |
@skirpichev: Would it be useful to add a |
@pitrou @encukou @picnixz: The Steering Council approved the PEP 757. The PR is now ready for your final review (this time, for real) :-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Couple of places where I think the docs can give a little more info, and one assert that I think ought to be an exception, but otherwise feel free to ignore the rest of my comments. Consider me approved-with-minor-changes.
digit *digits = PyMem_Malloc((size_t)ndigits * sizeof(digit)); | ||
if (digits == NULL) { | ||
return PyErr_NoMemory(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Side note - reading this made me realise that I've been assuming that PyMem_Malloc
sets MemoryError
itself on failure. I see now (from the code) that it doesn't. The docs aren't as explicit as they could be. Perhaps we should update something?
Topic for another discussion I guess, unless someone sees this and gets inspired.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docs say that we should check for NULL but agreed that we could update them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe bug is somewhere else? Newly added functions usually are explicitly say if they set an exception on error.
int overflow; | ||
#if SIZEOF_LONG == 8 | ||
long value = PyLong_AsLongAndOverflow(obj, &overflow); | ||
#else | ||
// Windows has 32-bit long, so use 64-bit long long instead | ||
long long value = PyLong_AsLongLongAndOverflow(obj, &overflow); | ||
#endif | ||
Py_BUILD_ASSERT(sizeof(value) == sizeof(int64_t)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ought to be equally fast and portable without any platform assumptions.
int64_t value;
int r = PyLong_AsNativeBytes(obj, &value, sizeof(value), Py_ASNATIVEBYTES_NATIVE_ENDIAN);
assert (!(r < 0 && PyErr_Occurred());
if (r <= sizeof(value)) {
// fast path success
export_long->value = value;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For an extra fast path, replace sizeof(value)
with the size of a compact value (IIRC, sizeof(Py_ssize_t)-1
is the best you'll get) and then it's not going to do the bit conversion for values close to 2**64
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(And yes, I believe we should replace the implementation of PyLong_AsCompilerSpecificType()
functions with PyLong_AsNativeBytes(o, &retval, sizeof(retval), ...)
calls, apart from laziness and the guaranteed compatibility. Can't break it if you never change it!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is performance critical and we spend significant time to find the most efficient function calls, it's about a few nanoseconds: https://peps.python.org/pep-0757/#export-pylong-export-with-gmpy2
PyLong_AsLongAndOverflow() and PyLong_AsLongLongAndOverflow() are simple and efficient. PyLong_AsNativeBytes() is less complex if I recall correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another reason for using this API was "no-error" contract. Maybe we can specify that in docs as a CPython implementation detail? IIUC, we are free to change such things in new releases without deprecation period.
Note also, that gmpy2 benchmarks measure not just CPython side, but the whole conversion path (int->gmpy2.mpz in this case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They're no more simple or efficient than AsNativeBytes
, and with the options I specified and the fact we know the value is PyLong_Check
ed then there are no exceptions possible from AsNativeBytes
(you'll notice in my alternate code I kept the assert
for exceptions).
If we're actually going for greatest speed, then we should be checking for a compact value here anyway, and exposing values too big for a CV but still within 64-bits as digits, rather than converting them at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I said, we tested several variants. Here some benchmarks:
#121339 (comment)
FYI, "current implementation" essentially refers to d70a121 (inlined version of PyLong_AsInt64).
@zooba, do you think we need to repeat tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran a benchmark: PyLong_AsNativeBytes()
is 1.12x slower overall (up to 1.27x slower in the worst case).
$ python -m pyperf compare_to ref.json as_native.json --table
+----------------+---------+-----------------------+
| Benchmark | ref | as_native |
+================+=========+=======================+
| 1<<38 | 51.2 ns | 64.9 ns: 1.27x slower |
+----------------+---------+-----------------------+
| 1<<300 | 108 ns | 127 ns: 1.17x slower |
+----------------+---------+-----------------------+
| 1<<3000 | 513 ns | 533 ns: 1.04x slower |
+----------------+---------+-----------------------+
| Geometric mean | (ref) | 1.12x slower |
+----------------+---------+-----------------------+
Benchmark hidden because not significant (1): 1<<7
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is so close to the PyLong implementation, that I think we should use _PyLong_IsCompact()
+ _PyLong_CompactValue()
to be sure that it matches the specification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should use _PyLong_IsCompact() + _PyLong_CompactValue()
Such variant was tested in the referenced above comment.
Co-authored-by: Steve Dower <steve.dower@microsoft.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For users, I think it'd be great if they know (in the docs) whether something can be NULL or not, especially since we have a similar interface for unicode objects.
digit *digits = PyMem_Malloc((size_t)ndigits * sizeof(digit)); | ||
if (digits == NULL) { | ||
return PyErr_NoMemory(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docs say that we should check for NULL but agreed that we could update them.
digit *digits = PyMem_Malloc((size_t)ndigits * sizeof(digit)); | ||
if (digits == NULL) { | ||
return PyErr_NoMemory(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe bug is somewhere else? Newly added functions usually are explicitly say if they set an exception on error.
int overflow; | ||
#if SIZEOF_LONG == 8 | ||
long value = PyLong_AsLongAndOverflow(obj, &overflow); | ||
#else | ||
// Windows has 32-bit long, so use 64-bit long long instead | ||
long long value = PyLong_AsLongLongAndOverflow(obj, &overflow); | ||
#endif | ||
Py_BUILD_ASSERT(sizeof(value) == sizeof(int64_t)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another reason for using this API was "no-error" contract. Maybe we can specify that in docs as a CPython implementation detail? IIUC, we are free to change such things in new releases without deprecation period.
Note also, that gmpy2 benchmarks measure not just CPython side, but the whole conversion path (int->gmpy2.mpz in this case).
|
||
If *export_long->digits* is not ``NULL``, :c:func:`PyLong_FreeExport` must | ||
be called when the export is no longer needed. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.. impl-detail:: | |
This function always succeeds if *obj* is a Python :class:`int` object | |
or a subclass. | |
Lets see if we can restore this in a that way. It might be helpful for e.g. Sage, which doesn't support PyPy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to not add this note. It was controversial during PEP 757 design.
@serhiy-storchaka @encukou: What do you think? Would you be ok to declare that the PyLong_Export() function cannot fail if the argument is a Python int?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was controversial during PEP 757 design.
It was proposed unconditionally, not as CPython's implementation detail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with it, as an implementation detail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@picnixz: I addressed your review. Please review the updated PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Final nits and LGTM. Since I'm on mobile I don't know whether the spaces are correct or not so please check it manually.
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Thanks for all reviews. I think that the change is now ready to be merged, I addressed all comments. I plan to merge the PR Friday. |
I would appreciate this, but... Isn't now an approval from other core developer is required for this type of pr's? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update also Doc/data/refcounts.dat
.
int overflow; | ||
#if SIZEOF_LONG == 8 | ||
long value = PyLong_AsLongAndOverflow(obj, &overflow); | ||
#else | ||
// Windows has 32-bit long, so use 64-bit long long instead | ||
long long value = PyLong_AsLongLongAndOverflow(obj, &overflow); | ||
#endif | ||
Py_BUILD_ASSERT(sizeof(value) == sizeof(int64_t)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is so close to the PyLong implementation, that I think we should use _PyLong_IsCompact()
+ _PyLong_CompactValue()
to be sure that it matches the specification.
if (export_long->ndigits == 0) { | ||
export_long->ndigits = 1; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's to make the API easier to use: the consumer doesn't have to both with ndigits==0 special case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact it's assumed (see e.g. _PyLong_New
), that the digit array always has at least one digit (and it's initialized for 0 too). And also, if you pass ndigits==0 to the mpz_import() as count
parameter, then it just calls malloc(0).
But I think it's safe just drop this check. This condition is unreachible here, as 0
handled in the !overflow
case.
if (export_long->ndigits == 0) { | |
export_long->ndigits = 1; | |
} |
PyLongWriter* | ||
PyLongWriter_Create(int negative, Py_ssize_t ndigits, void **digits) | ||
{ | ||
if (ndigits <= 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not allow 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gpshead asked to reject this case: https://discuss.python.org/t/pep-757-c-api-to-import-export-python-integers/63895/74
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, he asked to document that case (i.e. when ndigits==0).
We don't think it's a good idea to overbloat docs with this edge case (different functions should be used for small integers). PEP has a dedicated section to discuss import for small integers, suggesting different functions. (In fact, the whole API is about import/export for big integers.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, though I have a few suggestions.
Also, please test that PyLong_FreeExport
can be used after PyLong_Export
fills in the compact value
. AFAICS, the tests now skip it if they can.
|
||
If *export_long->digits* is not ``NULL``, :c:func:`PyLong_FreeExport` must | ||
be called when the export is no longer needed. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with it, as an implementation detail.
@serhiy-storchaka and @encukou: I addressed your reviews. Please review the updated PR.
I added
Done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Please do not forget to edit the commit message before merging.
Ok, I merged the PR. Big thanks to everyone who reviewed this PR which got 272 comments and 54 commits! Special thanks to @skirpichev who wrote a big part of this work. |
Add PyLong_Export() and PyLong_Import() functions and PyLong_LAYOUT structure.
📚 Documentation preview 📚: https://cpython-previews--121339.org.readthedocs.build/