gh-111140: Adds PyLong_AsNativeBytes and PyLong_FromNative[Unsigned]Bytes functions #114886

zooba · 2024-02-02T02:31:38Z

Issue: C API: Consider adding public PyLong_AsByteArray() and PyLong_FromByteArray() functions #111140

zooba · 2024-02-02T02:36:32Z

Sharing work so far for feedback/concerns.

I like the unsigned/signed semantics (basically, using "unsigned" suppresses an error where you need one more bit to be a sign bit), and I dislike the mess needed to support byteswapping for non-native endian on arbitrary length buffers. But it's written now and seems to work, so I guess it can stay.

The "FromByteArray" prototypes are a lie right now, but they're coming in next :) Probably need a "WithOptions" variant as well, though I have a feeling the implementation can't be any better than _PyLong_FromByteArray so it should be simple.

Include/cpython/longobject.h

bedevere-bot · 2024-02-07T21:13:34Z

🤖 New build scheduled with the buildbot fleet by @zooba for commit 6e1a89a 🤖

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

zooba · 2024-02-08T21:24:34Z

The only unhappy buildbot seems to be the s390x Fedora one, and that looks unrelated to this (if int conversions in posixmodule were incorrect, I'd expect to see other buildbots fail as well).

Any further feedback?

Doc/c-api/long.rst

Doc/whatsnew/3.13.rst

Objects/longobject.c

zooba · 2024-02-08T22:57:38Z

Thanks @erlend-aasland! Great feedback, and even found a bug (this is why we get reviews!)

I left one open question about doc wording, as I know you're more broadly involved in that than I am.

encukou · 2024-02-09T08:47:36Z

Include/cpython/longobject.h

+PyAPI_FUNC(int) PyLong_AsNativeBytes(PyObject* v, void* buffer, size_t n_bytes,
+    int endianness);


Shouldn't this return size_t rather than int?

Negative values are required, so we'd have to do Py_ssize_t and not size_t. The cast is just as annoying either way unless we make n_bytes also signed, which is then inconsistent with other APIs (but probably less bad than making it accept int).

We might need some agreed upon guidelines for choosing types for these kinds of purposes. int is a very common return type, and IMHO that makes things easier for people trying to call this from non-C languages, but they probably have no choice but to support Py_ssize_t as a return type too and so it really wouldn't make a huge difference.

I pity the poor CPU that has to convert a 32 billion bit number 🙃

I really need to focus on not mixing up signed/unsigned when I write a comment...

Yes, Py_ssize_t is one of the types that users need to support.
I'd much prefer using Py_ssize_t for sizes. These are arbitrary-sized integers, after all; limited by available memory.

Objects/longobject.c

encukou · 2024-02-09T09:14:47Z

How should I use PyLong_AsNativeBytes if I have an existing signed long I want to fill -- say, as part of a struct?
It seems that I'd need to allocate a buffer with one more byte, and then check that the extra byte is zero. And then memcpy to the destination.

zooba · 2024-02-09T10:28:53Z

How should I use PyLong_AsNativeBytes if I have an existing signed long I want to fill -- say, as part of a struct?

n = PyLong_AsNativeBytes(v, &st.long_value, sizeof(long), -1);
if (n < 0) {
  // exception raised
} else if (n <= sizeof(long)) {
  // success
} else {
  // value overflows, but st.long_value is set correctly, just truncated
}

What you do in the overflow case is up to you. Maybe you can overallocate? Maybe you just have to fail. But you don't have to overallocate to fit precisely into a signed variable.

For an unsigned variable, you may require one extra bit to store the sign bit, e.g. 2**256-1 requires 257 bits (33 bytes) of storage, and so does -(2**256-1). The extra byte will contain all sign bits in both cases, but in the former we don't need it when we know the value is going to be treated as unsigned (the MSB of the 32-byte value is set in both cases).

There really isn't a consistent way for us to handle this, lacking knowledge of what the value actually is. The current (private) function just fails on all negative values if you ask for unsigned, but that doesn't actually touch this issue at all - it just excludes a whole lot of otherwise valid conversions (e.g. +/-(2**255-1) and smaller always fits in 32 bytes with no loss of data).

So the suggestion there is if you know the magnitude of the value fits into your destination, or you don't care, then ignore positive returns from the function. But again, it only affects positive integers that require precisely the number of bits you've provided when treated as unsigned, and so treating as signed requires one extra.

encukou

Thanks.
IMO it's hard to implement a conversion to native unsigned integer on top of this, but, that could be solved by adding PyLong_AsNativeUnsignedBytes. No need to block this PR.

I trust your judgment on int/*size_t as well; no need for that to block the PR.

zooba · 2024-02-12T17:28:07Z

IMO it's hard to implement a conversion to native unsigned integer on top of this, but, that could be solved by adding PyLong_AsNativeUnsignedBytes. No need to block this PR.

I briefly implemented that. It took the size of the provided buffer, allocated a new one with one extra byte, did exactly the same thing, and copied the result into the original buffer or failed if it wasn't going to fit. It seems wasteful, especially compared to the caller doing a static allocation of a larger buffer themselves and calling this API, but we can't really improve on it that much - calculating that the resulting number is going to be in that one-bit worth of grey area isn't obvious. (And IMHO not offering it makes it more likely that people will implement it efficiently for themselves, rather than complaining that we did it inefficiently.)

The thing that isn't trivial is to reject negative values entirely, which someone may want to do. But that's more likely to be an application requirement than an integration/adapter requirement, so I'd expect they'll have a comparison to zero in their own language before they convert anyway, which means they'll go to a native signed integer rather than directly to unsigned, or they'll do the comparison using a Python comparison.

zooba · 2024-02-12T20:14:13Z

I trust your judgment on int/*size_t as well; no need for that to block the PR.

@encukou It didn't block, I changed them all to Py_ssize_t 😄 My only concern was that comparing the result to sizeof(...) shouldn't cause a warning, but then I realised that the warning would also occur for (signed) int, so it doesn't matter.

Doc/c-api/long.rst

gpshead · 2024-02-12T23:17:45Z

additional follow on example and test improvements in #115380

gpshead · 2024-02-12T23:26:16Z

Looking this API over the main thing I don't like is... unrelated to the API. It appears to be doing the right thing twos compliment wise given that it is returning data in whole bytes with the sign extension filling an entire additional byte when necessary as \xff or \x00.

I do find the need for an additional byte if someone is dealing exclusively with a value they already know to be non-negative yet large enough to occupy the high bit "annoying"... but this API isn't primarily for use by people wanting to merely fill a native type like that. They can add a range check of their own and be happy. ie: The prior understanding of the underlying value as discussed above.

Thus my PR making the example not use it to fill in a mere int to not encourage that pattern given native APIs for many types like that exist.

Granted we could change to encouraging this API everywhere in the future instead of people forgetting to if ((value = PyLong_AsLong(x)) == -1 && PyErr_Occurred()) { ... } being a super common flaw in code. This API maybe more readily encourages checking the return value.

Objects/longobject.c

…gned]Bytes functions (pythonGH-114886)

pythongh-111140: Adds PyLong_AsByteArray functions

c86821e

bedevere-app bot mentioned this pull request Feb 2, 2024

C API: Consider adding public PyLong_AsByteArray() and PyLong_FromByteArray() functions #111140

Closed

Statics

cd6754a

encukou reviewed Feb 2, 2024

View reviewed changes

Include/cpython/longobject.h Outdated Show resolved Hide resolved

zooba added 3 commits February 2, 2024 13:38

Better options handling and tests

bf86f85

Rework into PyLong_CopyBits

22f3d14

Merge remote-tracking branch 'upstream/main' into pythongh-111140

0dc0bcc

zooba changed the title ~~gh-111140: Adds PyLong_AsByteArray functions~~ gh-111140: Adds PyLong_CopyBits function Feb 2, 2024

zooba added 3 commits February 2, 2024 15:37

Not static

c30e65a

Add motivating example to tests

9661b65

Add FromBits functions

79d4942

zooba changed the title ~~gh-111140: Adds PyLong_CopyBits function~~ gh-111140: Adds PyLong_CopyBits and PyLong_FromBits functions Feb 2, 2024

zooba added 4 commits February 2, 2024 17:00

A test and a const

bc9e48c

Space

b4761be

Merge remote-tracking branch 'upstream/main' into pythongh-111140

043947f

Docs

761db5c

zooba marked this pull request as ready for review February 5, 2024 17:35

zooba requested review from corona10, rhettinger, berkerpeksag and erlend-aasland as code owners February 5, 2024 17:35

bedevere-app bot added the awaiting core review label Feb 5, 2024

rhettinger removed their request for review February 5, 2024 20:57

zooba mentioned this pull request Feb 5, 2024

Add PyLong_CopyBits and _FromBits functions capi-workgroup/decisions#12

Closed

5 tasks

zooba added 2 commits February 7, 2024 20:19

Merge remote-tracking branch 'upstream/main' into pythongh-111140

22c2a64

Rename to PyLong_AsNativeBytes

6e1a89a

zooba added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Feb 7, 2024

bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Feb 7, 2024

erlend-aasland reviewed Feb 8, 2024

View reviewed changes

Review feedback

79dd452

zooba added 2 commits February 8, 2024 22:59

_resolve_endianness

9693afe

More reliable test

376e358

encukou reviewed Feb 9, 2024

View reviewed changes

Doc tweaks

1f2cf3a

erlend-aasland approved these changes Feb 12, 2024

View reviewed changes

bedevere-app bot added awaiting merge and removed awaiting core review labels Feb 12, 2024

encukou approved these changes Feb 12, 2024

View reviewed changes

Merge remote-tracking branch 'upstream/main' into pythongh-111140

7f4431d

zooba force-pushed the gh-111140 branch from 66ece6c to 7f4431d Compare February 12, 2024 17:29

python deleted a comment from cpython-cla-bot bot Feb 12, 2024

zooba added 2 commits February 12, 2024 17:30

Merge remote-tracking branch 'upstream/main' into pythongh-111140

2649d94

Switch to Py_ssize_t consistently instead of int and size_t

39962de

zooba merged commit 7861dfd into python:main Feb 12, 2024
35 checks passed

bedevere-app bot removed the awaiting merge label Feb 12, 2024

zooba deleted the gh-111140 branch February 12, 2024 20:14

scoder reviewed Feb 12, 2024

View reviewed changes

Doc/c-api/long.rst Show resolved Hide resolved

AlexWaygood reviewed Feb 13, 2024

View reviewed changes

Objects/longobject.c Show resolved Hide resolved

fsc-eriker pushed a commit to fsc-eriker/cpython that referenced this pull request Feb 14, 2024

pythongh-111140: Adds PyLong_AsNativeBytes and PyLong_FromNative[Unsi…

52e065a

…gned]Bytes functions (pythonGH-114886)

davidhewitt mentioned this pull request Feb 17, 2024

Deprecate / remove _Py internal APIs from pyo3-ffi PyO3/pyo3#3762

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

gh-111140: Adds PyLong_AsNativeBytes and PyLong_FromNative[Unsigned]Bytes functions #114886

gh-111140: Adds PyLong_AsNativeBytes and PyLong_FromNative[Unsigned]Bytes functions #114886

zooba commented Feb 2, 2024 •

edited by bedevere-app bot

Loading

zooba commented Feb 2, 2024

bedevere-bot commented Feb 7, 2024

zooba commented Feb 8, 2024

zooba commented Feb 8, 2024

encukou Feb 9, 2024

zooba Feb 9, 2024

encukou Feb 12, 2024

encukou commented Feb 9, 2024 •

edited

Loading

zooba commented Feb 9, 2024

encukou left a comment

zooba commented Feb 12, 2024

zooba commented Feb 12, 2024

gpshead commented Feb 12, 2024

gpshead commented Feb 12, 2024

		PyAPI_FUNC(int) PyLong_AsNativeBytes(PyObject* v, void* buffer, size_t n_bytes,
		int endianness);

gh-111140: Adds PyLong_AsNativeBytes and PyLong_FromNative[Unsigned]Bytes functions #114886

gh-111140: Adds PyLong_AsNativeBytes and PyLong_FromNative[Unsigned]Bytes functions #114886

Conversation

zooba commented Feb 2, 2024 • edited by bedevere-app bot Loading

zooba commented Feb 2, 2024

bedevere-bot commented Feb 7, 2024

zooba commented Feb 8, 2024

zooba commented Feb 8, 2024

encukou Feb 9, 2024

Choose a reason for hiding this comment

zooba Feb 9, 2024

Choose a reason for hiding this comment

encukou Feb 12, 2024

Choose a reason for hiding this comment

encukou commented Feb 9, 2024 • edited Loading

zooba commented Feb 9, 2024

encukou left a comment

Choose a reason for hiding this comment

zooba commented Feb 12, 2024

zooba commented Feb 12, 2024

gpshead commented Feb 12, 2024

gpshead commented Feb 12, 2024

zooba commented Feb 2, 2024 •

edited by bedevere-app bot

Loading

encukou commented Feb 9, 2024 •

edited

Loading