Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vote on PEP 757 – C API to import-export Python integers #45

Open
5 tasks
vstinner opened this issue Oct 7, 2024 · 18 comments
Open
5 tasks

Vote on PEP 757 – C API to import-export Python integers #45

vstinner opened this issue Oct 7, 2024 · 18 comments
Labels

Comments

@vstinner
Copy link

vstinner commented Oct 7, 2024

Vote on PEP 757 – C API to import-export Python integers.

Since I co-authored PEP 757, I abstain from voting on my own PEP :-)

I closed the previous issue since it had a long history and the API changed multiple times.

@encukou
Copy link
Collaborator

encukou commented Oct 7, 2024

I'd prefer being able to deprecate PyLong_Export -- there's no version directly in struct PyLongExport, so if things change too much we'll need to add new API. And that's OK, but we will want deprecation/removal in that case.
Otherwise, +1 from me!

@skirpichev
Copy link

skirpichev commented Oct 8, 2024

@encukou,

  1. How long API should live? E.g. mpz_import/export API was in the GMP since 2002.
  2. PyLongLayout has no version field too. Are you ok with that?
  3. Which changes do you want to address? I think we should keep something in mind, even highly hypothetical, if going to add a version field.

I address d.p.o comment also here:

This function always succeeds if obj is a Python int object or a subclass. [...] This means we can never deprecate this function. We might want to do that if, for example, we introduce an internal format that doesn’t fit in the struct PyLongExport parameters. I suggest removing the line.

Probably, this concern also could be resolved with a new version field, right? We did our best to preserve "always successful" constraint and I would like to keep it. (And it requires much less code adaptation on the gmpy2 side.)

Edit: lets look closer on the structure

typedef struct PyLongExport {
    int64_t value;
    uint8_t negative;
    Py_ssize_t ndigits;
    const void *digits;
    // Member used internally, must not be used for other purpose.
    Py_uintptr_t _reserved;
} PyLongExport;

Most public fields (negative/ndigits/digits) are required, else the whole API doesn't make sense. So, we left with one field: value - that part might be versioned. How about this (which is closer to the original proposal by @pitrou):

typedef struct PyLongExport2 {
    uint8_t is_compact;
    union {
        struct {
            uint8_t version;
            int64_t value;
        } compact_value;
        struct {
            uint8_t negative;
            Py_ssize_t ndigits;
            const void *digits;
            // _reserved could be actually here.
        } digit_array;
    };
    // Member used internally, must not be used for other purpose.
    Py_uintptr_t _reserved;
} PyLongExport2;

Or we could use just reserve enough room for compact_value:

struct {
    int64_t value[2];
} compact_value;

@encukou
Copy link
Collaborator

encukou commented Oct 8, 2024

How long API should live?

As long as we can support it, but it shouldn't be forced to live forever :)

mpz_import/export API was in the GMP since 2002.

Sure, but AFAIK there's no expectation of that being a zero-copy mechanism, and it is in a library that's expected to provide conversions between arbitrary formats. It seems that adding a mpz_export2 with an extra argument would be too much trouble for GMP.

But, not being able to fail with MemoryError, PyLong_Export practically can't switch away from being a zero-copy mechanism. The API prevents future CPython (and any other implementations that use the C API) from using alternative int representations internally.

PyLongLayout has no version field too. Are you ok with that?

Oh, right -- a version field would be better on PyLongLayout; then PyLongExport wouldn't need one.

No version anywhere means that if/when PyLongLayout/PyLongExport no longer fit, we'll need to deprecate PyLong_Export itself. That's an OK plan, if we allow ourselves to deprecate it (which means raising exceptions under -Werror).

Which changes do you want to address?

Examples are 2's complement, or a 64-bit compact value with a separate sign bit.

But the ability to deprecate functions is mainly useful for unknown unknowns. I can't see decades ahead.

@skirpichev
Copy link

But, not being able to fail with MemoryError, PyLong_Export practically can't switch away from being a zero-copy mechanism. The API prevents future CPython (and any other implementations that use the C API) from using alternative int representations internally.

Not all. Only if this representation can't fit in 64-bit value. But if this field is too small - we can make it versioned or just bigger from start.

Oh, right -- a version field would be better on PyLongLayout; then PyLongExport wouldn't need one.

Then I'm lost :( The PyLongLayout has nothing to say about representation for "compact" values, that fit in some machine-sized integer.

It's used to describe "asymptotic" representation of CPython integers as "digit array". (Maybe PEP isn't clear enough here?) Which else field we might want to add to this structure?

Examples are 2's complement, or a 64-bit compact value with a separate sign bit.

I'm assuming you meant 2's complement for "compact" values, not for representation of big integers. Then both variants could be solved as described above:

  1. versioned structure compact_value in the union
  2. or just more big structure, say 128-bit

If this complication is required - I'm slightly biased to (1).

@vstinner
Copy link
Author

vstinner commented Oct 8, 2024

versioned structure compact_value in the union

I dislike anonymous union, it caused us some troubles in the past.

or just more big structure, say 128-bit

I dislike having two numbers, you have to define the order. I don't think that it's needed. I suggest to revisit the API once Python will need 128-bit. For now, it only needs 60-bit.

@vstinner
Copy link
Author

vstinner commented Oct 8, 2024

Examples are 2's complement, or a 64-bit compact value with a separate sign bit.

We can make a small change to PEP 757:

  • Change PyLongExport.value type from signed int64_t to unsigned uint64_t.
  • Use also PyLongExport.negative when digits is NULL.

It doubles the capacity of PyLongExport.value.

@pitrou
Copy link

pitrou commented Oct 8, 2024

We can make a small change to PEP 757:

What would be the point though? How likely is it that CPython switches to an uncommon "uint64 + sign bit" representation? @markshannon what do you think?

@vstinner
Copy link
Author

vstinner commented Oct 8, 2024

What would be the point though?

It may simplify the code using this API: https://peps.python.org/pep-0757/#export-pylong-export-with-gmpy2

Currently, this code is needed:

            mpz_import(z, 1, -1, sizeof(int64_t), 0, 0, &value);
            if (value < 0) {
                mpz_t tmp;
                mpz_init(tmp);
                mpz_ui_pow_ui(tmp, 2, 64);
                mpz_sub(z, z, tmp);
                mpz_clear(tmp);
            }

I expect that it can be simplified to the following code which should be more efficient:

            mpz_import(z, 1, -1, sizeof(uint64_t), 0, 0, &value);
            if (long_export.negative) {
                mpz_neg(z, z);
            }

Note: currently, only Windows uses this code path, since Windows only has 32-bit C long for mpz_set_si() fast-path.

@encukou
Copy link
Collaborator

encukou commented Oct 8, 2024

I should never have mentioned any examples.

A PyLong_Export that cannot fail (for ints), cannot be deprecated, and cannot switch to copying data, limits the internal int representation (in CPython and other implementations of the C API) to only what's exportable through PyLongExport.
IMO, that is unacceptable.
We need a way to evolve in the future, not guess now what the future might be.

@vstinner
Copy link
Author

vstinner commented Oct 8, 2024

I'd prefer being able to deprecate PyLong_Export -- there's no version directly in struct PyLongExport, so if things change too much we'll need to add new API. And that's OK, but we will want deprecation/removal in that case.

I wrote python/peps#4025 to allow PyLong_Export() to fail.

@pitrou
Copy link

pitrou commented Oct 8, 2024

It may simplify the code using this API: https://peps.python.org/pep-0757/#export-pylong-export-with-gmpy2

The goal of this API is not to simplify consumer code, it's to make the export as efficient as possible.

To make the export as efficient as possible, it must be as close as possible to CPython's internal representation.

CPython is much more likely to adopt a int64 representation internally, than a "uint64 + sign bit" representation. The latter simply doesn't make sense, because it can't fit in a machine register.

Thus the export API should favor int64 over "uint64 + sign bit".

@skirpichev
Copy link

I dislike anonymous union, it caused us some troubles in the past.

Was it before or after adding the C11 compiler requirement? With union, the structure will be also smaller (both in the current variant & with version field).

I dislike having two numbers, you have to define the order.

Native. Something like that does make sense for me with hypothetical 128-bit machine int's. Using e.g. uint64_t + sign doesn't look as a possible future for CPython integers.

I don't think that it's needed.

What about version field?

@vstinner
Copy link
Author

vstinner commented Oct 8, 2024

Was it before or after adding the C11 compiler requirement?

In short, the Python C API only requires C99, not C11: see the long discussion.

@vstinner
Copy link
Author

vstinner commented Oct 8, 2024

@encukou:

I'd prefer being able to deprecate PyLong_Export -- there's no version directly in struct PyLongExport, so if things change too much we'll need to add new API. And that's OK

I don't think that we should add a version member. I don't see how users would be supposed to use it. If things change, I would prefer a new API, as @encukou wrote.

@encukou
Copy link
Collaborator

encukou commented Oct 8, 2024

The union doesn't need to be anonymous, let's just name it.

How about:

typedef struct PyLongExport2 {
    // type also serves as a version field: on unknown type,
    // users must fall back to e.g. PyLong_AsNativeBytes.
    // (And it's big because the next field is
    // pointer-aligned, there's a lot of padding anyway.)
    uint32_t type;  
    union {
        int64_t compact_value;
        struct {
            Py_ssize_t ndigits;
            const void *digits;
            uint8_t negative;

            // Member used internally, must not be used for other purpose.
            Py_uintptr_t _reserved;
        } digit_array;
    } data;
} PyLongExport2;

(edit: the union should be last so the struct can grow)

Maybe this discussion should be on Discourse, not here...

@vstinner
Copy link
Author

vstinner commented Oct 8, 2024

The union doesn't need to be anonymous, let's just name it.

Which problem are you trying to address with this different structure?

@encukou
Copy link
Collaborator

encukou commented Oct 8, 2024

Which problem are you trying to address with this different structure?

Compared to @skirpichev's earlier proposal:

  • Avoiding an anonymous union
  • Adding a “free” version field (by specifying what to do with an unknown union tag)
  • Making unused padding bits potentially useful in the future

@skirpichev
Copy link

The union doesn't need to be anonymous, let's just name it.

I'm not sure if anonymous unions are banned from public API. (The long discussion seems to be unfinished.)

But the rest clearly does make sense. This even more closer to the original proposal by Antoine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants