More efficient implementation of integers #147

markshannon · 2021-12-02T16:01:38Z

markshannon
Dec 2, 2021
Collaborator

This is a reboot of #42. Any comments mentioning range will be deleted!

This might be a useful precursor to #138, which will also need distinct paths for small and larger integers.

Currently, int is implemented as an array of digits, where a digit is (approx) half a machine word.

typedef struct {
    OBJECT_HEADER;
    Py_ssize_t size_and_sign;
    digit digits[1];
} PyLongObject;

I would suggest changing this to:

typedef struct {
    OBJECT_HEADER;
    Py_ssize_t tagged_bits;
} PyIntObject;

typedef struct {
    PyIntObject header;
    digit digits[1];
} PyLongObject;

Values that, if represented as PyLongObject, would have -2 <= size <= 2 would be represented as a PyIntObject with obj->tagged_bits = value << 2.
Other values would be represented as a PyLongObject as they are now, except that the size would be stored as (size<<1)+1.
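A minimal sketch of the tagging scheme described above, assuming a two's-complement machine. The helper names (tag_small, tag_size, and so on) are hypothetical and not part of CPython; only the two encodings, value << 2 and (size << 1) + 1, come from the proposal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers; the names are illustrative, not CPython API. */

/* Small values are stored as value << 2, so the low two bits are 0. */
static inline intptr_t tag_small(intptr_t value)
{
    /* The value must survive multiplication by 4 without overflow. */
    assert(value >= INTPTR_MIN / 4 && value <= INTPTR_MAX / 4);
    return value * 4;
}

/* Large values store their (signed) size as (size << 1) + 1, so bit 0 is 1. */
static inline intptr_t tag_size(intptr_t size)
{
    assert(size >= INTPTR_MIN / 2 && size <= (INTPTR_MAX - 1) / 2);
    return size * 2 + 1;
}

/* Bit 0 distinguishes the two representations. */
static inline bool is_small(intptr_t tagged_bits)
{
    return (tagged_bits & 1) == 0;
}

static inline intptr_t untag_small(intptr_t tagged_bits)
{
    assert(is_small(tagged_bits));
    return tagged_bits / 4;        /* exact: the low two bits are zero */
}

static inline intptr_t untag_size(intptr_t tagged_bits)
{
    assert(!is_small(tagged_bits));
    return (tagged_bits - 1) / 2;  /* exact: tagged_bits - 1 is even */
}
```

With this layout a single bit test is enough to pick the fast path: if is_small() holds for both operands, the arithmetic can be done directly on the tagged words.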

Intrinsic functions for add-with-overflow will help with performance. GCC has those. Windows has them for unsigned values only (I don't know why).

Even without intrinsics, the overflow checks can be implemented in only a few instructions, so should still be faster than what we have now.
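As a rough illustration of both options, the sketch below uses the GCC/Clang builtin when it is available and otherwise falls back to a bounds test done before the addition. The function name checked_add is hypothetical, and the builtin requires a reasonably recent GCC (5 or later) or Clang.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true if a + b overflows.
   When it returns false, *result holds a + b. */
static inline bool checked_add(intptr_t a, intptr_t b, intptr_t *result)
{
    assert(result != NULL);
#if defined(__GNUC__) || defined(__clang__)
    /* GCC/Clang builtin: typically an add followed by a jump-on-overflow. */
    return __builtin_add_overflow(a, b, result);
#else
    /* Portable fallback: compare against the limits first, so the
       addition itself can never overflow. */
    if ((b > 0 && a > INTPTR_MAX - b) || (b < 0 && a < INTPTR_MIN - b)) {
        return true;
    }
    *result = a + b;
    return false;
#endif
}
```

The fallback costs two comparisons and a branch per addition, which is still far less work than the generic digit-array path.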

markshannon · 2021-12-02T16:03:08Z

markshannon
Dec 2, 2021
Collaborator Author

@pitrou had a link to portable safe overflow checks: portable-snippets

3 replies

lpereira Dec 2, 2021

There are probably more efficient ways to implement some of these checks (e.g. for "add two signed integers", you can do the math as unsigned and check whether the sign bits of the inputs match the result's sign bit), but this is a good start.

For instance, this untested piece of code:

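#include <stdbool.h>

/* Returns true if a + b overflows; *r receives the wrapped result.
   Assumes a 32-bit int. */
bool add_int_overflow(int a, int b, int *r)
{
        unsigned int ua = (unsigned int)a;
        unsigned int ub = (unsigned int)b;
        unsigned int ur = ua + ub;
        *r = (int)ur;
        /* Overflow iff both inputs share a sign and the result's sign differs. */
        return ((ua ^ ur) & (ub ^ ur) & 0x80000000u) != 0;
}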

Leverages the fact that & is commutative and that the pipeline won't be stalled waiting for a few different branches like in the portable code from that header. Generated code is similar in both MSVC and GCC, although GCC and Clang will have the builtins that will use things like JA, JO, or SETO on x86 and whatnot.

pitrou Dec 2, 2021

Instead of reasoning on C source code, it's better to watch the compiled assembly, since the compiler may apply various optimizations.
(also, IIRC psnip tries to use custom intrinsics rather than compatibility code by default)

lpereira Dec 2, 2021

I know. I looked at the code and saw how it's generated by MSVC on Godbolt. It's not as efficient as the snippet above (which is branchless/makes better use of a CPU's pipeline).

For GCC/Clang we can of course use the intrinsics. psnip also defines them for compilers that do not support them (technically undefined behavior, since it defines identifiers beginning with two underscores), so if you include the header, you can just use the GCC/Clang intrinsics and it should work.

Best thing to do here, IMO, is to send a pull request to psnip implementing better overflow checking, and use that.

gvanrossum · 2021-12-02T16:06:32Z

gvanrossum
Dec 2, 2021
Maintainer

I’ve been trying to get @lpereira interested.

1 reply

lpereira Dec 2, 2021

Yeah, this is something I can take a look at after finishing the thing I'm working on.

markshannon · 2022-01-27T19:49:55Z

markshannon
Jan 27, 2022
Collaborator Author

Any progress on this?

1 reply

lpereira Jan 27, 2022

Yes, I've got quite a few changes in my local tree already.

markshannon · 2022-02-24T14:21:05Z

markshannon
Feb 24, 2022
Collaborator Author

@lpereira did you push a branch to https://github.com/faster-cpython/cpython?

4 replies

markshannon Feb 25, 2022
Collaborator Author

?

gramster Feb 25, 2022
Maintainer

I'm trying to get an update.

lpereira Feb 26, 2022

Opened a draft PR with the changes in the cpython repository: python/cpython#31595

This is a draft because I haven't finished going through every function in longobject.c yet (there are a few of those), and I haven't compiled or tested this, because I need to get everything ported over to the new representation before trying to start the interpreter.

At this point I haven't implemented what @cfbolz mentioned (having multiple implementations for "long+int", "long*int", etc); I don't think that, at this point, this is going to be an issue. This can be improved later if we find that the workloads we care about would be improved with this additional overhead.

I would appreciate feedback.

cfbolz Feb 26, 2022

when we started this scheme, we added the extra functions as a second step (much) later. at first, all our long+int operations converted the int to a long before calling the actual logic. so maybe the work in CPython could be split up in that way as well.

cfbolz · 2022-02-25T12:56:58Z

cfbolz
Feb 25, 2022

just as a comment, this is essentially how pypy does it. two representations of int, one storing a machine word with the value, one being a pointer to a list of digits. we have it a bit simpler in that we can hide the tag bit to distinguish the two cases in the header.

there's a secondary benefit for algorithms that really operate on big integers. even in such programs, operations that mix a huge int and a machine-word sized one are common, so we have specialized implementations for long+int, long*int, etc.
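To make the idea of a specialized long+int path concrete, here is a minimal sketch that adds a single-digit value to a digit array without first converting it to a full long. It handles magnitudes only and assumes 30-bit digits; the function name digits_add_word and the macros are hypothetical, not taken from PyPy or CPython.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t digit;
#define DIGIT_SHIFT 30   /* assumed digit width */
#define DIGIT_MASK  (((digit)1 << DIGIT_SHIFT) - 1)

/* Hypothetical helper: out = a + small, magnitudes only.
   a has n digits, least significant first; out needs room for n + 1.
   Returns the number of digits written to out. */
static size_t digits_add_word(digit *out, const digit *a, size_t n, digit small)
{
    assert(small <= DIGIT_MASK);   /* simplification: small fits in one digit */
    uint64_t carry = small;        /* the small operand enters as the initial carry */
    for (size_t i = 0; i < n; i++) {
        carry += a[i];
        out[i] = (digit)(carry & DIGIT_MASK);
        carry >>= DIGIT_SHIFT;
    }
    if (carry != 0) {
        out[n++] = (digit)carry;   /* the sum grew by one digit */
    }
    return n;
}
```

Compared with converting the small operand to a long first, this avoids a temporary allocation and walks the digit array exactly once.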

0 replies

markshannon · 2022-02-28T16:42:08Z

markshannon
Feb 28, 2022
Collaborator Author

We need a better plan for this. I don't think it's realistic to expect to merge very large changes, and long-lived branches are a pain to maintain.

Once python/cpython#30496 is merged, we should consider a way to break this into manageable chunks.

One possible plan is this:

Add PyLongValue struct, such that PyLongObject is defined as follows:

typedef struct _ {
     uintptr_t size_plus;
     digit ob_digits[1];
} PyLongValue;

typedef struct _ {
    HEADER;
    PyLongValue value;
} PyLongObject;

Move most of the operations in longobject.c into longmath.c, where the operations would (in general) take PyLongValues and return a PyLongObject.
Convert the format of size_plus to this:

     bit 0: 1
     bit 1: sign
     bit 2+: size

size == size_plus >> 2, sign == (size_plus & 2) - 1, size_plus & 1 == 1.
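The size_plus layout can be sketched with a few helpers. The function names are hypothetical; the bit positions and the sign formula follow the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for the size_plus layout:
   bit 0 = 1 (marks a big value), bit 1 = sign, bits 2+ = size. */
static inline uintptr_t pack_size_plus(uintptr_t size, int sign)
{
    assert(sign == 1 || sign == -1);
    assert(size <= (UINTPTR_MAX >> 2));   /* two bits are reserved for tags */
    return (size << 2) | (sign > 0 ? 2u : 0u) | 1u;
}

static inline uintptr_t unpack_size(uintptr_t size_plus)
{
    return size_plus >> 2;
}

/* (size_plus & 2) is 0 or 2, so subtracting 1 yields -1 or +1. */
static inline int unpack_sign(uintptr_t size_plus)
{
    return (int)(size_plus & 2) - 1;
}
```

For example, a negative value with three digits packs to (3 << 2) | 0 | 1 = 13, and a positive one to 15.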

Create a new file intobject.c containing

typedef struct _ {
    HEADER;
    union {
         intptr_t medium;
         PyLongValue big;
    };
} PyIntObject;

With longmath.c in place, it should be relatively painless to implement PyIntObject with a couple of helper functions:
PyIntValue *normalize(PyLongValue *l); and PyLongValue *as_long_value(PyIntObject *, PyLongValue3 *temp)

typedef struct _ {
     uintptr_t size_plus;
     digit ob_digits[3];  // Make sure we have space.
} PyLongValue3;

PyIntObject *int_add(PyIntObject *a, PyIntObject *b) {
    if ((a->medium | b->medium) & 1) {
        PyLongValue3 temp;
        PyLongValue *la = as_long_value(a, &temp);
        PyLongValue *lb = as_long_value(b, &temp);
        return normalize(long_add(la, lb));
    }
    // do fast add...
}

5 replies

pitrou Feb 28, 2022

Why would you have both PyLongObject and PyIntObject? The latter seems sufficient.

markshannon Feb 28, 2022
Collaborator Author

We'd need to add PyIntObject before removing PyLongObject, so there would be some overlap, if we want to avoid giant changes.

lpereira Feb 28, 2022

I've been poking at other objects with multiple representations (like the unicode object), and I'm seeing this pattern of using multiple structs quite a bit. I kinda like this and I'll take a stab at doing this for integers this week.

lpereira Apr 12, 2022

I took a closer look at this suggestion, and I'm not sure about some things:

1. We pay for a single digit even if we're not using it, because it's always part of the union. It's only sizeof(digit), but it adds up, and if we're going through all this trouble, let's get this right from the get go.
2. Still on the topic of the digit array, I'm not sure about the legality of that as a flexible member within the suggested union in PyIntObject. C99 is too vague on this. I'm inclined to say it's fine but I need to look deeper in the standard to be at ease with this construction. However, see next point.
3. I do like having a PyLongValue, but because of (1), I'm not sure if it's the right way to go. Using type-punning, and having the functions in longmath.c take PyLongObjects instead (and asserting that they are, in fact, long objects) seems to me to be a better option, especially if (1) is considered. Something like this:

typedef struct _ {
    HEAD
    uintptr_t value;  /* unsigned to avoid implementation-defined behavior for bitwise ops */
} PyIntObject;

typedef struct _ {
    PyIntObject _base;
    digit digits[1];
} PyLongObject;

4. Minor point related to (3): with the proposed structs, the filename longmath.c doesn't make much sense anymore, as it wouldn't operate with PyLongValues anymore. It should be named something like longobject_bignum.c or something of the sort.
5. A disadvantage of using the type-punning method, however, is that with the "union method", code in longmath.c wouldn't need to worry about reference counts -- especially for all those temporarily "elongated" values. I'm still thinking about this (it's one of the reasons I like PyLongValue).
6. In the last code snippet (depicting a mocked-up int_add()), there's just one temp value, that is reused for both calls to as_long_value(). If I understood correctly, as_long_value() would be as close as possible to a no-op if the PyIntObject was already long, and use the scratch space provided in the temp variable to "elongate" the value. Is this just a typo (i.e. you intended for it to be two distinct temporary values), or is there something I'm missing here?
7. Although I'm positive it'll be fine, the declaration and use of PyLongValue3 seems a bit shady: digits is of type digit[1] in PyLongValue, and of type digit[3] in PyLongValue3. It's not clear, from my understanding of the C99 standard, that both arrays would be compatible to the point that you could cast a PyLongValue3 to point to a PyLongValue. (See 6.2.7.1 and 6.2.7.3.)
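To illustrate the scratch-space question, here is a minimal sketch of "elongating" a machine-word value into caller-provided storage. It assumes 30-bit digits and the size_plus layout from the earlier proposal; the function name elongate is hypothetical, and this is not the actual as_long_value(). Each operand gets its own scratch struct, since writing the second value into the same one would overwrite the first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t digit;
#define DIGIT_SHIFT 30   /* assumed digit width */
#define DIGIT_MASK  (((digit)1 << DIGIT_SHIFT) - 1)

typedef struct {
    uintptr_t size_plus;   /* bit 0: 1, bit 1: sign, bits 2+: size */
    digit ob_digits[3];    /* enough for a 64-bit magnitude in 30-bit digits */
} PyLongValue3;

/* Hypothetical helper: write the digits of value into scratch. */
static void elongate(intptr_t value, PyLongValue3 *scratch)
{
    assert(scratch != NULL);
    /* Negate in unsigned arithmetic so INTPTR_MIN does not overflow. */
    uintptr_t mag = value < 0 ? (uintptr_t)0 - (uintptr_t)value
                              : (uintptr_t)value;
    uintptr_t size = 0;
    while (mag != 0) {
        assert(size < 3);
        scratch->ob_digits[size++] = (digit)(mag & DIGIT_MASK);
        mag >>= DIGIT_SHIFT;
    }
    scratch->size_plus = (size << 2) | (value >= 0 ? 2u : 0u) | 1u;
}
```

A caller adding two medium values would declare two PyLongValue3 variables and elongate one operand into each before calling the generic addition.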

Although none of these points are going to block me, clarifying some of them will help us align expectations. (Or at least explain why the code ended up being the way it'll be once I'm finished.)

gvanrossum Apr 12, 2022
Maintainer

Clearly there's more than one way to skin this particular cat. I had always thought that the main point of the exercise was to avoid the digit array in the common case that the value fits (comfortably) in the size field, so we would need only 24 bytes for a medium-sized integer, rather than 28 (which presumably rounds up to 32). If the previous proposal doesn't do that, I think we need to revise it. (I'm not hopeful that we'll see a significant gain in speed after all is said and done, at least not until we introduce opcode specializations for short-to-medium sized integers.)
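The size arithmetic can be checked with mock structs. This is a sketch under stated assumptions: MockHeader stands in for a default (non-debug) object header made of a reference count and a type pointer, and the struct names are hypothetical rather than the real CPython definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t digit;

/* Assumed header layout: reference count plus type pointer. */
typedef struct {
    intptr_t refcnt;
    void *type;
} MockHeader;

/* Medium integer: the value lives in the size field, no digit array. */
typedef struct {
    MockHeader header;
    intptr_t size_or_value;
} MockMediumInt;

/* Current-style layout: always carries at least one digit. */
typedef struct {
    MockHeader header;
    intptr_t size;
    digit digits[1];
} MockLong;

/* The digit array starts exactly where the medium layout ends. */
static_assert(offsetof(MockLong, digits) == sizeof(MockMediumInt),
              "unexpected padding before the digit array");
```

On a typical 64-bit platform sizeof(MockMediumInt) is 24, while MockLong needs 28 bytes of payload and is padded to 32 by alignment.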

I wouldn't worry too much about the filename, but I have a (mild) objection against something as long and cumbersome as longobject_bignum.c. (Honestly I don't see much reason to split things into multiple files at this point.)

Regarding the C standard, I have a feeling that we already are violating the strictest possible implementation in many other places in our code. And I wouldn't assume the compilers are perverse. C does have some rules about structs with similar items being laid out similarly, and I think we can trust that rule quite a bit -- even in unions.
