Skip to content

Conversation

@devnexen
Copy link
Member

No description provided.

@devnexen devnexen marked this pull request as ready for review November 21, 2024 07:09
@devnexen devnexen requested a review from Girgias as a code owner November 21, 2024 07:09
Copy link
Member

@Girgias Girgias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic makes sense to me, but please address the comment.

ext/gmp/gmp.c Outdated
gmp_temp_t temp_a;

FETCH_GMP_ZVAL(gmpnum, a_arg, temp_a, 1);
long r = mpz_get_si(gmpnum);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please replace the call to zval_get_long(a_arg) below to use the new variable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use mpz_get_si() here in the first place, since it may overflow for large gmpnum. Possibly use mpz_sizeinbase(gmpnum, 2) instead; that might need some fine-tuning of maxbits, though.

ext/gmp/gmp.c Outdated
gmp_temp_t temp_a;

FETCH_GMP_ZVAL(gmpnum, a_arg, temp_a, 1);
long r = mpz_get_si(gmpnum);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use mpz_get_si() here in the first place, since it may overflow for large gmpnum. Possibly use mpz_sizeinbase(gmpnum, 2) instead; that might need some fine-tuning of maxbits, though.

ext/gmp/gmp.c Outdated
Comment on lines 1284 to 1288
#if SIZEOF_SIZE_T == 4
const zend_long maxbits = ULONG_MAX / GMP_NUMB_BITS;
#else
const zend_long maxbits = INT_MAX;
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be moved to the top-level, since we're already using it in gmp_random_bits(), and may want to re-use it elsewhere. (Could also be a macro; doesn't really matter, I think.)

Comment on lines +50 to +54
#if SIZEOF_SIZE_T == 4
#define GMP_ALLOC_MAXBITS (ULONG_MAX / GMP_NUMB_BITS)
#else
#define GMP_ALLOC_MAXBITS INT_MAX
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic to avoid overflow in _mpz_realloc() (libgmp 6.3.0) is:

  if (sizeof (mp_size_t) == sizeof (int))
    {
      if (UNLIKELY (new_alloc > ULONG_MAX / GMP_NUMB_BITS))
	MPZ_OVERFLOW;
    }
  else
    {
      if (UNLIKELY (new_alloc > INT_MAX))
	MPZ_OVERFLOW;
    }

Assuming that mp_size_t is actually size_t, the simplification seems to be correct (we're assuming sizeof(int) == 4 anyway). However, I do not understand how ULONG_MAX fits into the mix. That only works under the assumption that ULONG_MAX == UINT_MAX for 32bit platforms (which is what php-src assumes anyway), but why don't they use UINT_MAX in the first place? I think we should.

if (mpz_sgn(gmpnum) < 0) {
zend_argument_value_error(1, "must be greater than or equal to 0");
(void)gmpnum;
val = zval_get_long(a_arg);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand how zval_get_long() fits into the mix here. At least, its usage makes FETCH_GMP__ZVAL() above superflous, and we wouldn't even need the special casing for Z_TYPE_P(a_arg) == IS_LONG (since this is already catered to by zval_get_long().

What happens if one calls gmp_fact("18446744073709551617")? Wouldn't that raise a deprecation notice ('Implicit conversion from float-string "18446744073709551617" to int loses precision') instead of throwing an ValueError.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if one calls gmp_fact("18446744073709551617")? Wouldn't that raise a deprecation notice ('Implicit conversion from float-string "18446744073709551617" to int loses precision') instead of throwing an ValueError.

Nope I get the exception in this case too. I ll see the rest later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, indeed! That is because the conversion happens in coercive mode. And that also shows that we can't easily drop the seemingly superfluous FETCH_GMP_ZVAL() since only that will report issues regarding strict typing mode (e.g. passing a float). So zval_get_long() is basically correct here; the saturating behavior is fine since we cannot compute the factorial of PHP_INT_MAX anyway.

Still, there is an issue unrelated to this PR

E.g. for gmp_fact(new GMP("18446744073709551617")) zval_get_long() returns 1 which passes the overflow check, and evaluates to GMP(1). That is because the mpz_get_si() in gmp_cast_object() overflows, while I would expect saturating behavior. This is not related to this PR, though, but rather a general issue with ext/gmp. From the mpz_get_si() documentation:

If op fits into a signed long int return the value of op. Otherwise return the least significant part of op, with the same sign as op.

If op is too big to fit in a signed long int, the returned result is probably not very useful. To find out if the value will fit, use the function mpz_fits_slong_p.

So apparently, there is an easy fix for gmp_cast_object(). However, on LLP64 mpir 3.0.0 mpz_fits_slong_p() returns false for 2147483648. ;)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand how zval_get_long() fits into the mix here. At least, its usage makes FETCH_GMP__ZVAL() above superflous, and we wouldn't even need the special casing for Z_TYPE_P(a_arg) == IS_LONG (since this is already catered to by zval_get_long().

What happens if one calls gmp_fact("18446744073709551617")? Wouldn't that raise a deprecation notice ('Implicit conversion from float-string "18446744073709551617" to int loses precision') instead of throwing an ValueError.

The zval_get_long() is to convert the GMP object back to an int...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The zval_get_long() is to convert the GMP object back to an int...

Yeah, I figured that much, but I wondered about values outside of [ZEND_LONG_MIN, ZEND_LONG_MAX]. However, I found out. :)

@Girgias
Copy link
Member

Girgias commented Nov 22, 2024

On my machine (64bit) the largest factorial I can attempt to compute is 471778098879 the boundary where it fails is 471778098880 = 64 * 7371532795. So we could compare against this.

@Girgias
Copy link
Member

Girgias commented Nov 22, 2024

Well that still OOMs on 48GB of RAM so, that's not the upper boundary then.

@cmb69
Copy link
Member

cmb69 commented Nov 24, 2024

Roughly 4294967296! might fit into 16GB. 16GB == 2^37 bits is still quite excessive. Especially if we revert/do not introduce these bounds check for stable versions, in my opinion, we could go with far less, maybe even only 2^24 bits (in which case !2790877 would be the maximum).

@Girgias
Copy link
Member

Girgias commented Nov 25, 2024

Yeah, I was just trying to see what was theoretically possible. :)

I am happy to restrict it to !2790877 :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants