Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove unneessary and possibly dangerious overshadowing of mp_bitcnt_t #557

Merged
merged 1 commit into from
Apr 23, 2019

Conversation

embray
Copy link
Contributor

@embray embray commented Apr 23, 2019

AFAICT this line can be traced back to f988d3e where it wasn't added for any immediately obvious reason (perhaps it was a temporary workaround to some other problem).

It was later updated to read #define mp_limb_t ulong which is dangerous since ulong is defined in flint.h as #define ulong mp_limb_t, where it's not necessarily the case that mp_limb_t is the same underlying type as mp_bitcnt_t depending on how GMP was configured.

@wbhart
Copy link
Collaborator

wbhart commented Apr 23, 2019

Thanks.

@wbhart wbhart merged commit 18a7acb into flintlib:trunk Apr 23, 2019
@embray embray deleted the mp_bitcnt_t branch April 23, 2019 15:37
@embray
Copy link
Contributor Author

embray commented Apr 23, 2019

Thank you!

@wbhart
Copy link
Collaborator

wbhart commented Apr 25, 2019

Apparently this broke the Windows build as @thofma points out.

@embray was this causing actual problems for you? Could we add it back in just for Windows? (I realise that is likely where you are working.)

@embray
Copy link
Contributor Author

embray commented Apr 25, 2019

Could you be more specific? Just blithely overshadowing typedefs with macros is generally not a workable solution.

@embray was this causing actual problems for you? Could we add it back in just for Windows?

Yes, of course, or else I wouldn't have noticed ;) I'm working in Cygwin which depending on what you mean by "Windows" is not necessarily the same thing. The question is why anyone thinks this is needed on Windows because chances are it's an X-Y problem.

@wbhart
Copy link
Collaborator

wbhart commented Apr 25, 2019

The problem as I understand it is, GMP defines this type to be 32 bits on Windows, whereas MPIR defines it to be 64 bits. Flint was originally written for use with MPIR and later we decided to support GMP. But this type is used literally thousands of times throughout Flint, with the assumption (as per MPIR) that the size is the same as an mp_limb_t (or ulong in Flint).

I see that just redefining it as we were is a pain in the neck for you, as any code that #includes gmp.h and flint.h will probably end up with the Flint definition of this type which might not be intended.

But it's a tricky job to fix the problem, not the least source of which is that this type is used in the prototype of functions defined in .h files in Flint which are included from outside Flint.

I guess the only proper solution is to replace every instance of mp_bitcnt_t in Flint (including code_conventions.txt, all .c and .h files and the docs) with flint_mp_bitcnt_t and hope that doesn't break anyone's code outside of Flint.

@wbhart
Copy link
Collaborator

wbhart commented Apr 25, 2019

"GMP defines this type to be 32 bits on Windows" => "GMP defines this type to be 32 bits on 64 bit Windows"

@wbhart
Copy link
Collaborator

wbhart commented Apr 25, 2019

@embray I presume you are doing a 64 bit Cygwin port? As that seems to be the only way you could hit this problem.

@tthsqe12
Copy link
Contributor

tthsqe12 commented Apr 26, 2019

I use mp_bitcnt_t with the assumption that it is unsigned and can hold the bit count of any fmpz_t. no other assumptions (I hope).

@wbhart
Copy link
Collaborator

wbhart commented Apr 26, 2019

If you don't use very large mpn's that is a safe assumption.

@tthsqe12
Copy link
Contributor

tthsqe12 commented Apr 26, 2019

Are you saying that the bit count of an fmpz could overflow a mp_bitcnt_t? Because of mpir? I based my assumptions on https://gmplib.org/manual/Nomenclature-and-Types.html

@wbhart
Copy link
Collaborator

wbhart commented Apr 26, 2019

No, I'm saying the bit count of an mpn could overflow an mp_bitcnt_t on Windows 64 bit. That is not anything to do with MPIR.

@tthsqe12
Copy link
Contributor

Surely that is gmp's fault then, right?

@wbhart
Copy link
Collaborator

wbhart commented Apr 26, 2019

It's a design decision I think.

@tthsqe12
Copy link
Contributor

This design decision is a real head-scratcher. I hope that I have used mp_bitcnt_t with only the following assumptions:
(1) it is unsigned
(2) unsigned char can be safely casted to mp_bitcnt_t
(3) mp_bitcnt_t can be safely casted to ulong (mp_bitcnt_t)
(4) mp_bitcnt_t can hold the bit count of any fmpz
(5) of course the pointer types unsigned char * and ulong * cannot be safely casted to mp_bitcnt_t *.

The crux is point (4). If one squares 2^(2^32-2) on win64 with gmp, what does it in principal do? If it throws, then all is well. If it doesn't then I would consider this a bug in gmp.

@embray
Copy link
Contributor Author

embray commented Apr 29, 2019

All of this is presupposing that it makes sense to #define mp_bitcnt_t and I'm still missing why anyone is doing that in the first place given that flint requires GMP/MPIR and includes gmp.h in its headers. Why not just use whatever typedef gmp.h gives you--I don't understand why it needs to be overshadowed.

@wbhart
Copy link
Collaborator

wbhart commented Apr 30, 2019

Flint originally used only MPIR not GMP. We therefore could always assume an mp_bitcnt_t was the same size as an mp_limb_t. Not so when we decided to also support GMP.

GMP uses two types to represent counts, an mp_size_t and an mp_bitcnt_t. The first is for counting a number of limbs and is always an int. On most machines, that's 32 bits. So you might think GMP supports integers up to 2^31 limbs. But it doesn't. Because of the limitations of mp_bitcnt_t, for representing bit counts, it can't represent integers that big on a 32 bit machine.

On a 64 bit machine, it is supposed to support much larger integers. But because of its historical poor support for Windows, this is not true on a 64 bit Windows box. Hence MPIR exists, to fix some of these problems. And that is why Flint always used MPIR, and why we have these historical problems.

The shadowing was a historical patch designed to save effort in fixing something that should never have been a problem in the first place.

@wbhart
Copy link
Collaborator

wbhart commented Apr 30, 2019

@embray Perhaps you are not aware of the history of this, but I was involved in the fork of GMP (called MPIR), way back. I was for many years the maintainer of MPIR (and still am some kind of defacto maintainer, though certainly not an active one nowadays).

@wbhart
Copy link
Collaborator

wbhart commented Apr 30, 2019

Note that the GMP manual states that GMP will fix this problem in the future, i.e. an mp_bitcnt_t will be 64 bits on a 64 bit Windows box. There are still other significant issues on 64 bit Windows that they haven't fixed yet, which MPIR fixes, but that will be progress.

@embray
Copy link
Contributor Author

embray commented May 2, 2019

I'm aware of some of the history (I have submitted fixes to MPIR you know) though some of these details were helpful.

But that still doesn't explain, just on a nitty-gritty level, why that define of mp_bitcnt_t was needed. I mean even if using MPIR doesn't it still define that in its headers? And why would removing it break the windows build, as @thofma believes?

@isuruf
Copy link
Member

isuruf commented May 2, 2019

@embray, see #557 (comment). FLINT assumes that mp_bitcnt_t and mp_limb_t have the same size. Removing this define leads to overflow issues as you can see at https://ci.appveyor.com/project/isuruf/flint2/builds/24127096/job/rexkrtnuu73ybsgx

@tthsqe12
Copy link
Contributor

tthsqe12 commented May 2, 2019

After a quick search of the docs, I could only find one function that uses mp_bitcnt_t *, and that function is flint_mpn_remove_2_exp. If this all usages of this function were fixed I would be willing to bet - but still pleasantly surprised - that at least t-euler_phi would pass.

@tthsqe12
Copy link
Contributor

tthsqe12 commented May 2, 2019

Ah, maybe you missed the discussion on the other PR @embray . There are pointer casts between mp_bitcnt_t * and ulong * in flint.
https://github.com/wbhart/flint2/blob/trunk/fmpz_factor/factor_trial_range.c#L59

@isuruf
Copy link
Member

isuruf commented May 2, 2019

If this all usages of this function were fixed I would be willing to bet - but still pleasantly surprised - that at least t-euler_phi would pass.

I can confirm. s/mp_bitcnt_t/flint_mp_bitcnt_t/g and #define flint_mp_bitcnt_t ulong fixes the tests. https://ci.appveyor.com/project/isuruf/flint2/builds/24265459

@tthsqe12
Copy link
Contributor

tthsqe12 commented May 2, 2019

@isuruf I meant the actual solution of getting your hands dirty and fixing functions that use mp_bitcnt_t incorrectly (i.e. with the assumption that it is the same as ulong). If you just replace all mp_bitcnt_t with a new type - which I would rather have called flint_bitcnt_t by the way - then you have created the problem of possibility breaking other code that depends on flint. This would certainly be the easiest way to go.

@embray
Copy link
Contributor Author

embray commented May 3, 2019

@isuruf @tthsqe12 Thank you, that clarifies it. I agree, just a new flint-specific typedef would be best. I would do the same for flint_ulong and flint_slong as well (or something to that effect) but that is a much more pervasive change.

@wbhart
Copy link
Collaborator

wbhart commented Jun 17, 2019

This should be fixed now. Thanks to Daniel Schultz @tthsqe12 all the Windows tests now also pass.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants