Random(...,1,2^80) test fails on SPARC Solaris #2292
Comments
Would it be possible to bisect this issue? Did it only appear in the last couple of weeks, or has it been around for a long time? (I made some recentish changes to the random code.) |
I don't recall seeing it on 4.9. I will bisect this. |
It's still there on 4.9.0, but it does work on 4.8.6. I will try finding the exact commit. It's also not in 4.8.10, so one has to chase the 4.9 branch. |
Could you give me a reasonable starting point on the 4.9 branch to start bisecting from? |
BTW, is that machine a big endian machine by chance? That would give an important hint: it would suggest that some of the kernel random code is not sufficiently endian safe (e.g. the mersenne twister). It then probably wasn't in GAP 4.8 either, but we also changed which random sources we use to be more consistent, which could have exposed the problem. This is just a wild guess, though. |
Yes, the endianness is big, as is the case with SPARCs. |
OK. (Note that SPARC v9 and later is bi-endian, so given the information in this issue, I was not able to tell the endianness.) |
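For anyone who wants to check the byte order of a given build directly, here is a minimal C sketch. It is not taken from the GAP kernel; the function names `is_big_endian` and `low_half_index` are hypothetical helpers made up for illustration. The second function shows why 64-bit big-endian systems are affected in their own way: the 32-bit half of a 64-bit word that holds the low bits sits at a different position in memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 on a big-endian target, 0 otherwise.
   It inspects the first byte of a known 32-bit pattern in memory. */
static int is_big_endian(void)
{
    const uint32_t probe = UINT32_C(0x01020304);
    unsigned char bytes[sizeof probe];
    memcpy(bytes, &probe, sizeof probe);
    return bytes[0] == 0x01;
}

/* Hypothetical helper: index (0 or 1) of the 32-bit half that holds the
   LOW 32 bits of a uint64_t when its memory is viewed as two uint32_t. */
static int low_half_index(void)
{
    const uint64_t probe = UINT64_C(0x1111111122222222);
    uint32_t halves[2];
    memcpy(halves, &probe, sizeof probe);
    assert(halves[0] != halves[1]);
    return halves[0] == UINT32_C(0x22222222) ? 0 : 1;
}
```

On a little-endian machine `low_half_index()` is 0; on a big-endian one it is 1, so code that writes 32-bit words sequentially into a buffer of 64-bit words ends up with the two halves swapped.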
A quick look at the kernel code, specifically |
Just to be sure, here is the test:
|
Excellent!
If it gives identical results, we should look elsewhere; if not we have found the bug (or at least part of it). |
Both GAP 4.8.6 and 4.9.0 on SPARC Solaris 11 return
I guess it's hidden on 4.8.6 only because tests for this are not present. |
So one needs to look at
I marked the two problematic spots (roughly): this code creates a GMP bigint 32 bits at a time. This ensures that we get identical results regardless of whether we are on a 32-bit or 64-bit system. I am not sure whether we get identical results on 32-bit LE vs. BE systems. Perhaps you could try it? I.e., can you build a 32-bit SPARC version of GAP and see what it outputs? If that is already broken, we need to change the byte order when writing in those two places. But in addition, for 64-bit BE, the 32-bit words also need to be shuffled. To do this, in the first loop, one should call
(except in the last iteration: if the total number of 32-bit blocks is odd, then of course the high bits should be set to 0 instead). And of course the final masking must be done carefully, too. |
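To make the idea concrete, here is a minimal C sketch of filling 64-bit limbs from 32-bit random words in a way that does not depend on byte order. It is an illustration under stated assumptions, not the GAP kernel code: `demo_rng`, `demo_next32` and `fill_limbs64` are hypothetical names, and the generator is a plain linear congruential generator standing in for the Mersenne twister. The halves are combined with shifts rather than by writing through a 32-bit pointer, so LE and BE systems should produce the same limb values. The odd-count case and the final masking mentioned above are handled explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit generator for illustration only (a simple LCG,
   NOT the Mersenne twister used by GAP). */
typedef struct { uint32_t s; } demo_rng;

static uint32_t demo_next32(demo_rng *r)
{
    r->s = r->s * UINT32_C(1664525) + UINT32_C(1013904223);
    return r->s;
}

/* Fill `limbs` (least significant limb first) with `bits` random bits,
   consuming one 32-bit word at a time. Returns the number of limbs
   written. The caller must provide room for ceil(bits / 64) limbs. */
static size_t fill_limbs64(uint64_t *limbs, size_t bits, demo_rng *r)
{
    assert(bits > 0);
    size_t nwords = (bits + 31) / 32;   /* number of 32-bit blocks */
    size_t nlimbs = (nwords + 1) / 2;   /* number of 64-bit limbs  */

    for (size_t i = 0; i < nwords; i++) {
        uint32_t w = demo_next32(r);

        /* Final masking: drop surplus bits in the topmost block. */
        if (i == nwords - 1 && bits % 32 != 0)
            w &= (UINT32_C(1) << (bits % 32)) - 1;

        if (i % 2 == 0)
            limbs[i / 2] = w;                   /* low half; high half is 0 */
        else
            limbs[i / 2] |= (uint64_t)w << 32;  /* high half, set by shift  */
    }
    return nlimbs;
}
```

For example, requesting 80 bits consumes three 32-bit words: the first two form the low limb, and the third is masked to 16 bits and stored in the second limb, whose high half stays 0.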
Why does GAP roll its own randoms here, instead of relying on GMP? Is it because GMP does not do a good job here? |
Until a couple of months ago, gmp was an optional dependency. Also, we have to be careful using gmp functionality because we need to be careful about memory manager interactions. |
I meant to say that it looks more practical to have this code replaced by an appropriate call to GMP's Mersenne Twister. |
We understood; Chris' answer still applies. I would add that it is unclear to me whether GMP's MT returns identical results across BE vs LE and 32 vs 64 bit. |
From reading GMP test code I gather it must be platform-independent: they test their generators against hard-coded sequences of numbers: Perhaps @wbhart could confirm this... |
I'm sorry, I don't know. I can't see any reason why MT should return different results across platforms. But I don't know about any other generators. It's not something I've thought about. |
My point is that there are many GMP functions we can't use, because they can internally allocate memory, or move buffers around. These (obviously) conflict with GAP's internal memory manager. Therefore we use a very small set of functions which allow us to control the memory management ourselves. At the least we would have to carefully check that none of GMP's random functions ever allocate memory, and that GMP promises not to in future. I suspect it would be easier to fix our existing code :) |
Update: Checked, GMP allocates in |
@ChrisJefferson Actually, we already use a few GMP APIs that allocate memory, though we are careful to ensure they also deallocate afterwards (and I think we discussed this several times by now ;-). But yeah, it is still something to be very careful about. That said: the easiest way to find out is to try to implement it. @dimpase PRs are welcome! |
There is also the low-level API |
(Of course this will still change the overall randomness output; also, one then also has to rewrite the API for initializing the RNG state, etc., which affects quite a bit of code; it's not rocket science, but also not completely trivial). |
Well, one can call it once per GAP session, before even doing anything with GASMAN, no? |
I don't think we could clean up (easily), as I can make a random number generator, then want it GCing, so you'd have to store your own buffer somehow. |
No, you can (and I do) make many independent random sources. |
But GMP surely allows for a custom memory allocator. Naively, all one needs is to hook it up to GASMAN, no? |
In general, no, because it expects a non-moving allocator, which GASMAN is not. It might be OK in the particular case of the random number stuff, but one would have to analyze the GMP code to be sure, and that then of course is a fragile setup, because it might change in future versions of GMP. All in all, I think it's much easier to just tweak the existing, well-tested RNG code in GAP for 64bit BE, than to fiddle with GMP. |
Is there a significant difference between GASMAN and BoehmGC --- the one in HPC-GAP --- in this respect? (Excuse my ignorance). GMP/MPIR is used in Macaulay2, which also uses BoehmGC; I don't know whether they use GMP's randoms. Perhaps @DanGrayson could enlighten us here. |
Yes, there is a significant difference here: Boehm GC is non-moving. It is really pointless to ask 3rd parties for expertise here. I strongly doubt that anybody knows more about the combination of GASMAN and GMP than Chris and me. We thought about similar problems a lot, and I spent considerable effort rewriting the GAP GMP integration recently. And while in principle it would be possible to use the GMP RNG code here, it would not be easier than just fixing the problematic code; it would instead be a much bigger change and as such more likely to introduce regressions. |
Oh, OK, thanks, I see. So it probably would be reasonable to use GMP for HPC-GAP, but not in "classic" GAP. |
That would be possible, but I see no advantage in doing so, only disadvantages. |
Also, I'd (personally) prefer to avoid pulling in too much GMP unless there is a good reason, as we might want one day to move to another large number library, and it would be nice if that wasn't too hard -- for example I have tried compiling GAP to WebAssembly (nothing releasable yet), and one of the blocking points has been GMP. |
Isn't less code (assuming the "classic" one being retired) to maintain always an advantage? |
This is now getting into philosophical issues, but I prefer to use external libraries only when they provide significant value in terms of complexity or just amount of code (so, I wouldn't want to write my own GMP, or my own graphics library). On the other hand, once code is out of your control, then you have to deal with the owners taking it in a direction you don't like -- for example whenever GAP does something which makes Sage's life more painful :) Also, I expect non-HPC GAP to continue existing for several years yet -- it's not yet clear HPC-GAP will replace traditional GAP, or if they will continue to co-exist. |
@ChrisJefferson I apologize for the ping, but I was curious whether you have had any success with the WebAssembly version of GAP. |
Nothing of practical usefulness. I can make a GAP which works ( https://caj.host.cs.st-andrews.ac.uk/gap/gap.html ), but the interface is truly horrible, because I don't know anything about making JavaScript interfaces. |
It would be nice to have a browser-run console terminal (and there are lots of these to choose from) to run this GAP. I guess it's not something novel, just a matter of finding a place to copy such a setup from. |
Oh, that's actually really neat. If you need help with making a simple interface, I'd gladly assist you. |
We probably shouldn't be discussing this here :) I have a docker image here: https://github.com/ChrisJefferson/gap-javascript/ Unfortunately, it is broken, but could probably be fixed. The problem with the JavaScript is that (at least last time I looked) it was surprisingly hard to link the resulting GAP executable to a JavaScript interface / terminal. |
Observed behaviour
(the other tests there pass)
Expected behaviour
No diff, obviously
Copy and paste GAP banner (to tell us about your setup)
Built with vanilla gcc 7.2.
Note that as #2274 is merged, this and #2194 are the only things to be fixed for testinstall to pass on this platform. #116 makes it hard to understand what's going on, precisely...