Use random permutation of points when selecting base in centralizer #2878
Codecov Report
@@ Coverage Diff @@
## master #2878 +/- ##
=========================================
Coverage ? 78.53%
=========================================
Files ? 680
Lines ? 346260
Branches ? 0
=========================================
Hits ? 271924
Misses ? 74336
Partials ? 0
|
This worries me a little bit, because it makes things less repeatable, although I admit that this is also, to some extent, the point. If we were to fix the seed of the random source used here, then we would avoid that problem -- the same input would always produce the same result, but there would be (hopefully rare and unnatural) cases which performed badly every time you ran them. I'm not sure which is better, or whether there is some way to provide user control. |
@stevelinton I agree this makes reproduction awkward, but the whole point is to avoid any selection that could be deterministically bad. Making the random seed depend on the input again risks having certain examples fall consistently into the bad case. (Of course, the best solution would be to find out why the choice is bad and use this to consistently make a good choice.) What about an option (or flag) to turn this kind of randomization off (or set its seed)? |
Just to say, this problem is (at some level) unfixable. Maybe we can do better for this instance, but in general if we could solve the problem of "choosing the right value order for finding the conjugate" I'm fairly sure we would have solved graph isomorphism in polytime :) |
@hulpke an option would be fine. My thinking was that (like quicksort) the problem was that the bad case was somehow "natural" so using a fixed seed we could instead make it unnatural and therefore much rarer in practice, but this might be wrong. |
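To make the trade-off discussed above concrete, here is a minimal sketch in Python (not GAP). The function name `base_point_order` and its `randomize` and `seed` parameters are hypothetical and only illustrate the three behaviours under discussion: natural order, a shuffle with a fixed seed, and a fresh shuffle on every call.

```python
import random

def base_point_order(points, randomize=True, seed=None):
    """Return the points in the order in which they would be tried for a base.

    Hypothetical illustration only; this is not the GAP implementation.
    """
    pts = sorted(points)          # the "natural" (lexicographic) order
    if not randomize:
        return pts                # randomization switched off
    # A private generator: seed=None draws fresh entropy on every call,
    # while a fixed seed gives the same permutation for the same input.
    rng = random.Random(seed)
    rng.shuffle(pts)
    return pts

# Same seed, same input: identical order on every run (repeatable).
print(base_point_order(range(1, 11), seed=2878))
# No seed: the order generally differs from call to call.
print(base_point_order(range(1, 11)))
```

With a fixed seed the result is repeatable, but any input that happens to be bad for that particular permutation is bad every time; with no seed, a bad run is unlikely to recur but also cannot be replayed without recording the generator state.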
If you use a fixed seed, doesn't that just shift the issue, i.e., now it's other cases that might misbehave? Of course, if one is lucky, then no case misbehaves, but I don't see why that should be so. There are also many other algorithms in GAP which depend on the RNG state. They are still useful; one can record the RNG state before any computation, if one is concerned about reproducibility; in So I don't quite follow why randomization is suddenly a big concern in this case? What am I missing? Independent of that: I am conflicted about the "backport to 4.10" label. While this PR fixes an issue, it also has the potential to break other, unforeseen things. |
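The idea of recording the RNG state before a computation can be sketched as follows. This is a Python illustration using the standard `random` module, not GAP; the helper name `run_reproducibly` is hypothetical.

```python
import random

def run_reproducibly(computation, state=None):
    """Run a randomized computation, returning its result together with
    the generator state it started from, so the run can be replayed.

    Hypothetical helper for illustration only.
    """
    if state is not None:
        random.setstate(state)    # replay from a previously recorded state
    recorded = random.getstate()  # record the state before computing
    result = computation()
    return result, recorded

def pick():
    # Stand-in for any algorithm that consumes the global RNG.
    return random.sample(range(100), 5)

first, state = run_reproducibly(pick)
replayed, _ = run_reproducibly(pick, state=state)
assert first == replayed          # same starting state, same result
```

The cost is that the state has to be saved ahead of time; a slow run that was not recorded cannot be reproduced afterwards.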
I have concerns about a comment, but other than that, I'd approve this.
So, there will always be cases where randomisation doesn't help -- in fact it would be fairly easy to prove that over the space of all problems, any fixed randomisation strategy does nothing. However, in general people tend to specify groups in a "to them" sensible way, which will tend to lead to the natural order of the integers having some meaning to them. This means the natural order of the integers has a strong chance of being either an extremely good, or extremely bad, ordering for search. While both old and new Sort are not stable, they do tend, when given mostly sorted sequences, to produce certain obvious patterns, which can lead to us hitting these worst-case behaviours. There is past experience of this. McKay discusses this type of problem in Nauty, where he orders canonical images by their hash value exactly to avoid the problem of going through a sorted range and hitting a worst-case behaviour. So the question is, is this random shuffle moving us out of a specific degenerate case (which would be good), or just bouncing us from some random place to some other random place (which would mean we weren't really improving things). As a general life rule, I tend to randomly shuffle anything where I don't have a good ordering, to avoid these kinds of degenerate behaviours anyway. As a separate issue, I would very much like to extend/change this PR with something like #2882, which would get rid of the problem that you could get different behaviour every time you ran the function (which would be annoying). |
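The hash-ordering idea mentioned above can be sketched as follows. This is a Python illustration only: the use of SHA-256 and the name `hash_order` are assumptions made for the example and are not taken from Nauty or GAP. The point is that the order is deterministic, so results are repeatable, yet unrelated to the natural order of the integers.

```python
import hashlib

def hash_order(points):
    """Order points by a hash of their printed form instead of by value.

    Illustrative assumption: any fixed, well-mixing hash would serve;
    SHA-256 is used here only because it is in the standard library.
    """
    def key(p):
        return hashlib.sha256(repr(p).encode("utf-8")).digest()
    return sorted(points, key=key)

# The result is independent of the order in which the points arrive.
assert hash_order(range(1, 21)) == hash_order(reversed(range(1, 21)))
print(hash_order(range(1, 11)))
```

Unlike a random shuffle, this gives the same order on every run without any seed management, at the price that an input which is bad for this particular hash stays bad.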
What is the easiest way to do so? Would you be OK to merge this and then change the random generator once #2882 (with the GPL license issue resolved) has been merged? Or would you like me to merge your PR into this one? |
@fingolfin Feel free to remove (or frankly ignore) the flag if this use seems to be hopelessly naive. |
@hulpke : If you were happy I would ignore my PR, and do:
Running |
@ChrisJefferson I have changed to RandomSource(IsMersenneTwister) as you suggested. |
This gets (generically) around the problem observed in gap-system#2783. The assumption is that there are few badly behaved cases, and the random permutation makes it unlikely that any specific setup will out of habit fall into bad cases. (The prior strategy clearly only relied on lexicography of points and so is not specific).
Also expanded comment.
I'm now happy with this, except (obviously) that the manual example at grp.gd:1880 needs changing, as it is failing the manual tests. |
Backported to 4.10 via 502780e, 214ed65, 87022cb, 61b1c5a. This issue was labelled as not needing release notes. I removed that label, as I think it does warrant a release notes entry: this fixes an issue present in GAP 4.8 and 4.9; and in fact, with the fix, the sample code posted by @frankluebeck in issue #2783 gets much faster for me (from 110 seconds in GAP 4.7 down to 6 seconds in master). |