Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use random permutation of points when selecting base in centralizer #2878

Merged
merged 4 commits into from
Oct 10, 2018

Conversation

hulpke
Copy link
Contributor

@hulpke hulpke commented Sep 27, 2018

This gets (generically) around the problem observed in #2783.

The assumption is that there are few badly behaved cases, and the random
permutation makes it unlikely that any specific setup will out of habit fall
into bad cases. (The prior strategy clearly only relied on lexicography of
points and so is not specific).

@hulpke hulpke added release notes: not needed PRs introducing changes that are wholly irrelevant to the release notes backport-to-4.10 labels Sep 27, 2018
@codecov
Copy link

codecov bot commented Sep 27, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@482d81f). Click here to learn what that means.
The diff coverage is 85.71%.

@@            Coverage Diff            @@
##             master    #2878   +/-   ##
=========================================
  Coverage          ?   78.53%           
=========================================
  Files             ?      680           
  Lines             ?   346260           
  Branches          ?        0           
=========================================
  Hits              ?   271924           
  Misses            ?    74336           
  Partials          ?        0
Impacted Files Coverage Δ
lib/ctbl.gd 97.35% <ø> (ø)
lib/grp.gd 100% <ø> (ø)
lib/ctblfuns.gd 100% <ø> (ø)
lib/stbcbckt.gi 84.15% <85.71%> (ø)

@stevelinton
Copy link
Contributor

This worries me a little bit, because it makes things less repeatable, although I admit that this is also, to some extent, the point. If we were to fix the seed of random source used here, then we would avoid that problem -- the same input would always produce the same result, but there would be (hopefully rare and unnatural) cases which performed badly every time you ran them.

I'm not sure what is better, or if there is some way to provide user control.

@hulpke
Copy link
Contributor Author

hulpke commented Sep 28, 2018

@stevelinton I agree this makes reproduction awkward, but the whole point is to have any selection that could deterministically bad.making the random seed depend on the input again risks having certain examples fall consistently in the bad case. (Of course the best solution would be to find out why the choice is bad and use this to make consistently a good choice.)

What about an option (or flag) to turn this kind of randomization off (or set its seed)?

@ChrisJefferson
Copy link
Contributor

Just to say, this problem is (at some level) unfixable. Maybe we can do better for this instance, but in general if we could solve the problem of "choosing the right value order for finding the conjugate" I'm fairly sure we would have solved graph isomorphism in polytime :)

@stevelinton
Copy link
Contributor

@hulpke an option would be fine. My thinking was that (like quicksort) the problem was that the bad case was somehow "natural" so using a fixed seed we could instead make it unnatural and therefore much rarer in practice, but this might be wrong.

@fingolfin
Copy link
Member

If you use a fixed seed, doesn't that just shift the issue, i.e., now it's other cases that might misbehave? Of course if one is lucky, then no cases misbehaves, but I don't see why that should be the case.

There are also many other algorithms in GAP which depend on the RNG state. They are still useful; one can record the RNG state before any computation, if one is concerned about reproducibility; in recog I have to constantly do that in order to create bug reports that can be reliably reproduced.

So I don't quite follow why randomization is suddenly a big concern in this case? What am I missing?

Independent of that: I am conflicted about the "backport to 4.10" label. While this PR fixes an issue, it also has the potential to break other, unforeseen things.

Copy link
Member

@fingolfin fingolfin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have concerns about a comment, but other than that, I'd approve this.

lib/stbcbckt.gi Outdated Show resolved Hide resolved
@ChrisJefferson
Copy link
Contributor

So, there l always be cases where randomisation doesn't help -- in fact it would be fairly easy to prove that over the space of all problems, any fixed randomisation strategy does nothing.

However, in general people tend to specify groups in a "to them" sensible way, which will tend to lead to the natural order of the integers having some meaning to them. This means the natural order of the integers has a strong chance of being either an extremely good, or extremely bad, ordering for search.

While both old and new Sort are not stable, they do tend, when given mostly sorted sequences, to produce certain obvious patterns, which can lead to us hitting these worst-case behaviours.

There is past experience of this. McKay discusses this type of problem in Nauty, where he orders canonical images by their hash value exactly to avoid the problem of going through a sorted range and hitting a worst case behaviour.

So the question is, is this random shuffle moving us out of a specific degenerate case (which would be good), or just bouncing us from some random place to some other random place (which would mean we weren't really improving things).

As a general life rule, I tend to randomly shuffle anything where I don't have a good ordering, to avoid these kinds of degenerate behaviours anyway.

As a separate issue, I would very much like to extend/change this PR with something like #2882, which would get rid of the problem that you could get different behaviour every time you ran the function (which would be annoying).

@hulpke
Copy link
Contributor Author

hulpke commented Oct 2, 2018

@ChrisJefferson

As a separate issue, I would very much like to extend/change this PR with something like #2882, which would get rid of the problem that you could get different behaviour every time you ran the function (which would be annoying).

What is the easiest way to do so? Would you be OK to merge this and then change the random generator once #2882 (with the GPL license issue resolved) has been merged? Or would you like me to merge your PR into this one?

@hulpke
Copy link
Contributor Author

hulpke commented Oct 2, 2018

@fingolfin
I set the backport to 4.10 as a suggestion, indicating a change that is localized and (at least in principle) should not affect other routines.

Feel free to remove (or frankly ignore) the flag if this use seems to be hopelessly naive.

@ChrisJefferson
Copy link
Contributor

@hulpke : If you were happy I would ignore my PR, and do:

i:=FLOYDS_ALGORITHM(RandomSource(IsMersenneTwister),Length(cycles.firsts),false);

Running RandomSource(IsMersenneTwister) takes about .1ms, so I don't think it's worth worrying about -- having now benchmarked I'm not totally convinced it's work adding PCG32 (although if the licence can get sorted out, it might still be nice for this kind of situation).

@hulpke
Copy link
Contributor Author

hulpke commented Oct 7, 2018

@ChrisJefferson I have changed to RandomSource(IsMersenneTwister) as you suggested.

hulpke added 3 commits October 9, 2018 09:21
This gets (generically) around the problem observed in gap-system#2783.

The assumption is that there are few badly behaved cases, and the random
permutation makes it unlikely that any specific setup will out of habit fall
into bad cases. (The prior strategy clearly only relied on lexicography of
points and so is not specific).
@ChrisJefferson
Copy link
Contributor

I'm now happy with this, except (obivously) for the manual example at grp.gd:1880 needs changing, as it is failing the manual tests.

@hulpke hulpke merged commit 36ca67e into gap-system:master Oct 10, 2018
@fingolfin fingolfin removed the release notes: not needed PRs introducing changes that are wholly irrelevant to the release notes label Oct 18, 2018
@fingolfin
Copy link
Member

Backported to 4.10 via 502780e 214ed65 87022cb 61b1c5a

This issue was labelled as not needing release notes. I removed that label, as I think it does warrant a release notes entry: this fixes an issue present in GAP 4.8 and 4.9; and in fact, with the fix, the sample code posted by @frankluebeck in issue #2783 gets much faster for me (from 110 seconds in GAP 4.7 down to 6 seconds in master).

@fingolfin fingolfin added kind: enhancement Label for issues suggesting enhancements; and for pull requests implementing enhancements regression A bug that only occurs in the branch, not in a release topic: performance bugs or enhancements related to performance (improvements or regressions) topic: library release notes: to be added PRs introducing changes that should be (but have not yet been) mentioned in the release notes backport-to-4.10-DONE and removed backport-to-4.10 labels Oct 18, 2018
@fingolfin fingolfin changed the title Random permutation of points when selecting base in centralizer Use random permutation of points when selecting base in centralizer Oct 25, 2018
@fingolfin fingolfin added release notes: added PRs introducing changes that have since been mentioned in the release notes and removed release notes: to be added PRs introducing changes that should be (but have not yet been) mentioned in the release notes labels Oct 25, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-to-4.10-DONE kind: enhancement Label for issues suggesting enhancements; and for pull requests implementing enhancements regression A bug that only occurs in the branch, not in a release release notes: added PRs introducing changes that have since been mentioned in the release notes topic: library topic: performance bugs or enhancements related to performance (improvements or regressions)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants