[WIP] Improve GAP's sort functions #609
This looks really good. For some reason I thought that the previous sort was stable, but having had a quick look around I can't confirm this (ShellSort isn't stable). I do actually think that we should at least be looking at templates from C++. Sure, we can hack up most things using macros, but templates would make some things cleaner (permutations, transformations, I am looking at you).
Now that I've written this, this would be an ideal time to add StableSort and StableSortParallel, if users want them.
I vaguely remember that in some code in the past I did slightly dodgy things with `Sort` (using an order that is not really total), relying on the fact that the sort was stable. Thus I want a StableSort.
@hulpke: Do you mean you want a stable sort added, or you want GAP's sort to remain stable? Because currently, […]
@ChrisJefferson Apparently I've been wrong before, then: I erroneously thought shell sort was stable, and that this had been the reason for using it. So my request would be to also have a stable sort.
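To make the "shell sort isn't stable" point concrete, here is a small sketch. It is a textbook shell sort with Shell's original gap sequence (n/2, n/4, ..., 1), comparing keys only; it is an illustration, not GAP's kernel code, and the `Rec` type and `shell_sort` name are hypothetical. The long-range moves made by the gapped passes can carry an element past an equal-keyed one, which is exactly what breaks stability.

```c
/* A record: a sort key plus a tag recording where it started. */
typedef struct {
    int key;
    char tag;
} Rec;

/* Textbook shell sort on keys only (Shell's n/2, n/4, ..., 1 gaps).
   Each pass is a gapped insertion sort; moves of distance h can jump
   an element over another element with the same key. */
static void shell_sort(Rec *a, int n)
{
    for (int h = n / 2; h > 0; h /= 2) {
        for (int i = h; i < n; i++) {
            Rec tmp = a[i];
            int j = i;
            while (j >= h && a[j - h].key > tmp.key) {
                a[j] = a[j - h];   /* shift by h positions */
                j -= h;
            }
            a[j] = tmp;
        }
    }
}
```

For example, sorting `{1,'a'}, {1,'b'}, {0,'c'}, {2,'d'}` yields the tag order `c b a d`: the keys are sorted, but the two records with key 1 have swapped their original order.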
Having a StableSort (and StableSortBy, and StableSortParallel) would indeed be useful. As to the implementation: I can live with the macro tricks (I did something similar, though much simpler, for the pc collector code, after all). I'd clearly prefer to rewrite this using C++ templates, but as you say, this might be a bit too much right now. Hmm, though, is it really? Couldn't we just compile individual files with C++, making sure to only invoke […]

But don't get me wrong: since you already wrote the code, and it seems to work, I don't mind it being added for now, and rewritten (if desired) later on.
BTW, I think this is a super cool change you are making. All my review comments are meant as constructive criticism, as always; I hope they come across that way. If anything sounds unreasonable, please let me know.
After looking at some of the commits, I think you might want to consider squashing several of them.
```c
SORT_ASS_TEMP_TO_LIST( k, w );
k -= h;
if ( h+(start-1) < k ) {
SORT_ASS_LIST_TO_TEMP( w, k-h );
```
Bad indentation (tabs?)
As a general thing, I'll squash commits, and probably just run everything through something like clang-format (as it's a new file). As for C++, I think it's worth asking. I expect it to be rare nowadays for people to have a C compiler and not a C++ one (at least, it will be trivial for them to install one), and we can stick with C++03, which everyone with a C++ compiler will have.
More a question than a remark: Can one compile some files with C and some with C++ and have a guarantee that everything still fits together?
@hulpke: The short answer is yes. There is already C++ in kernel code in the […]. The (only slightly) longer answer is that you have to wrap the prototypes of C++ functions you want to call from C with […].

@ChrisJefferson: there could be issues with libGAP properly wrapping up C++, although they are not impossible to resolve, of course. As long as ./configure is generated by autoconf, finding a C++ compiler is trivial.
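The standard C++ mechanism for calling C++ functions from C is `extern "C"` linkage; presumably that is what the truncated comment refers to. A minimal sketch of the usual header idiom follows; the header name `sortimpl.h` and the function `is_sorted_ints` are hypothetical. The same file compiles as C (the guarded lines disappear) and as C++ (the declaration gets C linkage).

```c
/* ---- hypothetical shared header, e.g. "sortimpl.h" ---- */
#ifdef __cplusplus
extern "C" {            /* only seen by a C++ compiler */
#endif

/* C linkage: the symbol is not name-mangled, so C files can call it
   even when it is defined in a file compiled as C++. */
int is_sorted_ints(const int *a, int n);

#ifdef __cplusplus
}
#endif

/* ---- definition; in a mixed build this could live in a C++ file
   that includes the header above, which gives it C linkage ---- */
int is_sorted_ints(const int *a, int n)
{
    for (int i = 1; i < n; i++)
        if (a[i] < a[i - 1])
            return 0;   /* found a descent */
    return 1;
}
```

The key design point is that only the declarations need the guard; the C++ definition picks up C linkage from the earlier `extern "C"` declaration it sees via the header.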
@ChrisJefferson: This is nice! I have made some more tests than those you mention in the first post, in particular with much shorter lists. There were only a few cases where the […]

I remember that years ago we made some comparisons of various sorting algorithms. Our experience was that some algorithms were faster than the shell sort for certain types of input but (sometimes much) slower on other types (long vs. short lists, almost sorted vs. reverse sorted vs. fairly random, many distinct vs. few distinct entries, ...). So, at that time our conclusion was a confirmation of the comment in […]

The pdqsort strategy is more complicated than the previous shell sort, but seems to be an even better choice for general-purpose sorting.
I have needed a stable sort on a few occasions and used the following generic workaround: […]

I used […]
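The commenter's own workaround was not preserved. A common generic technique (not necessarily the one meant here) is to decorate each element with its original position and break ties on that position, which makes any sort, even an unstable one like `qsort`, produce a stable order. A minimal sketch; `Tagged`, `cmp_tagged` and `stable_order` are hypothetical names, and the result is returned as a permutation of original indices.

```c
#include <stdlib.h>

/* Hypothetical decorated record: sort key plus original index. */
typedef struct {
    int key;
    size_t idx;
} Tagged;

static int cmp_tagged(const void *pa, const void *pb)
{
    const Tagged *a = pa;
    const Tagged *b = pb;
    if (a->key != b->key)
        return (a->key > b->key) - (a->key < b->key);
    /* Equal keys: compare original positions. The order becomes
       total, so qsort's lack of a stability guarantee no longer matters. */
    return (a->idx > b->idx) - (a->idx < b->idx);
}

/* Writes to perm[i] the original index of the i-th smallest key,
   with equal keys kept in their original order.
   scratch and perm must each have room for n entries. */
static void stable_order(const int *keys, size_t n, Tagged *scratch, size_t *perm)
{
    for (size_t i = 0; i < n; i++) {
        scratch[i].key = keys[i];
        scratch[i].idx = i;
    }
    qsort(scratch, n, sizeof *scratch, cmp_tagged);
    for (size_t i = 0; i < n; i++)
        perm[i] = scratch[i].idx;
}
```

For keys `3, 1, 3, 1, 2` this gives the permutation `1, 3, 4, 0, 2`. The cost is extra memory per element and a slightly more expensive comparison, which is why a native stable sort is preferable.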
Instead of replying to a comment, I edited it, sorry @ChrisJefferson :-)
Branch updated from 3760dbb to 1cdf7ed.
Now featuring […]. These methods are (not unexpectedly) slower than […]. To get the speed up, one could implement one of the modern stable sorts that are similar in design to pdqsort, such as timsort (used in Python, among other places). I leave such an implementation to anyone who wants to volunteer :)
Such an implementation can be done later; no need to delay things for it ;-) There are test failures now, interestingly in the new small groups code I added. Perhaps this is a case of an unstable sort giving different results in the old and new implementations, leading to a different choice being made in the algorithm? Hmmm.
In some quick experiments, the old sort is "more stable" in some cases, but not in a way anyone could reliably count on.
One concern that I have is that the choice of pdqsort may not be the best for GAP (on the other hand, mergesort is now available through […]).

Some quick benchmarking seems to indicate that the fallback to insertion sort is counterproductive for mergesort: mergesort is faster without it, and can be further optimized by not merging ranges that are already in order (I did not test the further optimization of hand-coding the cases for ranges of length 1 or 2). For (short) integers, pdqsort performs better, while mergesort beats pdqsort for strings, especially when those are already partially sorted.
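A minimal sketch of the two mergesort tweaks described above: no insertion-sort cutoff for short ranges, and skipping the merge when the two sorted halves are already in order. It sorts plain `int`s for brevity and uses hypothetical names; it is not the PR's implementation.

```c
#include <string.h>

/* Stable top-down merge sort of a[lo..hi-1]; tmp has room for the whole array. */
static void merge_sort_rec(int *a, int *tmp, size_t lo, size_t hi)
{
    if (hi - lo < 2)
        return;                      /* length 0 or 1: already sorted */
    size_t mid = lo + (hi - lo) / 2;
    merge_sort_rec(a, tmp, lo, mid);
    merge_sort_rec(a, tmp, mid, hi);
    if (a[mid - 1] <= a[mid])
        return;                      /* halves already in order: skip the merge */
    size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        /* take from the right only if strictly smaller: ties keep left first */
        tmp[k++] = (a[j] < a[i]) ? a[j++] : a[i++];
    while (i < mid)
        tmp[k++] = a[i++];
    while (j < hi)
        tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (hi - lo) * sizeof *a);
}

static void merge_sort(int *a, int *tmp, size_t n)
{
    merge_sort_rec(a, tmp, 0, n);
}
```

With plain integers stability is not observable, but the strict `<` in the merge step (and the `<=` in the skip test) are what keep equal elements in their original order when the same code sorts records by key. The skip test costs one comparison per merge and turns an already sorted input into a linear-time pass.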
My mergesort should certainly be improved (Python's timsort could be a good basis). If we can make a stable sort that's good enough to serve as the general sort, that would avoid the need for two sorts.
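For reference, timsort's first step is to scan the input for "natural runs", which is what lets it handle partially sorted data cheaply. The sketch below follows the well-known run-detection step (as in the Java/CPython timsort descriptions), written here for `int`s with hypothetical names; it is not a full timsort. A run is either non-decreasing or *strictly* decreasing, and strictly decreasing runs are reversed in place; the strictness guarantees that reversing never reorders equal elements, so stability is preserved.

```c
#include <stddef.h>

/* Reverse a[lo..hi-1] in place. */
static void reverse_range(int *a, size_t lo, size_t hi)
{
    while (lo + 1 < hi) {
        hi--;
        int t = a[lo];
        a[lo] = a[hi];
        a[hi] = t;
        lo++;
    }
}

/* Length of the run starting at a[lo] (requires lo <= hi). A strictly
   decreasing run is reversed so every returned run is ascending. */
static size_t count_run_and_make_ascending(int *a, size_t lo, size_t hi)
{
    size_t run_hi = lo + 1;
    if (run_hi >= hi)
        return hi - lo;                  /* 0 or 1 element */
    if (a[run_hi] < a[lo]) {             /* strictly decreasing run */
        run_hi++;
        while (run_hi < hi && a[run_hi] < a[run_hi - 1])
            run_hi++;
        reverse_range(a, lo, run_hi);
    } else {                             /* non-decreasing run */
        run_hi++;
        while (run_hi < hi && a[run_hi] >= a[run_hi - 1])
            run_hi++;
    }
    return run_hi - lo;
}
```

For `5, 4, 3, 3, 7` the first run is `5, 4, 3` (the second 3 is not strictly smaller), which becomes `3, 4, 5` and has length 3. A full timsort would then extend short runs with binary insertion sort and merge runs under its stack invariants.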
Branch updated from 1cdf7ed to 1d34563.
@fingolfin (or others) I would be interested to know if the change in […]
@ChrisJefferson As far as I understand StructureDescription, some decisions (on which of multiple decompositions to take) are based on selecting from a sorted list. Also, the newly obtained structure descriptions seem to be as valid as the expected ones, so I'm inclined to consider this a harmless side effect.
@ChrisJefferson I have checked, and all three problematic […] (Not to mention that […])

I suggest circumventing the test by replacing the old values in lib/grpnames.g with the new values for the three groups for which the difference occurs. I do not mind doing it in another PR after this PR is merged, if that is the preference. I think it is a good idea to have […]
Is this documented/discussed somewhere?

Off-topic: What would be the proper forum to open a discussion about this? Opening an issue?
On Thu, Feb 18, 2016 at 12:13:35PM -0800, hungaborhorvath wrote:
If you are planning to make major changes to the GAP system, the best way to go […]

We hope that this way you will get early feedback on your project and (hopefully) […]

If you have code, the best thing to do is to open a pull request. I don't think that discussions about design decisions are well-placed in the […]
Shall we update the tests then and go ahead with a merge? |
@markuspf yes |
@ChrisJefferson this PR produces a lot of warnings about variables being set but not used. I am investigating, but I assume this is due to the genericity of the macros?
Another thing this turns up is, of course, the slight horror of reading the code in order to understand/debug it. But since it took me only 15 minutes, I hope that's not too much of a factor for most.
I'm not seeing any warnings. Did you fix them, or is my computer just not producing them?
On Fri, Feb 26, 2016 at 12:29:26AM -0800, Christopher Jefferson wrote:
You can see them in the Travis test. I haven't fixed them yet. I get the following warnings (compiling with gcc 5.3.1 [DragonFly]): […]

(There are more of them, but I think you see the point.)
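The actual warnings and the fix applied in the PR are not shown here. As a hypothetical illustration of how a generic macro body can trigger gcc's `-Wunused-but-set-variable`: the body always assigns a helper variable, but only some instantiations read it. A common fix is a `(void)` cast, which counts as a use. The macro `DEFINE_COUNT_IF` and its instantiations are made up for this sketch.

```c
/* Generic body: `prev` is assigned on every iteration, but it is only
   read if the instantiation's PRED expression mentions it. PRED may
   refer to i, a and prev (deliberately unhygienic, like a macro "language"). */
#define DEFINE_COUNT_IF(NAME, TYPE, PRED)      \
    static int NAME(const TYPE *a, int n)      \
    {                                          \
        int count = 0;                         \
        TYPE prev = 0;                         \
        for (int i = 0; i < n; i++) {          \
            if (PRED)                          \
                count++;                       \
            prev = a[i];                       \
        }                                      \
        (void)prev; /* counts as a use: no set-but-unused warning */ \
        return count;                          \
    }

/* Reads prev: counts strict ascents a[i-1] < a[i]. */
DEFINE_COUNT_IF(count_ascents, int, i > 0 && prev < a[i])

/* Ignores prev: without the (void)prev line, gcc -Wall warns here. */
DEFINE_COUNT_IF(count_negative, int, a[i] < 0)
```

For `1, 3, 2, -4, 5`, `count_ascents` returns 2 and `count_negative` returns 1.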
Just for confirmation, these have been fixed.

@ChrisJefferson - thanks, just to confirm that nightly tests with the -Werror option are now back to normal.
Before looking at this patch, look at these soothing benchmarks. Each one shows a list and how long it takes to sort it in master and in this branch.
This patch aims to do 2 things:
The 8 sorts are all combinations of:
The way this works is that we define a small language of macro functions which must be implemented to instantiate our sort.
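The PR's actual macro names and structure are not shown here, so the following is only a hypothetical sketch of the general technique: a macro that stamps out a complete sort function for a given element type and comparison, giving template-like reuse in plain C. Insertion sort keeps the example short; `DEFINE_INSERTION_SORT`, `INT_LESS` and `DBL_GREATER` are made-up names.

```c
/* Generate a stable insertion sort called NAME for arrays of TYPE,
   ordered by the comparison macro LESS(x, y). */
#define DEFINE_INSERTION_SORT(NAME, TYPE, LESS)     \
    static void NAME(TYPE *a, int n)                \
    {                                               \
        for (int i = 1; i < n; i++) {               \
            TYPE v = a[i];                          \
            int j = i;                              \
            /* strict LESS: equal elements never move past each other */ \
            while (j > 0 && LESS(v, a[j - 1])) {    \
                a[j] = a[j - 1];                    \
                j--;                                \
            }                                       \
            a[j] = v;                               \
        }                                           \
    }

/* "Implementations" of the comparison hook. */
#define INT_LESS(x, y)    ((x) < (y))
#define DBL_GREATER(x, y) ((x) > (y))

/* Two instantiations from one body. */
DEFINE_INSERTION_SORT(sort_ints, int, INT_LESS)
DEFINE_INSERTION_SORT(sort_doubles_desc, double, DBL_GREATER)
```

This works because the preprocessor rescans the substituted body, so `LESS(v, a[j - 1])` becomes `INT_LESS(v, a[j - 1])` and is then expanded in turn. The trade-off discussed in this thread is visible here: one body serves every variant, at the cost of code that is harder to read and debug than a C++ template.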
The main question is: do people consider the macros worth the reduction in code duplication, or does anyone have a suggestion on how to handle this differently, before I apply the final spit and polish? (The answer "turn the GAP kernel into C++" seems like too huge a step right now.)