-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Frustrations #5
Comments
Well, it doesn't exist. My workflow up until this point has been chaotic and very free-flowing, where commits essentially have been just "work" and "stuff" for the most part. I don't particularly feel like showing this stuff to the rest of the world.
I have to clarify this claim: this overhead is mainly in storing the state in structs.
I specified the processors in another comment. Obviously the full paper will have complete details, this was just a sneak peek.
When we move elements out from an
This structure is simply used to maintain the state of the merge (which would normally be local variables in a function), so that if a panic occurs we can restore this state.
As mentioned in my glidesort talk, a left partition is one that puts elements equal to the pivot on the left, a right partition is one that puts elements equal to the pivot on the right. I should probably add a comment here indeed though.
Yes.
The reason The
It is novel. It will be in the paper.
Any pivot selection scheme based on a sample of the array is vulnerable to things like this. All you can do is shuffle the worst cases around. I chose glidesort to be deterministic for practical reasons, you can easily add randomization to this scheme.
I use(d) it to create the visualizations. It allows you (with that feature flag) to keep track of every move glidesort makes. |
Thanks for answering the questions. They're just a sample of things I have to hop around files to understand; my point is that it would be much better to have answers such as these in the repository.
scandum's commit messages are all version numbers and "Update README.md". Nonetheless I'm glad I can see the changes.
True, but some schemes are more vulnerable to common patterns. I found when working on pdqsort that there are real problems on such patterns and later explained the problem here. The whole foundation of timsort/powersort is that some kinds of structure are common in real-world data! |
My brother in christ, the repository has been open for 3 days, chill????
EDIT: I've come to know a lot of people who fear opening an unfinished/unpolished project because someone might come and judge, well, thanks for feeding this fear 👍. |
Sorry, I'm frustrated as you can see. It is of course good work, which is why I'm taking the time to read it. The powersort integration and extension of branchless merging to unbalanced lengths seem very valuable. My complaint is that the level of polish is too much! Orson's spent many months in order to present this as a finished algorithm to the Rust community, and also given great explanations of many of the features. It's the gap between the depth and quality of his work, and the fact that it's resulted in what's honestly the scariest sorting codebase I've seen this side of ips4o, that has me upset. Of course if I was just frustrated, I'd have kept my mouth shut. I'm hoping my comments will help to make it more approachable because I know that's possible. And, yes, I've not been very kind about it and I'm sorry for that. I hope it can be taken as constructive despite the negativity. |
Well, I'll be blunt. I find the glidesort codebase hard to work with even by the standards of sorting research. I feel that in your desire to give a polished view of the algorithm to the world, you've actually made it harder for people who want to dig into the details. Given that pdqsort (Rust version included) was basically my entry point into high-performance sorting, seeing the next step like this is tough. Worse, from what I understand of the algorithm, it doesn't seem to be much more complicated than pdqsort, given that it throws out a lot of things like pivot scrambling and heapsort. I'll describe my difficulties as best as I'm able to give you the most information if you'd like to help.
I believe a real commit history would be very useful, and find the decision to publish as a single commit surprising for software presented at an open source conference. You apparently had enough of an implementation to benchmark for the talk in May, without full panic safety infrastructure. Because I can't access any version like this, I have no way to test your claim that panic safety accounts for 10-15% of time taken by the algorithm. Could it be different across processors? I have no insight into how tuning decisions were made, which is often available in the history too.
You've shared benchmarks from an ARM machine that's presumably your M1, and an unspecified AMD processor, as a png. Could you include, or link to, the processor specs and raw data in this repository?
Generally it feels that while sorting concepts are well explained, the way they are implemented isn't. As I understand it the way you use Rust isn't typical, so I expect even fluent Rust readers (I'm not one) could use some help. For example gap_guard.rs makes no attempt to explain what "the gap" is. And much of branchless_merge.rs is taken up by implementation of the
BranchlessMergeState
structure with no explanation of how this structure will be used.Other structures have no comments at all. I suppose the names are supposed to be self-documenting. Take
enum PartitionStrategy<T>
in quicksort.LeftWithPivot
is meaningless to me. What goes left? And of course there's a pivot, you're partitioning! Eventually I figured out that the un-named parameter is the pivot value to be used. Is left pivoting the variety used for the left side in a bidirectional partition? Because the block comment at the top never connects to any specific part of the code, I can't tell. AreLeftIfNewPivotEquals
andLeftIfNewPivotEqualsCopy
identical other than the way they way they store the pivot? The definition ofpartition_left
certainly suggests this, but laterless_strategy
andgeq_strategy
recognize only theCopy
version.Where does the recursive median-based strategy for pivot selection come from? Is there a reference? To me it seems obviously questionable because if just two of the three systematically-chosen regions based on
a
,b
, andc
have lower median values, then you'll get a low pivot. For example, what happens with an array consisting of three up-down patterns?What is tracking? I couldn't even google
cfg(feature = "tracking")
.The text was updated successfully, but these errors were encountered: