-
Notifications
You must be signed in to change notification settings - Fork 793
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove StructBox for Value Types #549
Conversation
The StructBox makes code that contains "hard" tail calls, which means that performance suffers under the 64 bit JIT
There must be some other way to check if a type is a Value Type? I doubt if it has been removed?
Hi @manofstick, I'm your friendly neighborhood Microsoft Pull Request Bot (You can call me MSBOT). Thanks for your contribution! The agreement was validated by Microsoft and real humans are currently evaluating your PR. TTYL, MSBOT; |
The following are some results from a timing test listed below. Note that the times for the 64-bit JIT are awful, even after this change (albeit a fair bit better). This is due to the 'hard' tail call optimizations. Removing the StructBox removed a layer of function calls that emitted tails, but there are still other performance layers to tear away (i.e. #513) (not sure if the broken build is my fault - I do need to check this [on my laptop on the tram doesn't make for good full builds], but the results from appveyor seem to not have done anything)
The code
|
I don't think this is a good way to structure exceptions, but it's to match current functionality
Impressive. I haven't done a detailed code review yet (others - please help) but there's a lot of goodness here. Which specific tailcalls are causing the 64-bit slowdown? (give links to specific lines?) We should likely add a general .NET perf bug about that, and also consider if there is a systematic way to eliminate these in the F# compiler itself rather than manually. |
Here is a cut down version which replicates the issue (could probably be a bit more minimal, but it's not too long...)
So build this with the MAKE_IT_FAST compilation symbol, and the f# compiler won't generate a tail instruction, and the performance will be basically equal for the two versions, otherwise the one with tail with be ~10x slower. (I have just exchanged the HashIdentity.Structural to EqualityComparer.Default just to simplify what is going on) I don't think this is a JIT "issue" per se - i.e. I think it is just a natural consequence of the x64 calling convention, although I haven't actually worked through the underlying assembly myself, I'm just relying on the overall gist of this and this. |
(Oh, and even without the tailcall issue, the change shows a 10-15% time decrease in the test suite on the x32 version which I think is still a reason enough improvement to warrant the change.) |
|
||
// Build the groupings | ||
for v in array do | ||
let key = Microsoft.FSharp.Core.CompilerServices.RuntimeHelpers.StructBox (projection v) | ||
let key = projection v |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For these guys I would consider naming the value safeKey
, to match the given type name. As it stands, the value key
does not have type 'Key
, nor could it be returned by getKey
...
Similarly throughout.
I'm happy to use "ResizeArray ()", and that is actually what I originally did, but the "problem" I found was that that decreased the performance of the use case where single allocations were occurring when there wasn't a second use of initialBucketSize to limit the call to TrimExcess(). That was why my comment/rant re: stingy allocation due to all the other baggage that is allocated. Personally I use groupBy a lot, and I can't recall the last time I used it to bucket things into single items. (And I assume my usage is not that uncommon.) So... How about we keep the restriction on the call to TrimExcess, and use the default constructor for ResizeArray ()? (I'm happy not to have the restriction on TrimExcess and use the default constructor, but it does punish people performance wise if they are using the single item case... or two or three items... Really the restriction should be there...) Does this make sense? I'm sure this would be easier to explain over a beer... (Maybe I'll just have a few and rewrite the message...) |
That sounds good to me - just use defaults for the resizearrays themselves, but maintain the new limitation on TrimExcess calls. |
Initially this had been set to 1, I had changed it to 4, but after discussion it was decided that the default is probably the correct choice. As per #549 (comment)
Done, and done. |
Oh geez, i was just sending you a PR with those, plus an adjustment to the I'll just add it myself then merge, unless you have an objection. |
SNAP! :-) no worries; do as you see fit, I am but your humble servant! |
@manofstick using your test code, I get below results on my 2 machines. On my slow box I see very minor (but consistent) improvement. On my fast box I see much more significant improvement, in line with your results. Machine 1 x86
x64 RyuJIT
x64 legacy JIT
Machine 2 x86
x64 RyuJIT
x64 legacy JIT
|
The StructBox makes code that contains "hard" tail calls, which means that performance suffers under the 64 bit JIT closes #549 commit 36f10b6214d8b73140b481e391f7999b9b8be8a3 Author: latkin <latkin@microsoft.com> Date: Wed Aug 12 12:40:46 2015 -0700 Proper ref/val type checking for all portable profiles commit 037a5e1 Author: Paul Westcott <p...w.e.s.t.c.o.t.t@gmail.com> Date: Thu Aug 13 05:50:29 2015 +1000 Using default constructor for ResizeArray Initially this had been set to 1, I had changed it to 4, but after discussion it was decided that the default is probably the correct choice. As per #549 (comment) commit 3796a55 Author: Paul Westcott <p...w.e.s.t.c.o.t.t@gmail.com> Date: Thu Aug 13 05:45:38 2015 +1000 Renamed key to safeKey where appropriate As per #549 (diff) commit b7884f8 Author: Paul Westcott <p...w.e.s.t.c.o.t.t@gmail.com> Date: Wed Jul 22 17:12:30 2015 +1000 Restored null arg exception as lazy I don't think this is a good way to structure exceptions, but it's to match current functionality commit 23cc156 Author: Paul Westcott <p...w.e.s.t.c.o.t.t@gmail.com> Date: Wed Jul 22 05:53:33 2015 +1000 Split dict by ValueType/RefType commit d4b6861 Author: Paul Westcott <p...w.e.s.t.c.o.t.t@gmail.com> Date: Wed Jul 22 05:10:39 2015 +1000 Split Array.countBy/groupBy by ValueType/RefType commit 02e6d42 Author: Paul Westcott <p...w.e.s.t.c.o.t.t@gmail.com> Date: Wed Jul 22 04:55:42 2015 +1000 Split List.groupBy for ValueType/RefType commit d80e616 Author: Paul Westcott <p...w.e.s.t.c.o.t.t@gmail.com> Date: Wed Jul 22 04:43:54 2015 +1000 Split List.countBy by RefType/ValueType commit 202e12e Author: Paul Westcott <p...w.e.s.t.c.o.t.t@gmail.com> Date: Wed Jul 22 04:27:45 2015 +1000 "Fixing" Reflection issues with Profile builds There must be some other way to check if a type is a Value Type? I doubt if it has been removed? commit c06d8e6 Author: Paul Westcott <p...w.e.s.t.c.o.t.t@gmail.com> Date: Tue Jul 21 16:07:33 2015 +1000 Split Seq.countBy for ValueType/RefType commit 1c5ce38 Author: Paul Westcott <p...w.e.s.t.c.o.t.t@gmail.com> Date: Tue Jul 21 15:42:25 2015 +1000 Split Seq.groupBy for ValueType/RefType The StructBox makes code that contains "hard" tail calls, which means that performance suffers under the 64 bit JIT
Applied to OOB branch |
The StructBox idiom is used for countBy/groupBy/dict so that reference type's with null values can work.
For Value Types this is just an extra layer of indirection which adds a performance cost without any benefit. This change request adds checks on the keys type, and then calls a customised value or reference type version.
This is related to #513, but can be considered a worthwhile optimization on it's own.