-
Notifications
You must be signed in to change notification settings - Fork 789
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimization: Generic specialization for equality #513
Conversation
Hi @manofstick, I'm your friendly neighborhood Microsoft Pull Request Bot (You can call me MSBOT). Thanks for your contribution! TTYL, MSBOT; |
@manofstick, Thanks for signing the contribution license agreement so quickly! Actual humans will now validate the agreement and then evaluate the PR. |
This looks pretty cool! We are definitely interested in perf optimizations in the runtime. I tried syncing this locally and running the tests, and everything seems to be working fine. |
@latkin ok, well let me finish this off; I need to also do some work for the Compare functions, as well as a bit more magic around IStructure* interfaces. I'll see how my time goes; but hopefully I'll get some peace over the weekend. Oh, and a bit more exploring I see that the the equality stuff is all around the shop; so get benefits in things like list<int> even. An example of this is:
64-bit original
32-bit-original
64-bit modified
32-bit-modified
So pretty happy with that. |
| r when r.Equals typeof<PartialEquivalenceRelation> -> | ||
match typeof<'a> with | ||
| t when t.Equals typeof<float> -> generalize (Func<_,_,_>(fun (a:float) b -> a.Equals b)) | ||
| t when t.Equals typeof<float32> -> generalize (Func<_,_,_>(fun (a:float32) b -> a.Equals b)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel a little uneasy about using "a.Equals b" as the base case in all of these different type-specific implementations, both here and here. For example these implementations must exactly match the bases cases implemented here https://github.com/Microsoft/visualfsharp/pull/513/files#diff-5470ddaa52d99663f8d391a77ae04f0dL1518 and here: https://github.com/Microsoft/visualfsharp/pull/513/files#diff-5470ddaa52d99663f8d391a77ae04f0dL1758
I think I'd feel more comfortable if we textually repeated (# "ceq" x y : bool #)
here. For example, I currently have to think carefully "does a.Equals(b) really get optimized down to exactly a ceq instruction?". Because "a" is a value type I'm not actually totally sure it does. If we just emit a ceq then I don't need to check.
@dsyme sure no problem. I had only avoided it as I'm not particularly comfortable with the IL voodoo syntax; but obviously a copy and paste will do the trick... |
Bit of wasted time over the weekend maybe trying to be a little bit too smart. I was trying to stop boxing of IStructuralEquality types, by using some classes dynamically created using the Reflection API, but it just didn't work. I think it is a reasonable idea, and do wonder what I did wrong, but I just don't have the time to investigate. If someone else does, then I would be appreciative. (I'm not checking it in, but the code is here; I will move on to other things that need to be done)
|
Oh, and if anyone does (did?) get around to having a look at the code that I couldn't get working (as per last comment in this thread), I should have also added thatthe Type created worked fine in an isolated test project - it was only when it was going into FSharp.Core that the problem with it surfaced. Anyways; not the end of the world; but might be some fun for someone to look at if they are bored... |
@latkin, @dsyme, I assume I'll not allowed to break existing functionality (damn!) But wondering if they following is OK for the genetic comparer:- first of all I want in preference to use the generic IComparable<> interface, but that is unmentioned in existing code as far as I can see. Now this is where I would like to make an assumption which is trying to check if the object had an f# compiler generated struct or record thus with a compiler provided implementation of istructualcomparable, icomparable and genetic version. If this is true, then I have a sealed type which have equivalent versions of comparison so it shouldn't matter which version I take (this I'd only for top level, so no comparer is being passed through the istructualcomparable call) A second case would be where only the genetic version of the interface exists on an object (this was true for the nodatime objects I believe - although I think this meant that the f# compiler didn't believe they were comparable from memory - but aiming this is not the case) (Obviously I would preferable just like to take the generic icomparable<> always, but I assume this is unacceptable due to it possible changing someone's probably buggy runtime code...) |
Yes, for now I think that's a great approach. It's possible that we would ultimately be willing to take a change along the lines of "As of F# V.v and FSharp.Core X.X.X.X, the generic comparison logic will compare types defined in freshly compiled code which implement If we backed that up by performance figures showing the reduction in boxing then it could well seem very acceptable. However that should definitely be a separate PR to the core work you're doing to improve performance without changing the preference order for comparison. |
Some more performance details... In Rounding out Visual F# 4.0 in VS 2015 RC there is a Tortoise (misspelled!) example under the "Optimized non-structural comparison operators" section. Ignoring the Hare, and just running the Tortoise on FSharp.Core.dll changes: Code
Results
|
@manofstick Those performance results are fantastic - specially the complete elimination of GC. Still not quite the Hare (so I'm glad we added NonStructuralCommparison), but the changes definitely make the default cases much faster. |
@dsyme , Here is some badness for you (in current FSharp.Core), my implementation currently doesn't match this, but I will "fix" it so that it does (I'm currently matching the non-generic version in the generic version). Badness
With the output
So "at some stage"; I think the implementation of IStructuralEquatable.Equals shouldn't used inlined IL, but rather always defer back to supplied IEqualityComparer. Obviously this will make performance much worse, but after this PR is complete, hopefully it won't be too onerous. |
These results are really impressive, unlike my ability to spell tortoise. 😊 (now fixed) |
A few of questions:
...that'll do for now... |
I have tried to follow the "first do no harm" principle as much as possible (although there will be a slight cost of construction; but minimal) but where I have failed is actually where I wanted to succeed the most; with value types. Sigh. Now this is not a new problem (I have an email conversation dated 15/6/2013 to Don about it) but I have made it a bit worse. The fact that it was already a problem though may mean that it can be ignored? So what is this problem? It's value types that are > 64 bits, tailcalls and the 64-bit JIT. So some code:
Now if this runs without BIG set, then the following times are obtained (results are in the format "time (hit count - which should be ~ equal for similar timing)":
But with BIG set, I get the following ~20 times slower (it should be a little slower, as we have two fields now rather than 1, but nowhere near this...):
Now if I turn "generate tail calls" off in this app I get:
BUT, if I also do a build of FSharp.Core with "generate tail calls" off then I get:
Which is actually the result that I desire. Now I can force this by adding a "not(not(equals))" and a "((hashcode)* -1)* -1), which is what I might do following this comment, but it is hardly desirable. So what do you think of maybe adding an attribute such as an AvoidTailCallAttribute or something like that so methods can selectively be opt out of the IL tailcall instruction being added? |
@manofstick Re your questions here
3+4. Using some code generation here is OK for the .NET 4.x profiles. Very few other platforms or PCLs will have codegen abilities I suppose. Using |
@manofstick - avoiding the tailcall looks reasonable if that's what the old code did and it affects perf. I don't mind using a hack for the purposes of preparing this PR (which feels still a bit of distance from finalized) but we should find a better solution. If you can abstract it into a (private) avoidTailcall function then that would be helpful as we can test that separately. |
RE: IStructuralEquatable.Equals, not sure if I had created a proper issue, so it is now created as #527. RE: About this PR being finalised; yeah I still have to attack Compare properly. But I'm starting to feel quite burnt out between work, kids and trying to get this over the line, so I might need a little break for a while. But, I think that it should be pretty good for GetHashCode and Equals in its current state. (And when I finally do finish this PR, I have 5 more ideas that I want to try out - I think I need a clone :-) |
Just a data point to consider - the increase in generic specialization added by this change leads to fairly significant growth in the size of FSharp.Core native images (and working set of client apps after JIT).
|
That's great! I'll be able to claim that I wrote 10% of FSharp.Core :-) Hmmm... But anyway! What tools are there to analyse the native images? 1Mb of GetHashCodes/Equals sounds like it's gone a bit haywire? I can try and play around a bit with how the code is laid out; I have more static classes than I need; possibly clumping them might be better? But, besides running "ngen" once or twice, I'm not familiar at all with native images. I'm not even sure where they live. So any assistance in analysis would be helpful. |
@manofstick just run There is nothing haywire, the increase in native code size is expected, which is why I wanted to see the numbers. The managed assembly has just one generic copy of the code, but at the native code level, JIT or AOT compilers need to produce separate native copies for every generic instantiation (ref types share 1 copy, at least). http://joeduffyblog.com/2011/10/23/on-generics-and-some-of-the-associated-overheads/ is a pretty good description. |
OK; I've clawed back about a quarter of the space used, when I get the some more time I'll claw back some more. My previous issue in regards to using reflection on types created in FSharp.Core appears worse that I originally thought, as it also appears to be an issue with non-generic types with generic methods. I didn't actually try just straight non-generic types with non-generic methods; but I probably should - hopefully time will allow at some stage. Weird anyway. |
OK, some good news; although I can't get GetMethod to work on classes as per my previous whingeing, I found that Delegate.CreateDelegate has an overload which takes a string method name (and somehow that manages to work), and so I have been able to leverage off that to achieve what I wanted to do from the start. This means that I can deprecate the Linq Expression compiled code. BUT! The portable47 profile doesn't have this overload, which is a bit of a bummer. I have just used the FX_ATLEAST_40 compilation symbol to avoid it. What I'm possibly thinking is that maybe I just wrap the whole changes in a conditional compile, and for the portable profiles we just return the slow, original, methods. Not fantastic; but maintaining the linq expression version as well as this version just seems to be a lot of pain for not much gain. Thoughts? |
Getting closer, I now just need to focus on Compare and then adding a good test suite. I haven't looked at how your tests are structured yet. Any hints as to how to approach? (Oh, and the "new" methodology using the Delegate.CreateDelegate has had vast improvements with the 32 bit code. In some of my little test runs in has halved the time, up to and beating the 64 bit impl in some cases.) |
…k, tryFindIndexBack, findIndexBack, sortWith, permute, mapFold, mapFoldBack, splitinto
add xmldoc about Seq.rev consume input sequence
Don't clobber F# SDK during CI build
reset (c) Microsoft Open Technologies, Inc. back to (c) Microsoft Cor…
Gitignore generated FSharp.Core.dll
Fix mini typo and remove commented code
Per this comment I'm requesting that this PR reverted until the code in prim-types.fs has been properly reviewed. Thanks
|
Hi @manofstick, I'm your friendly neighborhood Microsoft Pull Request Bot (You can call me MSBOT). Thanks for your contribution! The agreement was validated by Microsoft and real humans are currently evaluating your PR. TTYL, MSBOT; |
Not longer makes sense after #966 |
NB: This is not ready for merging yet (I need to find some more time; two cheeky monkeys consume more of my free time!) but it is intended to start a conversation. I will add more to this over time, assuming you are happy with the direction.
NB 2: There are some massive performance differences between .net 4.6 and lesser versions. I have been using 4.6 (and results specified here are from that version) but the performance optimizations are still applicable to older versions (I haven't gone to the assembly code level; but it appears to my ignorant eyes that older version were not inlining code that originated in fsharp.core.... or something :-) )
So what is this optimization?
As you are aware, F# automatically creates GetHashCode/Equals/Compare for Records and structs. In the case properties of concrete types, these are optimized to IL, but where you have generic parameters the functions are delegated to helper functions in LanguagePrimitives. For complex types, the performance of this is probably not a concern, but where you have like a simple struct 2 value key it can be a bit slow.
The following is an example (OK; it's a micro-benchmark; take with a grain of salt; blah-blah-blah; but I have come across things like this multiple times in the real world where you find an inner loop that is not too dissimilar):
The unmodified results for this are (first section is concrete type, the second part after the "------" is the generic type; the first number is the time in ms, the second number is just a checksum; first batch and second batch should match, but really it's just junk in these tests....) :
32-bit:
64-bit:
Including the optimization I have trhe following results:
32-bit:
64-bit:
So to summarise; in the example of smashing a HashSet, times for both 32-bit and 64-bit have been halved (obviously it's even greater speed up than that, as that is including all the HashSet functionality).
Anyway; let me know if this is the kind of thing you are interested in. I have been running the smoketest and everything seems fine (but not really 100% happy as I haven't had time to explore the code around there). Anyway I need to head off to bed now, so hopefully I have left enough information here for you to evaluate!