Optimization: Generic specialization for equality #513

manofstick · 2015-06-25T11:22:11Z

NB: This is not ready for merging yet (I need to find some more time; two cheeky monkeys consume more of my free time!) but it is intended to start a conversation. I will add more to this over time, assuming you are happy with the direction.

NB 2: There are some massive performance differences between .NET 4.6 and earlier versions. I have been using 4.6 (and the results specified here are from that version), but the performance optimizations are still applicable to older versions (I haven't gone to the assembly code level, but it appears to my ignorant eyes that older versions were not inlining code that originated in FSharp.Core.... or something :-) )

So what is this optimization?

As you are aware, F# automatically creates GetHashCode/Equals/Compare for records and structs. For properties of concrete types, these are optimized to IL, but where you have generic parameters the functions are delegated to helper functions in LanguagePrimitives. For complex types, the performance of this is probably not a concern, but where you have something like a simple two-value struct key it can be a bit slow.
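To make the delegation concrete, here is a minimal sketch (not code from this PR) that compares a generic struct through the `=` operator and through the FSharp.Core helper comparer. The type name `PairGeneric` is made up for illustration; `LanguagePrimitives.GenericEqualityComparer` is the existing non-generic comparer, which is why the explicit call has to box its arguments.

```fsharp
// Hypothetical two-field generic struct, same shape as StructureGeneric in the benchmark.
type PairGeneric<'a, 'b> =
    struct
        val A : 'a
        val B : 'b
        new (a, b) = { A = a; B = b }
    end

let x = PairGeneric (1, "one")
let y = PairGeneric (1, "one")

// The `=` operator and `hash` use the compiler-generated structural members.
let equalViaOperator = (x = y)
let hashesAgree = (hash x = hash y)

// The same comparison routed explicitly through the helper comparer.
// Both arguments are boxed because the interface is non-generic.
let equalViaHelper =
    LanguagePrimitives.GenericEqualityComparer.Equals (box x, box y)

printfn "%b %b %b" equalViaOperator hashesAgree equalViaHelper
```

All three values should agree; the interesting difference is only in how much boxing and indirection happens on the way.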

The following is an example (OK; it's a micro-benchmark; take with a grain of salt; blah-blah-blah; but I have come across things like this multiple times in the real world where you find an inner loop that is not too dissimilar):

open System
open System.Diagnostics
open System.Collections.Generic

type StructureInt = 
    struct
        val A : int
        val B : int
        new(a, b) = {
            A = a
            B = b
        }
    end

type StructureGeneric<'a,'b> = 
    struct
        val A : 'a
        val B : 'b
        new(a, b) = {
            A = a
            B = b
        }
    end

let mutable r = Random ()

let a createRandomObj =
    let set = HashSet ()
    let mutable count = 0


    let sw = Stopwatch.StartNew()

    for i = 0 to 200000 do
        set.Add (createRandomObj ()) |> ignore

        for i = 0 to 10 do
            count <- count + if set.Contains (createRandomObj ()) then 1 else 0

    printfn "%A (%A)" sw.ElapsedMilliseconds count

[<EntryPoint>]
let main argv =
    r <- Random 314159265
    for i = 0 to 10 do
        a (fun () -> StructureInt (r.Next 250, r.Next 250))

    printfn "-------"

    r <- Random 314159265
    for i = 0 to 10 do
        a (fun () -> StructureGeneric (r.Next 250, r.Next 250))

    0

The unmodified results for this are (first section is concrete type, the second part after the "------" is the generic type; the first number is the time in ms, the second number is just a checksum; first batch and second batch should match, but really it's just junk in these tests....) :

32-bit:

429L (1542136)
405L (1540851)
368L (1538895)
381L (1538510)
335L (1541558)
338L (1540543)
334L (1539466)
335L (1543310)
333L (1540047)
337L (1541890)
338L (1538828)
-------
1182L (1542136)
1160L (1540851)
1166L (1538895)
1177L (1538510)
1163L (1541558)
1167L (1540543)
1176L (1539466)
1162L (1543310)
1166L (1540047)
1182L (1541890)
1163L (1538828)

64-bit:

468L (1542136)
430L (1540851)
385L (1538895)
381L (1538510)
321L (1541558)
320L (1540543)
326L (1539466)
326L (1543310)
324L (1540047)
325L (1541890)
323L (1538828)
-------
800L (1542136)
786L (1540851)
780L (1538895)
792L (1538510)
788L (1541558)
773L (1540543)
780L (1539466)
784L (1543310)
776L (1540047)
784L (1541890)
788L (1538828)

Including the optimization I have the following results:

32-bit:

437L (1542136)
354L (1540851)
340L (1538895)
333L (1538510)
334L (1541558)
337L (1540543)
332L (1539466)
340L (1543310)
338L (1540047)
338L (1541890)
336L (1538828)
-------
615L (1542136)
603L (1540851)
602L (1538895)
618L (1538510)
613L (1541558)
609L (1540543)
618L (1539466)
612L (1543310)
610L (1540047)
619L (1541890)
611L (1538828)

64-bit:

589L (1542136)
411L (1540851)
324L (1538895)
323L (1538510)
321L (1541558)
325L (1540543)
321L (1539466)
321L (1543310)
320L (1540047)
323L (1541890)
321L (1538828)
-------
417L (1542136)
402L (1540851)
396L (1538895)
408L (1538510)
405L (1541558)
401L (1540543)
404L (1539466)
403L (1543310)
404L (1540047)
403L (1541890)
406L (1538828)

So to summarise; in the example of smashing a HashSet, times for both 32-bit and 64-bit have been halved (obviously it's even greater speed up than that, as that is including all the HashSet functionality).

Anyway; let me know if this is the kind of thing you are interested in. I have been running the smoketest and everything seems fine (but not really 100% happy as I haven't had time to explore the code around there). Anyway I need to head off to bed now, so hopefully I have left enough information here for you to evaluate!

msftclas · 2015-06-25T11:22:15Z

Hi @manofstick, I'm your friendly neighborhood Microsoft Pull Request Bot (You can call me MSBOT). Thanks for your contribution!

In order for us to evaluate and accept your PR, we ask that you sign a contribution license agreement. It's all electronic and will take just minutes. I promise there's no faxing. https://cla.microsoft.com.

TTYL, MSBOT;

msftclas · 2015-06-25T11:29:29Z

@manofstick, Thanks for signing the contribution license agreement so quickly! Actual humans will now validate the agreement and then evaluate the PR.

Thanks, MSBOT;

latkin · 2015-06-26T03:58:42Z

This looks pretty cool! We are definitely interested in perf optimizations in the runtime.

I tried syncing this locally and running the tests, and everything seems to be working fine. GenericXIntrinsic does not get inlined, so this should not affect user codegen?

manofstick · 2015-06-26T10:12:11Z

@latkin ok, well let me finish this off; I need to also do some work for the Compare functions, as well as a bit more magic around IStructure* interfaces. I'll see how my time goes; but hopefully I'll get some peace over the weekend.

Oh, and after a bit more exploring I see that the equality stuff is all around the shop, so we get benefits in things like list<int> even. An example of this is:

// create lots of lists to try not to get cache benefits
let lists = 
    Array.init 1000000 (fun i ->
        List.init 10 (fun j ->
            if i % 2 = 0 then -j else j))

let sw = Stopwatch.StartNew ()
let r = Random 42
let mutable count = 0
for i=0 to 1000000 do
    let a = lists.[r.Next lists.Length]
    let b = lists.[r.Next lists.Length]
    count <- count + if a = b then 1 else 0

printfn "%A (%A)" sw.ElapsedMilliseconds count

64-bit original

810L (500464)
1015L (500464)
913L (500464)
902L (500464)
996L (500464)
753L (500464)
974L (500464)
762L (500464)
892L (500464)
1002L (500464)
760L (500464)

32-bit-original

1500L (500464)
1434L (500464)
1497L (500464)
1485L (500464)
1485L (500464)
1500L (500464)
1522L (500464)
1542L (500464)
1504L (500464)
1522L (500464)
1517L (500464)

64-bit modified

685L (500464)
449L (500464)
522L (500464)
431L (500464)
508L (500464)
441L (500464)
524L (500464)
420L (500464)
502L (500464)
440L (500464)
452L (500464)

32-bit-modified

743L (500464)
655L (500464)
615L (500464)
531L (500464)
485L (500464)
523L (500464)
506L (500464)
579L (500464)
530L (500464)
550L (500464)
549L (500464)

So pretty happy with that.

dsyme · 2015-06-26T13:25:29Z

src/fsharp/FSharp.Core/prim-types.fs

+                        | r when r.Equals typeof<PartialEquivalenceRelation> ->
+                            match typeof<'a> with
+                            | t when t.Equals typeof<float>   -> generalize (Func<_,_,_>(fun (a:float)   b -> a.Equals b))
+                            | t when t.Equals typeof<float32> -> generalize (Func<_,_,_>(fun (a:float32) b -> a.Equals b))


I feel a little uneasy about using "a.Equals b" as the base case in all of these different type-specific implementations, both here and here. For example these implementations must exactly match the base cases implemented here https://github.com/Microsoft/visualfsharp/pull/513/files#diff-5470ddaa52d99663f8d391a77ae04f0dL1518 and here: https://github.com/Microsoft/visualfsharp/pull/513/files#diff-5470ddaa52d99663f8d391a77ae04f0dL1758

I think I'd feel more comfortable if we textually repeated (# "ceq" x y : bool #) here. For example, I currently have to think carefully "does a.Equals(b) really get optimized down to exactly a ceq instruction?". Because "a" is a value type I'm not actually totally sure it does. If we just emit a ceq then I don't need to check.
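One place where the two base cases visibly differ is NaN, which may be part of why this matters for the float cases in the PartialEquivalenceRelation branch. The following is a small sketch, not code from the PR: the `=` operator on floats follows IEEE semantics, while `Double.Equals` treats NaN as equal to itself, which matches the equivalence-relation (ER) helper rather than the partial one.

```fsharp
// PER semantics: the `=` operator on concrete floats says NaN is not equal to NaN.
let viaOperator = (nan = nan)

// System.Double.Equals(Double) reports NaN as equal to NaN.
let viaEqualsMethod = nan.Equals(nan)

// The ER helper in FSharp.Core also reports NaN as equal to NaN.
let viaER = LanguagePrimitives.GenericEqualityER nan nan

printfn "%b %b %b" viaOperator viaEqualsMethod viaER
```

So a base case written as `a.Equals b` under the PER branch would not be interchangeable with one written as `ceq` for float and float32.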

manofstick · 2015-06-26T19:36:51Z

@dsyme sure no problem. I had only avoided it as I'm not particularly comfortable with the IL voodoo syntax; but obviously a copy and paste will do the trick...

manofstick · 2015-06-28T10:53:20Z

Bit of wasted time over the weekend maybe trying to be a little bit too smart. I was trying to stop boxing of IStructuralEquality types, by using some classes dynamically created using the Reflection API, but it just didn't work. I think it is a reasonable idea, and do wonder what I did wrong, but I just don't have the time to investigate. If someone else does, then I would be appreciative. (I'm not checking it in, but the code is here; I will move on to other things that need to be done)

// Runtime template type specialization code
let getGenericSpecialializationTypeStaticFuncProperty<'typedef, 't, 'func> () =
    let t = typedefof<'typedef>.MakeGenericType [|typeof<'t>|]
    let func = t.GetProperty ("Func", Reflection.BindingFlags.Static ||| Reflection.BindingFlags.Public)
    let getter = func.GetGetMethod ()
    match getter.Invoke (null, [||]) with
    | :? 'func as f -> f
    | _ -> raise (Exception "Invalid logic")

type GenericSpecializeGetHashCode<'a when 'a : equality>() =
    static let _func = Func<_,_,_>(fun (_:IEqualityComparer) (a:'a) ->
        match box a with
        | null -> 0
        | _ -> a.GetHashCode ())
    static member Func = _func

let StructuralEquatableGetHashCode (iec:IEqualityComparer) (o:#IStructuralEquatable) =
    o.GetHashCode iec

type GenericSpecializeStructuralEquatableStructGetHashCode<'a when 'a : struct and 'a :> IStructuralEquatable>() =
    static let _func =
        Func<_,_,_>(fun (iec:IEqualityComparer) (a:'a) ->
            StructuralEquatableGetHashCode iec a)
    static member Func = _func

[<CustomEquality; NoComparison>]
type GenericSpecializeStructuralEquatableStructGetHashCodeDummy =
    struct
        interface IStructuralEquatable with
            member __.GetHashCode _ = raise (Exception "dummy class for typedefof")
            member __.Equals (_,_) = raise (Exception "dummy class for typedefof")
    end

type GenericSpecializeStructuralEquatableReferenceGetHashCode<'a when 'a :> IStructuralEquatable>() =
    static let _func =
        Func<_,_,_>(fun (iec:IEqualityComparer) (a:'a) ->
            match box a with
            | null -> 0
            | _ -> StructuralEquatableGetHashCode iec a)
    static member Func = _func

type GenericSpecializeHash<'a>() =
    static let generalize (func:Func<IEqualityComparer,'aa,int>) =
        match box func with
        | :? Func<IEqualityComparer,'a,int> as f -> f
        | _ -> raise (Exception "invalid logic")
    // this function must replicate GenericHashParamObj
    static let _func : Func<IEqualityComparer, 'a, int> =
        match typeof<'a> with
        | t when t.IsArray ->
            // I should optimize these as well, but probably not a particularly great gain.
            null
        | t when typeof<IStructuralEquatable>.IsAssignableFrom t ->
            if not t.IsValueType
                then getGenericSpecialializationTypeStaticFuncProperty<GenericSpecializeStructuralEquatableReferenceGetHashCode<_>, 'a, Func<IEqualityComparer, 'a, int>> ()
                else
                    // This throws a "System.NotSupportedException" from the GetProperty method of the generated type
                    // from getGenericSpecialializationTypeStaticFuncProperty. i.e. it appears that the MakeGenericType
                    // function works; because it doesn't throw an exception, which as I understand the documentation
                    // it should if the type doesn't meet the generic type constraints. So I return null to get
                    // fallback functionality, but would like someone else to investigate.
                    // 
                    // getGenericSpecialializationTypeStaticFuncProperty<GenericSpecializeStructuralEquatableStructGetHashCode<GenericSpecializeStructuralEquatableStructGetHashCodeDummy>, 'a, Func<IEqualityComparer, 'a, int>> ()
                    //
                    null

        | t when t.IsSealed && not t.IsValueType ->
            getGenericSpecialializationTypeStaticFuncProperty<GenericSpecializeGetHashCode<_>, 'a, Func<IEqualityComparer, 'a, int>> ()

        // A general t.IsSealed for value types "should" work I think, but suffers from the same exception as above
        | t when t.Equals typeof<bool>       -> generalize (Func<_,_,_>(fun (_:IEqualityComparer) (a:bool)       -> a.GetHashCode()))
        | t when t.Equals typeof<sbyte>      -> generalize (Func<_,_,_>(fun (_:IEqualityComparer) (a:sbyte)      -> a.GetHashCode()))
        | t when t.Equals typeof<int16>      -> generalize (Func<_,_,_>(fun (_:IEqualityComparer) (a:int16)      -> a.GetHashCode()))
        | t when t.Equals typeof<int32>      -> generalize (Func<_,_,_>(fun (_:IEqualityComparer) (a:int32)      -> a.GetHashCode()))
        | t when t.Equals typeof<int64>      -> generalize (Func<_,_,_>(fun (_:IEqualityComparer) (a:int64)      -> a.GetHashCode()))
        | t when t.Equals typeof<byte>       -> generalize (Func<_,_,_>(fun (_:IEqualityComparer) (a:byte)       -> a.GetHashCode()))
        | t when t.Equals typeof<uint16>     -> generalize (Func<_,_,_>(fun (_:IEqualityComparer) (a:uint16)     -> a.GetHashCode()))
        | t when t.Equals typeof<uint32>     -> generalize (Func<_,_,_>(fun (_:IEqualityComparer) (a:uint32)     -> a.GetHashCode()))
        | t when t.Equals typeof<uint64>     -> generalize (Func<_,_,_>(fun (_:IEqualityComparer) (a:uint64)     -> a.GetHashCode()))
        | t when t.Equals typeof<nativeint>  -> generalize (Func<_,_,_>(fun (_:IEqualityComparer) (a:nativeint)  -> a.GetHashCode()))
        | t when t.Equals typeof<unativeint> -> generalize (Func<_,_,_>(fun (_:IEqualityComparer) (a:unativeint) -> a.GetHashCode()))
        | t when t.Equals typeof<char>       -> generalize (Func<_,_,_>(fun (_:IEqualityComparer) (a:char)       -> a.GetHashCode()))
        | t when t.Equals typeof<string>     -> generalize (Func<_,_,_>(fun (_:IEqualityComparer) (a:string)     -> a.GetHashCode()))
        | t when t.Equals typeof<decimal>    -> generalize (Func<_,_,_>(fun (_:IEqualityComparer) (a:decimal)    -> a.GetHashCode()))
        | t when t.Equals typeof<float>      -> generalize (Func<_,_,_>(fun (_:IEqualityComparer) (a:float)      -> a.GetHashCode()))
        | t when t.Equals typeof<float32>    -> generalize (Func<_,_,_>(fun (_:IEqualityComparer) (a:float32)    -> a.GetHashCode()))

        | _ -> null

    static member Func = _func

/// Intrinsic for calls to depth-unlimited structural hashing that were not optimized by static conditionals.
//
// NOTE: The compiler optimizer is aware of this function (see uses of generic_hash_inner_vref in opt.fs)
// and devirtualizes calls to it based on type "T".
let GenericHashIntrinsic x =
    match GenericSpecializeHash.Func with
    | null -> GenericHashParamObj fsEqualityComparerUnlimitedHashingPER (box x)
    | func -> func.Invoke (fsEqualityComparerUnlimitedHashingPER, x)

/// Intrinsic for calls to depth-limited structural hashing that were not optimized by static conditionals.
let LimitedGenericHashIntrinsic limit x =
    match GenericSpecializeHash.Func with
    | null -> GenericHashParamObj (CountLimitedHasherPER limit) (box x)
    | func -> func.Invoke ((CountLimitedHasherPER limit), x)

/// Intrinsic for a recursive call to structural hashing that was not optimized by static conditionals.
//
// "iec" is assumed to be either fsEqualityComparerUnlimitedHashingER, fsEqualityComparerUnlimitedHashingPER or 
// a CountLimitedHasherPER.
//
// NOTE: The compiler optimizer is aware of this function (see uses of generic_hash_withc_inner_vref in opt.fs)
// and devirtualizes calls to it based on type "T".
let GenericHashWithComparerIntrinsic<'T> (iec : System.Collections.IEqualityComparer) (x : 'T) : int =
    match GenericSpecializeHash.Func with
    | null -> GenericHashParamObj iec (box x)
    | func -> func.Invoke (iec, x)
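Stripped of the reflection parts, the core technique in the code above is a delegate cached in a static field of a generic type, chosen once per instantiation by inspecting typeof<'a>. The sketch below is a reduced illustration of that idea only; `SpecializedHash`, `fallbackHash` and `hashOf` are made-up names, and it handles just two types rather than replicating GenericHashParamObj.

```fsharp
open System

// Fallback used when no specialized delegate exists (boxes its argument).
let fallbackHash (x: 'a) : int =
    match box x with
    | null -> 0
    | o -> o.GetHashCode ()

type SpecializedHash<'a> private () =
    // Re-type a delegate built for a concrete type as Func<'a, int>.
    // This only succeeds when 'b and 'a are the same type at runtime.
    static let generalize (f: Func<'b, int>) : Func<'a, int> =
        match box f with
        | :? Func<'a, int> as g -> g
        | _ -> invalidOp "invalid logic"

    // Evaluated once per closed generic type, then cached.
    static let func : Func<'a, int> =
        match typeof<'a> with
        | t when t.Equals typeof<int> ->
            generalize (Func<int, int>(fun x -> x))
        | t when t.Equals typeof<string> ->
            generalize (Func<string, int>(fun s -> if isNull s then 0 else s.GetHashCode ()))
        | _ -> null

    static member Func = func

let hashOf (x: 'a) : int =
    match SpecializedHash<'a>.Func with
    | null -> fallbackHash x
    | f -> f.Invoke x

printfn "%d %d" (hashOf 42) (hashOf 1.5)
```

For int and string the call goes through the cached delegate without boxing; any other type falls back to the boxing path, which mirrors the `| null -> ...` fallback in the intrinsics above.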

manofstick · 2015-06-29T06:01:07Z

Oh, and if anyone does (did?) get around to having a look at the code that I couldn't get working (as per last comment in this thread), I should have also added that the Type created worked fine in an isolated test project - it was only when it was going into FSharp.Core that the problem with it surfaced. Anyways; not the end of the world; but might be some fun for someone to look at if they are bored...

manofstick · 2015-06-30T09:58:19Z

@latkin, @dsyme, I assume I'm not allowed to break existing functionality (damn!), but I'm wondering if the following is OK for the generic comparer:

First of all, I want to use the generic IComparable<> interface in preference, but that is unmentioned in the existing code as far as I can see.

Now this is where I would like to make an assumption: check whether the object is an F# compiler generated struct or record, and thus has a compiler provided implementation of IStructuralComparable, IComparable and the generic version. If this is true, then I have a sealed type which has equivalent versions of comparison, so it shouldn't matter which version I take (this is only for the top level, so no comparer is being passed through the IStructuralComparable call).

A second case would be where only the generic version of the interface exists on an object (this was true for the NodaTime objects, I believe, although from memory I think this meant that the F# compiler didn't believe they were comparable; I'm assuming this is not the case).

(Obviously I would prefer to just always take the generic IComparable<>, but I assume this is unacceptable due to it possibly changing someone's probably buggy runtime code...)

dsyme · 2015-06-30T20:42:52Z

Yes, for now I think that's a great approach.

It's possible that we would ultimately be willing to take a change along the lines of "As of F# V.v and FSharp.Core X.X.X.X, the generic comparison logic will compare types defined in freshly compiled code which implement IComparable<T> or IStructuralComparable<T> via the generic interface rather than the non-generic interface IComparable".

If we backed that up by performance figures showing the reduction in boxing then it could well seem very acceptable. However that should definitely be a separate PR to the core work you're doing to improve performance without changing the preference order for comparison.
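For readers following the boxing argument, here is a small sketch of the two call paths being discussed. It is illustrative only: `compareViaGeneric` and `compareViaNonGeneric` are hypothetical helpers, not functions in FSharp.Core. The generic interface takes its argument as 'T, while the non-generic one takes obj, so a value type has to be boxed to get through it.

```fsharp
open System

// Calls IComparable<'T>.CompareTo: the argument stays typed as 'T.
let compareViaGeneric<'T when 'T :> IComparable<'T>> (x: 'T) (y: 'T) : int =
    x.CompareTo y

// Calls IComparable.CompareTo(obj): value types must be boxed on both sides.
let compareViaNonGeneric (x: IComparable) (y: obj) : int =
    x.CompareTo y

let today = DateTime (2015, 6, 30)
let tomorrow = today.AddDays 1.0

let viaGeneric = compareViaGeneric today tomorrow
let viaNonGeneric = compareViaNonGeneric (today :> IComparable) (box tomorrow)

printfn "%d %d" (sign viaGeneric) (sign viaNonGeneric)
```

For a well-behaved type both paths give the same ordering; the concern raised in the thread is types whose two implementations disagree, which is why changing the preference order is treated as a separate, behaviour-changing PR.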

manofstick · 2015-07-02T10:05:50Z

Some more performance details...

In Rounding out Visual F# 4.0 in VS 2015 RC there is a Tortoise (misspelled!) example under the "Optimized non-structural comparison operators" section. Ignoring the Hare, and just running the Tortoise on FSharp.Core.dll changes:

Code

module Tortoise = 
    let test () =
        let today = DateTime.Now
        let tomorrow = today.AddDays 1.0
        let mutable result = 0
        for i = 1 to 10000000 do
            result <- result + if today = tomorrow then 1 else 0

Results

32 bit - original
Real: 00:00:01.143, CPU: 00:00:01.138, GC gen0: 200, gen1: 2, gen2: 0

32 bit - modified
Real: 00:00:00.403, CPU: 00:00:00.405, GC gen0: 0, gen1: 0, gen2: 0

64 bit - original
Real: 00:00:00.690, CPU: 00:00:00.686, GC gen0: 305, gen1: 0, gen2: 0

64 bit - modified
Real: 00:00:00.193, CPU: 00:00:00.202, GC gen0: 0, gen1: 0, gen2: 0

dsyme · 2015-07-02T10:29:44Z

@manofstick Those performance results are fantastic - especially the complete elimination of GC. Still not quite the Hare (so I'm glad we added NonStructuralComparison), but the changes definitely make the default cases much faster.
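For context, the Hare variant is not shown in this thread. A sketch of what it presumably looks like, assuming it simply opens the NonStructuralComparison module added in F# 4.0 so that `=` resolves to the type's own op_Equality:

```fsharp
open System

module Hare =
    // With this module open, `=` calls DateTime's op_Equality directly
    // instead of going through generic structural equality.
    open NonStructuralComparison

    let test () =
        let today = DateTime.Now
        let tomorrow = today.AddDays 1.0
        let mutable result = 0
        for i = 1 to 10000000 do
            result <- result + (if today = tomorrow then 1 else 0)
        result

printfn "%d" (Hare.test ())
```

The loop body is the same as the Tortoise; only the meaning of `=` changes, and the sketch returns the counter so the result can be checked.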

manofstick · 2015-07-02T10:43:29Z

@dsyme ,

Here is some badness for you (in current FSharp.Core), my implementation currently doesn't match this, but I will "fix" it so that it does (I'm currently matching the non-generic version in the generic version).

Badness

type A = { A : string }
type B<'a> = { B : 'a }

[<EntryPoint>]
let main argv =
    let a1, a2 = { A = "Hello"}, { A = "HELLO" }
    let b1, b2 = { B = "Hello"}, { B = "HELLO" }

    let compareStringWithUpper=
        let toupper (s:string) = s.ToUpper ()
        { new IEqualityComparer with
            member this.GetHashCode item =
                match item with
                | :? string as s -> (toupper s).GetHashCode ()
                | _ -> failwith "Not in this example..."
            member this.Equals (lhs, rhs) =
                match lhs, rhs with
                | (:? string as s1), (:? string as s2) -> (toupper s1).Equals(toupper s2)
                | _ -> failwith "Not in this example..." }

    let a_is_good = (a1 :> IStructuralEquatable).Equals(a2, compareStringWithUpper)
    let b_is_good = (b1 :> IStructuralEquatable).Equals(b2, compareStringWithUpper)

    printfn "A is %s" <| if a_is_good then "Good" else "Bad"
    printfn "B is %s" <| if b_is_good then "Good" else "Bad"

    if a_is_good <> b_is_good then
        printfn "But worse than those results are that they are inconsistent!"

With the output

A is Bad
B is Good
But worse than those results are that they are inconsistent!

So "at some stage"; I think the implementation of IStructuralEquatable.Equals shouldn't use inlined IL, but rather always defer back to the supplied IEqualityComparer. Obviously this will make performance much worse, but after this PR is complete, hopefully it won't be too onerous.

latkin · 2015-07-02T14:34:54Z

These results are really impressive, unlike my ability to spell tortoise. 😊 (now fixed)

manofstick · 2015-07-03T10:23:24Z

@latkin ; @dsyme

A few questions:

1. Where should items like the existing FSharp.Core weirdness go? It doesn't really affect this case; I'll just try and duplicate the weirdness; I'm happy for you to store it on your own internal backlog; I'm obviously not in a position to determine a change in existing functionality...
2. I have some other ideas for some optimizations around the Seq module (and who knows where my path takes me...). Do you want all optimizations just lumped into this pull request, or shall I isolate them? How small/large should they be? (i.e. I probably could have just split this task into 3 - hash, equals & compare - would that have been better? - still possible I guess...)
3. Because I couldn't get the generic based static class code method to work (for unknown reason), I'm using some dynamic code generation techniques (emit via linq expressions). Are you happy (well...) with that? If not, I'm not sure what else can be done...
4. I'm ignorant about configurations - portable7 seems to be troublesome. I'm just liberally throwing #if FX_ATLEAST_40 around the shop.

...that'll do for now...

manofstick · 2015-07-04T21:09:19Z

I have tried to follow the "first do no harm" principle as much as possible (although there will be a slight cost of construction; but minimal) but where I have failed is actually where I wanted to succeed the most; with value types. Sigh.

Now this is not a new problem (I have an email conversation dated 15/6/2013 to Don about it) but I have made it a bit worse. The fact that it was already a problem though may mean that it can be ignored?

So what is this problem? It's value types that are > 64 bits, tailcalls and the 64-bit JIT.

So some code:

open System
open System.Collections.Generic
open System.Diagnostics

type TheStruct = 
    struct
        val A : int64
#if BIG
        val B : int64
        new (a, b) = { A=a; B=b }
#else
        new (a) = { A=a }
#endif
    end

type Container<'a> =
    struct
        val Item : 'a
        new (i) = { Item = i }
    end

let runTest createRandomObj =
    let set = HashSet ()
    let mutable count = 0
    let sw = Stopwatch.StartNew()
    for i = 0 to 200000 do
        set.Add (createRandomObj ()) |> ignore
        for i = 0 to 10 do
            count <- count + if set.Contains (createRandomObj ()) then 1 else 0

    printfn "    %A (%A)" sw.ElapsedMilliseconds count

[<EntryPoint>]
let main argv =
    let create = 
        let r = Random 314159265
        let n x = int64 (r.Next x)
        fun () ->
#if BIG
            Container (TheStruct (n 16, n 16))
#else
            Container (TheStruct (n 256))
#endif

    for i = 1 to 10 do
        runTest create
    0

Now if this runs without BIG set, then the following times are obtained (results are in the format "time (hit count)", where the hit count should be ~ equal for similar timing):

513L (2197320)
235L (2197168)
241L (2197198)
231L (2197114)
166L (2197153)
168L (2197158)
166L (2197361)
168L (2197090)
167L (2197199)
167L (2197400)

But with BIG set, I get the following ~20 times slower (it should be a little slower, as we have two fields now rather than 1, but nowhere near this...):

4232L (2197116)
4058L (2197293)
4075L (2197057)
4055L (2197259)
4057L (2197364)
4067L (2197354)
4060L (2197468)
4067L (2197161)
4066L (2196999)
4058L (2197368)

Now if I turn "generate tail calls" off in this app I get:

3363L (2197116)
2854L (2197293)
2870L (2197057)
2891L (2197259)
2863L (2197364)
2858L (2197354)
2880L (2197468)
2912L (2197161)
2862L (2196999)
2934L (2197368)

BUT, if I also do a build of FSharp.Core with "generate tail calls" off then I get:

430L (2197116)
243L (2197293)
258L (2197057)
241L (2197259)
237L (2197364)
237L (2197354)
237L (2197468)
238L (2197161)
237L (2196999)
237L (2197368)

Which is actually the result that I desire.

Now I can force this by adding a "not(not(equals))" and a "((hashcode) * -1) * -1", which is what I might do following this comment, but it is hardly desirable. So what do you think of maybe adding an attribute such as an AvoidTailCallAttribute or something like that so methods can selectively opt out of the IL tailcall instruction being added?

dsyme · 2015-07-07T21:30:49Z

@manofstick Re your questions here

1. The inconsistency in the IStructuralEquatable.Equals implementation is not, AFAIK, intentional. It will be tricky to fix without a more direct repro - but I think you may have provided one of these in a separate issue report (assuming it's the same underlying issue). Even then it may be hard for us to fix the issue for stability reasons but I still appreciate your reporting it.
2. You should definitely isolate optimizations into separate PRs where possible.

3+4. Using some code generation here is OK for the .NET 4.x profiles. Very few other platforms or PCLs will have codegen abilities I suppose. Using #if FX_ATLEAST_40 is OK for now. We can isolate a specific new define later.

dsyme · 2015-07-07T23:20:42Z

@manofstick - avoiding the tailcall looks reasonable if that's what the old code did and it affects perf. I don't mind using a hack for the purposes of preparing this PR (which feels still a bit of distance from finalized) but we should find a better solution. If you can abstract it into a (private) avoidTailcall function then that would be helpful as we can test that separately.
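A minimal sketch of what such helpers might look like follows. The names are hypothetical, and the key assumption is unverified here: that doing a trivial piece of work on the result keeps the wrapped call out of tail position, so the compiler does not emit the `.tail` prefix. Whether the optimizer preserves that would need to be confirmed by inspecting the generated IL.

```fsharp
// Hypothetical helper: the call to f is followed by a match on its result,
// so (by assumption) it is no longer the last operation in the method.
let inline avoidTailcallBool (f: unit -> bool) : bool =
    match f () with
    | true -> true
    | false -> false

// Hypothetical helper: same idea for int results, using a no-op addition.
let inline avoidTailcallInt (f: unit -> int) : int =
    0 + f ()

// Usage: wrap the call that would otherwise sit in tail position.
let equalsNoTail (a: 'a) (b: 'a) : bool =
    avoidTailcallBool (fun () -> a.Equals (box b))

let hashNoTail (a: 'a) : int =
    avoidTailcallInt (fun () -> a.GetHashCode ())

printfn "%b %d" (equalsNoTail 3 3) (hashNoTail 7)
```

Keeping the trick in one named place has the advantage mentioned above: it can be tested and later replaced by a proper mechanism without touching every call site.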

manofstick · 2015-07-08T11:13:19Z

@dsyme

RE: IStructuralEquatable.Equals, not sure if I had created a proper issue, so it is now created as #527.

RE: About this PR being finalised; yeah I still have to attack Compare properly. But I'm starting to feel quite burnt out between work, kids and trying to get this over the line, so I might need a little break for a while. But, I think that it should be pretty good for GetHashCode and Equals in its current state.

(And when I finally do finish this PR, I have 5 more ideas that I want to try out - I think I need a clone :-)

latkin · 2015-07-09T21:15:27Z

Just a data point to consider - the increase in generic specialization added by this change leads to fairly significant growth in the size of FSharp.Core native images (and working set of client apps after JIT).

FSharp.Core.ni.dll    Before     After
32 bit                6.64 MB    7.32 MB (+10.2%)
64 bit                8.77 MB    9.72 MB (+10.7%)

manofstick · 2015-07-10T09:26:21Z

@latkin

That's great! I'll be able to claim that I wrote 10% of FSharp.Core :-)

Hmmm... But anyway!

What tools are there to analyse the native images? 1Mb of GetHashCodes/Equals sounds like it's gone a bit haywire?

I can try and play around a bit with how the code is laid out; I have more static classes than I need; possibly clumping them might be better?

But, besides running "ngen" once or twice, I'm not familiar at all with native images. I'm not even sure where they live. So any assistance in analysis would be helpful.

latkin · 2015-07-10T16:40:15Z

@manofstick just run src\update.cmd <release|debug> -ngen from an admin prompt to refresh the GAC/NGEN with your latest-built stuff. Native images are stored as FSharp.Core.ni.dll under C:\windows\assembly\..., your private-built guys will be the copies with the newest timestamps.

There is nothing haywire, the increase in native code size is expected, which is why I wanted to see the numbers. The managed assembly has just one generic copy of the code, but at the native code level, JIT or AOT compilers need to produce separate native copies for every generic instantiation (ref types share 1 copy, at least). http://joeduffyblog.com/2011/10/23/on-generics-and-some-of-the-associated-overheads/ is a pretty good description.

manofstick · 2015-07-12T10:34:18Z

@latkin

OK; I've clawed back about a quarter of the space used; when I get some more time I'll claw back some more.

My previous issue in regards to using reflection on types created in FSharp.Core appears worse than I originally thought, as it also appears to be an issue with non-generic types with generic methods. I didn't actually try just straight non-generic types with non-generic methods; but I probably should - hopefully time will allow at some stage. Weird anyway.

manofstick · 2015-07-13T10:57:53Z

@latkin

OK, some good news; although I can't get GetMethod to work on classes as per my previous whingeing, I found that Delegate.CreateDelegate has an overload which takes a string method name (and somehow that manages to work), and so I have been able to leverage off that to achieve what I wanted to do from the start. This means that I can deprecate the Linq Expression compiled code.
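The overload in question is Delegate.CreateDelegate(Type, Type, String), which binds a delegate to a static method found by name on the given type. A rough sketch of the approach, using made-up names (`HashHelper`, `makeHasher`) rather than the PR's actual code:

```fsharp
open System

// Hypothetical generic helper type with a static method to bind to.
type HashHelper<'a> =
    static member Hash (x: 'a) : int =
        match box x with
        | null -> 0
        | o -> o.GetHashCode ()

// Close the generic type at runtime, then bind a delegate to its static
// "Hash" method by name, without calling GetMethod or GetProperty.
let makeHasher<'a> () : Func<'a, int> =
    let closedType = typedefof<HashHelper<_>>.MakeGenericType [| typeof<'a> |]
    Delegate.CreateDelegate (typeof<Func<'a, int>>, closedType, "Hash")
    :?> Func<'a, int>

let intHasher = makeHasher<int> ()
printfn "%d" (intHasher.Invoke 7)
```

In this toy case the closed type could be written directly as typeof<HashHelper<'a>>; the runtime MakeGenericType step is kept because the original use case picks the helper type based on constraints that 'a is not statically known to satisfy.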

BUT! The portable47 profile doesn't have this overload, which is a bit of a bummer. I have just used the FX_ATLEAST_40 compilation symbol to avoid it. What I'm possibly thinking is that maybe I just wrap the whole changes in a conditional compile, and for the portable profiles we just return the slow, original, methods. Not fantastic; but maintaining the linq expression version as well as this version just seems to be a lot of pain for not much gain.

Thoughts?

manofstick · 2015-07-14T19:58:53Z

@latkin

Getting closer, I now just need to focus on Compare and then adding a good test suite. I haven't looked at how your tests are structured yet. Any hints as to how to approach?

(Oh, and the "new" methodology using the Delegate.CreateDelegate has had vast improvements with the 32 bit code. In some of my little test runs it has halved the time, up to and beating the 64 bit impl in some cases.)

dsyme · 2016-02-12T12:56:31Z

Per this comment I'm requesting that this PR be reverted until the code in prim-types.fs has been properly reviewed. Thanks

"55,897 additions, 44,427 deletions not shown because the diff is too large. Please use a local Git client to view these changes"

msftclas · 2016-02-12T12:56:36Z

Hi @manofstick, I'm your friendly neighborhood Microsoft Pull Request Bot (You can call me MSBOT). Thanks for your contribution!
You've already signed the contribution license agreement. Thanks!

The agreement was validated by Microsoft and real humans are currently evaluating your PR.

TTYL, MSBOT;

manofstick · 2016-02-14T02:31:05Z

No longer makes sense after #966

Reverting #513

msftclas added the cla-required label Jun 25, 2015

msftclas added cla-signed and removed cla-required labels Jun 25, 2015

dsyme reviewed Jun 26, 2015

This was referenced Jul 8, 2015

Equality operator causes boxing on value types #526

Closed

Generic Record Types have different Equality semantics to concrete types #527

Open

dsyme and others added 11 commits January 27, 2016 13:39

don't clobber F# SDK

f4cf7e3

add doc to sort, foldBack, reduceBack, tryFindBack, findBack, scanBac…

8ced26a

…k, tryFindIndexBack, findIndexBack, sortWith, permute, mapFold, mapFoldBack, splitinto

Merge pull request #903 from enricosada/xmldoc_seqrev

a458989

add xmldoc about Seq.rev consume input sequence

Merge pull request #904 from dsyme/fix-ci-build-1

ad8c5ee

Don't clobber F# SDK during CI build

reset (c) Microsoft Open Technologies, Inc. back to (c) Microsoft Cor…

e3d64fe

…poration.

Merge pull request #906 from KevinRansom/master

95fb299

reset (c) Microsoft Open Technologies, Inc. back to (c) Microsoft Cor…

Gitignore generated FSharp.Core.dll

fa1e55a

Merge pull request #909 from forki/ignore

b837d75

Gitignore generated FSharp.Core.dll

Fix mini typo and remove commented code

bf78c0f

Merge pull request #927 from forki/patch-4

4f761cb

Fix mini typo and remove commented code

Merge remote-tracking branch 'refs/remotes/Microsoft/master' into fsh…

c8461eb

…arp4

This was referenced Feb 3, 2016

[WIP] Non-boxing equality for enums (in generic contexts) #930

Closed

Various codegen tests failing #918

Closed

Fix failing tests (including changes codegen for failing IL generation tests for generic comparison) #944

Closed

dsyme mentioned this pull request Feb 8, 2016

Consider tailcalls and 513 (generic specializations for eq/hash/compare) #946

Closed

manofstick mentioned this pull request Feb 12, 2016

[WIP] Fix #513 for recursive types #961

Closed

dsyme reopened this Feb 12, 2016

msftclas added the cla-already-signed label Feb 12, 2016

manofstick mentioned this pull request Feb 14, 2016

Reverting #513 #966

Merged

manofstick closed this Feb 14, 2016

dsyme added a commit that referenced this pull request Mar 30, 2016

Merge pull request #966 from manofstick/manofstick-reverting-513

39280f3

Reverting #513

manofstick mentioned this pull request Jun 6, 2018

[CompilerPerf] Faster equality in generic contexts #5112

Closed

TIHan mentioned this pull request Dec 21, 2018

Ranges allocate 670 MB in 94 seconds of normal IDE usage in service.fs only #6047

Closed

manofstick mentioned this pull request May 31, 2020

The performance of Comparing and Ordering things #9348

Open

This was referenced Jan 8, 2021

Map: Optimize away isinst check #10845

Merged

Map: comparer optimization #10855

Closed

psfinaki mentioned this pull request Feb 22, 2024

Faster equality in generic contexts #16615

Merged

3 tasks
