Optimization: Generic specialization for equality #513

Closed
wants to merge 422 commits into from

manofstick
Contributor

NB: This is not ready for merging yet (I need to find some more time; two cheeky monkeys consume more of my free time!) but it is intended to start a conversation. I will add more to this over time, assuming you are happy with the direction.

NB 2: There are some massive performance differences between .NET 4.6 and earlier versions. I have been using 4.6 (and the results specified here are from that version), but the performance optimizations are still applicable to older versions (I haven't gone to the assembly code level; but it appears to my ignorant eyes that older versions were not inlining code that originated in FSharp.Core.... or something :-) )

So what is this optimization?

As you are aware, F# automatically creates GetHashCode/Equals/Compare for records and structs. For properties of concrete types, these are optimized to IL, but where you have generic parameters the functions are delegated to helper functions in LanguagePrimitives. For complex types, the performance of this is probably not a concern, but where you have something like a simple two-value struct key it can be a bit slow.

The following is an example (OK; it's a micro-benchmark; take with a grain of salt; blah-blah-blah; but I have come across things like this multiple times in the real world where you find an inner loop that is not too dissimilar):

open System
open System.Diagnostics
open System.Collections.Generic

type StructureInt = 
    struct
        val A : int
        val B : int
        new(a, b) = {
            A = a
            B = b
        }
    end

type StructureGeneric<'a,'b> = 
    struct
        val A : 'a
        val B : 'b
        new(a, b) = {
            A = a
            B = b
        }
    end

let mutable r = Random ()

let a createRandomObj =
    let set = HashSet ()
    let mutable count = 0


    let sw = Stopwatch.StartNew()

    for i = 0 to 200000 do
        set.Add (createRandomObj ()) |> ignore

        for i = 0 to 10 do
            count <- count + if set.Contains (createRandomObj ()) then 1 else 0

    printfn "%A (%A)" sw.ElapsedMilliseconds count

[<EntryPoint>]
let main argv =
    r <- Random 314159265
    for i = 0 to 10 do
        a (fun () -> StructureInt (r.Next 250, r.Next 250))

    printfn "-------"

    r <- Random 314159265
    for i = 0 to 10 do
        a (fun () -> StructureGeneric (r.Next 250, r.Next 250))

    0

The unmodified results for this are (first section is concrete type, the second part after the "------" is the generic type; the first number is the time in ms, the second number is just a checksum; first batch and second batch should match, but really it's just junk in these tests....) :

32-bit:

429L (1542136)
405L (1540851)
368L (1538895)
381L (1538510)
335L (1541558)
338L (1540543)
334L (1539466)
335L (1543310)
333L (1540047)
337L (1541890)
338L (1538828)
-------
1182L (1542136)
1160L (1540851)
1166L (1538895)
1177L (1538510)
1163L (1541558)
1167L (1540543)
1176L (1539466)
1162L (1543310)
1166L (1540047)
1182L (1541890)
1163L (1538828)

64-bit:

468L (1542136)
430L (1540851)
385L (1538895)
381L (1538510)
321L (1541558)
320L (1540543)
326L (1539466)
326L (1543310)
324L (1540047)
325L (1541890)
323L (1538828)
-------
800L (1542136)
786L (1540851)
780L (1538895)
792L (1538510)
788L (1541558)
773L (1540543)
780L (1539466)
784L (1543310)
776L (1540047)
784L (1541890)
788L (1538828)

Including the optimization I have the following results:

32-bit:

437L (1542136)
354L (1540851)
340L (1538895)
333L (1538510)
334L (1541558)
337L (1540543)
332L (1539466)
340L (1543310)
338L (1540047)
338L (1541890)
336L (1538828)
-------
615L (1542136)
603L (1540851)
602L (1538895)
618L (1538510)
613L (1541558)
609L (1540543)
618L (1539466)
612L (1543310)
610L (1540047)
619L (1541890)
611L (1538828)

64-bit:

589L (1542136)
411L (1540851)
324L (1538895)
323L (1538510)
321L (1541558)
325L (1540543)
321L (1539466)
321L (1543310)
320L (1540047)
323L (1541890)
321L (1538828)
-------
417L (1542136)
402L (1540851)
396L (1538895)
408L (1538510)
405L (1541558)
401L (1540543)
404L (1539466)
403L (1543310)
404L (1540047)
403L (1541890)
406L (1538828)

So to summarise; in the example of smashing a HashSet, times for both 32-bit and 64-bit have been halved (obviously it's even greater speed up than that, as that is including all the HashSet functionality).

Anyway; let me know if this is the kind of thing you are interested in. I have been running the smoketest and everything seems fine (but not really 100% happy as I haven't had time to explore the code around there). Anyway I need to head off to bed now, so hopefully I have left enough information here for you to evaluate!

@msftclas

Hi @manofstick, I'm your friendly neighborhood Microsoft Pull Request Bot (You can call me MSBOT). Thanks for your contribution!

In order for us to evaluate and accept your PR, we ask that you sign a contribution license agreement. It's all electronic and will take just minutes. I promise there's no faxing. https://cla.microsoft.com.

TTYL, MSBOT;

@msftclas

@manofstick, Thanks for signing the contribution license agreement so quickly! Actual humans will now validate the agreement and then evaluate the PR.

Thanks, MSBOT;

@latkin
Contributor

latkin commented Jun 26, 2015

This looks pretty cool! We are definitely interested in perf optimizations in the runtime.

I tried syncing this locally and running the tests, and everything seems to be working fine. GenericXIntrinsic does not get inlined, so this should not affect user codegen?

@manofstick
Contributor Author

@latkin ok, well let me finish this off; I need to also do some work for the Compare functions, as well as a bit more magic around IStructure* interfaces. I'll see how my time goes; but hopefully I'll get some peace over the weekend.

Oh, and with a bit more exploring I see that the equality stuff is all around the shop, so we get benefits even in things like list<int>. An example of this is:

open System
open System.Diagnostics

// create lots of lists to try not to get cache benefits
let lists = 
    Array.init 1000000 (fun i ->
        List.init 10 (fun j ->
            if i % 2 = 0 then -j else j))

let sw = Stopwatch.StartNew ()
let r = Random 42
let mutable count = 0
for i=0 to 1000000 do
    let a = lists.[r.Next lists.Length]
    let b = lists.[r.Next lists.Length]
    count <- count + if a = b then 1 else 0

printfn "%A (%A)" sw.ElapsedMilliseconds count

64-bit original

810L (500464)
1015L (500464)
913L (500464)
902L (500464)
996L (500464)
753L (500464)
974L (500464)
762L (500464)
892L (500464)
1002L (500464)
760L (500464)

32-bit-original

1500L (500464)
1434L (500464)
1497L (500464)
1485L (500464)
1485L (500464)
1500L (500464)
1522L (500464)
1542L (500464)
1504L (500464)
1522L (500464)
1517L (500464)

64-bit modified

685L (500464)
449L (500464)
522L (500464)
431L (500464)
508L (500464)
441L (500464)
524L (500464)
420L (500464)
502L (500464)
440L (500464)
452L (500464)

32-bit-modified

743L (500464)
655L (500464)
615L (500464)
531L (500464)
485L (500464)
523L (500464)
506L (500464)
579L (500464)
530L (500464)
550L (500464)
549L (500464)

So pretty happy with that.

| r when r.Equals typeof<PartialEquivalenceRelation> ->
match typeof<'a> with
| t when t.Equals typeof<float> -> generalize (Func<_,_,_>(fun (a:float) b -> a.Equals b))
| t when t.Equals typeof<float32> -> generalize (Func<_,_,_>(fun (a:float32) b -> a.Equals b))
Contributor

I feel a little uneasy about using "a.Equals b" as the base case in all of these different type-specific implementations, both here and here. For example these implementations must exactly match the base cases implemented here https://github.com/Microsoft/visualfsharp/pull/513/files#diff-5470ddaa52d99663f8d391a77ae04f0dL1518 and here: https://github.com/Microsoft/visualfsharp/pull/513/files#diff-5470ddaa52d99663f8d391a77ae04f0dL1758

I think I'd feel more comfortable if we textually repeated (# "ceq" x y : bool #) here. For example, I currently have to think carefully "does a.Equals(b) really get optimized down to exactly a ceq instruction?". Because "a" is a value type I'm not actually totally sure it does. If we just emit a ceq then I don't need to check.
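One way to see why the choice of base case matters is NaN. The sketch below is an illustration only, not code from the PR: it compares the F# `=` operator on `float` (the PER form), `System.Double.Equals`, and `LanguagePrimitives.GenericEqualityER`. The binding names are made up for the example.

```fsharp
// Illustration only (not code from the PR): float equality under the
// notions of equality discussed in this review comment.
let x = nan

// F# (=) on float uses PER semantics: NaN is not equal to itself.
let perResult = (x = x)

// System.Double.Equals treats NaN as equal to NaN.
let dotNetResult = x.Equals x

// The ER form of F# generic equality also treats NaN as equal to itself.
let erResult = LanguagePrimitives.GenericEqualityER x x

printfn "PER (=): %b, Double.Equals: %b, ER: %b" perResult dotNetResult erResult
```

So for floating-point types, `a.Equals b` and a raw `ceq` can disagree, which is presumably the kind of mismatch a textual `ceq` in the PER branch would rule out.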

@manofstick
Contributor Author

@dsyme sure no problem. I had only avoided it as I'm not particularly comfortable with the IL voodoo syntax; but obviously a copy and paste will do the trick...

@manofstick
Contributor Author

Bit of wasted time over the weekend maybe trying to be a little bit too smart. I was trying to stop boxing of IStructuralEquality types, by using some classes dynamically created using the Reflection API, but it just didn't work. I think it is a reasonable idea, and do wonder what I did wrong, but I just don't have the time to investigate. If someone else does, then I would be appreciative. (I'm not checking it in, but the code is here; I will move on to other things that need to be done)

// Runtime template type specialization code
let getGenericSpecialializationTypeStaticFuncProperty<'typedef, 't, 'func> () =
    let t = typedefof<'typedef>.MakeGenericType [|typeof<'t>|]
    let func = t.GetProperty ("Func", Reflection.BindingFlags.Static ||| Reflection.BindingFlags.Public)
    let getter = func.GetGetMethod ()
    match getter.Invoke (null, [||]) with
    | :? 'func as f -> f
    | _ -> raise (Exception "Invalid logic")

type GenericSpecializeGetHashCode<'a when 'a : equality>() =
    static let _func = Func<_,_,_>(fun (_:IEqualityComparer) (a:'a) ->
        match box a with
        | null -> 0
        | _ -> a.GetHashCode ())
    static member Func = _func

let StructuralEquatableGetHashCode (iec:IEqualityComparer) (o:#IStructuralEquatable) =
    o.GetHashCode iec

type GenericSpecializeStructuralEquatableStructGetHashCode<'a when 'a : struct and 'a :> IStructuralEquatable>() =
    static let _func =
        Func<_,_,_>(fun (iec:IEqualityComparer) (a:'a) ->
            StructuralEquatableGetHashCode iec a)
    static member Func = _func

[<CustomEquality; NoComparison>]
type GenericSpecializeStructuralEquatableStructGetHashCodeDummy =
    struct
        interface IStructuralEquatable with
            member __.GetHashCode _ = raise (Exception "dummy class for typedefof")
            member __.Equals (_,_) = raise (Exception "dummy class for typedefof")
    end

type GenericSpecializeStructuralEquatableReferenceGetHashCode<'a when 'a :> IStructuralEquatable>() =
    static let _func =
        Func<_,_,_>(fun (iec:IEqualityComparer) (a:'a) ->
            match box a with
            | null -> 0
            | _ -> StructuralEquatableGetHashCode iec a)
    static member Func = _func

type GenericSpecializeHash<'a>() =
    static let generalize (func:Func<IEqualityComparer,'aa,int>) =
        match box func with
        | :? Func<IEqualityComparer,'a,int> as f -> f
        | _ -> raise (Exception "invalid logic")
    // this function must replicate GenericHashParamObj
    static let _func : Func<IEqualityComparer, 'a, int> =
        match typeof<'a> with
        | t when t.IsArray ->
            // I should optimize these as well, but probably not a particularly great gain.
            null
        | t when typeof<IStructuralEquatable>.IsAssignableFrom t ->
            if not t.IsValueType
                then getGenericSpecialializationTypeStaticFuncProperty<GenericSpecializeStructuralEquatableReferenceGetHashCode<_>, 'a, Func<IEqualityComparer, 'a, int>> ()
                else
                    // This throws a "System.NotSupportedException" from the GetProperty method of the generated type
                    // from getGenericSpecialializationTypeStaticFuncProperty. i.e. it appears that the MakeGenericType
                    // function works; because it doesn't throw an exception, which as I understand the documentation
                    // it should if the type doesn't meet the generic type constraints. So I return null to get
                    // the fallback functionality, but would like someone else to investigate.
                    // 
                    // getGenericSpecialializationTypeStaticFuncProperty<GenericSpecializeStructuralEquatableStructGetHashCode<GenericSpecializeStructuralEquatableStructGetHashCodeDummy>, 'a, Func<IEqualityComparer, 'a, int>> ()
                    //
                    null

        | t when t.IsSealed && not t.IsValueType ->
            getGenericSpecialializationTypeStaticFuncProperty<GenericSpecializeGetHashCode<_>, 'a, Func<IEqualityComparer, 'a, int>> ()

        // A general t.IsSealed for value types "should" work I think, but suffers from the same exception as above
        | t when t.Equals typeof<bool>       -> generalize (Func<_,_,_>(fun (_:IEqualityComparer) (a:bool)       -> a.GetHashCode()))
        | t when t.Equals typeof<sbyte>      -> generalize (Func<_,_,_>(fun (_:IEqualityComparer) (a:sbyte)      -> a.GetHashCode()))
        | t when t.Equals typeof<int16>      -> generalize (Func<_,_,_>(fun (_:IEqualityComparer) (a:int16)      -> a.GetHashCode()))
        | t when t.Equals typeof<int32>      -> generalize (Func<_,_,_>(fun (_:IEqualityComparer) (a:int32)      -> a.GetHashCode()))
        | t when t.Equals typeof<int64>      -> generalize (Func<_,_,_>(fun (_:IEqualityComparer) (a:int64)      -> a.GetHashCode()))
        | t when t.Equals typeof<byte>       -> generalize (Func<_,_,_>(fun (_:IEqualityComparer) (a:byte)       -> a.GetHashCode()))
        | t when t.Equals typeof<uint16>     -> generalize (Func<_,_,_>(fun (_:IEqualityComparer) (a:uint16)     -> a.GetHashCode()))
        | t when t.Equals typeof<uint32>     -> generalize (Func<_,_,_>(fun (_:IEqualityComparer) (a:uint32)     -> a.GetHashCode()))
        | t when t.Equals typeof<uint64>     -> generalize (Func<_,_,_>(fun (_:IEqualityComparer) (a:uint64)     -> a.GetHashCode()))
        | t when t.Equals typeof<nativeint>  -> generalize (Func<_,_,_>(fun (_:IEqualityComparer) (a:nativeint)  -> a.GetHashCode()))
        | t when t.Equals typeof<unativeint> -> generalize (Func<_,_,_>(fun (_:IEqualityComparer) (a:unativeint) -> a.GetHashCode()))
        | t when t.Equals typeof<char>       -> generalize (Func<_,_,_>(fun (_:IEqualityComparer) (a:char)       -> a.GetHashCode()))
        | t when t.Equals typeof<string>     -> generalize (Func<_,_,_>(fun (_:IEqualityComparer) (a:string)     -> a.GetHashCode()))
        | t when t.Equals typeof<decimal>    -> generalize (Func<_,_,_>(fun (_:IEqualityComparer) (a:decimal)    -> a.GetHashCode()))
        | t when t.Equals typeof<float>      -> generalize (Func<_,_,_>(fun (_:IEqualityComparer) (a:float)      -> a.GetHashCode()))
        | t when t.Equals typeof<float32>    -> generalize (Func<_,_,_>(fun (_:IEqualityComparer) (a:float32)    -> a.GetHashCode()))

        | _ -> null

    static member Func = _func

/// Intrinsic for calls to depth-unlimited structural hashing that were not optimized by static conditionals.
//
// NOTE: The compiler optimizer is aware of this function (see uses of generic_hash_inner_vref in opt.fs)
// and devirtualizes calls to it based on type "T".
let GenericHashIntrinsic x =
    match GenericSpecializeHash.Func with
    | null -> GenericHashParamObj fsEqualityComparerUnlimitedHashingPER (box x)
    | func -> func.Invoke (fsEqualityComparerUnlimitedHashingPER, x)

/// Intrinsic for calls to depth-limited structural hashing that were not optimized by static conditionals.
let LimitedGenericHashIntrinsic limit x =
    match GenericSpecializeHash.Func with
    | null -> GenericHashParamObj (CountLimitedHasherPER limit) (box x)
    | func -> func.Invoke ((CountLimitedHasherPER limit), x)

/// Intrinsic for a recursive call to structural hashing that was not optimized by static conditionals.
//
// "iec" is assumed to be either fsEqualityComparerUnlimitedHashingER, fsEqualityComparerUnlimitedHashingPER or 
// a CountLimitedHasherPER.
//
// NOTE: The compiler optimizer is aware of this function (see uses of generic_hash_withc_inner_vref in opt.fs)
// and devirtualizes calls to it based on type "T".
let GenericHashWithComparerIntrinsic<'T> (iec : System.Collections.IEqualityComparer) (x : 'T) : int =
    match GenericSpecializeHash.Func with
    | null -> GenericHashParamObj iec (box x)
    | func -> func.Invoke (iec, x)
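The core trick in the code above can be condensed into a few lines. This is a simplified sketch, not the PR's implementation: `SpecializedHash` and `hashOf` are hypothetical names, only two types get a fast path, and the fallback is plain `Object.GetHashCode` rather than `GenericHashParamObj`.

```fsharp
open System

// Hypothetical, simplified stand-in for the PR's GenericSpecializeHash<'a>:
// the delegate is chosen once per instantiation of 'a, when the static
// field is initialised, and reused on every later call.
type SpecializedHash<'a>() =
    // Reinterpret a concretely typed delegate as Func<'a,int>; this only
    // succeeds when 'a is the type that was matched on.
    static let generalize (f: obj) : Func<'a, int> =
        match f with
        | :? Func<'a, int> as g -> g
        | _ -> invalidOp "type test and delegate type disagree"

    static let func : Func<'a, int> =
        match typeof<'a> with
        | t when t = typeof<int> ->
            generalize (box (Func<int, int>(fun a -> a.GetHashCode ())))
        | t when t = typeof<string> ->
            generalize (box (Func<string, int>(fun a -> if isNull a then 0 else a.GetHashCode ())))
        | _ -> null // no fast path: the caller falls back

    static member Func = func

// Fast path when a specialised delegate exists, boxed fallback otherwise.
let hashOf (x: 'a) : int =
    match SpecializedHash<'a>.Func with
    | null ->
        match box x with
        | null -> 0
        | o -> o.GetHashCode ()
    | f -> f.Invoke x
```

The type test on `typeof<'a>` runs once per instantiation rather than once per call, which is where the saving over a per-call dispatch would be expected to come from.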

@manofstick
Contributor Author

Oh, and if anyone does (did?) get around to having a look at the code that I couldn't get working (as per last comment in this thread), I should have also added that the Type created worked fine in an isolated test project - it was only when it was going into FSharp.Core that the problem with it surfaced. Anyway; not the end of the world; but might be some fun for someone to look at if they are bored...

@manofstick
Contributor Author

@latkin, @dsyme, I assume I'm not allowed to break existing functionality (damn!), but I'm wondering if the following is OK for the generic comparer:

First of all, I want to use the generic IComparable<> interface in preference, but that is unmentioned in the existing code as far as I can see.

Now this is where I would like to make an assumption, by checking whether the object is an F# compiler-generated struct or record, and thus has compiler-provided implementations of IStructuralComparable, IComparable and the generic version. If this is true, then I have a sealed type which has equivalent versions of comparison, so it shouldn't matter which version I take (this is only for the top level, so no comparer is being passed through the IStructuralComparable call).

A second case would be where only the generic version of the interface exists on an object (this was true for the NodaTime objects I believe, although from memory I think this meant that the F# compiler didn't believe they were comparable; but I'm assuming this is not the case).

(Obviously I would prefer to just always take the generic IComparable<>, but I assume this is unacceptable due to it possibly changing someone's probably buggy runtime code...)

@dsyme
Contributor

dsyme commented Jun 30, 2015

Yes, for now I think that's a great approach.

It's possible that we would ultimately be willing to take a change along the lines of "As of F# V.v and FSharp.Core X.X.X.X, the generic comparison logic will compare types defined in freshly compiled code which implement IComparable<T> or IStructuralComparable<T> via the generic interface rather than the non-generic interface IComparable".

If we backed that up by performance figures showing the reduction in boxing then it could well seem very acceptable. However that should definitely be a separate PR to the core work you're doing to improve performance without changing the preference order for comparison.
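For readers following the boxing argument, here is a rough sketch of the two call shapes being compared. It is not code from the PR; `compareGeneric` and `compareNonGeneric` are hypothetical names, and the dates are arbitrary.

```fsharp
open System

// Generic interface: the argument keeps its static type, so a value type
// such as DateTime does not need to be boxed to make the call.
let compareGeneric<'T when 'T :> IComparable<'T>> (x: 'T) (y: 'T) : int =
    x.CompareTo y

// Non-generic interface: receiver and argument both travel as objects,
// so value types are boxed on the way in.
let compareNonGeneric (x: IComparable) (y: obj) : int =
    x.CompareTo y

let today = DateTime (2015, 6, 30)
let tomorrow = today.AddDays 1.0

let viaGeneric = compareGeneric today tomorrow
let viaNonGeneric = compareNonGeneric (today :> IComparable) (box tomorrow)

// Both agree that today sorts before tomorrow.
printfn "%d %d" (sign viaGeneric) (sign viaNonGeneric)
```

For a well-behaved type the two routes give the same ordering; the concern raised in the thread is only about types whose two implementations disagree.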

@manofstick
Contributor Author

Some more performance details...

In Rounding out Visual F# 4.0 in VS 2015 RC there is a Tortoise (misspelled!) example under the "Optimized non-structural comparison operators" section. Ignoring the Hare, and just running the Tortoise on FSharp.Core.dll changes:

Code

module Tortoise = 
    let test () =
        let today = DateTime.Now
        let tomorrow = today.AddDays 1.0
        let mutable result = 0
        for i = 1 to 10000000 do
            result <- result + if today = tomorrow then 1 else 0

Results

32 bit - original
Real: 00:00:01.143, CPU: 00:00:01.138, GC gen0: 200, gen1: 2, gen2: 0

32 bit - modified
Real: 00:00:00.403, CPU: 00:00:00.405, GC gen0: 0, gen1: 0, gen2: 0

64 bit - original
Real: 00:00:00.690, CPU: 00:00:00.686, GC gen0: 305, gen1: 0, gen2: 0

64 bit - modified
Real: 00:00:00.193, CPU: 00:00:00.202, GC gen0: 0, gen1: 0, gen2: 0

@dsyme
Contributor

dsyme commented Jul 2, 2015

@manofstick Those performance results are fantastic - especially the complete elimination of GC. Still not quite the Hare (so I'm glad we added NonStructuralComparison), but the changes definitely make the default cases much faster.

@manofstick
Contributor Author

@dsyme ,

Here is some badness for you (in current FSharp.Core), my implementation currently doesn't match this, but I will "fix" it so that it does (I'm currently matching the non-generic version in the generic version).

Badness

open System.Collections

type A = { A : string }
type B<'a> = { B : 'a }

[<EntryPoint>]
let main argv =
    let a1, a2 = { A = "Hello"}, { A = "HELLO" }
    let b1, b2 = { B = "Hello"}, { B = "HELLO" }

    let compareStringWithUpper=
        let toupper (s:string) = s.ToUpper ()
        { new IEqualityComparer with
            member this.GetHashCode item =
                match item with
                | :? string as s -> (toupper s).GetHashCode ()
                | _ -> failwith "Not in this example..."
            member this.Equals (lhs, rhs) =
                match lhs, rhs with
                | (:? string as s1), (:? string as s2) -> (toupper s1).Equals(toupper s2)
                | _ -> failwith "Not in this example..." }

    let a_is_good = (a1 :> IStructuralEquatable).Equals(a2, compareStringWithUpper)
    let b_is_good = (b1 :> IStructuralEquatable).Equals(b2, compareStringWithUpper)

    printfn "A is %s" <| if a_is_good then "Good" else "Bad"
    printfn "B is %s" <| if b_is_good then "Good" else "Bad"

    if a_is_good <> b_is_good then
        printfn "But worse than those results are that they are inconsistent!"

    0 // an [<EntryPoint>] function must return an int exit code

With the output

A is Bad
B is Good
But worse than those results are that they are inconsistent!

So "at some stage", I think the implementation of IStructuralEquatable.Equals shouldn't use inlined IL, but rather always defer back to the supplied IEqualityComparer. Obviously this will make performance much worse, but after this PR is complete, hopefully it won't be too onerous.

@latkin
Contributor

latkin commented Jul 2, 2015

These results are really impressive, unlike my ability to spell tortoise. 😊 (now fixed)

@manofstick
Contributor Author

@latkin ; @dsyme

A few questions:

  • Where should items like the existing FSharp.Core weirdness go? It doesn't really affect this case; I'll just try and duplicate the weirdness. I'm happy for you to store it on your own internal backlog; I'm obviously not in a position to determine a change in existing functionality...
  • I have some other ideas for optimizations around the Seq module (and who knows where my path takes me...). Do you want all optimizations just lumped into this pull request, or shall I isolate them? How small/large should they be? (i.e. I probably could have just split this task into 3 - hash, equals & compare - would that have been better? - still possible I guess...)
  • Because I couldn't get the generic based static class code method to work (for unknown reasons), I'm using some dynamic code generation techniques (emit via LINQ expressions). Are you happy (well...) with that? If not, I'm not sure what else can be done...
  • I'm ignorant about configurations - portable7 seems to be troublesome. I'm just liberally throwing #if FX_ATLEAST_40 around the shop.

...that'll do for now...

@manofstick
Contributor Author

I have tried to follow the "first do no harm" principle as much as possible (although there will be a slight cost of construction; but minimal) but where I have failed is actually where I wanted to succeed the most; with value types. Sigh.

Now this is not a new problem (I have an email conversation dated 15/6/2013 to Don about it) but I have made it a bit worse. The fact that it was already a problem though may mean that it can be ignored?

So what is this problem? It's value types that are > 64 bits, tailcalls and the 64-bit JIT.

So some code:

open System
open System.Collections.Generic
open System.Diagnostics

type TheStruct = 
    struct
        val A : int64
#if BIG
        val B : int64
        new (a, b) = { A=a; B=b }
#else
        new (a) = { A=a }
#endif
    end

type Container<'a> =
    struct
        val Item : 'a
        new (i) = { Item = i }
    end

let runTest createRandomObj =
    let set = HashSet ()
    let mutable count = 0
    let sw = Stopwatch.StartNew()
    for i = 0 to 200000 do
        set.Add (createRandomObj ()) |> ignore
        for i = 0 to 10 do
            count <- count + if set.Contains (createRandomObj ()) then 1 else 0

    printfn "    %A (%A)" sw.ElapsedMilliseconds count

[<EntryPoint>]
let main argv =
    let create = 
        let r = Random 314159265
        let n x = int64 (r.Next x)
        fun () ->
#if BIG
            Container (TheStruct (n 16, n 16))
#else
            Container (TheStruct (n 256))
#endif

    for i = 1 to 10 do
        runTest create
    0

Now if this runs without BIG set, then the following times are obtained (results are in the format "time (hit count)", where the hit counts should be roughly equal across comparable runs):

513L (2197320)
235L (2197168)
241L (2197198)
231L (2197114)
166L (2197153)
168L (2197158)
166L (2197361)
168L (2197090)
167L (2197199)
167L (2197400)

But with BIG set, I get the following ~20 times slower (it should be a little slower, as we have two fields now rather than 1, but nowhere near this...):

4232L (2197116)
4058L (2197293)
4075L (2197057)
4055L (2197259)
4057L (2197364)
4067L (2197354)
4060L (2197468)
4067L (2197161)
4066L (2196999)
4058L (2197368)

Now if I turn "generate tail calls" off in this app I get:

3363L (2197116)
2854L (2197293)
2870L (2197057)
2891L (2197259)
2863L (2197364)
2858L (2197354)
2880L (2197468)
2912L (2197161)
2862L (2196999)
2934L (2197368)

BUT, if I also do a build of FSharp.Core with "generate tail calls" off then I get:

430L (2197116)
243L (2197293)
258L (2197057)
241L (2197259)
237L (2197364)
237L (2197354)
237L (2197468)
238L (2197161)
237L (2196999)
237L (2197368)

Which is actually the result that I desire.

Now I can force this by adding a "not(not(equals))" and a "((hashcode) * -1) * -1", which is what I might do following this comment, but it is hardly desirable. So what do you think of maybe adding an attribute such as an AvoidTailCallAttribute or something like that, so methods can selectively opt out of the IL tailcall instruction being added?

@dsyme
Contributor

dsyme commented Jul 7, 2015

@manofstick Re your questions here

  1. The inconsistency in the IStructuralEquatable.Equals implementation is not, AFAIK, intentional. It will be tricky to fix without a more direct repro - but I think you may have provided one of these in a separate issue report (assuming it's the same underlying issue). Even then it may be hard for us to fix the issue for stability reasons, but I still appreciate your reporting it.
  2. You should definitely isolate optimizations into separate PRs where possible

3+4. Using some code generation here is OK for the .NET 4.x profiles. Very few other platforms or PCLs will have codegen abilities I suppose. Using #if FX_ATLEAST_40 is OK for now. We can isolate a specific new define later.

@dsyme
Contributor

dsyme commented Jul 7, 2015

@manofstick - avoiding the tailcall looks reasonable if that's what the old code did and it affects perf. I don't mind using a hack for the purposes of preparing this PR (which feels still a bit of distance from finalized) but we should find a better solution. If you can abstract it into a (private) avoidTailcall function then that would be helpful as we can test that separately.
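A helper of the kind suggested here might look roughly like the following. Everything in it is hypothetical: the names `avoidTailCallBool` and `avoidTailCallInt`, and the two wrapper functions, are invented for illustration and simply package the "not(not(...))" and "* -1 * -1" hacks mentioned earlier in the thread.

```fsharp
open System.Collections

// Hypothetical helpers: each consumes the result of the preceding call, so
// at source level that call is no longer the last thing the enclosing
// function does. Both are identities: not (not b) = b, and
// (h * -1) * -1 = h, including for Int32.MinValue under wrapping arithmetic.
let avoidTailCallBool (b: bool) : bool = not (not b)
let avoidTailCallInt (h: int) : int = (h * -1) * -1

// The comparer call is followed by more work instead of being returned directly.
let equalsWithComparer (iec: IEqualityComparer) (x: obj) (y: obj) : bool =
    avoidTailCallBool (iec.Equals (x, y))

let hashWithComparer (iec: IEqualityComparer) (x: obj) : int =
    avoidTailCallInt (iec.GetHashCode x)
```

Whether the compiler keeps these operations, and therefore whether the `.tail` prefix is really suppressed on the comparer call, is an assumption that would need checking against the emitted IL; keeping the trick in one private function at least gives a single place to test that.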

@manofstick
Contributor Author

@dsyme

RE: IStructuralEquatable.Equals, not sure if I had created a proper issue, so it is now created as #527.

RE: About this PR being finalised; yeah I still have to attack Compare properly. But I'm starting to feel quite burnt out between work, kids and trying to get this over the line, so I might need a little break for a while. But, I think that it should be pretty good for GetHashCode and Equals in its current state.

(And when I finally do finish this PR, I have 5 more ideas that I want to try out - I think I need a clone :-)

@latkin
Contributor

latkin commented Jul 9, 2015

Just a data point to consider - the increase in generic specialization added by this change leads to fairly significant growth in the size of FSharp.Core native images (and working set of client apps after JIT).

| FSharp.Core.ni.dll | Before | After |
|---|---|---|
| 32 bit | 6.64 MB | 7.32 MB (+10.2%) |
| 64 bit | 8.77 MB | 9.72 MB (+10.7%) |

@manofstick
Contributor Author

@latkin

That's great! I'll be able to claim that I wrote 10% of FSharp.Core :-)

Hmmm... But anyway!

What tools are there to analyse the native images? 1Mb of GetHashCodes/Equals sounds like it's gone a bit haywire?

I can try and play around a bit with how the code is laid out; I have more static classes than I need; possibly clumping them might be better?

But, besides running "ngen" once or twice, I'm not familiar at all with native images. I'm not even sure where they live. So any assistance in analysis would be helpful.

@latkin
Contributor

latkin commented Jul 10, 2015

@manofstick just run src\update.cmd <release|debug> -ngen from an admin prompt to refresh the GAC/NGEN with your latest-built stuff. Native images are stored as FSharp.Core.ni.dll under C:\windows\assembly\..., your private-built guys will be the copies with the newest timestamps.

There is nothing haywire, the increase in native code size is expected, which is why I wanted to see the numbers. The managed assembly has just one generic copy of the code, but at the native code level, JIT or AOT compilers need to produce separate native copies for every generic instantiation (ref types share 1 copy, at least). http://joeduffyblog.com/2011/10/23/on-generics-and-some-of-the-associated-overheads/ is a pretty good description.

@manofstick
Contributor Author

@latkin

OK; I've clawed back about a quarter of the space used, and when I get some more time I'll claw back some more.

My previous issue in regards to using reflection on types created in FSharp.Core appears worse than I originally thought, as it also appears to be an issue with non-generic types with generic methods. I didn't actually try just straight non-generic types with non-generic methods; but I probably should - hopefully time will allow at some stage. Weird anyway.

@manofstick
Contributor Author

@latkin

OK, some good news; although I can't get GetMethod to work on classes as per my previous whingeing, I found that Delegate.CreateDelegate has an overload which takes a string method name (and somehow that manages to work), and so I have been able to leverage off that to achieve what I wanted to do from the start. This means that I can deprecate the Linq Expression compiled code.
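As a rough illustration of that approach (not the PR's actual code), the sketch below closes a generic type at run time with `MakeGenericType` and binds one of its static methods by name with the `Delegate.CreateDelegate (Type, Type, String)` overload. `Hasher` and `makeHasher` are hypothetical names, and the method body is deliberately trivial.

```fsharp
open System

// Hypothetical static holder, standing in for the PR's specialised classes.
[<AbstractClass; Sealed>]
type Hasher<'a> =
    static member Hash (x: 'a) : int =
        match box x with
        | null -> 0
        | o -> o.GetHashCode ()

// Close the generic type at run time, then bind the static method by its
// string name; no GetMethod or GetProperty call is involved.
let makeHasher<'a> () : Func<'a, int> =
    let closed = typedefof<Hasher<_>>.MakeGenericType [| typeof<'a> |]
    Delegate.CreateDelegate (typeof<Func<'a, int>>, closed, "Hash") :?> Func<'a, int>

let hashInt = makeHasher<int> ()
printfn "%d" (hashInt.Invoke 42)
```

The resulting delegate can be cached in a static field per type, in the same way as the `Func` properties shown earlier in the thread.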

BUT! The portable47 profile doesn't have this overload, which is a bit of a bummer. I have just used the FX_ATLEAST_40 compilation symbol to avoid it. What I'm possibly thinking is that maybe I just wrap all the changes in a conditional compile, and for the portable profiles we just return the slow, original, methods. Not fantastic; but maintaining the LINQ expression version as well as this version just seems to be a lot of pain for not much gain.

Thoughts?

@manofstick
Contributor Author

@latkin

Getting closer, I now just need to focus on Compare and then adding a good test suite. I haven't looked at how your tests are structured yet. Any hints as to how to approach?

(Oh, and the "new" methodology using Delegate.CreateDelegate has brought vast improvements to the 32 bit code. In some of my little test runs it has halved the time, matching and even beating the 64 bit impl in some cases.)

@dsyme
Contributor

dsyme commented Feb 12, 2016

Per this comment I'm requesting that this PR be reverted until the code in prim-types.fs has been properly reviewed. Thanks

"55,897 additions, 44,427 deletions not shown because the diff is too large. Please use a local Git client to view these changes"

@dsyme dsyme reopened this Feb 12, 2016
@msftclas

Hi @manofstick, I'm your friendly neighborhood Microsoft Pull Request Bot (You can call me MSBOT). Thanks for your contribution!
You've already signed the contribution license agreement. Thanks!

The agreement was validated by Microsoft and real humans are currently evaluating your PR.

TTYL, MSBOT;

@manofstick
Contributor Author

No longer makes sense after #966
