Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a MaxBuffer for Suggestions #7060

Merged
merged 4 commits into from
Jun 28, 2019
Merged

Use a MaxBuffer for Suggestions #7060

merged 4 commits into from
Jun 28, 2019

Conversation

forki
Copy link
Contributor

@forki forki commented Jun 25, 2019

This is using a constant lengeth array for Spelling Suggestions as suggested by @gerardtoconnor in #6049 (comment)

related: ionide/FsAutoComplete#427

@forki forki force-pushed the maxbuffer branch 2 times, most recently from 3b42f78 to c5cc18e Compare June 25, 2019 09:27
@gerardtoconnor
Copy link

gerardtoconnor commented Jun 25, 2019

lol, fair play @forki , got tired waiting so integrated it all yourself!

A few minor things for discussion:

  1. I called a Maxbuffer and not a PriorityQueue because PQ are usually binary heaps of all items that need to be extracted in Priority order, while this saves memory & speed by only keeping a small buffer of max entries we want to keep, hence the strange name MaxBuffer, as it also uses insertion sort over binary heap.
  2. Hashsets vs Sequences … Hashsets are good performance wise for checking distinct as you know, Array.distinct is using one under the covers (as Dict/Hashset) so what we are gaining in speed for all the disregarded checks, we are loosing a lot with use of sequences, when we are not really working with sequences, we are dealing with fixed arrays. We can do slight modify to Maxbuffer to make it a distinct collection (checking values if keys match)
  3. Filter Predictions can (& should) accept the MaxBuffer and iterate it, once we've tweaked to be distinct at compilation time, will save the extra array alloc & map operations.
  4. MaxBuffer in a way represents a subset of a greater set representing the maximum values in that set, if we want to MaxBuffer all the things, we should be divide and conquering, so every collection/seq is Maxbuffered at source, like a folded state, with all these suggestion paths returning a max buffer that can then be Unioned (unioning local max sets for global max set) with another such that we get one final max buffer to feed to Filter Predictions … this would involve calculation of distance locally though so not sure how we make this available?

Happy to discuss further but if you are happy with these modifications, I can tweak my original MaxBuffer design to include:

  1. Distinct check (key & Value)
  2. Enumerable interface for Filter Predictions
  3. Union operation to union max buffers (pending we figure out local distance calc)

Thanks again for getting this worked in!

@forki
Copy link
Contributor Author

forki commented Jun 25, 2019

that sounds good. Let's wait for 1h and see if we broke something so far

@forki forki changed the title Use a MaxBuffer for Suggestions [WIP] Use a MaxBuffer for Suggestions Jun 25, 2019
@gerardtoconnor
Copy link

Yeah, makes sense to see if your on the right track.

Looking through code, id text is available in most places where we create suggestions so we should be able to do MaxBuffer suggestion calculation on site and avoid having to return/combine large sequences by just sequentially folding in each seq of suggestions into the one buffer … afterwards we can even backtrack the seq generations to ensure optimised enumerable (iterating & mapping backing data structure without interim structures)

@gerardtoconnor
Copy link

A gift ... this makes Maxbuffer distinct and enumerable for value (seq value)

open System.Collections
open System.Collections.Generic

type internal MaxBufferValueEnumerator<'k,'V>(data: KeyValuePair<'k,'V> []) =
    let mutable current = data.Length
    interface IEnumerator<'V> with
        member x.Current with get () = 
            let kvpr = & data.[current]
            kvpr.Value
    interface System.Collections.IEnumerator with
        member x.Current with get () = box data.[current].Value
        member x.MoveNext() = 
            current <- current - 1
            current >= 0
        member x.Reset () = current <- data.Length
    interface System.IDisposable with
        member x.Dispose () = ()     

type MaxBuffer<'K,'V when 'K : comparison and 'V : equality>(max:int) = 
    let data = Array.zeroCreate<KeyValuePair<'K,'V>>(max)
    let rec backup i j =
        if i < j  then
            data.[i] <- data.[i + 1]
            backup (i + 1) j        
    let rec find i k v =
        let kvpr : byref<_> = &data.[i] //only want key (not KVP)
        let kr = kvpr.Key
        if k < kr then 
            if i = 0 then false
            else 
                backup 0 (i-1)
                data.[i-1] <- System.Collections.Generic.KeyValuePair<_,_>(k,v)
                true             
        else
            if k = kr then // add distinct
                if kvpr.Value = v then false // exact match so no insertion needed
                else 
                    backup 0 i
                    data.[i] <- System.Collections.Generic.KeyValuePair<_,_>(k,v)
                    true
            elif i = max - 1 then // end of the line, replace at head
                backup 0 i
                data.[i] <- System.Collections.Generic.KeyValuePair<_,_>(k,v)
                true
            else find (i + 1) k v
    member x.Insert (k,v) = find 0 k v
    //member x.Result () = data
    interface IEnumerable<'V> with
        member x.GetEnumerator () = new MaxBufferValueEnumerator<'k,'V>(data) :> IEnumerator<'V>
    interface System.Collections.IEnumerable with
        member x.GetEnumerator () = new MaxBufferValueEnumerator<'k,'V>(data) :> IEnumerator   

@forki
Copy link
Contributor Author

forki commented Jun 25, 2019

@gerardtoconnor I had to change the buffer, since PBTs said it did not work

@forki forki force-pushed the maxbuffer branch 2 times, most recently from 98401f2 to 03a2e44 Compare June 25, 2019 16:39
@gerardtoconnor
Copy link

gerardtoconnor commented Jun 25, 2019

PBTs

sorry, what are PBTs again? No issue with it being moved anyway.

I notice on the latest commit you have changed/simplified the algo a bit, although I agree it is better to have largest value first in the data array, I think with your implementation it is checking from the head (largest) down each time... as we expect most entries to not make it into the max data array, we check at the end of the queue as its a quick short-circuit most of the time when we expect max < (suggestions / 2).

Curious also, your new elements counter smells a little? the point of element count was/is so we can partially fill our array that is max in length if there are fewer Inserts then max, or a lot of duplicates that result in distinct values being less then maxElements requested, your's seems to increment every time an insertion is attempted, regardless of it being in data array, meaning it is far bigger then max data array? … i also realise I left this out of my enumerator implementation so is a bug in what I provided

@forki
Copy link
Contributor Author

forki commented Jun 25, 2019

PBTs = Property based tests
It just returned nulls and stuff. Now the current version is not the fastest for now, but seem to be slightly more correct

@gerardtoconnor
Copy link

gerardtoconnor commented Jun 25, 2019

Yeah that makes sense, by not using elements as threshold on partially filled array, nulls can creep in, If you like I can tweak original, using element bounds enumerator will be correct and can be used instead of .Values() method as MaxBuffer can be interpreted as distinct seq<'V>?

@forki
Copy link
Contributor Author

forki commented Jun 25, 2019

@gerardtoconnor let me finish with the current version, get it green and then we improve the buffer from there. Ok?

@forki forki force-pushed the maxbuffer branch 4 times, most recently from cb7375c to 76dd871 Compare June 25, 2019 18:45
src/fsharp/CompileOps.fs Outdated Show resolved Hide resolved
@forki forki force-pushed the maxbuffer branch 5 times, most recently from b37636e to 534d44a Compare June 26, 2019 06:33
src/fsharp/CompileOps.fs Outdated Show resolved Hide resolved
src/fsharp/NameResolution.fs Outdated Show resolved Hide resolved
src/fsharp/TypeChecker.fs Outdated Show resolved Hide resolved
src/fsharp/NameResolution.fs Outdated Show resolved Hide resolved
src/fsharp/NameResolution.fs Outdated Show resolved Hide resolved
src/fsharp/NameResolution.fs Outdated Show resolved Hide resolved
src/fsharp/NameResolution.fs Outdated Show resolved Hide resolved
src/fsharp/NameResolution.fs Outdated Show resolved Hide resolved
src/fsharp/NameResolution.fs Outdated Show resolved Hide resolved
src/fsharp/NameResolution.fs Outdated Show resolved Hide resolved
src/fsharp/NameResolution.fs Outdated Show resolved Hide resolved
src/fsharp/NameResolution.fs Outdated Show resolved Hide resolved
Copy link
Contributor

@TIHan TIHan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this is fantastic. Thank you both for doing the work and spending time analyzing the performance. It seems to be a very significant improvement and am very much in favor.

Just a few things before we merge:

  • A few minor format nits.
  • Duplicate code bit.

@forki
Copy link
Contributor Author

forki commented Jun 28, 2019

applied all feedback. thx

@TIHan
Copy link
Contributor

TIHan commented Jun 28, 2019

Very nice work.

@TIHan TIHan merged commit 5a8f454 into dotnet:master Jun 28, 2019
@baronfel
Copy link
Member

@forki I'll try to get this integrated and released in a new FCS version this weekend so the change can be integrated into FSAC/ionide.

@forki
Copy link
Contributor Author

forki commented Jun 29, 2019

@baronfel awesome. That would be great

nosami pushed a commit to xamarin/visualfsharp that referenced this pull request Jan 26, 2022
* Use MaxBuffer for Suggestions

* Extract SuggestNames function

* Rename Parameter

* Apply feedback for formatting
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants