Mitigate #6044 and don't call suggestions for small identifiers #6049
Conversation
ccdeb58 to e8a793a
What's the current trick to run the ci_part2 tests locally? I basically want to accept all the changes as a new baseline. |
Build.cmd release ci_part2 |
I wonder if some memoization would make sense. Let's say the user types the same identifier more than once, as does happen. Each time it would trigger the same sequence of computations. |
@AviAvni can you please run it for me and PR my branch? Would really appreciate that, because I can't get it running. |
@majocha, I think that's a good idea. In fact, memoization of, let's say, the last 100 or so used identifiers (FIFO order) could be (very) beneficial, since most people will do a lot of editing in one file or function, and so will likely re-type certain identifiers multiple times. |
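A minimal sketch of what such a bounded memoization cache could look like, purely as an illustration; `SuggestionCache`, `computeSuggestions` and the capacity of 100 are all invented here, nothing like this exists in the codebase:

    open System.Collections.Generic

    /// FIFO-bounded memoization: remembers results for the last `capacity`
    /// distinct identifiers and evicts the oldest entry when the cache is full.
    type SuggestionCache<'T>(capacity: int, compute: string -> 'T) =
        let results = Dictionary<string, 'T>()
        let order = Queue<string>()
        member x.Get(ident: string) =
            match results.TryGetValue ident with
            | true, cached -> cached
            | false, _ ->
                if order.Count >= capacity then
                    // evict the identifier that entered the cache first
                    results.Remove(order.Dequeue()) |> ignore
                order.Enqueue ident
                let computed = compute ident
                results.[ident] <- computed
                computed

    // hypothetical usage: computeSuggestions stands in for the real suggestion logic
    let computeSuggestions (ident: string) = [ ident + "Async" ]
    let cache = SuggestionCache(100, computeSuggestions)
    cache.Get "pritnfn" |> ignore   // computed
    cache.Get "pritnfn" |> ignore   // served from the cache

Whether the hit rate in real editing sessions justifies the extra bookkeeping is exactly the question raised in the next comment.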
I don't see how memoization would help here. It's always different identifiers and suggestions. |
It's green |
I'm not really convinced this will solve the problem:
- It won't show suggestions for mistyped identifiers where the actual fix is <4 chars, and barring a statistical analysis of multiple codebases and what suggestions get generated, there's no telling what magic number "feels right"
- It will still incur the cost of going through the full list (and generating that list) no matter the length of the mistyped identifier
- Tying the tooling to error handling introduces coupling that is difficult to maintain or evolve
Fundamentally, this feature is simply not built with IDE tooling in mind. It's great for batch compile jobs/command-line builds, but for a long-running process like a language server it's simply calculating and re-calculating too many things. The feature needs two "modes":
- The current one, for batch compile jobs
- An out-of-process spellchecker routine that uses a tree (such as Roslyn's, which uses a BKTree: http://source.roslyn.io/#Microsoft.CodeAnalysis.Workspaces/Utilities/SpellChecker.cs) to traverse a populated set of symbols independently of the in-proc work (a bare-bones sketch of a BK-tree follows this comment)
The latter is not something that FCS can do yet. We're going to pursue this sort of architecture in the future, but not right now. |
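For illustration only, here is a bare-bones BK-tree over plain Levenshtein distance, the kind of structure the linked Roslyn SpellChecker is built around; this is not Roslyn's code and not anything FCS ships, just a sketch of the data structure:

    open System.Collections.Generic

    // plain Levenshtein distance, two-row dynamic programming
    let levenshtein (a: string) (b: string) =
        let mutable prev = Array.init (b.Length + 1) id
        let mutable curr = Array.zeroCreate (b.Length + 1)
        for i in 1 .. a.Length do
            curr.[0] <- i
            for j in 1 .. b.Length do
                let cost = if a.[i - 1] = b.[j - 1] then 0 else 1
                curr.[j] <- min (min (curr.[j - 1] + 1) (prev.[j] + 1)) (prev.[j - 1] + cost)
            let tmp = prev
            prev <- curr
            curr <- tmp
        prev.[b.Length]

    /// BK-tree node: a word plus children indexed by their distance to that word.
    type BkNode = { Word: string; Children: Dictionary<int, BkNode> }
    let newNode w = { Word = w; Children = Dictionary() }

    let rec insert (node: BkNode) (word: string) =
        let d = levenshtein node.Word word
        if d > 0 then
            match node.Children.TryGetValue d with
            | true, child -> insert child word
            | false, _ -> node.Children.[d] <- newNode word

    /// All words within maxDist of the query; the triangle inequality lets us
    /// skip every subtree whose distance label is too far from d.
    let rec search (node: BkNode) (query: string) (maxDist: int) = seq {
        let d = levenshtein node.Word query
        if d <= maxDist then yield node.Word, d
        for KeyValue(label, child) in node.Children do
            if abs (label - d) <= maxDist then
                yield! search child query maxDist }

    // hypothetical usage
    let root = newNode "printfn"
    for w in [ "sprintf"; "printf"; "print"; "eprintfn" ] do insert root w
    let matches = search root "pritnfn" 2 |> Seq.toList

The point of the structure is that one distance computation can rule out a whole subtree of symbols, instead of scoring every candidate on every keystroke.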
It's not a magic number. I think it's the one from the original paper.
|
If you're thresholding, you can predict a cutoff based on the first few chars, which may be helpful. If you're willing to do that, you can build a lookup table ahead of time with potential matches based on the first N characters. |
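Roughly, the lookup-table idea could look like the following sketch; the two-character bucket width and the candidate list are assumptions for illustration, and a typo inside those first characters would of course miss the right bucket entirely:

    open System
    open System.Collections.Generic

    // build once per name-resolution environment, not per keystroke
    let buildPrefixTable prefixLen (candidates: string seq) =
        let table = Dictionary<string, ResizeArray<string>>(StringComparer.OrdinalIgnoreCase)
        for c in candidates do
            let key = c.Substring(0, min prefixLen c.Length)
            match table.TryGetValue key with
            | true, bucket -> bucket.Add c
            | false, _ -> table.[key] <- ResizeArray [ c ]
        table

    // at lookup time only the bucket sharing the typo's first characters is scanned
    let lookup (table: Dictionary<string, ResizeArray<string>>) prefixLen (typo: string) =
        let key = typo.Substring(0, min prefixLen typo.Length)
        match table.TryGetValue key with
        | true, bucket -> List.ofSeq bucket
        | false, _ -> []

    // hypothetical usage
    let table = buildPrefixTable 2 [ "printfn"; "printf"; "sprintf"; "pown" ]
    let candidatesToScore = lookup table 2 "pritnfn"   // only the "pr" bucket is considered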
About switching it off, while it would be good to have a setting, I think it's great in the CS IDE. If you type a partial word, it gives you completion items that are more useful than only when matching the start of a string. Though this should run in its own thread and not block other processes, if it's fast enough, good, otherwise, it shouldn't degrade typing speed and experience. I'm wondering why it's not a good idea to keep a dictionary of uppercased identifiers, after all, the list of in scope identifiers only occasionally changes (which itself can be done asynchronously). |
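A very rough sketch of that last point, with every name invented for the example (there is no `currentScopeIdentifiers` in FCS): the uppercased map is rebuilt off the main thread and published with a single reference write, so lookups never wait on the rebuild:

    open System.Collections.Generic

    // currently published map: uppercased identifier -> original spellings
    let mutable upperCasedIdents : IDictionary<string, string list> = dict []

    // hypothetical producer of the identifiers valid at the current location
    let currentScopeIdentifiers () = [ "printfn"; "sprintf"; "pown" ]

    /// Rebuild the map in the background whenever the scope is known to have changed.
    let refreshIdentsAsync () =
        async {
            let fresh =
                currentScopeIdentifiers ()
                |> List.groupBy (fun ident -> ident.ToUpperInvariant())
                |> dict
            upperCasedIdents <- fresh   // single reference swap; readers see old or new, never a partial map
        } |> Async.Start

As the next two comments point out, the hard part is not the dictionary itself but knowing when the name-resolution environment it was built from has changed.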
Another idea: if you can shorten the length of the strings you're comparing against, it will reduce the time by a large amount. If you knew the string lengths ahead of time, I think you could use static, reusable fixed-length arrays. I'd like to help more, but I don't understand what's being compared in the code well enough to contribute anything deeper. |
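One concrete reading of that, and only a guess at what was meant: for an edit-distance style metric, the difference in string lengths is already a lower bound on the distance, so candidates can be discarded on length alone before any per-character work is done. Whether the same bound is usable for the JaroWinkler score the compiler actually uses is a separate question.

    // |a.Length - b.Length| can never be smaller than the edit distance between
    // a and b, so anything outside the window is rejected without comparing chars
    let withinLength maxEdits (typo: string) (candidate: string) =
        abs (typo.Length - candidate.Length) <= maxEdits

    let prefilter maxEdits typo candidates =
        candidates |> Seq.filter (withinLength maxEdits typo)

    // hypothetical usage: the long candidate is rejected by length alone
    let survivors =
        prefilter 2 "pritnfn" [ "printfn"; "sprintf"; "System.Console.WriteLine" ]
        |> Seq.toList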
@abelbraaksma we only suggest identifiers that are valid at the current location. So the dictionary would have to track the position as well - and how would we invalidate it when the name-resolution environment at that point changes? |
@forki, that makes sense. Though the list of local-scope identifiers is almost always small. The list of globally available identifiers - whether in the current scope, truly global, or in scope of the current file, module or class - changes much less frequently, if only because most users can't type in different files or classes at the same time, and switching that scope, even at a fast editing pace, often takes seconds. That time could be used to build up the locally scoped library of identifiers. Make that a kind of dictionary with a key that includes the scope, and there's a big chance of not having to convert thousands of identifiers again and again. I don't know if it's easy or hard, but it should take some strain off the GC and the CPU. But maybe I'm too optimistic, or maybe the effort isn't worth it. |
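Purely to illustrate the shape of such a cache; the scope key and the version stamp are invented for this sketch, FCS exposes no such notion:

    open System.Collections.Generic

    // invented key: enough information to tell two name-resolution environments apart
    type ScopeKey =
        { File: string
          EnclosingEntity: string   // module / class / function the cursor is in
          EnvStamp: int }           // bumped whenever that environment is recomputed

    let identifierCache = Dictionary<ScopeKey, string[]>()

    /// Return the cached identifier set for a scope, recomputing it only when the
    /// stamp (and therefore the environment) has changed since the last request.
    let identifiersForScope (key: ScopeKey) (compute: unit -> string[]) =
        match identifierCache.TryGetValue key with
        | true, cached -> cached
        | false, _ ->
            // drop stale entries for the same location but an older stamp
            identifierCache.Keys
            |> Seq.filter (fun k -> k.File = key.File && k.EnclosingEntity = key.EnclosingEntity)
            |> Seq.toList
            |> List.iter (fun k -> identifierCache.Remove k |> ignore)
            let fresh = compute ()
            identifierCache.[key] <- fresh
            fresh

The design question flagged in the previous comment remains: who bumps the stamp, and when, is exactly the invalidation problem that makes this hard in the current architecture.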
I would say it's pretty hard in the current architecture ;-) |
I was looking at this a while back and came up with a few possibilities for improving perf:
1. The suggestions are a function that lazily loads a large seq of collected/filtered strings, into hashsets, into more Seq ops, which then tests everything into an array and sorts it, picking the best 5 or so matches. If we use a specialised priority queue (a 5-element priority insertion sort), the producers of results can test matches as they are created, so they can be added to the priority queue (or dropped if not in the top 5). This would change the suggestion Seqs into consumer pipelines and allow a much smaller memory footprint.
2. With Levenshtein distance it is possible to do incremental calculations on subsequent char additions using the matrix. The radius overlap in JaroWinkler makes it a little more complicated, but it should be possible to generate a char lookup tree from the scoped suggestions (each char key has a list of (pos, word)), such that as a char of the current word (env) comes in (and is checked case-insensitively), the next array cursor slot fills with the collection of valid branches, and we can even filter the best performers (with some wiggle room) to quickly hone in on results. The one-way traversal (rather than both against each other) should be possible by keeping track of upper & lower bound checks at each cursor position: the state of the traversal is kept and built on for each new char coming in, or rewound on backspace, so we are not re-computing things. I sketched this algorithm out on a notepad somewhere but cannot find it; there were a few rough edges to iron out, but utilizing something like it could be very efficient for repeated typing attempts in the same scope, especially since at the moment everything is recomputed for each char added. (A sketch of the incremental part follows below.)
|
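The incremental part of point 2 is easiest to show with plain Levenshtein; the JaroWinkler score the compiler actually uses does not decompose this neatly, so treat this as a sketch of the idea rather than a drop-in: each typed character adds one matrix row, and a backspace simply discards the top row instead of recomputing anything.

    /// Incremental Levenshtein against a single candidate: the matrix rows built so
    /// far are kept on a stack, so typing a char costs one row and backspace is a pop.
    type IncrementalDistance(candidate: string) =
        // rows.[i] is the DP row after the first i typed characters
        let rows = ResizeArray<int[]> [ Array.init (candidate.Length + 1) id ]
        member x.TypeChar (c: char) =
            let prev = rows.[rows.Count - 1]
            let typedLen = rows.Count        // length of the typed string including c
            let curr = Array.zeroCreate (candidate.Length + 1)
            curr.[0] <- typedLen
            for j in 1 .. candidate.Length do
                let cost = if candidate.[j - 1] = c then 0 else 1
                curr.[j] <- min (min (curr.[j - 1] + 1) (prev.[j] + 1)) (prev.[j - 1] + cost)
            rows.Add curr
        member x.Backspace () =
            if rows.Count > 1 then rows.RemoveAt(rows.Count - 1)
        /// distance between everything typed so far and the whole candidate
        member x.Distance = rows.[rows.Count - 1].[candidate.Length]

    // hypothetical usage
    let d = IncrementalDistance "printfn"
    for c in "pritnf" do d.TypeChar c
    d.Backspace ()                    // user deletes the trailing 'f'
    d.TypeChar 'f'
    let distanceSoFar = d.Distance    // distance of "pritnf" vs "printfn"

The state here is per candidate word; the per-scope lookup tree described in point 2, which shares that state across all candidates, is where the remaining rough edges are.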
Well, sounds great ;-)
|
That may not have made much sense, apologies. I will post code and clearer algorithm details for point 2 when I find my workings. For point 1, a crude queue implementation (missing IEnumerable etc.), a modified insertion sort, is as follows:

    open System.Collections.Generic

    /// Fixed-size buffer that keeps the `max` best entries seen so far,
    /// ordered ascending by key (the best entry ends up at index max - 1).
    type MaxBuffer<'K,'V when 'K : comparison>(max: int) =
        let data = Array.zeroCreate<KeyValuePair<'K,'V>>(max)
        // shift entries [i+1 .. j] down by one slot, dropping data.[i]
        let rec backup i j =
            if i < j then
                data.[i] <- data.[i + 1]
                backup (i + 1) j
        let rec find i k v =
            let kvpr : byref<_> = &data.[i] // only want the key, not a copy of the KVP
            let kr = kvpr.Key
            if k < kr then
                if i = 0 then false // worse than everything kept: discard
                else
                    backup 0 (i - 1)
                    data.[i - 1] <- KeyValuePair<_,_>(k, v)
                    true
            else
                if i = max - 1 then // reached the last slot: drop the smallest and insert at the top
                    backup 0 i
                    data.[i] <- KeyValuePair<_,_>(k, v)
                    true
                else find (i + 1) k v
        member x.Insert (k, v) = find 0 k v
        member x.Result () = data

The reason it's a lot faster than adding hundreds of results to a re-growing array (or even a known large array size) and sorting, is that quicksort etc. will sort everything in approximately n log n time, the number of recursive pivots increasing with the number of items; the elegance of this is that it can quickly discard results that do not beat the lower bound of the captured max m items, so the compares & shifts are a (potentially tiny) subset of that work. |
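For context, this is roughly how such a buffer could sit at the end of the suggestion pipeline; the toy `similarity` score below only stands in for the JaroWinkler-based one the compiler really uses:

    // toy similarity score, standing in for the real JaroWinkler-based one
    let similarity (a: string) (b: string) =
        let shared = Set.intersect (Set.ofSeq a) (Set.ofSeq b) |> Set.count
        float shared / float (max a.Length b.Length)

    // keep only the best five candidates while the producers are still running
    let best = MaxBuffer<float, string>(5)
    for candidate in [ "printfn"; "sprintf"; "pown"; "printf"; "eprintfn"; "print" ] do
        best.Insert(similarity "pritnfn" candidate, candidate) |> ignore
    let topFive = best.Result ()   // ascending by score; the best match sits at the end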
Closing in favor of #7060 |