-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Limit amount of tokens used to calculate LCS distance #68151
Conversation
@davidwengier PTAL |
Not sure how these ar stored today. IN the future, woudl using SegmentedList be appropriate, if the goal is to avoid the LOH? |
We only need to avoid LOH for segments that are not pooled as those will become garbage once he calculation is finished. The pooled ones will be reused. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have 0 confidence that I understand what is going on here. or how it works. eg. I'm guessing this doesn't mean that we simply throw away tokens/character that are beyond the maximum, but it sure looks that way!
Yes, we do throw them away but only when we are comparing similarity of sequences of tokens ("weights" or "distances"). Not when calculating actual edits. This change limits the number of tokens we look at when calculating the distance. If first ~900 tokens are the same we consider the sequence exactly the same. |
What's the fallout of this though. Would that mean we miss an actual change, if not within that token count? Say someone edits the last statement? (Just trying to figure out the implication of assuming a match). Thanks! |
@CyrusNajmabadi Fallout is decreased precision of matching heuristic. We would still detect any change made anywhere in the tree. Say you have blocks A, B, C and the user makes a change that reorders B and C, so the new version is A, C, B. If C and B are large and only differ in the last token we would produce matches: A1 ~ A2, B1 ~ C2, C1 ~ B2, instead of more correct one: This matters for example, if B contains a lambda, a local variable definition, etc. Then we would try to match the lambda/variable/etc. in B1 to the corresponding lambda in C2 instead of B2. It seems unlikely though that this would happen often in practice. Reordering large blocks of code is not very common during EnC. Besides the blocks need to have the same 900 leading tokens in order for the matching alg to be affected. |
Cool, that's what i wanted to know. If this is just about a heuristic that says "more likely than not this is a move, given that 900 tokesn match" that seems totally reasonable to me. |
Thank you for that explanation @tmat, makes sense. |
Reduces the amount of buffers allocated on LOH during calculation of EnC/Hot Reload delta.
Improves AB#1537348