internal/fuzzy: several improvements for symbol matching
Following the edge case discovered in golang/go#60201, take a more scientific approach to improving symbol match scoring:
- Add a conformance test that compares Matcher with SymbolMatcher, querying all identifiers in x/tools. The two are not expected to agree in all cases, but this test helped find interesting ranking edge cases, which are added to the ranking test.
- Don't count a capital letter in the middle of a sequence of capital letters (e.g. the M in YAML) as a word start. This was the inconsistency that led to golang/go#60201.
- Compute the sequence bonus before role score; role score should take precedence.
- Simplify the sequence scoring logic: a sequential character gets the same score as a word start, unless it is the final character in the pattern, in which case we also adjust for whether it completes a word or segment. This feels like a reasonable heuristic.
- Fix a bug in final-rune adjustment where we were checking the next input rune for a segment start, not a separator.

Notably, the scoring improvements above were all derived from first principles, and happened to also improve the conformance rate in the new test.

Additionally, make the following cleanup:
- s/character/rune throughout, since that's what we mean
- add debugging support for more easily understanding the match algorithm
- add additional commentary
- add benchmarks

Fixes golang/go#60201

Change-Id: I838898c49cbb69af083a8cc837612da047778c40
Reviewed-on: https://go-review.googlesource.com/c/tools/+/531697
Reviewed-by: Alan Donovan <adonovan@google.com>
Auto-Submit: Robert Findley <rfindley@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Showing 4 changed files with 334 additions and 84 deletions.
@@ -0,0 +1,39 @@
// Copyright 2023 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

package fuzzy_test

import (
	"testing"

	. "golang.org/x/tools/internal/fuzzy"
)

func BenchmarkSelf_Matcher(b *testing.B) {
	idents := collectIdentifiers(b)
	patterns := generatePatterns()

	for i := 0; i < b.N; i++ {
		for _, pattern := range patterns {
			sm := NewMatcher(pattern)
			for _, ident := range idents {
				_ = sm.Score(ident)
			}
		}
	}
}

func BenchmarkSelf_SymbolMatcher(b *testing.B) {
	idents := collectIdentifiers(b)
	patterns := generatePatterns()

	for i := 0; i < b.N; i++ {
		for _, pattern := range patterns {
			sm := NewSymbolMatcher(pattern)
			for _, ident := range idents {
				_, _ = sm.Match([]string{ident})
			}
		}
	}
}