Skip to content

Conversation

@rainers
Copy link
Member

@rainers rainers commented Nov 15, 2014

spell checker: find the "closest" symbol for the minimum levenshtein distance by counting levels of scopes between the identifier and the found symbol. If the symbol is found through an import, another level is added.

This is needed to let the precise GC pass the test suite, otherwise "gc" is easily reported for spell checker tests.

https://issues.dlang.org/show_bug.cgi?id=13736

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please document what the inputs, outputs, and return value represents. I find it rather obtuse from reading the source.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, updated. I don't perfectly like the use of 'distance' because the speller function uses this name aswell, but for another metric. But I couldn't find a better matching name.
I also removed the use of a reference from the 'ndist' parameter at the cost of a little more verbosity at the call site.

I find it rather obtuse from reading the source.

I'm not sure the spellerX/Y functions are less obtuse. It isn't very obvious to me that 'flag' in spellerX means the levenshtein distance.
BTW: I moved the existing function comment to speller(), it seems that's where it is meant to be.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is incorrect, it isn't what the code is doing. I'm also not sure how the difference can be less than 0, it doesn't make sense. I presume that distance = (number of transformations) + (scope difference) ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please note that the diff view is pretty confusing. I didn't change the existing comment, but moved it to the function it documented in line 215.
'distance' here is an arbitrary value set by the search function to somehow order results. It is set to 0 by most of them simulating previous behaviour, but scope_search_fp sets it to the number of scope levels climbed up to find the identifier.
As described in the comment that is unfortunately folded away as outdated, I'd like to name it differently than 'distance' to avoid confusion. The best alternative I could come up was 'proximity', but that seems like the opposite. Do you have a suggestion?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could call it simply "cost", as a "cost function" is a well known artifice. The idea is to come up with a "cost" for each symbol, and then pick the one with the minimum.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"cost" sounds fine, but somehow escaped me. I'll replace 'dist'...

- find the "closest" symbol for the minimum levenshtein distance
- do not report private symbols from import
- remove spurious error messages on private symbols found by the spell checker
- allow ambiguous symbols to be reported
@rainers
Copy link
Member Author

rainers commented Nov 19, 2014

Updated, should now also fix https://issues.dlang.org/show_bug.cgi?id=5839

@MartinNowak
Copy link
Member

Nice, I remember that we talked about this last year at dconf.

WalterBright added a commit that referenced this pull request Nov 20, 2014
Fix 13736 - Spellchecker should prefer symbols from inner scopes
@WalterBright WalterBright merged commit 5c087d7 into dlang:master Nov 20, 2014
@MartinNowak
Copy link
Member

Nice, I remember that we talked about this last year at dconf.

One other idea we had was that fewer transformations should be tried for short identifiers.
It doesn't make sense to search for b when someone typed a, but the longer the name the more typos should be "allowed".

@yebblies
Copy link
Contributor

One other idea we had was that fewer transformations should be tried for short identifiers.
It doesn't make sense to search for b when someone typed a, but the longer the name the more typos should be "allowed".

It should already do that: size_t maxdist = seedlen < 4 ? seedlen / 2 : 2;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants