fix(dict): Remove only corrections if a space could be inserted as well #792

Draft: wants to merge 1 commit into base: master

Commits on Aug 8, 2023

  1. fix(dict): Remove only corrections if a space could be inserted as well

    The typo dictionary words.csv previously contained
    a bunch of problematic entries such as:
    
        abouta,about
        algorithmi,algorithm
        attachen,attach
        shouldbe,should
        anumber,number
    
    These entries caused wrong automatic corrections whenever the
    following spaces (indicated by ␣) were accidentally omitted:
    
        about␣a
        algorithm␣i developed
        attach␣en masse
        should␣be
        a␣number
    
    Many of these entries were introduced by taking entries from the
    codespell dictionary and dropping the corrections that contain spaces
    (since typos currently doesn't support them). For example, the
    codespell dictionary contains:
    
        abouta->about a, about,
        shouldbe->should, should be,
    
    This commit updates `tests/verify.rs` to automatically remove
    entries of the form `{correction}{common_word},{correction}`
    or `{common_word}{correction},{correction}`, where `{common_word}` is
    one of the 1000 most frequent English words. An entry is kept if
    `{correction}` itself ends (first form) or starts (second form) with
    `{common_word}`, since we still want to correct e.g. "extrememe" to
    "extreme".
    
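    The removal rule can be sketched roughly as follows. This is an
    illustration only, not the code in `tests/verify.rs`: the function
    name `is_missing_space` and the `common_words` parameter are
    hypothetical, and the word set is assumed to be loaded from the
    frequency list beforehand.

```rust
use std::collections::HashSet;

/// Hypothetical helper: returns true if the entry `typo,correction` looks like
/// a missed space next to a common word rather than a real misspelling.
fn is_missing_space(typo: &str, correction: &str, common_words: &HashSet<&str>) -> bool {
    // Form 1: `{correction}{common_word}`, e.g. "abouta" = "about" + "a".
    if let Some(rest) = typo.strip_prefix(correction) {
        // Keep the entry if the correction already ends in the common word,
        // e.g. "extrememe" -> "extreme" (ends in "me").
        if common_words.contains(rest) && !correction.ends_with(rest) {
            return true;
        }
    }
    // Form 2: `{common_word}{correction}`, e.g. "anumber" = "a" + "number".
    if let Some(rest) = typo.strip_suffix(correction) {
        // Keep the entry if the correction already starts with the common word.
        if common_words.contains(rest) && !correction.starts_with(rest) {
            return true;
        }
    }
    false
}
```

    Under these assumptions, `abouta,about` and `anumber,number` are
    flagged for removal, while `extrememe,extreme` is kept because
    "extreme" already ends in "me".
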
    The top-1000-most-frequent-words.csv file was generated by running:
    
        curl https://norvig.com/ngrams/count_1w.txt \
          | head -n1024 \
          | awk '{print $1;}' \
          | grep -vE '^([^ia]|al|re)$' \
          > top-1000-most-frequent-words.csv
    not-my-profile committed Aug 8, 2023 (commit 60aad40)