Detect non-identifiers to ignore #293

Merged
merged 9 commits into from
Jun 29, 2021
epage
Collaborator

@epage epage commented Jun 29, 2021

We need to balance

  • false positives
  • false negatives
  • performance

These seem like a reasonable blend of cases to detect and ignore without ignoring much, if any, code.

epage added 9 commits June 29, 2021 11:41
This is prep for other items to be ignored

BREAKING CHANGE: `TokenizerBuilder` no longer takes config for ignoring
tokens.  Related, we now ignore token-ignore config flags.
We might be able to make this bail out earlier, and avoid accidentally
detecting the wrong thing, by checking whether the hex values are lowercase.
RFC 4122 says that UUIDs must be generated lowercase, while input accepts
any case.  The main issues are the risk on the "input" part and the extra
annoyance of writing a custom `is_hex_digit` function.
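
A minimal sketch of the lowercase-only idea discussed above, not the code from this PR. The names `is_lower_hex_digit` and `looks_like_lowercase_uuid` are hypothetical, and the sketch assumes the canonical 8-4-4-4-12 hyphenated form:

```rust
/// Hypothetical helper: true only for ASCII digits and lowercase `a`-`f`.
/// (`u8::is_ascii_hexdigit` would also accept uppercase `A`-`F`.)
fn is_lower_hex_digit(b: u8) -> bool {
    matches!(b, b'0'..=b'9' | b'a'..=b'f')
}

/// Hypothetical check: does `s` have the 8-4-4-4-12 UUID shape,
/// using only lowercase hex digits?
fn looks_like_lowercase_uuid(s: &str) -> bool {
    const GROUP_LENS: [usize; 5] = [8, 4, 4, 4, 12];
    let mut groups = s.split('-');
    for &len in GROUP_LENS.iter() {
        match groups.next() {
            // Group has the expected length and only lowercase hex digits.
            Some(g) if g.len() == len && g.bytes().all(is_lower_hex_digit) => {}
            // Missing group, wrong length, or a non-lowercase-hex byte.
            _ => return false,
        }
    }
    // Reject trailing extra groups.
    groups.next().is_none()
}
```

With this sketch, an uppercase UUID is rejected outright, which is exactly the "input" risk mentioned above: a valid UUID written in uppercase would no longer be ignored.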
I need this for hash support anyways
For now, we hardcoded a minimum length of 90 bytes to avoid
ambiguity with math operations on variables (generally people use
whitespace anyway).
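
To illustrate why a minimum length helps, here is a rough sketch, not the PR's implementation. The constant `MIN_BASE64_LEN` and both function names are hypothetical; the 90-byte value is taken from the commit message, and the sketch assumes the standard base64 alphabet with `+`, `/`, and `=`:

```rust
/// Assumed threshold, taken from the commit message above.
const MIN_BASE64_LEN: usize = 90;

/// Hypothetical helper: bytes of the standard base64 alphabet plus padding.
fn is_base64_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || matches!(b, b'+' | b'/' | b'=')
}

/// Hypothetical check: treat `s` as base64 only if it is long enough.
/// Short expressions such as `a+b/c` use the same bytes, so the length
/// floor is what keeps them from being ignored.
fn looks_like_base64(s: &str) -> bool {
    s.len() >= MIN_BASE64_LEN && s.bytes().all(is_base64_byte)
}
```

This skips details a real parser would care about, such as padding only being valid at the end.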

Fixes crate-ci#287
This skips a lot of validation for being "good enough" (comment
open/closes matching, etc).

This has a chance of incorrectly matching in languages with `@` as an
operator, like Python, but Python encourages spaces around operators,
so hopefully this won't be a problem.
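
The trade-off can be sketched as follows. The commit text does not name the kind of token being matched, so this assumes an email-like `local@domain` shape; all function names and the permitted character set are hypothetical and deliberately loose:

```rust
/// Hypothetical helper: bytes allowed on either side of `@` in this sketch.
fn is_part_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || matches!(b, b'.' | b'_' | b'-')
}

/// Hypothetical check: non-empty text on both sides of the first `@`.
fn looks_like_at_token(word: &str) -> bool {
    match word.split_once('@') {
        Some((local, domain)) => {
            !local.is_empty()
                && !domain.is_empty()
                && local.bytes().all(is_part_byte)
                && domain.bytes().all(is_part_byte)
        }
        None => false,
    }
}

/// Hypothetical driver: does any whitespace-separated word on the line match?
fn any_at_token(line: &str) -> bool {
    line.split_whitespace().any(looks_like_at_token)
}
```

Under these assumptions, `c = a@b` would be matched (the false positive described above), while `c = a @ b` would not, because the `@` stands alone as its own word.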
We greedily matched separators, including ones that might be part of
base64.  This impacts the length calculation, so we want as much as
possible.
@epage epage merged commit effc21e into crate-ci:master Jun 29, 2021
@epage epage deleted the parse branch June 29, 2021 20:03
epage added a commit to epage/typos that referenced this pull request Jul 6, 2021
In crate-ci#293, we moved where we were filtering out results but never
switched from `filter_map` to `map`, so this does that.