Regex Chunking #895

Open
gavbarnett opened this issue Feb 7, 2021 · 0 comments
Labels
Area: Meta Pertaining to build system, test system, infrastructure, code health, and the project itself. Issue: Task Not a bug. Even not a new feature. But we really need to do something.

Proposal

There is a significant amount of regex throughout the code base. It would seem sensible to move all of it to a common file / object from which it can be called. The benefits would be easier testing of the regex expressions and consistent expression handling (such as with whitespace).

This requires 3 main things:

  1. Create a list of the regex chunks required
    1. These shall not include flags
    2. These should be of type RegExp
      • Or possibly String.raw strings?
    3. These should be bounded within either:
      1. a character class /[ ]/
      2. a group /( )/
    4. These should be built from smaller chunks where possible
      • i.e. if linkLabel regex is defined then it should be used in:
        • links
        • link reference definition
        • reference link (full, collapsed & shortcut)
  2. Create a joining/modifying function(s) for working with an array of regex chunks:
    1. Simple concatenation
    2. Concatenation with Logic (And, Or)
    3. Concatenation with lookahead / lookbehind
    4. Concatenation with quantifiers (+, *, ?, {min, max})
  3. Tests for the above
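
To make the three points above concrete, here is a minimal sketch of what a chunk store and its joining functions might look like. Every name in it (`chunks`, `concat`, `either`, `lookahead`, `quant`) is hypothetical and not taken from the code base, and the `linkLabel` pattern is deliberately simplified rather than CommonMark-accurate. "And" logic is left out; it could be built from lookaheads.

```typescript
// A chunk is either a flag-less RegExp or a raw pattern string (assumption).
type Chunk = RegExp | string;

// Pattern text of a chunk; any flags on a RegExp are dropped.
const src = (c: Chunk): string => (typeof c === "string" ? c : c.source);

// Wrap in a non-capturing group so operators apply to the whole chunk.
const group = (c: Chunk): string => `(?:${src(c)})`;

// 2.1 Simple concatenation.
const concat = (...cs: Chunk[]): string => cs.map(group).join("");

// 2.2 Concatenation with logic: alternation ("Or").
const either = (...cs: Chunk[]): string => group(cs.map(group).join("|"));

// 2.3 Lookahead, positive by default.
const lookahead = (c: Chunk, negative = false): string =>
  `(?${negative ? "!" : "="}${src(c)})`;

// 2.4 Quantifier {min,max}; omit max for an open-ended {min,}.
const quant = (c: Chunk, min: number, max: number = Infinity): string =>
  group(c) + (max === Infinity ? `{${min},}` : `{${min},${max}}`);

// 1. The chunk list: no flags, each bounded by a character class or a group.
const chunks = {
  whiteSpace: /[ \t]/,
  linkLabel: /(\[[^\[\]]+\])/, // simplified for illustration
};

// Built from smaller chunks: up to 3 whitespace characters, a label, a colon.
const linkRefDefStart = new RegExp(
  "^" + concat(quant(chunks.whiteSpace, 0, 3), chunks.linkLabel, ":")
);
```

Because the helpers return plain strings, flags are only chosen at the final `new RegExp(...)` call, which keeps the chunks themselves flag-free as proposed.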

That's a fairly significant effort, but less than a full syntax parser.
Personally my focus is on the completion code but this should be applicable globally.

regex vs String.raw

Using regex gives clearer code, as it's syntax highlighted.

But regex is hard to work with for concatenation. My main concern here is handling incomplete regex. For example, say we define reg.whiteSpace and want to match up to 3 of them: the bare quantifier {0,3} is not a valid regex chunk on its own, so reg.whiteSpace would need to be wrapped by a quantifier function, regQuant(reg.whiteSpace, 0, 3), before concatenation with other regex chunks. I think this will end up being less readable than the String.raw equivalent, but possibly not.
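
For comparison, the two styles might look roughly like this side by side. `regQuant` and `reg.whiteSpace` are the names used above, but their bodies here are only an assumed sketch; the other identifiers are hypothetical.

```typescript
// RegExp-chunk style: the chunk is a real, syntax-highlighted regex literal...
const reg = { whiteSpace: /[ \t]/ };

// ...but a quantifier cannot stand alone, so a helper has to attach it.
function regQuant(chunk: RegExp, min: number, max: number): RegExp {
  return new RegExp(`(?:${chunk.source}){${min},${max}}`);
}
const indentFromRegex = regQuant(reg.whiteSpace, 0, 3);

// String.raw style: the chunk is an opaque string, but the quantifier
// can simply be written next to the interpolated chunk.
const whiteSpace = String.raw`[ \t]`;
const indentFromRaw = new RegExp(String.raw`${whiteSpace}{0,3}`);
```

Both end up matching the same input; the difference is only in how the intermediate pieces read in the source.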

References

  • This issue spawned from discussion on Reference link autocomplete #893 with @yzhang-gh and @Lemmingh
  • @yzhang-gh pointed out something similar already exists for GFM tables,
    private detectTables(text: string) {
        const lineBreak = String.raw`\r?\n`;
        const contentLine = String.raw`\|?.*\|.*\|?`;
        const leftSideHyphenComponent = String.raw`(?:\|? *:?-+:? *\|)`;
        const middleHyphenComponent = String.raw`(?: *:?-+:? *\|)*`;
        const rightSideHyphenComponent = String.raw`(?: *:?-+:? *\|?)`;
        const multiColumnHyphenLine = leftSideHyphenComponent + middleHyphenComponent + rightSideHyphenComponent;
        //// GitHub issue #431
        const singleColumnHyphenLine = String.raw`(?:\| *:?-+:? *\|)`;
        const hyphenLine = String.raw`[ \t]*(?:${multiColumnHyphenLine}|${singleColumnHyphenLine})[ \t]*`;
        const tableRegex = new RegExp(contentLine + lineBreak + hyphenLine + '(?:' + lineBreak + contentLine + ')*', 'g');
        return text.match(tableRegex);
    }
  • Stack Overflow - How can I concatenate regex literals in JavaScript?
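
As an illustration of the "easier testing" benefit, individual chunks such as the GFM table ones quoted above could be checked in isolation once they live outside the method. This is only a sketch: `fullMatch` is a hypothetical helper, and the two chunk strings are copied from the snippet above.

```typescript
// Hypothetical helper: does the chunk match the entire input?
const fullMatch = (chunk: string, input: string): boolean =>
  new RegExp(`^(?:${chunk})$`).test(input);

// Chunks copied from the detectTables snippet.
const leftSideHyphenComponent = String.raw`(?:\|? *:?-+:? *\|)`;
const singleColumnHyphenLine = String.raw`(?:\| *:?-+:? *\|)`;

// Each chunk gets its own small set of positive and negative cases.
const chunkChecks = {
  leftWithAlignment: fullMatch(leftSideHyphenComponent, "| :---: |"),
  leftWithoutLeadingPipe: fullMatch(leftSideHyphenComponent, "---|"),
  leftMissingTrailingPipe: fullMatch(leftSideHyphenComponent, "| --- "),
  singleColumn: fullMatch(singleColumnHyphenLine, "|---|"),
  singleColumnNeedsLeadingPipe: fullMatch(singleColumnHyphenLine, "---|"),
};
```

A failing case then points at one chunk rather than at the full assembled table regex.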
@Lemmingh added the Area: Meta and Issue: Task labels on Feb 8, 2021