
Whitespace is breaking token detection #105

Open
zxpectre opened this issue Aug 2, 2022 · 6 comments

Comments

@zxpectre

zxpectre commented Aug 2, 2022

Hi, I'm afraid whitespace is breaking proper token detection; without whitespace this works.
Am I missing some option setup here?

Works:

jsonToObj(replaceAll('{"hello": {"FOO": {"world": 1234}}}',"FOO",cache.foo));

Fails:

jsonToObj(replaceAll('
    {"hello": {
        "FOO": {
            "world": 1234
            }
        }
    }',"FOO",cache.foo));

producing a final token of:
',"FOO",cache.foo));

Options:

    global.sparser.options={
        ...(global.sparser.options||{}),
        source:str,
        language:"javascript",
        lexer:"script",
    }
@panoply

panoply commented Aug 2, 2022

Hey, so you are parsing JSON, and internally Sparser will overwrite some global options when dealing with that language (for example, the wrap limit is reset to 0), so be aware of this.

Looking at your example, you are inserting newlines in a single-quoted string value, which is not going to work. Simply use a template literal, e.g.:

jsonToObj(replaceAll(`
    {"hello": {
        "FOO": {
            "world": 1234
            }
        }
    }`,"FOO",cache.foo));

Lastly, Sparser is no longer maintained. I don't know your exact use cases, but if you don't require diffing and just want the data structures, then maybe take a peek at my hard-forked variation, Prettify, which leverages the powerful Sparser under the hood. It's still a WIP but might help you.

@zxpectre
Author

zxpectre commented Aug 9, 2022

Ty for the reply @panoply. I'm parsing JS-like scripts like the one I shared, not just JSON. Wrapping text à la "template literals" is handled by my code using the " ' " token.

So I expect to find JS mixed with JSON in my inputs.

Prettify looks promising! I will check on it once you officially release it :)

@panoply

panoply commented Aug 9, 2022

No problems @zxpectre happy to help!

Can you submit a detailed issue to Prettify for me (with a detailed code sample/example)? I will be doing some work on the script lexer this week, and it would be nice to find out what is causing the issue, both to prevent it from occurring in other use cases and, with some luck, to bring the lexer up to a stable enough level that you can use it in your project.

@panoply

panoply commented Aug 9, 2022

@zxpectre I must have read your issue incorrectly; I see now that you are parsing the entirety of:

jsonToObj(replaceAll('
    {"hello": {
        "FOO": {
            "world": 1234
            }
        }
    }',"FOO",cache.foo));

I assumed you were only parsing the contents of replaceAll. This should not be too difficult to fix and is likely occurring in the wrap logic. Definitely forward it through to Prettify and I'll make sure to apply a patch.

@zxpectre
Author

zxpectre commented Aug 9, 2022

I would really appreciate it if you could cover my use case, as I'm sure this can help everybody; these are very generic needs btw.

I'm in a hurry and using Sparser right now, but I could migrate if Prettify does a nice job for us!

Can I ask you to share the output of your prettify.parse(source: string): ParseTree method on a script like the lines I shared?

I will try to make a detailed issue if the output is handy for me. I like the idea of returning a tree; Sparser has some limitations that complicate things when trying to nest nodes correctly (it sometimes mixes global and local scopes on end tokens, so it is hard to nest recursively).

@panoply

panoply commented Aug 9, 2022

Prettify will return an almost identical structure, as it's using Sparser under the hood (but with various bug fixes and some improved handling across the board). Don't get too married to the naming convention of ParseTree; the data structures are still identical. Here is the structure returned for the code sample:

{
  begin: [
    -1, -1, 1, 1, 3,  3,  5,
     5,  5, 8, 8, 8, 11, 11,
    11, 11, 8, 5, 3
  ],
  ender: [
    -1, -1, -1, -1, -1, 17, 17,
    17, 16, 16, 16, 15, 15, 15,
    15, 15, 16, 17, -1
  ],
  lexer: [
    'script', 'script', 'script',
    'script', 'script', 'script',
    'script', 'script', 'script',
    'script', 'script', 'script',
    'script', 'script', 'script',
    'script', 'script', 'script',
    'script'
  ],
  lines: [
    0, 0, 0, 0, 0, 1, 0,
    0, 1, 2, 0, 1, 2, 0,
    1, 2, 2, 2, 0
  ],
  stack: [
    'global', 'global', 'method',
    'method', 'method', 'method',
    'object', 'object', 'object',
    'object', 'object', 'object',
    'object', 'object', 'object',
    'object', 'object', 'object',
    'method'
  ],
  token: [
    'jsonToObj',
    '(',
    'replaceAll',
    '(',
    "'\n",
    '{',
    '"hello"',
    ':',
    '{',
    '"FOO"',
    ':',
    '{',
    '"world"',
    ':',
    '1234',
    '}',
    '}',
    '}',
    `',"FOO",cache.foo));\n`
  ],
  types: [
    'word',   'start',    'word',
    'start',  'string',   'start',
    'string', 'operator', 'start',
    'string', 'operator', 'start',
    'string', 'operator', 'number',
    'end',    'end',      'end',
    'string'
  ]
}
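The structure above is a set of parallel arrays: index i in every array describes the same token. Reading the sample output, begin[i] appears to be the index of the start token of the structure that contains token i (for an end token, its matching start), and ender[i] the index where that structure closes, with -1 when no close was found. A minimal sketch under that assumption (the helpers toRecords, unclosedStarts and toTree are hypothetical, not part of Sparser or Prettify, and the lexer, lines and stack arrays are omitted) that zips the arrays into records, lists start tokens that never close, and nests tokens into a tree:

```javascript
// Subset of the parse table above (failing sample); lexer, lines and stack omitted.
const parsed = {
  begin: [-1, -1, 1, 1, 3, 3, 5, 5, 5, 8, 8, 8, 11, 11, 11, 11, 8, 5, 3],
  ender: [-1, -1, -1, -1, -1, 17, 17, 17, 16, 16, 16, 15, 15, 15, 15, 15, 16, 17, -1],
  token: [
    "jsonToObj", "(", "replaceAll", "(", "'\n", "{", '"hello"', ":", "{",
    '"FOO"', ":", "{", '"world"', ":", "1234", "}", "}", "}",
    `',"FOO",cache.foo));\n`,
  ],
  types: [
    "word", "start", "word", "start", "string", "start", "string", "operator",
    "start", "string", "operator", "start", "string", "operator", "number",
    "end", "end", "end", "string",
  ],
};

// Zip the parallel arrays into one record per token.
function toRecords(table) {
  return table.token.map((token, i) => ({
    index: i,
    token,
    type: table.types[i],
    begin: table.begin[i],
    ender: table.ender[i],
  }));
}

// Start tokens whose structure never received a closing index (ender === -1).
function unclosedStarts(records) {
  return records.filter((r) => r.type === "start" && r.ender === -1);
}

// Nest every token under the start token named by its `begin` index.
// Assumption (inferred from the sample): begin === -1 means top level.
function toTree(records) {
  const root = { index: -1, token: "(root)", children: [] };
  const starts = new Map(); // start-token index -> its tree node
  for (const r of records) {
    const node = { ...r, children: [] };
    // Fall back to the root when the enclosing start token is unknown.
    const parent = starts.get(r.begin) ?? root;
    parent.children.push(node);
    if (r.type === "start") starts.set(r.index, node);
  }
  return root;
}

const records = toRecords(parsed);
console.log(unclosedStarts(records).map((r) => r.index));
const tree = toTree(records);
```

On this sample, unclosedStarts reports the two ( tokens at indices 1 and 3: once the string token swallows ',"FOO",cache.foo));, neither parenthesis is ever closed, which is one visible symptom of the unterminated string.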

The defect occurs at the '," character walk, and it happens because Sparser assumes an unterminated string, likely because a newline character follows the opening single quotation character. The thing is, Sparser is behaving correctly here: newlines cannot be contained in JavaScript single- or double-quoted strings, so any JavaScript parser will fail on this input. For example, see this flems
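The claim that a raw newline cannot appear inside a quoted JavaScript string can be checked directly against an engine. A small sketch: compiles is a hypothetical helper that asks the standard Function constructor to compile a snippet without running it, and the two snippets are shortened versions of the samples in this thread.

```javascript
// Returns true if `source` compiles as a function body.
// The Function constructor only compiles here; the body is never called,
// so undefined names such as jsonToObj or cache do not matter.
function compiles(source) {
  try {
    new Function(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

// A raw newline right after the opening single quote: the literal is unterminated.
const singleQuoted = "jsonToObj(replaceAll('\n{\"hello\": 1}', \"FOO\", cache.foo));";

// The same text inside a template literal, which is allowed to span lines.
const templateLiteral = "jsonToObj(replaceAll(`\n{\"hello\": 1}`, \"FOO\", cache.foo));";

console.log(compiles(singleQuoted)); // false
console.log(compiles(templateLiteral)); // true
```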

I could introduce a rule for this, but I'd personally rather not let invalid syntax pass through.
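If the input really must allow multiline single-quoted text, one caller-side alternative to a parser rule is to normalise the source before parsing it. A rough sketch, assuming the input contains no comments, no regex literals and no nested template literals inside ${...} (the function names are hypothetical, not part of Sparser or Prettify): it rewrites single-quoted strings that contain a raw newline into template literals and copies every other literal through unchanged.

```javascript
// Hypothetical pre-pass: turn single-quoted strings containing a raw newline
// into template literals. Assumes no comments, regex literals, or nested
// template literals inside `${...}`.
function multilineQuotesToTemplates(source) {
  let out = "";
  let i = 0;
  while (i < source.length) {
    const ch = source[i];
    if (ch !== "'" && ch !== '"' && ch !== "`") {
      out += ch;
      i += 1;
      continue;
    }
    // Find the matching closing quote, skipping backslash escapes.
    let j = i + 1;
    while (j < source.length && source[j] !== ch) {
      j += source[j] === "\\" ? 2 : 1;
    }
    if (j >= source.length) {
      // Unterminated literal: leave the remainder untouched.
      out += source.slice(i);
      break;
    }
    const body = source.slice(i + 1, j);
    if (ch === "'" && body.includes("\n")) {
      out += "`" + escapeForTemplate(body) + "`";
    } else {
      out += source.slice(i, j + 1);
    }
    i = j + 1;
  }
  return out;
}

// Escape unescaped backticks and "${" so the text keeps its literal meaning.
// Note: legacy octal escapes such as \1 are invalid in template literals
// and are not handled here.
function escapeForTemplate(body) {
  let result = "";
  for (let k = 0; k < body.length; k++) {
    const c = body[k];
    if (c === "\\") {
      // Copy an existing escape sequence through unchanged.
      result += c + (body[k + 1] ?? "");
      k += 1;
    } else if (c === "`") {
      result += "\\`";
    } else if (c === "$" && body[k + 1] === "{") {
      result += "\\$";
    } else {
      result += c;
    }
  }
  return result;
}
```

On the failing sample from the first comment, this replaces only the opening and closing single quotes with backticks and leaves "FOO" and the rest of the call untouched, producing the template-literal form suggested earlier in the thread.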
