Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some JSON reworks #8

Merged
merged 22 commits into from
Dec 8, 2021
Merged

Some JSON reworks #8

merged 22 commits into from
Dec 8, 2021

Conversation

deathaxe
Copy link
Collaborator

@deathaxe deathaxe commented Dec 4, 2021

No description provided.

Removes some quotes and unnecessary groups.
Use `\"` as it is more common in most syntaxes.
Use fixed order of quoted/unquoted key patterns in JSON5.
This directive is required only if contexts are designed to be pushed
or set onto stack directly.

Example:

main:
  - match:
    push: context

context:
  - meta_include_prototype: false

They are not needed if contexts are only `- include`ed.
Indexing top-level keys can be achieved by a selector in 
Symbol List.tmPreference even without `meta.toc-list`.

Beyond that a general purpose JSON syntax doesn't need to distinguish
between top-level and other mapping keys.

Note:
A common SymbolList.tmPreferences for all JSON dialects is sufficient.
Only contexts whose purpose is to set a meta scope should contain a
`meta` in their name.

Note:

Such contexts should use `...-body` or `...-content`.

The latter one feels more natural for comments/strings while the former
one should be preferred for code blocks.
Such contexts should use `...-body` or `...-content`.

The latter one feels more natural for comments/strings while the former
one should be preferred for code blocks.
This commit uses `{{identifier_start}}` and `{{identifier_break}}` to
push and pop `mapping-key-ecma` context on/off stack. By not consuming
anything, string interpolation is enabled for syntaxes which embed JSON.

An example might be PHP HEREDOCs.
There's no use for that syntax as it is nothing else as plain JSON.

Inheritance order is: JSON5 < JSONC < JSON
Just highlight the space illegal between two values, instead of the
whole next value.
Improve tokenization
@jrappen
Copy link
Owner

jrappen commented Dec 5, 2021

Thanks, I'll take a look at this tonight.

@jrappen
Copy link
Owner

jrappen commented Dec 6, 2021

I do have a few minor questions, mainly technical. I'll write them down tomorrow.

- include: comment-block

comment-line:
- match: //.*$\n?
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@deathaxe Could you comment on the .* usage here vs. @keith-hall's solution at sublimehq/packages#757#issuecomment-326882205

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In ST3 using capture groups to scope punctuation.definition.comment was somewhat slower than using push statements. As of ST4 it seems no longer the case, so we can also use that simplified pattern. As we don't need capture groups to match punctuations here this simple pattern is very much enough.

We only want to push into a comments context if we want to support custom highlighting within it or want to enable inheriting syntaxes to inject interpolations. I saw svelte or surface doing such things with html comments for instance.

<!-- html comment with ${variables} interpolation -->

I guess we don't need to support that for illegal tokens here.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about custom overrides? Or TODO plugins?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess avoiding .* isn't as important since embed/escape was introduced, but there may still be situations where you want to avoid scoping the entire rest of the line as a comment, and I believe being consistent makes sense, so I'd vote to keep using the approach I mentioned in the linked comment.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks for the feedback.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for the record: We are talking about a line of illegal comments here. No matter what any inheriting syntax might want to inject, it still keeps illegal.

push: mapping-expect-value

mapping-expect-value:
- match: ',|\s?(?=\})'
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe replacing \s? with \s* makes sense here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am about to rework that part to support illegal commas as in arrays, anyway. The question here is just, how many spaces to highlight illegal.

To be honest, I am not sure if we should even highlight it illegal at all, as missing values should be quite obvious to be wrong, which trailing commas probably are not.

@jrappen
Copy link
Owner

jrappen commented Dec 7, 2021

@deathaxe Could you comment on d6a5b30? I read a commit from @FichteFoll somewhere, where he added a special match for empty arrays and objects to speed up the syntax. I'll look up the reference... Maybe we should remove the special *.empty scope as you suggest here, but keep the match for speed improvements?

@jrappen
Copy link
Owner

jrappen commented Dec 7, 2021

Is the speed gain for using captures: 0: ... significant enough to change it back from using scope & captures now?

@jrappen jrappen merged commit cc78e9a into main Dec 8, 2021
@jrappen jrappen deleted the json-syntax branch December 8, 2021 09:20
@deathaxe
Copy link
Collaborator Author

deathaxe commented Dec 8, 2021

If we wanted to scope meta.sequence.empty consistently, we'd need to support something like

[

// <- meta.sequence.empty
]

I wouldn't want to introduce branching just for such a simple thing of almost no use.

The discussion about empty was mainly about languages which handle tokens like [] in a special way. Haskell and Lisp do so, IIRC. Those tokens used to have been scoped constant.language or something like that which feels odd.

From performance perspective the additional patterns including capture groups cause some penalty. That's what I also learned in Java, which uses such additional tokens to handle empty docstrings well. It was just easier to get reliable results and was faster than using negative lookaheads in the normal contexts.

As empty arrays may be rather uncommon in json and those don't have any special meaning, I find those special handlings not worth it.

@jrappen
Copy link
Owner

jrappen commented Dec 8, 2021

@deathaxe Yes, understood. I was just curious about performance. I initially hadn't done this to scope them as empty, but saw that it was used for performance. I thought it was easier/faster to just ask you, than run multiple performance benchmarks. Also, not common as expected. Thanks for feedback.

@deathaxe
Copy link
Collaborator Author

deathaxe commented Dec 8, 2021

I guess I need to revert my statement about comments performance. just duplicated jsonc comment test content to 55k lines. The caputure group style took 81ms while the push-style takes only 60ms. May have been equal only on certain dev builds.

Added a commit to main to change comments in JSONC to push style again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants