-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parameterized contexts #2912
Labels
Comments
Yes, please. |
Can you describe how would a syntax definition handling nested indented blocks look like with your idea ? |
Example, which (obviously) I haven't tested and could contain errors: contexts:
main:
- include: block
statement:
- match: \s+
scope: invalid.illegal.excess-indentation
pop: true
- match: '(if)(:)$'
captures:
1: keyword.control
2: punctuation.section.block.begin
set: expect-block
- match: (?=\S)
set: expression-statement
block:
- meta_scope: meta.block
- match: '^{{$indent}}'
push: statement
- include: else-pop # dedent
expect-block:
- match: '^{{$indent}}\s+'
set:
- block
- statement
with_parameters:
indent: '\0'
- match: '' # empty block (e.g. while typing)
pop: true
expression-statement:
- match: $
pop: true
- include: expression
expression:
- match: \d+
scope: constant.numeric
- match: '[+-*/=]'
scope: keyword.operator
- match: \(
scope: punctuation.section.group.begin
push:
- match: \)
scope: punctuation.section.group.end
pop: true
- include: expression This should handle a Python-inspired toy example:
|
This looks promising, especially when later on, sbnf implements this. I am curious what the devs think ? |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Background
Sublime's parser is effectively a deterministic pushdown automaton. This is sufficient to parse most languages to reasonable fidelity, but there are some languages with features that require more power. Issue #2241 discusses a proposal that would handle nondeterministic languages. This proposal addresses a different issue: context sensitivity.
Many popular coding languages have some features that are “context sensitive”. In lieu of a technical explanation, I provide the following examples of context-sensitive language features:
In some cases, Sublime can handle these constructs using an obscure feature involving backreferences (inherited from classic TextMate grammars), hereafter referred to as the “backreference trick”. That feature is currently essential for handling heredocs, and with some difficulty it can be used (or abused) to handle some indentation. However, it is also unintuitive, limited in scope and power, and not generally well understood by users or developers.
This proposal defines a new sublime-syntax feature that is more powerful, handles more varieties of syntax, and is easier to use. From an implementation and performance standpoint, it should not be far removed from the existing behavior.
Proposed Solution
Motivating example
The following syntax definition matches a toy XML-like syntax, handling improper close tags. When a close tag is encountered:
For comparison, the core XML syntax cannot keep track of the stack of elements at all. It would be possible to use the backreference trick to determine whether a close tag is the correct tag, but not to determine whether it matches another tag on the stack.
Explanation
This proposal has three components: the notion of “context parameters”, new syntax for using these parameters, and new syntax for setting parameter values.
Context parameters
Ordinarily, a context consists of a list of fixed rules. These rules behave identically no matter how the context was pushed onto the stack. (
with_prototype
creates new contexts with different behavior, which is why recursion is risky.) Under this proposal, a context may have one or more “parameters” that affect its behavior.This proposal adds parameters to Sublime's parser state. A parameter state consists of a map of parameter names to string values. The parser maintains a stack of parameter states. The top parameter state on that stack is the current parameter state. When a rule pushes a context onto the stack, it may push a new parameter state. When that context is popped from the context stack, the associated parameter state is popped from the parameter stack.
If some parameter name is not assigned a value, then its value is the empty string.
Implementation notes
Each syntax definition can contain only a finite number of parameter names. These names can be interned and replaced by indices; then, a parameter state consists of a small array of pointers to parameter values. When a new parameter state is pushed, it would probably be fastest to copy the entire parameter state to optimize lookups. Alternatively, lookups could walk the stack downward until they find an assignment. Unassigned parameter names in the array should hold null pointers or another special value.
Using parameters
Currently,
match
rules may contain “variables” surrounded by double braces. These “variables” do not actually vary; they are substituted in at compile time. This proposal extends that syntax to allow a rule to refer to the current parameter state. When a context is pushed onto the stack, the engine will search its match expressions for the pattern{{$foo}}
and replace that pattern with the value of the parameter named foo.Substitution should also be performed in
escape
rules.Implementation notes
Of course, most of the work will be done ahead of time. At compile time, Sublime can tell which contexts use parameters, just as it can tell which contexts use the backreference trick. Only contexts that use parameters need be compiled dynamically.
The result of this dynamic compilation should be cached to improve performance in the common case where a context is pushed many times with the same parameters. For instance, in a typical Python file, there need only be one compiled context per indentation level. As a further optimization, Sublime could pre-compile each parameterized context with all empty parameters, as it is likely to come up in many situations.
Setting parameter values
Any
push
,set
, orembed
rule may contain a new key,with_parameters
. The value ofwith_parameters
is a mapping from parameter names to parameter value expressions. These expressions are treated like match rules — the parser will substitute variables and parameters — and, in addition, numbered backreferences will be replaced with the value of the associated capture matched by the rule.When such a rule is evaluated, a new parameter state will be pushed containing the values specified in
with_parameters
and inheriting any other values from the current parameter state. The new parameter state will apply as long as the first (bottommost) context pushed by the rule is on the stack. When that context is popped, then the parameter stack will pop and the previous parameter state will be restored.Implementation notes
The parameter stack need only be pushed when a rule explicitly pushes a new parameter state, so the overhead should be negligible. At most, each context stack record would contain an extra flag indicating whether it has a new parameter state. When popping a context, the engine would check that flag and, if true, pop the parameter stack.
Discussion
This proposal would expand Sublime's parsing capabilities significantly, allowing it to parse many context-sensitive languages. Implementation complexity and performance should be quite reasonable.
Implementation
Sublime already implements the backreference trick, and in order for that to work, many of the pieces described here must already be in place:
The new parts would be:
with_parameters
rules and interning parameter names.escape
is implemented.)Unlike #2241, which requires rewinding the token stream, there shouldn't be any substantial changes to the underlying assumptions that the parser makes. Implementation should be a purely additive process. (If the backreference trick weren't already implemented, then this would be a much bigger change.)
Performance
Performance when using parameters should be similar to the performance of the backreference trick — that is, slower than using statically compiled expressions, but reasonable when used in moderation. As with the backreference trick, parameters should only be used when they are necessary to perform a highlighting function; there is no faster alternative to compare them to.
In cases where a parameterized context would be hit very frequently, the parameterized portion could be split off into a separate smaller context. For example, in the XML example above, the
element-body
context could use a fixed lookahead to detect a close tag andset
a minimal context containing the parameterized rules. A similar approach could be used in indentation-sensitive languages so that the expensive indentation-sensitive expression is only run once per line.In some cases, the performance trade-off might not be worth it for the core syntax, but would be for third-party syntaxes. For instance, despite the use of an XML-like example above, the core XML syntax probably doesn't need that level of sophistication, but a syntax for a specific XML-based configuration format likely would. (I wrote this proposal while trying to create a syntax definition for AWS CloudFormation config files, which are YAML-based and hence extremely indentation-sensitive.)
Applications
Potential applications include:
Other thoughts
tag_name
has a value ofbobby_tables\
, then the match expression(<{{$!tag_name}}>)
could evaluate to(<bobby_tables\\>)
.{{indent/[ ?:]/ /}}
to replace indicator characters in a YAML indent with spaces. This isn't an essential feature, but it might be reason to restrict the syntax of parameter names to leave room for future expansion.That's the proposal. Questions/comments/complaints/requests for clarification?
The text was updated successfully, but these errors were encountered: