Group insides
I would like to propose a new way to highlight the insides of matched patterns, similar to the current inside. The idea is to allow further patterns to be applied inside a capturing group.
This will be a generalization of inside.
Syntax
inside<n> will be applied to the n-th capturing group, where 1 is the first capturing group that is not a lookbehind group.
Example:
inside1 will match the first capturing group ([A-Z]\w*) and inside2 will match the second capturing group (\w+).
As a shorter version of:
inside1: {'token': /[^]+/}
we could use:
inside1: 'token'
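A minimal sketch of how a highlighter could interpret these keys. groupNumberFor and normalizeInside are hypothetical helpers, not existing Prism API; the only Prism behaviour assumed is that lookbehind: true turns the first capturing group into a lookbehind group, so inside1 then refers to group 2. The example token and its pattern are made up for illustration.

```javascript
// Hypothetical helpers illustrating the proposed insideN keys; not Prism API.
const INSIDE_KEY = /^inside([1-9]\d*)$/;

// Maps "insideN" to the capturing-group number it applies to. With Prism's
// `lookbehind: true`, group 1 is the lookbehind group and is not counted.
function groupNumberFor(key, token) {
  const m = INSIDE_KEY.exec(key);
  if (!m) return null;
  return Number(m[1]) + (token.lookbehind ? 1 : 0);
}

// Expands the shorthand inside1: 'token' to inside1: { 'token': /[^]+/ }.
function normalizeInside(value) {
  return typeof value === "string" ? { [value]: /[^]+/ } : value;
}

// Made-up example token using the proposed syntax.
const token = {
  pattern: /([A-Z]\w*)\.(\w+)/,
  inside1: "class-name",
  inside2: { function: /[^]+/ },
};

const match = token.pattern.exec("call Math.max(1, 2)");
for (const key of Object.keys(token)) {
  const n = groupNumberFor(key, token);
  if (n !== null) console.log(`${key} -> group ${n}: ${match[n]}`);
}
// inside1 -> group 1: Math
// inside2 -> group 2: max
```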
Alternatives
Maybe we could also use $n instead of insiden, because e.g. inside1 is kind of hard to read and 1 looks similar to l.
For the time being, I used insiden for simplicity's sake.
Matching
Before we start: two insides, insideA and insideB, can have only two relations to each other:
insideA and insideB are disjoint. In this case, the order doesn't matter and they can be highlighted in any order.
insideA fully contains insideB. In this case, it's simply more useful if insideB is highlighted before insideA.
From these relations, a tree emerges. An InsideMatch will be a node in this tree. The root node will be inside.
If the grammar of a node is not defined by the user, the node will be created with an empty grammar. This is to prevent child nodes without a parent node.
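A minimal sketch of how that tree could be built, assuming every capturing group is already known by its number n and its [start, end) span in the matched text. buildInsideTree, the node shape and the grammars map are hypothetical. Because two groups are either disjoint or nested, sorting by start (outer group first on ties) and keeping a stack of open nodes is enough; groups without a user-defined grammar get an empty one, so nested groups always have a parent.

```javascript
// Hypothetical sketch: build the InsideMatch tree from capturing-group spans.
// groups:   [{ n, start, end }] with [start, end) offsets into the matched text
// grammars: { [n]: grammar } for the groups that have an insideN
function buildInsideTree(text, groups, grammars, rootGrammar = {}) {
  // The root node stands for `inside` and covers the whole match.
  const root = { n: 0, start: 0, end: text.length, grammar: rootGrammar, children: [] };
  // Outer groups first: by start, and by descending end when starts are equal.
  const sorted = [...groups].sort((a, b) => a.start - b.start || b.end - a.end);
  const stack = [root];
  for (const g of sorted) {
    // Any open node that ends before g starts is disjoint from g, so close it.
    while (stack.length > 1 && stack[stack.length - 1].end <= g.start) stack.pop();
    // Missing grammars become empty grammars, so children always have a parent.
    const node = { ...g, grammar: grammars[g.n] ?? {}, children: [] };
    stack[stack.length - 1].children.push(node);
    stack.push(node);
  }
  return root;
}

// Example: a made-up pattern /(([A-Z]\w*)\.(\w+))/ matched against "Math.max",
// with a grammar only for group 2.
const tree = buildInsideTree(
  "Math.max",
  [{ n: 1, start: 0, end: 8 }, { n: 2, start: 0, end: 4 }, { n: 3, start: 5, end: 8 }],
  { 2: { "class-name": /[^]+/ } }
);
console.log(tree.children[0].children.map((c) => c.n)); // groups 2 and 3 are children of group 1
```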
The pseudo code will illustrate how the matching will occur.
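As a rough stand-in for that pseudo code, here is a minimal JavaScript sketch of the matching step. It assumes each InsideMatch is a plain object { start, end, grammar, children } with offsets into the matched text (a hypothetical shape), and tokenizeText is a deliberately simplified, hypothetical substitute for Prism's own tokenizer. getTextBefore, getRemainingTextAfterInsides and cleanTokens follow the descriptions in the paragraph that follows.

```javascript
// Simplified tokenizer (not Prism's): applies each grammar rule in turn to the
// remaining plain strings and wraps every match as { type, content }.
function tokenizeText(text, grammar) {
  let tokens = [text];
  for (const [type, pattern] of Object.entries(grammar)) {
    const flags = pattern.flags.includes("g") ? pattern.flags : pattern.flags + "g";
    const re = new RegExp(pattern.source, flags);
    tokens = tokens.flatMap((t) => {
      if (typeof t !== "string") return [t];
      const out = [];
      let last = 0;
      for (const m of t.matchAll(re)) {
        out.push(t.slice(last, m.index), { type, content: m[0] });
        last = m.index + m[0].length;
      }
      out.push(t.slice(last));
      return out;
    });
  }
  return tokens;
}

// Text between the previous child (or the node start) and child number `index`.
function getTextBefore(text, node, index) {
  const from = index === 0 ? node.start : node.children[index - 1].end;
  return text.slice(from, node.children[index].start);
}

// Text after the last child, or the node's whole text if it has no children.
function getRemainingTextAfterInsides(text, node) {
  const last = node.children[node.children.length - 1];
  return text.slice(last ? last.end : node.start, node.end);
}

// Removes empty strings and joins adjacent strings.
function cleanTokens(tokens) {
  const out = [];
  for (const t of tokens) {
    if (t === "") continue;
    if (typeof t === "string" && typeof out[out.length - 1] === "string") out[out.length - 1] += t;
    else out.push(t);
  }
  return out;
}

// Children are tokenized with their own grammar; the text around them uses the node's grammar.
function tokenizeNode(text, node) {
  const tokens = [];
  node.children.forEach((child, i) => {
    tokens.push(...tokenizeText(getTextBefore(text, node, i), node.grammar));
    tokens.push(...tokenizeNode(text, child));
  });
  tokens.push(...tokenizeText(getRemainingTextAfterInsides(text, node), node.grammar));
  return cleanTokens(tokens);
}

const text = "Math.max";
const root = {
  start: 0, end: 8, grammar: { punctuation: /\./ },
  children: [
    { start: 0, end: 4, grammar: { "class-name": /[^]+/ }, children: [] },
    { start: 5, end: 8, grammar: { function: /[^]+/ }, children: [] },
  ],
};
console.log(JSON.stringify(tokenizeNode(text, root)));
// [{"type":"class-name","content":"Math"},{"type":"punctuation","content":"."},{"type":"function","content":"max"}]
```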
getTextBefore will return the text before a given InsideMatch without the text before and matched by the previous match. getRemainingTextAfterInsides will return the text after the last InsideMatch, or the whole text if no InsideMatch was given. cleanTokens will remove empty strings and join adjacent ones.
Implementation
Sadly, JS does not return the position of a group, so it's a little tricky to implement.
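The idea is to do the following:
/a(b+)c(d+)e/ -> /(a)(b+)(c)(d+)e/
We rewrite the pattern, adding new groups that also capture everything preceding a capturing group. If we keep track of how many groups we added, we can calculate the number of each original group in the rewritten pattern, and from the lengths of the preceding captures, its position.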
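For the example above, the bookkeeping could look like this sketch. The rewritten pattern is written out by hand, and ORIGINAL_GROUPS (a hypothetical name) records that the original (b+) and (d+) are now groups 2 and 4. Because every piece of the rewritten pattern before the trailing e is captured, the start of each original group is match.index plus the lengths of all captures in front of it.

```javascript
// The pattern /a(b+)c(d+)e/ rewritten by hand: the new groups 1 and 3 capture
// the text in front of the original groups, which are now groups 2 and 4.
const rewritten = /(a)(b+)(c)(d+)e/;
const ORIGINAL_GROUPS = [2, 4]; // hypothetical bookkeeping kept by the rewrite

// Returns the [start, end) offsets of the original groups within `text`,
// or null if the pattern does not match.
function originalGroupSpans(text) {
  const m = rewritten.exec(text);
  if (!m) return null;
  const spans = [];
  let pos = m.index; // every piece before the trailing "e" is captured
  for (let g = 1; g < m.length; g++) {
    if (ORIGINAL_GROUPS.includes(g)) spans.push([pos, pos + m[g].length]);
    pos += m[g].length;
  }
  return spans;
}

console.log(JSON.stringify(originalGroupSpans("xxabbcdde"))); // [[3,5],[6,8]]
```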
Nested capturing groups can be handled as well by applying the method described above recursively to the contents of each capturing group that itself contains a capturing group.
One will have to be careful because lookbehind groups are sometimes preceded by things like ^ or \b. To solve this, we can use the established assumption that everything preceding a lookbehind group has length 0.
Backreferences might also have to be rewritten because of the new capturing groups.
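A minimal sketch of the rewrite itself, limited to "flat" patterns: it rejects top-level alternation, quantified capturing groups, nested or named capturing groups, and backreferences, so the recursive case and the backreference rewrite are not covered. The names rewritePattern and isOriginal are hypothetical. The lookbehind option reflects one reading of the paragraph above: the text in front of a Prism lookbehind group (such as ^ or \b) is left unwrapped and assumed to have length 0, so the lookbehind group stays group 1.

```javascript
// Index just past the atom at i: an escape, a character class, or one character.
function skipAtom(src, i) {
  if (src[i] === "\\") return i + 2;
  if (src[i] === "[") {
    let j = i + 1;
    if (src[j] === "^") j++;
    while (j < src.length && src[j] !== "]") j += src[j] === "\\" ? 2 : 1;
    return j + 1;
  }
  return i + 1;
}

// Index of the ")" closing the group that opens at i.
function findGroupEnd(src, i) {
  let depth = 0;
  for (let j = i; j < src.length; j = skipAtom(src, j)) {
    if (src[j] === "(") depth++;
    else if (src[j] === ")" && --depth === 0) return j;
  }
  throw new Error("unbalanced parentheses");
}

// Counts capturing groups and rejects constructs this sketch cannot handle.
function countCapturingGroups(src) {
  let n = 0;
  for (let i = 0; i < src.length; i = skipAtom(src, i)) {
    if (src[i] === "\\" && /[1-9]/.test(src.charAt(i + 1))) throw new Error("backreferences are not supported");
    if (src.startsWith("(?<", i) && !/[=!]/.test(src.charAt(i + 3))) throw new Error("named groups are not supported");
    if (src[i] === "(" && src[i + 1] !== "?") n++;
  }
  return n;
}

// Wraps the text in front of every capturing group in a new group.
// isOriginal[k] tells whether group k + 1 of the result existed before.
function rewritePattern(source, { lookbehind = false } = {}) {
  countCapturingGroups(source);
  let out = "";
  let filler = ""; // pattern text since the previous capturing group
  const isOriginal = [];
  for (let i = 0; i < source.length; ) {
    if (source[i] === "|") throw new Error("top-level alternation is not supported");
    if (source[i] !== "(") {
      const next = skipAtom(source, i);
      filler += source.slice(i, next);
      i = next;
      continue;
    }
    const end = findGroupEnd(source, i);
    const group = source.slice(i, end + 1);
    const captures = countCapturingGroups(group);
    i = end + 1;
    if (group[1] === "?") {
      // Non-capturing group or lookaround: treated as more filler text.
      if (captures > 0) throw new Error("capturing groups inside (?...) are not supported");
      filler += group;
      continue;
    }
    if (captures > 1) throw new Error("nested capturing groups are not supported");
    if (/[*+?{]/.test(source.charAt(i))) throw new Error("quantified capturing groups are not supported");
    if (lookbehind && isOriginal.length === 0) {
      out += filler; // e.g. ^ or \b in front of the lookbehind group: assumed zero-length
    } else if (filler) {
      out += "(" + filler + ")";
      isOriginal.push(false);
    }
    filler = "";
    out += group;
    isOriginal.push(true);
  }
  return { source: out + filler, isOriginal }; // trailing text needs no group
}

console.log(rewritePattern("a(b+)c(d+)e").source); // (a)(b+)(c)(d+)e
console.log(rewritePattern("\\b(foo)x(y)", { lookbehind: true }).source); // \b(foo)(x)(y)
```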
Rewriting the pattern won't be easy. Needless to say, we won't do this with every pattern, only where we have to.
(Maybe we could even do it with gulp?)
Limitations
Backreferences will be a fundamental limiting factor: every group added in front of a referenced group changes that group's number, so the backreference has to be renumbered as well. This means that it might not always be possible to rewrite a given backreference.
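A minimal sketch of that renumbering, assuming the rewrite also yields a map from each original group number to its new number (newNumber, a hypothetical parameter). It only touches backreferences outside character classes, reads multi-digit escapes such as \12 greedily (a simplification of how JS treats them), and gives up by throwing when it has no mapping for a backreference.

```javascript
// Hypothetical helper: point each backreference \n at the new number of
// original group n. newNumber maps old group numbers to new ones.
function renumberBackreferences(source, newNumber) {
  let out = "";
  let inClass = false;
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch === "\\") {
      const digits = inClass ? null : /^[1-9]\d*/.exec(source.slice(i + 1));
      if (digits) {
        const n = Number(digits[0]);
        // No mapping: give up rather than guess.
        if (!(n in newNumber)) throw new Error(`cannot rewrite backreference \\${n}`);
        out += "\\" + newNumber[n];
        i += digits[0].length;
      } else {
        out += source.slice(i, i + 2); // any other escape is copied verbatim
        i++;
      }
      continue;
    }
    if (ch === "[") inClass = true;
    else if (ch === "]") inClass = false;
    out += ch;
  }
  return out;
}

// /a(b)c\1/ rewritten to /(a)(b)c\2/: original group 1 is now group 2.
console.log(renumberBackreferences("(a)(b)c\\1", { 1: 2 })); // (a)(b)c\2
```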
Use cases
When I want to match the return type of a function in languages like Java, C or C#, I usually write 2 patterns: one to match the function name and one to match the return type. The problem is that this is inefficient: to correctly match the return type in front of a function, I also have to match the function name itself to be sure that it is indeed the return type of a function. In the end, I match the return type once and the function name twice.
This can get even more complicated in languages like C#, where a function declaration can look like this:
ReturnType SuperInterface.Function();
Matching ReturnType, SuperInterface and Function would require 3 complicated patterns with lots of redundancy if it were done the current way.
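To illustrate the difference, here is a sketch of how the C# case might be written as a single token under the proposed syntax. The token name, the simplified pattern (no generics, arrays or nullable types) and the token names 'class-name' and 'function' are assumptions for illustration; the exec call only checks that one match yields all three parts, so each is matched exactly once.

```javascript
// Hypothetical token using the proposed insideN syntax; the pattern is a
// simplified assumption and ignores generics, arrays, nullable types, etc.
const explicitInterfaceMethod = {
  pattern: /\b([A-Z]\w*)\s+([A-Z]\w*)\.([A-Z]\w*)(?=\s*\()/,
  inside1: "class-name", // ReturnType
  inside2: "class-name", // SuperInterface
  inside3: "function", // Function
};

// One match yields all three parts.
const m = explicitInterfaceMethod.pattern.exec("ReturnType SuperInterface.Function();");
console.log(m[1], m[2], m[3]); // ReturnType SuperInterface Function
```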