Group insides #1472

Closed
RunDevelopment opened this issue Jul 9, 2018 · 1 comment

@RunDevelopment
Member

RunDevelopment commented Jul 9, 2018

Group insides

I would like to propose a new way to highlight the insides of matched patterns, similar to the existing inside property.

The idea is to allow further patterns to be applied inside a capturing group.
This would be a generalization of inside.

Syntax

inside<n> will be applied to the n-th capturing group, where 1 refers to the first capturing group that is not the lookbehind group.

Example:

'method-declaration': { 
	pattern: /([A-Z]\w*)\s+(\w+)\s*\([^)]*\)/,
	inside1: {
		'class-name': /.+/
	},
	inside2: {
		'function': /.+/
	},
}

inside1 will be applied to the first capturing group ([A-Z]\w*) and inside2 to the second capturing group (\w+).
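To see which text each inside<n> would receive, here is a small TypeScript sketch that runs the example pattern and collects its capturing groups. It uses [^)]* for the parameter list (a single [^)] only matches a one-character list, so an empty () would never match), and groupTexts is a hypothetical helper, not part of Prism.

```typescript
// The example pattern, with [^)]* so that empty and multi-character parameter lists match.
const methodDeclaration = /([A-Z]\w*)\s+(\w+)\s*\([^)]*\)/;

// Hypothetical helper: maps n to the text of the n-th capturing group,
// i.e. the text that inside<n> would be applied to.
function groupTexts(pattern: RegExp, text: string): Map<number, string> {
  const result = new Map<number, string>();
  const match = pattern.exec(text);
  if (match === null) return result;
  for (let n = 1; n < match.length; n++) {
    const captured = match[n];
    // Groups that did not take part in the match are undefined; skip them.
    if (captured !== undefined) result.set(n, captured);
  }
  return result;
}

const texts = groupTexts(methodDeclaration, "String toString()");
console.log(texts.get(1)); // prints: String   (inside1 -> class-name)
console.log(texts.get(2)); // prints: toString (inside2 -> function)
```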

As a shorthand for:

inside1: {
	'token': /[^]+/
}

we could use:

inside1: 'token'
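A rough TypeScript sketch of how a token's inside<n> keys could be collected and the string shorthand expanded. Grammar, TokenDefinition, expandShorthand and collectGroupInsides are hypothetical simplifications, not Prism's types; the lookbehind offset is my reading of "1 refers to the first capturing group that is not the lookbehind group".

```typescript
// Simplified model of a Prism grammar: token names mapped to patterns.
type Grammar = { [tokenName: string]: RegExp };
// Value of an inside<n> key: a full grammar, or the proposed string shorthand.
type GroupInside = Grammar | string;

// Simplified model of a token definition carrying inside<n> keys.
interface TokenDefinition {
  pattern: RegExp;
  lookbehind?: boolean;
  [key: string]: unknown;
}

// Expands the shorthand inside1: 'token' into { token: /[\s\S]+/ }
// ([\s\S] matches any character, like [^] in the long form).
function expandShorthand(inside: GroupInside): Grammar {
  return typeof inside === "string" ? { [inside]: /[\s\S]+/ } : inside;
}

// Collects the inside<n> keys of a token, keyed by the regex group they target.
// Assumption: with lookbehind: true, regex group 1 is the lookbehind group,
// so inside1 targets regex group 2, inside2 targets group 3, and so on.
function collectGroupInsides(token: TokenDefinition): Map<number, Grammar> {
  const offset = token.lookbehind ? 1 : 0;
  const result = new Map<number, Grammar>();
  for (const key of Object.keys(token)) {
    const m = /^inside([1-9]\d*)$/.exec(key);
    if (m === null) continue; // skips pattern, lookbehind, plain inside, ...
    result.set(Number(m[1]) + offset, expandShorthand(token[key] as GroupInside));
  }
  return result;
}

const methodToken: TokenDefinition = {
  pattern: /([A-Z]\w*)\s+(\w+)\s*\([^)]*\)/,
  inside1: "class-name",
  inside2: { function: /.+/ },
};
const insides = collectGroupInsides(methodToken);
```

Here insides.get(1) holds the expanded shorthand and insides.get(2) the full grammar.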

Alternatives

We could also use $n instead of inside<n>, because names like inside1 are hard to read and 1 looks similar to l.

For the time being, I use inside<n> for simplicity's sake.

Matching

Before we start: any two insides, insideA and insideB, can relate to each other in only two ways:

  1. insideA and insideB are disjoint.
    In this case, the order doesn't matter and they can be highlighted in any order.
  2. insideA fully contains insideB.
    In this case, it is more useful to highlight insideB before insideA.
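The two relations can be checked on group spans (start index plus length). A TypeScript sketch; Span and relate are hypothetical names. One hedge: "no partial overlap" holds for ordinary groups, but a group inside a lookahead, as in /(?=(ab))a(bc)/ on "abc", can partially overlap a later group, so the sketch reports that case as an error instead of guessing.

```typescript
// A group's matched span: start index and length.
interface Span {
  index: number;
  length: number;
}

type Relation = "disjoint" | "a-contains-b" | "b-contains-a";

// Classifies two group spans. Groups outside lookaheads nest like parentheses,
// so they are either disjoint or one contains the other.
function relate(a: Span, b: Span): Relation {
  const aEnd = a.index + a.length;
  const bEnd = b.index + b.length;
  if (aEnd <= b.index || bEnd <= a.index) return "disjoint";
  if (a.index <= b.index && bEnd <= aEnd) return "a-contains-b";
  if (b.index <= a.index && aEnd <= bEnd) return "b-contains-a";
  // Only reachable for partially overlapping spans (e.g. groups inside lookaheads).
  throw new Error("spans partially overlap");
}

// /a(b+)c(d+)e/ on "abbcdde": the two groups are disjoint.
console.log(relate({ index: 1, length: 2 }, { index: 4, length: 2 })); // prints: disjoint
// /a(b(c)d)e/ on "abcde": the outer group contains the inner one, so the inner one goes first.
console.log(relate({ index: 1, length: 3 }, { index: 2, length: 1 })); // prints: a-contains-b
```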

From these relations, a tree emerges. An InsideMatch is a node in this tree, and the root node is the plain inside (the whole match).
If the user does not define a grammar for a node, the node is created with an empty grammar. This ensures that no child node is left without a parent node.

interface InsideMatch {
	index: number;
	length: number;
	text: string;
	children: InsideMatch[]; // disjoint children
	grammar: Object;
}
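One possible way to build this tree from the group spans of a single match: sort the spans so that parents come before their children, then keep a stack of open nodes. This TypeScript sketch rests on two assumptions the proposal does not state: index is taken as an absolute offset into the matched text, and no two spans partially overlap. buildInsideTree and makeNode are hypothetical helpers.

```typescript
// The proposed interface; here index is assumed to be an absolute offset into the match.
interface InsideMatch {
  index: number;
  length: number;
  text: string;
  children: InsideMatch[]; // disjoint children, in order of index
  grammar: object;
}

// Hypothetical helper: creates a node, with an empty grammar by default.
function makeNode(source: string, index: number, length: number, grammar: object = {}): InsideMatch {
  return { index, length, text: source.slice(index, index + length), children: [], grammar };
}

// Arranges flat group spans into a tree below root.
function buildInsideTree(root: InsideMatch, spans: InsideMatch[]): InsideMatch {
  // Parents first: ascending start index, and longer spans first on ties.
  const sorted = spans.slice().sort((a, b) => a.index - b.index || b.length - a.length);
  const open: InsideMatch[] = [root];
  for (const span of sorted) {
    // Close nodes that do not contain this span; the root contains everything.
    while (open.length > 1) {
      const top = open[open.length - 1];
      if (top.index <= span.index && span.index + span.length <= top.index + top.length) break;
      open.pop();
    }
    open[open.length - 1].children.push(span);
    open.push(span);
  }
  return root;
}

// /a(b(c)d)e/ on "abcde": group 1 is "bcd" at 1, group 2 is "c" at 2.
const text = "abcde";
const tree = buildInsideTree(makeNode(text, 0, 5), [makeNode(text, 2, 1), makeNode(text, 1, 3)]);
console.log(tree.children[0].text); // prints: bcd
console.log(tree.children[0].children[0].text); // prints: c
```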

The following pseudocode illustrates how the matching would work.

function matchInside(inside: InsideMatch): (string | Token)[] {
	const tokens = matchDisjunctInsides(inside.text, inside.children);
	Prism.matchGrammar(inside.text, tokens, inside.grammar);
	return tokens;
}
function matchDisjunctInsides(text: string, insides: InsideMatch[]): (string | Token)[] {
	const tokens: (string | Token)[] = [];
	for (const inside of insides) {
		tokens.push(getTextBefore(text, inside));
		tokens.push(...matchInside(inside));
	}
	tokens.push(getRemainingTextAfterInsides(text, insides));
	cleanTokens(tokens);
	return tokens;
}

getTextBefore returns the text between the end of the previous InsideMatch (or the start of the text) and the start of the given InsideMatch.
getRemainingTextAfterInsides returns the text after the last InsideMatch, or the whole text if no InsideMatch was given.
cleanTokens removes empty strings and joins adjacent strings.
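Putting the pseudocode and the three helper descriptions together, a self-contained TypeScript sketch might look like this. Two things are assumptions: Prism.matchGrammar is replaced by a deliberately simplistic toy matchGrammar, and InsideMatch.index is treated as an absolute offset into the whole match, which is why getTextBefore takes the previous child as an extra argument. The Token class is a minimal placeholder, not Prism's.

```typescript
// Minimal placeholder for a highlighted token; Prism's real Token class has more fields.
class Token {
  type: string;
  content: string;
  constructor(type: string, content: string) {
    this.type = type;
    this.content = content;
  }
}
type Grammar = { [type: string]: RegExp };
type TokenStream = Array<string | Token>;

// Assumption: index is an absolute offset into the whole match.
interface InsideMatch {
  index: number;
  length: number;
  text: string;
  children: InsideMatch[]; // disjoint, ordered by index
  grammar: Grammar;
}

// Toy stand-in for Prism.matchGrammar (not Prism's algorithm): for each pattern in
// grammar order, splits every remaining plain string around the pattern's matches.
function matchGrammar(tokens: TokenStream, grammar: Grammar): void {
  for (const type of Object.keys(grammar)) {
    const pattern = new RegExp(grammar[type].source, "g");
    for (let i = 0; i < tokens.length; i++) {
      const str = tokens[i];
      if (typeof str !== "string") continue; // already-highlighted text is left alone
      const pieces: TokenStream = [];
      let last = 0;
      let m: RegExpExecArray | null;
      while ((m = pattern.exec(str)) !== null) {
        if (m[0] === "") { pattern.lastIndex++; continue; } // skip empty matches
        pieces.push(str.slice(last, m.index), new Token(type, m[0]));
        last = m.index + m[0].length;
      }
      pieces.push(str.slice(last));
      const kept = pieces.filter((p) => p !== "");
      tokens.splice(i, 1, ...kept);
      i += kept.length - 1; // continue after the inserted pieces
    }
  }
}

// Text between the end of the previous child (or the parent's start) and child.
function getTextBefore(parent: InsideMatch, previous: InsideMatch | undefined, child: InsideMatch): string {
  const start = previous ? previous.index + previous.length : parent.index;
  return parent.text.slice(start - parent.index, child.index - parent.index);
}

// Text after the last child, or the whole text if there are no children.
function getRemainingTextAfterInsides(parent: InsideMatch): string {
  const last = parent.children[parent.children.length - 1];
  return last ? parent.text.slice(last.index + last.length - parent.index) : parent.text;
}

// Removes empty strings and joins adjacent strings, in place.
function cleanTokens(tokens: TokenStream): void {
  for (let i = tokens.length - 1; i >= 0; i--) {
    const t = tokens[i];
    const next = tokens[i + 1];
    if (t === "") tokens.splice(i, 1);
    else if (typeof t === "string" && typeof next === "string") tokens.splice(i, 2, t + next);
  }
}

// Children are highlighted first; the parent's grammar then fills the remaining text.
function matchInside(inside: InsideMatch): TokenStream {
  const tokens = matchDisjunctInsides(inside);
  matchGrammar(tokens, inside.grammar);
  return tokens;
}

function matchDisjunctInsides(parent: InsideMatch): TokenStream {
  const tokens: TokenStream = [];
  let previous: InsideMatch | undefined;
  for (const child of parent.children) {
    tokens.push(getTextBefore(parent, previous, child));
    tokens.push(...matchInside(child));
    previous = child;
  }
  tokens.push(getRemainingTextAfterInsides(parent));
  cleanTokens(tokens);
  return tokens;
}

// "String toString()": inside1 covers "String" (index 0), inside2 covers "toString" (index 7).
const text = "String toString()";
const node = (index: number, length: number, grammar: Grammar, children: InsideMatch[] = []): InsideMatch =>
  ({ index, length, text: text.slice(index, index + length), children, grammar });
const root = node(0, text.length, { punctuation: /[()]/ }, [
  node(0, 6, { "class-name": /.+/ }),
  node(7, 8, { function: /.+/ }),
]);
const result = matchInside(root);
console.log(result.map((t) => (typeof t === "string" ? JSON.stringify(t) : t.type)).join(" "));
// prints: class-name " " function punctuation punctuation
```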

Implementation

Unfortunately, JS regex matches do not report the positions of capturing groups, so this is a little tricky to implement.

The idea is to do the following:

/a(b+)c(d+)e/ -> /(a)(b+)(c)(d+)e/
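For patterns without nested groups, the rewrite above can be sketched in TypeScript as one left-to-right scan. rewriteFlat is a hypothetical helper and deliberately conservative: it throws on nested, quantified, or named capturing groups and on top-level alternation (wrapping the text before the group in /a|b(c)/ as (a|b) would change its meaning) instead of trying to handle them.

```typescript
// Result of rewriting: the new source and, for each original group n,
// mapping[n] = its number in the rewritten pattern (mapping[0] is unused).
interface RewriteResult {
  source: string;
  mapping: number[];
}

// Sketch for flat patterns only. Backreferences are copied unchanged and
// would still need renumbering.
function rewriteFlat(pattern: RegExp): RewriteResult {
  const src = pattern.source;
  let out = "";
  let pending = ""; // text since the last capturing group, not yet emitted
  let depth = 0; // nesting depth of (?:...), (?=...) and similar groups
  let groups = 0; // capturing groups in the rewritten pattern so far
  const mapping: number[] = [0];
  let i = 0;

  // Reads one escape sequence, character class, or single character.
  const readAtom = (): string => {
    const start = i;
    if (src[i] === "\\") i += 2;
    else if (src[i] === "[") {
      i++;
      while (src[i] !== "]") i += src[i] === "\\" ? 2 : 1;
      i++;
    } else i++;
    return src.slice(start, i);
  };

  while (i < src.length) {
    const c = src[i];
    if (c === "(" && src[i + 1] !== "?") {
      if (depth > 0) throw new Error("groups nested in other groups are not supported");
      if (pending !== "") {
        out += "(" + pending + ")"; // capture everything before this group
        groups++;
        pending = "";
      }
      const start = i++;
      while (src[i] !== ")") {
        if (src[i] === "(") throw new Error("nested groups are not supported");
        readAtom();
      }
      i++;
      if (/[*+?{]/.test(src.charAt(i))) throw new Error("quantified groups are not supported");
      out += src.slice(start, i);
      mapping.push(++groups);
    } else if (c === "(") {
      if (src[i + 2] === "<" && src[i + 3] !== "=" && src[i + 3] !== "!") {
        throw new Error("named groups are not supported");
      }
      depth++;
      pending += c;
      i++;
    } else if (c === ")") {
      depth--;
      pending += c;
      i++;
    } else if (c === "|" && depth === 0) {
      throw new Error("top-level alternation is not supported");
    } else {
      pending += readAtom();
    }
  }
  return { source: out + pending, mapping }; // trailing text stays uncaptured
}

const rewritten = rewriteFlat(/a(b+)c(d+)e/);
console.log(rewritten.source); // prints: (a)(b+)(c)(d+)e
console.log(rewritten.mapping.slice(1).join(", ")); // prints: 2, 4
```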

We rewrite the pattern, adding new groups that capture everything preceding each capturing group. By keeping track of how many groups were added, we can calculate the index of each original group: it is the match index plus the lengths of all groups before it.
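A TypeScript sketch of that index calculation, assuming all rewritten groups are top-level and cover the match contiguously up to the last original group (true for /(a)(b+)(c)(d+)e/). groupPositions and the mapping array (original group number to rewritten group number) are hypothetical.

```typescript
// mapping[n] is the rewritten group number of original group n (mapping[0] is unused).
// Returns positions[n] = start index of original group n; positions[0] is the match start.
function groupPositions(match: RegExpExecArray, mapping: number[]): number[] {
  const positions = [match.index];
  let offset = match.index;
  let next = 1; // first rewritten group whose length has not been added yet
  for (let n = 1; n < mapping.length; n++) {
    // Skip over every rewritten group that precedes original group n.
    for (; next < mapping[n]; next++) offset += (match[next] || "").length;
    positions.push(offset);
  }
  return positions;
}

// /a(b+)c(d+)e/ rewritten as /(a)(b+)(c)(d+)e/: original group 1 -> 2, group 2 -> 4.
const rewrittenPattern = /(a)(b+)(c)(d+)e/;
const match = rewrittenPattern.exec("xxabbbcdde");
if (match !== null) {
  const positions = groupPositions(match, [0, 2, 4]);
  console.log(positions.join(", ")); // prints: 2, 3, 7
}
```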
Nested capturing groups can be handled as well by applying the method described above recursively to the contents of each capturing group that itself contains a capturing group.
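As a worked example of the recursive case, /a(b(c)d)e/ would become /(a)((b)(c)d)e/: the text before the outer group becomes group 1, the outer group becomes group 2, and inside it, the text before the inner group becomes group 3 and the inner group becomes group 4. Positions then follow by adding lengths level by level. This TypeScript sketch uses the hand-rewritten pattern; nestedPositions is a hypothetical helper.

```typescript
// Hand-rewritten form of /a(b(c)d)e/ following the recursive scheme.
const nestedRewritten = /(a)((b)(c)d)e/;

// Returns the start indices of the original outer group (b(c)d) and inner group (c).
function nestedPositions(text: string): { outer: number; inner: number } | null {
  const m = nestedRewritten.exec(text);
  if (m === null) return null;
  const outer = m.index + m[1].length; // skip the text before the outer group
  const inner = outer + m[3].length; // then skip the outer group's text before the inner group
  return { outer, inner };
}

console.log(nestedPositions("zabcde")); // prints: { outer: 2, inner: 3 }
```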

We will have to be careful, because lookbehind groups are sometimes preceded by zero-width assertions such as ^ or \b. To solve this, we can use the established assumption that everything preceding the lookbehind group has length 0.
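For example, with Prism's lookbehind option the first group is the lookbehind, and a zero-width \b may come before it. Under the length-0 assumption, the lookbehind group starts at match.index, so the next group's position needs no extra rewriting. A TypeScript sketch; the pattern and afterLookbehind are hypothetical.

```typescript
// Hypothetical token pattern with lookbehind: true; group 1 ("new ") is the lookbehind,
// group 2 is the class name. The \b before group 1 matches no characters.
const newExpression = /\b(new\s+)([A-Z]\w*)/;

// Start index of the first group after the lookbehind group, or -1 if there is no match.
function afterLookbehind(text: string): number {
  const m = newExpression.exec(text);
  if (m === null) return -1;
  // Assumption from the proposal: everything before the lookbehind group has length 0,
  // so the lookbehind group itself starts at m.index.
  return m.index + m[1].length;
}

const code = "const x = new Foo();";
const start = afterLookbehind(code);
console.log(start, code.slice(start, start + 3)); // prints: 14 Foo
```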

Backreferences might also have to be rewritten, because the new capturing groups shift the numbers of the existing ones.
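A TypeScript sketch of that renumbering: every numbered backreference outside a character class is replaced by the group's new number. renumberBackreferences and the mapping array are hypothetical; named backreferences (\k<name>) are not considered.

```typescript
// mapping[n] is the new number of original group n (mapping[0] is unused).
// Other escapes are copied unchanged, and so is anything inside [...],
// where \1 is not a backreference.
function renumberBackreferences(source: string, mapping: number[]): string {
  let out = "";
  let inClass = false;
  for (let i = 0; i < source.length; i++) {
    const c = source[i];
    if (c === "\\") {
      const digits = /^[1-9]\d*/.exec(source.slice(i + 1));
      const renumbered = !inClass && digits !== null ? mapping[Number(digits[0])] : undefined;
      if (digits !== null && renumbered !== undefined) {
        out += "\\" + renumbered;
        i += digits[0].length; // skip the old group number
      } else {
        out += c + source.charAt(i + 1); // copy any other escape unchanged
        i++;
      }
    } else {
      if (c === "[") inClass = true;
      else if (c === "]") inClass = false;
      out += c;
    }
  }
  return out;
}

// /a(['"])\w+\1/ after the rewrite: "a" became group 1, so the old group 1 is now group 2.
const before = /(a)(['"])\w+\1/.source; // groups added, backreference not yet updated
const after = renumberBackreferences(before, [0, 2]);
console.log(after); // prints: (a)(['"])\w+\2
console.log(new RegExp(after).test("a'xyz'")); // prints: true
```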

Rewriting patterns won't be easy, so we would only do it where we have to, not for every pattern.
(Maybe we could even do it at build time with gulp?)

Limitations

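Backreferences may be a limiting factor: every backreference has to be renumbered after groups are inserted, and it might not always be possible to rewrite a given backreference correctly. (JS itself does accept backreferences beyond \9, such as \10 or \12, as long as the pattern has at least that many capturing groups.)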
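How current JavaScript engines treat backreference numbers above 9 can be checked directly: when a pattern has at least that many capturing groups, \12 is a backreference to the twelfth group. This suggests the hard part is the renumbering itself rather than a limit on group numbers.

```typescript
// Twelve capturing groups followed by \12, a backreference to the twelfth group ("l").
const twelveGroups = /(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k)(l)\12/;
console.log(twelveGroups.test("abcdefghijkll")); // prints: true
console.log(twelveGroups.test("abcdefghijkl")); // prints: false
```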

Use cases

When I want to match the return type of a function in languages like Java, C or C#, I usually write two patterns: one to match the function name and one to match the return type. The problem is that this is inefficient: to correctly match the return type in front of a function, I also have to match the function name itself, to be sure that it is indeed the return type of a function. In the end, I match the return type once and the function name twice.

This can get even more complicated in languages like C#, where a function declaration can look like this:

ReturnType SuperInterface.Function();

Matching ReturnType, SuperInterface and Function would require three complicated patterns with lots of redundancy if it were done the current way.
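With group insides, the C# example could be handled by a single pattern with three groups. The grammar below is a hypothetical sketch of the proposed syntax: the token name explicit-implementation and the simplified pattern (no generics, no namespaces) are assumptions. It is followed by a check of what each group captures.

```typescript
// Proposed syntax from this issue; inside1..inside3 are the hypothetical group insides.
const grammar = {
  "explicit-implementation": {
    pattern: /([A-Z]\w*)\s+([A-Z]\w*)\.(\w+)\s*\(/,
    inside1: "class-name", // ReturnType
    inside2: "class-name", // SuperInterface
    inside3: "function", // Function
  },
};

const m = grammar["explicit-implementation"].pattern.exec("ReturnType SuperInterface.Function();");
console.log(m ? m.slice(1).join(", ") : "no match"); // prints: ReturnType, SuperInterface, Function
```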

@RunDevelopment
Member Author

Closed because of #1679.
