Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C# improvements #1444

Merged
merged 29 commits into from
Mar 24, 2020
Merged

C# improvements #1444

merged 29 commits into from
Mar 24, 2020

Conversation

RunDevelopment
Copy link
Member

@RunDevelopment RunDevelopment commented Jun 16, 2018

This PR is mainly about an improved detection of type names and better support for generic parameters in C#.

It also adds support for the nameof operator and now highlights type parameters the same as classes.

Michael Schmidt added 6 commits June 16, 2018 21:42
Generic methods now allow for nested generic type parameters.
A generic constructor's type name is no longer highlighted as a function.
Added support for type names with generic parameters.
Type parameters are now also marked as class names.
@mAAdhaTTah
Copy link
Member

Minor note: you'll need to commit the minified file for C#. Otherwise, I'll defer to my co-maintainers to review the Regex here.

Michael Schmidt added 4 commits June 19, 2018 01:09
@RunDevelopment
Copy link
Member Author

I added support for foreach variable declarations and improved the number highlighting.

`true` and `false` are now `boolean`
Support for `as` and `is`
Copy link
Contributor

@Golmote Golmote left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi! Thanks for contributing.

During this review, I struggled a bit to understand what each pattern was trying to do. It would be great if the test files reflected the structure of the language definition. That is, there should be one test file per pattern name ("number_feature.test", "keyword_feature.test"), and if you want you can split a complicated pattern (I'm looking at you, "class-name") between multiple test files but please refer each one to its corresponding pattern by "naming" them using a comment.

Doing so, the tests files might get a bit redundant sometimes, when a pattern needs to be next to another pattern to be valid. But at least it makes it obvious what your regexps are trying to match.

components/prism-csharp.js Outdated Show resolved Hide resolved
components/prism-csharp.js Outdated Show resolved Hide resolved
components/prism-csharp.js Outdated Show resolved Hide resolved
components/prism-csharp.js Show resolved Hide resolved
components/prism-csharp.js Outdated Show resolved Hide resolved
tests/languages/csharp/number_feature.test Show resolved Hide resolved
@RunDevelopment
Copy link
Member Author

RunDevelopment commented Jul 8, 2018

Maybe I should explain the main part that matches almost every class name. That is this:

/[A-Z]\w*(?:\.\w+)*(?:<(?:[^>\n=;{}]+|>(?=\s*(?:\[\s*(?:,\s*)*\]\s*)*[,.>?)]))+?>)?(?:\.\w+)*(?:\[\s*(?:,\s*)*\])*/

Seems complicated? Things get a little easier when you take out the part responsable for generics. Then we get:

/[A-Z]\w*(?:\.\w+)*(?:\[\s*(?:,\s*)*\])*/

Now that's friendlier. (I removed the redundant (?:\.\w+)*)
All this does is to match classes, subclasses and their namespace. At the end we include possible array brackets and that's it.

Ok back to the part for generics:

/(?:<(?:[^>\n=;{}]+|>(?=\s*(?:\[\s*(?:,\s*)*\]\s*)*[,.>?)]))+?>)?/

The main problem here is that we try to accomplish the impossible. We try to match structures like Foo<A<B<C<D>, E<F<G>, H>, I<J<K<L>>>>, M>> which require a context free grammar with regular expressions.
Have fun with that, because regular expressions can't count the pointy brackets!

And fun I had. What I did, was to say that there are only a limited number of characters which can follow after a >, if that > is not the one closing the initiale class name.
This yields quite good results but is neither perfect nor very fast. In my testing it's on average about 2~5 times slower than (?:<[^>]+>)? (which can't match nested generic types). In certain edge case it also matches too much but that can be avoided by using lookarounds and cleverly ordering of the different patterns.

I know that this is far from perfect but it work quite reliably in all of my tests and code examples.

@RunDevelopment
Copy link
Member Author

I'm also thinking of adding real support for nullables, because with C# 8.0 coming it might be important.
Needless to say, this will make thinks even more complicated...

Simplified type expressions. This dropps support for tupels.
Moved return type to fix a bug with shadowing methods.
Added and updated tests.
@Golmote
Copy link
Contributor

Golmote commented Jul 8, 2018

The main problem here is that we try to accomplish the impossible. We try to match structures like Foo<A<B<C, E<F, H>, I<J<K>>>, M>> which require a context free grammar with regular expressions.
Have fun with that, because regular expressions can't count the pointy brackets!

In my testing it's on average about 2~5 times slower than (?:<[^>]+>)? (which can't match nested generic types).

Do people really do 5 nested levels of generics in real-world code? The regexp could certainly be faster if we limited Prism's support to a certain number of nested levels (like 2... or maybe 3. Above, the regexp will get really ugly).

@RunDevelopment
Copy link
Member Author

Do people really do 5 nested levels of generics in real-world code?

I certainly hope not, but generics can get messy. (Hopefully not that messy...)

like 2... or maybe 3. Above, the regexp will get really ugly

/<(?:[^<>]+|<(?:[^<>]+|<(?:[^<>]+|<[^<>]+>)+>)+>)+>/

I settled for 4 as the maximum depth. A depth of 3 should be enough for everything I do and 4 should satisfy even hardliners. It's probably a little overkill but better safe than sorry.
I also choose to not do any other validation as before to include things like arrays, tuples and nullables. This means that it also includes trash like Foo<.............>, but we're not a code validator, are we?

Btw. That beauty above is 9 characters less than its predecessor.

@Golmote
Copy link
Contributor

Golmote commented Jul 8, 2018

but we're not a code validator, are we?

We're definitely not!

The regexp in your last message looks ok to me. I'll review again once the PR gets updated.

@RunDevelopment
Copy link
Member Author

RunDevelopment commented Jul 9, 2018

Now there is a small problem. Consider the regex for generic methods simplified for a max depth of 2:

/\w+\s*<(?:[^<>]+|<[^<>]+>)+>(?=\s*\()/

Just try to match the text

for (Int32 i = 0; i < count; i++); foreach (Foo e in Elements);

and see how long it takes for you. It will terminate returning that no match was found, but only after exponential time.
To match this text it will take my computer approximately 30 minutes...

@magkal
Copy link

magkal commented Nov 22, 2018

Any updates on this PR? Would be awesome if it could get merged within a few months. I realize you seem to be swamped with work.

@RunDevelopment
Copy link
Member Author

I will redo this PR once #1538, #1664, and #1679 are through.

@RunDevelopment RunDevelopment mentioned this pull request Mar 10, 2019
@RunDevelopment
Copy link
Member Author

RunDevelopment commented Jan 7, 2020

So here's an update for this PR. Aside from explicit implementation (which I can't do without #1679) and C# 8.0 ranges edit: added, every feature of C# I know about is supported.

Added features include:

  • String interpolation
  • New C# 8.0 keywords
  • Nullable reference types
  • Named arguments
  • Ranges

@NotoriousRebel
Copy link

any updates on this?

@RunDevelopment
Copy link
Member Author

It just needs someone to review.

components/prism-csharp.js Outdated Show resolved Hide resolved
@RunDevelopment
Copy link
Member Author

Thanks for the review @Golmote!

Apparently, Travis is taking the day off. I'll try to get it to run and then merge.

@RunDevelopment
Copy link
Member Author

Ok. Travis actually ran and succeeded but the status isn't displayed here on GH.
Interesting.

@RunDevelopment RunDevelopment merged commit 42b1546 into PrismJS:master Mar 24, 2020
quentinvernot pushed a commit to TankerHQ/prismjs that referenced this pull request Sep 11, 2020
A complete rewrite of the C# language to support almost every feature of C# 8.0.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants