Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changing syntax behavior for single quote to allow binary and hex literal separator #3310

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

Neko-Box-Coder
Copy link
Contributor

@Neko-Box-Coder Neko-Box-Coder commented May 23, 2024

Modifying the syntax highlighting behavior for binary and hex literal single quote separator which is available from C++14 and C23

My test case

char testGoodChar = 'a'
char testBadChar = 'aba'
char testGoodInvis = '\t'
char testBadInvis = '\tn'
int testBinary = 0b0000'0000;
int testHex = 0x0AB0'aA20;
int testNormal = 2;

Currently:
Note that Red is highlighted as error from the syntax highlighting
image

New Change:
image

@Neko-Box-Coder
Copy link
Contributor Author

Neko-Box-Coder commented May 24, 2024

This is a small change, could we get this merged please if there are no concerns?
@JoeKar @dmaluka

@Neko-Box-Coder
Copy link
Contributor Author

Neko-Box-Coder commented May 31, 2024

Updated skip regex. It was skipping in some cases where it shouldn't, now it won't.
Here's the updated test cases:

char testGoodChar = 'a'
char testBadChar = 'aba'
char testGoodInvis = '\t'
char testBadInvis = '\tn'
int testBinary = 0b0000'0000;
int testHex = 0x0AB0'aA20;
int testNormal = 2;

//Good char if statement
if(testChar[i] == '.')

//Good invisible char if statement
if(testChar[i] == '\t')

//Bad invisible char if statement
if(testChar[i] == '\tn')

//Bad char if statement
if(testChar[i] == 'sfs')

//Good hex If statement
if(testHex[i] == 0x1234'1234)

//Good binary If statement
if(testBin[i] == 0b0000'1111)

@Neko-Box-Coder Neko-Box-Coder force-pushed the CppAndCLiteralSeparatorFix branch from 6cbfe1c to 9ed60c8 Compare May 31, 2024 20:09
runtime/syntax/c.yaml Outdated Show resolved Hide resolved
- constant.number: "(\\b([1-9][0-9]*|0[0-7]*|0[Xx][0-9A-Fa-f]+|0[Bb][01]+)([Uu][Ll]?[Ll]?|[Ll][Ll]?[Uu]?)?\\b)"
# Decimal Floating Constants
- constant.number: "(\\b(([0-9]*[.][0-9]+|[0-9]+[.][0-9]*)([Ee][+-]?[0-9]+)?|[0-9]+[Ee][+-]?[0-9]+)[FfLl]?\\b)"
- constant.number: "(\\b([1-9][0-9']*|0[0-7']*|0[Xx][0-9a-fA-F']+|0[Bb][01']+)([Uu]?[Ll][Ll]?|[Ll][Ll]?[Uu]?)?\\b)"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the switch of the order in the list:
There are still places in the file where it's written the other way around.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change from A-Fa-f to a-fA-F is just to make it "consistent" with cpp.yaml. And I would do it the other way around, i.e. fix cpp.yaml, since A-Fa-f seems to be more conventional. On the other hand, it's not so important.

What is more important is the change from [Uu][Ll]?[Ll]?|[Ll][Ll]?[Uu]? to [Uu]?[Ll][Ll]?|[Ll][Ll]?[Uu]? which is, again, just for consistency with cpp.yaml. And I believe the cpp.yaml version is actually incorrect: it doesn't highlight numbers with U suffix only, without L, e.g. 12345U. So it should be, again, the other way around: cpp.yaml should be fixed.

Also, both these changes are not related to single quote literals, and thus should be done in a separate commit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have reverted the accidental changes for the order

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmaluka
Yeah I was just copying from cpp.yaml cuz I thought these lines are the same (they looked similar) so I just copied and pasted.

I can fix whichever is wrong in this or another PR if needed

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, as I said, I think [Uu]?[Ll][Ll]?|[Ll][Ll]?[Uu]? in cpp.yaml is wrong and should be changed to the c.yaml version. No need for a separate PR for that, it can be done in this PR, but better in a separate commit (within this PR), to keep the commit history comprehensible.

(I also mean that you'd better clean up the commit history that is already present in this PR: instead of 3 commits each of which fixes mistakes of previous commits which didn't exist before this PR and thus are not interesting to anyone, the commit history should rather consist of clear logical changes, i.e. one commit fixes single-quote literals, another commit fixes the issue with U suffix in cpp.yaml, and so on.)

...And about single quote literals: I see this just copies support for them from cpp.yaml, which in itself makes sense (assuming that they have the same syntax in both C++14 and C23). But I think it is incorrect in several ways, so it's a good moment to fix it (in both cpp.yaml and here). At least:

  1. A number cannot end with ', it must have at least one digit at the end.
  2. In hex, oct and bin numbers, ' cannot be right after 0x, 0, or 0b, there must be at least one digit before it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I will apply the ones in c.yaml to cpp.yaml and also see if I can get 1 and 2 working

Copy link
Contributor Author

@Neko-Box-Coder Neko-Box-Coder Jun 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all, I am not sure what to do with floating points since the part before or after the decimal are optional (i.e. 1.f or .1f and hex/bin equivalent). Enforcing the rules are going to make the already long regex be longer I think. I am inclined to just leave it as it is.

As for non floating points, I can enforce 1 of them but not both in one regex. If I do both, the closest I can get is this 0[Bb][01][01']*[01] which is wrong since 0b0 won't work.

I will have to span each representation into multiple lilnes if I have to enforce these 2 rules. Unless you have another way of doing it. I am not the best at regex tbh 😅

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will have to span each representation into multiple lilnes if I have to enforce these 2 rules. Unless you have another way of doing it. I am not the best at regex tbh 😅

As you can see, you can specify multiple regexes for constant.number instead of one very long regex. There are already 3 of them, you can split them further.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have broken them down for each scenario. It should work now.

@Neko-Box-Coder Neko-Box-Coder force-pushed the CppAndCLiteralSeparatorFix branch 2 times, most recently from e6ed23d to 09d19cc Compare June 1, 2024 18:36
@@ -47,9 +47,9 @@ rules:
- constant.string:
start: "'"
end: "'"
skip: "\\\\."
skip: "(\\\\.)|(.*[^']$)"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This almost doesn't make sense to me, and only works if the number is at the end of a line. Why $?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will reorder the commit. That's outdated change

@@ -47,9 +47,9 @@ rules:
- constant.string:
start: "'"
end: "'"
skip: "(\\\\.)|(.*[^']$)"
skip: "(\\\\.)|([^']{4}'[^']{4})"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...And this doesn't make sense to me at all, and doesn't work at all. Even a very basic case like 123'45 is not highlighted as a number.

Instead of these hacks, it seems that the skip regex should contain the actual regex for a number containing single quotes (to ensure that it, well, actually skips all such numbers). I've tried this:

        skip: "(\\\\.)|(\\b[1-9][0-9]*'[0-9']*[0-9]\\b)"

and it seems to work fine, although only for decimal integers without U and L suffixes. So also need to add the rest here: hexadecimals, floating point, U and L suffixes and so on. Yes, the resulting regex is gonna look horrifying, but it seems there is no other way to make it actually work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct that 123'45 won't work. But your change will not work as well since the char highlighting will "leak" to the next line.

image

testBinary and testHex if statement is leaking in this case.
The main point of my change is to make it not "leak" and it's quite tricky to get right.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the test case I am using atm.

char testGoodChar = 'a'
char testBadChar = 'aba'
char testGoodInvis = '\t'
char testBadInvis = '\tn'
int testBinary = 0b0000'0000;
int testHex = 0x0AB0'aA20;
int testNormal = 2;

//Good char if statement
if(testChar[i] == '.')

//Good invisible char if statement
if(testChar[i] == '\t')

//Bad invisible char if statement
if(testChar[i] == '\tn' && 'fs')

const std::unordered_set<char> CharTokens = {'(', ')', ';', '{', '}', '/', '#', ',', '=', '<', '>'};

int test = 123'45;

//Bad char if statement
if(testChar[i] == '.sf')

//Good hex If statement
if(testHex[i] == 0x1234'1234)

//Good binary If statement
if(testBin[i] == 0b0000'1111)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what exactly do you mean by "leaking", but I already see that my change doesn't work correctly in such a basic case:

123'45 'j'

'j' is highlighted as error, while it is obviously a valid character literal.

But, I've tried wit @JoeKar 's PR #3127 and my change works correctly with it.

So it is yet another misbehavior of micro's highlighting of syntax regions, which should be fixed by #3127 once we merge it (after I find some time to properly review it, which I'm constantly struggling to).

Copy link
Contributor Author

@Neko-Box-Coder Neko-Box-Coder Jun 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take a look at the screenshot I replied with, line -21 (up 21) and the line my cursor is on (the one with 26) is highlighting characters to the next line until it meets the next '. Yellow text is the character highlighting.

That screenshot is using your change but this behavior exists even for current master.

This is the main thing I want to address in this PR mainly since working on a codebase with ' as a separator for binary or hex is very inconvenient because it will highlight anything until it hits the next '

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, this is how your example looks with the newest version of your PR:

image

And this is how it looks with #3127 + the following change:

--- a/runtime/syntax/cpp.yaml
+++ b/runtime/syntax/cpp.yaml
@@ -30,7 +30,7 @@ rules:
       # Parenthetical Color
     - symbol.brackets: "[(){}]|\\[|\\]"
       # Integer Literals
-    - constant.number: "(\\b([1-9][0-9']*|0[0-7']*|0[Xx][0-9a-fA-F']+|0[Bb][01]+)([Uu]?[Ll][Ll]?|[Ll][Ll]?[Uu]?)?\\b)"
+    - constant.number: "(\\b([1-9][0-9']*|0[0-7']*|0[Xx][0-9a-fA-F']+|0[Bb][01']+)([Uu]?[Ll][Ll]?|[Ll][Ll]?[Uu]?)?\\b)"
       # Decimal Floating-point Literals
     - constant.number: "(\\b(([0-9']*[.][0-9']+|[0-9']+[.][0-9']*)([Ee][+-]?[0-9']+)?|[0-9']+[Ee][+-]?[0-9']+)[FfLl]?\\b)"
       # Hexadecimal Floating-point Literals
@@ -47,7 +47,7 @@ rules:
     - constant.string:
         start: "'"
         end: "'"
-        skip: "\\\\."
+        skip: "(\\\\.)|(\\b[1-9][0-9]*'[0-9']*[0-9]|0[0-7]+'[0-7']*[0-7]|0[Xx][0-9A-Fa-f]+'[0-9A-Fa-f']*[0-9A-Fa-f]|0[Bb][01]+'[01']*[01]\\b)"
         rules:
             - error: "..+"
             - constant.specialChar: "\\\\([\"'abfnrtv\\\\]|[0-3]?[0-7]{1,2}|x[0-9A-Fa-f]{1,2}|u[0-9A-Fa-f]{4}|U[0-9A-Fa-f]{8})"

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that indeed works 👍
I have modified the changes to that for skip

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[...] after I find some time to properly review it [...]

And I fear that moment, when you finish the review. 🥲

@Neko-Box-Coder Neko-Box-Coder force-pushed the CppAndCLiteralSeparatorFix branch 3 times, most recently from 902c692 to d5ee3f1 Compare June 1, 2024 21:55
rules:
- error: "..+"
- error: "[[:graph:]]{2,}'"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, this is buggy: 'ab' is no longer highlighted as an error. The point of the original "..+" regex is that any character literal containing more than one character is an error (unless it is an escaped character like '\n', which is covered by the skip regex).

Second, what is this for anyway? The skip regex should already ensure that numbers with single-quotes are skipped, so we should not need to mess with this error regex.

Actually I get what this is for: it is a nasty workaround for the bug in micro causing the skip regex being insufficient in some cases (e.g. 123'45 'a'). As I said previously, this bug seems to be fixed in #3127. I've checked that with #3127 + your PR + this error regex changed back to "..+" it works correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmaluka
Could you confirm this again? - error: "[[:graph:]]{2,}'" is highlighting 'ab' for me.

The whole point of this change is because "..+" is not working properly with

const std::unordered_set<char> CharTokens = {'(', ')', ';', '{', '}', '/', '#', ',', '=', '<', '>'};

I have not tried the change in #3127 but if that indeed fixes, I don't mind going back to "..+"
But that means this PR would need to be merged after it though otherwise the line I mentioned above will be highlighted as error

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've double-checked: with your PR as is, 'ab' is correctly highlighted as error. But with your PR applied on top of #3127 it is not. And then, after I replace your error: "[[:graph:]]{2,}'" with the original error: "..+", it is correctly highlighted as error again.

I think this "[[:graph:]]{2,}'" is obviously incorrect (the error regex should not include the single quote, since the single quote is not a part of the string that is highlighted as an error), and happens to somehow work correctly just because, as I said, there is a bug in micro.

@@ -47,9 +47,9 @@ rules:
- constant.string:
start: "'"
end: "'"
skip: "\\\\."
skip: "(\\\\.)|(\\b[1-9][0-9]*'[0-9']*[0-9]|0[0-7]+'[0-7']*[0-7]|0[Xx][0-9A-Fa-f]+'[0-9A-Fa-f']*[0-9A-Fa-f]|0[Bb][01]+'[01']*[01]\\b)"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't mean this to be the final implementation. This is still incomplete: it doesn't include U and L cases, floating point etc. (Yes, we need all that here.)

Also I've realized this is slightly incorrect (apologies for that): need to add parentheses between \b, i.e.: (\\\\.)|(\\b(...)\\b) to ensure that it has word boundaries in all cases (e.g. that it won't skip 0xAA'BBQQ, since 0xAA'BB is not separated from QQ).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW I've just realized that in this skip regex we probably don't need to require the skipped number to contain a single quote, it can be a normal number literal without single quotes, and the result will be the same. IOW the patterns here can be the same as in the constant.number's regexes, e.g. [1-9][0-9']*[0-9] instead of [1-9][0-9]*'[0-9']*[0-9]. The regex then will be simpler (and will work slightly faster).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I adjusted this skip to be close to what you are saying. Unlike constant.number, this just needs to skip anything with ' in between numbers and don't need to care about decimal. It should work fine now.

- constant.number: "(\\b([1-9][0-9']*|0[0-7']*|0[Xx][0-9a-fA-F']+|0[Bb][01]+)([Uu]?[Ll][Ll]?|[Ll][Ll]?[Uu]?)?\\b)"
- constant.number: "(\\b([0-9]|0[0-7]|0[Xx][0-9A-Fa-f]|0[Bb][01])([Uu][Ll]?[Ll]?|[Ll][Ll]?[Uu]?)?\\b)"
- constant.number: "(\\b([1-9][0-9']*[0-9]|0[0-7][0-7']*[0-7])([Uu][Ll]?[Ll]?|[Ll][Ll]?[Uu]?)?\\b)"
- constant.number: "(\\b(0[Xx][0-9A-Fa-f][0-9A-Fa-f']*[0-9A-Fa-f]|0[Bb][01][01']*[01])([Uu][Ll]?[Ll]?|[Ll][Ll]?[Uu]?)?\\b)"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this split looks a bit messy. I'd instead split it into 4 regexes for the 4 clearly separated cases: decimals, octals, hexadecimals and binaries. E.g. for decimals: (\\b([1-9]|[1-9][0-9']*[0-9])\\b) and for the rest similarly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Splitted

# Decimal Floating-point Literals
- constant.number: "(\\b(([0-9']*[.][0-9']+|[0-9']+[.][0-9']*)([Ee][+-]?[0-9']+)?|[0-9']+[Ee][+-]?[0-9']+)[FfLl]?\\b)"
- constant.number: "(\\b(([.][0-9']*[0-9]|[0-9][0-9']*[.]|[0-9][0-9']*[.][0-9']*[0-9])([Ee][+-]?[0-9']+)?|[0-9']+[Ee][+-]?[0-9']+)[FfLl]?\\b)"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this still allows invalid literals like 0'.5 and 0.'5?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@dmaluka
Copy link
Collaborator

dmaluka commented Jun 2, 2024

I'm thinking that maybe we shouldn't add these changes to c.yaml... I don't know anyone using even C11, let alone C23. And adding these complex regexes does have some performance cost. I've checked that when opening a large C file (~40000 lines), with this PR the CPU time spent for initial highlighting increases from ~800 ms to ~1 second.

So since there is no practical need to add this for C, maybe we should leave C alone, and just let C++ programmers suffer the performance penalty.

@Neko-Box-Coder Neko-Box-Coder force-pushed the CppAndCLiteralSeparatorFix branch from 8b62263 to 332e515 Compare June 8, 2024 20:09
@Neko-Box-Coder
Copy link
Contributor Author

Neko-Box-Coder commented Jun 8, 2024

I have just updated the branch to split into all the categories and scenarios for both interger literal and decimal literal. I have only updated for cpp.yaml but intended the same (or similar) change to c.yaml

@dmaluka
I agree that very few people are using both C11 and C23 and this change is very likely not beneficial to them.

However, I disagree on

we shouldn't add these changes to c.yaml

.h is also commonly used for C++. And if there's any ' separator used there, the syntax highlighting will just highlights anything until it hits a ' character (currently). This makes syntax highlighting worse than not highlighting anything.

Having consistency between these 2 will make it slightly easier to maintain. It is relatively rare to open/edit a file as large as ~40000 lines. Of course, I would not want to have a performance impact but I feel like if we are opening a large file like that, we should only be syntax highlighting a portion of a file (with some clever method) or even no syntax highlighting at all.

This is the updated test case btw

char testGoodChar = 'a'
char testBadChar = 'ab'
char testBadChar = 'aba'
char testGoodInvis = '\t'
char testBadInvis = '\tn'
int testBinary = 0b0000'0000;
int testHex = 0x0AB0'aA20;
int testNormal = 2;

//Good char if statement
if(testChar[i] == '.')

//Good invisible char if statement
if(testChar[i] == '\t')

//Bad invisible char if statement
if(testChar[i] == '\tn' && 'fs')

const std::unordered_set<char> CharTokens = {'(', ')', ';', '{', '}', '/', '#', ',', '=', '<', '>'};

int test = 123'45;

//Bad char if statement
if(testChar[i] == '.sf')

//Good hex If statement
if(testHex[i] == 0x1234'1234)

float testFloat = 0x1A.0x1A'1B97;

//Good binary If statement
if(testBin[i] == 0b0000'1111)

int d = 42;
int o = 052'05;
int x = 0x2a;
int X = 0X2A;
int b = 0b101010; // C++14

unsigned long long l1 = 18446744073709550592ull;       // C++11
unsigned long long l2 = 18'446'744'073'709'550'592llu; // C++14
unsigned long long l3 = 1844'6744'0737'0955'0592uLL;   // C++14
unsigned long long l4 = 184467'440737'0'95505'92LLU;   // C++14

456'456.
.13248'4456
0x.456'45
58.
4e2
123.456e-67
123.456e-67f
.1E4f
0x10.1p0
0x1p5
0x1e5
3.14'15'92
1.18e-4932l
3.4028234e38f
3.4028234e38
3.4028234e38l

After this PR, I probably don't want to touch regex for a while 😂

@dmaluka
Copy link
Collaborator

dmaluka commented Jun 8, 2024

.h is also commonly used for C++.

If a C++ header is highlighted using c.yaml instead of cpp.yaml, that is already a problem in itself, since, obviously, it then highlighted as a C file, not as a C++ file.

Micro uses a signature to try to detect when it should highlight .h as C++ instead of C. Feel free to propose improvements to this signature to make it more reliable.

but I feel like if we are opening a large file like that, we should only be syntax highlighting a portion of a file (with some clever method)

Vim does that. As a result, it often highlights incorrectly, since it doesn't know the correct syntax state before the displayed portion of the file in the general case.

@Neko-Box-Coder
Copy link
Contributor Author

Neko-Box-Coder commented Jun 8, 2024

Micro uses a signature to try to detect when it should highlight .h as C++ instead of C. Feel free to propose improvements to this signature to make it more reliable.

Fair enough, I didn't know that. I thought it just uses file extension to determine it. I can add a few more c++ keywords to the signature to make it more robust.

Vim does that. As a result, it often highlights incorrectly, since it doesn't know the correct syntax state before the displayed portion of the file in the general case.

I still think we shouldn't do regex on the whole file if it is too large. Something like separating file by continuous empty newlines might work but that's for the future.

Anyway, I didn't know about the signature part so you are right, it is not a bad idea to just apply this change to cpp.yaml.
I can revert the changes on c.yaml.

I will revert the changes on c.yaml and the [[:graph:]] thing.
Unless #3127 is going to be merged anytime soon, could we go ahead and merge this after I finalized this PR? The error highlight may not be correct in some cases (until #3127) but I think it is better than having it highlight everything incorrectly.

@dmaluka
Copy link
Collaborator

dmaluka commented Jun 8, 2024

I hope we will merge #3127 soon. If not, we should at least add a TODO comment to not forget to remove this [[:graph:]]{2,}' workaround in the future. (BTW, why not just ..+'?)

@Neko-Box-Coder
Copy link
Contributor Author

@dmaluka
Cool, thanks for taking the time reviewing this PR.

(BTW, why not just ..+'?)

It doesn't work with

const std::unordered_set<char> CharTokens = {'(', ')', ';', '{', '}', '/', '#', ',', '=', '<', '>'};

I have no idea why . and [[:graph:]] would make a difference but it does 🤷 .

@Neko-Box-Coder Neko-Box-Coder force-pushed the CppAndCLiteralSeparatorFix branch 3 times, most recently from 70e8826 to 95dcbc0 Compare June 8, 2024 21:43
@@ -42,7 +42,8 @@ rules:
end: "'"
skip: "\\\\."
rules:
- error: "..+"
# TODO: Revert back to - error: "..+" once #3127 is merged
- error: "[[:graph:]]{2,}'"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why need it in c.yaml?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't work with something like

const char charArray[] = {'(', ')', ';', '{', '}', '/', '#', ',', '=', '<', '>'};

right now

@@ -2,7 +2,7 @@ filetype: c++

detect:
filename: "(\\.c(c|pp|xx)$|\\.h(h|pp|xx)?$|\\.ii?$|\\.(def)$)"
signature: "namespace|template|public|protected|private"
signature: "namespace|class|public|protected|private|template|constexpr|noexcept|nullptr|try|throw"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be in a separate commit.
Also "try" might be too common word.
Also I believe all this should be guarded with \b.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to separate commit

@Neko-Box-Coder Neko-Box-Coder force-pushed the CppAndCLiteralSeparatorFix branch from 95dcbc0 to 7b50629 Compare June 8, 2024 21:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants