Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(java) Scientific notation numbers are only legal for floating points #2508

Closed
0xflotus opened this issue Apr 24, 2020 · 19 comments · Fixed by #2509
Closed

(java) Scientific notation numbers are only legal for floating points #2508

0xflotus opened this issue Apr 24, 2020 · 19 comments · Fixed by #2509
Labels
bug help welcome Could use help from community language

Comments

@0xflotus
Copy link
Contributor

0xflotus commented Apr 24, 2020

Describe the issue

HighlightJS highlights illegal number literals

Which language seems to have the issue?

java

Are you using highlight or highlightAuto?

highlight

Sample Code to Reproduce

Case a and b should be illegal, case c and d are fine.

https://jsfiddle.net/e1xnqskb/

a = 2e4L;
b = 2e3l;
c = 2e3f;
d = 1.2e4D;

Expected behavior

I would expect that illegal number literals are not highlighted

Additional context

It's a similar issue to #2507

@0xflotus 0xflotus added bug help welcome Could use help from community language labels Apr 24, 2020
@0xflotus 0xflotus changed the title (java) Scientific numbers are floating points (java) Scientific notation numbers are only legal for floating points Apr 24, 2020
@joshgoebel
Copy link
Member

You want to make an attempt at a patch? Fixing this will likely require splitting the huge regex we currently have up into variants to match the different types...

@0xflotus
Copy link
Contributor Author

0xflotus commented Apr 25, 2020

I tried the following code locally but it doesn't work.

var JAVA_NUMBER_RE = '\\b' +
    '(' +
    '(' +
    '0[bB]([01]+[01_]+[01]+|[01]+)' + // 0b...
    '|' +
    '0[xX]([a-fA-F0-9]+[a-fA-F0-9_]+[a-fA-F0-9]+|[a-fA-F0-9]+)' + // 0x...
    '|' +
    '(' +
    '([\\d]+[\\d_]+[\\d]+|[\\d]+)(\\.([\\d]+[\\d_]+[\\d]+|[\\d]+))?' +
    '|' +
    '\\.([\\d]+[\\d_]+[\\d]+|[\\d]+)' +
    ')' +
    ')' +
    '[lLfF]?' + '|' + '(' +
    '([\\d]+[\\d_]+[\\d]+|[\\d]+)(\\.([\\d]+[\\d_]+[\\d]+|[\\d]+))?' +
    '|' +
    '\\.([\\d]+[\\d_]+[\\d]+|[\\d]+)' +
    ')' +
    '([eE][-+]?\\d+)?' + '[fF]?' + ')';

Regexes are really hard. :D

Sorry for bad indentation :(

@joshgoebel
Copy link
Member

joshgoebel commented Apr 25, 2020

Yeah I wouldn't accept that anyways, we need to switch to variants. Look at how some of the other syntaxes use variant for numbers. :-)

And we'll need some markup tests for as many number combos as you want to come up with. It's possible we already have some.

@joshgoebel
Copy link
Member

There are some nice apps that can make testing them easier... on Mac: Patterns, Expressions, etc

@0xflotus
Copy link
Contributor Author

Yeah ok, I will try again with variants :-)

For sure. I used https://regex101.com/ to test manually the regex with some cases

@joshgoebel
Copy link
Member

eah ok, I will try again with variants :-)

It should be easier when you only have to get one type working at a time vs a huge regex for them all, etc...

@0xflotus
Copy link
Contributor Author

btw shouldn't it be the same like hljs.C_NUMBER_RE ?

@joshgoebel
Copy link
Member

joshgoebel commented Apr 25, 2020

That might work for some of the variants... See arcade for an example:

   variants: [
      { begin: '\\b(0[bB][01]+)' },
      { begin: '\\b(0[oO][0-7]+)' },
      { begin: hljs.C_NUMBER_RE }
    ],

@joshgoebel
Copy link
Member

We already have tests so you'd just be exepanding those: test/markup/java/numbers.txt

@0xflotus
Copy link
Contributor Author

That might work for some of the variants... See arcade for an example:

   variants: [
      { begin: '\\b(0[bB][01]+)' },
      { begin: '\\b(0[oO][0-7]+)' },
      { begin: hljs.C_NUMBER_RE }
    ],

I will try it, it's similar to javascript.

@0xflotus
Copy link
Contributor Author

It's almost hard, because of some other facts:

0779 is a valid octal number
0778 not
octal numbers may have a suffix of [fFdDlL]
binary and hexadecimal numbers are integers and may only have the suffix [lL]
but hexadecimal floating point numbers could have only suffix [dDfF]

@joshgoebel
Copy link
Member

0779 is a valid octal number

It is? How so?

@joshgoebel
Copy link
Member

Also, we don't necessarily have to NOT highlight invalid combos as long as we highlight valid ones well. If technically the regex matches "too much" but those things aren't likely to come up in programs then it really doesn't matter.

I meant to ask what led to you reporting this bug in the first place?

@0xflotus
Copy link
Contributor Author

0779 is a valid octal number

It is? How so?

Oh no, copy paste error. 0779 is invalid, 077 is valid.

I meant to ask what led to you reporting this bug in the first place?

Maybe you are right, highlightJS is not a compiler, it's only a code highlighter. Maybe this was the wrong attempt.

@joshgoebel
Copy link
Member

I'd think octal is pretty easy: it's just a leading 0 followed by 0-7, no?

@0xflotus
Copy link
Contributor Author

I'd think octal is pretty easy: it's just a leading 0 followed by 0-7, no?

'\\b(0[0-7]+)[dDfFlL]?' yeah this is easy, but with other the other rules 0779 was valid too, but this is ok?!?

@joshgoebel
Copy link
Member

Looks like a start... add some more tests, get something that passes and I'll review it. Worry about them each one at a time... octal shouldn't worry if others will match... all we care about is if it's a number or not... we don't really care WHICH rule matches it.

@0xflotus
Copy link
Contributor Author

#2509 is my try to fix #2507 and this issue

@joshgoebel
Copy link
Member

Please add these test cases also then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug help welcome Could use help from community language
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants