-
-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Babylon] Use char codes contants #6727
Conversation
@@ -924,7 +925,7 @@ export default class Tokenizer extends LocationParser { | |||
|
|||
readNumber(startsWithDot: boolean): void { | |||
const start = this.state.pos; | |||
let octal = this.input.charCodeAt(start) === 0x30; // '0' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the hex representation.
@@ -873,7 +874,7 @@ export default class Tokenizer extends LocationParser { | |||
val = code - 97 + 10; // a | |||
} else if (code >= 65) { | |||
val = code - 65 + 10; // A | |||
} else if (code >= 48 && code <= 57) { | |||
} else if (charCodes.isIn09(code)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some helpers for ranges.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imho this should be called isDigit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isDigit looks more meaningful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point.
@@ -328,31 +329,31 @@ export default class Tokenizer extends LocationParser { | |||
loop: while (this.state.pos < this.input.length) { | |||
const ch = this.input.charCodeAt(this.state.pos); | |||
switch (ch) { | |||
case 32: // space | |||
case 160: // non-breaking space | |||
case charCodes.space: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the number representation.
@@ -0,0 +1,26 @@ | |||
// @flow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this a separate npm package.
I didn't find any module which does the inverse operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't find any either. We could technically publish it on NPM.
I don't want to use an existing module because we don't use every char code and I wanted to add functions for ranges. That way we only have the necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would love to have this published separately and supporting all (most?) char codes
im always adding such constants in my projects, would be cool to just have a shareable module
That way we only have the necessary.
imho keeping more is a non-issue, maintenance costs of such package should be really low too as this stuff never changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, we can do this. Do we publish a module from the Babel org? It seems not really related.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depends on what kind of control do you want to have over it. Im really fine with any "owner" for it - if you feel it doesn't belong to babel maybe just release it under ur name, make it a dep and invite whole babel as collab.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created https://github.com/xtuc/charcodes, ping me if you need collab/publish.
Can I invite a whole Babel group at once?
|
||
type Char = number; | ||
|
||
const charCodes = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of exporting an object it would be better to export them as named exports to support better tree-shakeability of those
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would also enable better inlining of this values i think.
Can't we just use strings? |
some strings are hard to express with a string, i.e. paragraph separator - |
@@ -4,6 +4,7 @@ | |||
|
|||
import type { Options } from "../options"; | |||
import type { Position } from "../util/location"; | |||
import charCodes from "../util/charCodes"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be charCodes, { isDigit }
?
Guess once we make them individual exports we can just do * as charCodes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I mentally planned to use the * import.
Build successful! You can test your changes in the REPL here: https://babeljs.io/repl/build/5864/ |
There are also some cases in |
@hzoo yes of course, I just started. |
I was wondering what would be the best representation for our constants. Currently they are using @bmeurer, do you know? 😇 |
This will definitely be slower in Node currently, because
|
We could replace them at compile time (maybe with babel-macros?) |
That'd work. |
I'll take src/tokenizer/index.js |
Does anyone has a clue about the error in the CI: https://travis-ci.org/babel/babel/jobs/299171334#L929-L962? I don't think it's actually related to this PR. |
case 34: | ||
case 39: // '"', "'" | ||
case charCodes.questionMark: | ||
case charCodes.apostrophe: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That error is because here it should be "
and '
, not ?
and '
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch @nicolo-ribaudo, thanks you so much!
Is that any number in the source code other than charcodes? If not we could enable |
@@ -114,7 +117,7 @@ function codePointToString(code: number): string { | |||
} else { | |||
return String.fromCharCode( | |||
((code - 0x10000) >> 10) + 0xd800, | |||
((code - 0x10000) & 1023) + 0xdc00, | |||
((code - 0x10000) & charCodes.lowercaseF) + 0xdc00, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
f is 102, not 1023
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's due to my search-and-replace. I ended up with charCodes.lowercaseF3
at this place. Thanks, i'll update it.
@@ -230,7 +233,7 @@ export default class Tokenizer extends LocationParser { | |||
readToken(code: number): void { | |||
// Identifier or keyword. '\uXXXX' sequences are allowed in | |||
// identifiers, so '\' also dispatches to that. | |||
if (isIdentifierStart(code) || code === 92 /* '\' */) { | |||
if (isIdentifierStart(code) || code === charCodes.backslash /* '\' */) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here the comment can be removed
@nicolo-ribaudo I would like to get rid of all the magic numbers. But I guess that will be an iterative process because they are still magic numbers to me (I don't know what to replace them). |
@@ -363,7 +368,7 @@ export default class Tokenizer extends LocationParser { | |||
|
|||
default: | |||
if ( | |||
(ch > 8 && ch < 14) || | |||
(ch > charCodes.backSpace && ch < charCodes.shiftOut) || | |||
(ch >= 5760 && nonASCIIwhitespace.test(String.fromCharCode(ch))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
5760
can be converted to a constant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I have it here https://github.com/xtuc/charcodes/blob/master/packages/charcodes/src/index.js#L103. But it's kind of a strange char. I would prefer to use a function which tests against a range. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what would be the function's name though 🤣
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point 😂 no idea
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
charcodes.isUnusualWhiteSpace
? 🤔 😂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could make it handle also the charCodes.space
and charCodes.nonBreakingSpace
and call it charCodes.isSpaceSeparator
(https://www.compart.com/en/unicode/category/Zs)
@rajzshkr any update on your files? Can I help you? |
Why do you think we should remove it? It doesn't use char codes, it just says "read an hex int of a given length".
After thinking about it a bit, I think a function with a descriptive name (like |
I don't think that's a good idea actually, we are going to support more chars than Babylon is currently doing. And I will end up just the moving nonASCIIwhitespace list from Babylon's source. And I did the JSX plugin file. |
Can we merge this? I'm not sure what to do with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Im only wondering if transform-charcodes
could possibly get more generalized, probably would be useful for people to have such 'transform-inline-modules' or something
that ofc is just an idea and not something to do within this PR
@Andarist transform-inline-modules is more or less rollup 😛 |
Nice one! It won't inline variables though :P |
packages/babylon/package.json
Outdated
@@ -25,6 +25,8 @@ | |||
"devDependencies": { | |||
"@babel/helper-fixtures": "7.0.0-beta.31", | |||
"babel-plugin-transform-for-of-as-array": "1.0.4", | |||
"babel-plugin-transform-charcodes": "0.0.7", | |||
"charcodes": "0.0.6", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be 0.0.8
. Also, maybe https://github.com/xtuc/charcodes/blob/c08d3c5906475f0e2e7b7d2b316c8abdef526e02/packages/babel-plugin-transform-charcodes/package.json#L45 should be a peer dependency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes right. The transform doesn't require a specific version of charCodes
, it's probably a good idea to use a peer dependency here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I updated with ^
Ok, I'll go ahead and merge this. Thanks for the reviews. |
Adds a constant file where every charCode is exported.