[Babylon] Use char codes constants #6727

xtuc · 2017-11-02T08:11:41Z

Q	A
Fixed Issues?	#6726 (comment)
Patch: Bug Fix?
Major: Breaking Change?
Minor: New Feature?
Tests Added + Pass?	Yes
Documentation PR
Any Dependency Changes?
License	MIT

Adds a constants file that exports every char code used by the parser.

xtuc · 2017-11-02T08:12:08Z

packages/babylon/src/tokenizer/index.js

@@ -924,7 +925,7 @@ export default class Tokenizer extends LocationParser {

  readNumber(startsWithDot: boolean): void {
    const start = this.state.pos;
-    let octal = this.input.charCodeAt(start) === 0x30; // '0'


I removed the hex representation.

xtuc · 2017-11-02T08:12:25Z

packages/babylon/src/tokenizer/index.js

@@ -873,7 +874,7 @@ export default class Tokenizer extends LocationParser {
        val = code - 97 + 10; // a
      } else if (code >= 65) {
        val = code - 65 + 10; // A
-      } else if (code >= 48 && code <= 57) {
+      } else if (charCodes.isIn09(code)) {


I added some helpers for ranges.

imho this should be called isDigit

isDigit looks more meaningful.

Good point.
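A minimal sketch of the range helper discussed in this thread, using the `isDigit` name suggested in review. The constant names (`digit0`, `digit9`, `uppercaseA`, `lowercaseA`) and the `digitValue` function are hypothetical; they only mirror the arithmetic visible in the diff, and the `Infinity` fallback is an assumption.

```javascript
// Char codes used below; the names are illustrative, not necessarily those in the PR.
const digit0 = 48; // '0'
const digit9 = 57; // '9'
const uppercaseA = 65; // 'A'
const lowercaseA = 97; // 'a'

// True when `code` is the char code of an ASCII digit '0'..'9'.
function isDigit(code) {
  return code >= digit0 && code <= digit9;
}

// Numeric value of a digit character, following the branch order in the diff:
// lowercase letters first, then uppercase letters, then decimal digits.
// Anything else yields Infinity (an assumed fallback); the caller is expected
// to compare the result against the radix and reject values that are too large.
function digitValue(code) {
  if (code >= lowercaseA) return code - lowercaseA + 10;
  if (code >= uppercaseA) return code - uppercaseA + 10;
  if (isDigit(code)) return code - digit0;
  return Infinity;
}
```

With a named helper, the `code >= 48 && code <= 57` comparison no longer needs a comment to explain which characters it covers.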

xtuc · 2017-11-02T08:12:35Z

packages/babylon/src/tokenizer/index.js

@@ -328,31 +329,31 @@ export default class Tokenizer extends LocationParser {
    loop: while (this.state.pos < this.input.length) {
      const ch = this.input.charCodeAt(this.state.pos);
      switch (ch) {
-        case 32: // space
-        case 160: // non-breaking space
+        case charCodes.space:


I removed the number representation.

rajasekarm · 2017-11-02T08:22:17Z

packages/babylon/src/util/charCodes.js

@@ -0,0 +1,26 @@
+// @flow


Can we make this a separate npm package?
I didn't find any module which does the inverse operation.

I didn't find any either. We could technically publish it on NPM.

I don't want to use an existing module because we don't use every char code and I wanted to add functions for ranges. That way we only include what's necessary.

I would love to have this published separately and supporting all (most?) char codes.

I'm always adding such constants in my projects; it would be cool to just have a shareable module.

That way we only include what's necessary.

IMHO keeping more is a non-issue; the maintenance cost of such a package should be really low too, as this stuff never changes.

Ok, we can do this. Do we publish a module from the Babel org? It seems not really related.

Depends on what kind of control you want to have over it. I'm really fine with any "owner" for it. If you feel it doesn't belong to Babel, maybe just release it under your name, make it a dep, and invite the whole Babel team as collaborators.

I created https://github.com/xtuc/charcodes, ping me if you need collab/publish.

Can I invite a whole Babel group at once?

Andarist · 2017-11-02T10:15:55Z

packages/babylon/src/util/charCodes.js

+
+type Char = number;
+
+const charCodes = {


Instead of exporting a single object, it would be better to use named exports so that unused constants can be tree-shaken.

Good point.

That would also enable better inlining of these values, I think.

nicolo-ribaudo · 2017-11-02T12:34:14Z

Can't we just use strings? input[pos] instead of input.charCodeAt(pos) and then use strings in comparisons.

Andarist · 2017-11-02T12:43:27Z

Some characters are hard to express as a string, e.g. the paragraph separator: String.fromCharCode(0x2029) just looks like an empty string.
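The point about invisible characters can be sketched as follows. The `paragraphSeparator` name is an assumption chosen for illustration; the comparison itself uses only standard string methods.

```javascript
// U+2029 PARAGRAPH SEPARATOR; the constant name is hypothetical.
const paragraphSeparator = 0x2029;

const input = "a\u2029b";

// Comparing char codes: the constant's name documents the intent.
const byCode = input.charCodeAt(1) === paragraphSeparator;

// Comparing strings: this works too, but only the escape sequence keeps the
// literal readable, because the raw character renders as nothing visible.
const byString = input[1] === "\u2029";

console.log(byCode, byString); // both comparisons agree
```

Both approaches give the same answer; the difference is in how readable the source stays when the character has no visible glyph.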

existentialism · 2017-11-02T16:37:33Z

packages/babylon/src/tokenizer/index.js

@@ -4,6 +4,7 @@

 import type { Options } from "../options";
 import type { Position } from "../util/location";
+import charCodes from "../util/charCodes";


This should be charCodes, { isDigit }?

Guess once we make them individual exports we can just do * as charCodes

Yes, I mentally planned to use the * import.

babel-bot · 2017-11-02T17:07:06Z

Build successful! You can test your changes in the REPL here: https://babeljs.io/repl/build/5864/

hzoo · 2017-11-02T17:40:50Z

There are also some cases in src/plugins/jsx/index.js

xtuc · 2017-11-02T19:06:44Z

@hzoo yes of course, I just started.

Vim 🖤

xtuc · 2017-11-03T08:01:00Z

I was wondering what the best representation for our constants would be. Currently they are decimal numbers; maybe hex would be faster?

@bmeurer, do you know? 😇

bmeurer · 2017-11-03T08:07:56Z

This will definitely be slower in Node currently, because:

you penalize the baseline performance, since you now need at least two loads to get to the constant, whereas before you just had a literal there, and
even the optimized code will need to do at least one load: V8 has machinery for constant-field tracking, but it's not shipping yet, so it doesn't know that those properties are constants.

nicolo-ribaudo · 2017-11-03T08:31:52Z

We could replace them at compile time (maybe with babel-macros?)

bmeurer · 2017-11-03T09:12:24Z

That'd work.
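To illustrate the cost described above and the compile-time replacement suggested in reply, here is a hand-written before and after. The `charCodes` object and both functions are hypothetical; the second function shows what an inlining step is assumed to produce, not the verified output of any particular plugin.

```javascript
// Stand-in for the constants module (hypothetical subset).
const charCodes = { backslash: 92, space: 32 };

// Source form: readable, but each comparison needs a property load at run time.
function isBackslashSource(code) {
  return code === charCodes.backslash;
}

// Assumed form after compile-time inlining: the literal is back in place,
// so the shipped code behaves like the original magic-number version.
function isBackslashInlined(code) {
  return code === 92;
}
```

The two functions are behaviorally identical; the idea is to keep the first in the repository and ship the second.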

rajasekarm · 2017-11-03T12:58:49Z

I'll take src/tokenizer/index.js

xtuc · 2017-11-08T16:44:13Z

Does anyone have a clue about the error in the CI: https://travis-ci.org/babel/babel/jobs/299171334#L929-L962? I don't think it's actually related to this PR.

nicolo-ribaudo · 2017-11-08T16:56:39Z

packages/babylon/src/tokenizer/index.js

-      case 34:
-      case 39: // '"', "'"
+      case charCodes.questionMark:
+      case charCodes.apostrophe:


That error is because this should be " and ', not ? and ': 34 is the char code of ", while ? is 63.

Good catch @nicolo-ribaudo, thank you so much!
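A mix-up like this one can be caught mechanically by checking each constant against the character it is meant to represent. The sketch below assumes hypothetical constant names and a hypothetical `findMismatches` helper; it is not part of the PR.

```javascript
// Hypothetical subset of the constants, including the pair confused in review.
const charCodes = {
  quotationMark: 34, // '"'
  apostrophe: 39, // "'"
  questionMark: 63, // '?'
};

// Pairs each constant name with the character it should stand for.
const expectedChars = {
  quotationMark: '"',
  apostrophe: "'",
  questionMark: "?",
};

// Returns the names whose numeric value does not match the expected character.
function findMismatches(codes, chars) {
  return Object.keys(chars).filter(
    name => codes[name] !== chars[name].charCodeAt(0)
  );
}

console.log(findMismatches(charCodes, expectedChars)); // empty when every constant is right
```

Such a check guards the constants themselves; it would not catch a correct constant used in the wrong place, which is what happened here, so the parser's own tests remain the real safety net.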

nicolo-ribaudo · 2017-11-09T17:00:50Z

Are there any numbers in the source code other than char codes? If not, we could enable no-magic-numbers.

nicolo-ribaudo · 2017-11-09T17:03:44Z

packages/babylon/src/tokenizer/index.js

@@ -114,7 +117,7 @@ function codePointToString(code: number): string {
  } else {
    return String.fromCharCode(
      ((code - 0x10000) >> 10) + 0xd800,
-      ((code - 0x10000) & 1023) + 0xdc00,
+      ((code - 0x10000) & charCodes.lowercaseF) + 0xdc00,


f is 102, not 1023

Yeah, that's due to my search-and-replace. I ended up with charCodes.lowercaseF3 at this place. Thanks, I'll update it.
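For context, 1023 in this hunk is a 10-bit mask (0x3ff) used to build a surrogate pair, not a char code. A minimal sketch of the encoding follows; the `LOW_TEN_BITS` name and the fast path for small code points are assumptions, since the diff only shows the `else` branch.

```javascript
// 1023 (0x3ff) selects the low ten bits of the offset; the name is hypothetical.
const LOW_TEN_BITS = 0x3ff;

function codePointToString(code) {
  // Assumed fast path: code points in the Basic Multilingual Plane need one unit.
  if (code <= 0xffff) {
    return String.fromCharCode(code);
  }
  // Astral code points are split into a high and a low surrogate.
  const offset = code - 0x10000;
  return String.fromCharCode(
    (offset >> 10) + 0xd800, // high surrogate: top ten bits
    (offset & LOW_TEN_BITS) + 0xdc00 // low surrogate: bottom ten bits
  );
}
```

Replacing the mask with 102 would silently corrupt the low surrogate for most astral characters, which is why numeric literals that are not char codes should be left out of a search-and-replace.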

nicolo-ribaudo · 2017-11-09T17:05:39Z

packages/babylon/src/tokenizer/index.js

@@ -230,7 +233,7 @@ export default class Tokenizer extends LocationParser {
  readToken(code: number): void {
    // Identifier or keyword. '\uXXXX' sequences are allowed in
    // identifiers, so '\' also dispatches to that.
-    if (isIdentifierStart(code) || code === 92 /* '\' */) {
+    if (isIdentifierStart(code) || code === charCodes.backslash /* '\' */) {


Here the comment can be removed

xtuc · 2017-11-09T17:05:48Z

@nicolo-ribaudo I would like to get rid of all the magic numbers. But I guess that will be an iterative process, because some of them are still magic numbers to me (I don't know what to replace them with).

nicolo-ribaudo · 2017-11-11T13:48:44Z

packages/babylon/src/tokenizer/index.js

@@ -363,7 +368,7 @@ export default class Tokenizer extends LocationParser {

        default:
          if (
-            (ch > 8 && ch < 14) ||
+            (ch > charCodes.backSpace && ch < charCodes.shiftOut) ||
            (ch >= 5760 && nonASCIIwhitespace.test(String.fromCharCode(ch)))


Can 5760 be converted to a constant?

Yes, I have it here https://github.com/xtuc/charcodes/blob/master/packages/charcodes/src/index.js#L103. But it's kind of a strange char. I would prefer to use a function which tests against a range. What do you think?

I like the idea 👍

what would be the function's name though 🤣

good point 😂 no idea

charcodes.isUnusualWhiteSpace? 🤔 😂

We could make it handle also the charCodes.space and charCodes.nonBreakingSpace and call it charCodes.isSpaceSeparator (https://www.compart.com/en/unicode/category/Zs)
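A rough sketch of what the proposed `isSpaceSeparator` could look like if it covered the Unicode general category Zs. This is an illustration of the suggestion only; it is not the whitespace list Babylon uses, and the function was not part of the PR at this point.

```javascript
// True when `code` is a code point in Unicode general category Zs
// (space separators). 5760 from the diff is 0x1680, the Ogham space mark.
function isSpaceSeparator(code) {
  return (
    code === 0x0020 || // space
    code === 0x00a0 || // no-break space
    code === 0x1680 || // Ogham space mark
    (code >= 0x2000 && code <= 0x200a) || // en quad through hair space
    code === 0x202f || // narrow no-break space
    code === 0x205f || // medium mathematical space
    code === 0x3000 // ideographic space
  );
}
```

Note that line and paragraph separators (U+2028, U+2029) and control characters such as tab belong to other categories, so a tokenizer would still need separate handling for them.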

xtuc · 2017-11-11T17:15:17Z

src/tokenizer/index.js looks OK to me. What's left:

readHexChar I'm not really sure how we can remove it, what do you guys think?
Maybe missing a function as said here, I replaced it with the oghamSpaceMark constant.

@rajzshkr any update on your files? Can I help you?

nicolo-ribaudo · 2017-11-11T17:36:28Z

readHexChar I'm not really sure how we can remove it, what do you guys think?

Why do you think we should remove it? It doesn't use char codes, it just says "read an hex int of a given length".

Maybe missing a function as said here, I replaced it with the oghamSpaceMark constant.

After thinking about it a bit, I think a function with a descriptive name (like charCodes.isSpaceSeparator) would be better, since not many people know what oghamSpaceMark is and why it is there. (I just learned what Ogham is lol)

xtuc · 2017-11-13T12:24:51Z

charCodes.isSpaceSeparator

I don't think that's a good idea actually: we would end up supporting more chars than Babylon currently does, and I would just be moving the nonASCIIwhitespace list out of Babylon's source.

And I did the JSX plugin file.

xtuc · 2017-11-15T09:48:20Z

Can we merge this?

I'm not sure what to do with isSpaceSeparator, but we can merge this for now.

Andarist

I'm only wondering if transform-charcodes could be generalized; it would probably be useful for people to have something like 'transform-inline-modules'.

that ofc is just an idea and not something to do within this PR

nicolo-ribaudo · 2017-11-15T11:26:17Z

@Andarist transform-inline-modules is more or less rollup 😛

Andarist · 2017-11-15T11:38:47Z

@Andarist transform-inline-modules is more or less rollup 😛

Nice one! It won't inline variables though :P

nicolo-ribaudo · 2017-11-15T17:39:28Z

packages/babylon/package.json

@@ -25,6 +25,8 @@
  "devDependencies": {
    "@babel/helper-fixtures": "7.0.0-beta.31",
    "babel-plugin-transform-for-of-as-array": "1.0.4",
+    "babel-plugin-transform-charcodes": "0.0.7",
+    "charcodes": "0.0.6",


This should be 0.0.8. Also, maybe https://github.com/xtuc/charcodes/blob/c08d3c5906475f0e2e7b7d2b316c8abdef526e02/packages/babel-plugin-transform-charcodes/package.json#L45 should be a peer dependency?

Yes right. The transform doesn't require a specific version of charCodes, it's probably a good idea to use a peer dependency here.

Ok I updated with ^

xtuc · 2017-11-16T09:35:37Z

Ok, I'll go ahead and merge this.

Thanks for the reviews.

feat: setup constants

459e289

xtuc commented Nov 2, 2017

xtuc added pkg: parser, PR: Internal 🏠, PR: Polish 💅 labels Nov 2, 2017

xtuc changed the title ~~feat: setup constants~~ [Babylon] Use char codes constants Nov 2, 2017

rajasekarm reviewed Nov 2, 2017

Andarist reviewed Nov 2, 2017

refactor: use charCodes

abb4850

xtuc requested a review from danez November 2, 2017 16:27

existentialism reviewed Nov 2, 2017

refactor: switch to individual exports

b95810f

feat: add more charCodes

a2ed843

xtuc and others added 4 commits November 2, 2017 20:13

feat: generate charCodes

8b206b2

Vim 🖤

feat: sort by value

b51ac5e

Vim 🖤

feat: more charcodes

72a90cb

Merge branch 'master' into feat-use-charcode-constants

233e60c

xtuc added good first issue help wanted labels Nov 3, 2017

fix: use charcode package

b93800d

chore: bump charcode plugin

6b417a8

nicolo-ribaudo reviewed Nov 8, 2017

feat: more charcodes

88c1b4e

nicolo-ribaudo reviewed Nov 9, 2017

fix: minor changes

b79e3c0

nicolo-ribaudo reviewed Nov 11, 2017

feat: more charCodes

097ff40

xtuc removed good first issue help wanted labels Nov 13, 2017

feat: use charcodes in JSX plugin

186f04a

Andarist approved these changes Nov 15, 2017

nicolo-ribaudo approved these changes Nov 15, 2017

nicolo-ribaudo reviewed Nov 15, 2017

xtuc added 2 commits November 15, 2017 19:53

chore: upgrade and fix charcodes

4c584ae

chore: upgrade charcode

57e2c45

xtuc merged commit bb89364 into master Nov 16, 2017

xtuc deleted the feat-use-charcode-constants branch November 16, 2017 09:35

lock bot added the outdated label Oct 5, 2019

lock bot locked as resolved and limited conversation to collaborators Oct 5, 2019
