parser: parse Unicode identifiers #218

belochub · 2017-06-12T10:56:21Z

Refs: https://github.com/metarhia/jstp/issues/152

belochub · 2017-06-12T10:56:29Z

@aqrln, could you help me create a pre-build configuration script that makes it possible to choose which of the two implementations to use (see 8942748#diff-9689e209b75f216133f2b377028d89f6R9)?

nechaido

LGTM

lundibundi

LGTM

lundibundi · 2017-06-14T11:40:31Z

tools/make-unicode-tables.js

+const getOutputPath = filename => path.join(__dirname, '../src', filename);
+
+const getFileHeader = filename =>
+`// Copyright (c) 2017 JSTP project authors. Use of this source code is


Maybe it would be better to put the license in a separate file?

belochub

In my opinion it is good to have it here, because it will probably help to avoid any legal problems, given that this file is also a piece of software that works with Unicode data files. Consider this a precaution: I don't know for sure, but I also don't want to find out the hard way.

It also helps to understand the format of the output files, so that you don't need to look anywhere else or open these auto-generated files (which are pretty big and can slow down some editors) to understand their contents.

lundibundi

Fine with me =), reasonable enough.

aqrln · 2017-06-15T12:42:04Z

@belochub can you please rebase to resolve conflicts?

aqrln · 2017-06-15T12:58:07Z

tools/make-unicode-tables.js

+
+http.get(UCD_LINK, (res) => {
+  const linereader = byline.createStream(res);
+  linereader.on('data', (line) => {


Well, wouldn't the core readline module be sufficient to accomplish this?

belochub

Yeah, it probably is sufficient; I missed that.
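For reference, a minimal sketch of the readline-based approach suggested above. It assumes the input follows the `UnicodeData.txt` layout (semicolon-separated fields, with the hexadecimal code point first and the general category third). The helper name `collectCategories` and the category list are hypothetical, not the names used in tools/make-unicode-tables.js.

```javascript
'use strict';

const readline = require('readline');
const { PassThrough } = require('stream');

// Hypothetical helper: reads a UnicodeData.txt-style stream line by line
// and groups code points by general category.
const collectCategories = (input, categories) =>
  new Promise((resolve, reject) => {
    const result = new Map(categories.map(name => [name, []]));
    const reader = readline.createInterface({ input });
    reader.on('line', (line) => {
      const fields = line.split(';');
      if (fields.length < 3) return; // skip blank or malformed lines
      const codePoint = parseInt(fields[0], 16);
      const bucket = result.get(fields[2]);
      if (bucket && !Number.isNaN(codePoint)) bucket.push(codePoint);
    });
    reader.on('close', () => resolve(result));
    input.on('error', reject);
  });

// Usage with an in-memory stream; an http.get response is also a readable
// stream and can be passed in the same way.
const sample = new PassThrough();
sample.end(
  '0041;LATIN CAPITAL LETTER A;Lu;0\n' +
  '0061;LATIN SMALL LETTER A;Ll;0\n' +
  '0030;DIGIT ZERO;Nd;0\n'
);
collectCategories(sample, ['Lu', 'Ll']).then((tables) => {
  console.log(tables.get('Lu'), tables.get('Ll'));
});
```

Note that the sketch does not expand the `<..., First>` / `<..., Last>` range entries that `UnicodeData.txt` uses for large blocks, so a real tool would need extra handling for those.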

Add tool for getting and parsing needed categories from Unicode Character Database and generating C++ header file with code points arrays.

Also add UTF-8 decoding function.
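To illustrate what such a decoding function has to do, here is a rough sketch in JavaScript. The function in this PR lives in the native parser, so this is not its code; `decodeUtf8CodePoint` and its return shape are hypothetical and only show the general technique of decoding one code point from a byte sequence.

```javascript
'use strict';

// Hypothetical helper: decodes one code point from `bytes` starting at
// `offset`. Returns { codePoint, size }, or null on malformed input.
const decodeUtf8CodePoint = (bytes, offset = 0) => {
  const first = bytes[offset];
  if (first === undefined) return null;
  if (first < 0x80) return { codePoint: first, size: 1 };

  let size, codePoint, min;
  if ((first & 0xE0) === 0xC0) {
    size = 2; codePoint = first & 0x1F; min = 0x80;
  } else if ((first & 0xF0) === 0xE0) {
    size = 3; codePoint = first & 0x0F; min = 0x800;
  } else if ((first & 0xF8) === 0xF0) {
    size = 4; codePoint = first & 0x07; min = 0x10000;
  } else {
    return null; // stray continuation byte or invalid lead byte
  }
  if (offset + size > bytes.length) return null; // truncated sequence

  for (let i = 1; i < size; i++) {
    const next = bytes[offset + i];
    if ((next & 0xC0) !== 0x80) return null; // not a continuation byte
    codePoint = (codePoint << 6) | (next & 0x3F);
  }
  // Reject overlong encodings, surrogates and values beyond U+10FFFF.
  if (codePoint < min || codePoint > 0x10FFFF) return null;
  if (codePoint >= 0xD800 && codePoint <= 0xDFFF) return null;
  return { codePoint, size };
};

// Usage: the bytes D1 8F encode U+044F (CYRILLIC SMALL LETTER YA).
console.log(decodeUtf8CodePoint(Buffer.from([0xD1, 0x8F])));
```

Returning the consumed size together with the code point lets a caller advance through a buffer without re-inspecting the lead byte.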

Add two possible options to use when checking whether the code point is an identifier.
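One way to picture a table-based identifier check is a binary search over a sorted array of code points. The sketch below is only an illustration under that assumption: `FULL_TABLE`, `SHORT_TABLE`, `contains` and `makeIdentifierCheck` are hypothetical names, the sample values are made up for the example, and the actual tables in this PR are generated as C++ headers.

```javascript
'use strict';

// Hypothetical sample tables: sorted arrays of code points. The real
// generated tables are far larger and their contents are not shown here.
const FULL_TABLE = [0x41, 0x61, 0x3B1, 0x44F, 0x4E2D];
const SHORT_TABLE = [0x41, 0x61];

// Binary search for `codePoint` in a sorted array of code points.
const contains = (table, codePoint) => {
  let low = 0;
  let high = table.length - 1;
  while (low <= high) {
    const mid = (low + high) >>> 1;
    if (table[mid] === codePoint) return true;
    if (table[mid] < codePoint) low = mid + 1;
    else high = mid - 1;
  }
  return false;
};

// Builds a check bound to whichever table was selected.
const makeIdentifierCheck = table => codePoint => contains(table, codePoint);

const isIdentifierFull = makeIdentifierCheck(FULL_TABLE);
const isIdentifierShort = makeIdentifierCheck(SHORT_TABLE);
console.log(isIdentifierFull(0x44F), isIdentifierShort(0x44F));
```

Binding the table once keeps the per-character check identical for both options, so only the table choice differs between them.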

aqrln

Rubber-stamp-ish LGTM with one comment (it can be fixed on landing though).

aqrln · 2017-06-16T07:19:01Z

package-lock.json

@@ -4637,8 +4637,7 @@
            "wordwrap": {
              "version": "0.0.3",
              "resolved": "https://registry.npmjs.org/wordwrap/-/wordwrap-0.0.3.tgz",
-              "integrity": "sha1-o9XabNXAvAAI03I0u68b7WMFkQc=",
-              "dev": true


Huh? Looks like npm correctly handles the case when a package becomes used in dependencies and not only in devDependencies, but does not handle the case when the package is not required for dependencies anymore. Can you please revert this change manually?

Oh, wait, I think you only modified devDependencies. Now I really wonder what happened here.

belochub · 2017-06-16T08:44:50Z

@aqrln, we can't land it yet, you still haven't addressed #218 (comment).

belochub · 2017-06-19T17:47:37Z

@aqrln, ping.

Choose either full or short Unicode tables at build time: if $JSTP_USE_SHORT_UNICODE_TABLES is defined and set to a non-falsy value, then the native addon will be compiled with short tables, otherwise full tables will be used.
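A pre-build script could interpret the variable roughly as follows. This is a sketch only: the commit message does not spell out which values count as falsy, so the `FALSY_VALUES` set and the helper name `useShortUnicodeTables` are assumptions made for illustration.

```javascript
'use strict';

// Assumption: these strings are treated as falsy. The actual build
// configuration may use a different rule.
const FALSY_VALUES = new Set(['', '0', 'false', 'no', 'off']);

// Hypothetical helper: returns true when short tables should be used.
const useShortUnicodeTables = (env = process.env) => {
  const value = env.JSTP_USE_SHORT_UNICODE_TABLES;
  if (value === undefined) return false; // not defined: full tables
  return !FALSY_VALUES.has(value.trim().toLowerCase());
};

console.log(useShortUnicodeTables() ? 'short tables' : 'full tables');
```

Taking the environment as a parameter keeps the check easy to exercise without modifying `process.env`.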

aqrln · 2017-06-20T15:43:53Z

@belochub I've pushed a commit to your branch.
@nechaido @lundibundi PTAL

* Add tool for getting and parsing needed categories from Unicode Character Database and generating C++ header file with code points arrays.
* Add UTF-8 decoding function.
* Implement Unicode keys parsing and add two possible options to use when checking whether the code point is an identifier.

Refs: https://github.com/metarhia/jstp/issues/152
PR-URL: #218
Reviewed-By: Dmytro Nechai <nechaido@gmail.com>
Reviewed-By: Denys Otrishko <shishugi@gmail.com>
Reviewed-By: Alexey Orlenko <eaglexrlnk@gmail.com>

Choose either full or short Unicode tables at build time: if $JSTP_USE_SHORT_UNICODE_TABLES is defined and set to a non-falsy value, then the native addon will be compiled with short tables, otherwise full tables will be used.

PR-URL: #218
Reviewed-By: Mykola Bilochub <nbelochub@gmail.com>
Reviewed-By: Denys Otrishko <shishugi@gmail.com>

Choose either full or short Unicode tables at build time: if $JSTP_USE_SHORT_UNICODE_TABLES is defined and set to a non-falsy value, then the native addon will be compiled with short tables, otherwise full tables will be used.

PR-URL: #218
Reviewed-By: Denys Otrishko <shishugi@gmail.com>

aqrln · 2017-06-21T07:10:21Z

Landed in 4e942ae and 4b67cbd.

belochub added the parser label Jun 12, 2017

belochub added this to the 1.0.0 milestone Jun 12, 2017

belochub requested review from aqrln, lundibundi and nechaido June 12, 2017 10:56

belochub changed the title ~~parser: parse unicode identifiers~~ parser: parse Unicode identifiers Jun 12, 2017

belochub mentioned this pull request Jun 12, 2017

parser: parse Unicode escape sequences in keys #219

Closed

nechaido approved these changes Jun 14, 2017

lundibundi approved these changes Jun 14, 2017

aqrln reviewed Jun 15, 2017

belochub added 3 commits June 15, 2017 15:58

tools,parser: add Unicode utilities

2d5aeea

Add tool for getting and parsing needed categories from Unicode Character Database and generating C++ header file with code points arrays.

parser: add stubs for needed character checks

4971733

Also add UTF-8 decoding function.

parser: implement Unicode keys parsing

acf979b

Add two possible options to use when checking whether the code point is an identifier.

belochub force-pushed the parser-unicode-identifiers branch from e3e54ab to acf979b Compare June 15, 2017 12:59

belochub added 2 commits June 15, 2017 16:47

fixup! tools,parser: add Unicode utilities

c1ed412

fixup! fixup! tools,parser: add Unicode utilities

a2e295b

aqrln approved these changes Jun 16, 2017

belochub added the blocked label Jun 16, 2017

build: choose the Unicode tables via env variable

0bd365a

Choose either full or short Unicode tables at build time: if $JSTP_USE_SHORT_UNICODE_TABLES is defined and set to a non-falsy value, then the native addon will be compiled with short tables, otherwise full tables will be used.

aqrln removed the blocked label Jun 20, 2017

lundibundi approved these changes Jun 20, 2017

aqrln closed this Jun 21, 2017

aqrln deleted the parser-unicode-identifiers branch June 21, 2017 07:10

belochub mentioned this pull request Jan 22, 2018

v1.0.0 proposal #311

Merged

aqrln mentioned this pull request Feb 12, 2019

Make the parser compliant with JSON5 specs metarhia/mdsf#40

Open

2 tasks
