Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tools: make js2c.py strip comments from sources #11417

Closed
wants to merge 1 commit into from

Conversation

aqrln
Copy link
Contributor

@aqrln aqrln commented Feb 16, 2017

In order to allow using Unicode characters inside comments of built-in JavaScript libraries without forcing them to be stored as UTF-16 data in Node's binary, update the tooling to strip comments during build process. All line breaks are preserved so that line numbers in stack traces aren't broken.

Refs: #11129
Refs: #11371 (comment)

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • commit message follows commit guidelines
Affected core subsystem(s)

tools

@nodejs-github-bot nodejs-github-bot added the build Issues and PRs related to build files or the CI. label Feb 16, 2017
@aqrln
Copy link
Contributor Author

aqrln commented Feb 16, 2017

/cc @bnoordhuis @addaleax

@Fishrock123
Copy link
Contributor

I'm conflicted if we should actually do this. Are comments in the core deployed source useful anywhere for debugging?

@Fishrock123 Fishrock123 self-requested a review February 16, 2017 15:33
@Fishrock123
Copy link
Contributor

Also I was under the impression we'd make the lint rule for ASCII code ignore the comments, rather than removing them. 😕

@aqrln
Copy link
Contributor Author

aqrln commented Feb 16, 2017

@Fishrock123 the idea was to strip comments during build, make the lint rule only enforce ASCII in the code and allow any characters in comments. What about debugging... yeah, the core sources shown in, e.g., Chrome DevTools will be without comments.

addaleax
addaleax previously approved these changes Feb 16, 2017
Copy link
Member

@addaleax addaleax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is okay to do… it saves 250 kB in the final executable, which is, like, not huge compared to the total output size, but still.

/cc @nodejs/python

@targos
Copy link
Member

targos commented Feb 16, 2017

We have at least one comment in lib that is here to help users though:

throw er; // Unhandled 'error' event

$ cat test.js
var EE = require('events');
var e = new EE();
e.emit('error', new Error('test'));

$ node test.js
events.js:160
      throw er; // Unhandled 'error' event
      ^

Error: test
    at Object.<anonymous> (/home/mzasso/test/test.js:3:17)
    at Module._compile (module.js:571:32)
    at Object.Module._extensions..js (module.js:580:10)
    at Module.load (module.js:488:32)
    at tryModuleLoad (module.js:447:12)
    at Function.Module._load (module.js:439:3)
    at Module.runMain (module.js:605:10)
    at run (bootstrap_node.js:420:7)
    at startup (bootstrap_node.js:139:9)
    at bootstrap_node.js:535:3

@aqrln
Copy link
Contributor Author

aqrln commented Feb 16, 2017

@targos right, thanks for pointing that out. What if we introduce some kind of marker (like /// instead of //) to explicitly state that a comment should be kept?

@mscdex
Copy link
Contributor

mscdex commented Feb 16, 2017

'///' could be misinterpreted as a typo and isn't obvious it is intentional or why it was added.

@addaleax addaleax dismissed their stale review February 16, 2017 16:18

the “Unhandled 'error' event” thing should be sorted out first

@aqrln
Copy link
Contributor Author

aqrln commented Feb 16, 2017

@mscdex is your objection related to /// specifically or to the idea of adding some kind of special markers to comments in general?

tools/js2c.py Outdated
if char in ('\'', '"', '`'):
if char == string_mode and not escape_mode:
string_mode = None
elif not string_mode:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer explicit check, like elif string_mode is not None:. Thats more pythonic and readable as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is None :)
Thanks for your feedback, I've pushed a new commit.

@mscdex
Copy link
Contributor

mscdex commented Feb 16, 2017

@aqrln I'd definitely object to '///' or any similar variant, but I'm skeptical about generally having special markers as being a good solution.

In order to allow using Unicode characters inside comments of built-in
JavaScript libraries without forcing them to be stored as UTF-16 data in
Node's binary, update the tooling to strip comments during build
process. All line breaks are preserved so that line numbers in stack
traces aren't broken.

Refs: nodejs#11129
Refs: nodejs#11371 (comment)
@aqrln aqrln force-pushed the build-strip-comments branch from 9c4f699 to 6296a6c Compare February 16, 2017 17:33
@aqrln
Copy link
Contributor Author

aqrln commented Feb 16, 2017

@mscdex well, it indeed sounds like a little kludge, but I have no other ideas for now, though I'll try to come up with a better solution. I'll be grateful if you have any ideas about this.

@addaleax
Copy link
Member

Two alternative ideas:

  • Add a pragma comment format, like the eslint-disable comments that we have, e.g. /* keep-next-comment */ before the respective line.
  • Replace the comment by a statement containing a string instead of a comment, which has pretty much the same effect but won’t be stripped out by the tooling. We could add an actual comment to explain why we’re doing that. ;)

I like the latter one, but maybe only because it’s the easiest way…

@bnoordhuis
Copy link
Member

I'd say a much simpler solution is:

  1. Remove the Unicode diagram from lib/timers.js
  2. Add an eslint rule to enforce ASCII.
  3. (Optional) Add a small file to lib/internal for regression testing of Unicode sources.

The file for 3. could be just this:

// eslint-disable-next-line
module.exports = 'ç';

And the test:

// Flags: --expose_internals
// ...
assert.strictEqual(require('internal/test/unicode'), 'ç');

@addaleax
Copy link
Member

@bnoordhuis Does that mean you’d oppose a change like this?

We can do 3. anyway.

@bnoordhuis
Copy link
Member

'Oppose' is too strong a word, it just seems like over-engineering a fix for a simple problem.

I might change my mind if it turns out comments are a substantial chunk of the embedded sources but I don't think they are.

@aqrln
Copy link
Contributor Author

aqrln commented Feb 16, 2017

Not only we can do 3 anyway, we should have had it, IMO. #11423.

@Fishrock123
Copy link
Contributor

I'm not clear why we acre so much about this anyways. What is actually wrong today?

@addaleax
Copy link
Member

What is actually wrong today?

Not much… forbidding non-ASCII characters in comments just seems like a needless restriction, and it would be nice to keep the timers diagram as-is (the proper lines actually help with readability, imho).

(Aside: If English is your first language you might find it hard to imagine how much encountering “ASCII-only” validation rules feels like being shown the middle finger in general. :/ I do see that that might not be too relevant to Node core.)

I might change my mind if it turns out comments are a substantial chunk of the embedded sources but I don't think they are.

I mentioned it above, the difference is about 250 kB. That’s probably not “substantial”. 😄

@jasnell
Copy link
Member

jasnell commented Feb 16, 2017

I'm really not seeing the need for stripping comments OR having the linting rule for ASCII only. At this point I'm definitely -1 on this.

@bnoordhuis
Copy link
Member

having the linting rule for ASCII only

One possible reason, that you may or may not agree with, is that when you open lib/timers.js in a terminal window that isn't set to UTF-8, it looks like absolute garbage.

@jasnell
Copy link
Member

jasnell commented Feb 16, 2017

I'm fine with the idea of pulling out that large diagram in timers.js, it would likely be better suited for inclusion in a guide anyway. Even so, I'd be -1 on this.

@aqrln
Copy link
Contributor Author

aqrln commented Feb 17, 2017

There has been quite a much of discussion in #11371 and this PR for such a minor issue. Let me describe my point of view.

There are only two question that need to be considered for both PRs:

  1. Do we care about the size of the executable and the memory footprint in terms of tens and hundreds of kilobytes, or such overhead is negligible?
  2. Does anyone in the world actually use _third_party_main.js, and if yes, are there any consequences for their code?

I am -1 on not allowing to use Unicode characters at all (and, as an example, replacing a classy diagram in lib/timers.js with an incomprehensible mess of dashes and pipes), but I have no right to make the decision and it's up to collaborators anyway.

What about this PR, @Fishrock123's counterargument about comments being useful for debugging is very sensible, so after some thinking about it, I'd take @addaleax's idea about keep-next-comment directive and flip it so that the proposal is to keep all comments by default but strip those preceded by a special directive.

But if there's no consensus, I join the position that nothing should really be done here, except, maybe, enforcing ASCII-only characters outside comments so that the linter will catch if someone adds some kind of weird Unicode whitespace character.

@jasnell jasnell added the wip Issues and PRs that are still a work in progress. label Mar 22, 2017
@BridgeAR
Copy link
Member

@aqrln do you want to follow up on this? Right now I do not have the feeling that this will come to a conclusion that everyone likes to live with.

@aqrln aqrln added stalled Issues and PRs that are stalled. and removed wip Issues and PRs that are still a work in progress. labels Aug 26, 2017
@aqrln
Copy link
Contributor Author

aqrln commented Aug 26, 2017

@BridgeAR at this point I don't see enough evidence that it is something that we should do too, right. Let's close this for now so it doesn't clutter the backlog, we can always revisit it later if there is a need to do so.

@aqrln aqrln closed this Aug 26, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build Issues and PRs related to build files or the CI. stalled Issues and PRs that are stalled.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants