Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revisit checking in enough source to build "small-icu" into the node repo #3476

Closed
srl295 opened this issue Oct 21, 2015 · 26 comments
Closed
Assignees
Labels
i18n-api Issues and PRs related to the i18n implementation.

Comments

@srl295
Copy link
Member

srl295 commented Oct 21, 2015

from discussion on #3460 and others

Theoretically, a stripped down ICU source base might be 23M on disk / 4.3M compressed - this is including the 1.5M "small ICU" (i.e. English only) .dat file in lieu of the 25M one which usually comes in an ICU tarball.

Investigate the actual size needed here. This would include full functionality, just not full data. If you want node with full data you would still be able to: (a) download a full ICU tarball and build with full-icu as an option (just as you can do today) or (b) provide the full .dat file at runtime (just as you can do today)

So the upshot of this change would be that we could make small-icu the configure default without any download needed, and a small impact to the repo (probably checking in source+blob, otherwise checking in a small tarball).

cc @trevnorris @bnoordhuis @jasnell

@srl295 srl295 added the i18n-api Issues and PRs related to the i18n implementation. label Oct 21, 2015
@jasnell
Copy link
Member

jasnell commented Oct 21, 2015

+1 ... with the caveat that we need to investigate how this impacts building on the smaller devices (like rpi). If the impact building icu by default on those devices is significant enough, then we'll need a configure option for optionally turning off the ICU build and set that as the default of CI runs only on the rpi devices.

@srl295
Copy link
Member Author

srl295 commented Oct 21, 2015

@jasnell you could still use --with-intl=none and get today's configure behavior. Should be noted here as well.

@trevnorris
Copy link
Contributor

I'm alright further investigating this approach. The 1.5MB dat file would only be updated every six months or so would have less impact then updating v8 does.

@jasnell
Copy link
Member

jasnell commented Oct 21, 2015

I would add that I'd definitely prefer the source to be checked in directly rather than as a tar ball, even if it does mean the full 23 M. The tarball idea just feels dirty and reduces visibility (well, it's no less visible than what we have now, but ... you know). With changes coming only twice per year, the hit (shouldn't be / won't be) too bad.

@trevnorris
Copy link
Contributor

The source sounds fine. v8 uses ~98MB and openssl uses ~43MB.

@srl295 srl295 self-assigned this Oct 21, 2015
@srl295
Copy link
Member Author

srl295 commented Oct 22, 2015

@silverwind asked

BTW, is building with full-icu by default out of question?

First of all, I'd certainly like that, personally.

Second, "Download-by-default" is out of the question, so to do full-icu by default probably means checking in ICU + full data- which would be something like ~45M on disk (source files) and 13M compressed (delta to the nodejs tarball). The "" has to do with how much of the tooling needs to be stripped out or left ( some might be needed to do an endian swap or to rebuild small vs full data etc.)

I basically have the action here to actually try out these options and report back the deltas.

@silverwind
Copy link
Contributor

@srl295 Does having full-icu packaged have any impact on performance or memory, or is it all loaded on-demand?

@srl295
Copy link
Member Author

srl295 commented Oct 22, 2015

@silverwind It's all demand paged. That's why there's a separate big and little endian version.

@silverwind
Copy link
Contributor

Sounds good, let's see some deltas 😉

@MylesBorins
Copy link
Contributor

is there any progress on this?

@srl295
Copy link
Member Author

srl295 commented Feb 16, 2016 via email

@srl295
Copy link
Member Author

srl295 commented Feb 16, 2016 via email

@jasnell
Copy link
Member

jasnell commented Feb 16, 2016

+1... it's a reasonable size.

@srl295
Copy link
Member Author

srl295 commented Feb 17, 2016

OK- checked in on my branch https://github.com/srl295/node/commits/withicu
I can open a PR but it's not quite to that stage yet. Need to clean up .gitignore

If anyone would like to try this out please do.

Thinking that I want to still allow people to customize which ICU they use - so may need to parameterize deps/icu throughout configure and the .gyp files.

@srl295
Copy link
Member Author

srl295 commented Feb 17, 2016

node_deps

Here's what deps looks like in this branch on daisydisk. Blue is ICU- the gray bar on the far left is the ICU data file. The yellow area is v8's test cases.

@srl295
Copy link
Member Author

srl295 commented Feb 17, 2016

I'll keep slicing away at it— but like to get a +1 (thanks James - @nodejs/ctc ?) for

  • checking in the above ^ code - perhaps as deps/icu4c?
  • making small-icu the default for ./configure (keeping none as an option)
  • continuing to honor --with-icu-* parameters for users that want to build with a specific version
  • remove code that downloads ICU from any special URL (unless the user provides one).

One idea might be to check ICU in as deps/icu4c (which is it's full-er abbreviation) so that users who might already be used to rm -rf deps/icu don't wind up with an unbuildable node.

@bnoordhuis
Copy link
Member

Seems like a good approach to me.

@orangemocha
Copy link
Contributor

Sounds reasonable to me. cc @nodejs/build

remove code that downloads ICU from any special URL (unless the user provides one).

We won't need to download ICU in CI anymore if it becomes part of the repo, will we?

@srl295
Copy link
Member Author

srl295 commented Feb 17, 2016

@orangemocha
Correct. No download will be needed. However, one could run ./configure --with-icu-source=https://somethingsomething.example.com/icu.tgz and that would still work.

@rvagg
Copy link
Member

rvagg commented Feb 18, 2016

@srl295 as you chip away at things to make it smaller it's going to complicate the process of landing updates isn't it? If that's the case then you may want to follow the approach that @shigeki has taken with the OpenSSL updates by documenting that needs to be done. I don't recall where this landed but I know he started it somewhere and @thealphanerd used it to upgrade one of the branches in the last round. (can one of you point us to the doc please?).

@MylesBorins
Copy link
Contributor

@srl295
Copy link
Member Author

srl295 commented Feb 18, 2016

@rvagg Yes. Documenting how to do an update is a todo of mine - nodejs/Intl#16

Right now what I am doing is using a python script to do the 'chip away', so it's repeatable.
As of srl295@632add4 :

To update ICU: (updates deps/icu-small )

  • start with a clean node build
  • ./configure --with-intl=small-icu --with-icu-source=http://download.icu-project.org/files/icu4c/56.1/icu4c-56_1-src.zip && make
  • python tools/icu/shrink-icu-src.py

Then, to build with this prepackaged small ICU: (idea is this will be the default, I haven't wired it up so yet):

  • ./configure --with-intl=small-icu --with-icu-source=deps/icu-small

@rvagg rvagg removed the ctc-agenda label Feb 24, 2016
@rvagg
Copy link
Member

rvagg commented Feb 24, 2016

Removed ctc-agenda label, we should try and finish this out on github if possible

@MylesBorins
Copy link
Contributor

@srl295 where are we on this?

@srl295
Copy link
Member Author

srl295 commented Apr 5, 2016

@thealphanerd depends somewhat on #6058 to reduce churn. thanks for the ping…

srl295 added a commit to srl295/node that referenced this issue May 3, 2016
* this commit has "small" ICU 57.1.
See other commit for tools to generate this commit.

Fixes: nodejs#3476
srl295 added a commit to srl295/node that referenced this issue May 3, 2016
…-icu

* Change configure default to "small-icu" (Intl on, English only)
* add "--without-intl" and "vcbuild without-intl" options, equivalent
to --with-intl=none
* update BUILDING.md with above changes
* Checks in tools that generate the deps/icu-small source directory
from ICU source
* Tools and process for updating ICU documented in tools/icu/README.md

nodejs#3476
@srl295 srl295 closed this as completed in 2bbd1cd May 4, 2016
srl295 added a commit that referenced this issue May 4, 2016
…-icu

* Change configure default to "small-icu" (Intl on, English only)
* add "--without-intl" and "vcbuild without-intl" options, equivalent
to --with-intl=none
* update BUILDING.md with above changes
* Checks in tools that generate the deps/icu-small source directory
from ICU source
* Tools and process for updating ICU documented in tools/icu/README.md

Fixes: #3476
PR-URL: #6088
Reviewed-By: James M Snell <jasnell@gmail.com>
evanlucas pushed a commit that referenced this issue May 17, 2016
…-icu

* Change configure default to "small-icu" (Intl on, English only)
* add "--without-intl" and "vcbuild without-intl" options, equivalent
to --with-intl=none
* update BUILDING.md with above changes
* Checks in tools that generate the deps/icu-small source directory
from ICU source
* Tools and process for updating ICU documented in tools/icu/README.md

Fixes: #3476
PR-URL: #6088
Reviewed-By: James M Snell <jasnell@gmail.com>
evanlucas pushed a commit that referenced this issue May 17, 2016
* this commit has "small" ICU 57.1.
See other related commit for tools to generate this commit.

Fixes: #3476
PR-URL: #6088
Reviewed-By: James M Snell <jasnell@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
i18n-api Issues and PRs related to the i18n implementation.
Projects
None yet
Development

No branches or pull requests

8 participants