new Intl.Segmenter().segment() causes SEGFAULT with --with-intl=small-icu and no runtime ICU data #51752
Comments
Backtrace:
I don't know if this is a V8 bug or if we don't initialize it correctly. /cc @nodejs/i18n-api |
FWIW we strip out ICU break iterator data with small-icu Line 19 in bf39716
Lines 34 to 40 in bf39716
Also https://bugs.chromium.org/p/v8/issues/detail?id=3345 referenced node/tools/icu/icu-generic.gyp Lines 38 to 41 in bf39716
cc @srl295 |
In 2014 there wasn't an Intl.Segmenter, so these should probably be included now. However, V8 shouldn't crash; that should be fixed in V8. We do have some code (IIRC) that detects small-icu and monkeypatches Intl with a warning about small-icu. |
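For readers who want to check from user code which kind of ICU data a given node binary has, here is a minimal sketch. It is not the internal detection code mentioned above. It uses the heuristic from the Node.js internationalization docs (format a month name in Spanish and see whether the result is localized). The names hasFullICU and icuInfo are made up for this example, and a positive result is only a hint: it does not prove that break iterator data is present.

```javascript
// Heuristic: small-icu ships English-only locale data, so a Spanish month
// name only comes back localized when fuller ICU data is available.
function hasFullICU() {
  try {
    const january = new Date(9e8); // a date in January 1970
    const spanish = new Intl.DateTimeFormat('es', { month: 'long' });
    return spanish.format(january) === 'enero';
  } catch {
    // Intl is missing entirely or the formatter could not be created.
    return false;
  }
}

// Hypothetical summary object, only for illustration.
const icuInfo = {
  icuVersion: process.versions.icu, // undefined when node was built without ICU
  hasIntl: typeof Intl === 'object',
  hasFullICU: hasFullICU(),
};

console.log(icuInfo);
```

On a small-icu build without extra data this should report hasFullICU as false, which could be used as a conservative signal to stay away from Intl.Segmenter.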
Apart from the PR for a regression test, is someone also working towards fixing this? Will this be solved by adjusting the ICU configuration in NodeJS to include a locale-unaware fallback, must this be fixed in V8, or both? (If action is required upstream, it would be great to link a ticket.) |
Sorry, I have not looked into this further yet. A couple of different concerns:
|
If you point me to where this can be done (monorail?), I could do this, though I am not sure which information to provide them, so it might make more sense for one of you to do it.
I was thinking of a dummy implementation similar to the other i18n functionality (which to my knowledge either uses very simplistic/naive implementations or ships only with the data required for the English locale), i.e. simply split graphemes at UTF-8 boundaries, words at spaces/punctuation, and sentences at a full stop, question mark, exclamation mark, etc.
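To make the idea concrete, a rough sketch of such a locale-unaware fallback could look like the following. naiveSegment is a hypothetical helper, not an existing Node.js or V8 API, and it does not follow the Unicode segmentation rules: "graphemes" are split per code point, words at whitespace/punctuation, and sentences after ., ! or ?.

```javascript
// Hypothetical helper, not part of Node.js or V8: a locale-unaware
// stand-in for what Intl.Segmenter#segment would return as segments.
function naiveSegment(input, granularity = 'grapheme') {
  const text = String(input);
  switch (granularity) {
    case 'grapheme':
      // Array.from iterates by code point, so surrogate pairs stay together,
      // but combining marks and emoji sequences are still split apart.
      return Array.from(text);
    case 'word':
      // Runs of letters/digits, runs of whitespace, or one other character.
      return text.match(/[\p{L}\p{N}]+|\s+|[^\p{L}\p{N}\s]/gu) ?? [];
    case 'sentence':
      // Text up to and including ., ! or ? plus any trailing whitespace.
      return text.match(/[^.!?]+[.!?]*\s*|[.!?]+\s*/g) ?? [];
    default:
      throw new RangeError(`Unsupported granularity: ${granularity}`);
  }
}

console.log(naiveSegment('Hello, world!', 'word'));
console.log(naiveSegment('Hi there. Are you ok? Yes!', 'sentence'));
```

This returns plain arrays of strings rather than the iterable of segment objects that Intl.Segmenter produces, so it only illustrates the splitting rules, not a drop-in replacement.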
I have no clue about this |
We remove(d) it in some circumstances. Again, I need to dig that out. |
Are you able to provide more information on where this currently sits in the backlog and when this might get resolved? |
I have not been able to look further into this myself. |
Version
v20.10.0
Platform
Linux acfbcf435a24 6.7.4-200.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Feb 5 22:21:14 UTC 2024 x86_64 GNU/Linux
Subsystem
Intl
What steps will reproduce the bug?
Use a NodeJS built with --with-intl=small-icu.
Run new Intl.Segmenter().segment() in the REPL.
In my case that was the nodejs package from Fedora 39 without nodejs-full-i18n. This can be obtained as follows:
podman run --rm -it fedora:39
dnf install -y --setopt=install_weak_deps=False nodejs
How often does it reproduce? Is there a required condition?
Always, as long as no runtime ICU data is present.
What is the expected behavior? Why is that the expected behavior?
NodeJS should fall back to a locale-unaware string separator, not provide Intl.Segmenter at all, or raise a JS exception. The status quo offers no way to know whether segmentation is safe, and calling it results in a segmentation fault and therefore instant termination of the process.
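Since a segmentation fault cannot be caught with try/catch, the only user-land workaround I can think of is to try the call in a throwaway child process first. The sketch below assumes that approach. probeSegmenter and its return values are made up for this example, and spawning a process just to probe is obviously not a real fix.

```javascript
const { spawnSync } = require('node:child_process');

// Hypothetical helper: runs the crashing call in a separate node process
// so that a segmentation fault cannot take down the calling process.
function probeSegmenter(nodeBinary = process.execPath) {
  const script = "Array.from(new Intl.Segmenter().segment('probe'))";
  const result = spawnSync(nodeBinary, ['-e', script], {
    stdio: 'ignore',
    timeout: 10000, // give up after 10 seconds
  });
  if (result.error) return 'unknown';   // child could not be started or timed out
  if (result.signal) return 'crashed';  // killed by a signal, e.g. SIGSEGV
  // Exit code 0: the call worked. Non-zero: it threw a JS exception.
  return result.status === 0 ? 'usable' : 'unavailable';
}

console.log(probeSegmenter());
```

On the Fedora 39 setup described above I would expect this to print crashed, and usable on a build with full ICU data.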
What do you see instead?
A segmentation fault
Additional information
Backtrace
On a Fedora 39 system one should be able to open the coredump with:
DEBUGINFOD_URLS=https://debuginfod.fedoraproject.org/ gdb /usr/bin/node-20 coredump
coredump.zip