Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fs: fix cpSync crash on utf characters #54653

Closed
wants to merge 1 commit into from

Conversation

jazelly
Copy link
Member

@jazelly jazelly commented Aug 30, 2024

Fixes: #54476
Refs: #54653 (comment)

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. fs Issues and PRs related to the fs subsystem / file system. needs-ci PRs that need a full CI run. labels Aug 30, 2024
Copy link

codecov bot commented Aug 30, 2024

Codecov Report

Attention: Patch coverage is 86.04651% with 6 lines in your changes missing coverage. Please review.

Project coverage is 88.04%. Comparing base (59c7c55) to head (32faca8).
Report is 481 commits behind head on main.

Files with missing lines Patch % Lines
src/node_file.cc 85.00% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #54653      +/-   ##
==========================================
+ Coverage   88.03%   88.04%   +0.01%     
==========================================
  Files         652      652              
  Lines      183763   183779      +16     
  Branches    35860    35865       +5     
==========================================
+ Hits       161774   161817      +43     
+ Misses      15237    15228       -9     
+ Partials     6752     6734      -18     
Files with missing lines Coverage Δ
src/util.h 90.16% <100.00%> (+0.24%) ⬆️
src/node_file.cc 77.04% <85.00%> (-0.06%) ⬇️

... and 37 files with indirect coverage changes

@jazelly jazelly force-pushed the fix-cpsync-utf-fail branch 3 times, most recently from 1ba6b97 to d78ea43 Compare August 30, 2024 08:29
@aduh95
Copy link
Contributor

aduh95 commented Aug 30, 2024

It looks like it's consistently failing to build on our macOS GHA CI

@jakecastelli
Copy link
Member

Possibly related to this

ld: write() failed errno=28

which indicates no space left on the device, I assume that is why it fails

@jakecastelli jakecastelli added the request-ci Add this label to start a Jenkins CI on a PR. label Aug 30, 2024
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Aug 30, 2024
@nodejs-github-bot

This comment was marked as outdated.

src/util.h Outdated Show resolved Hide resolved
@jazelly

This comment was marked as outdated.

@jazelly jazelly force-pushed the fix-cpsync-utf-fail branch 2 times, most recently from 8362525 to 52d0c3a Compare August 31, 2024 13:06
@jazelly jazelly force-pushed the fix-cpsync-utf-fail branch 7 times, most recently from fe5f947 to a09e59f Compare August 31, 2024 14:04
@jakecastelli jakecastelli added the request-ci Add this label to start a Jenkins CI on a PR. label Aug 31, 2024
@nodejs-github-bot

This comment was marked as outdated.

@nodejs-github-bot

This comment was marked as outdated.

src/node_file.cc Outdated Show resolved Hide resolved
src/node_file.cc Outdated Show resolved Hide resolved
@jazelly jazelly marked this pull request as ready for review September 20, 2024 04:50
@jazelly
Copy link
Member Author

jazelly commented Sep 20, 2024

With the requested CI env, I can finally reproduce the segfaults.

The original reported issue
Chinese are UTF-8 encoded and each char is 3-byte encoded. 5 characters are 15 bytes in the reported case. Windows std::filesystem::path treats it as UTF-16 encoded, which is 2 bytes per char, and hence the segfault happened when constructing the std::filesystem::path for src or dest.

Simply constructing from u8string still has segfault (this was the one I couldn't reproduce locally, very strange)
The segfault comes from the later call of std::filesystem::path.string() on the path constructed from u8string. I don't particularly know why? But using u8string for all the error messages fixed it. Thanks @jakecastelli for requesting the CI environment. (sorry for dragging this too long).

cc @joyeecheung @anonrig

@joyeecheung
Copy link
Member

joyeecheung commented Sep 20, 2024

The segfault comes from the later call of std::filesystem::path.string() on the path constructed from u8string. I don't particularly know why? But using u8string for all the error messages fixed it.

This is because the error throwing functions convert the messages into JS strings - technically, it is expecting Latin1

OneByteString(isolate, message.c_str(), message.length()); \
which is also incorrect, but Latin1 has an overlapping set with UTF-8, and if you pass UTF8 data as Latin1 you are likely to see gibberish for non-ASCII characters but it won't crash, whereas UTF16 has fixed 2 char widths, so it would fare worse if you pass it as Latin1.

Copy link
Member

@joyeecheung joyeecheung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - I opened a separate PR to fix the decoding in error throwing utils #55024

@joyeecheung joyeecheung added the request-ci Add this label to start a Jenkins CI on a PR. label Sep 20, 2024
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Sep 20, 2024
@nodejs-github-bot
Copy link
Collaborator

nodejs-github-bot commented Sep 20, 2024

@jakecastelli jakecastelli added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Sep 20, 2024
jasnell pushed a commit that referenced this pull request Sep 21, 2024
PR-URL: #54653
Fixes: #54476
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
@jasnell
Copy link
Member

jasnell commented Sep 21, 2024

Landed in 77ca5ca

@jasnell jasnell closed this Sep 21, 2024
targos pushed a commit that referenced this pull request Oct 4, 2024
PR-URL: #54653
Fixes: #54476
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
@aduh95 aduh95 mentioned this pull request Oct 9, 2024
louwers pushed a commit to louwers/node that referenced this pull request Nov 2, 2024
PR-URL: nodejs#54653
Fixes: nodejs#54476
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
@marco-ippolito marco-ippolito added the dont-land-on-v20.x PRs that should not land on the v20.x-staging branch and should not be released in v20.x. label Nov 16, 2024
tpoisseau pushed a commit to tpoisseau/node that referenced this pull request Nov 21, 2024
PR-URL: nodejs#54653
Fixes: nodejs#54476
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
author ready PRs that have at least one approval, no pending requests for changes, and a CI started. c++ Issues and PRs that require attention from people who are familiar with C++. dont-land-on-v20.x PRs that should not land on the v20.x-staging branch and should not be released in v20.x. fs Issues and PRs related to the fs subsystem / file system. needs-ci PRs that need a full CI run.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

v22.6.0 fs.cpSync crash
9 participants