-
Notifications
You must be signed in to change notification settings - Fork 4k
GH-48721: [C++] Add test for UTF-8 filenames on Windows #48722
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
|
| // Test that file operations work with valid UTF-8 filenames. | ||
| // On Windows, PlatformFilename::FromString() converts UTF-8 strings to wide strings. | ||
| // On Unix, filenames are treated as opaque byte strings. | ||
| std::string utf8_file_name = "test_file_한국어_😀.txt"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
한국어 is "Korean" in Korean FYI .. :-)..
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks contributing this @HyukjinKwon . Looks good in general, just one comment.
Also, can you please trim down the PR description?
| } | ||
|
|
||
| // TODO add a test with a valid utf-8 filename | ||
| TEST_F(TestFileOutputStream, FileNameValidUtf8) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why test this only on Windows? Let's test this on all platforms.
Rationale for this change
4937d9f (ARROW-5102) added the TODO comment requesting a test with valid UTF-8 filenames. Later, the UTF-8 to UTF-16 conversion logic on Windows was introduced in commit eb23ea9 (ARROW-5648) which should fix the issue.
Essentially we should add a test:
"test_file_한국어_😀.txt"FileOutputStream::Open()converts UTF-8 to UTF-16 via the following call chain:arrow/cpp/src/arrow/io/file.cc
Lines 359 to 364 in 727106f
FileOutputStream::Open(path))arrow/cpp/src/arrow/io/file.cc
Lines 186 to 188 in 727106f
OSFile::SetFileName(path))arrow/cpp/src/arrow/util/io_util.cc
Lines 604 to 608 in 727106f
PlatformFilename::FromString())arrow/cpp/src/arrow/util/io_util.cc
Lines 143 to 149 in 727106f
StringToNative())arrow/cpp/src/arrow/util/utf8.h
Line 34 in 727106f
UTF8ToWideString())This test complements existing
FileNameWideCharConversionRangeExceptiontest (invalid UTF-8):arrow/cpp/src/arrow/io/file_test.cc
Lines 111 to 116 in 727106f
Why Windows-only?
#if defined(_MSC_VER)because Windows usesNativePathString = std::wstring(UTF-16):arrow/cpp/src/arrow/util/io_util.h
Line 48 in 727106f
arrow/cpp/src/arrow/util/io_util.h
Lines 44 to 51 in 727106f
What changes are included in this PR?
This PR adds the test described above.
Are these changes tested?
Unittest was added.
Are there any user-facing changes?
No, test-only.