unzip-like Unicode support? #125
Comments
zziplib does scan the zip directory ahead of time, so it does not seem very complicated to guide the processing function to do a re-encoding. It was simply not needed so far, and I don't have time to implement such a thing.
Without delving into encoding conversion (via the ICU library) and relying only on the Extra Field (subfield 0x7075), which is already in UTF-8 when present, should the zziplib API be duplicated into UTF-8-aware functions if this is implemented?
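To make the Extra Field idea concrete, here is a minimal sketch of how the 0x7075 subfield could be located and validated. It is written in Python for brevity and is not zziplib code; the function names `unicode_path_from_extra` and `build_unicode_path_field` are hypothetical. It assumes the layout documented for the Info-ZIP Unicode Path Extra Field: a 1-byte version, a 4-byte CRC32 of the standard file name field, then the UTF-8 name.

```python
import struct
import zlib

UNICODE_PATH_ID = 0x7075  # Info-ZIP Unicode Path extra field header ID


def unicode_path_from_extra(extra: bytes, raw_name: bytes):
    """Return the UTF-8 name from the 0x7075 subfield, or None if unusable.

    `extra` is the raw extra-field blob of one directory entry and
    `raw_name` is the file name exactly as stored in the standard header.
    """
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        data = extra[pos + 4 : pos + 4 + size]
        pos += 4 + size
        if header_id != UNICODE_PATH_ID:
            continue
        if len(data) != size or size < 5:
            return None  # truncated or malformed subfield
        version, name_crc = struct.unpack_from("<BI", data, 0)
        # The CRC guards against a stale field left behind after a rename.
        if version != 1 or name_crc != zlib.crc32(raw_name):
            return None
        try:
            return data[5:].decode("utf-8")
        except UnicodeDecodeError:
            return None
    return None


def build_unicode_path_field(raw_name: bytes, utf8_name: str) -> bytes:
    """Hypothetical helper: build a 0x7075 subfield for demonstration only."""
    body = struct.pack("<BI", 1, zlib.crc32(raw_name)) + utf8_name.encode("utf-8")
    return struct.pack("<HH", UNICODE_PATH_ID, len(body)) + body
```

A caller would fall back to the standard name whenever the function returns `None`, which keeps the behavior unchanged for archives that lack the subfield.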
Well, if you really want to be correct then you need to consider that file system functions expect file name arguments to be in the encoding used on that file system. Just disregarding the encoding was an easy approach. Since unix-ish systems switched to UTF-8 more than a decade ago, it has worked as expected. I don't know of any existing file API that differentiates between native and UTF-8 encoding. Instead, a derived API based on wchar_t has been developed, and POSIX mbstowcs and friends provide a standard API that can do the conversion (without the ICU libs). I guess that is too much effort to put into zziplib. So maybe a compile-time switch is the only thing that could be added to the current design, forcing the function arguments to be UTF-8 no matter what the operating system uses elsewhere. The testsuite could be adapted to that, handing over arguments in UTF-8 as well. Depending on the project it may help, but it would not go out into standard packages in shared libraries.
If the filenames in a zip file are in a different encoding from the one the current system uses, attempting to interact with them through zziplib results in mojibake or invalid byte sequences.
unzip can extract them fine without any extra options, but with
unzip -UU -O UTF8
it results in the same mojibake seen with zziplib. Inspecting the zip with a hex editor, it looks like the zip file actually stores two filenames, one in UTF-8 and another in the original system encoding. Is there anything that can be done in zziplib to pick the UTF-8 filename instead?
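The two symptoms described here, mojibake and invalid byte sequences, can be reproduced without any zip file at all. The sketch below is illustrative Python, not zziplib code: `misdecode` is a hypothetical helper, and CP437 is only an assumed example of an "original system encoding".

```python
def misdecode(name: str, stored_as: str, read_as: str) -> str:
    """Encode `name` as the archiver would, then decode it as the reader does."""
    return name.encode(stored_as).decode(read_as, errors="replace")


name = "r\u00e9sum\u00e9.txt"

# Stored in a legacy code page, read as UTF-8: the bytes are not valid UTF-8,
# so the decoder substitutes U+FFFD replacement characters.
legacy_read_as_utf8 = misdecode(name, "cp437", "utf-8")

# Stored as UTF-8, read in a legacy code page: every byte maps to some
# character, so decoding "succeeds" but yields mojibake.
utf8_read_as_legacy = misdecode(name, "utf-8", "cp437")

# Matching encodings round-trip cleanly.
matching = misdecode(name, "utf-8", "utf-8")
```

Because neither failure can be reliably detected from the bytes alone, preferring an explicitly UTF-8 name when the archive provides one is the more robust option.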