Incorrect decoding of file names with some zip files #403
An entry in a zip file has a flag in its header (bit 11 of the general purpose bit flag) that indicates whether the entry's file name is UTF-8 encoded. Any tool that creates a zip file has to set this flag if it encodes file names in UTF-8. In this case, I think the tool that created the zip used UTF-8 for the file name but did not set the flag. The difference you see between zip4j and Apache Compress is that, when this flag is not set, zip4j falls back to the zip specification's standard charset, whereas Apache Compress uses UTF-8 regardless. Strictly speaking, this does not follow the specification, but I tend to agree with the Apache Compress approach of using UTF-8 even when the flag is not set. I will change this in zip4j and include it in the next release.
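The two name-decoding policies described above can be sketched with the JDK alone. This is a minimal illustration, not zip4j's or Apache's actual code: the class and method names are hypothetical, and it assumes the "zip standard charset" is IBM Code Page 437 (the default named in the zip specification), which the JDK exposes as `IBM437`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the two policies; not zip4j's implementation.
final class ZipNameDecoding {
    // Bit 11 of the general purpose bit flag: "language encoding flag" (EFS).
    static final int UTF8_FLAG = 1 << 11;

    /**
     * Picks the charset for an entry name.
     * assumeUtf8WhenUnflagged = false follows the specification;
     * true mimics the "always UTF-8" fallback discussed in this thread.
     */
    static Charset nameCharset(int generalPurposeFlag, boolean assumeUtf8WhenUnflagged) {
        if ((generalPurposeFlag & UTF8_FLAG) != 0 || assumeUtf8WhenUnflagged) {
            return StandardCharsets.UTF_8;
        }
        // Assumed spec default for unflagged names: IBM Code Page 437
        // (shipped in the JDK's extended charsets).
        return Charset.forName("IBM437");
    }

    static String decodeName(byte[] rawName, int generalPurposeFlag, boolean assumeUtf8WhenUnflagged) {
        return new String(rawName, nameCharset(generalPurposeFlag, assumeUtf8WhenUnflagged));
    }

    public static void main(String[] args) {
        // A UTF-8 name ("cafe" with e-acute) written by a tool that did NOT set bit 11.
        byte[] raw = "caf\u00e9.txt".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeName(raw, 0, false));         // spec policy: e-acute garbled
        System.out.println(decodeName(raw, 0, true));          // UTF-8 fallback: name recovered
        System.out.println(decodeName(raw, UTF8_FLAG, false)); // flag set: UTF-8 either way
    }
}
```

With the flag unset, the spec policy turns the two UTF-8 bytes of the accented letter into two unrelated CP437 characters, which is the kind of garbling reported in this issue. The fallback policy recovers the name, but, as the later comment explains, it misreads archives whose unflagged names really are in the standard charset.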
Thanks. Once this hits Maven Central I'll give it another try.
Fixed in v2.10.0, released today.
@jimfcarroll I am reverting the change I made here because it had side effects with zip files that use the zip standard charset, as reported in this issue. The change I made as part of this issue was to use UTF-8 by default in zip4j. But, contradicting my earlier comment in this issue, this does not follow the zip specification, which states that the zip standard charset must be used if the UTF-8 flag is not set. In your case, if you are sure your zip files use UTF-8 encoding, you can force zip4j to use UTF-8 with
I have a zip file where zip4j doesn't decode the file names correctly, while all of the command-line utilities and Apache's VFS2 do. From the contents of the zip file, it looks like it was made on a Mac. Here is the output of test code I wrote. The first two lines come from Apache VFS2 and the second two from zip4j's `ZipInputStream`.
The file is 700 MB, so I can't really attach it here. I tried to create a smaller file by unzipping the archive and rezipping only that one entry, but zip4j handled the rezipped file fine.
I saw issue #304, but I'm not sure it is related. In any case, I don't control the zip files I'm unzipping, so I don't know their encoding beforehand.
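When the encoding is not known in advance, one common heuristic (assumed here, not prescribed by zip4j or by this thread) is to trust the UTF-8 flag when it is set and otherwise try a strict UTF-8 decode of the raw name bytes, falling back to Code Page 437 only if the bytes are not valid UTF-8. `GuessNameCharset` is a hypothetical helper sketching that idea with the JDK's `CharsetDecoder`:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of zip4j: guesses the encoding of an entry name.
final class GuessNameCharset {
    static final int UTF8_FLAG = 1 << 11; // general purpose bit 11 (EFS)

    static String decodeName(byte[] rawName, int generalPurposeFlag) {
        if ((generalPurposeFlag & UTF8_FLAG) != 0) {
            return new String(rawName, StandardCharsets.UTF_8);
        }
        // Strict decoder: malformed or unmappable input throws
        // instead of being silently replaced with U+FFFD.
        CharsetDecoder strictUtf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return strictUtf8.decode(ByteBuffer.wrap(rawName)).toString();
        } catch (CharacterCodingException notUtf8) {
            // Not valid UTF-8: fall back to the assumed spec default, Code Page 437.
            return new String(rawName, Charset.forName("IBM437"));
        }
    }

    public static void main(String[] args) {
        // Unflagged UTF-8 name, as in this issue.
        byte[] utf8Name = "caf\u00e9.txt".getBytes(StandardCharsets.UTF_8);
        // Unflagged CP437 name: 0x82 is e-acute in Code Page 437.
        byte[] cp437Name = {'c', 'a', 'f', (byte) 0x82, '.', 't', 'x', 't'};
        System.out.println(decodeName(utf8Name, 0));
        System.out.println(decodeName(cp437Name, 0));
    }
}
```

Both calls should yield the same name, because a lone `0x82` byte is malformed UTF-8 and triggers the fallback. This is a guess rather than a guarantee: a CP437 name can occasionally also be valid UTF-8 and would then be misread.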
For completeness, here is my test code. It prints the name of the 184th entry in the zip file: