Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Select OEM/ANSI code page according to system locale setting #36

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

unxed
Copy link

@unxed unxed commented Jun 7, 2024

@LupinThidr
Copy link

LupinThidr commented Jun 22, 2024

Hello, thank you for this. I've seen your progress across multiple issue trackers regarding Linux and 7zip path encoding.
I ran across your comments while trying to diagnose 7zip unable to decode Shift-JIS (CP932) encoded paths in LZH files
I had thought that this patch would apply to those, but it seems ZipItem is specific to the .zip handler.
Other software such as unar don't fully support the .LZH spec as 7zip does.
I think your iconv conversion should be implemented as a separate function in StringConvert, so it can be easily used with other classes.
I was able to make LzhHandler use it rather the standard MultiByteToUnicodeString, but I don't completely understand the 7zip codebase, so I've hardcoded it to CP932 rather than the a mcp argument.

void MultiByteToUnicodeString3_iconv(UString &res, const AString &s)
{
  res.Empty();
  if (s.IsEmpty())
    return;

  iconv_t cd;
  if ((cd = iconv_open("UTF-8", "CP932")) != (iconv_t)-1) {

    AString sUtf8;

    unsigned slen = s.Len();
    char* src = s.Ptr_non_const();

    unsigned dlen = slen * 4 + 1; // (source length * 4) + null termination
    char* dst = sUtf8.GetBuf_SetEnd(dlen);
    const char* dstStart = dst;

    memset(dst, 0, dlen);

    size_t slen_size_t = static_cast<size_t>(slen);
    size_t dlen_size_t = static_cast<size_t>(dlen);
    size_t done = iconv(cd, &src, &slen_size_t, &dst, &dlen_size_t);

    if (done == (size_t)-1) {
      iconv_close(cd);

      // iconv failed. Falling back to default behavior
      MultiByteToUnicodeString2(res, s, 932);
      return;
    }

    // Null-terminate the result
    *dst = '\0';

    iconv_close(cd);

    AString sUtf8CorrectLength;
    size_t dstCorrectLength = dst - dstStart;
    sUtf8CorrectLength.SetFrom(sUtf8, static_cast<unsigned>(dstCorrectLength));
    if (ConvertUTF8ToUnicode(sUtf8CorrectLength, res) /*|| ignore_Utf8_Errors*/)
      return;
  }

}

and then in LzhHandler's GetProperty:

UString dst;
MultiByteToUnicodeString3_iconv(dst, item.GetName());

UString s = NItemName::WinPathToOsPath(dst);

If you'd like to try implementing it for LZH, here's a sample file and the current / expected output of 7zz l
https://archive.org/download/narcissu/na_sabun.lzh

2007-06-03 22:25:18 .....          537          392  na_sabun/▒C▒▒▒▒▒▒.txt
2007-06-03 22:25:18 .....          537          392  na_sabun/修正差分.txt

(I noticed the Debian package maintainer for 7zz is Japanese, so he may have some experience)

@unxed
Copy link
Author

unxed commented Sep 1, 2024

I don't completely understand the 7zip codebase

Unfortunately, I also don't understand the 7-zip codebase well enough. I'm afraid we need Igor Pavlov to tell us how to implement this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants