Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix CI with platform-dependent string encoding #125

Closed
wants to merge 2 commits into from

Conversation

cosmicexplorer
Copy link
Contributor

@cosmicexplorer cosmicexplorer commented May 15, 2024

This fixes the CI failure introduced in 8715d93, as there appears to be no platform-independent way to create an OsString (see links in docstring for OsString::from_encoded_bytes_unchecked()). This is causing a failure on master.

This also fixes an additional compile error in a test file that only shows up with specific features enabled.

@cosmicexplorer
Copy link
Contributor Author

cc @Pr0methean as I can't add reviewers (that's fine, I'm not asking for more permissions).

@cosmicexplorer
Copy link
Contributor Author

Ok, now I've tried actually using the platform-specific code to extract an OsString since apparently windows was failing even before that. I think I actually should be reverting a prior commit instead of inserting my own code, so let me try that now.

@cosmicexplorer
Copy link
Contributor Author

Hmmm, I actually think the change I've added in aef0bb6 is necessary, as master appears to have been broken since 8715d93. Please let me know if I'm mistaken -- I am extremely unfamiliar with handling windows paths and strings and I have no clue whether there's a more canonical approach that I'm missing.

@cosmicexplorer
Copy link
Contributor Author

I'm going to make another PR adding cfg-if as a dependency because the conditional compilation sections are now getting pretty hairy.

@cosmicexplorer
Copy link
Contributor Author

Just squashed and force pushed to clarify that no commits were reverted. Will update the description.

@cosmicexplorer cosmicexplorer force-pushed the revert-path-join branch 2 times, most recently from b551163 to bfac065 Compare May 15, 2024 16:46
src/read.rs Dismissed Show dismissed Hide dismissed
@cosmicexplorer cosmicexplorer changed the title Revert "chore: Move path join into platform-independent code" fix CI with platform-dependent string encoding May 15, 2024
Comment on lines +694 to +696
/* FIXME: what bug does this code fix?
* https://github.com/zip-rs/zip2/commit/8715d936cbd3e0fdc2e778b70d8a02e2dfc16fc2
* doesn't specify! */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/* FIXME: what bug does this code fix?
* https://github.com/zip-rs/zip2/commit/8715d936cbd3e0fdc2e778b70d8a02e2dfc16fc2
* doesn't specify! */
// Extract symlinks as symlinks

The bug was that symlinks were extracted as normal files.

assert!(
chunks.remainder().is_empty(),
"windows utf-16 strings should be divisible by 2"
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't symlinks stored in UTF-8, given the general principle that file paths are Unix-style? The APPNOTE doesn't specifically address this.

);
let target_wide: Vec<u16> = chunks
.into_iter()
.map(|c| u16::from_le_bytes(c.try_into().unwrap()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will panic if a character code point is above 0xFFFF, because that has to be encoded in UTF-16 as a surrogate pair.

@Pr0methean
Copy link
Member

From the description of The byte encoding is an unspecified, platform-specific, self-synchronizing superset of UTF-8. By being a self-synchronizing superset of UTF-8, this encoding is also a superset of 7-bit ASCII.

Since UTF-16 isn't a superset of UTF-8, passing UTF-16 to it is wrong.

The byte encoding is an unspecified, platform-specific, self-synchronizing superset of UTF-8. By being a self-synchronizing superset of UTF-8, this encoding is also a superset of 7-bit ASCII.

@Pr0methean
Copy link
Member

Closing since https://github.com/zip-rs/zip2/actions/runs/9104341554/job/25028016801 shows that the issue is fixed.

@Pr0methean Pr0methean closed this May 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants