This repository has been archived by the owner on Jun 2, 2024. It is now read-only.

InvalidArchive("Could not find central directory end") when trying to decompress zip archive #183

Closed
moriartydev opened this issue Aug 19, 2020 · 13 comments

@moriartydev

When I try to run my code, I get this panic:
thread 'main' panicked at 'called Result::unwrap() on an Err value: InvalidArchive("Could not find central directory end")', src/main.rs:35:23
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace

Distro: Manjaro Linux 20
P.S. The zip archive can be opened with the standard decompression utility.

@Plecra
Member

Plecra commented Aug 20, 2020

Would you be able to share the ZIP file that is failing to parse? It'd be very helpful in reproducing the issue.

@philippeitis

I also ran into this issue when trying to open a zip file. Opening it with 7-Zip gives an "Unexpected end of data" warning, but 7-Zip still manages to open and extract the files successfully, so the error seems to be legitimate.

However, it would be helpful if this error could be bypassed, as it might be a soft error that doesn't affect all the files in the archive.

If you need a copy of the file, I'd be happy to send one via email.

@Plecra
Member

Plecra commented Dec 14, 2020

@philippeitis that'd be very helpful 😃 Send it to the email on this account and I'll see what I can do

@bivald

bivald commented Mar 19, 2021

I've run into this as well. I'll see if I can make a reproducible copy (the file I'm working with is sensitive, so I'm not allowed to share it).

@zamazan4ik zamazan4ik added the bug label Jan 24, 2022
@aweinstock314
Contributor

I've written a reproducer for this issue (https://github.com/aweinstock314/zip/blob/7850805911ce98cd5987390b735c81a391f0d14c/tests/issue_183.rs). Info-ZIP unzip (which seems to be the ancestor of the commonly packaged unzip(1)) uses a search length of 66000 bytes when looking for the end-of-central-directory (EOCD) record. This crate uses 65557 (https://github.com/zip-rs/zip/blob/4aafe04be6de871f31019cbce1866c47a8f27d70/src/spec.rs#L56), so archives whose EOCD record is between 65557 and 66000 bytes from the end will be unzipped by standard tools but not by this crate. I'm not sure whether the constant should be changed or the search length should be made configurable.
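To make the window difference concrete, here is a minimal std-only sketch of a backward search for the EOCD signature (the bytes "PK\x05\x06", i.e. 0x06054b50 little-endian) over the last `window` bytes of a buffer. The function name `find_eocd` and the byte-slice interface are hypothetical and are not this crate's API. The 65557 figure is the 22-byte fixed EOCD record plus the maximum 65535-byte comment; the fake archive below places the record 65822 bytes from the end, which falls inside the 66000-byte window but outside the 65557-byte one.

```rust
/// Signature of the end-of-central-directory (EOCD) record: "PK\x05\x06".
const EOCD_SIGNATURE: [u8; 4] = [0x50, 0x4b, 0x05, 0x06];
/// Size of the fixed part of the EOCD record (without the trailing comment).
const EOCD_FIXED_LEN: usize = 22;

/// Hypothetical helper: search backwards through the last `window` bytes of
/// `data` for an EOCD signature and return its offset from the start of `data`.
/// Only positions where a full 22-byte record still fits are considered.
fn find_eocd(data: &[u8], window: usize) -> Option<usize> {
    if data.len() < EOCD_FIXED_LEN {
        return None;
    }
    // Lowest offset the search may reach, clamped to the start of the buffer.
    let lower = data.len().saturating_sub(window);
    // Highest offset at which a complete fixed-size record can still start.
    let upper = data.len() - EOCD_FIXED_LEN;
    (lower..=upper)
        .rev()
        .find(|&pos| data[pos..pos + 4] == EOCD_SIGNATURE)
}

fn main() {
    // Fake archive tail: 100 bytes of filler, a bare EOCD record, then 65,800
    // bytes of trailing data, so the record starts 65,822 bytes from the end.
    let mut data = vec![0u8; 100];
    data.extend_from_slice(&EOCD_SIGNATURE);
    data.extend_from_slice(&[0u8; EOCD_FIXED_LEN - 4]);
    data.extend(std::iter::repeat(0u8).take(65_800));

    let distance_from_end = 65_800 + EOCD_FIXED_LEN;
    println!("EOCD starts {} bytes from the end", distance_from_end);

    // 65,557 = 22-byte fixed record + 65,535-byte maximum comment.
    println!("window 65557: {:?}", find_eocd(&data, 65_557));
    // 66,000 is the search length used by Info-ZIP unzip.
    println!("window 66000: {:?}", find_eocd(&data, 66_000));
}
```

With the smaller window the search stops at offset 365 and never reaches the record at offset 100, so it prints None; the 66000-byte window finds it and prints Some(100).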

@zamazan4ik
Copy link
Contributor

Ouch. I think we need to recheck the ZIP spec for this behaviour. However, even if our current behaviour is correct from the spec's perspective, I think we should still introduce a workaround so that we can process archives that may be technically malformed. Thanks for the reproducer!

@aweinstock314
Copy link
Contributor

The spec (or at least the PKZIP documentation, which seems to be written as if it were a spec) doesn't explicitly say that the central directory headers (CDHs) and the end-of-central-directory (EOCD) record have to be at the end of the file. Its diagrams imply that the CDHs and the EOCD should be placed after the last local file header (LFH), but in practice Info-ZIP unzip will handle archives where both are placed before the first LFH, as long as the EOCD record is within the last 66000 bytes of the file, the EOCD comes after the last CDH, the CDHs are consecutive, and the first CDH has a nonzero offset (which can be arranged with 1 null byte of padding).

The only constraint the spec seems to impose is that a ZIP file MUST have only one "end of central directory record".
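Since what matters in practice is where the central directory sits relative to the EOCD record, here is a small std-only sketch that decodes the 22-byte fixed part of the EOCD record (field order as in PKWARE's APPNOTE, all integers little-endian) and checks one of the conditions described above: that the central directory (cd_offset + cd_size) ends at or before the EOCD record. The names `Eocd`, `parse_eocd` and `central_directory_precedes_eocd` are hypothetical, not this crate's code. The sketch ignores ZIP64 and does not check the other conditions (consecutive CDHs, nonzero offset).

```rust
/// Fixed fields of the end-of-central-directory record (little-endian).
#[derive(Debug)]
struct Eocd {
    disk_number: u16,
    cd_start_disk: u16,
    entries_on_disk: u16,
    total_entries: u16,
    cd_size: u32,
    cd_offset: u32,
    comment_len: u16,
}

const EOCD_SIGNATURE: [u8; 4] = [0x50, 0x4b, 0x05, 0x06];

fn u16_at(b: &[u8], i: usize) -> u16 {
    u16::from_le_bytes([b[i], b[i + 1]])
}

fn u32_at(b: &[u8], i: usize) -> u32 {
    u32::from_le_bytes([b[i], b[i + 1], b[i + 2], b[i + 3]])
}

/// Hypothetical parser: `rec` must start with the EOCD signature and hold at
/// least the 22-byte fixed part of the record.
fn parse_eocd(rec: &[u8]) -> Option<Eocd> {
    if rec.len() < 22 || rec[0..4] != EOCD_SIGNATURE {
        return None;
    }
    Some(Eocd {
        disk_number: u16_at(rec, 4),
        cd_start_disk: u16_at(rec, 6),
        entries_on_disk: u16_at(rec, 8),
        total_entries: u16_at(rec, 10),
        cd_size: u32_at(rec, 12),
        cd_offset: u32_at(rec, 16),
        comment_len: u16_at(rec, 20),
    })
}

/// The central directory must end at or before the EOCD record's own file offset.
fn central_directory_precedes_eocd(eocd: &Eocd, eocd_offset: u64) -> bool {
    eocd.cd_offset as u64 + eocd.cd_size as u64 <= eocd_offset
}

fn main() {
    // Hand-built record: 1 entry, a 46-byte central directory at offset 1000, no comment.
    let mut rec: Vec<u8> = EOCD_SIGNATURE.to_vec();
    rec.extend_from_slice(&0u16.to_le_bytes()); // number of this disk
    rec.extend_from_slice(&0u16.to_le_bytes()); // disk where the central directory starts
    rec.extend_from_slice(&1u16.to_le_bytes()); // entries on this disk
    rec.extend_from_slice(&1u16.to_le_bytes()); // total entries
    rec.extend_from_slice(&46u32.to_le_bytes()); // central directory size
    rec.extend_from_slice(&1000u32.to_le_bytes()); // central directory offset
    rec.extend_from_slice(&0u16.to_le_bytes()); // comment length

    let eocd = parse_eocd(&rec).expect("valid EOCD record");
    println!("{:?}", eocd);
    println!("EOCD at 1046 ok: {}", central_directory_precedes_eocd(&eocd, 1046));
    println!("EOCD at 1040 ok: {}", central_directory_precedes_eocd(&eocd, 1040));
}
```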

@danielvschoor

danielvschoor commented Feb 2, 2023

I'm also having this issue now. Is there any workaround other than recreating the zip?

Edit: to add more context, I ported a Python app to Rust and moved from Python's zipfile library to zip-rs. The Python library can open these zip files without an issue (7-Zip shows a warning), but zip-rs returns an error.

@Plecra
Member

Plecra commented Feb 2, 2023

You know what - if you can send me a repro of what you want to work, I'll fix it by this evening (UTC+0) :)

It's a bug in zip, so we really need to fix it here.

@danielvschoor

I've emailed you an example file, thanks

@Plecra
Member

Plecra commented Feb 2, 2023

Argh ok, hm. Your issue is a little different @danielvschoor. The metadata is actually missing, and that zip file is technically corrupt. The zip utilities are using a recovery pass to be able to load them.

This is good! We've already got the right API to read these files with: you can use zip::read::read_zipfile_from_stream to iterate through each file in the corrupt archives, and read them just as you would normally.

(We could also start a GH discussion about a ZipArchive::open_and_recover utility that could try to detect these kinds of errors. I'm afraid I'll have to give that low priority, though :))

Oh and to explain how to use it in this case 😃:

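let mut archive = std::fs::File::open(std::env::args_os().nth(1).unwrap()).unwrap();
loop {
    // Ok(None) means a central directory header was reached. If the central directory
    // is missing, the stream instead typically ends with an Err (e.g. an unexpected EOF).
    let zipfile = match zip::read::read_zipfile_from_stream(&mut archive) {
        Ok(Some(zipfile)) => zipfile,
        Ok(None) => break,
        Err(e) => { eprintln!("stopped reading: {e}"); break; }
    };
    // That's a nice and readable `zipfile` right there.
    println!("{:?}", zipfile.name());
}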
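Conceptually, streaming recovery works because every entry is also preceded by a local file header (signature 0x04034b50) carrying its name and sizes, so entries can be found by walking forward from the start of the file without consulting the central directory at all. The std-only sketch below illustrates that idea; it is not the zip crate's implementation, and `list_local_entries` and `local_entry` are hypothetical helpers. It assumes no entry uses a data descriptor (general-purpose flag bit 3 unset), because in that case the sizes in the local header may be zero and the end of the data cannot be found this simply.

```rust
use std::io::{self, Read};

const LOCAL_HEADER_SIGNATURE: u32 = 0x0403_4b50;

/// Hypothetical helper: walk consecutive local file headers from the start of
/// `reader` and return (name, compressed size) for each entry. Stops at the first
/// record that is not a local file header (e.g. a central directory header) or
/// at the end of the input. Assumes no data descriptors (flag bit 3 unset).
fn list_local_entries<R: Read>(reader: &mut R) -> io::Result<Vec<(String, u32)>> {
    let mut entries = Vec::new();
    loop {
        // Fixed 30-byte part of a local file header.
        let mut header = [0u8; 30];
        match reader.read_exact(&mut header) {
            Ok(()) => {}
            // A truncated archive simply ends here.
            Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => break,
            Err(e) => return Err(e),
        }
        let sig = u32::from_le_bytes([header[0], header[1], header[2], header[3]]);
        if sig != LOCAL_HEADER_SIGNATURE {
            break;
        }
        let flags = u16::from_le_bytes([header[6], header[7]]);
        if flags & 0x0008 != 0 {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "data descriptors are not handled in this sketch",
            ));
        }
        let compressed_size = u32::from_le_bytes([header[18], header[19], header[20], header[21]]);
        let name_len = u16::from_le_bytes([header[26], header[27]]) as usize;
        let extra_len = u16::from_le_bytes([header[28], header[29]]) as u64;

        let mut name = vec![0u8; name_len];
        reader.read_exact(&mut name)?;
        // Skip the extra field and the (possibly compressed) file data.
        let to_skip = extra_len + compressed_size as u64;
        let skipped = io::copy(&mut reader.by_ref().take(to_skip), &mut io::sink())?;
        if skipped != to_skip {
            break; // entry data was cut short
        }
        entries.push((String::from_utf8_lossy(&name).into_owned(), compressed_size));
    }
    Ok(entries)
}

/// Hypothetical test helper: build one "stored" (uncompressed) entry.
fn local_entry(name: &str, data: &[u8]) -> Vec<u8> {
    let mut v = Vec::new();
    v.extend_from_slice(&LOCAL_HEADER_SIGNATURE.to_le_bytes());
    v.extend_from_slice(&20u16.to_le_bytes()); // version needed to extract
    v.extend_from_slice(&0u16.to_le_bytes()); // general-purpose flags
    v.extend_from_slice(&0u16.to_le_bytes()); // method 0 = stored
    v.extend_from_slice(&[0u8; 4]); // modification time and date
    v.extend_from_slice(&0u32.to_le_bytes()); // CRC-32 (not checked here)
    v.extend_from_slice(&(data.len() as u32).to_le_bytes()); // compressed size
    v.extend_from_slice(&(data.len() as u32).to_le_bytes()); // uncompressed size
    v.extend_from_slice(&(name.len() as u16).to_le_bytes()); // file name length
    v.extend_from_slice(&0u16.to_le_bytes()); // extra field length
    v.extend_from_slice(name.as_bytes());
    v.extend_from_slice(data);
    v
}

fn main() {
    // Two entries and no central directory or EOCD: the input just ends,
    // much like a truncated archive.
    let mut archive = local_entry("a.txt", b"hello");
    archive.extend(local_entry("b.txt", b"world!"));
    let entries = list_local_entries(&mut archive.as_slice()).unwrap();
    println!("{:?}", entries);
}
```

Walking forward like this trades the central directory's authoritative index for robustness against a missing or misplaced one, which is roughly why tools with a recovery pass can still list such archives.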

@danielvschoor

Apologies, I'm not very well-versed in the internals of zip, so I assumed my issue was related to this one based on the error message.

Thanks for the workaround.

@Pr0methean
Member

Commit zip-rs/zip2@5237543 will increase the ECD search window to 66,000 bytes (including the 22-byte header). I expect it to be released in 1.2.1. If anyone still has concerns, please open a new issue at https://github.com/zip-rs/zip2/issues since this repo is no longer maintained.
