Corrupt ZIP archive when streaming in 200+ files #131

Open
nick-george opened this issue Jun 23, 2022 · 3 comments

@nick-george

Hi there,

Note that I've created this same issue on the "archiver" project here: archiverjs/node-archiver#602

I strongly suspect that the issue discussed below lies in this library rather than in "archiver" itself: I have tried archiver's "TAR" output and am not experiencing any issues with it.

I've been troubleshooting an issue where archiver appears to be generating corrupt archives. I'm using archiver version 5.3.1 on Node v16.13.1.

We're streaming in files that have been retrieved with ssh2-sftp-client.

This library seems to work fine with very large archives built from a few large files. It also seems to work fine for archives up to 199 files. However, when I have 200 files or more, the archive gets corrupted. By diffing the hexdumps of one archive that has 199 files and another that has 200, I can see the archive with 200 files is missing the "End of central directory record" (EOCD). See below for the bytes that are missing from my archive with 200 files (note the first four bytes below are the last part of the last filename in the archive).

```
0039b9b0 65 2e 70 70 50 4b 05 06 00 00 00 00 c6 00 c6 00 |e.ppPK..........|
0039b9c0 f4 3c 00 00 c0 7c 39 00 00 00 |.<...|9...|
0039b9ca
```
Otherwise, the generated files are pretty much identical (except that the "good" archive contains one fewer file).
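
For anyone who wants to check an archive for this symptom without diffing hexdumps, here is a minimal sketch that looks for the EOCD signature (`50 4b 05 06`) near the end of a byte buffer and decodes a few of its fields. The names `findEocd` and `EocdRecord` are made up for illustration and are not part of archiver or zip-stream. It assumes a plain (non-ZIP64) record and does not validate the archive in any other way.

```typescript
// Selected fields of the ZIP "End of central directory" record (non-ZIP64 form).
interface EocdRecord {
  offset: number;           // position of the "PK\x05\x06" signature in the buffer
  totalEntries: number;     // total number of central directory entries
  centralDirSize: number;   // size of the central directory in bytes
  centralDirOffset: number; // offset of the central directory from the archive start
}

const EOCD_SIGNATURE = 0x06054b50; // bytes 50 4b 05 06 read as a little-endian uint32
const EOCD_MIN_SIZE = 22;          // fixed part of the record, without the comment
const MAX_COMMENT_SIZE = 0xffff;   // the comment length field is 16 bits wide

// Scan backwards from the end of the buffer; return null if no EOCD is found.
function findEocd(bytes: Uint8Array): EocdRecord | null {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const lowest = Math.max(0, bytes.length - EOCD_MIN_SIZE - MAX_COMMENT_SIZE);
  for (let i = bytes.length - EOCD_MIN_SIZE; i >= lowest; i--) {
    if (view.getUint32(i, true) === EOCD_SIGNATURE) {
      return {
        offset: i,
        totalEntries: view.getUint16(i + 10, true),
        centralDirSize: view.getUint32(i + 12, true),
        centralDirOffset: view.getUint32(i + 16, true),
      };
    }
  }
  return null;
}

// The trailing bytes from the hexdump above, including the "e.pp" filename tail.
const tail = new Uint8Array([
  0x65, 0x2e, 0x70, 0x70, 0x50, 0x4b, 0x05, 0x06, 0x00, 0x00, 0x00, 0x00, 0xc6,
  0x00, 0xc6, 0x00, 0xf4, 0x3c, 0x00, 0x00, 0xc0, 0x7c, 0x39, 0x00, 0x00, 0x00,
]);
console.log(findEocd(tail));
```

In Node, a `Buffer` returned by `fs.readFileSync` is a `Uint8Array`, so it can be passed in directly. A `null` result for a finished archive would match the truncation described above.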

Are you aware of any file count limit for this library?

Many thanks,
Nick

@nick-george
Author

Oh whoops, I see this project and archiver are part of the same organisation. Apologies for the duplication. I'm happy to close this ticket in whatever repo you think is less appropriate.

ctalkington self-assigned this Sep 4, 2023

@Veragin

Veragin commented Oct 4, 2023

Hi, I still have the same problem: when I create a zip archive with lots of small files, it very often generates a corrupted archive.

Some data is missing: I compared the size of the corrupted zip with a correct one (which I luckily got from a successful run).

I was trying to zip around 9000 images and was missing about 700 KB of the 50 MB final zip archive, but even a much smaller set (600 images => 4 MB) did not work.

The corrupted zip file can be opened with the 7-Zip application on Windows, but other software can't handle it.

EDIT:
Found the cause of the problem: zip.finish() is probably async somewhere inside, so it needs some time to write the data.
If I wait about 5s after finish() is called, everything is fine.

@doriancollier

doriancollier commented Apr 14, 2024

> Found the cause of the problem: zip.finish() is probably async somewhere inside, so it needs some time to write the data.
> If I wait about 5s after finish() is called, everything is fine.

I had the same issue, and adding a 5s wait after finish() did help.

I kept experimenting and found out that the file isn't fully written until the output "close" event fires, which happens after finish(). To handle this, I wrapped all of my zip code in a promise that resolves after the "close" event.

Here's my final function...

```ts
import * as fs from 'fs';
import archiver from 'archiver';
// Note: `files` and `logger` are not defined in this snippet;
// they are presumably project-local helpers.

export async function zipDirectory(
    directoryToZip: string,
    zipFilePath: string,
): Promise<string> {
    const archive = archiver('zip', { zlib: { level: 9 } });
    const tempRootDir = await files.getTempRootDir();

    const finalDirectoryToZip = `${tempRootDir}/${directoryToZip}`;
    const finalZipFilePath = `${tempRootDir}/${zipFilePath}`;
    const output = fs.createWriteStream(finalZipFilePath);

    return new Promise((resolve, reject) => {
        // 'close' fires only after the write stream has flushed and closed the file.
        output.on('close', () => {
            resolve(zipFilePath); // Resolve with the zipFilePath
        });

        // Writable streams never emit 'end', so listen for write errors instead.
        output.on('error', (err) => {
            logger.error(`Output stream error: ${err}`);
            reject(err);
        });

        archive.on('warning', (err) => {
            if (err.code === 'ENOENT') {
                logger.warn(`File not found: ${err}`);
            } else {
                logger.error(`Archiver warning: ${err}`);
                reject(err);
            }
        });

        archive.on('error', (err) => {
            logger.error(`Archiver error: ${err}`);
            reject(err);
        });

        archive.pipe(output);
        archive.directory(finalDirectoryToZip, false);
        archive.finalize();
    });
}
```
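
The same "wait for the output stream, not for a timer" idea can also be sketched with Node's built-in `pipeline` from `node:stream/promises`, which settles once the destination has finished writing and rejects if either stream errors. This is only a sketch: `writeStreamToFile` is a hypothetical helper name, and the archiver usage in the trailing comment is an untested assumption that relies on the archive object being a readable stream.

```typescript
import { createWriteStream } from 'node:fs';
import { Readable } from 'node:stream';
import { pipeline } from 'node:stream/promises';

// Hypothetical helper: writes `source` to `filePath` and resolves with the
// path only after the file stream has finished; rejects if either stream fails.
async function writeStreamToFile(source: Readable, filePath: string): Promise<string> {
  await pipeline(source, createWriteStream(filePath));
  return filePath;
}

// Assumed usage with archiver (not run here):
//   const archive = archiver('zip', { zlib: { level: 9 } });
//   const done = writeStreamToFile(archive, 'out.zip'); // starts piping immediately
//   archive.directory('some-dir', false);
//   void archive.finalize();                            // no more entries will be added
//   await done;                                         // safe to read out.zip after this
```

Compared with a fixed 5s wait, this does not depend on how long the final write happens to take.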
