Files extracted from zip are corrupt #271

Closed
webwake opened this issue Mar 31, 2023 · 19 comments

Comments

@webwake

webwake commented Mar 31, 2023

The extracted files are corrupted. This started happening after upgrading from Node 18 to Node.js 19.8.1 on Windows 10 (I haven't tested whether the issue is broader).

Below is a code sample that reproduces the issue. It also uses the decompress library to produce a non-corrupt file for comparison.

const fs = require('fs');
const unzipper = require('unzipper');
const { https } = require('follow-redirects');
const crypto = require("crypto");

const decompress = require('decompress');

async function downloadBinariesFromRelease() {
    await new Promise((resolve, reject) => {
        fs.mkdirSync('unzipper', { recursive: true });
        const file = fs.createWriteStream('binaries.zip');
        console.log('Downloading Neutralinojs binaries..');
        https.get("https://github.com/neutralinojs/neutralinojs/releases/download/v4.10.0/neutralinojs-v4.10.0.zip", function (response) {
            response.pipe(file);
            // Resolve once the write stream has finished, not merely when the response ends,
            // so that binaries.zip is complete before extraction starts
            file.on('finish', resolve);
            file.on('error', reject);
        }).on('error', reject);
    });
}

function fileHash(filename, algorithm = 'md5') {
    return new Promise((resolve, reject) => {
        // Algorithm availability depends on the OpenSSL build on the platform
        // Other algorithms: 'sha1', 'sha256', 'sha512' ...
        const shasum = crypto.createHash(algorithm);
        const s = fs.createReadStream(filename);
        s.on('data', (data) => shasum.update(data));
        // Stream errors arrive as events, so a try/catch around the setup would not see them
        s.on('error', () => reject(new Error('calc fail')));
        // Produce the digest once the whole file has been read
        s.on('end', () => resolve(shasum.digest('hex')));
    });
}

async function extractWithUnzipperLibrary() {
    console.log('Extracting using "unzipper" library')
    // .promise() already returns a promise, so no extra wrapper is needed
    await fs.createReadStream('binaries.zip')
        .pipe(unzipper.Extract({ path: './unzipper' }))
        .promise();
}

async function extractWithDecompressLibrary() {
    console.log('Extracting using "decompress" library')
    await decompress('binaries.zip', 'decompress');
}


async function main() {
    await downloadBinariesFromRelease();
    await extractWithUnzipperLibrary();
    await extractWithDecompressLibrary();
    

    console.log(`correct hash:    3bbb562a59a454534f0ced6c801ccdb7`);
    console.log(`unzipper hash:   ${await fileHash("unzipper/neutralino-win_x64.exe", "md5")}`);
    console.log(`decompress hash: ${await fileHash("decompress/neutralino-win_x64.exe", "md5")}`);
}

main();
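
The hash comparison printed by the sample above can also be turned into a hard check, for example in a build script that should stop when extraction produced a corrupt file. The following is only a sketch: `hashFile` and `assertFileHash` are hypothetical helper names, not part of unzipper or decompress, and the sketch uses nothing beyond Node's built-in `crypto` and `fs` modules.

```javascript
const crypto = require('node:crypto');
const fs = require('node:fs');

// Hypothetical helper: hash a file by reading it as a stream, chunk by chunk.
async function hashFile(filename, algorithm = 'md5') {
    const hash = crypto.createHash(algorithm);
    for await (const chunk of fs.createReadStream(filename)) {
        hash.update(chunk);
    }
    return hash.digest('hex');
}

// Hypothetical helper: throw if the file's hash differs from the expected hex digest.
async function assertFileHash(filename, expectedHex, algorithm = 'md5') {
    const actual = await hashFile(filename, algorithm);
    if (actual !== expectedHex.toLowerCase()) {
        throw new Error(`${filename}: expected ${algorithm} ${expectedHex}, got ${actual}`);
    }
    return actual;
}
```

With the values from the sample, a call would look like `await assertFileHash('unzipper/neutralino-win_x64.exe', '3bbb562a59a454534f0ced6c801ccdb7')`, which rejects when the extracted file does not match.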

@mvolfik

mvolfik commented Apr 15, 2023

I have another code sample. Unzipping works when you send the zip file as one large chunk, but not when you slice it like this (I discovered this by piping the file from an HTTPS download, so I suppose the slices correspond to some transmission window sizes).

Code:

const fs = require("node:fs");
const unzipper = require("unzipper");

const data = fs.readFileSync("a.zip");
const extractor = unzipper.Extract({ path: "./test" });

extractor.write(Uint8Array.prototype.slice.call(data, 0, 1378));
extractor.write(Uint8Array.prototype.slice.call(data, 1378, 1378 * 2));
extractor.write(Uint8Array.prototype.slice.call(data, 1378 * 2, 1378 * 3));
extractor.write(Uint8Array.prototype.slice.call(data, 1378 * 3));

The zipfile: a.zip

As a result, the package.json in output is corrupted like this:

Corrupted file
 of an Apify actor.",
    "engines": {
        "node": ">=16.0.0"
    },
    "dependencies": {
        "apify": "^3.0.0",
        "crawlee": "^3.0.0"
    },
    "devDependencies": {
        "@apify/eslint-config-ts": "^0.2.3",
        "@apify/tsconfig": "^0.1.0",
        "@typescript-eslint/eslint-plugin": "^5.55.0",
        "@typescript-eslint/parser": "^5.55.0",
        "eslint": "^8.36.0",
        "ts-node": "^10.9.1",
        "typescript": "^4.9.5"
    },
    "scripts": {
        "start": "npm run start:dev",
        "start:prod": "node dist/main.js",
        "start:dev": "ts-node-esm -T src/main.ts",
        "build": "tsc",
        "lint": "eslint ./src --ext .ts",
        "lint:fix": "eslint ./src --ext .ts --fix",
        "test": "echo \"Error: oops, the actor has no tests yet, sad!\" && exit 1"
    },
    "author": "It's not you it's me",
    "license": "ISC"
}
{
    "name": "crawlee-cheerio-typescript",
    "version": "0.0.1",
    "type": "module",
    "description": "This is a boilerplate

Node: 18.16.0
unzipper: 0.10.11
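
The manual slicing in the sample above can be generalized to try other chunk sizes. This is a rough sketch, not the reporter's code: `writeInChunks` is a hypothetical helper name, and unlike the original repro it waits for `'drain'` when `write()` reports backpressure and calls `end()` when done, so it may not trigger the bug in exactly the same way. It also uses `subarray` (a view) where the repro used `slice` (a copy).

```javascript
const { once } = require('node:events');

// Hypothetical helper: write `data` to `writable` in fixed-size pieces,
// honoring backpressure, then end the stream and wait for 'finish'.
async function writeInChunks(writable, data, chunkSize) {
    if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
        throw new RangeError('chunkSize must be a positive integer');
    }
    for (let offset = 0; offset < data.length; offset += chunkSize) {
        // subarray returns a view on the same memory, no copy is made
        const chunk = data.subarray(offset, offset + chunkSize);
        if (!writable.write(chunk)) {
            await once(writable, 'drain');
        }
    }
    // Register the listener before end() so the 'finish' event cannot be missed
    const finished = once(writable, 'finish');
    writable.end();
    await finished;
}
```

Against the repro this would be called as `await writeInChunks(unzipper.Extract({ path: './test' }), fs.readFileSync('a.zip'), 1378)`. Note that `'finish'` only means the extractor has accepted all input. To know when the files are actually on disk, it is probably safer to also wait on the extractor's `.promise()`, as the first sample in this thread does.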

sinedied added a commit to Azure-Samples/contoso-real-estate that referenced this issue Apr 24, 2023
anfibiacreativa pushed a commit to Azure-Samples/contoso-real-estate that referenced this issue Apr 24, 2023
idg10 added a commit to corvus-dotnet/Corvus.Testing that referenced this issue May 2, 2023
The build started failing due to changes in the Azure Build agent, as reported at actions/runner-images#7467

It appears that the underlying problem is a bug in the Node library being used to extract files from packages: ZJONSSON/node-unzipper#271

The problem was that this corrupted some of the files being unpacked. This meant that when we tried to run `func` we got errors complaining about an invalid executable.

Although it looks like the underlying issue has not yet been fixed, the comment at actions/runner-images#7467 (comment) suggests that a new build agent image has been created that will not run into this problem. (Presumably they've done something to avoid the problematic package.) This means we should no longer need our workaround.
uguy pushed a commit to bonitasoft/bonita-rest-documentation-site that referenced this issue May 9, 2023
The existing lib was erratic with node 18. Using `adm-zip` instead.

Relates to ZJONSSON/node-unzipper#271
@matej-marcisovsky

Downgrading to 18.14.0 fixed issues for me.

@didaquis

didaquis commented May 22, 2023

Same situation for me after updating from Node 16 to Node 18.16.0.
If I update from Node 16 to Node 18.15.0, there are no errors!

This is the Node changelog for Node 18.16. https://github.com/nodejs/node/releases/tag/v18.16.0

@ZJONSSON can you help me to identify the cause, please?

@didaquis

Hi @mvolfik and @webwake.

Can either of you confirm what I said in my previous comment? In my case, no error occurs with Node 18.15.0. However, the files are corrupted if Node 18.16.0 is used.

HighCommander4 pushed a commit to clangd/node-clangd that referenced this issue Sep 25, 2023
@ericman314

The problem appears to be that unzipper is putting the first block of code in the wrong place.

Same thing is happening on Node 20.7.0, using unzipper version 0.10.14. In my case it's usually blocks of 2^14 bytes that are being swapped, but it also happens with files much smaller than this.

Would love to see this fixed, as Node 16 has reached end-of-life. There are alternatives, but I really like unzipper's api best, especially the ability to stream archived data without loading it into a buffer first. Very cool!
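
To see which blocks are affected in a given case, a known-good copy of a file (for example one extracted with another library) can be compared block by block with the corrupted one. This is a minimal sketch: `differingBlocks` is a hypothetical helper, and the 2^14-byte default block size is simply taken from the observation above, not from anything unzipper documents.

```javascript
// Hypothetical helper: compare two Buffers block by block and
// return the indices of the blocks whose contents differ.
function differingBlocks(expected, actual, blockSize = 2 ** 14) {
    const length = Math.max(expected.length, actual.length);
    const indices = [];
    for (let offset = 0, index = 0; offset < length; offset += blockSize, index++) {
        // Past the end of a buffer, subarray yields an empty Buffer,
        // so a length mismatch shows up as a differing block.
        const a = expected.subarray(offset, offset + blockSize);
        const b = actual.subarray(offset, offset + blockSize);
        if (!a.equals(b)) {
            indices.push(index);
        }
    }
    return indices;
}
```

Both arguments are assumed to be Node `Buffer` objects, such as the results of `fs.readFileSync`. If two blocks were swapped, both of their indices appear in the result.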

@pixartist

Very simple application here, all data corrupted by this library

dankeboy36 pushed a commit to dankeboy36/install-from-gh-to-vscode that referenced this issue Dec 6, 2023
@sovcik

sovcik commented Dec 12, 2023

Same issue here.

@wfairclough

Also same issue, if I didn't find this page I would have wasted many more hours debugging.

@Lulalaby

CR v20.10.0

@DaCao

DaCao commented Dec 29, 2023

same issue here. all data corrupted by this library.
Wasted days on this....

please fix it.

@kenotron

https://www.npmjs.com/package/yauzl#no-streaming-unzip-api - wondering if that's the reason?

@samerkassem82

So is this library dead? Same issue almost a year later.

@dy-dx

dy-dx commented Mar 26, 2024

The issue has been fixed in the following node.js versions:

But for anyone unable to upgrade, you'll need to switch to a different unzipper library until this gets addressed: #261

@ZJONSSON
Owner

ZJONSSON commented Jun 8, 2024

Thanks @dy-dx for identifying the underlying issue as a bug with fs.createWriteStream in nodejs. This has now been fixed in node per #271 (comment)

As far as extract goes, we have moved from the unmaintained fstream to fs-extra in a newly published version (v0.12.1).
