GzDecoder seems to decode incorrectly #66

Closed
axetroy opened this issue Mar 14, 2022 · 9 comments

Comments


axetroy commented Mar 14, 2022

let tar_file = File::open(&tar_file_path)?;
let input = GzDecoder::new(&tar_file)?;
let mut archive = Archive::new(input);

archive.set_unpack_xattrs(true);
archive.set_overwrite(true);
archive.set_preserve_permissions(true);
archive.set_preserve_mtime(true);

let files = archive.entries()?;

for entry in files {
    let mut file = entry?;

    let file_path = file.path()?;

    if let Some(file_name) = file_path.file_name() {
        if file_name.to_str().unwrap() == extract_file_name {
            binary_found = true;
            file.unpack(&output_file_path)?;
            break;
        }
    }
}

test file: https://github.com/axetroy/prune.rs/releases/download/v0.1.1/prune_darwin_amd64.tar.gz

The original file size is: 985,384 bytes
The unpacked file size is: 965,416 bytes

I have tested tar, and it works fine.

Owner

sile commented Mar 14, 2022

Thank you for reporting this issue.
However, I could not reproduce your problem.

I wrote the following code that decompresses the above input file and then prints the decompressed size:

fn main() -> anyhow::Result<()> {
    let tar_file = std::fs::File::open("prune_darwin_amd64.tar.gz")?;
    let mut input = libflate::gzip::Decoder::new(&tar_file)?;
    let mut output = Vec::new();
    std::io::copy(&mut input, &mut output)?;

    println!("Deflated size: {}", output.len());
    Ok(())
}

// The comment below is the output of this command.
// Deflated size: 968192

Then I decompressed the file with the gzip command and checked the resulting file size with ls:

$ gzip -d prune_darwin_amd64.tar.gz
$ ls -l
-rw-r--r-- 1 user user 968192 Feb 28 02:19 prune_darwin_amd64.tar

So the two results match.
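
The comparison above comes down to counting how many bytes a decoder yields. A minimal sketch of that measurement using only the standard library follows. `count_bytes` is a hypothetical helper name, not part of libflate; it accepts any `Read` implementation (for instance the `libflate::gzip::Decoder` from the snippet above), so the output never has to be buffered in a `Vec`.

```rust
use std::io::{self, Read};

/// Counts the bytes a reader yields without keeping them in memory.
/// (Hypothetical helper for illustration; not part of libflate.)
fn count_bytes<R: Read>(mut reader: R) -> io::Result<u64> {
    // io::copy returns the number of bytes copied; io::sink() discards them.
    io::copy(&mut reader, &mut io::sink())
}
```

Passing the gzip decoder to `count_bytes` should report the same number as `output.len()` in the snippet above, assuming the decoder is otherwise used the same way.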

Owner

sile commented Mar 14, 2022

BTW, I could not run the code snippet you shared as I don't know where the Archive struct (or enum?) comes from.

Author

axetroy commented Mar 14, 2022

Here is the source code https://github.com/axetroy/cask.rs/blob/main/src/extractor.rs

I am putting together a minimal reproduction.

Author

axetroy commented Mar 14, 2022

@sile Hello, thanks for your help.

Here is the reproduction repo: https://github.com/axetroy/libflate-66

git clone https://github.com/axetroy/libflate-66
cd ./libflate-66
cargo run ./

# Inspect the unpacked file 'prune'

Owner

sile commented Mar 14, 2022

Thank you for the additional information. I could reproduce your result.

Then I modified the code to read the tar file ("prune_darwin_amd64.tar") that had already been decompressed with the gzip command, so that libflate is not involved at all:

use tar::Archive; // `Archive` comes from the `tar` crate

fn main() -> anyhow::Result<()> {
    let extract_file_name = "prune";
    let output_file_path = "output";
    let mut archive = Archive::new(std::fs::File::open("prune_darwin_amd64.tar")?);

    archive.set_unpack_xattrs(true);
    archive.set_overwrite(true);
    archive.set_preserve_permissions(true);
    archive.set_preserve_mtime(true);

    let files = archive.entries()?;

    for entry in files {
        let mut file = entry?;

        let file_path = file.path()?;

        if let Some(file_name) = file_path.file_name() {
            dbg!(&file_name);
            if file_name.to_str().unwrap() == extract_file_name {
                file.unpack(&output_file_path)?;
                print!("unpacked");
                break;
            }
        }
    }
    Ok(())
}

The result was unchanged (i.e., the unpacked file size was still 965,416 bytes). So I don't think this problem is related to libflate.
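
The three sizes reported so far in this thread can be compared directly. The sketch below is plain arithmetic in Rust; `shortfall` and `TAR_BLOCK` are illustrative names, and the only outside fact it relies on is that tar archives are laid out in 512-byte blocks.

```rust
/// tar archives are organised in 512-byte blocks.
const TAR_BLOCK: u64 = 512;

/// Returns (missing bytes, missing whole blocks, leftover bytes)
/// for an expected size versus the size actually unpacked.
/// (Illustrative helper; the name is not from any library.)
fn shortfall(expected: u64, actual: u64) -> (u64, u64, u64) {
    // saturating_sub avoids an underflow panic if actual > expected.
    let missing = expected.saturating_sub(actual);
    (missing, missing / TAR_BLOCK, missing % TAR_BLOCK)
}
```

With the numbers from this thread, `shortfall(985_384, 965_416)` gives a gap of 19,968 bytes, which is exactly 39 blocks with nothing left over. The decompressed archive (968,192 bytes) is also smaller than the 985,384 bytes expected for `prune`, so the entry cannot be stored as one contiguous run of that length. That would be consistent with an entry that uses a tar extension such as sparse storage, but this is only a guess; nothing in the thread confirms the cause.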

Author

axetroy commented Mar 14, 2022

This confuses me.

When I compress the file with gzip and then decompress it, the result is correct.
When I archive the file with tar and then unpack it, the result is correct.

But when I combine the two, the result is incorrect.

With the extraction tools installed on my computer, though, everything is fine.

Owner

sile commented Mar 14, 2022

FYI, Python 3's tarfile module handles the (already decompressed) input tar file correctly.

>>> import tarfile
>>> tar = tarfile.open("prune_darwin_amd64.tar")
>>> tar.getmember("prune").size
985384
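
The `size` that `tarfile` reports can be cross-checked against the raw archive. As a rough sketch, the code below walks an uncompressed tar stream with only the standard library and returns each header's name and size field. The helper names (`header_size`, `header_name`, `list_entries`) are made up for this example. It assumes the classic header layout (name in bytes 0..100, size as octal ASCII in bytes 124..136) and does not interpret pax extended headers, GNU long names, base-256 sizes, or sparse maps, so for entries that use those extensions the raw field may differ from what a full tar implementation reports.

```rust
use std::io::{self, Read};

const BLOCK: usize = 512;

/// Parses the size field of a tar header block: 12 bytes at offset 124,
/// octal ASCII, padded with NULs or spaces. Returns None if it is not octal text.
fn header_size(header: &[u8; BLOCK]) -> Option<u64> {
    let text = std::str::from_utf8(&header[124..136]).ok()?;
    let digits = text.trim_matches(|c: char| c == '\0' || c == ' ');
    u64::from_str_radix(digits, 8).ok()
}

/// Reads the NUL-terminated name field (first 100 bytes of the header).
fn header_name(header: &[u8; BLOCK]) -> String {
    let raw = &header[..100];
    let end = raw.iter().position(|&b| b == 0).unwrap_or(raw.len());
    String::from_utf8_lossy(&raw[..end]).into_owned()
}

/// Lists (name, size field) for every header in an uncompressed tar stream.
/// Extension headers are listed as ordinary entries, not interpreted.
fn list_entries<R: Read>(mut tar: R) -> io::Result<Vec<(String, u64)>> {
    let mut entries = Vec::new();
    let mut header = [0u8; BLOCK];
    loop {
        // read_exact reports UnexpectedEof once the stream runs out.
        match tar.read_exact(&mut header) {
            Ok(()) => {}
            Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => break,
            Err(e) => return Err(e),
        }
        // An all-zero block marks the end of the archive.
        if header.iter().all(|&b| b == 0) {
            break;
        }
        let size = header_size(&header)
            .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "bad size field"))?;
        entries.push((header_name(&header), size));
        // Skip the entry's data, which is padded up to a whole block.
        let padded = (size + BLOCK as u64 - 1) / BLOCK as u64 * BLOCK as u64;
        io::copy(&mut tar.by_ref().take(padded), &mut io::sink())?;
    }
    Ok(entries)
}
```

Calling `list_entries(std::fs::File::open("prune_darwin_amd64.tar")?)` would show which headers the archive contains and what each size field says, which is one way to see where two tar implementations might start to disagree.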

Author

axetroy commented Mar 14, 2022

OK, so it must be a difference between tar implementations.

Thanks for your help and time. Have a good day 👍

axetroy closed this as completed Mar 14, 2022
Owner

sile commented Mar 14, 2022

👍
