Malformed zip file generated #666

Closed
aboryczko opened this issue May 26, 2022 · 13 comments

Comments

@aboryczko

aboryczko commented May 26, 2022

Hi,

I'm using the salvois/LargeXlsx library, which uses this library internally, to generate Excel files (which are essentially zip files).
When I try to open a generated file in Excel, it says the file is corrupt. If I open the same file with any unzip tool, there are no errors. When I use the built-in ZipArchive class from .NET Core instead, everything works as expected.
I dug a little deeper into the files generated by this library and by .NET, and I see 5 extra bytes inserted into the output stream roughly every 32 kB.
I don't know if this is some special marker, but Excel doesn't recognize it and it corrupts the content for Excel: for example, instead of 's="0"' I get 's="0 ý��€"'.
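For comparison, here is a minimal sketch of writing a similar entry with the built-in System.IO.Compression.ZipArchive, which the reporter says works as expected, and reading it back to check that the content round-trips. The entry name, payload and compression level are assumptions chosen to mirror the reproduction in this issue; this is not the reporter's actual code.

```csharp
using System;
using System.IO;
using System.IO.Compression;

var payload = new string('\n', 100000);
var ms = new MemoryStream();

// Write one Deflate-compressed entry; leaveOpen: true keeps ms usable after the archive is disposed.
using (var archive = new ZipArchive(ms, ZipArchiveMode.Create, leaveOpen: true))
{
    var entry = archive.CreateEntry("test.txt", CompressionLevel.Optimal);
    using var writer = new StreamWriter(entry.Open());
    writer.Write(payload);
}

// Read the archive back and compare the entry content with the original payload.
ms.Position = 0;
string roundTrip;
using (var archive = new ZipArchive(ms, ZipArchiveMode.Read))
using (var reader = new StreamReader(archive.GetEntry("test.txt")!.Open()))
{
    roundTrip = reader.ReadToEnd();
}
Console.WriteLine(roundTrip == payload); // True
```

Any tool that inflates the entry (like the unzip tools mentioned above) only ever sees the decompressed text, which is why a check like this cannot tell a file Excel accepts from one it rejects.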

A simple way to reproduce it:

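using System.IO;
using SharpCompress.Common;
using SharpCompress.Compressors.Deflate;
using SharpCompress.Writers.Zip;

var ms = new MemoryStream();
// Disposing the ZipWriter writes the central directory and finishes the archive.
using (var zw = new ZipWriter(ms, new ZipWriterOptions(compressionType: CompressionType.Deflate) { DeflateCompressionLevel = CompressionLevel.None }))
{
    var payload = new string('\n', 100000);
    using (var stream = zw.WriteToStream("test.txt", new ZipWriterEntryOptions()))
    using (var streamWriter = new StreamWriter(stream: stream))
    {
        streamWriter.Write(value: payload);
    }
}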

The sequences are at offsets 38, 32811, 65584 and 98357.
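The spacing between those offsets is exactly 32811 - 38 = 32773 = 5 + 32768 bytes, which is consistent with DEFLATE "stored" blocks (RFC 1951, section 3.2.4), as Erior explains later in the thread: each block is a 5-byte header followed by up to 65535 bytes copied verbatim. The sketch below builds such a stream by hand to show the layout. The helper `WrapInStoredBlocks` is hypothetical (it is not SharpCompress code), and the 32768-byte block size and the 38-byte zip local header (30 fixed bytes plus the name `test.txt`, no extra field) are assumptions inferred from the offsets above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper (not SharpCompress code): split data into DEFLATE "stored"
// blocks (BTYPE = 00). Each block has a 5-byte header: one byte with BFINAL in
// bit 0 and BTYPE in bits 1-2 (the rest is padding), then LEN and NLEN = ~LEN
// as little-endian 16-bit values.
static byte[] WrapInStoredBlocks(byte[] data, int blockSize)
{
    var output = new MemoryStream();
    int offset = 0;
    do
    {
        int len = Math.Min(blockSize, data.Length - offset);
        bool isFinal = offset + len == data.Length;
        output.WriteByte(isFinal ? (byte)1 : (byte)0); // BFINAL, BTYPE = 00
        output.WriteByte((byte)(len & 0xFF));          // LEN, low byte
        output.WriteByte((byte)(len >> 8));            // LEN, high byte
        output.WriteByte((byte)(~len & 0xFF));         // NLEN, low byte
        output.WriteByte((byte)((~len >> 8) & 0xFF));  // NLEN, high byte
        output.Write(data, offset, len);               // payload copied verbatim
        offset += len;
    } while (offset < data.Length);
    return output.ToArray();
}

var payload = new byte[100000];
Array.Fill(payload, (byte)'\n');
byte[] deflated = WrapInStoredBlocks(payload, 32768);

// Walk the block headers. Zip offsets assume the entry data starts after a
// 30-byte local file header plus the 8-byte name "test.txt" (no extra field).
const int entryDataStart = 30 + 8;
var headerOffsets = new List<int>();
for (int pos = 0; pos < deflated.Length; pos += 5 + (deflated[pos + 1] | (deflated[pos + 2] << 8)))
{
    headerOffsets.Add(entryDataStart + pos);
}
Console.WriteLine(string.Join(", ", headerOffsets)); // 38, 32811, 65584, 98357

// A standard raw-DEFLATE inflater strips the headers again.
using var inflater = new DeflateStream(new MemoryStream(deflated), CompressionMode.Decompress);
var inflated = new MemoryStream();
inflater.CopyTo(inflated);
Console.WriteLine(inflated.Length); // 100000
```

With these assumptions the computed header positions line up with the reported offsets, and each non-final header is the byte sequence `00 00 80 FF 7F`.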

Linking salvois/LargeXlsx#9

@Erior
Contributor

Erior commented Jun 16, 2022

I think you are confusing CompressionType.None and CompressionType.Deflate. With CompressionType.Deflate you get extra information: instructions telling inflate to "copy the next 7fff bytes as is", and, as you noticed, this repeats every 0x7fff bytes.
However, it is very uncommon to set DeflateCompressionLevel to None; many libraries limit you to at least Level1 when you choose Deflate.
If you really do not want to compress, you usually choose CompressionType.None; if you want to use Deflate, it is better to choose Level1 as a minimum.
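The effect of the level choice can be seen with any raw DEFLATE encoder. The sketch below uses .NET's built-in System.IO.Compression.DeflateStream as a stand-in (it is not SharpCompress's deflater, so exact sizes will differ) on a 100,000-byte newline payload like the one in the reproduction: at NoCompression the encoder emits stored blocks, so the output is slightly larger than the input, while Fastest actually compresses. The helper `DeflatedSize` is hypothetical.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: compress data with the built-in raw-DEFLATE encoder and return the output size.
static long DeflatedSize(byte[] data, CompressionLevel level)
{
    var output = new MemoryStream();
    // leaveOpen: true so output.Length can still be read after the encoder is flushed and disposed.
    using (var deflate = new DeflateStream(output, level, leaveOpen: true))
    {
        deflate.Write(data, 0, data.Length);
    }
    return output.Length;
}

var payload = new byte[100000];
Array.Fill(payload, (byte)'\n');

long stored = DeflatedSize(payload, CompressionLevel.NoCompression);
long fastest = DeflatedSize(payload, CompressionLevel.Fastest);
Console.WriteLine($"input: {payload.Length}, NoCompression: {stored}, Fastest: {fastest}");
```

With CompressionType.None the zip entry holds the raw bytes with no block headers at all, which is why it is the usual choice when compression is not wanted.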

@aboryczko
Author

It doesn't matter which level I choose; the issue is the same and Excel reports the file as corrupt. I used None in the examples because it makes it easier to find which characters are causing the issue.

@Erior
Contributor

Erior commented Jun 17, 2022

Are there any tests in LargeXlsx that show the problem when the output is dumped to disk? That could help me understand the problem/corruption; right now I guess I'm totally missing it.

@aboryczko
Author

Something along the lines of:

using System;
using System.IO;

class Program
{
    static void Main()
    {
        // Write the workbook through a non-seekable wrapper around the file stream.
        using var file = File.Create(@"C:\Temp\test.xlsx");
        using var xlsxWriter = new LargeXlsx.XlsxWriter(new NotSeekableWrapper(file));
        xlsxWriter.BeginWorksheet("Sheet 1");
        for (int i = 0; i < 1000; i++)
        {
            xlsxWriter.BeginRow();
            xlsxWriter.Write(i);
            xlsxWriter.Write($"Value 0{i}");
            xlsxWriter.Write($"Value 1{i}");
            xlsxWriter.Write($"Value 2{i}");
            xlsxWriter.Write($"Value 3{i}");
        }
    }
}

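// Pass-through stream that reports CanSeek = false and tracks Position itself,
// so callers see a forward-only stream.
public class NotSeekableWrapper : Stream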
{
    private readonly Stream _stream;

    public NotSeekableWrapper(Stream stream)
    {
        _stream = stream;
    }

    public override bool CanRead => _stream.CanRead;

    public override bool CanSeek => false;

    public override bool CanWrite => _stream.CanWrite;

    public override long Length => throw new NotSupportedException();

    private long _position;
    public override long Position { get => _position; set => throw new NotSupportedException(); }

    public override void Flush()
    {
        _stream.Flush();
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        var read = _stream.Read(buffer, offset, count);
        _position += read;
        return read;
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        throw new NotSupportedException();
    }

    public override void SetLength(long value)
    {
        throw new NotSupportedException();
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        _stream.Write(buffer, offset, count);
        _position += count;
    }
}

This produces the malformed file.

@Erior
Contributor

Erior commented Jun 17, 2022

Thank you, that does make Excel unhappy. I will investigate.

@Erior
Contributor

Erior commented Jun 17, 2022

I did not find any corruption. However, in this scenario we explicitly mark the files as "Label"s, which is obviously not right; removing that made Excel happy on my side. It would be great if you could test it out.

@aboryczko
Author

@Erior Sorry, I don't follow. Could you explain a bit more how I can do that?

@Erior
Contributor

Erior commented Jun 18, 2022

Off the top of my head, you could build with the pull request changes and use that DLL, or wait for @adamhathcock to accept the change and publish a new release. I'm not sure whether you can get binaries out of pull request builds, even though they are tested automatically.
Someone could build it for you and send the DLLs, but that is a matter of trust/security.
Perhaps Mr Hathcock knows of other options; I participate on GitHub somewhat infrequently.

@adamhathcock
Owner

Thanks for the fix, @Erior!

I'll push out the change soon, as it was creating bad files.

@salvois

salvois commented Jul 11, 2022

Hi @aboryczko , @Erior , @adamhathcock ,
SharpCompress 0.32.1 appears to fix the problem with the code posted above, as discussed in salvois/LargeXlsx#9.
Many thanks for reporting and fixing this!
Salvo

@adamhathcock
Owner

🥳

@Nanook
Collaborator

Nanook commented Jul 15, 2022

I'm closing this issue after confirming it's now working.

Nanook closed this as completed on Jul 15, 2022.
5 participants