Optionally allow trailing data in bufread::XzDecoder #86

Open
wants to merge 2 commits into
base: main

bgilbert
Some xz streams have unrelated data afterward. In particular, Linux kernel initrd files are the concatenation of multiple cpio archives, each of which can be compressed with a different compressor. read::XzDecoder and bufread::XzDecoder return InvalidData in this case, which makes it difficult to detect the EOF, unwrap the underlying stream, and continue reading with a different decompressor. (write::XzDecoder returns Ok(0) after the end of the xz stream, which is less ambiguous.) Multi-decoder mode doesn't address this, since that only handles the case where the following data is also an xz stream.

liblzma properly returns StreamEnd here; we just need to detect it. However, the xz test suite contains some tests with trailing garbage, and the xz command-line tool is designed to fail on those unless --single-stream is specified. For compatibility, we probably can't allow trailing garbage by default, but we can provide an option. Add an allow_trailing_data() toggle to bufread::XzDecoder, and stop accepting bytes in read() if we reach StreamEnd with that toggle enabled.

Do not add a similar option to read::XzDecoder, since it's only useful if the underlying stream is synced to the end of the xz stream afterward, and read::XzDecoder can't ensure that.
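
To illustrate the syncing argument, here is a minimal sketch using only the standard library. `ToyDecoder`, its one-length-byte frame format, and `split_frame_and_trailer` are hypothetical stand-ins, not xz and not this crate's API. The point is only that a decoder built on `BufRead` can `consume()` exactly the bytes it used, so the caller can take the reader back and continue with the trailing data.

```rust
use std::io::{self, BufRead, BufReader, Cursor, Read};

/// Toy stand-in for a stream decoder (NOT the xz format): a frame is one
/// length byte followed by that many payload bytes.
struct ToyDecoder<R> {
    inner: R,
    remaining: Option<usize>, // None until the length byte has been read
}

impl<R: BufRead> ToyDecoder<R> {
    fn new(inner: R) -> Self {
        ToyDecoder { inner, remaining: None }
    }

    /// Hand back the reader, positioned just past the bytes we consumed.
    fn into_inner(self) -> R {
        self.inner
    }
}

impl<R: BufRead> Read for ToyDecoder<R> {
    fn read(&mut self, out: &mut [u8]) -> io::Result<usize> {
        if self.remaining.is_none() {
            let buf = self.inner.fill_buf()?;
            if buf.is_empty() {
                return Err(io::ErrorKind::UnexpectedEof.into());
            }
            let len = buf[0] as usize;
            self.inner.consume(1);
            self.remaining = Some(len);
        }
        let remaining = self.remaining.unwrap_or(0);
        if remaining == 0 {
            // End of frame: stop here and leave trailing bytes untouched.
            return Ok(0);
        }
        let buf = self.inner.fill_buf()?;
        if buf.is_empty() {
            return Err(io::ErrorKind::UnexpectedEof.into());
        }
        // Never take more than the frame needs, even if more is buffered.
        let n = remaining.min(buf.len()).min(out.len());
        out[..n].copy_from_slice(&buf[..n]);
        self.inner.consume(n);
        self.remaining = Some(remaining - n);
        Ok(n)
    }
}

/// Decode one toy frame, then read whatever follows it from the same reader.
fn split_frame_and_trailer(data: &[u8]) -> io::Result<(Vec<u8>, Vec<u8>)> {
    let mut decoder = ToyDecoder::new(BufReader::new(Cursor::new(data)));
    let mut payload = Vec::new();
    decoder.read_to_end(&mut payload)?;

    let mut rest = decoder.into_inner();
    let mut trailer = Vec::new();
    rest.read_to_end(&mut trailer)?;
    Ok((payload, trailer))
}
```

Calling `split_frame_and_trailer(&[3, b'a', b'b', b'c', b'X', b'Y'])` yields the payload `abc` and the trailer `XY`. A decoder that only requires `Read` has to keep its own read-ahead buffer, and any bytes it over-read would typically be lost when the inner reader is unwrapped.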

Also add an additional test verifying that write::XzDecoder refuses to accept additional bytes after the xz stream reaches StreamEnd.
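
For readers unfamiliar with the `Ok(0)` convention on the write side, here is a rough sketch of the pattern such a test checks. `ToySink` and `feed` are hypothetical and std-only; they do not use liblzma or this crate. The sink accepts a fixed number of bytes and then returns `Ok(0)`, which lets the caller see exactly how much input belonged to the stream.

```rust
use std::io::{self, Write};

/// Toy stand-in for a write-side decoder (NOT xz): accepts `remaining`
/// bytes, then refuses further input by returning Ok(0).
struct ToySink {
    remaining: usize,
    out: Vec<u8>,
}

impl Write for ToySink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let n = self.remaining.min(buf.len());
        self.out.extend_from_slice(&buf[..n]);
        self.remaining -= n;
        Ok(n) // 0 once the toy stream has ended
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Feed input until the sink stops accepting bytes; return how many bytes
/// it took, so the caller knows where the trailing data starts.
fn feed(sink: &mut ToySink, input: &[u8]) -> io::Result<usize> {
    let mut written = 0;
    while written < input.len() {
        match sink.write(&input[written..])? {
            0 => break, // end of stream reached; the rest is trailing data
            n => written += n,
        }
    }
    Ok(written)
}
```

With a sink that accepts three bytes, `feed(&mut sink, b"abcXY")` returns `3`, and a later `write_all` on the same sink fails with `ErrorKind::WriteZero`, which is how the standard library reports a writer that returned `Ok(0)`.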

@alexcrichton
Owner

Thanks for the PR! Unfortunately I don't really have the time to maintain this crate nowadays, though. If you're interested I could transfer ownership to you, however.

@cgwalters cgwalters left a comment

Superficial LGTM

@cgwalters
cgwalters commented Dec 14, 2021

Hmm. In the coreos/ GH organization we do maintain some crates. So one option is to transfer it there.

3 participants