Add support for specifying the unpacked size outside of header #17
Some LZMA streams, such as those used in OpenCTM, are encoded without the unpacked size specified in the header. This is possible to read in some of the C implementations of LZMA by specifying a header size and providing the unpacked size as an option to the decoder. This change adds the same possibility to lzma_rs in a typesafe manner, where the unpacked size and whether it should be written to the header is specified explicitly.
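To make the idea concrete, here is a minimal, self-contained sketch of header parsing where the unpacked size is either read from the header or supplied by the caller. It does not use the lzma_rs API: `SizeSource`, `Header`, and `parse_header` are hypothetical names chosen for illustration. The layout assumed is the classic one referenced in this pull request (1 byte of props, 4 bytes of dictionary size, then an optional 8-byte little-endian unpacked size, where all bits set means "unknown").

```rust
/// Hypothetical option: where the decoder gets the unpacked size from.
#[derive(Clone, Copy, Debug, PartialEq)]
enum SizeSource {
    /// The 8-byte size field is present in the header.
    ReadFromHeader,
    /// The header has no size field; the caller supplies the size.
    /// `None` means the decoder has to rely on the end-of-stream marker.
    Provided(Option<u64>),
}

/// Hypothetical parsed header, for illustration only.
#[derive(Debug, PartialEq)]
struct Header {
    lc: u32,
    lp: u32,
    pb: u32,
    dict_size: u32,
    unpacked_size: Option<u64>,
    /// Number of header bytes consumed (5 or 13).
    header_len: usize,
}

fn parse_header(input: &[u8], source: SizeSource) -> Result<Header, String> {
    if input.len() < 5 {
        return Err("header too short: need props and dict size".to_string());
    }
    // The props byte packs three parameters as (pb * 5 + lp) * 9 + lc.
    let props = u32::from(input[0]);
    if props >= 225 {
        return Err(format!("invalid props byte: {}", props));
    }
    let lc = props % 9;
    let lp = (props / 9) % 5;
    let pb = props / 45;

    let mut dict = [0u8; 4];
    dict.copy_from_slice(&input[1..5]);
    let dict_size = u32::from_le_bytes(dict);

    let (unpacked_size, header_len) = match source {
        SizeSource::ReadFromHeader => {
            if input.len() < 13 {
                return Err("header too short: missing unpacked size".to_string());
            }
            let mut raw = [0u8; 8];
            raw.copy_from_slice(&input[5..13]);
            let value = u64::from_le_bytes(raw);
            // All bits set means the size is unknown.
            let size = if value == u64::MAX { None } else { Some(value) };
            (size, 13)
        }
        // OpenCTM-style stream: only 5 header bytes, size comes from outside.
        SizeSource::Provided(size) => (size, 5),
    };

    Ok(Header { lc, lp, pb, dict_size, unpacked_size, header_len })
}
```

With this shape, an OpenCTM-style stream would be parsed with `SizeSource::Provided(Some(n))`, and the compressed payload would start at byte 5 of the LZMA data instead of byte 13.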
Overall looks great :) Just a few comments.
Thanks for the comments! After writing the tests, I realized that the end-of-stream marker is always written. However, the XZ documentation says the marker is only supposed to be written if the unpacked size is not provided in the header. We should probably not write the marker if the size is provided and written in the header. Is removing the bits encoded in Encoder::finish the way to achieve this? (lzma-rs/src/encode/dumbencoder.rs, lines 71 to 97 in 4d46835)
Not sure exactly what should be done if the unpacked size is known, but not provided, like in the case of OpenCTM. I will see if I can figure out what happens in other libs in that case.
As the name goes, the |
Also, make the Options derive Clone and Debug.
These tests do not make sense because they provided a value (None) that indicated an end marker to be expected in cases where there should be none.
Seems like it works fine with unlzma, so I removed writing the end marker if the unpacked size is written to the header. I also realized that a couple of the tests did not really make sense after this and removed those.
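The rule described above can be sketched as a pair of small functions. This is an illustration under stated assumptions, not the code from this pull request: the enum and function names are hypothetical, and the behaviour for the "size not written to the header" case was left open in the discussion, so this sketch simply keeps the end marker there.

```rust
/// Hypothetical encoder option, modelled on the discussion in this thread.
#[derive(Clone, Copy, Debug, PartialEq)]
enum UnpackedSize {
    /// Write the 8-byte size field; `None` writes the "unknown" value.
    WriteToHeader(Option<u64>),
    /// Leave the size field out of the header entirely (OpenCTM-style).
    SkipWritingToHeader,
}

/// Bytes to emit for the header's size field, if any.
fn size_field(option: UnpackedSize) -> Option<[u8; 8]> {
    match option {
        UnpackedSize::WriteToHeader(Some(n)) => Some(n.to_le_bytes()),
        // All bits set signals an unknown size in the header.
        UnpackedSize::WriteToHeader(None) => Some(u64::MAX.to_le_bytes()),
        UnpackedSize::SkipWritingToHeader => None,
    }
}

/// Whether the encoder should append the end-of-stream marker.
/// Assumption: the marker is dropped only when a concrete size is
/// written to the header; every other case keeps it.
fn needs_end_marker(option: UnpackedSize) -> bool {
    !matches!(option, UnpackedSize::WriteToHeader(Some(_)))
}
```

Keeping the two decisions in separate functions makes it easy to change the policy for `SkipWritingToHeader` later without touching the header layout.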
…for-custom-unpacked-size
bors r+
17: Add support for specifying the unpacked size outside of header r=gendx a=dragly

Some LZMA streams, such as those used in OpenCTM, are encoded without the unpacked size specified in the header. This is possible to read in some of the C implementations of LZMA by specifying a header size and providing the unpacked size as an option to the decoder. This change adds the same possibility to lzma_rs in a typesafe manner, where the unpacked size and whether it should be written to the header is specified explicitly.

### Pull Request Overview

This pull request adds `Options` objects to two new public modules named `compress` and `decompress` that can be used to specify whether the unpacked size should be written to and read from the LZMA header.

### Testing Strategy

This pull request was tested by...

- [x] Added relevant unit tests.
- [ ] Added relevant end-to-end tests (such as `.lzma`, `.lzma2`, `.xz` files).

### Supporting Documentation and References

This exotic use of the LZMA header is only indicated in the [OpenCTM specification itself](http://openctm.sourceforge.net/media/FormatSpecification.pdf), where it specifically states that the offset of the stream [is only 9 bytes](https://github.com/Danny02/OpenCTM/blob/243a343bd23bbeef8731f06ed91e3996604e1af4/doc/FormatSpecification.tex#L91), while it should have been 17 bytes with the unpacked size in place. This is based on 4 bytes from OpenCTM itself, 1 byte for the props, 4 bytes for the dict size and the missing 8 bytes for the unpacked size.

It can also be seen from the OpenCTM source code that the LZMA header written is only 5 bytes: https://github.com/Danny02/OpenCTM/blob/243a343bd23bbeef8731f06ed91e3996604e1af4/lib/stream.c#L311

### TODO or Help Wanted

This pull request still needs a bit of bikeshedding on the names of the modules and enums used in the options :)

Co-authored-by: Svenn-Arne Dragly <s@dragly.com>
Co-authored-by: Svenn-Arne Dragly <dragly@cognite.com>
Co-authored-by: G. Endignoux <ggendx@gmail.com>
Build succeeded