-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upgrade from Construct 2.8.8 to 2.10.68 (All tests pass on 3.10) #24
base: master
Are you sure you want to change the base?
Conversation
33009cf
to
dff8bb5
Compare
f38276c
to
0de7eca
Compare
When construct replaced String with PaddedString, the return type changed from bytes to unicode strings (u"..." on Python 2, "..." on Python 3). This affects a lot of conditional checks, tests, and user code. For example box type definitions. I've updated the code to reflect this change.
For some reason sometime between 2.8.8 and 2.8.22 the arguments for Const were flipped.
The includelength parameter was added in `2.8.11`. Also removes TellMinusSizeOf as it's now unused.
The default for CString and PascalString was previously "ascii" (from what I can tell) so I specified "ascii" for everything that did not explicitly specify an encoding.
Construct 2.10 (maybe even 2.9) no longer allows you to do e.g., `Container(foo=1)(bar=2)` as an alternative to `Container(foo=1, bar=2)` and I honestly can't blame them, it makes the code extremely annoying to read and undoubtably to parse as well. This commit simplifies them to singular construct call under a single Container object.
This way we don't have to use BitStruct and messy embedding/nesting decisions and can directly nest the language.
This removes duplicate checks and duplicate output data.
This is by no means ideal. It would be much preferred to be embedded, but the maintainer of construct has stated there is no workaround and nesting is the only option. This means when building boxes we must now nest the data in `data` container. This also means grabbing a fields value must now specify `.data` first, e.g., `tenc_box.data.is_encrypted`. See changes with the tests. Not ideal at all.
0de7eca
to
d81c932
Compare
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## master #24 +/- ##
==========================================
+ Coverage 75.24% 79.66% +4.41%
==========================================
Files 5 7 +2
Lines 202 177 -25
==========================================
- Hits 152 141 -11
+ Misses 50 36 -14
☔ View full report in Codecov by Sentry. |
@rlaphoenix do you think you could rebase this PR? |
Done 👍 |
6e47a61
to
73a4ba4
Compare
73a4ba4
to
33dc5d6
Compare
Do you think this is good to go? @rlaphoenix |
Other than the cons/differences as listed in the pull description, it should be, yeah. All tests are successful, just any client that uses pymp4 should be very hesitant to update pymp4 until they can verify their own calls to construct will support 2.10, as well as understand the changes with pymp4 in regards to accessing most parsed box information, again referencing the pull request description. Probably best to do a pymp4 v2 release if we do go ahead and pull this, just so ^-based dependency requirements won't update to pymp4 2.x without manual intervention. |
I think the box type should be still be |
Hi, sorry if this is perhaps not the right place to report this, but I found a bug in this branch - test case: from pymp4.parser import Box
data = bytes.fromhex("000000147361696f000000000000000100000325")
saio = Box.parse(data)
print(saio) In the
But in
(the full backtrace is longer) Relevant code: https://github.com/devine-dl/pymp4/blob/33dc5d620f361cb49d713b60dcb2686ab8696f91/src/pymp4/parser.py#L511-L522 Possibly related: construct/construct#409 |
In this pull request, I've made as few changes to the existing code as possible. I've also reviewed my code multiple times before pushing, unlike in #9 where changes in each commit seem unbalanced, bad commit titles and descriptions, and various debugging code left in.
Breaking changes to users in pymp4 (so far) if pulled:
Box
construct that were parsed as strings, are no longer left asbytes
type. They are now decoded to unicode strings (i.e., "foo" not b"foo", or "tenc" not b"tenc"). This affects everything from box type, ftyp major_brand, and so on.Instead of:
parser.py
toadapters.py
.parser.py
tosubconstructs.py
.data
field. This was simply a requirement. I could not think of any way to embedded the box fields, nor can the maintainer of construct when asked. It seems to simply be a completely pulled feature that will not return. This means all user code that accesses fields must be updated from e.g.,tenc_box.is_encrypted
totenc_box.data.is_encrypted
.data
field container.box.language.code
like Construct upgrade to 2.10 #9 did).If you have any ideas for improvements or want to change how those "data"/"flags" fields are named or function, feel free to. Sadly I could not find anything better.
All tests pass on Python 3.10.
Closes #23 and #3