Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade from Construct 2.8.8 to 2.10.68 (All tests pass on 3.10) #24

Open
wants to merge 18 commits into
base: master
Choose a base branch
from

Conversation

rlaphoenix
Copy link
Collaborator

@rlaphoenix rlaphoenix commented Apr 6, 2023

Warning This has changes to both tests and user code that would be breaking changes. This is not something a user should simply upgrade to without checking and changing code. This WILL break user code. Perhaps a major upgrade is necessary? (4.x). This pull request is based on #9 but with some semantics changed to be as non-breaking as possible.

In this pull request, I've made as few changes to the existing code as possible. I've also reviewed my code multiple times before pushing, unlike in #9 where changes in each commit seem unbalanced, bad commit titles and descriptions, and various debugging code left in.

Breaking changes to users in pymp4 (so far) if pulled:

  1. All data in the boxes and the high-level Box construct that were parsed as strings, are no longer left as bytes type. They are now decoded to unicode strings (i.e., "foo" not b"foo", or "tenc" not b"tenc"). This affects everything from box type, ftyp major_brand, and so on.
  2. Since it now uses Construct 2.10, you can no longer define/modify a construct by re-calling the constructor of the constructor return, because it no longer returns the constructor like that. I.e., you must now do:
Container(
    offset=0,
    type="ftyp",
    major_brand="iso5",
    minor_version=1,
    compatible_brands=["iso5", "avc1"],
    end=24
)

Instead of:

Container(offset=0)
  (type="ftyp")
  (major_brand="iso5")
  (minor_version=1)
  (compatible_brands=["iso5", "avc1"])
  (end=24)
  1. All custom adapters have been moved from parser.py to adapters.py.
  2. TellMinusSizeOf is now TellPlusSizeOf and has been moved from parser.py to subconstructs.py.
  3. All boxes field's are now nested within a data field. This was simply a requirement. I could not think of any way to embedded the box fields, nor can the maintainer of construct when asked. It seems to simply be a completely pulled feature that will not return. This means all user code that accesses fields must be updated from e.g., tenc_box.is_encrypted to tenc_box.data.is_encrypted.
  4. All boxes had the type field removed from their individual structs. It is unnecessary as the length information isn't prefixed on them, so they were more or less already risky to use manually/outside of the Box struct. This was required so as to not duplicate the type in the Box container and the Box's data field container.
  5. All EmbeddedBitStruct and Embedded(BitStruct()) fields were nested under a "flags" field. This has the same ramifications as change 5.
  6. The ISO6392TLanguageCode Adapter has been reworked to expect an Int16 input as obj, which will then do Bitwise operations on it itself. This was so we don't need to nest the parsed Language Code under another field (i.e., box.language.code like Construct upgrade to 2.10 #9 did).

If you have any ideas for improvements or want to change how those "data"/"flags" fields are named or function, feel free to. Sadly I could not find anything better.

All tests pass on Python 3.10.

(pymp4-py3.10) PS C:\Users\rlaphoenix\Projects\pymp4> python -m pytest tests/
================================= test session starts ==================================
platform win32 -- Python 3.10.10, pytest-7.0.1, pluggy-1.0.0
rootdir: C:\Users\rlaphoenix\Projects\pymp4
plugins: cov-4.0.0
collected 17 items                                                                      

tests\test_box.py ........                                                        [ 47%]
tests\test_dashboxes.py ..                                                        [ 58%]
tests\test_util.py .......                                                        [100%]

================================== 17 passed in 0.09s ==================================

Closes #23 and #3

@rlaphoenix rlaphoenix force-pushed the construct-2.10-patch branch 5 times, most recently from 33009cf to dff8bb5 Compare April 6, 2023 11:26
@rlaphoenix rlaphoenix marked this pull request as ready for review April 6, 2023 11:48
@rlaphoenix rlaphoenix changed the title WIP upgrade from Construct 2.8.8 to 2.10.68 Upgrade from Construct 2.8.8 to 2.10.68 (All tests pass on 3.10) Apr 6, 2023
@rlaphoenix rlaphoenix linked an issue Apr 6, 2023 that may be closed by this pull request
@rlaphoenix rlaphoenix force-pushed the construct-2.10-patch branch 2 times, most recently from f38276c to 0de7eca Compare April 17, 2023 20:18
When construct replaced String with PaddedString, the return type changed from bytes to unicode strings (u"..." on Python 2, "..." on Python 3).

This affects a lot of conditional checks, tests, and user code. For example box type definitions.

I've updated the code to reflect this change.
For some reason sometime between 2.8.8 and 2.8.22 the arguments for Const were flipped.
The includelength parameter was added in `2.8.11`. Also removes TellMinusSizeOf as it's now unused.
The default for CString and PascalString was previously "ascii" (from what I can tell) so I specified "ascii" for everything that did not explicitly specify an encoding.
Construct 2.10 (maybe even 2.9) no longer allows you to do e.g., `Container(foo=1)(bar=2)` as an alternative to `Container(foo=1, bar=2)` and I honestly can't blame them, it makes the code extremely annoying to read and undoubtably to parse as well.

This commit simplifies them to singular construct call under a single Container object.
This way we don't have to use BitStruct and messy embedding/nesting decisions and can directly nest the language.
This removes duplicate checks and duplicate output data.
This is by no means ideal. It would be much preferred to be embedded, but the maintainer of construct has stated there is no workaround and nesting is the only option.

This means when building boxes we must now nest the data in `data` container. This also means grabbing a fields value must now specify `.data` first, e.g., `tenc_box.data.is_encrypted`. See changes with the tests. Not ideal at all.
@codecov-commenter
Copy link

codecov-commenter commented Apr 19, 2023

Codecov Report

Patch coverage: 92.30% and project coverage change: +4.41% 🎉

Comparison is base (8e00455) 75.24% compared to head (6e47a61) 79.66%.

❗ Current head 6e47a61 differs from pull request most recent head 73a4ba4. Consider uploading reports for the commit 73a4ba4 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master      #24      +/-   ##
==========================================
+ Coverage   75.24%   79.66%   +4.41%     
==========================================
  Files           5        7       +2     
  Lines         202      177      -25     
==========================================
- Hits          152      141      -11     
+ Misses         50       36      -14     
Files Changed Coverage Δ
src/pymp4/adapters.py 90.00% <90.00%> (ø)
src/pymp4/subconstructs.py 91.66% <91.66%> (ø)
src/pymp4/parser.py 93.58% <100.00%> (+9.88%) ⬆️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@beardypig
Copy link
Owner

@rlaphoenix do you think you could rebase this PR?

@rlaphoenix
Copy link
Collaborator Author

@rlaphoenix do you think you could rebase this PR?

Done 👍

@beardypig
Copy link
Owner

Do you think this is good to go? @rlaphoenix

@rlaphoenix
Copy link
Collaborator Author

rlaphoenix commented Aug 7, 2023

Other than the cons/differences as listed in the pull description, it should be, yeah. All tests are successful, just any client that uses pymp4 should be very hesitant to update pymp4 until they can verify their own calls to construct will support 2.10, as well as understand the changes with pymp4 in regards to accessing most parsed box information, again referencing the pull request description.

Probably best to do a pymp4 v2 release if we do go ahead and pull this, just so ^-based dependency requirements won't update to pymp4 2.x without manual intervention.

@ktp420
Copy link

ktp420 commented Dec 6, 2023

I think the box type should be still be byte type since mp4 specs defines them as fourCC, which is just realy int made from readable 4 chars. Also there are specs, such as Apple metadata, which define types as non-ascii, i.e. byte > 0x7F (\251nam).
I would be happen with making them Bytes(4) instead of PaddedString(4, 'ascii'). This will keep existing application working and match the spec. Also makes extending pymp4 much more prodictable and work with boxes not defined in pymp4.

@DavidBuchanan314
Copy link

DavidBuchanan314 commented Jan 21, 2024

Hi, sorry if this is perhaps not the right place to report this, but I found a bug in this branch - saio boxes are not parsed correctly.

test case:

from pymp4.parser import Box
data = bytes.fromhex("000000147361696f000000000000000100000325")
saio = Box.parse(data)
print(saio)

In the master branch, we get the expected output:

Container(offset=0)(type=b'saio')(version=0)(flags=Container(has_aux_info_type=False))(aux_info_type=None)(aux_info_type_parameter=None)(offsets=[805])(end=20)

But in construct-2.10-patch:

[...]
  File "/usr/lib/python3.12/site-packages/construct/expr.py", line 188, in __call__
    return self.__parent(obj)[self.__field]
           ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
KeyError: 'version'

(the full backtrace is longer)

Relevant code: https://github.com/devine-dl/pymp4/blob/33dc5d620f361cb49d713b60dcb2686ab8696f91/src/pymp4/parser.py#L511-L522

Possibly related: construct/construct#409

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Majority of tests fail on Python 3.10 and newer Update to construct 2.9
5 participants