Add basic AVIF parsing support #193

baumanj · 2020-02-21T03:59:24Z

Since the fundamentals of HEIF, upon which AVIF is based, are quite different
from video, this adds a new context type, which currently only contains the
data for the "primary item", typically an av01 coded image frame.

Extracting the primary item requires support for several new box types which
specify the location type of various data and metadata. Various features are
not yet fully supported, but should provide informative errors.

Since the specification does not make many guarantees about the ordering of
boxes, there is some additional logic complexity in an attempt to avoid
copying buffers when possible.

Also, a set of AVIF sample images (see the ReadMe.txt for licensing) is added
to exercise the new parsing code.

Since the fundamentals of HEIF, upon which AVIF is based, are quite different from video, this adds a new context type, which currently only contains the data for the "primary item", typically an av01 coded image frame. Extracting the primary item requires support for several new box types which specify the location type of various data and metadata. Various features are not yet fully supported, but should provide informative errors. Since the specification does not make many guarantees about the ordering of boxes, there is some additional logic complexity in an attempt to avoid copying buffers when possible. Also, a set of AVIF sample images (see the ReadMe.txt for licensing) is added to exercise the new parsing code.

kinetiknz

Thanks! I'm still working on fully reviewing this, but here are my comments and questions so far:

Would it make sense to exclude mp4parse/tests/avif and use a git submodule of https://github.com/AOMediaCodec/av1-avif instead?
- If not, remove .DS_Store from the import please
New C API needs a new fuzzer added in mp4parse_capi/fuzz/fuzz_targets
I'm not sure about read_avif_meta diverging from read_meta and generally splitting the parser implementation too much to support AVIF; we should be aiming to share as much of the code as makes sense
I need to take a closer pass through the spec(s) to understand why we need the extent stuff

mp4parse_capi/src/lib.rs

mp4parse/src/boxes.rs

mp4parse_capi/src/lib.rs

mp4parse/src/lib.rs

kinetiknz

Last comment missed my comments on fallible allocation.

The existing code uses vec_push and allocate_read_buf. It might be simple to adapt to using those, but if it's going to be cleaner to extend the fallible allocation code there may be existing code inside Gecko we could consider sharing.

mp4parse/src/lib.rs

kinetiknz · 2020-02-25T05:57:03Z

mp4parse/src/lib.rs

+                // Store any remaining data for potential later extraction
+                if b.bytes_left() > 0 {
+                    let offset = b.offset();
+                    let mut data = Vec::with_capacity(b.bytes_left().try_into()?);


This should use allocate_read_buf and handle possible failure.

Why does allocate_read_buf explicitly zero the memory as opposed to just allocating the additional capacity? For the single caller, it seems to incur an unnecessary cost.

I have other Vec::with_capacity callers that aren't Vec<u8>, so I'll need to make some change here. What's your preference?

Modify allocate_read_buf to take a type parameter and not fill its capacity with zeroes

Leave allocate_read_buf as is, but add a new function like
fn vec_with_capacity<T>(capacity: usize) -> std::result::Result<Vec<T>, ()> { #[cfg(feature = "mp4parse_fallible")] { let mut v = Vec::new(); FallibleVec::try_reserve(&mut v, capacity)?; Ok(v) } #[cfg(not(feature = "mp4parse_fallible"))] { Ok(Vec::with_capacity(capacity)) } }

Something else
?

The zeroing is due to #172

Ooh, that's subtle. I'm going to add a comment back to the issue for posterity, if you don't mind.

Do you have any preference regarding whether or not to add vec_with_capacity versus modifying allocate_read_buf to handle types other than u8?

If I implement vec_with_capacity as described above, I don't think we need to worry about the issue in #172 since it doesn't provide any access to potentially uninitialized memory the way the old version of allocate_read_buf did. It gets around that by using Read::read_to_end which takes a &mut Vec<u8> buf rather than Read::read, which takes a &mut [u8] buf.

It occurs to me that we could eliminate the need for zeroing out the memory in allocate_read_buf (and for allocate_read_buf ifself) if we switched the code here from using read to read_to_end with take like so:

if let Ok(mut buf) = allocate_read_buf(size) { let r = src.read(&mut buf)?;

becomes

if let Ok(mut buf) = vec_with_capacity(size) { let r = src.take(size.to_u64()).read_to_end(&mut buf)?;

The only caveat I can see is that if the Vec passed to read_to_end didn't already have sufficient capacity, it could end up doing infallible allocation. But that shouldn't be an issue as long as we use vec_with_capacity.

Thoughts?

Ooh, that's subtle. I'm going to add a comment back to the issue for posterity, if you don't mind.

Good idea, thanks.

Do you have any preference regarding whether or not to add vec_with_capacity versus modifying allocate_read_buf to handle types other than u8?

I think we only need one, so making it handle other simple types makes sense. I don't mind which one we keep but vec_with_capacity is probably a better name.

If I implement vec_with_capacity as described above, I don't think we need to worry about the issue in #172 since it doesn't provide any access to potentially uninitialized memory the way the old version of allocate_read_buf did. It gets around that by using Read::read_to_end which takes a &mut Vec<u8> buf rather than Read::read, which takes a &mut [u8] buf.

I think you still have the same issue, in that you can't pass a Vec with an uninitialized backing store to Read::read_to_end because the trait doesn't promise implementations won't read from the &mut Vec before writing to it. Although the docs only include a warning for Read::read, it's actually a restriction on the entire trait. Once Read::initializer is stable, we can use that to signal that our implementation of Read doesn't read from the buffers passed in.

Maybe I'm not understanding #172 correctly. Can you let me know where my logic is flawed?

As I understand it, the unsoundness came from this line in the old version of allocate_read_buf:

unsafe { buf.set_len(size); }

because it violated the safety requirement of Vec::set_len():

The elements at old_len..new_len must be initialized.

And as mentioned in Read::read:

It is your responsibility to make sure that buf is initialized before calling read. Calling read with an uninitialized buf (of the kind one obtains via MaybeUninit<T>) is not safe, and can lead to undefined behavior.

So, because allocate_read_buf called set_len without initializing the additional capacity, it was uninitialized and the call to Read::read was unsound. More generally, allocate_read_buf returned a vector that was invalid because uninitialized memory was now accessible via the safe interface.

Hovever, I don't see how the vec_with_capacity case has the same issue. There's no unsafe code in its implementation, so how does it introduce any uninitialized memory into the memory of the vector's elements? The only potentially uninitialized memory should be in the internal buffer beyond the valid length of the vector.

Take this example code:

let mut reader: &[u8] = b"1234567890"; let mut buf = Vec::with_capacity(5); reader.read(&mut buf).unwrap();

After the first call to reader.read, buf will be empy. Nothing will be read into it because despite having a capacity of 10, it has a length of 0. Contrast that with identical code except replacing read with read_to_end, and the result is that buf contains all 10 elements.

I think the difference is easy to miss because in both cases we pass &mut buf, but in the read call, that is auto derefed to a &mut [u8] slice, and in the read_to_end case, it remains a &mut Vec<u8>. The callee can't change the length of the prior, but it can change the length of the latter, which is why Read::read_to_end's documentation describes it as "append[ing] data to buf, whereas Read::read's says it "pull[s] some bytes from this source into the specified buffer".

Unless I'm missing something, we have the opportunity to avoid the cost of zeroing out the buffer (as with the earlier version of allocate_read_buf), but eliminating the potential for unsoundness. Does that seem worth it?

I'm confusing things by thinking only about the allocate_read_buf implementation. I agree with your comment (thanks for the detailed reasoning!), so replacing allocate_read_buf + read with vec_with_capacity + read_to_end allows us to eliminate the extra initialization safely.

I ended up taking a slightly different approach from what I suggested in #193 (comment) since I realized I could simplify things by making our new read_to_end with fallible allocation support take care of doing all the allocation.

The heart of the change is here. I ended up splitting out some other cleanup-type changes into the previous commit.

@kinetiknz, @SingingTree: could you please give this one more look before I merge?

mp4parse/src/lib.rs

SingingTree · 2020-02-25T22:28:55Z

mp4parse/src/lib.rs

+        // which has no `construction_method` field. It does say:
+        // "For maximum compatibility, version 0 of this box should be used in preference to
+        //  version 1 with `construction_method==0`, or version 2 when possible."
+        // We take this to imply version 0 can be interpreted as using file offsets.


That is my reading too, and I've been trying to figure out if there's a rule somewhere the makes this explicit. For example if there is a blanket rule with the ISOBMFF where a field that does not exist in for a given version of a box is treated as 0, which would be annoying in it's own way, but would at least give explicit clarity to this case :|

baumanj · 2020-02-26T05:54:23Z

Thanks for all the feedback, @kinetiknz! I'll be addressing all the issues and update the PR soon. To reply to some of the higher level comments:

Would it make sense to exclude mp4parse/tests/avif and use a git submodule of https://github.com/AOMediaCodec/av1-avif instead?

Ah, good idea. I'll look into that.

New C API needs a new fuzzer added in mp4parse_capi/fuzz/fuzz_targets

Will do.

I'm not sure about read_avif_meta diverging from read_meta and generally splitting the parser implementation too much to support AVIF; we should be aiming to share as much of the code as makes sense

Yeah, I was hoping to share more code as well, but I didn't see a way that made sense. Since HEIF so so different from video, nothing in MediaContext would've been useful to AVIF. I suppose we could make dramatic changes to read_mp4 and read_meta so that they're shared, but conceptually they have nothing really in common, so it didn't seem like a win in terms of complexity of maintainability. Also, it seemed especially disadvantageous to perturb code that is working well for the video case in order to add functionality that seems basically orthogonal.

If you see a different approach that would allow more code to be shared or reduce complexity, I'm all for it.

I need to take a closer pass through the spec(s) to understand why we need the extent stuff

The iloc box is definitely at the heart of it, but it was probably the part of the spec (of what I needed to implement) that I found the most confusing.

Consequently, remove previously added test files from this repo.

Also, update mp4.dict with the new box types added for AVIF.

kinetiknz · 2020-02-27T10:12:23Z

Yeah, I was hoping to share more code as well, but I didn't see a way that made sense. Since HEIF so so different from video, nothing in MediaContext would've been useful to AVIF. I suppose we could make dramatic changes to read_mp4 and read_meta so that they're shared, but conceptually they have nothing really in common, so it didn't seem like a win in terms of complexity of maintainability. Also, it seemed especially disadvantageous to perturb code that is working well for the video case in order to add functionality that seems basically orthogonal.

Makes sense to me, thanks. We can always consider merging/sharing code later, if it makes sense, as the AVIF support grows.

- Rename ProtectionSchemeInformationBox -> ProtectionSchemeInfoBox to match naming in syntax section of ISO spec. - Remove unused MetadataItemInformationBox (itif). - Rename Mdat -> MediaDataBox. - Add vec_with_capacity, extend_from_slice and read_to_end to support those operations with fallible allocation. - Switch Vec::push calls to use vec_push for fallible allocation. - Rename u32_to_string -> be_u32_to_string - Fix typos, add TODOs with linked issues

baumanj · 2020-02-28T03:16:00Z

This will be busted until mozilla/mp4parse_fallible#6 is merged (presumably requiring a new version and so one final update here), but most of the other issues should be addressed, so I thought it was worth getting the review cycle started again.

baumanj · 2020-02-29T04:09:55Z

I note that the CI didn't fail, which leads me to believe that it doesn't test building with --features mp4parse_fallible. I'll try adding a change to this PR to rectify that.

kinetiknz

LGTM, thanks!

We'll need to do a small rebase to benefit from the switch to a git submodule for the test files.

mp4parse/tests/public.rs

mp4parse/src/lib.rs

kinetiknz · 2020-03-02T23:54:57Z

I see the CI with --all-features is failing to build afl.rs. Rather than any spend time fixing it, I think we can remove anything related to afl.rs fuzzing now, since we're using the cargo fuzz infrastructure under mp4parse_capi/fuzz, making the older fuzzing support obsolete.

All fuzzing is now handled by libfuzzer_sys

baumanj · 2020-03-03T01:36:19Z

I think the issue is that abort_on_panic was missing from mp4parse_capi/Cargo.toml. However, I do like deleting code, so I'll try removing all the afl stuff and see if CI passes.

baumanj · 2020-03-03T01:58:51Z

Well, that seemed to solve the afl.rs issue, but now travis Windows jobs with --all-features are failing in a non-obvious way. Any ideas, @kinetiknz? ~~Some quick googling makes me think we could be running into rust-lang/rust#39487 (comment).~~

After looking more closely, I see the error is

  process didn't exit successfully: `C:\Users\travis\build\mozilla\mp4parse-rust\target\debug\deps\mp4parse-3a88e66b5ef720dc.exe` (exit code: 0xc0000374, STATUS_HEAP_CORRUPTION)

Which may actually mean we've caught something. It seems to consistently occur after tests::max_table_limit, so I'll take a look there.

We want the 4 combinations of --release and --all-features being enabled, not 2 jobs with neither and one job with just one of each.

kinetiknz · 2020-03-03T02:52:47Z

The tests are crashing with STATUS_HEAP_CORRUPTION. This is because it's not safe to build with mp4parse_fallible enabled unless it's also guaranteed that the system allocator matches the Rust allocator (which is guaranteed within Gecko). That's briefly mentioned here.

Since Rust dropped jemalloc, we're getting lucky on the other Travis platforms since the system and Rust allocators happen to match. Windows has different allocator rules, so we hit this issue.

I think it's fine to run tests without mp4parse_fallible built on Travis, since we still get coverage via Gecko's testsuite.

baumanj · 2020-03-03T03:21:12Z

I think it's fine to run tests without mp4parse_fallible built on Travis, since we still get coverage via Gecko's testsuite.

Ah, that makes sense. Thanks for the explanation. In the spirit of getting a little bit of early warning, how about just running cargo check --all --tests --all-features and only running cargo test without mp4parse_fallible?

baumanj · 2020-03-03T20:27:10Z

This is kind of odd. Investigating the CI failures following adding cargo check, I can reproduce this consistently locally (macOS):

$ cargo clean && cargo check && cargo test --all
[…]
test build_ffi_test ... FAILED

failures:

---- build_ffi_test stdout ----
status: exit code: 2
--- stdout ---
rm -f test
rm -f *.a.out
rm -f *.o *.a
rm -f *.d
rustc -g --crate-type staticlib --crate-name mp4parse \
	  --emit dep-info,link=libmp4parse.a \
	  --print native-static-libs \
	  -L /Users/jbauman/src/mp4parse-rust_m-u-current/target/debug/build/mp4parse_capi-45f37232212af4dc/out/../../../deps ../src/lib.rs \
	  2> libmp4parse.a.out || cat libmp4parse.a.out >&2
c++ -g -Wall -std=c++11 -I../include/ -c test.cc
c++ -g -Wall -std=c++11 -I../include/ -o test *.o libmp4parse.a 

-- stderr ---
error[E0464]: multiple matching crates for `mp4parse`
  --> ../src/lib.rs:37:1
   |
37 | extern crate mp4parse;
   | ^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: candidates:
           crate `mp4parse`: /Users/jbauman/src/mp4parse-rust_m-u-current/target/debug/deps/libmp4parse-d0bac2999dfaf13d.rlib

error[E0463]: can't find crate for `mp4parse`
  --> ../src/lib.rs:37:1
   |
37 | extern crate mp4parse;
   | ^^^^^^^^^^^^^^^^^^^^^^ can't find crate

error: aborting due to 2 previous errors

For more information about this error, try `rustc --explain E0463`.
clang: error: no such file or directory: 'libmp4parse.a'
make: *** [test] Error 1

thread 'build_ffi_test' panicked at 'assertion failed: output.status.success()', mp4parse_capi/tests/build_ffi_test.rs:19:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

however both

$ cargo clean && cargo test --all && cargo test --all

$ cargo clean && cargo build && cargo test --all

work fine

So, it's not about features, or about running things multiple times, but something that cargo check is doing is causing that failure. I'll investigate.

kinetiknz · 2020-03-03T21:05:40Z

build_ffi_test is pretty wacky, sorry. It's trying to link the built mp4parse, so it relies on output from the build phase. The test phase produces a different path/binary, so the makefile fails to find the input. (Re)writing it now, it'd make more sense to use cc-rs instead of shelling out to make, and there's probably a better way to find the path of the link target than hardcoding the makefile.

It'd be great to fix it but we can fix it independently from your PR, so don't let it block your progress on merging if you'd prefer to focus on that first.

baumanj · 2020-03-03T21:28:46Z

build_ffi_test is pretty wacky, sorry. It's trying to link the built mp4parse, so it relies on output from the build phase.

Requiring output from the build phase isn't that surprising to me, but what I'm not really grokking is why adding in a cargo check beforehand would break things. Even cargo check && cargo build && cargo test fails, so it's not the absence of cargo build that's causing the failure.

In any case, since this is definitely a detour relative to the purpose of this PR, and you seem to be more familiar with the code in question, I'll leave it alone. In the meantime, I think a good workaround would be putting the cargo check --all --verbose $RELEASE --tests --all-features after the cargo test in CI. That should hopefully still get us some checking of mp4parse_fallible-feature code.

I'll try that change shortly. And if that doesn't work, I'll just give up on changing the CI to include feature checking for now.

This simplifies a number of the callers and avoids try_into() calls. Also, simplify the bound on T from ReadBytesExt to Read, which is sufficient.

Since read_to_end() takes a &mut Vec buf rather than a &mut slice, it can increase the size (i.e., length, not capacity) of the buf during operation. This removes the soundness requirement of pre-initializing the full capacity that occurred in allocate_read_buf(), thereby improving efficiency. As this was allocate_read_buf()'s only caller, remove it as well.

kinetiknz

LGTM!

baumanj added the enhancement label Feb 21, 2020

baumanj requested review from kinetiknz and SingingTree February 21, 2020 03:59

baumanj self-assigned this Feb 21, 2020

kinetiknz mentioned this pull request Feb 25, 2020

Reformat the code using rustfmt #194

Closed

kinetiknz reviewed Feb 25, 2020

View reviewed changes

SingingTree reviewed Feb 25, 2020

View reviewed changes

baumanj added 3 commits February 25, 2020 22:36

Add AVIF spec repro as a submodule to include test files

e98988e

Consequently, remove previously added test files from this repo.

Fir "unnecessary parentheses around block return value" warnings

51f87d4

Add 'avif' fuzz target for AVIF-specific parts of the C API

a7f1177

Also, update mp4.dict with the new box types added for AVIF.

baumanj mentioned this pull request Feb 27, 2020

Add support methods for AVIF changes to parser mozilla/mp4parse_fallible#6

Merged

Run CI tests with and without all features

5fc2d60

kinetiknz approved these changes Mar 2, 2020

View reviewed changes

SingingTree reviewed Mar 2, 2020

View reviewed changes

mp4parse/tests/public.rs Show resolved Hide resolved

SingingTree reviewed Mar 2, 2020

View reviewed changes

mp4parse/src/lib.rs Outdated Show resolved Hide resolved

SingingTree reviewed Mar 2, 2020

View reviewed changes

mp4parse/src/lib.rs Outdated Show resolved Hide resolved

SingingTree approved these changes Mar 2, 2020

View reviewed changes

Update mp4parse_fallible dep for FallibleVec::try_extend_from_slice

5de20f7

Remove afl fuzzing code

1666db7

All fuzzing is now handled by libfuzzer_sys

Fix travis config to properly combine env var values

cd8b3d7

We want the 4 combinations of --release and --all-features being enabled, not 2 jobs with neither and one job with just one of each.

At least run cargo check for all features in CI

4cba7f5

baumanj mentioned this pull request Mar 3, 2020

cargo check breaks build_ffi_test #197

Closed

baumanj added 5 commits March 3, 2020 13:45

Work around test failure by running cargo check after cargo test

6c13760

Fix typo

317c6c4

Implement PR feedback for testing and error handling

733eba8

Change read_buf()'s size arg from usize -> u64

ba1b7c9

This simplifies a number of the callers and avoids try_into() calls. Also, simplify the bound on T from ReadBytesExt to Read, which is sufficient.

baumanj requested review from kinetiknz and SingingTree March 4, 2020 01:19

kinetiknz approved these changes Mar 4, 2020

View reviewed changes

SingingTree approved these changes Mar 4, 2020

View reviewed changes

baumanj merged commit da2cb93 into master Mar 4, 2020

baumanj deleted the avif-pitm branch March 4, 2020 21:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add basic AVIF parsing support #193

Add basic AVIF parsing support #193

baumanj commented Feb 21, 2020 •

edited

Loading

kinetiknz left a comment

kinetiknz left a comment •

edited

Loading

kinetiknz Feb 25, 2020

baumanj Feb 28, 2020

kinetiknz Mar 2, 2020

baumanj Mar 3, 2020

kinetiknz Mar 3, 2020

baumanj Mar 3, 2020

kinetiknz Mar 3, 2020 •

edited

Loading

baumanj Mar 4, 2020

SingingTree Feb 25, 2020

baumanj commented Feb 26, 2020

kinetiknz commented Feb 27, 2020

baumanj commented Feb 28, 2020

baumanj commented Feb 29, 2020 •

edited

Loading

kinetiknz left a comment

kinetiknz commented Mar 2, 2020

baumanj commented Mar 3, 2020

baumanj commented Mar 3, 2020 •

edited

Loading

kinetiknz commented Mar 3, 2020

baumanj commented Mar 3, 2020

baumanj commented Mar 3, 2020

kinetiknz commented Mar 3, 2020

baumanj commented Mar 3, 2020

kinetiknz left a comment

Add basic AVIF parsing support #193

Add basic AVIF parsing support #193

Conversation

baumanj commented Feb 21, 2020 • edited Loading

kinetiknz left a comment

Choose a reason for hiding this comment

kinetiknz left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kinetiknz Mar 3, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

baumanj commented Feb 26, 2020

kinetiknz commented Feb 27, 2020

baumanj commented Feb 28, 2020

baumanj commented Feb 29, 2020 • edited Loading

kinetiknz left a comment

Choose a reason for hiding this comment

kinetiknz commented Mar 2, 2020

baumanj commented Mar 3, 2020

baumanj commented Mar 3, 2020 • edited Loading

kinetiknz commented Mar 3, 2020

baumanj commented Mar 3, 2020

baumanj commented Mar 3, 2020

kinetiknz commented Mar 3, 2020

baumanj commented Mar 3, 2020

kinetiknz left a comment

Choose a reason for hiding this comment

baumanj commented Feb 21, 2020 •

edited

Loading

kinetiknz left a comment •

edited

Loading

kinetiknz Mar 3, 2020 •

edited

Loading

baumanj commented Feb 29, 2020 •

edited

Loading

baumanj commented Mar 3, 2020 •

edited

Loading