
Added progressive refinement example files #131

Merged: 9 commits, May 17, 2021


leo-barnes
Collaborator

I added some examples of files that contain multiple layers as well as the lsel and a1lx properties. I could not figure out how to generate streams containing multiple operating points.

@joedrago

It might be better to have @cconcolato review these -- I haven't personally attempted any kind of progressive implementation at all yet, so I'd be coming into this somewhat blind.

@negge
Collaborator

negge commented Mar 11, 2021

Thanks for the files @leo-barnes. I had a look at animals_00_multilayer.avif and it appears the spatial scalability layers are not being set correctly.

Using MP4Box to dump the second item and passing it through dump_obu, I get:

$ ~/git/gpac/bin/gcc/MP4Box -dump-item 2 testFiles/Apple/multilayer_examples/animals_00_multilayer.avif
$ ~/git/libaom/aom_build/tools/dump_obu item_id02
Temporal unit 0
  OBU type:        OBU_TEMPORAL_DELIMITER
      extension:   no
      length:      2
  OBU type:        OBU_SEQUENCE_HEADER
      extension:   no
      length:      16
  OBU type:        OBU_FRAME
      extension:   no
      length:      88543
  OBU type:        OBU_FRAME
      extension:   yes
      temporal_id: 0
      spatial_id:  0
      length:      1918772
  TU size: 2007333
  OBU overhead:    5
File total OBU overhead: 5

As said above, this second layer is 2048x1536 and should have spatial_id of 1. Also, is the base layer of 1024x768 missing a temporal_id and spatial_id entry?
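A quick way to cross-check what dump_obu reports is to decode the OBU header bytes directly. The sketch below follows the obu_header() and obu_extension_header() syntax of the AV1 specification; `parse_obu_header` and `OBU_TYPES` are hypothetical helper names, not part of dump_obu or libaom. Reporting temporal_id and spatial_id as 0 when the extension byte is absent is a choice of this sketch.

```python
# obu_header() byte layout (AV1 spec):
#   forbidden_bit(1) | obu_type(4) | obu_extension_flag(1) | obu_has_size_field(1) | reserved(1)
# obu_extension_header(), present only if obu_extension_flag is set:
#   temporal_id(3) | spatial_id(2) | reserved(3)
OBU_TYPES = {
    1: "OBU_SEQUENCE_HEADER",
    2: "OBU_TEMPORAL_DELIMITER",
    3: "OBU_FRAME_HEADER",
    4: "OBU_TILE_GROUP",
    5: "OBU_METADATA",
    6: "OBU_FRAME",
    7: "OBU_REDUNDANT_FRAME_HEADER",
    8: "OBU_TILE_LIST",
    15: "OBU_PADDING",
}


def parse_obu_header(data: bytes) -> dict:
    """Decode the 1- or 2-byte OBU header at the start of `data`."""
    b0 = data[0]
    if b0 & 0x80:
        raise ValueError("obu_forbidden_bit is set")
    header = {
        "type": OBU_TYPES.get((b0 >> 3) & 0x0F, "reserved"),
        "has_extension": bool(b0 & 0x04),
        "has_size_field": bool(b0 & 0x02),
        "temporal_id": 0,  # sketch's default when there is no extension byte
        "spatial_id": 0,
        "header_size": 1,
    }
    if header["has_extension"]:
        b1 = data[1]
        header["temporal_id"] = (b1 >> 5) & 0x07
        header["spatial_id"] = (b1 >> 3) & 0x03
        header["header_size"] = 2
    return header
```

For example, a frame OBU with spatial_id 1 starts with the bytes 0x36 0x08, while a temporal delimiter starts with 0x12.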

@negge
Collaborator

negge commented Mar 12, 2021

> As said above, this second layer is 2048x1536 and should have spatial_id of 1.

This appears to be a bug in dump_obu. I have filed an issue against libaom here: https://bugs.chromium.org/p/aomedia/issues/detail?id=2992

@leo-barnes
Collaborator Author

leo-barnes commented Mar 12, 2021

> As said above, this second layer is 2048x1536 and should have spatial_id of 1.
>
> This appears to be a bug in dump_obu. I have filed an issue against libaom here: https://bugs.chromium.org/p/aomedia/issues/detail?id=2992

Yeah, there's a bug in libaom as well regarding which spatial_id you get out when decoding. At the moment you always get the spatial_id of the final layer, even though the dimensions clearly match what I would expect of the base layer. @wantehchang confirmed by code inspection that it seems to be a bug and sent me a patch that I'm going to try today.

Here's the issue I filed about it:
https://bugs.chromium.org/p/aomedia/issues/detail?id=2993

@leo-barnes
Collaborator Author

Turns out libaom was automatically creating multiple operating points when creating the streams. I haven't found any OBU parser that can actually output what's inside the OBUs for me, and oddly enough the libaom decoder doesn't seem to give me any way to query the number of operating points in the stream either (https://bugs.chromium.org/p/aomedia/issues/detail?id=2995).

I can verify that there are multiple operating points by setting AV1D_SET_OPERATING_POINT when decoding, so I'll create some files that use the new operating point selector property as well.
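For reference, each operating point in the sequence header carries a 12-bit operating_point_idc mask: bits 0-7 flag the temporal layers and bits 8-11 the spatial layers it contains. The check below mirrors the OBU drop rule in the AV1 specification's general OBU syntax. It is a minimal Python sketch; `keep_obu` is a hypothetical helper, and in libaom the selection is made through AV1D_SET_OPERATING_POINT rather than a function like this.

```python
OBU_SEQUENCE_HEADER = 1
OBU_TEMPORAL_DELIMITER = 2


def keep_obu(operating_point_idc: int, obu_type: int, has_extension: bool,
             temporal_id: int = 0, spatial_id: int = 0) -> bool:
    """Return True if the OBU should be decoded for the chosen operating point."""
    # Sequence headers and temporal delimiters are never dropped.
    if obu_type in (OBU_SEQUENCE_HEADER, OBU_TEMPORAL_DELIMITER):
        return True
    # idc == 0 disables layer-based dropping; OBUs without an extension
    # header carry no layer IDs, so they are kept as well.
    if operating_point_idc == 0 or not has_extension:
        return True
    in_temporal_layer = (operating_point_idc >> temporal_id) & 1
    in_spatial_layer = (operating_point_idc >> (spatial_id + 8)) & 1
    return bool(in_temporal_layer and in_spatial_layer)
```

With operating_point_idc 0x101 only spatial_id 0 is decoded; 0x301 also includes spatial_id 1.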

@leo-barnes
Collaborator Author

I updated libaom to v2.0.2 per Wan-Teh's suggestion (I was using v1.0.something), so the encoded data is now slightly different.

@tdaede
Contributor

tdaede commented Mar 15, 2021

I am unclear about the usage of a1lx here. The a1lx-containing and a1lx-missing files differ in that the a1lx example contains one Item, whereas the a1lx-less contains two (or more) Items. However, I am not sure that is enough for progressive decoding of a MIAF image - I would expect there still to be two or more Items regardless of the presence or absence of a1lx.

@leo-barnes
Collaborator Author

leo-barnes commented Mar 16, 2021

> I am unclear about the usage of a1lx here. The a1lx-containing and a1lx-missing files differ in that the a1lx example contains one Item, whereas the a1lx-less contains two (or more) Items. However, I am not sure that is enough for progressive decoding of a MIAF image - I would expect there still to be two or more Items regardless of the presence or absence of a1lx.

Good question! We may want to add some more wording to the spec with some examples of how we think files should look. As a summary of the last couple of meetings, the idea is as follows:

  1. Use lsel if you want an item to decode a specific layer. The decoder shall output this layer and no other layer. In other words, no progressive refinement is allowed. The use case we had in mind was something like multiple viewpoints sharing a single base layer.
  2. If lsel is not specified, decode the highest layer, but intermediate layers may be displayed for progressive refinement.
  3. Use a1op to select the operating point. lsel may be combined with a1op to specify both the operating point and the layer. If lsel is not specified, follow point 2 above.
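Restated as code, the three rules amount to a small decision function. This is a hypothetical Python sketch of the intent described above, not text from the AVIF specification; `plan_output` and `OutputPlan` are made-up names, and defaulting to operating point 0 when a1op is absent is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OutputPlan:
    operating_point: int          # operating point handed to the decoder
    output_layer: Optional[int]   # None means "the highest layer"
    progressive_allowed: bool     # may intermediate layers be displayed?


def plan_output(a1op: Optional[int], lsel: Optional[int]) -> OutputPlan:
    # Rule 3: a1op picks the operating point (assumed default: 0).
    op = 0 if a1op is None else a1op
    if lsel is not None:
        # Rule 1: lsel pins a single layer; nothing else is output.
        return OutputPlan(op, lsel, progressive_allowed=False)
    # Rule 2: no lsel, so target the highest layer and allow refinement.
    return OutputPlan(op, None, progressive_allowed=True)
```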

Creating multiple items that select different layers and adding them to an altr group was discussed, but the conclusion was that it ends up being pretty complicated (and altr groups are not very well supported in HEIF right now). It's also hard or impossible for a decoder to know whether it can reuse state from decoding an earlier item, which makes decoding inefficient. The conclusion was that if you want to do progressive refinement, you should not create multiple items or specify lsel.

Next question is why we have a1lx. The main use case is progressive refinement while downloading. But how do you know when you have enough bits to send to the decoder? You can "poll" by periodically sending more data to the decoder and hoping you get a frame back, but that is pretty inefficient. The AVIF writer can store the layers in separate extents, but that is ambiguous, since extents may have been split for some other reason, like interleaving chunks between multiple tiles. So a1lx was added as an explicit way of letting the parser know when it has a full layer that can be sent to the decoder.
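To make the a1lx idea concrete: given the layer_size values and the total item size, a parser can compute how many layers are fully available from the bytes downloaded so far, instead of polling the decoder. A minimal Python sketch, assuming the convention that the listed sizes cover every layer except the last (whose size is whatever remains of the item) and that unused entries are 0; `complete_layers` is a hypothetical name.

```python
def complete_layers(layer_sizes, item_size: int, bytes_received: int) -> int:
    """Count the leading layers whose bytes have fully arrived."""
    layer_ends = []
    end = 0
    for size in layer_sizes:
        if size == 0:  # 0 marks the end of the listed sizes
            break
        end += size
        layer_ends.append(end)
    layer_ends.append(item_size)  # the last layer runs to the end of the item
    return sum(1 for layer_end in layer_ends if layer_end <= bytes_received)
```

Whenever the count increases, the parser can hand the bytes of the completed layers to the decoder.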

There is nothing stopping you from using a1lx with a file containing multiple items with lsel sharing the same data, but it's not really why it was added.

@cconcolato
Collaborator

I had a quick look at the files. They look good to me. In the multi-item image, I was able to extract the items separately, repackage them separately and compare the 2 qualities.

One thing I noted at least in animals_00_singlelayer.avif with MP4Box.js FileReader is that the mdat box starts at offset 295, and its payload starts at offset 303 (4 bytes for the length, 4 bytes for the type), while the extent offset starts at 311. I'm curious what these additional 8 bytes are. Any idea?

> I haven't found any OBU parser that can actually output what's inside the OBUs for me

I usually use MP4Box (command line) as follows:

MP4Box -dump-item 1:path=file.obu file.avif   # 1 is the ID of the item you want to extract
MP4Box -add file.obu file.mp4                 # imports the item data as a track
MP4Box -dnal 1 file.mp4                       # dumps the OBU structure; 1 is the ID of the track created by the previous call

This produces an XML structure that looks like:

<OBUTrack trackID="1" SampleCount="1" TimeScale="25000">
 <OBUConfig>
   <OBU size="16" type="seq_header" header_size="2" has_size_field="1" has_ext="0" temporalID="0" spatialID="0" width="2048" height="1536" bit_depth="8" still_picture="0" OperatingPointIdc="0" color_range="1" color_description_present_flag="0" color_primaries="2" transfer_characteristics="2" matrix_coefficients="2" profile="0" level="12" />
 </OBUConfig>
 <OBUSamples>
  <Sample number="1" DTS="0" CTS="0" size="1999503" RAP="1" >
   <OBU size="16" type="seq_header" header_size="2" has_size_field="1" has_ext="0" temporalID="0" spatialID="0" width="2048" height="1536" bit_depth="8" still_picture="0" OperatingPointIdc="0" color_range="1" color_description_present_flag="0" color_primaries="2" transfer_characteristics="2" matrix_coefficients="2" profile="0" level="12" />
   <OBU size="122320" type="frame" header_size="4" has_size_field="1" has_ext="0" temporalID="0" spatialID="0" uncompressed_header_bytes="29" frame_type="key" refresh_frame_flags="255" show_frame="1" show_existing_frame="0" nb_tiles="1" >
     <Tile number="0" start="33" size="122287"/>
   </OBU>
   <OBU size="1877167" type="frame" header_size="5" has_size_field="1" has_ext="1" temporalID="0" spatialID="1" uncompressed_header_bytes="17" frame_type="inter" refresh_frame_flags="1" show_frame="1" show_existing_frame="0" nb_tiles="1" >
     <Tile number="0" start="22" size="1877145"/>
   </OBU>
  </Sample>
 </OBUSamples>
</OBUTrack>

@tdaede
Contributor

tdaede commented Mar 16, 2021

Nit: it looks like all the files start with a temporal delimiter OBU, but those are a SHOULD NOT in the AV1 ISOBMFF binding spec.

@cconcolato
Collaborator

@tdaede I also thought it would be a Temporal Delimiter but it does not seem to be one. I've tried patching the offsets in the file and extracting the whole thing but MP4Box then fails with:

[AV1] computed OBU size -1 (input value = 0). Skipping.

@tdaede
Contributor

tdaede commented Mar 17, 2021

I was able to see it via:

$ MP4Box -dump-item 1 animals_00_multilayer.avif
$ dump_obu item_id01 
Temporal unit 0
  OBU type:        OBU_TEMPORAL_DELIMITER
      extension:   no
      length:      2
  OBU type:        OBU_SEQUENCE_HEADER
      extension:   no
      length:      16
  OBU type:        OBU_FRAME
      extension:   no
      length:      122320
  TU size: 122338
  OBU overhead:    3
File total OBU overhead: 3

I also double checked that 0x12 0x00 appears in the .avif, in case MP4Box was cleverly adding a TU on export.

@cconcolato
Collaborator

@tdaede I had not realized the TDs were there, but in my experiment I was talking about the 8 bytes before the TD. The offset indicated in the extent points to the start of the TD. I still don't know what the bytes before it are.

@leo-barnes can you double-check in the tool that generated the container what those bytes are?

@leo-barnes
Collaborator Author

leo-barnes commented Mar 17, 2021

@cconcolato

> One thing I noted at least in animals_00_singlelayer.avif with MP4Box.js FileReader is that the mdat box starts at offset 295, and its payload starts at offset 303 (4 bytes for the length, 4 bytes for the type), while the extent offset starts at 311. I'm curious what these additional 8 bytes are. Any idea?

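These files use the 64-bit (largesize) form of the box header for mdat. In other words:

unsigned int(32) size = 1;            // signals that a 64-bit largesize follows
unsigned int(32) type = 'mdat';
unsigned int(64) largesize = <actual box size>;

So the mdat header is 16 bytes rather than 8, and its payload starts at 295 + 16 = 311, which matches the extent offset.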
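A box reader therefore has to check for the size == 1 escape before deciding where the payload begins. Below is a minimal Python sketch (the `read_box_header` helper is hypothetical); it handles the 32-bit size, 64-bit largesize and size == 0 ("extends to end of file") cases from ISO/IEC 14496-12, but not 'uuid' extended types.

```python
import struct


def read_box_header(buf: bytes, offset: int):
    """Return (box_type, box_size, payload_offset) for the box at `offset`."""
    size, box_type = struct.unpack_from(">I4s", buf, offset)
    header_len = 8
    if size == 1:
        # 64-bit largesize immediately follows the 4CC.
        (size,) = struct.unpack_from(">Q", buf, offset + 8)
        header_len = 16
    elif size == 0:
        # The box extends to the end of the file.
        size = len(buf) - offset
    return box_type.decode("ascii"), size, offset + header_len
```

Applied to an mdat at offset 295 written with largesize, this returns a payload offset of 311.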

@tdaede

> I also double checked that 0x12 0x00 appears in the .avif, in case MP4Box was cleverly adding a TU on export.

Any idea how I configure libaom to not output the TU OBU? I also noticed it but couldn't figure out how to get it to stop. I can of course manually strip it, but there should ideally be some way of configuring it I hope.

@tdaede
Copy link
Contributor

tdaede commented Mar 17, 2021

> Any idea how I configure libaom to not output the TU OBU? I also noticed it but couldn't figure out how to get it to stop. I can of course manually strip it, but there should ideally be some way of configuring it I hope.

Unfortunately no, it has to be manually stripped by the packager.
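Stripping them is mostly a matter of walking the size-delimited OBUs. A minimal Python sketch with hypothetical helper names, assuming (as the AV1 ISOBMFF binding describes) that every OBU except possibly the last carries an obu_size field; an OBU without one is treated as filling the rest of the data. After stripping, the packager would also need to shrink any iloc extent lengths (and a1lx layer sizes) that covered the removed bytes.

```python
OBU_TEMPORAL_DELIMITER = 2


def read_leb128(data: bytes, pos: int):
    """Decode an unsigned LEB128 value; return (value, position after it)."""
    value = 0
    for i in range(8):  # AV1 caps leb128() at 8 bytes
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return value, pos
    raise ValueError("leb128 value longer than 8 bytes")


def strip_temporal_delimiters(data: bytes) -> bytes:
    """Return the OBU sequence with all temporal delimiter OBUs removed."""
    out = bytearray()
    pos = 0
    while pos < len(data):
        start = pos
        header = data[pos]
        obu_type = (header >> 3) & 0x0F
        has_extension = bool(header & 0x04)
        has_size_field = bool(header & 0x02)
        pos += 2 if has_extension else 1
        if has_size_field:
            obu_size, pos = read_leb128(data, pos)
            pos += obu_size
        else:
            pos = len(data)  # an OBU without obu_size fills the rest of the data
        if obu_type != OBU_TEMPORAL_DELIMITER:
            out += data[start:pos]
    return bytes(out)
```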

@leo-barnes
Collaborator Author

As far as I can see, I've now removed the temporal delimiter from all the files.

I have also added two grid examples. One uses lsel to create two grids, and the other uses a1lx and should support progressive refinement. Both have the layers from all tiles interleaved.
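One way to read "layers from all tiles interleaved" is a layer-major layout: layer 0 of every tile is stored first, then layer 1 of every tile, so a partially downloaded file can already show the whole grid at base quality. The Python sketch below computes the byte range of each (tile, layer) chunk under that assumption; the layout and the `layer_major_layout` name are illustrative and not taken from the scripts used to build the files.

```python
def layer_major_layout(tile_layer_sizes):
    """Map (tile, layer) -> (offset, size), storing all tiles' layer 0 before any layer 1."""
    layout = {}
    offset = 0
    num_layers = max(len(sizes) for sizes in tile_layer_sizes)
    for layer in range(num_layers):
        for tile, sizes in enumerate(tile_layer_sizes):
            if layer < len(sizes):
                layout[(tile, layer)] = (offset, sizes[layer])
                offset += sizes[layer]
    return layout
```

Under this layout each tile item would reference its data through one iloc extent per layer.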

I think the files are correct, but I would be very thankful if people could do some sanity checking on them in case I screwed something up in my scripts.

Also removed TU delimiter from animals_00_multilayer.avif
@leo-barnes leo-barnes force-pushed the u/lbarnes/multilayer_examples branch from 596185f to 8acb870 Compare April 27, 2021 09:01
@leo-barnes
Collaborator Author

I've now updated the a1lx files to the new box structure. animals_00_multilayer_grid_a1lx.avif has been changed so that one of the tiles, whose layer sizes are small enough to fit in 16 bits (under 64 KiB), uses the small-size a1lx.

I've also updated animals_00_multilayer.avif to get rid of the temporal delimiter.

@leo-barnes
Collaborator Author

I've tried running the files through ComplianceWarden and fixed an issue in the singlelayer file. All the multilayer files fail with the tool assertion from the linked ComplianceWarden issue above, so I can't tell if anything is wrong with them.

@leo-barnes
Collaborator Author

Ran through the multilayer files and fixed the av1C in all of them. Encountered two more issues with the warden:
gpac/ComplianceWarden#24
gpac/ComplianceWarden#25

@leo-barnes
Collaborator Author

@tdaede @negge
If you have some way of sanity-checking my files, that would be great. Thanks!

@tdaede
Contributor

tdaede commented May 6, 2021

I tested these with our most recent stack of patches on GPAC and noticed the following:

animals_00_multilayer_a1op.avif: This seems to still have the fullbox bytes at the beginning of the a1op box, it's 4 bytes larger than it should be.

animals_00_multilayer_a1lx.avif: The a1lx box seems to be encoding the large_size bitfield as 4 bytes rather than 1 byte.

@leo-barnes
Collaborator Author

@tdaede

> animals_00_multilayer_a1op.avif: This seems to still have the fullbox bytes at the beginning of the a1op box, it's 4 bytes larger than it should be.

Are you sure you're using the latest commit? The boxes are supposed to be 9 bytes in size, since they are plain boxes rather than FullBoxes. If I pass the file through my parser I see this:

      ('a1op' "Operating Point Selector Box", size = 9, offset = 283) {
        Operating point: 0
      }
      ('a1op' "Operating Point Selector Box", size = 9, offset = 292) {
        Operating point: 1
      }

If I do the same for testFiles/Xiph/quebec_3layer_op2.avif, I see this:

      ('a1op' "Operating Point Selector Box", size = 9, offset = 254) {
        Operating point: 2
      }

I see the same when looking at the actual bytes in a hex viewer.
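For comparison, here are the two byte layouts in question. As a plain Box, a1op is a 4-byte size, the 'a1op' 4CC and a single op_index byte (9 bytes); writing it as a FullBox would add version and flags, which is exactly the 4 extra bytes. A Python sketch with hypothetical helper names:

```python
import struct


def a1op_box(op_index: int) -> bytes:
    """'a1op' as a plain Box: size(32) + type(32) + op_index(8)."""
    payload = struct.pack(">B", op_index)
    return struct.pack(">I4s", 8 + len(payload), b"a1op") + payload


def a1op_as_fullbox(op_index: int) -> bytes:
    """Incorrect FullBox variant: version(8) + flags(24) precede op_index."""
    payload = struct.pack(">I", 0) + struct.pack(">B", op_index)
    return struct.pack(">I4s", 8 + len(payload), b"a1op") + payload
```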

> animals_00_multilayer_a1lx.avif: The a1lx box seems to be encoding the large_size bitfield as 4 bytes rather than 1 byte.

My parser shows this:

      ('a1lx' "Layered Image Indexing Box", size = 21, offset = 242) {
        large_sizes: true
        Layer sizes: 122336 0 0
      }

I've looked at the bytes in my hex editor and they look correct. And the size is what I would expect:
4 bytes for size
4 bytes for the 'a1lx' type
1 byte for the reserved bits + large_size flag
4*3 bytes for the three layer_size elements (32 bits each, since large_size is set)
Total: 21 bytes
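The same arithmetic in code: the flag byte holds 7 reserved bits with large_size in its least significant bit, and each of the three layer_size fields is 16 bits, or 32 bits when large_size is 1. A Python sketch with a hypothetical `a1lx_box` helper that picks the small form whenever every size fits in 16 bits (as for the small grid tile mentioned earlier):

```python
import struct


def a1lx_box(layer_sizes) -> bytes:
    """Serialize 'a1lx' with three layer_size entries (unused ones are 0)."""
    if len(layer_sizes) > 3:
        raise ValueError("a1lx holds at most three layer sizes")
    sizes = list(layer_sizes) + [0] * (3 - len(layer_sizes))
    large_size = any(size > 0xFFFF for size in sizes)
    flag_byte = struct.pack(">B", 1 if large_size else 0)  # reserved(7) | large_size(1)
    fields = struct.pack(">3I" if large_size else ">3H", *sizes)
    payload = flag_byte + fields
    return struct.pack(">I4s", 8 + len(payload), b"a1lx") + payload
```

With large_size set the box is 21 bytes; the small form is 15 bytes (4 + 4 + 1 + 2*3).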

@tdaede
Contributor

tdaede commented May 6, 2021

Ah indeed my bad, I failed to set up a tracking branch correctly when I pulled your changes last time. I've updated them and now everything parses correctly here. Sorry about the false alarm!

@leo-barnes
Collaborator Author

> Ah indeed my bad, I failed to set up a tracking branch correctly when I pulled your changes last time. I've updated them and now everything parses correctly here. Sorry about the false alarm!

No worries, I'm just glad someone could sanity check them!
Always so easy to miss things when you write both the writer and reader. 😊

@leo-barnes leo-barnes merged commit 4be01ec into master May 17, 2021
@leo-barnes leo-barnes deleted the u/lbarnes/multilayer_examples branch May 17, 2021 16:46

Successfully merging this pull request may close these issues.

Create conformance file with multi layer image
5 participants