Read media files more efficiently from zipped files #9761

rtibbles · 2022-10-05T16:15:22Z

Currently in our implementation of H5P, we load the entire H5P file into the frontend and then extract all its constituent files into Blob objects and create URLs for them.

This has the distinct advantage of being very robust and very predictable. It has the slight downside of causing incredibly long loading times for H5P files that contain large bundled media files such as video or audio.

Once #9157 has been implemented, it would be useful to augment the zip file wrapper in the following ways:

Instead of reading the entire file at once, first read the file metadata and listing in the zip file via a range request. To accomplish this, we can either use zipinfo.js directly or vendor it and modify it for our needs.

For reference, this is a Python implementation of a similar mechanism: https://github.com/saulpw/unzip-http
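As a rough illustration of the range-request idea above, here is a minimal Python sketch that lists a zip archive's entries from its tail alone. It is not how zipinfo.js or unzip-http are implemented; `list_entries`, `fetch_range`, and `make_demo_zip` are hypothetical names, `fetch_range` stands in for an HTTP Range request, and ZIP64 archives are deliberately not handled.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"   # end-of-central-directory (EOCD) signature
CDH_SIG = b"PK\x01\x02"    # central directory file header signature
EOCD_MIN = 22              # fixed-size part of the EOCD record
MAX_COMMENT = 65535        # maximum length of the trailing archive comment


def list_entries(fetch_range, total_size):
    """Return [(name, compressed_size, uncompressed_size, header_offset)].

    fetch_range(start, end) is a hypothetical callable returning the bytes
    start..end inclusive, as an HTTP Range request would.
    """
    # 1. Fetch the tail of the archive, which must contain the EOCD record.
    tail_start = max(0, total_size - (EOCD_MIN + MAX_COMMENT))
    tail = fetch_range(tail_start, total_size - 1)
    pos = tail.rfind(EOCD_SIG)
    if pos < 0:
        raise ValueError("no end-of-central-directory record found")
    # Fields: signature, disk, cd disk, entries on disk, total entries,
    # central directory size, central directory offset, comment length.
    fields = struct.unpack("<4sHHHHIIH", tail[pos:pos + EOCD_MIN])
    count, cd_size, cd_offset = fields[4], fields[5], fields[6]

    # 2. Fetch the central directory, reusing the tail if it already covers it.
    if cd_offset >= tail_start:
        rel = cd_offset - tail_start
        cd = tail[rel:rel + cd_size]
    else:
        cd = fetch_range(cd_offset, cd_offset + cd_size - 1)

    # 3. Walk the central directory; each header has a 46-byte fixed part.
    entries, p = [], 0
    for _ in range(count):
        (sig, _vm, _vn, flags, _method, _time, _date, _crc, csize, usize,
         nlen, elen, clen, _disk, _iattr, _eattr, offset) = struct.unpack(
            "<4sHHHHHHIIIHHHHHII", cd[p:p + 46])
        if sig != CDH_SIG:
            raise ValueError("corrupt central directory")
        raw_name = cd[p + 46:p + 46 + nlen]
        # Bit 11 of the flags marks a UTF-8 name; otherwise cp437 is assumed.
        name = raw_name.decode("utf-8" if flags & 0x800 else "cp437")
        entries.append((name, csize, usize, offset))
        p += 46 + nlen + elen + clen
    return entries


def make_demo_zip():
    """Build a small in-memory archive to stand in for an H5P file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("h5p.json", "{}")
        zf.writestr("content/video.mp4", b"\0" * 100000)
    return buf.getvalue()


data = make_demo_zip()
# A slice stands in for an HTTP Range request against the archive URL.
entries = list_entries(lambda start, end: data[start:end + 1], len(data))
for name, csize, usize, offset in entries:
    print(name, csize, usize, offset)
```

For typical archives this needs one request (the tail already contains the central directory) and at most two.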

Once we have the listing of all the files, we load and unzip, in a minimal number of requests, all files below a certain size cutoff (say 500KB). Loading of any file at or above the cutoff is deferred.
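One possible way to plan this step, sketched in Python under some assumptions: the cutoff is applied to the uncompressed size (the text does not say which size is meant), entries are stored back to back with no gaps, and `Entry`, `plan_loading`, and `coalesce_ranges` are hypothetical names. Adjacent small files are merged into a single byte range so that each run needs only one request.

```python
from typing import List, NamedTuple, Tuple

SIZE_CUTOFF = 500 * 1024  # the "say 500KB" figure from the text


class Entry(NamedTuple):
    name: str
    compressed_size: int
    uncompressed_size: int
    header_offset: int  # offset of the entry's local file header


def plan_loading(entries: List[Entry], cutoff: int = SIZE_CUTOFF):
    """Split entries into those loaded eagerly and those deferred."""
    eager = [e for e in entries if e.uncompressed_size < cutoff]
    deferred = [e for e in entries if e.uncompressed_size >= cutoff]
    return eager, deferred


def coalesce_ranges(entries: List[Entry], eager: List[Entry],
                    cd_offset: int) -> List[Tuple[int, int]]:
    """Merge the byte spans of adjacent eager entries into inclusive ranges.

    Assumes each entry runs from its local header up to the next entry's
    header (or the central directory at cd_offset for the last entry).
    """
    eager_names = {e.name for e in eager}
    ordered = sorted(entries, key=lambda e: e.header_offset)
    ranges: List[Tuple[int, int]] = []
    for i, entry in enumerate(ordered):
        if entry.name not in eager_names:
            continue
        start = entry.header_offset
        next_start = (ordered[i + 1].header_offset
                      if i + 1 < len(ordered) else cd_offset)
        end = next_start - 1
        if ranges and ranges[-1][1] + 1 == start:
            # Contiguous with the previous eager entry: extend that range.
            ranges[-1] = (ranges[-1][0], end)
        else:
            ranges.append((start, end))
    return ranges
```

With this layout, a large video sitting between two groups of small files splits the eager load into two range requests rather than one request per file.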

For any files not loaded through the above mechanism, we do the following:

First, we enhance the zipcontent endpoint to support range requests, which will allow video and audio files to be played easily. Because extracted files do not support a seek method in Python 3.6, we will have to brook some inefficiency on Python 3.6 backends.
Secondly, we enhance the zipfile wrapper to allow passing a function that generates the URL for a large file. If this function is present, the wrapper will defer to it for large files; if not, it will behave as it currently does. We then inject this function where it is needed to provide appropriate references to the zipcontent endpoint.
Thirdly, we will update the zipcontent endpoint to serve files from within any compressed file format (again).
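To make the first point more concrete, here is a minimal Python sketch of serving a byte range from a file inside a zip archive. It is not the existing zipcontent implementation; `parse_range`, `read_member_range`, and the `use_seek` switch are hypothetical, and only single-range `bytes=` headers are handled. Where the extracted file object is not seekable (as on Python 3.6), it falls back to decompressing and discarding the bytes before the requested start, which is the inefficiency mentioned above.

```python
import io
import re
import zipfile

_RANGE_RE = re.compile(r"^bytes=(\d*)-(\d*)$")


def parse_range(header, size):
    """Parse a single 'bytes=start-end' header into inclusive (start, end).

    Returns None if the header is missing, malformed, or unsatisfiable.
    """
    match = _RANGE_RE.match(header or "")
    if not match or (not match.group(1) and not match.group(2)):
        return None
    if not match.group(1):
        # Suffix form 'bytes=-N': the last N bytes of the file.
        length = int(match.group(2))
        if length == 0:
            return None
        start, end = max(0, size - length), size - 1
    else:
        start = int(match.group(1))
        end = min(int(match.group(2)), size - 1) if match.group(2) else size - 1
    if start > end or start >= size:
        return None
    return start, end


def read_member_range(zf, name, start, end, use_seek=True, chunk=64 * 1024):
    """Return bytes start..end (inclusive) of the decompressed member."""
    with zf.open(name) as f:
        if use_seek and f.seekable():
            f.seek(start)
        else:
            # Fallback for non-seekable members: read and discard up to start.
            remaining = start
            while remaining > 0:
                skipped = f.read(min(chunk, remaining))
                if not skipped:
                    break
                remaining -= len(skipped)
        return f.read(end - start + 1)


# Demo archive standing in for an H5P file with one media member.
payload = bytes(range(256)) * 40
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("content/audio.mp3", payload)

with zipfile.ZipFile(buf) as archive:
    size = archive.getinfo("content/audio.mp3").file_size
    start, end = parse_range("bytes=100-199", size)
    body = read_member_range(archive, "content/audio.mp3", start, end)
    print(len(body))
```

A real endpoint would also need to return a 206 status with a `Content-Range` header, which is left out of this sketch.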


rtibbles · 2022-11-03T23:42:11Z

A recent issue showed the usefulness of this: when the H5P contains particularly large media files (whether because they are very long or have not been adequately compressed), this can cause excessive memory usage on memory-constrained client devices, which will cause fflate to return undefined from its unzip command.

Avoiding the loading and unzipping of large media files until they are demanded would help to prevent this issue, while still solving the issue that client side unpacking of H5P files was intended to resolve, which is to avoid the hundreds of HTTP requests initiated by a more naive implementation.

rtibbles · 2024-01-02T23:02:07Z

I have a slight concern with this approach - mostly to do with the fact that the ability to generate an object URL for a MediaSource object, while currently widely supported, is being phased out and will be dropped in the future.

MediaSource objects instead have to be programmatically attached via the srcObject attribute. To avoid the complexity here, I think it might be simpler to handle files too large to be loaded as part of the main zip file in this way instead:

First, we enhance the zipcontent endpoint to support range requests, which will allow video and audio files to be played easily. Because extracted files do not support a seek method in Python 3.6, we will have to brook some inefficiency on Python 3.6 backends.
Secondly, we enhance the zipfile wrapper to allow passing a function that generates the URL for a large file. If this function is present, the wrapper will defer to it for large files; if not, it will behave as it currently does. We then inject this function where it is needed to provide appropriate references to the zipcontent endpoint.
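The fallback behaviour in the second point could look roughly like the sketch below. The real wrapper is frontend JavaScript; Python is used here only to show the decision logic. `resolve_file_url`, `make_blob_url`, `zipcontent_url`, and the URL shape are all hypothetical, since the actual zipcontent route is not given in this issue.

```python
def resolve_file_url(path, is_large, make_blob_url, url_generator=None):
    """Pick the URL for one archive member.

    Large files defer to url_generator when one was injected; everything
    else keeps the current behaviour of extracting to a blob URL.
    """
    if is_large and url_generator is not None:
        return url_generator(path)
    return make_blob_url(path)


def zipcontent_url(path):
    # Hypothetical URL shape, for illustration only.
    return "/zipcontent/example.h5p/" + path


def fake_blob_url(path):
    # Stand-in for extracting the file and creating an object URL.
    return "blob:" + path


print(resolve_file_url("content/video.mp4", True, fake_blob_url, zipcontent_url))
print(resolve_file_url("content/video.mp4", True, fake_blob_url))
print(resolve_file_url("h5p.json", False, fake_blob_url, zipcontent_url))
```

Keeping the generator optional means callers that never inject it see no change in behaviour.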

rtibbles · 2024-10-29T22:45:47Z

Spec has been updated to address @rtibbles' concerns here.

rtibbles self-assigned this Oct 5, 2022

rtibbles added this to the Kolibri 0.18: General maintenance milestone Nov 7, 2024

rtibbles linked a pull request Nov 7, 2024 that will close this issue

Handle frontend loaded zip archive content (h5p, bloomd) that contains large files #12805

Draft

9 tasks

marcellamaki modified the milestones: Kolibri 0.18: General maintenance, upcoming major Jan 9, 2025
