Decoding animated WebP is 4x slower than libwebp-sys + webp-animation #119

Open
Shnatsel opened this issue Oct 22, 2024 · 0 comments

In image v0.25.4 and image-webp v0.2.0, decoding the attached animated WebP is 4x slower than using libwebp-sys + webp-animation: sample.zip

image

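use std::error::Error;
use image::{codecs::webp::WebPDecoder, AnimationDecoder, ImageReader};

fn main() -> Result<(), Box<dyn Error>> {
    // Path to the animated WebP file, passed as the first CLI argument.
    let input = std::env::args().nth(1).unwrap();
    // Open the file and take the underlying buffered reader.
    let reader = ImageReader::open(input)?.into_inner();
    let decoder = WebPDecoder::new(reader)?;
    // Decode every frame and discard the result; only decoding time matters here.
    let mut iter = decoder.into_frames();
    while let Some(_frame) = iter.next() {}
    Ok(())
}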

hyperfine results:

  Time (mean ± σ):     411.9 ms ±   3.9 ms    [User: 372.2 ms, System: 38.8 ms]
  Range (min … max):   407.6 ms … 416.8 ms    10 runs

libwebp-sys + webp-animation

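use std::error::Error;
use webp_animation::prelude::*;

fn main() -> Result<(), Box<dyn Error>> {
    // Path to the animated WebP file, passed as the first CLI argument.
    let input = std::env::args().nth(1).unwrap();

    // Read the whole file into memory; the decoder works on a byte slice.
    let buffer = std::fs::read(input).unwrap();
    let decoder = Decoder::new(&buffer).unwrap();
    // Decode every frame and discard the result; only decoding time matters here.
    let mut iter = decoder.into_iter();
    while let Some(_frame) = iter.next() {}
    Ok(())
}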

hyperfine results:

  Time (mean ± σ):      95.9 ms ±   0.4 ms    [User: 128.7 ms, System: 7.4 ms]
  Range (min … max):    95.3 ms …  96.7 ms    30 runs

Analysis

The profile shows webp-animation using some multi-threading: its user time (128.7 ms) exceeds its wall-clock time (95.9 ms). Even comparing CPU time instead of wall-clock time, image-webp is still about 3x slower.
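Both ratios can be checked against the hyperfine numbers above. This is a minimal sketch; the `Timing` struct is a hypothetical helper for illustration and not part of either crate:

```rust
/// Timing summary copied from the hyperfine output above (all values in ms).
struct Timing {
    wall: f64,
    user: f64,
    system: f64,
}

impl Timing {
    /// Total CPU time summed across all threads.
    fn cpu(&self) -> f64 {
        self.user + self.system
    }
}

fn main() {
    let image_webp = Timing { wall: 411.9, user: 372.2, system: 38.8 };
    let libwebp = Timing { wall: 95.9, user: 128.7, system: 7.4 };

    // Wall-clock ratio: the slowdown a caller actually observes.
    println!("wall-clock ratio: {:.1}x", image_webp.wall / libwebp.wall);
    // CPU-time ratio: discounts the work libwebp spreads across threads.
    println!("CPU-time ratio:   {:.1}x", image_webp.cpu() / libwebp.cpu());
}
```

The wall-clock ratio comes out a little above 4x and the CPU-time ratio close to 3x, matching the figures quoted in this issue.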

Breakdown of where the time is spent in image, recorded by samply: https://share.firefox.dev/4fc3utg

The greatest contributors seem to be image_webp::vp8::Vp8Decoder::decode_frame (48%), image_webp::extended::do_alpha_blending (20%), image_webp::vp8::Frame::fill_rgba (16%).

Within decode_frame the biggest contributor is image_webp::vp8::Vp8Decoder::read_coefficients (12% self time, 32% total time), and the code of that function looks like it could be optimized further to reduce bounds checks, etc. #71 is also relevant, but only accounts for 20% of the total time.
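As a generic illustration of the kind of bounds-check reduction meant here, the sketch below contrasts plain slice indexing with converting each slice to a fixed-size array reference once, before the loop. It is not the code of read_coefficients: the function names, the 16-element block size, and the multiply operation are all assumptions chosen for the example. Whether the compiler actually drops the checks in a given case should be confirmed by inspecting the generated assembly or by benchmarking.

```rust
use std::convert::TryFrom;

// Hypothetical block size used only for this illustration.
const BLOCK: usize = 16;

/// Baseline: each slice index inside the loop may carry its own bounds check,
/// because the compiler cannot prove that all three slices hold BLOCK elements.
fn scale_block_indexed(coeffs: &[i32], factors: &[i32], out: &mut [i32]) {
    for i in 0..BLOCK {
        out[i] = coeffs[i] * factors[i];
    }
}

/// Variant: check the lengths once by converting to fixed-size array references.
/// Inside the loop the array length is known at compile time, so the indices
/// are provably in range. Panics up front if any slice is shorter than BLOCK.
fn scale_block_fixed(coeffs: &[i32], factors: &[i32], out: &mut [i32]) {
    let coeffs = <&[i32; BLOCK]>::try_from(&coeffs[..BLOCK]).unwrap();
    let factors = <&[i32; BLOCK]>::try_from(&factors[..BLOCK]).unwrap();
    let out = <&mut [i32; BLOCK]>::try_from(&mut out[..BLOCK]).unwrap();
    for i in 0..BLOCK {
        out[i] = coeffs[i] * factors[i];
    }
}

fn main() {
    let coeffs: Vec<i32> = (0..16).collect();
    let factors = vec![3; 16];
    let mut a = vec![0; 16];
    let mut b = vec![0; 16];
    scale_block_indexed(&coeffs, &factors, &mut a);
    scale_block_fixed(&coeffs, &factors, &mut b);
    // Both variants must produce identical results.
    assert_eq!(a, b);
}
```

The two functions compute the same result; the second only moves the length checks out of the hot loop, which is the trade-off worth measuring before applying it to real decoder code.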
