Laying out flex trees with height/width set to Auto is slow (100x slower than when set to Percent) #502

matthewgapp · 2023-06-21T17:46:23Z

`taffy` version

0.3.12

Platform

Rust

What you did

Reproduced via the script below.

use std::time::Instant;

use taffy::{
    prelude::{Rect, Size},
    style::{LengthPercentageAuto, Style},
};

fn build_style() -> Style {
    let margin = Rect {
        left: LengthPercentageAuto::Points(20.),
        top: LengthPercentageAuto::Points(20.),
        right: LengthPercentageAuto::Points(0.),
        bottom: LengthPercentageAuto::Points(0.),
    };

    let size = Size {
        width: taffy::style::Dimension::Auto,
        height: taffy::style::Dimension::Auto,

        // changing width and height to 100% decreases layout time by ~ 100x

        // width: taffy::style::Dimension::Percent(1.),
        // height: taffy::style::Dimension::Percent(1.),
    };

    let mut style = Style::DEFAULT;

    // adding flex column increases time by 10x 
    style.flex_direction = taffy::style::FlexDirection::Column;

    // adding margin increases layout time by 200 - 500ms
    style.margin = margin;

    style.size = size;
    style
}

fn main() {
    let mut app = taffy::Taffy::new();

    let root = app.new_leaf(build_style()).unwrap();

    let mut parent = root;

    // adding depth increases time ~ exponentially
    for _ in 0..200 {
        let child = app.new_leaf(build_style()).unwrap();
        app.add_child(parent, child).unwrap();

        // adding more siblings increases time ~ linearly from what I can tell
        let child = app.new_leaf(build_style()).unwrap();
        app.add_child(parent, child).unwrap();

        let child = app.new_leaf(build_style()).unwrap();
        app.add_child(parent, child).unwrap();

        parent = child;
    }

    let time_now = Instant::now();

    app.compute_layout(
        root,
        Size {
            width: taffy::style::AvailableSpace::Definite(800.),
            height: taffy::style::AvailableSpace::Definite(600.),
        },
    )
    .unwrap();

    // this takes about 1,500ms on my M1 Macbook Pro in dev build
    println!("Time elapsed: {:?}", time_now.elapsed());
}

What went wrong

Laying out flex trees with height and width set to Auto is slow (and unusable). It's 100x - 200x slower than when height and width are set to 100%. I would expect height/width of Auto to perform about the same (maybe a littler slower) as height/width 100% Percent. But not 100 - 200x slower

Computing a flex layout of a tree of nodes scales extremely poorly with increasing tree depth, to the point where Taffy is not practical for trees that exceed a depth of about 20.

So think it boils down to a combination of flex column and width, height being set to Auto (see code comments in repro snippet).

I looked at the call stack and it appears that when the dimensions are set to Auto, it will recurse through the nodes ~N^2 times. The bottom-most nodes end up getting laid out almost 10,000x.

Caching of layout results is disabled on the layout pass. There should probably be some memoization at some point in that recursion so that nodes are not laid out redundantly over and over during a layout pass.

I'm not sure why flex direction of column makes things so much worse.

Additional information

Workaround is to use width and height set to 100% instead of Auto.

The text was updated successfully, but these errors were encountered:

nicoburns · 2023-10-12T14:47:21Z

@matthewgapp To follow up on this:

In response to "this takes about 1,500ms on my M1 Macbook Pro in dev build": I think any discussion of performance or benchmarking ought to happen in the context of release mode. Taffy is small and fast to compile (even in release mode), and relies heavily on compiler optimisations for performance. Debug mode will be slow.

It is possible to override the optimisation level just for a single dependency, so you wouldn't necessarily need to compile your entire app in release mode in order to compile Taffy in release mode. See: https://doc.rust-lang.org/nightly/cargo/reference/profiles.html#overrides
We have fixed the issue with columns in Fix caching for flexbox columns #556. The issue was that our caching implementation was (unintentionally) row-centric, and column layouts caused cache thrashing. We have improved the caching logic, and columns now lay out just as quickly as rows (with a small caveat: see point 3 below). This fix has also been backported to the 0.3.x branch and is available in Taffy v0.3.16.
Regarding margins making things slower: this seems to be because the margins are cumulatively large enough that they are causing flexbox shrinking to occur. And I suspect that it is causing it to occur repeatedly at each level of the tree. Because the first N layers will shrink the fit the available space, and then the (N+1)th layer will add a little extra margin causing all N layers below it to need to relayout. If you make the margins small enough that they fit within the specified viewport then layout is much faster (making them a little smaller so that they still don't fit but by less also helps).

This is also a partial cause of columns being slower than rows in your example: You have set a viewport that is 800px wide but only 600px tall. Which means that more horizontal nodes can layout before triggering flex-shrinking. So even with the fix mentioned above, you need to set the viewport to a square size to get symmetric performance.
Regarding algorithmic complexity: I believe that you have correctly identified the (worst case) time complexity as being linear with respect to the total node count and exponential with respect to the depth of the tree. Specifically for Taffy's implementation I believe it is O(node_count * 4^depth), and while there might be scope to reduce that 4 to a 3 or even a 2 (I haven't fully thought this through), I believe that the time complexity being (worst case) exponential is an inherent property of the flexbox algorithm rather than a flaw in Taffy's implementation.

With this in mind, I think I would expect large deep trees with styles that defeat the caching strategy to be 100x - 200x slower than similarly sizes trees that are either shallow or have styles which allow the caching strategy to reduce the time complexity, as you're effectively comparing a linear time algorithm with an exponential time algorithm.
Optimisation opportunities. Having said that, there are a few places which there is scope for optimisation.
- Flexbox "column wrap" containers: These are particularly slow currently. And this is called out in the spec as being a slow case, with a faster algorithm specified which I believe Taffy does not currently implement.
- Improved caching. You mentioned that caching is disabled for the "final layout" pass, which is one of them. But it's not straightforward to change. The reason that we force cache busting because it allows us to optimise layout in earlier iterations. Specifically if at any point in the tree there is a node with a fixed width and height then then nodes below that node will be skipped entirely in earlier layout iterations, which I believe will typically be a much bigger speedup. I believe it would be possible to implement things so that we could have both. But it would be tricky and would involve tracking whether a node had actually skipped work on previous passes or not. There are probably also other ways in which caching could be improved, but from my perspective there is no obvious low-hanging fruit.
- Node "statistics" / optimisation hints: We could potentially speed things up by allowing nodes to communicate optimisation hints to their parent. For example, we currently need to take into the possibility that varying the height constraint on a node might affect it's width, because there are some nodes where that is the case (vertical text and flexbox column wrap nodes for example). But it generally isn't, and if we knew that those layouts could be skipped in that case. Again, this is quite tricky to get right as there are a lot of different cases to account for.
- Parallelisation. See Enable multithreaded layout #27. I suspect we could get close to linear speedup proportional to the number of cores by parallelising the algorithm.
Probably only the "column wrap" optimisation is likely in the short term.
New algorithm(s) As noted, the Flexbox algorithm has inherently exponential time complexity with respect to the depth of the tree. But that isn't true of all layout algorithms and we could implement other ones that are faster. Support Morphorm/Subform layout #308 explores one such possibility which I am considering pursuing, although probably with an algorithm inspired by Morphorm (and with better CSS box model interop) rather than being 100% compatible with it.
Benchmarks Just wanted to say thanks again for the benchmarking code you have provided here and in Add flexbox layout benches with all-auto node sizes #503. I have started incorporating some of those styles/trees into the benchmarks in Refactor benchmarks #472 which will hopefully land soon and allow us to track these cases better going forwards.

alice-i-cecile · 2023-10-12T15:21:10Z

Awesome response, thanks for writing that up. I know these issues aren't all fully resolved yet, but for the sake of keeping issues actionable I'm going to close this out. Feel free to spin out new issues for pathological cases / perf improvement ideas, or keep discussing the big picture here.

matthewgapp added the bug Something isn't working label Jun 21, 2023

matthewgapp changed the title ~~Laying out flex trees with height and width set to Auto is slow (and unusable)~~ Laying out flex trees with height/width set to Auto is slow (100x slower than when set to Percent) Jun 21, 2023

matthewgapp mentioned this issue Jun 25, 2023

Add flexbox layout benches with all-auto node sizes #503

Closed

1 task

nicoburns mentioned this issue Jun 25, 2023

Growing UI nodes Fix bevyengine/bevy#8931

Merged

matthewgapp mentioned this issue Jun 26, 2023

Fix slow flexbox layout when auto sized #504

Closed

nicoburns added the performance Layout go brr label Jul 13, 2023

alice-i-cecile closed this as not planned Won't fix, can't repro, duplicate, stale Oct 12, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Laying out flex trees with height/width set to Auto is slow (100x slower than when set to Percent) #502

Laying out flex trees with height/width set to Auto is slow (100x slower than when set to Percent) #502

matthewgapp commented Jun 21, 2023 •

edited

Loading

nicoburns commented Oct 12, 2023 •

edited

Loading

alice-i-cecile commented Oct 12, 2023

Laying out flex trees with height/width set to Auto is slow (100x slower than when set to Percent) #502

Laying out flex trees with height/width set to Auto is slow (100x slower than when set to Percent) #502

Comments

matthewgapp commented Jun 21, 2023 • edited Loading

taffy version

Platform

What you did

What went wrong

Additional information

nicoburns commented Oct 12, 2023 • edited Loading

alice-i-cecile commented Oct 12, 2023

matthewgapp commented Jun 21, 2023 •

edited

Loading

`taffy` version

nicoburns commented Oct 12, 2023 •

edited

Loading