Quadratic code size due to combination of ? and Drop for linear code #120604

psychon · 2024-02-03T10:27:55Z

Hi,

first: Sorry if this is a duplicate. I tried, but did not find anything.

There is a bug report for x11rb about code size. Some macro caused 111 KiB of code to be emited: psychon/x11rb#912

I tried to produce a self-contained example of what is going on and here is what I came up with the following

macro_rules! the_macro {
    {
        $cookie_name:ident { $($field_name:ident,)* }
    } => {
        struct $cookie_name {
            $($field_name: support::Cookie,)*
        }
        
        #[inline(never)]
        fn send_requests() -> Result<$cookie_name, support::Error> {
            Ok($cookie_name {
                $($field_name: support::send_request()?,)*
            })
        }
    }
}

the_macro!{Cookies { a, b, c, d, e, } }

fn main() {
    let _ = send_requests();
}

mod support {
    #[inline(never)]
    pub fn send_request() -> Result<Cookie, Error> {
        // Some code to trick the optimizer into not optimising everything away
        if std::env::args().next().is_some() {
            Ok(Cookie(42))
        } else {
            Err(Error)
        }
    }
    
    pub struct Error;
    
    pub struct Cookie(u16);
    impl Drop for Cookie {
        #[inline(never)]
        fn drop(&mut self) {
            println!("{}", self.0);
        }
    }
}

(#[inline(never)] just exists so that I can read the resulting assembly; the original code does not use inline hints, but instead the functions are big enough not to be inlined. I'm not sure whether drop() of the Cookie perhaps ends up being inlined and being part of the huge code size.)

This code uses a macro, so just to make it clear what the problem is, here is the relevant function after macro expansion:

#[inline(never)]
fn send_requests() -> Result<Cookies, support::Error> {
    Ok(Cookies {
            a: support::send_request()?,
            b: support::send_request()?,
            c: support::send_request()?,
            d: support::send_request()?,
            e: support::send_request()?,
        })
}

I expected to see this happen: Linear code size (= small code)

Instead, this happened: This results in quadratic code size due to the drop calls.

Looking at the generated ASM in release mode and trying to translate this back into something rust-y, the code looks like this:

fn send_requests() -> Result<Cookies, support::Error> {
    let a = match support_send_request() {
       Err(e) => return Err(e);
       Ok(a) => a;
    };
    let b = match support_send_request() {
       Err(e) => { drop(a); return Err(e) };
       Ok(b) => b;
    };
    let c = match support_send_request() {
       Err(e) => { drop(a); drop(b); return Err(e) };
       Ok(c) => c;
    };
    let d = match support_send_request() {
       Err(e) => { drop(a); drop(b); drop(c); return Err(e) };
       Ok(d) => d;
    };
    let e = match support_send_request() {
       Err(e) => { drop(a); drop(b); drop(c); drop(d); return Err(e) };
       Ok(e) => e;
    };
    Ok(Cookies { a, b, c, d, e })
}

The actual code where this was observed has not just 5 cases (a, b, c, d, e) but 59 ones. Here, the quadratic behaviour starts to hurt a lot. I counted 1709 calls to functions with drop_in_place in their name.

Meta

rustc --version --verbose: Uhm... I tried this on the playground. But I can also reproduce this with my ancient version locally:

rustc 1.70.0
binary: rustc
commit-hash: unknown
commit-date: unknown
host: x86_64-unknown-linux-gnu
release: 1.70.0
LLVM version: 16.0.6

The text was updated successfully, but these errors were encountered:

quaternic · 2024-02-03T21:51:05Z

As a workaround, you should get better results if you can structure the code like:

let a = make()?;
let b = make()?;
let c = make()?;
Some([a,b,c])

(At least it works when applied to the following case.)

Here's a self-contained example that reproduces the same quadratic growth:

#![allow(improper_ctypes)]
pub struct NoisyDrop;

extern {
    fn make() -> Option<NoisyDrop>;
    fn extdrop(x: &mut NoisyDrop);
}

impl Drop for NoisyDrop {
    fn drop(&mut self) {
        unsafe {extdrop(self)}
    }
}

#[inline(never)]
pub fn foo() -> Option<[NoisyDrop; 8]> {
    unsafe {
        Some(
            [
                make()?,
                make()?,
                make()?,
                make()?,
                make()?,
                make()?,
                make()?,
                make()?,
            ]
        )
    }
}

https://play.rust-lang.org/?version=stable&mode=release&edition=2021&gist=8412f8f3ded022aeae0d4456ddd1658a

psychon · 2024-02-04T08:44:08Z

Thanks a lot for the hint.

Edit: PR for my work-around: psychon/x11rb#914

I worked around this by using a `Vec` and `FromIterator, E> for Result`.

let names = ["foo", "bar"]; // I left this out in my example above
let cookies: Result<Vec<_>, _> = names.into_iter().map(|name| send_request(name)).collect();
Ok(Cookies { cookies: cookies? })

This has the benefit of generating less code in the macro and thus should be way more obvious to optimise.

This has the downside of requiring an extra allocation for the Vec. But since this code does network I/O and is "cold" (usually only run once at startup), I don't care much about it.

The only code using this Cookies struct also comes from the macro and thus can be adapted as well without a "real" API break.

(Also, I like your approach of working around the optimiser way more than mine. Nice. :-) )

quaternic · 2024-02-04T14:24:12Z

I initially thought it was caused by drop-order, but the fields of the partially constructed struct or array are dropped in reverse order of construction, so I don't see any reason the compiler couldn't do a better job here.

@rustbot label +T-compiler +C-optimization +I-heavy -needs-triage

yshui · 2024-02-05T17:23:44Z

FYI clang does this for C++: (example from @psychon)

struct HasDrop {
    HasDrop();
    ~HasDrop();
};

void fn() {
    HasDrop a0;
    if (cond0) return;
    HasDrop a1;
    if (cond1) return;
    // repeat for a2-a9
    HasDrop a10;
    if (cond10) return;

    ....
}

will be lowered to something like this:

void fn() {
    HasDrop a0;
    if (cond0) goto cleanup1;
    HasDrop a1;
    if (cond1) goto cleanup2;
    // repeat for a2-a9
    HasDrop a10;
    if (cond10) goto cleanup;
    ....
cleanup:
    drop(a10);
cleanup9:
    drop(a9);
    ....
cleanup2:
    drop(a1);
cleanup1:
    drop(a0);
    return;
}

of course this is more complex in Rust's case since moved values in Rust doesn't need to have their drop() called, but I think this is more or less doable?

psychon added the C-bug Category: This is a bug. label Feb 3, 2024

rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Feb 3, 2024

matthewjasper self-assigned this Feb 5, 2024

psychon mentioned this issue Feb 5, 2024

Optimise atom_manager! implementation psychon/x11rb#914

Merged

mzeitlin11 mentioned this issue Sep 1, 2024

Be careful with ? in generated code arlyon/async-stripe#602

Merged

hanna-kruppe mentioned this issue Oct 1, 2024

slow compilation of large struct with Box-members and ? in field init #93257

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Quadratic code size due to combination of ? and Drop for linear code #120604

Quadratic code size due to combination of ? and Drop for linear code #120604

psychon commented Feb 3, 2024

quaternic commented Feb 3, 2024

psychon commented Feb 4, 2024 •

edited

Loading

quaternic commented Feb 4, 2024

yshui commented Feb 5, 2024

Quadratic code size due to combination of ? and Drop for linear code #120604

Quadratic code size due to combination of ? and Drop for linear code #120604

Comments

psychon commented Feb 3, 2024

Meta

quaternic commented Feb 3, 2024

psychon commented Feb 4, 2024 • edited Loading

quaternic commented Feb 4, 2024

yshui commented Feb 5, 2024

psychon commented Feb 4, 2024 •

edited

Loading