Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quadratic code size due to combination of ? and Drop for linear code #120604

Open
psychon opened this issue Feb 3, 2024 · 4 comments
Open

Quadratic code size due to combination of ? and Drop for linear code #120604

psychon opened this issue Feb 3, 2024 · 4 comments
Assignees
Labels
C-bug Category: This is a bug. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such I-heavy Issue: Problems and improvements with respect to binary size of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@psychon
Copy link

psychon commented Feb 3, 2024

Hi,

first: Sorry if this is a duplicate. I tried, but did not find anything.

There is a bug report for x11rb about code size. Some macro caused 111 KiB of code to be emited: psychon/x11rb#912

I tried to produce a self-contained example of what is going on and here is what I came up with the following

Playground link

macro_rules! the_macro {
    {
        $cookie_name:ident { $($field_name:ident,)* }
    } => {
        struct $cookie_name {
            $($field_name: support::Cookie,)*
        }
        
        #[inline(never)]
        fn send_requests() -> Result<$cookie_name, support::Error> {
            Ok($cookie_name {
                $($field_name: support::send_request()?,)*
            })
        }
    }
}

the_macro!{Cookies { a, b, c, d, e, } }

fn main() {
    let _ = send_requests();
}

mod support {
    #[inline(never)]
    pub fn send_request() -> Result<Cookie, Error> {
        // Some code to trick the optimizer into not optimising everything away
        if std::env::args().next().is_some() {
            Ok(Cookie(42))
        } else {
            Err(Error)
        }
    }
    
    pub struct Error;
    
    pub struct Cookie(u16);
    impl Drop for Cookie {
        #[inline(never)]
        fn drop(&mut self) {
            println!("{}", self.0);
        }
    }
}

(#[inline(never)] just exists so that I can read the resulting assembly; the original code does not use inline hints, but instead the functions are big enough not to be inlined. I'm not sure whether drop() of the Cookie perhaps ends up being inlined and being part of the huge code size.)

This code uses a macro, so just to make it clear what the problem is, here is the relevant function after macro expansion:

#[inline(never)]
fn send_requests() -> Result<Cookies, support::Error> {
    Ok(Cookies {
            a: support::send_request()?,
            b: support::send_request()?,
            c: support::send_request()?,
            d: support::send_request()?,
            e: support::send_request()?,
        })
}

I expected to see this happen: Linear code size (= small code)

Instead, this happened: This results in quadratic code size due to the drop calls.

Looking at the generated ASM in release mode and trying to translate this back into something rust-y, the code looks like this:

fn send_requests() -> Result<Cookies, support::Error> {
    let a = match support_send_request() {
       Err(e) => return Err(e);
       Ok(a) => a;
    };
    let b = match support_send_request() {
       Err(e) => { drop(a); return Err(e) };
       Ok(b) => b;
    };
    let c = match support_send_request() {
       Err(e) => { drop(a); drop(b); return Err(e) };
       Ok(c) => c;
    };
    let d = match support_send_request() {
       Err(e) => { drop(a); drop(b); drop(c); return Err(e) };
       Ok(d) => d;
    };
    let e = match support_send_request() {
       Err(e) => { drop(a); drop(b); drop(c); drop(d); return Err(e) };
       Ok(e) => e;
    };
    Ok(Cookies { a, b, c, d, e })
}

The actual code where this was observed has not just 5 cases (a, b, c, d, e) but 59 ones. Here, the quadratic behaviour starts to hurt a lot. I counted 1709 calls to functions with drop_in_place in their name.

Meta

rustc --version --verbose: Uhm... I tried this on the playground. But I can also reproduce this with my ancient version locally:

rustc 1.70.0
binary: rustc
commit-hash: unknown
commit-date: unknown
host: x86_64-unknown-linux-gnu
release: 1.70.0
LLVM version: 16.0.6
@psychon psychon added the C-bug Category: This is a bug. label Feb 3, 2024
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Feb 3, 2024
@quaternic
Copy link

As a workaround, you should get better results if you can structure the code like:

let a = make()?;
let b = make()?;
let c = make()?;
Some([a,b,c])

(At least it works when applied to the following case.)

Here's a self-contained example that reproduces the same quadratic growth:

#![allow(improper_ctypes)]
pub struct NoisyDrop;

extern {
    fn make() -> Option<NoisyDrop>;
    fn extdrop(x: &mut NoisyDrop);
}

impl Drop for NoisyDrop {
    fn drop(&mut self) {
        unsafe {extdrop(self)}
    }
}

#[inline(never)]
pub fn foo() -> Option<[NoisyDrop; 8]> {
    unsafe {
        Some(
            [
                make()?,
                make()?,
                make()?,
                make()?,
                make()?,
                make()?,
                make()?,
                make()?,
            ]
        )
    }
}

https://play.rust-lang.org/?version=stable&mode=release&edition=2021&gist=8412f8f3ded022aeae0d4456ddd1658a

@psychon
Copy link
Author

psychon commented Feb 4, 2024

Thanks a lot for the hint.

Edit: PR for my work-around: psychon/x11rb#914

I worked around this by using a `Vec` and `FromIterator, E> for Result`.
let names = ["foo", "bar"]; // I left this out in my example above
let cookies: Result<Vec<_>, _> = names.into_iter().map(|name| send_request(name)).collect();
Ok(Cookies { cookies: cookies? })

This has the benefit of generating less code in the macro and thus should be way more obvious to optimise.

This has the downside of requiring an extra allocation for the Vec. But since this code does network I/O and is "cold" (usually only run once at startup), I don't care much about it.

The only code using this Cookies struct also comes from the macro and thus can be adapted as well without a "real" API break.

(Also, I like your approach of working around the optimiser way more than mine. Nice. :-) )

@quaternic
Copy link

I initially thought it was caused by drop-order, but the fields of the partially constructed struct or array are dropped in reverse order of construction, so I don't see any reason the compiler couldn't do a better job here.

@rustbot label +T-compiler +C-optimization +I-heavy -needs-triage

@rustbot rustbot added C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such I-heavy Issue: Problems and improvements with respect to binary size of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Feb 4, 2024
@matthewjasper matthewjasper self-assigned this Feb 5, 2024
@yshui
Copy link
Contributor

yshui commented Feb 5, 2024

FYI clang does this for C++: (example from @psychon)

struct HasDrop {
    HasDrop();
    ~HasDrop();
};

void fn() {
    HasDrop a0;
    if (cond0) return;
    HasDrop a1;
    if (cond1) return;
    // repeat for a2-a9
    HasDrop a10;
    if (cond10) return;

    ....
}

will be lowered to something like this:

void fn() {
    HasDrop a0;
    if (cond0) goto cleanup1;
    HasDrop a1;
    if (cond1) goto cleanup2;
    // repeat for a2-a9
    HasDrop a10;
    if (cond10) goto cleanup;
    ....
cleanup:
    drop(a10);
cleanup9:
    drop(a9);
    ....
cleanup2:
    drop(a1);
cleanup1:
    drop(a0);
    return;
}

of course this is more complex in Rust's case since moved values in Rust doesn't need to have their drop() called, but I think this is more or less doable?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such I-heavy Issue: Problems and improvements with respect to binary size of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

5 participants