Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Does inlining work through the __rust_alloc symbol? #45831

Closed
SimonSapin opened this issue Nov 7, 2017 · 16 comments
Closed

Does inlining work through the __rust_alloc symbol? #45831

SimonSapin opened this issue Nov 7, 2017 · 16 comments

Comments

@SimonSapin
Copy link
Contributor

https://github.com/rust-lang/rust/blob/1.21.0/src/liballoc_system/lib.rs#L24-L26 defines a MIN_ALLOC constant with this comment:

// The minimum alignment guaranteed by the architecture. This value is used to
// add fast paths for low alignment values. In practice, the alignment is a
// constant at the call site and the branch will be optimized out.

However, System.alloc() and friends are typically not used directly but from std::heap::Heap.alloc() and through symbols like:

extern "Rust" {
    fn __rust_alloc(size: usize, align: usize, err: *mut u8) -> *mut u8;
}

This indirection allows #[global_allocator] to change which allocator Heap uses. Doesn’t it also prevent inlining, constant-propagation of the alignment value, and elimination of the branch on comparing with MIN_ALIGN? Or can (Thin)LTO “see through” that indirection?

If the branch is not eliminated that’s probably OK, its cost is probably small compared to the total cost of performing allocation. But if the second sentence of that comment is wrong we might want to remove it.

@RalfJung
Copy link
Member

RalfJung commented Nov 7, 2017

I have noticed that while the following C code gets optimized away completely:

#include <stdio.h>
#include <stdlib.h>

void foo() {
    int *x = (int*)malloc(4);
    if (!x) return;
    *x = 0;

    int *y = (int*)malloc(4);
    if (!y) {
        free(x);
        return;
    }
    *y = 0;

    if (x == y) {
        printf("Equal!\n");
    }

    free(x);
    free(y);
}

the matching Rust code does not get the matching optimization:

fn main() {
    let x = &*Box::new(0);
    let y = &*Box::new(0);
    
    if x as *const _ == y as *const _ {
        println!("Equal!");
    }
}

Looking at the LLVM IR, the calls to __rust_alloc do have the !noalias attribute:

  %6 = call i8* @__rust_alloc(i64 4, i64 4, i8* nonnull %5) #6, !noalias !9

But that doesn't seem enough, and anyway, !noalias is much weaker than malloc. In particular, __rust_alloc definitely does not get inlined.

I was going to report a separate issue, but this actually looks like it is the same problem?

Cc @eddyb

EDIT: Updated C code to actually match Rust code.

@eddyb
Copy link
Member

eddyb commented Nov 7, 2017

cc @alexcrichton

@sfackler
Copy link
Member

sfackler commented Nov 7, 2017

The __rust_alloc symbols are inherently non-inlineable (without LTO at least) since their implementations aren't known until the final link.

@RalfJung
Copy link
Member

RalfJung commented Nov 7, 2017

So this means Rust is losing all the optimizations that are based on the return value of malloc being fresh?

@SimonSapin
Copy link
Contributor Author

It looks like these optimizations might be incompatible with the indirection that enables #[global_allocator]. Or can we rely on LTO for them?

@sfackler
Copy link
Member

sfackler commented Nov 7, 2017

I think we just need to teach LLVM that __rust_alloc is malloc. malloc isn't inlined in C either.

@SimonSapin
Copy link
Contributor Author

There’s already an #[allocator] attribute on its declaration. Is that what it’s supposed to do?

@SimonSapin
Copy link
Contributor Author

Glancing at 695dee0, that might be a copy/paste leftover.

@RalfJung
Copy link
Member

The __rust_alloc symbols are inherently non-inlineable (without LTO at least) since their implementations aren't known until the final link.

Something really funny is going on somewhere, see #45955. Even if I explicitly call System.alloc to get around the problem that __rust_alloc is an opaque symbol, it still is going through some form of "pluggable" architecture -- the behavior of the program still depends on whether the global allocator is set to the the system allocator or jemalloc. Looks like there is two layers of indirection?

@SimonSapin
Copy link
Contributor Author

@RalfJung I can’t find that second indirection in the code. On Unix System.alloc calls libc::malloc here: https://github.com/rust-lang/rust/blob/e312c8a8c3/src/liballoc_system/lib.rs#L134

@RalfJung
Copy link
Member

Doesn't jemalloc, when linked into the program, replace those symbols?

Doing ltrace on a program using System.alloc directly shows no call to malloc, but a bunch of calls to mmap.

The reason I think it is not using the system allocator is that the addresses look very different than they do in a C program calling malloc; and furthermore, two subsequent calls to allocate 8 bytes with alignment 8 actually result in adjacent addresses, while I have not yet observed system malloc ever giving out adjacent addresses.

@SimonSapin
Copy link
Contributor Author

Oh, that’s a good point. jemalloc is configured for std here: https://github.com/rust-lang/rust/blob/master/src/liballoc_jemalloc/build.rs I think libc::malloc is jemalloc when jemalloc is configured without a symbol prefix, which is the case in std on Linux since #31460

It’s a bug that alloc_system doesn’t do what it says when not using #[global_allocator], that should probably be filed in a separate issue. That bug should disappear when alloc_system becomes the default #36963 (comment)

@RalfJung
Copy link
Member

RalfJung commented Nov 13, 2017

It’s a bug that alloc_system doesn’t do what it says when not using #[global_allocator], that should probably be filed in a separate issue.

I assume that would be #45955 -- probably, the misalignment occurs because while MIN_ALIGN is enforced by the system allocator, it is not enforced by jemalloc.

@SimonSapin
Copy link
Contributor Author

I’ve filed #45966 about System.alloc using jemalloc.

This issue remains that this part of the comment is not accurate anymore and should be removed:

In practice, the alignment is a constant at the call site and the branch will be optimized out.

@RalfJung
Copy link
Member

I’ve filed #45966 about System.alloc using jemalloc.

Thanks.

This issue remains that this part of the comment is not accurate anymore and should be removed:

Agreed, that's the least that should happen. Furthermore, it would be nice to be able to tell LLVM that __rust_alloc is a "malloc-like function", if that's a thing, to get some more optimizations.

@SimonSapin
Copy link
Contributor Author

#46117 removes the obsolete part of the comment.

@bors bors closed this as completed in 43e32b5 Nov 25, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants