How to build *compiler_builtins* in optimized mode #77

Closed
tjhu opened this issue May 15, 2020 · 4 comments

Comments

tjhu commented May 15, 2020

Hi,

When we run cargo xbuild --release ... --target x86_64-kernel.json, the memcpy being compiled is just a simple un-optimized for-loop. Looking at the source code, I think xargo builds sysroot crates in release mode by default.

I think there's something else in our settings that prevents xargo from building an optimized compiler_builtins, but I am not sure what I am missing. We borrowed some of the setup, including the target.json, from Writing an OS in Rust.

@josephlr

This isn't a bug in cargo-xbuild; just like xargo, all of the sysroot crates (libcore, liballoc, compiler_builtins) are built with release by default. See:

cmd.arg("--release");

The issue you're hitting is rust-lang/compiler-builtins#339, which notes that the builtin memcpy implementation is just a simple un-optimized for-loop. If you want the no_std default implementation to be better, I would start there. Note that normally Rust just uses the memcpy defined by libc (which is very optimized, often written in arch-specific assembly); however, for no_std, this isn't really an option. This is essentially what GCC does as well.

The cargo-xbuild docs note that the memcpy metadata option can be used to enable/disable the default memcpy implementation, which should allow you to work around this without changing compiler_builtins.

For example, if you set package.metadata.cargo-xbuild.memcpy = false for your crate, you'll get a bunch of "undefined symbol" errors for memcpy/memcmp/memset. Then, you can have your crate (or an external crate) provide the appropriate definitions.
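
As a rough illustration of what such definitions could look like, here is a minimal sketch of byte-wise routines that follow the C contracts. The names bytewise_memcpy, bytewise_memset, and bytewise_memcmp are hypothetical, chosen so the sketch compiles and runs on a hosted target; in a freestanding crate they would instead be exported under the C symbol names (memcpy, memset, memcmp) with extern "C" and #[no_mangle]. They are no faster than a plain loop and only show the shape of the symbols the linker expects.

```rust
/// Copies `n` bytes from `src` to `dest`; the regions must not overlap.
/// Hypothetical name: a freestanding crate would export this as `memcpy`.
pub unsafe fn bytewise_memcpy(dest: *mut u8, src: *const u8, n: usize) -> *mut u8 {
    let mut i = 0;
    while i < n {
        unsafe { *dest.add(i) = *src.add(i) };
        i += 1;
    }
    dest
}

/// Fills `n` bytes at `dest` with the low byte of `c`, as C `memset` does.
/// Hypothetical name: a freestanding crate would export this as `memset`.
pub unsafe fn bytewise_memset(dest: *mut u8, c: i32, n: usize) -> *mut u8 {
    let mut i = 0;
    while i < n {
        unsafe { *dest.add(i) = c as u8 };
        i += 1;
    }
    dest
}

/// Compares `n` bytes; returns the difference of the first mismatching pair,
/// or 0 if the regions are equal. Hypothetical stand-in for `memcmp`.
pub unsafe fn bytewise_memcmp(a: *const u8, b: *const u8, n: usize) -> i32 {
    let mut i = 0;
    while i < n {
        let (x, y) = unsafe { (*a.add(i), *b.add(i)) };
        if x != y {
            return x as i32 - y as i32;
        }
        i += 1;
    }
    0
}
```

All three functions are unsafe because the caller must guarantee that the pointers are valid for n bytes.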

@josephlr

Note that you can also link a custom memcpy implementation. For example, I got something working using musl via the following steps:

  1. Install musl, which (for my OS) installs a file /usr/lib/musl/lib/libc.a
  2. Add lines to your build.rs to tell Cargo where to find the library. In my example, this was:
    println!("cargo:rustc-link-search=native=/usr/lib/musl/lib");
    println!("cargo:rustc-link-lib=static=c");
  3. Now building w/ package.metadata.cargo-xbuild.memcpy = false works without linking errors.
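
Put together, the build script from step 2 could be sketched as follows. The helper names link_directives and emit_link_directives are hypothetical, and the library directory is simply the one from the steps above, so it may differ on other systems. The two cargo: directives themselves are the ones already shown.

```rust
/// Builds the two Cargo directives for statically linking `libc.a`
/// found in `lib_dir`. (Hypothetical helper for illustration.)
pub fn link_directives(lib_dir: &str) -> [String; 2] {
    [
        // Tell rustc where to search for native libraries.
        format!("cargo:rustc-link-search=native={}", lib_dir),
        // Link `libc.a` statically.
        "cargo:rustc-link-lib=static=c".to_string(),
    ]
}

/// Prints the directives on stdout, which is where Cargo reads them.
/// In build.rs, call this from `fn main()`.
pub fn emit_link_directives(lib_dir: &str) {
    for line in link_directives(lib_dir) {
        println!("{}", line);
    }
}
```

In build.rs, main would then just call emit_link_directives("/usr/lib/musl/lib").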

Note that this approach is very application specific. Your libc.a must be compatible with your no_std target. My example only works because:

  • The installed musl library is for x86_64-unknown-linux
  • The musl implementations for memcpy (and friends) don't use OS functionality (unlike glibc).
  • My no_std target only needs to work on bare-metal x86_64

Finally, note that this complexity may not be worth it. Depending on your application, optimizing memcpy might have very little effect, as usually memory speed is the bottleneck for these sorts of operations.

tjhu commented May 20, 2020

@josephlr Thank you very much! Your detailed guide helps us a lot!

We tried using Redox's implementation but found it not very fast and kind of buggy (there's an infinite recursion in memset). We thought that the compiler would be smart enough to optimize the un-optimized for-loop quite a bit, with at least some loop unrolling, as we see in Compiler Explorer. We didn't know your solution existed, and we were being lazy about writing and maintaining a fast memcpy ourselves, so we thought there might be a way to ask the compiler to do more optimization for us.

josephlr commented Jul 8, 2020

@tjhu rust-lang/compiler-builtins#365 makes it so x86_64 targets will now build with a highly optimized memcpy and friends (using REP MOVSB). If that gets merged, then you should be able to use the very fast memcpy by default.
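
For a sense of what a REP MOVSB based copy looks like, here is a minimal sketch using Rust inline assembly. It is not the code from the pull request: the function name rep_movsb_copy is hypothetical, and the portable fallback exists only so the example also compiles on non-x86_64 hosts. It assumes the direction flag is clear, which the x86_64 calling conventions guarantee at function boundaries.

```rust
/// Copies `n` bytes from `src` to `dest` with `rep movsb`.
/// The regions must not overlap. Hypothetical name, for illustration only.
#[cfg(target_arch = "x86_64")]
pub unsafe fn rep_movsb_copy(dest: *mut u8, src: *const u8, n: usize) -> *mut u8 {
    unsafe {
        core::arch::asm!(
            "rep movsb",
            // rcx = byte count, rdi = destination, rsi = source.
            // The instruction modifies all three, so they are in/out
            // operands whose final values are discarded.
            inout("rcx") n => _,
            inout("rdi") dest => _,
            inout("rsi") src => _,
            // No stack use, and `rep movsb` leaves the flags untouched.
            options(nostack, preserves_flags)
        );
    }
    dest
}

/// Portable fallback so the sketch builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
pub unsafe fn rep_movsb_copy(dest: *mut u8, src: *const u8, n: usize) -> *mut u8 {
    let mut i = 0;
    while i < n {
        unsafe { *dest.add(i) = *src.add(i) };
        i += 1;
    }
    dest
}
```

On CPUs with fast string operations the microcode can move data in large chunks, which is why a single instruction can compete with hand-unrolled loops.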

@phil-opp I think this can be closed.
