Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LLVM 4.0 Upgrade #40123

Merged
merged 12 commits into from
Apr 25, 2017
Merged

LLVM 4.0 Upgrade #40123

merged 12 commits into from
Apr 25, 2017

Conversation

TimNN
Copy link
Contributor

@TimNN TimNN commented Feb 27, 2017

Since nobody has done this yet, I decided to get things started:

Todo:

  • push the relevant commits to rust-lang/llvm and rust-lang/compiler-rt
  • cleanup .gitmodules
  • Verify if there are any other commits from rust-lang/llvm which need backporting
  • Investigate / fix debuginfo ("<optimized out>") failures
  • Use correct emscripten version in docker image

Closes #37609.


Test results:

Everything is green 🎉

@rust-highfive
Copy link
Collaborator

r? @brson

(rust_highfive has picked a reviewer for you, use r? to override)

@alexcrichton
Copy link
Member

@TimNN you can probably work faster than going through @bors by selectively adding ALLOW_PR=1 to various entries in .travis.yml, that'll run the tests on the PR itself before we hit @bors.

I'd recommend doing that for a couple of the cross targets at least and then probably some of the other builders as well (such as emscripten)

My guess is that AppVeyor will be one of the most difficult pieces to update as part of this upgrade. Unfortunately we don't have any extra capacity there for running tests so it may be difficult to do so on the PR before bors :(

@TimNN
Copy link
Contributor Author

TimNN commented Feb 27, 2017

The debuginfo failures all occur because static muts are optimized out. (I verified that no optimisation flags were passed to rustc).

I'm unsure what the best fix would be here (something like the #[used] attribute from #39987 would probably work).

@hanna-kruppe
Copy link
Contributor

hanna-kruppe commented Feb 27, 2017

static muts getting optimized out sounds like an issue with the IR/debug info we generate. Perhaps one of the debug info-related changes was incomplete and we don't emit everything we need to emit for globals? (#39528 looks very related.) To clarify, does this:

static muts are optimized out

mean that gdb outputs <optimized out>, or did you check the object files and the symbols for the globals were missing?

In any case, requiring #[used] on static muts to enable debugging would be a serious regression, not to mention that #[used] doesn't exist yet and I'm not even sure it would fix this (see above).

@petrochenkov
Copy link
Contributor

I vaguely remember about statics being optimized away the last summer already, when I wrote debuginfo tests for unions (one of the failing tests in this PR).
So I had to add at least an assignment for the static to be kept.
Maybe the LLVM optimizer become better and now sees that the optimization is still possible?

@TimNN
Copy link
Contributor Author

TimNN commented Feb 27, 2017

To clarify, does this:

static muts are optimized out

mean that gdb outputs <optimized out>, or did you check the object files and the symbols for the globals were missing?

It means that gdb prints <optimized out>

Apparently I made some assumptions that weren't quite correct.

The symbols exist (output of nm from the simple-struct test:

0000000000201008 d _ZN13simple_struct13NO_PADDING_1617h8f8e1027ea816fe7E
000000000020100c d _ZN13simple_struct13NO_PADDING_3217h9e367c36ef03eabcE
0000000000201018 d _ZN13simple_struct13NO_PADDING_6417h3c4e19994ee55915E
0000000000201050 d _ZN13simple_struct14PADDING_AT_END17h210f8219dd75f271E
0000000000201040 d _ZN13simple_struct16INTERNAL_PADDING17he308dba4866e981bE
0000000000201030 d _ZN13simple_struct17NO_PADDING_16326417h136d93d95c4cf5b5E

Since this is apparently indeed debuginfo related I guess cc @michaelwoerister, @dylanmckay

@TimNN
Copy link
Contributor Author

TimNN commented Feb 27, 2017

The generated IR, in case that helps anyone: https://gist.github.com/TimNN/92152a2f8062909805657d1bb4131998

@TimNN
Copy link
Contributor Author

TimNN commented Feb 27, 2017

The failed build of the IMAGE=cross seems to be qemu related, one of the failed tests:

---- [run-pass] run-pass/alignment-gep-tup-like-1.rs stdout ----
	
error: test run failed!
status: exit code: 101
command: /checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools/x86_64-unknown-linux-gnu/release/qemu-test-client run /checkout/obj/build/x86_64-unknown-linux-gnu/test/run-pass/alignment-gep-tup-like-1.stage2-arm-unknown-linux-gnueabihf
stdout:
------------------------------------------
uploaded "/checkout/obj/build/x86_64-unknown-linux-gnu/test/run-pass/alignment-gep-tup-like-1.stage2-arm-unknown-linux-gnueabihf", waiting for result
a=22 b=44

------------------------------------------
stderr:
------------------------------------------
thread 'main' panicked at 'client.read_exact(&mut header) failed with failed to fill whole buffer', /checkout/src/tools/qemu-test-client/src/main.rs:174
note: Run with `RUST_BACKTRACE=1` for a backtrace.

------------------------------------------

thread '[run-pass] run-pass/alignment-gep-tup-like-1.rs' panicked at 'explicit panic', /checkout/src/tools/compiletest/src/runtest.rs:2637
note: Run with `RUST_BACKTRACE=1` for a backtrace.

I saw the following panics:

     61 thread 'main' panicked at 'client.read_exact(&mut header) failed with Connection reset by peer (os error 104)', /checkout/src/tools/qemu-test-client/src/main.rs:174
      3 thread 'main' panicked at 'client.read_exact(&mut header) failed with failed to fill whole buffer', /checkout/src/tools/qemu-test-client/src/main.rs:174
      1 thread 'main' panicked at 'io::copy(&mut file, dst) failed I/O failure during tests: Error { repr: Os { code: 11, message: "Resource temporarily unavailable" } }
     96 thread 'main' panicked at 'io::copy(&mut file, dst) failed with Broken pipe (os error 32)', /checkout/src/tools/qemu-test-client/src/main.rs:220
      1 thread 'main' panicked at 'io::copy(&mut file, dst) failed with Connection reset by peer (os error 104)', /checkout/src/tools/qemu-test-client/src/main.rs:220

Do they ring a bell for anyone?

@TimNN
Copy link
Contributor Author

TimNN commented Feb 27, 2017

The emscripten failure are mainly of the kind Invalid record (Producer: 'LLVM4.0.0' Reader: 'LLVM 3.9.0'), although there are some assertion failures as well. (I would fix the llvm version mismatch first, maybe that fixes the assertion failures as well).

@TimNN
Copy link
Contributor Author

TimNN commented Feb 27, 2017

The android image fails with an LLVM assertion:

rustc: /checkout/src/llvm/lib/Target/ARM/ARMConstantIslandPass.cpp:492: void {anonymous}::ARMConstantIslands::doInitialConstPlacement(std::vector<llvm::MachineInstr*>&): Assertion `Size >= 4 && "Too small constant pool entry"' failed.
Build failed, waiting for other jobs to finish...
rustc: /checkout/src/llvm/lib/Target/ARM/ARMConstantIslandPass.cpp:492: void {anonymous}::ARMConstantIslands::doInitialConstPlacement(std::vector<llvm::MachineInstr*>&): Assertion `Size >= 4 && "Too small constant pool entry"' failed.
error: Could not compile `core`.

There are also some warnings (see below for an example), of which I am unsure how relevant / important they are.

warning: ../compiler-rt/lib/builtins/mulsc3.c:21:1: warning: conflicting types for built-in function '__mulsc3'
warning:  __mulsc3(float __a, float __b, float __c, float __d)
warning:  ^

@alexcrichton
Copy link
Member

@TimNN the former may be a bug in LLVM? (or just something we've never exposed ourselves before). The latter is normal, I believe it's happening on builds today.

@alexcrichton
Copy link
Member

Oh we've also got ~10 extra capacity on Travis so feel free to test more than one row at a time if you'd like :)

@TimNN
Copy link
Contributor Author

TimNN commented Feb 28, 2017

@alexcrichton: Have you ever seen something like the qemu failures in #40123 (comment) before?

Oh we've also got ~10 extra capacity on Travis so feel free to test more than one row at a time if you'd like :)

Ah, ok. I've been doing 3 at a time (when not debugging emscripten), but I guess I can run a few more :)

@TimNN
Copy link
Contributor Author

TimNN commented Feb 28, 2017

The dist-s390x-linux-netbsd build fails while cross compiling llvm due to missing std::thread support:

In file included from /checkout/src/llvm/include/llvm/Support/ThreadPool.h:17:0,
                 from /checkout/src/llvm/lib/Support/ThreadPool.cpp:14:
/checkout/src/llvm/include/llvm/Support/thread.h:41:14: error: 'thread' in namespace 'std' does not name a type
 typedef std::thread thread;

I guess the fix here is to backport (part of?) rust-lang/llvm@58731be as well.

@TimNN
Copy link
Contributor Author

TimNN commented Feb 28, 2017

I've been investigating the emscripten failures, see below for the findings. The one thing that both test have in common is that they use fixed sized arrays ([constexpr; len]) although if that is related / the problem, I don't know.

  • run-pass/packed-struct-vec.rs IR:

The #[repr(packed)] seems to be just broken, printing instead of asserting for equality gives the following results:

Foo { bar: 2, baz: 2 }, Foo { bar: 1, baz: 33649522 }
Foo { bar: 0, baz: 2 }, Foo { bar: 1, baz: 33649522 }
Foo { bar: 0, baz: 2 }, Foo { bar: 1, baz: 33649522 }
Foo { bar: 1, baz: 2 }, Foo { bar: 1, baz: 33649522 }
Foo { bar: 2, baz: 2 }, Foo { bar: 1, baz: 33649522 }
Foo { bar: 0, baz: 2 }, Foo { bar: 1, baz: 33649522 }
Foo { bar: 0, baz: 2 }, Foo { bar: 1, baz: 33649522 }
Foo { bar: 1, baz: 2 }, Foo { bar: 1, baz: 33649522 }
Foo { bar: 2, baz: 2 }, Foo { bar: 1, baz: 33649522 }
Foo { bar: 0, baz: 2 }, Foo { bar: 1, baz: 33649522 }
Foo { bar: 0, baz: 144115188075855872 }, Foo { bar: 1, baz: 33649522 }
Foo { bar: 0, baz: 562949953421312 }, Foo { bar: 1, baz: 33649522 }
Foo { bar: 0, baz: 0 }, Foo { bar: 1, baz: 33649522 }
Foo { bar: 2, baz: 2 }, Foo { bar: 1, baz: 33649522 }
Foo { bar: 0, baz: 144115188075855872 }, Foo { bar: 1, baz: 33649522 }
Foo { bar: 0, baz: 562949953421312 }, Foo { bar: 1, baz: 33649522 }
Foo { bar: 0, baz: 0 }, Foo { bar: 1, baz: 33649522 }
Foo { bar: 2, baz: 2 }, Foo { bar: 1, baz: 33649522 }
Foo { bar: 0, baz: 144115188075855872 }, Foo { bar: 1, baz: 33649522 }
Foo { bar: 0, baz: 0 }, Foo { bar: 1, baz: 33649522 }

(with this code:)

use std::mem;

#[repr(packed)]
#[derive(Copy, Clone, PartialEq, Debug)]
struct Foo {
    bar: u8,
    baz: u64
}

pub fn main() {
    let foos = [Foo { bar: 1, baz: 2 }; 10];

    assert_eq!(mem::size_of::<[Foo; 10]>(), 90);

    for i in 0..10 {
        println!("{:?}, {:?}", foos[i], Foo { bar: 1, baz: 2});
    }

    for &foo in &foos {
        println!("{:?}, {:?}", foo, Foo { bar: 1, baz: 2 });
    }

    assert!(false);
}
  • run-pass/issue-29663.rs IR:

The write_volatile in the following snippet is apparently not executed correctly:

        let mut x = E([0; 32]);
        write_volatile(&mut x, E([1; 32]));
        assert_eq!(read_volatile(&x), E([1; 32])); // line 61
        assert_eq!(x, E([1; 32]));                 // line 62

The output:

thread 'main' panicked at 'assertion failed: `(left == right)` (left: `E([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])`, right: `E([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])`)', /checkout/src/test/run-pass/issue-29663.rs:61

If line 61 is commented:

thread 'main' panicked at 'assertion failed: `(left == right)` (left: `E([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1])`, right: `E([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])`)', /checkout/src/test/run-pass/issue-29663.rs:62

Note that other write_volatile / read_volatile pairs work correctly.

@alexcrichton
Copy link
Member

@TimNN

The QEMU failures don't look familiar but are perhaps indicative of the program segfaulting or otherwise exiting un-cleanly. Do you have the full logs I could help take a look at?

The missing std::thread business may be related to how we compile compilers. It may be that one of our compilers is too old or something like that. I know that MinGW C++ compilers, at least, do not have std::thread (at least if I'm remembering correctly). Historically we've "dealt" with this by just deleting code that uses std::thread, but it's clearly getting harder over time! This also isn't scalable into the future really. Unfortunately I don't know of a great solution here.

@TimNN
Copy link
Contributor Author

TimNN commented Feb 28, 2017

The QEMU failures don't look familiar but are perhaps indicative of the program segfaulting or otherwise exiting un-cleanly. Do you have the full logs I could help take a look at?

Ah, sorry, I linked the logs only from the original post, here they are: https://travis-ci.org/rust-lang/rust/jobs/205860539

@alexcrichton
Copy link
Member

Fascinating! Unfortunately I may not be of much help. Also thanks for the links, I should have looked around for them! Of the failures so far:

  • armhf - this is really suspicious. In theory if qemu-test-server panics in any way we'd see it in the logs (as it doesn't daemonize that much or anything), but I'm not seeing anything. The errors mean that the server is prematurely closing the connection, but I'm not sure how that's possible without otherwise printing error information! It may be a bug in LLVM or related to the assertion failures, but it may also require more debugging the qemu instance itself :(

  • emscripten - looks like you're on top of these (maybe fastcomp regressions? maybe llvm regression? unsure myself...)

  • android - yeah looks like an LLVM bug? Although I wouldn't rule out invalid codegen on our side just yet.

  • s390x-netbsd - Ok so this specifically failed to compiled LLVM for NetBSD. This is our script to compile NetBSD gcc and I don't see anything immediately wrong that should compile the wrong C++. I wonder if the gcc options accidentally disable std::thread? Or maybe it's disabled somewhere else on NetBSD by default? Not sure why that happened :(

Some other tips I'd have is:

  • You can run docker images locally via ./src/ci/docker/run.sh $image_name which can help with debugging (e.g. avoiding going through Travis). That should in theory use the precise same environment as Travis modulo kernel and hardware.

  • For suspected LLVM bugs the best way (although certainly not the fastest way) that I know to work with them is to (a) get it to reproduce with a manual rustc invocation then (b) get it to reproduce with an opt invocation by using rustc to generate LLVM IR then switching to LLVM's tools and then (c) minimizing that IR as much as possible. If it's small enough then reporting a bug on LLVM's issue tracker is usually good for getting the issue fixed in a timely fashion.

@TimNN
Copy link
Contributor Author

TimNN commented Feb 28, 2017

  • You can run docker images locally via ./src/ci/docker/run.sh $image_name which can help with debugging (e.g. avoiding going through Travis). That should in theory use the precise same environment as Travis modulo kernel and hardware.

Yeah, I'm doing that right now :)

s390x-netbsd

Alright, so the problem is, I think, that the following ./configure check fails when compiling: checking for gthreads library... no

@TimNN TimNN force-pushed the llvm40 branch 2 times, most recently from a387dec to 03412b9 Compare February 28, 2017 16:11
@alexcrichton
Copy link
Member

@TimNN heh yeah that'd do it! I wonder if some more headers need to be copied from the NetBSD base system or something like that? Unfortunately I forget now at this point where I got those instructions from to build a NetBSD cross-compiler...

@mattico
Copy link
Contributor

mattico commented Feb 28, 2017

Those warnings aren't new. I forget the cause.
Edit: meaning the compiler-rt function signature warnings which seem to have disappeared...

@TimNN
Copy link
Contributor Author

TimNN commented Feb 28, 2017

Some notes on the armhf-gnu image:

  • I (once) got the same LLVM assertion as on android, this went away when retrying the build. I'll try to get this to reliably reproduce.
  • The qemu connection errors happen non deterministically, as far as I can tell, example: run-pass ran successfully, incremental failed afterwards, after a retry all ofrun-pass failed.
  • Now I got a segfault... and no idea what actually segfaulted...

@alexcrichton
Copy link
Member

Odd! I wouldn't entirely rule out a bug in qemu-test-{client,server} FWIW

@TimNN TimNN force-pushed the llvm40 branch 2 times, most recently from a2f620f to 934aa51 Compare March 1, 2017 12:36
@TimNN
Copy link
Contributor Author

TimNN commented Apr 24, 2017

Yay, looks like all the builds timed out again, so things seem to be good to go. (I didn't verify all the logs this time).

@alexcrichton
Copy link
Member

@TimNN looks great to me!

I hope to branch beta later today, so want to pull out the fast-fail? I'll r+ this after the beta is branched.

I'd also like to reiterate that you're at least my own personal "Rust Hero of the last N Months" where N is two and counting. If we delay this for 3 more days then it'll be a 2+ month PR!

@TimNN
Copy link
Contributor Author

TimNN commented Apr 24, 2017

@alexcrichton: I removed the always fail commit.


I'd also like to reiterate that you're at least my own personal "Rust Hero of the last N Months" where N is two and counting. If we delay this for 3 more days then it'll be a 2+ month PR!

Thanks a lot! All the positive encouragement and feedback has helped a lot in keeping me motivated to work on the upgrade :).

@Kixunil
Copy link
Contributor

Kixunil commented Apr 24, 2017

I'd also like to reiterate that you're at least my own personal "Rust Hero of the last N Months"

Mine too! :)

@alexcrichton
Copy link
Member

@bors: r+

Beta's branched, let's do this!

@bors
Copy link
Contributor

bors commented Apr 24, 2017

📌 Commit 8994277 has been approved by alexcrichton

@bors
Copy link
Contributor

bors commented Apr 24, 2017

⌛ Testing commit 8994277 with merge 0777c75...

bors added a commit that referenced this pull request Apr 24, 2017
LLVM 4.0 Upgrade

Since nobody has done this yet, I decided to get things started:

**Todo:**

* [x] push the relevant commits to `rust-lang/llvm` and `rust-lang/compiler-rt`
* [x] cleanup `.gitmodules`
* [x] Verify if there are any other commits from `rust-lang/llvm` which need backporting
* [x] Investigate / fix debuginfo ("`<optimized out>`") failures
* [x] Use correct emscripten version in docker image

---

Closes #37609.

---

**Test results:**

Everything is green 🎉
@bors
Copy link
Contributor

bors commented Apr 25, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing 0777c75 to master...

@bors bors merged commit 8994277 into rust-lang:master Apr 25, 2017
@brson brson added the relnotes Marks issues that should be documented in the release notes of the next release. label Apr 25, 2017
@brson
Copy link
Contributor

brson commented Apr 25, 2017

Thanks for slogging through this @TimNN.

@BatmanAoD
Copy link
Member

Congratulations @TimNN!

@DemiMarie
Copy link
Contributor

Thank you @TimNN!

@TimNN TimNN deleted the llvm40 branch April 25, 2017 05:12
@michaelwoerister
Copy link
Member

🎉

cnd added a commit to gentoo/gentoo-rust that referenced this pull request Apr 25, 2017
According to rust-lang/rust#40123 rust now supports LLVM 4
@pkphilip
Copy link

pkphilip commented May 6, 2017

Wow! Thanks a lot @TimNN! That is some effort!

@Kixunil
Copy link
Contributor

Kixunil commented May 6, 2017

I've just noticed that README mentions clang 3.x. Shouldn't this be updated?

@hanna-kruppe
Copy link
Contributor

@Kixunil That part of the readme is about the C and C++ compiler used for compiling C and C++ dependencies during the build, not about the LLVM version.

@Kixunil
Copy link
Contributor

Kixunil commented May 6, 2017

@rkruppe I guess I'm too hungry and tired. Thank you for pointing that out! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
relnotes Marks issues that should be documented in the release notes of the next release. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Upgrade to LLVM 4.0