Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression in rust-nightly 2018-12-02 #56618

Closed
steffengy opened this issue Dec 8, 2018 · 22 comments
Closed

Regression in rust-nightly 2018-12-02 #56618

steffengy opened this issue Dec 8, 2018 · 22 comments
Assignees
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-high High priority regression-from-stable-to-beta Performance or correctness regression from stable to beta. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@steffengy
Copy link
Contributor

steffengy commented Dec 8, 2018

Summary of the issue reported in tiberius at steffengy/tiberius#87.

Issue

When building tiberius with the latest nightlies,
an UDP send using tokio's UdpSockets mysteriously fails on windows:

MIOW CompletionPort::get_many returns an Error (only for release builds)

Io(Os { code: 10047, kind: Other, message: "An address incompatible with the requested protocol was used." })

Reproduction

Checkout tiberius (or built against the version on crates.io) with the following file.
This might only reproduce on windows (in release builds!).
I have not yet confirmed on linux, the code and deps are different though so a linux build might not trigger it - the part returning the error is miow.

main.rs

use tiberius::SqlConnection;

fn main() -> Result<(), Box<std::error::Error>> {
    let future = SqlConnection::connect("server=tcp:127.0.0.1\\MSSQLSERVER;database=tibtest;integratedsecurity=sspi;trustservercertificate=true");
    tokio::executor::current_thread::block_on_all(future).unwrap(); // <- panic here

    println!("Done.");
    Ok(())
}

Bisect

Bisected to commit d3ed348
@RalfJung since he might have an idea how this changeset could affect things.

Confirmed it with

  • Command cargo-bisect-rustc --start d09466ceb1374bd0ff1c490bfd50133b8ca67558 --end 4a45578bc58ff262864f72680cc02e83f5d2f5b3 --script=test.bat -v --prompt
searched toolchains d09466ceb1374bd0ff1c490bfd50133b8ca67558 through 4a45578bc58ff262864f72680cc02e83f5d2f5b3
regression in d3ed34824c31f303b98855fee6fc9213dff1f295
  • test.bat
    cargo run --release main.rs

Observations

I could not produce a minimal testcase that reproduces this, since removing code or abstractions always led to removing the error no matter how unrelated it seemed. My first guess therefore was it's somehow optimization related - since the error only seems to occur in release builds - but for me that doesn't quite align with the bisected commit.

@RalfJung
Copy link
Member

RalfJung commented Dec 8, 2018

Now that is odd. Unfortunately I do not have a good idea, nor do I have a Windows machine to even try and reproduce this. EDIT: On Linux, this code just starts and then blocks (which is expected, I have no MSSQL server running).

@eddyb @nikomatsakis @alexcrichton any idea what might be going on here?

@steffengy
Copy link
Contributor Author

(which is expected, I have no MSSQL server running)

It reproduces without a SQL server running on windows.
Basically it uses UDP to do SQL server instance discovery and the error is returned during
the sending of this discovery packet (before even knowing where to connect to).

But it still makes sense that it doesn't reproduce on linux, since the dependencies
around tokio are vastly different.

@nikic
Copy link
Contributor

nikic commented Dec 8, 2018

Assuming the mentioned commit isn't intrinsically buggy, a possible cause here could be that the use of &mut causes us to emit dereferenceable attributes, leading to different optimization behavior.

@steffengy
Copy link
Contributor Author

@nikic How would you track that down?
Comparing LLVM bitcode/IR of stable (working) with nightly or the generated ASM itself would be my first idea, I might look into that soon.

@nikic
Copy link
Contributor

nikic commented Dec 8, 2018

@steffengy My general recommendation for tracking down optimization bugs (assuming it even is one here) would be a) check whether it still reproduces with -C codegen-units=1, if so use that, otherwise specify -Z no-parallel-llvm and b) use -C llvm-args=-opt-bisect-limit=N to find the lowest value of N for which the issue occurs via bisection. The printed log will then show you which optimization for which function is responsible (though not necessarily the root cause). It's then possible to use -C llvm-args=-filter-print-funcs=mangled_function_name in conjunction with -C llvm-args=-print-before=optname and -C llvm-args=-print-after=optname to view the IR transformation.

If you already know which function is the problematic one, then instead of bisecting it may be easier to directly look at the IR transformation log -C llvm-args=-filter-print-funcs=mangled_function_name -C llvm-args=-print-after-all and see which one introduces the fault transform.

@steffengy
Copy link
Contributor Author

steffengy commented Dec 8, 2018

a) It reproduces with either option.
b) from my understanding -C llvm-args=-opt-bisect-limit=0 should disable all optimizations, which would align with the log outptut. It still reproduces unless I omit --release.
(cargo rustc --release -- -C codegen-units=1 -C llvm-args=-opt-bisect-limit=0)

@steffengy
Copy link
Contributor Author

Also f I set codegen-units=1 in Cargo.toml (not via the command line) it surprisingly does not reproduce.

[profile.release]
codegen-units = 1

Also if i set opt-level=2 in Cargo.toml, it does not reproduce.

@nikic
Copy link
Contributor

nikic commented Dec 8, 2018

@steffengy I believe cargo rustc applies the passed flags on the the last compilation, but all your dependencies are still going to be compiled with default flags. In this case the issue is likely introduced in one of the dependencies, so setting flags only on the final compilation doesn't make much of a difference.

@steffengy
Copy link
Contributor Author

steffengy commented Dec 8, 2018

Yeah, using RUSTFLAGS=-C codegen-units = 1 I get the same behavior as with the toml.
It does reproduce with RUSTFLAGS=-Z no-parallel-llvm now.
Adding -C llvm-args=-opt-bisect-limit=0 makes it work, so it aleast seems optimization related.

I think multiple compiler invocations (for each dep) make it pretty hard to narrow this down, since the bisectting limit is applied to each invocation for each dependency. If I would figure out the minimum pass number X, I still have to check which invocation actually causes the damage?
Also the log for limit=0 is 2GB, so quite a bit to bisect...

@steffengy
Copy link
Contributor Author

The rustc invocation for the tiberius crate yields the errorneous behavior.
I tracked it down using a rustc wrapper that passes -C llvm-args=-opt-bisect-limit=208995 to only the rustc invocation compiling the tiberius crate. This reproduces the issue. Passing 208994 makes the isuse disappear.
All other dependencies are compiled with full optimizations and no additional flags.

This means I tracked it down to the following pass of the tiberius-compilation:

BISECT: running pass (208995) X86 Avoid Store Forwarding Blocks on function 
(_ZN126_$LT$tiberius..SqlConnection$LT$alloc..boxed..Box$LT$$LP$dyn
$u20$tiberius..BoxableIo$u20$$u2b$$u20$$u27$static$RP$$GT$$GT$$GT$7
connect17h74d4893aded596c4E)

IR dumped before/after this pass (thanks to @nikic):
https://gist.github.com/steffengy/ed6d926798eb2ff7d1f9991dc1a37663

IR-Diff
--- ir_before.txt       2018-12-08 21:02:00.903516800 +0100
+++ ir_after.txt        2018-12-08 21:00:36.770815800 +0100
@@ -1,4 +1,4 @@
-# *** IR Dump Before X86 Avoid Store Forwarding Blocks ***:
+# *** IR Dump After X86 Avoid Store Forwarding Blocks ***:
 # Machine code for function _ZN126_$LT$tiberius..SqlConnection$LT$alloc..boxed..Box$LT$$LP$dyn$u20$tiberius..BoxableIo$u20$$u2b$$u20$$u27$static$RP$$GT$$GT$$GT$7connect17h74d4893aded596c4E: IsSSA, TracksLiveness
 Frame Objects:
   fi#0: size=1, align=1, at location [SP+8]
@@ -2054,10 +2054,20 @@
 ; predecessors: %bb.146
   successors: %bb.148(0x7ffff800), %bb.336(0x00000800); %bb.148(100.00%), %bb.336(0.00%)

-  %961:vr128 = MOVUPSrm %4:gr64, 1, $noreg, -4, $noreg :: (dereferenceable load 16 from %ir.1165, align 4)
+  %1272:gr16 = MOV16rm %4:gr64, 1, $noreg, -4, $noreg :: (dereferenceable load 2 from %ir.1165, align 4)
+  %1273:gr8 = MOV8rm %4:gr64, 1, $noreg, -2, $noreg :: (dereferenceable load 1 from %ir.1165 + 2, align 4)
+  %1274:gr32 = MOV32rm %4:gr64, 1, $noreg, 1, $noreg :: (dereferenceable load 4 from %ir.1165 + 3)
+  %1275:gr16 = MOV16rm %4:gr64, 1, $noreg, 5, $noreg :: (dereferenceable load 2 from %ir.1165 + 7, align 4)
+  %1276:gr32 = MOV32rm %4:gr64, 1, $noreg, 7, $noreg :: (dereferenceable load 4 from %ir.1165 + 9)
+  %1277:gr8 = MOV8rm %4:gr64, 1, $noreg, 11, $noreg :: (dereferenceable load 1 from %ir.1165 + 13, align 4)
   %962:vr128 = MOVUPSrm %4:gr64, 1, $noreg, 12, $noreg :: (dereferenceable load 16 from %ir.1165 + 16, align 4)
   MOVAPSmr %stack.22, 1, $noreg, 16, $noreg, killed %962:vr128 :: (store 16 into %ir.1166 + 16)
-  MOVAPSmr %stack.22, 1, $noreg, 0, $noreg, killed %961:vr128 :: (store 16 into %ir.1166)
+  MOV16mr %stack.22, 1, $noreg, 0, $noreg, killed %1272:gr16 :: (store 2 into %ir.1166, align 16)
+  MOV8mr %stack.22, 1, $noreg, 2, $noreg, killed %1273:gr8 :: (store 1 into %ir.1166 + 2, align 16)
+  MOV32mr %stack.22, 1, $noreg, 5, $noreg, killed %1274:gr32 :: (store 4 into %ir.1166 + 3, align 16)
+  MOV16mr %stack.22, 1, $noreg, 9, $noreg, killed %1275:gr16 :: (store 2 into %ir.1166 + 7, align 16)
+  MOV32mr %stack.22, 1, $noreg, 11, $noreg, killed %1276:gr32 :: (store 4 into %ir.1166 + 9, align 16)
+  MOV8mr %stack.22, 1, $noreg, 15, $noreg, killed %1277:gr8 :: (store 1 into %ir.1166 + 13, align 16)
   EH_LABEL <mcsymbol >
   ADJCALLSTACKDOWN64 32, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
   $rcx = COPY %673:gr64
@@ -2406,10 +2416,20 @@
 ; predecessors: %bb.171
   successors: %bb.173(0x7ffff800), %bb.337(0x00000800); %bb.173(100.00%), %bb.337(0.00%)

-  %882:vr128 = MOVUPSrm %4:gr64, 1, $noreg, -4, $noreg :: (dereferenceable load 16 from %ir.1363, align 4)
+  %1278:gr16 = MOV16rm %4:gr64, 1, $noreg, -4, $noreg :: (dereferenceable load 2 from %ir.1363, align 4)
+  %1279:gr8 = MOV8rm %4:gr64, 1, $noreg, -2, $noreg :: (dereferenceable load 1 from %ir.1363 + 2, align 4)
+  %1280:gr32 = MOV32rm %4:gr64, 1, $noreg, 1, $noreg :: (dereferenceable load 4 from %ir.1363 + 3)
+  %1281:gr16 = MOV16rm %4:gr64, 1, $noreg, 5, $noreg :: (dereferenceable load 2 from %ir.1363 + 7, align 4)
+  %1282:gr32 = MOV32rm %4:gr64, 1, $noreg, 7, $noreg :: (dereferenceable load 4 from %ir.1363 + 9)
+  %1283:gr8 = MOV8rm %4:gr64, 1, $noreg, 11, $noreg :: (dereferenceable load 1 from %ir.1363 + 13, align 4)
   %883:vr128 = MOVUPSrm %4:gr64, 1, $noreg, 12, $noreg :: (dereferenceable load 16 from %ir.1363 + 16, align 4)
   MOVAPSmr %stack.22, 1, $noreg, 16, $noreg, killed %883:vr128 :: (store 16 into %ir.1364 + 16)
-  MOVAPSmr %stack.22, 1, $noreg, 0, $noreg, killed %882:vr128 :: (store 16 into %ir.1364)
+  MOV16mr %stack.22, 1, $noreg, 0, $noreg, killed %1278:gr16 :: (store 2 into %ir.1364, align 16)
+  MOV8mr %stack.22, 1, $noreg, 2, $noreg, killed %1279:gr8 :: (store 1 into %ir.1364 + 2, align 16)
+  MOV32mr %stack.22, 1, $noreg, 5, $noreg, killed %1280:gr32 :: (store 4 into %ir.1364 + 3, align 16)
+  MOV16mr %stack.22, 1, $noreg, 9, $noreg, killed %1281:gr16 :: (store 2 into %ir.1364 + 7, align 16)
+  MOV32mr %stack.22, 1, $noreg, 11, $noreg, killed %1282:gr32 :: (store 4 into %ir.1364 + 9, align 16)
+  MOV8mr %stack.22, 1, $noreg, 15, $noreg, killed %1283:gr8 :: (store 1 into %ir.1364 + 13, align 16)
   EH_LABEL <mcsymbol >
   ADJCALLSTACKDOWN64 32, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
   $rcx = COPY %673:gr64
@@ -3832,11 +3852,13 @@
   %1093:gr32 = MOV32rm %stack.63, 1, $noreg, 0, $noreg :: (dereferenceable load 4 from %ir.2004)
   MOV32mr %stack.83, 1, $noreg, 0, $noreg, killed %1093:gr32 :: (store 4 into %ir.2006)
   %1094:vr128 = MOVAPSrm %stack.30, 1, $noreg, 0, $noreg :: (dereferenceable load 16 from %ir.2003)
-  %1095:vr128 = MOVAPSrm %stack.30, 1, $noreg, 16, $noreg :: (dereferenceable load 16 from %ir.2003 + 16)
+  %1284:gr64 = MOV64rm %stack.30, 1, $noreg, 16, $noreg :: (dereferenceable load 8 from %ir.2003 + 16, align 16)
+  %1285:gr64 = MOV64rm %stack.30, 1, $noreg, 24, $noreg :: (dereferenceable load 8 from %ir.2003 + 24, align 16)
   %1096:vr128 = MOVAPSrm %stack.30, 1, $noreg, 32, $noreg :: (dereferenceable load 16 from %ir.2003 + 32)
   %1097:vr128 = MOVAPSrm %stack.30, 1, $noreg, 48, $noreg :: (dereferenceable load 16 from %ir.2003 + 48)
   MOVAPSmr %stack.22, 1, $noreg, 0, $noreg, killed %1094:vr128 :: (store 16 into %ir.2005)
-  MOVAPSmr %stack.22, 1, $noreg, 16, $noreg, killed %1095:vr128 :: (store 16 into %ir.2005 + 16)
+  MOV64mr %stack.22, 1, $noreg, 16, $noreg, killed %1284:gr64 :: (store 8 into %ir.2005 + 16, align 16)
+  MOV64mr %stack.22, 1, $noreg, 24, $noreg, killed %1285:gr64 :: (store 8 into %ir.2005 + 24, align 16)
   MOVAPSmr %stack.22, 1, $noreg, 32, $noreg, killed %1096:vr128 :: (store 16 into %ir.2005 + 32)
   MOVAPSmr %stack.22, 1, $noreg, 48, $noreg, killed %1097:vr128 :: (store 16 into %ir.2005 + 48)
   %1098:vr128 = MOVAPSrm %stack.30, 1, $noreg, 64, $noreg :: (dereferenceable load 16 from %ir.2003 + 64)

@nikic
Copy link
Contributor

nikic commented Dec 8, 2018

Wow, nice work! This transformation does look rather suspect to me. The first two 16-byte load/store pairs are broken up into load/store sequences which only end up copying 14-bytes, the final two bytes seem to be disappearing into thin air. Unless there is some clever reason for that (e.g. that copy is already present somewhere in the surrounding IR) that would seem like a bug.

Before looking closer into what is happening there, it might be worthwhile to check whether this is the same issue as https://bugs.llvm.org/show_bug.cgi?id=38743, which is the one known issue in this pass that I'm aware of.

Assuming that this tooling works on Windows, it should be possible to check for this using:

cargo install rustup-toolchain-install-master
rustup-toolchain-install-master -a d3ed34824c31f303b98855fee6fc9213dff1f295
cargo +d3ed34824c31f303b98855fee6fc9213dff1f295-alt build --release

If this triggers a "Wrong size division" assertion in X86AvoidSFBPass::buildCopies(), it's probably the same issue.

@steffengy
Copy link
Contributor Author

No that doesn't result in any assertions. cargo-bisect-rustc also uses CI-builds,
so we'd have seen it there already - I still tried to make sure.

My current idea would be to try to identify that transformation in e.g. --emit=llvm-ir
and then verify with lli if changing that has any effect, but not sure how feasible that is yet.

@nikic
Copy link
Contributor

nikic commented Dec 8, 2018

No that doesn't result in any assertions. cargo-bisect-rustc also uses CI-builds,
so we'd have seen it there already - I still tried to make sure.

Thanks! I believe that by default cargo-bisect-rustc uses the normal dist CI builds without LLVM assertions, which is why it's still worthwhile to try with an assertion-enabled alt toolchain.

As you suggest, it would be helpful to get the LLVM IR for this function from --emit=llvm-ir. If the miscompilation still exists after explicitly running it through llc, then it should be possible to construct a reduced test case using bugpoint (though it will need a bit of manual setup, I'm thinking something along the lines of using FileCheck to match for one of those load sequences).

Another note on the MIR diff above, something I hadn't noticed before: The MOV displacements and MMO offsets don't line up. the MOV displacement jumps from -2 to 1 (diff 3), while the MMO offset goes from 2 to 3 (diff 1). Both of these can't be correct at the same time.

@nikic
Copy link
Contributor

nikic commented Dec 8, 2018

I think I have a reduced test case:

define i8 @test_offset(i8* %base) #0 {
entry:
  %z = alloca [128 x i8], align 16
  %gep0 = getelementptr inbounds i8, i8* %base, i64 7
  store volatile i8 0, i8* %gep0
  %gep1 = getelementptr inbounds i8, i8* %base, i64 5
  %bc1 = bitcast i8* %gep1 to i16*
  store volatile i16 0, i16* %bc1
  %gep2 = getelementptr inbounds i8, i8* %base, i64 1
  %bc2 = bitcast i8* %gep2 to i32*
  store volatile i32 0, i32* %bc2

  %y1 = getelementptr inbounds i8, i8* %base, i64 -4
  %y2 = bitcast [128 x i8]* %z to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %y2, i8* %y1, i64 16, i1 false)

  %gep4 = getelementptr inbounds [128 x i8], [128 x i8]* %z, i64 0, i64 4
  %ret = load i8, i8* %gep4
  ret i8 %ret
}

; Function Attrs: argmemonly nounwind
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture writeonly, i8* nocapture readonly, i64, i1) #1

attributes #0 = { "target-cpu"="core-avx2" }

Running this through llc gives:

	pushq	%rax
	.cfi_def_cfa_offset 16
	movb	$0, 7(%rdi)
	movw	$0, 5(%rdi)
	movl	$0, 1(%rdi)
	movzwl	-4(%rdi), %eax
	movw	%ax, -128(%rsp)
	movb	-2(%rdi), %al
	movb	%al, -126(%rsp)
	movl	1(%rdi), %eax
	movl	%eax, -123(%rsp)
	movzwl	5(%rdi), %eax
	movw	%ax, -119(%rsp)
	movb	7(%rdi), %al
	movb	%al, -117(%rsp)
	movl	8(%rdi), %eax
	movl	%eax, -116(%rsp)
	movb	-124(%rsp), %al
	popq	%rcx
	.cfi_def_cfa_offset 8
	retq

Which does not copy the -1..1 range, if I'm reading this correctly.

@nikic
Copy link
Contributor

nikic commented Dec 9, 2018

Reported as https://bugs.llvm.org/show_bug.cgi?id=39926 upstream.

@nikic nikic added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness labels Dec 9, 2018
@nikic
Copy link
Contributor

nikic commented Dec 9, 2018

Submitted https://reviews.llvm.org/D55485 for review.

@steffengy
Copy link
Contributor Author

@nikic
Just built a nightly-stage2 without changes where it expectedly crashes.
Applying your patch to LLVM and rebuilding stage2 fixes it.
So the submitted LLVM patch indeed fixes it, great work!

@RalfJung
Copy link
Member

RalfJung commented Dec 9, 2018

@nikic That's awesome :D

@nikic
Copy link
Contributor

nikic commented Dec 10, 2018

Patch has landed upstream, so we can include this in the next batch of LLVM cherry-picks.

@danylaporte
Copy link

@nikic I don't want to put pressure here, just to thank you both @steffengy and @nikic for your hard work on this. BTW, when can I expect this LLVM patch to be included in the nightly compiler?

@nikic nikic added the regression-from-stable-to-beta Performance or correctness regression from stable to beta. label Dec 13, 2018
@nikic
Copy link
Contributor

nikic commented Dec 13, 2018

Based on the timing, I'm assuming this regression also made it into beta, which was cut around 4th/5th Dec.

@nikic nikic added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Dec 13, 2018
@pnkfelix
Copy link
Member

T-compiler triage. P-high. Assigning to @nikic (...if I can)

@pnkfelix pnkfelix added the P-high High priority label Dec 13, 2018
Centril added a commit to Centril/rust that referenced this issue Dec 16, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-high High priority regression-from-stable-to-beta Performance or correctness regression from stable to beta. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

5 participants