Dead code elimination in benchmarks #8261
Comments
You basically need to figure out a way to make the result (and possibly the input) opaque by using an |
I'm not sure if this is possible, but it would be awesome if we could emit a warning about this if it turns out that the function turns into a no-op. I don't think we should be turning off optimizations for benchmarks because that may defeat the purpose? |
It's not possible to emit a warning for this. There are many optimization passes able to eliminate dead code, and we don't know what LLVM is doing with the bytecode (especially the codegen after pure transformations on the bytecode). They're meaningless without optimizations, they just all need to be written with |
Yeah I wasn't sure if you could ask LLVM post-optimizations about "how did this function turn out?", but I guess not. I think that warning about a missing |
Or wait, @thestinger are you saying that we should put |
Not quite, I mean you have to make the result used by a function LLVM isn't allowed to inline and that has side effects using the result like printing it. If it can figure out how to constant-fold it or cheat, you need to also make the inputs opaque the same way. We could make benchmark utility functions for doing this - just one like |
Example:
#[inline(never)]
fn out<T>(x: T) {
printfln!("%?", x);
}
fn main() {
for (x, y) in range(0, 10).zip(range(0, 20)) {
out(x);
out(y);
}
}
IR for main:
define void @"_ZN4main17_15bec6257b8bff787_0$x2e0E"({ i64, %tydesc*, i8*, i8*, i8 }* nocapture) #1 {
"function top level":
tail call fastcc void @"_ZN8out_345215_136b6da5d48b8d7_0$x2e0E"(i64 0)
tail call fastcc void @"_ZN8out_345215_136b6da5d48b8d7_0$x2e0E"(i64 0)
tail call fastcc void @"_ZN8out_345215_136b6da5d48b8d7_0$x2e0E"(i64 1)
tail call fastcc void @"_ZN8out_345215_136b6da5d48b8d7_0$x2e0E"(i64 1)
tail call fastcc void @"_ZN8out_345215_136b6da5d48b8d7_0$x2e0E"(i64 2)
tail call fastcc void @"_ZN8out_345215_136b6da5d48b8d7_0$x2e0E"(i64 2)
tail call fastcc void @"_ZN8out_345215_136b6da5d48b8d7_0$x2e0E"(i64 3)
tail call fastcc void @"_ZN8out_345215_136b6da5d48b8d7_0$x2e0E"(i64 3)
tail call fastcc void @"_ZN8out_345215_136b6da5d48b8d7_0$x2e0E"(i64 4)
tail call fastcc void @"_ZN8out_345215_136b6da5d48b8d7_0$x2e0E"(i64 4)
tail call fastcc void @"_ZN8out_345215_136b6da5d48b8d7_0$x2e0E"(i64 5)
tail call fastcc void @"_ZN8out_345215_136b6da5d48b8d7_0$x2e0E"(i64 5)
tail call fastcc void @"_ZN8out_345215_136b6da5d48b8d7_0$x2e0E"(i64 6)
tail call fastcc void @"_ZN8out_345215_136b6da5d48b8d7_0$x2e0E"(i64 6)
tail call fastcc void @"_ZN8out_345215_136b6da5d48b8d7_0$x2e0E"(i64 7)
tail call fastcc void @"_ZN8out_345215_136b6da5d48b8d7_0$x2e0E"(i64 7)
tail call fastcc void @"_ZN8out_345215_136b6da5d48b8d7_0$x2e0E"(i64 8)
tail call fastcc void @"_ZN8out_345215_136b6da5d48b8d7_0$x2e0E"(i64 8)
tail call fastcc void @"_ZN8out_345215_136b6da5d48b8d7_0$x2e0E"(i64 9)
tail call fastcc void @"_ZN8out_345215_136b6da5d48b8d7_0$x2e0E"(i64 9)
ret void
}
Now, let's say we want to make an input opaque:
#[inline(never)]
fn out<T>(x: T) {
printfln!("%?", x);
}
#[inline(never)]
fn pass<T>(x: T) -> T {
printfln!("%?", x);
x
}
fn main() {
for (x, y) in range(0, pass(10)).zip(range(0, 20)) {
out(x);
out(y);
}
}
The generated IR now:
define void @_rust_main({ i64, %tydesc*, i8*, i8*, i8 }* nocapture) {
"function top level":
%1 = tail call fastcc i64 @"_ZN9pass_343215_927cf3b1d8737b7_0$x2e0E"()
%2 = icmp sgt i64 %1, 0
br i1 %2, label %match_else.i, label %"_ZN4main17_15bec6257b8bff787_0$x2e0E.exit"
match_else.i: ; preds = %"function top level", %match_else.i
%..sroa.0.023.i = phi i64 [ %..sroa.0.0.i, %match_else.i ], [ 1, %"function top level" ]
%.sroa.0.022.i = phi i64 [ %..sroa.0.023.i, %match_else.i ], [ 0, %"function top level" ]
%.sroa.3.021.i = phi i64 [ %3, %match_else.i ], [ 0, %"function top level" ]
%3 = add i64 %.sroa.3.021.i, 1
tail call fastcc void @"_ZN8out_402115_136b6da5d48b8d7_0$x2e0E"(i64 %.sroa.0.022.i)
tail call fastcc void @"_ZN8out_402115_136b6da5d48b8d7_0$x2e0E"(i64 %.sroa.3.021.i)
%4 = icmp slt i64 %..sroa.0.023.i, %1
%5 = zext i1 %4 to i64
%..sroa.0.0.i = add i64 %5, %..sroa.0.023.i
%.not.i = icmp sgt i64 %3, 19
%.not19.i = xor i1 %4, true
%brmerge.i = or i1 %.not.i, %.not19.i
br i1 %brmerge.i, label %"_ZN4main17_15bec6257b8bff787_0$x2e0E.exit", label %match_else.i
"_ZN4main17_15bec6257b8bff787_0$x2e0E.exit": ; preds = %match_else.i, %"function top level"
ret void
} |
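For readers on a current toolchain, here is a rough sketch of the same idea in modern Rust. It is not code from this thread: it assumes `std::hint::black_box` (stable since Rust 1.66) as the opaque helper instead of a hand-written `#[inline(never)]` function, and `zip_len` is a hypothetical name made up for illustration. `black_box` is documented as best-effort, so checking the IR is still worthwhile.

```rust
use std::hint::black_box;

/// Runs the zip loop from the example above and returns how many pairs it
/// produced. `zip_len` is a hypothetical name used only for this sketch.
fn zip_len(limit: u64) -> usize {
    let mut pairs = 0;
    // Opaque input: the optimizer should not treat the bound as a known constant.
    for (x, y) in (0..black_box(limit)).zip(0..20u64) {
        // Opaque outputs: both values are treated as used, so the loop body
        // is not trivially dead.
        let _ = black_box(x);
        let _ = black_box(y);
        pairs += 1;
    }
    pairs
}

fn main() {
    println!("{}", zip_len(10));
}
```

Only the first range bound is hidden here, mirroring the `pass(10)` example; whether the second bound should also be opaque depends on what the benchmark is meant to measure.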
I'm more worried about not having this automatically warned about. I think it's really difficult though to know when to warn, and it probably just requires some vigilance when writing benchmarks. This would probably be most useful to include in the documentation we have about It would be kind of nice to have a |
There's really no way to warn - you have to be absolutely clear about what you actually want to measure by making some results/inputs opaque. Your benchmark could be testing the ability of LLVM to eliminate a certain allocation or really anything - warning would be incorrect. Writing a benchmark is going to involve checking the IR to make sure it's really doing the work you want and not cheating. |
If someone feels like attempting to fix the effects of this, it looks like the following benchmarks are possibly DCE'd to nothing (i.e. are very fast):
|
I believe the change that made our allocations more visible to LLVM's libcall optimisation also meant that a few of the std::vec tests (and some others) are now optimised to a no-op. |
It is very easy for the body of bh.iter { ... } to be turned into a noop by DCE, e.g.
Compiling with rustc -O --emit-llvm -S, the relevant part of the IR is:
This likely means that quite a few of the benchmarks currently in the tree are pointless.
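To illustrate the failure mode, here is a minimal sketch of a timing loop written so that its body is harder to eliminate. It is not the `bh.iter` harness from the tree: `time_iters` and `sum_to` are hypothetical names, and `std::hint::black_box` (modern Rust, best-effort only) is assumed as the way to consume each result.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a benchmark harness: runs `body` `iters` times
/// and returns the total elapsed time.
fn time_iters<T>(iters: u32, mut body: impl FnMut() -> T) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        // Consuming the result through `black_box` marks it as used, so the
        // optimizer has less room to drop the body as dead code.
        let _ = black_box(body());
    }
    start.elapsed()
}

/// Example workload: sums the integers in `0..n`.
fn sum_to(n: u64) -> u64 {
    (0..n).sum()
}

fn main() {
    // The input is also hidden, so the sum cannot simply be constant-folded.
    let elapsed = time_iters(1_000, || sum_to(black_box(10_000)));
    println!("1000 iterations took {:?}", elapsed);
}
```

If the closure's result were discarded without `black_box`, an optimizing build could be free to remove the call to `sum_to` entirely, which is the situation this issue describes.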