-
Notifications
You must be signed in to change notification settings - Fork 353
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Data race spans #2646
Data race spans #2646
Conversation
I assume this is intended to fix #2205 ? (Since it is marked as a draft, I'll not put it in my queue for now.) |
☔ The latest upstream changes (presumably #2663) made this pull request unmergeable. Please resolve the merge conflicts. |
@rustbot author |
Track local frames incrementally during execution #2646 currently introduces a performance regression. This change removes that regression, and provides a minor perf improvement. The existing lazy strategy for tracking the span we want to display is as efficient as it is only because we often create a `CurrentSpan` then never call `.get()`. Most of the calls to the `before_memory_read` and `before_memory_write` hooks do not create any event that we store in `AllocHistory`. But data races are totally different, any memory read or write may race, so every call to those hooks needs to access to the current local span. So this changes to a strategy where we update some state in a `Thread` and `FrameExtra` incrementally, upon entering and existing each function call. Before: ``` Benchmark 1: cargo +miri miri run --manifest-path /home/ben/miri/bench-cargo-miri/backtraces/Cargo.toml Time (mean ± σ): 5.532 s ± 0.022 s [User: 5.444 s, System: 0.073 s] Range (min … max): 5.516 s … 5.569 s 5 runs Benchmark 1: cargo +miri miri run --manifest-path /home/ben/miri/bench-cargo-miri/mse/Cargo.toml Time (mean ± σ): 831.4 ms ± 3.0 ms [User: 783.8 ms, System: 46.7 ms] Range (min … max): 828.7 ms … 836.1 ms 5 runs Benchmark 1: cargo +miri miri run --manifest-path /home/ben/miri/bench-cargo-miri/serde1/Cargo.toml Time (mean ± σ): 1.975 s ± 0.021 s [User: 1.914 s, System: 0.059 s] Range (min … max): 1.939 s … 1.990 s 5 runs Benchmark 1: cargo +miri miri run --manifest-path /home/ben/miri/bench-cargo-miri/serde2/Cargo.toml Time (mean ± σ): 4.060 s ± 0.051 s [User: 3.983 s, System: 0.071 s] Range (min … max): 3.972 s … 4.100 s 5 runs Benchmark 1: cargo +miri miri run --manifest-path /home/ben/miri/bench-cargo-miri/slice-get-unchecked/Cargo.toml Time (mean ± σ): 784.9 ms ± 8.2 ms [User: 746.5 ms, System: 37.7 ms] Range (min … max): 772.9 ms … 793.3 ms 5 runs Benchmark 1: cargo +miri miri run --manifest-path /home/ben/miri/bench-cargo-miri/unicode/Cargo.toml Time (mean ± σ): 1.679 s ± 0.006 s [User: 1.623 s, System: 0.055 s] Range (min … max): 1.673 s … 1.687 s 5 runs ``` After: ``` Benchmark 1: cargo +miri miri run --manifest-path /home/ben/miri/bench-cargo-miri/backtraces/Cargo.toml Time (mean ± σ): 5.330 s ± 0.037 s [User: 5.232 s, System: 0.084 s] Range (min … max): 5.280 s … 5.383 s 5 runs Benchmark 1: cargo +miri miri run --manifest-path /home/ben/miri/bench-cargo-miri/mse/Cargo.toml Time (mean ± σ): 818.9 ms ± 3.7 ms [User: 776.8 ms, System: 41.3 ms] Range (min … max): 813.5 ms … 822.5 ms 5 runs Benchmark 1: cargo +miri miri run --manifest-path /home/ben/miri/bench-cargo-miri/serde1/Cargo.toml Time (mean ± σ): 1.927 s ± 0.011 s [User: 1.864 s, System: 0.061 s] Range (min … max): 1.917 s … 1.945 s 5 runs Benchmark 1: cargo +miri miri run --manifest-path /home/ben/miri/bench-cargo-miri/serde2/Cargo.toml Time (mean ± σ): 3.974 s ± 0.020 s [User: 3.893 s, System: 0.076 s] Range (min … max): 3.956 s … 4.004 s 5 runs Benchmark 1: cargo +miri miri run --manifest-path /home/ben/miri/bench-cargo-miri/slice-get-unchecked/Cargo.toml Time (mean ± σ): 780.0 ms ± 5.3 ms [User: 740.3 ms, System: 39.0 ms] Range (min … max): 771.2 ms … 784.5 ms 5 runs Benchmark 1: cargo +miri miri run --manifest-path /home/ben/miri/bench-cargo-miri/unicode/Cargo.toml Time (mean ± σ): 1.643 s ± 0.007 s [User: 1.584 s, System: 0.058 s] Range (min … max): 1.635 s … 1.654 s 5 runs ``` (This change is marginal, but the point is that it avoids a much more significant regression)
Track local frames incrementally during execution rust-lang/miri#2646 currently introduces a performance regression. This change removes that regression, and provides a minor perf improvement. The existing lazy strategy for tracking the span we want to display is as efficient as it is only because we often create a `CurrentSpan` then never call `.get()`. Most of the calls to the `before_memory_read` and `before_memory_write` hooks do not create any event that we store in `AllocHistory`. But data races are totally different, any memory read or write may race, so every call to those hooks needs to access to the current local span. So this changes to a strategy where we update some state in a `Thread` and `FrameExtra` incrementally, upon entering and existing each function call. Before: ``` Benchmark 1: cargo +miri miri run --manifest-path /home/ben/miri/bench-cargo-miri/backtraces/Cargo.toml Time (mean ± σ): 5.532 s ± 0.022 s [User: 5.444 s, System: 0.073 s] Range (min … max): 5.516 s … 5.569 s 5 runs Benchmark 1: cargo +miri miri run --manifest-path /home/ben/miri/bench-cargo-miri/mse/Cargo.toml Time (mean ± σ): 831.4 ms ± 3.0 ms [User: 783.8 ms, System: 46.7 ms] Range (min … max): 828.7 ms … 836.1 ms 5 runs Benchmark 1: cargo +miri miri run --manifest-path /home/ben/miri/bench-cargo-miri/serde1/Cargo.toml Time (mean ± σ): 1.975 s ± 0.021 s [User: 1.914 s, System: 0.059 s] Range (min … max): 1.939 s … 1.990 s 5 runs Benchmark 1: cargo +miri miri run --manifest-path /home/ben/miri/bench-cargo-miri/serde2/Cargo.toml Time (mean ± σ): 4.060 s ± 0.051 s [User: 3.983 s, System: 0.071 s] Range (min … max): 3.972 s … 4.100 s 5 runs Benchmark 1: cargo +miri miri run --manifest-path /home/ben/miri/bench-cargo-miri/slice-get-unchecked/Cargo.toml Time (mean ± σ): 784.9 ms ± 8.2 ms [User: 746.5 ms, System: 37.7 ms] Range (min … max): 772.9 ms … 793.3 ms 5 runs Benchmark 1: cargo +miri miri run --manifest-path /home/ben/miri/bench-cargo-miri/unicode/Cargo.toml Time (mean ± σ): 1.679 s ± 0.006 s [User: 1.623 s, System: 0.055 s] Range (min … max): 1.673 s … 1.687 s 5 runs ``` After: ``` Benchmark 1: cargo +miri miri run --manifest-path /home/ben/miri/bench-cargo-miri/backtraces/Cargo.toml Time (mean ± σ): 5.330 s ± 0.037 s [User: 5.232 s, System: 0.084 s] Range (min … max): 5.280 s … 5.383 s 5 runs Benchmark 1: cargo +miri miri run --manifest-path /home/ben/miri/bench-cargo-miri/mse/Cargo.toml Time (mean ± σ): 818.9 ms ± 3.7 ms [User: 776.8 ms, System: 41.3 ms] Range (min … max): 813.5 ms … 822.5 ms 5 runs Benchmark 1: cargo +miri miri run --manifest-path /home/ben/miri/bench-cargo-miri/serde1/Cargo.toml Time (mean ± σ): 1.927 s ± 0.011 s [User: 1.864 s, System: 0.061 s] Range (min … max): 1.917 s … 1.945 s 5 runs Benchmark 1: cargo +miri miri run --manifest-path /home/ben/miri/bench-cargo-miri/serde2/Cargo.toml Time (mean ± σ): 3.974 s ± 0.020 s [User: 3.893 s, System: 0.076 s] Range (min … max): 3.956 s … 4.004 s 5 runs Benchmark 1: cargo +miri miri run --manifest-path /home/ben/miri/bench-cargo-miri/slice-get-unchecked/Cargo.toml Time (mean ± σ): 780.0 ms ± 5.3 ms [User: 740.3 ms, System: 39.0 ms] Range (min … max): 771.2 ms … 784.5 ms 5 runs Benchmark 1: cargo +miri miri run --manifest-path /home/ben/miri/bench-cargo-miri/unicode/Cargo.toml Time (mean ± σ): 1.643 s ± 0.007 s [User: 1.584 s, System: 0.058 s] Range (min … max): 1.635 s … 1.654 s 5 runs ``` (This change is marginal, but the point is that it avoids a much more significant regression)
83b0f42
to
5458530
Compare
5458530
to
b06737e
Compare
@rustbot ready |
} | ||
} | ||
|
||
impl IndexMut<VectorIdx> for VClock { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Side-note: we could probably use https://doc.rust-lang.org/nightly/nightly-rustc/rustc_index/vec/struct.IndexVec.html for VClock
and https://doc.rust-lang.org/nightly/nightly-rustc/rustc_index/macro.newtype_index.html for VectorIdx
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, IndexVec
just wraps a Vec
and currently this uses SmallVec
, and this PR is already messing with the inline capacity. There are ambient concerns about the data race detector overhead: #2716 #1689 which in my experience are not backed up by profiling, but I still think the prudent thing to do is adjust one thing at a time here. Happy to look into this in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense
/// The type of the time-stamps recorded in the data-race detector | ||
/// set to a type of unsigned integer | ||
pub type VTimestamp = u32; | ||
/// The time-stamps recorded in the data-race detector consist of both |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything starting at consists should be on the fields. Internal docs use the flag to generate docs for private items
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment seems to be still open.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r=me after a doc nit
@@ -172,7 +173,7 @@ pub trait EvalContextExt<'mir, 'tcx: 'mir>: crate::MiriInterpCxExt<'mir, 'tcx> { | |||
|
|||
// Each complete happens-before the end of the wait | |||
if let Some(data_race) = &this.machine.data_race { | |||
data_race.validate_lock_release(&mut init_once.data_race, current_thread); | |||
data_race.validate_lock_release(&mut init_once.data_race, current_thread, current_span); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to move the current_span
calls behind the fn call? That seems nicer than putting the burden on all callers. Basically instead of passing current_thread and current_span it could pass &Machine
?
EDIT: Ah, that doesn't work since it overlaps with data_race
. Hm... I guess we can go ahead with separate arguments for now, but if you have a nice idea for how to avoid having to pass so many arguments, that would be great.
| | ||
LL | ... *pointer.load(Ordering::Relaxed) | ||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
help: The Allocate on thread `<unnamed>` is here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: is "is" the right verb? One of them happened in the past. Do we know which one? Can we make the message "Earlier, the X on thread Y happened here" and "Now, the X' on thread Y' happened here"? In fact isn't the latter span redundant with the main error span?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hah I was also not sure what words to use here. This is a data race, so strictly does it make any sense to say that one of these operations happens before the other? They happen in a particular order in the interpreter, but should diagnostics be commenting on the interpreter or the program semantics (or lack thereof)?
I chose present tense because I imagine this diagnostic as a response to the current user experience, where someone would rightfully ask "But Miri, where is the read/write/allocate/deallocate?" so now Miri says "The read/write/allocate/deallocate is here"
(As I describe this I realize it is not consistent with how I worded the Stacked Borrows diagnostics. Ah well.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With vector clocks, there could actually be considerate amounts of "real time" between the two operations, even in an observable way. But you also have a point.
What about the fact that one of these two spans is always redundant since it is the same as where the error points? I think it'd be better to just say "the other access was over there" or so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I've tried to reduce the duplication here, but I feel backed into a corner again by the limited diagnostics API. Doesn't change the fact that this is still much better than before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems potentially confusing, since after the help
we have a backtrace that attaches to the "main span", not the "other access span". I think that's why I think it is good to say something about the "earlier conflicting access". Also if both accesses are of the same kind, saying "the Write" is not very informative.
So what about
error: Undefined Behavior: Data race detected between (1) Atomic Store on thread `<unnamed>` and (2) Read on thread `<unnamed>` at ALLOC. Access (2) just happened here
--> $DIR/atomic_write_na_read_race1.rs:LL:CC
|
LL | *atomic_ref.get_mut()
| ^^^^^^^^^^^^^^^^^^^^^
|
help: and access (1) occurred earlier here
--> $DIR/atomic_write_na_read_race1.rs:LL:CC
|
LL | atomic_ref.store(32, Ordering::SeqCst)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
= help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
= help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
= note: BACKTRACE:
= note: inside closure at $DIR/atomic_write_na_read_race1.rs:LL:CC
Yeah it is annoying that the same text is repeated for the error title and label -- that would be #2200, which will be very annoying work since all the ui tests will need adjustment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree there is potential for confusion with the backtrace. I often find the Stacked Borrows errors a bit visually confusing, and have often wanted some other way to format all this information. I'll try to keep an eye on how this plays out.
I like this numbering strategy overall, I'm just not sure about calling these an "access". That word has a particular meaning in Stacked Borrows, and feel like calling an allocation or deallocation an access might be confusing to newcomers (even though it is technically correct to use the word). I think we can just drop the word without loss of clarity?
Also it occurs to me that we're now comparing a "local" span with the topmost frame, so things could look a bit odd if the race occurs through a standard library function that isn't #[track_caller]
. I can't find any examples of this, so it may be that the way people write data races, this just isn't a concern. Just mentioning this so you're aware of it, I think we shouldn't cook up a solution until we have a real example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also it occurs to me that we're now comparing a "local" span with the topmost frame, so things could look a bit odd if the race occurs through a standard library function that isn't #[track_caller].
The main backtrace will also skip track_caller
frames by default so I think one needs to pass -Zmiri-backtrace=full
to get anything weird?
I pushed an attempt to clarify the backtrace situation a bit. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the clarification is good. It's a marginal improvement, but an improvement.
Here is an example of a program that produces the flavor of weird output I'm referring to:
use std::sync::atomic::AtomicPtr;
fn main() {
let mut v: Vec<i32> = Vec::with_capacity(2);
let ptr = AtomicPtr::new(v.as_mut_ptr());
let handle = std::thread::spawn(move || {
let mut v = unsafe { Vec::from_raw_parts(ptr.into_inner(), 0, 2) };
v.push(0);
});
v.push(0);
let _ = handle.join();
}
error: Undefined Behavior: Data race detected between (1) Write on thread `main` and (2) Write on thread `<unnamed>` at alloc1617. (2) just happened here
--> /home/ben/.rustup/toolchains/miri/lib/rustlib/src/rust/library/alloc/src/vec/mod.rs:1839:13
|
1839 | ptr::write(end, value);
| ^^^^^^^^^^^^^^^^^^^^^^ Data race detected between (1) Write on thread `main` and (2) Write on thread `<unnamed>` at alloc1617. (2) just happened here
|
help: and (1) occurred earlier here
--> demo.rs:9:5
|
9 | v.push(2);
| ^^^^^^^^^
= help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
= help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
= note: BACKTRACE (of the first span):
= note: inside `std::vec::Vec::<i32>::push` at /home/ben/.rustup/toolchains/miri/lib/rustlib/src/rust/library/alloc/src/vec/mod.rs:1839:13: 1839:35
note: inside closure
--> demo.rs:7:9
|
7 | v.push(1);
| ^^^^^^^^^
The wording of our diagnostic pretty clearly asks the user to compare the topmost span of the backtrace and the first help
span, but the topmost span can be dislocated from the user code by a fair bit if it is buried inside a standard library data structure or algorithm. For example, if a data race is triggered by the PartialOrd
impl used in the depths of slice::sort
. (as I write this I'm realizing I could have cooked up a more extreme example, ah well...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see. Yeah... I think this PR here is still an improvement, feel free to file an issue to track this problem and possible ideas for how to fix it.
@saethlin could you do benchmarks with |
Before:
After:
This is about the size of the effect I expected. |
| | ||
help: and (2) occurred earlier here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So (2) happened before (1) in execution order. That seems confusing? I had deliberately swapped them in my proposal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤦 I was so sure I was following your proposal
@bors r+ |
☀️ Test successful - checks-actions |
update Miri Noteworthy PRs: - rust-lang/miri#2357 - rust-lang/miri#2646 - rust-lang/miri#2718 - rust-lang/miri#2721 - rust-lang/miri#2725
update Miri Noteworthy PRs: - rust-lang#2357 - rust-lang#2646 - rust-lang#2718 - rust-lang#2721 - rust-lang#2725
update Miri Noteworthy PRs: - rust-lang/miri#2357 - rust-lang/miri#2646 - rust-lang/miri#2718 - rust-lang/miri#2721 - rust-lang/miri#2725
Fixes #2205
This adds output to data race errors very similar to the spans we emit for Stacked Borrows errors. For example, from our test suite:
Because of #2647 this comes without a perf regression, according to our benchmarks.