-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Project fails (or takes long time) to compile on latest stable and nightly (hang/stackoverflow) #57735
Comments
Is there perhaps at least a vague idea of what nightly it was that used to work well? |
I can confirm that master branch:
remove-unstable-features branch |
Does not reproduce on Linux. |
Updated my toolchains on OS X, was able to reproduce on stable, on nightly it succeeded but takes an incredibly long time (12 minutes, 41 seconds on 2.2ghz Intel i7) |
Unable to reproduce. I am also on linux. (triple: git clone https://github.com/bgourlie/rs-nes.git
cd rs-nes
git checkout 4d98499f # HEAD of remove-unstable-features at time of posting
cargo +stable check # using rustc 1.32.0 (9fda7c223 2019-01-16) |
Also unable to reproduce on Linux (WSL). |
Okay. I can confirm that it takes a suspiciously long time to I attached cargo check gdb backtraces (nightly rustc daa53a5, rs-nes 4d98499f (remove-unstable-features))
|
When
|
triage. P-high. Assigning to self for further bisection on OS X (to hopefully confirm or reject the reporter's hypothesis regarding #57494 |
On my Mac OS X laptop, the source of the slowdown appears to be tied to the expansion-time construction (or maybe compilation) of the following line: const CYCLES_MAP: [u8; #total_cycles] = [#(#compact_cycle_number_map),*]; Or at least, when I replace that array with one that just says: const CYCLES_MAP: [u8; #total_cycles] = [0; #total_cycles]; then the overall compilation via
|
I am now focusing on bisection; I found that I was able to comment out most of the other code, effectively generating a much reduced test case, which I will try to extract into its own crate (or perhaps pair of source files? we'll see) in a bit. |
Here is a gist that captures the test case I've reduced this to for now (unfortunately gist does not allow forward-slashes in the names so I've used spaces instead of slashes as the directory separator): Regarding that reduced test case, here's what I've observed so far:
That is, running
In the reduced test case, I tried adjusting the size of the array in the generated code by changing the values of Here is a summary of how the
The main reason why I've included the above table (in the details block) is it looks like there may be a non-linear relationship from |
Between the two nightlies I referenced in the above comment, here are the PR's that were merged:
Of the PR's referenced above, my spidey-sense is most-immediately firing in response to seeing #57004 "(Make TokenStream less recursive.) " |
According to an rust/src/libsyntax/tokenstream.rs Line 262 in 106b3e9
I'm going to review #57004 and follow-on code a little, see if I can determine whether we can get rid of that eager traversal and cloning of the contents of the vector of streams. |
cc @nnethercote |
Okay I think I have a fix. I made two different micro-optimizations to the code path I identified in my previous comment.
After rebuilding the compiler with those micro-optimizations, my demo code compiles much faster than before. Here's the current diff. I'm going to try to identify which of the two micro-optimizations was the important one: diff --git a/src/libsyntax/tokenstream.rs b/src/libsyntax/tokenstream.rs
index f5d2d6f18e..db6b701b83 100644
--- a/src/libsyntax/tokenstream.rs
+++ b/src/libsyntax/tokenstream.rs
@@ -255,11 +255,19 @@ impl TokenStream {
0 => TokenStream::empty(),
1 => streams.pop().unwrap(),
_ => {
- let mut vec = vec![];
+ let tree_count = streams.iter()
+ .map(|ts| match &ts.0 { None => 0, Some(s) => s.len() })
+ .sum();
+ let mut vec = Vec::with_capacity(tree_count);
for stream in streams {
match stream.0 {
None => {},
- Some(stream2) => vec.extend(stream2.iter().cloned()),
+ Some(stream2) => {
+ match Lrc::try_unwrap(stream2) {
+ Ok(trees) => vec.extend(trees.into_iter()),
+ Err(stream2) => vec.extend(stream2.iter().cloned()),
+ }
+ }
}
}
TokenStream::new(vec) |
Update: I overlooked @ExpHP 's report that the slowdown does reproduce on Linux. |
Okay I have now determined that on my OS X machine, it is the |
Well, there were two problems here, right? One that does not replicate on Linux (crash), and one that does (super slow |
Oh, I'm sorry, I overlooked @ExpHP 's comment that they reproduced the slowdown on Linux. |
Just to make sure I cover all the issues here, I have now replicated the stack overflow on OS X atop commit 9fda7c2 A backtrace shows that the overflow is happening during a particularly deep series of drop calls. (Chances seem high that this is related in some manner to how the token-tree/token-stream was (re)structured.) |
…h-large-tokenstream-slow, r=eddyb proc_macro: make `TokenStream::from_streams` pre-allocate its vector. This requires a pre-pass over the input streams. But that is cheap compared to the quadratic blowup associated with reallocating the accumulating vector on-the-fly. Fix rust-lang#57735
`TokenStreamBuilder` exists to concatenate multiple `TokenStream`s together. This commit removes it, and moves the concatenation functionality directly into `TokenStream`, via two new methods `push_tree` and `push_stream`. This makes things both simpler and faster. `push_tree` is particularly important. `TokenStreamBuilder` only had a single `push` method, which pushed a stream. But in practice most of the time we push a single token tree rather than a stream, and `push_tree` avoids the need to build a token stream with a single entry (which requires two allocations, one for the `Lrc` and one for the `Vec`). The main `push_tree` use arises from a change to one of the `ToInternal` impls in `proc_macro_server.rs`. It now returns a `SmallVec` instead of a `TokenStream`. This return value is then iterated over by `concat_trees`, which does `push_tree` on each element. Furthermore, the use of `SmallVec` avoids more allocations, because there is always only one or two token trees. Note: the removed `TokenStreamBuilder::push` method had some code to deal with a quadratic blowup case from rust-lang#57735. This commit removes the code. I tried and failed to reproduce the blowup from that PR, before and after this change. Various other changes have happened to `TokenStreamBuilder` in the meantime, so I suspect the original problem is no longer relevant, though I don't have proof of this. Generally speaking, repeatedly extending a `Vec` without pre-determining its capacity is *not* quadratic. It's also incredibly common, within rustc and many other Rust programs, so if there were performance problems there you'd think it would show up in other places, too.
While attempting to compile my application (or simply run
cargo check
), rustc will crash, with slightly different results depending on whether it's the stable or nightly branch.OS: Windows 10 x64 version 1809 build 17763.253
stable toolchain
cargo check
result:thread 'main' has overflowed its stack
Repro steps
cd rs-nes
git checkout remove-unstable-features
cargo check
nightly toolchain
cargo check
result:Process finished with exit code -1
Repro steps
Suspected cause
I've narrowed down the issue to a procedural macro that I've had no issues compiling in the past (on nightly). It generates a large match statement which made me think that this recent change may be related, but I have no evidence to support it other than the fact that it changes how the compiler behaves wrt large match statements.
It's also worth noting that I created the
remove-unstable-features
branch specifically to see if this was only an issue on nightly. It's possible that the error seen on stable is a manifestation of not using unstable features I had been relying on (box patterns, in this case).The text was updated successfully, but these errors were encountered: