TokenStream manipulations are 1000x too slow #65080
This is (AFAICT from a brief analysis) because the code appears to be allocating a fresh vector for the entire token stream each time we append one token. I believe that is because of our Extend impl (lines 170 to 184 at commit 032a53a).

That has a FIXME from eddyb saying it is super inefficient, and rightfully so. We are essentially always creating a TokenStreamBuilder and pushing on ~two streams: one with lots of elements and the other with two (the token from the original extend, as well as non-joint). Having done that, we call TokenStreamBuilder::build, which calls TokenStream::from_streams with the two streams, which allocates a vector for the contents of both streams, and we get a new token stream. I believe this means that in order to append N tokens we will create N vectors of 1, 2, 3, 4, ..., N elements (or so, possibly modulo a constant factor of some kind). If N is large, this is really slow.

cc @pnkfelix as well, as they did some investigating into a similar issue in #57735 (comment). I am not sure what we can do about this -- ideally, we'd not be re-creating new token streams on every appended token, but TokenStream seems to have a structure that isn't really amenable to appending tokens into it (despite the name). Maybe we can have proc macro keep a […]
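To make the cost concrete, here is a toy sketch of the two append strategies (plain `u32`s standing in for tokens; this is illustrative, not the actual rustc code):

```rust
// Appending by rebuilding, as described above: each append allocates a
// fresh vector holding the old contents plus the new element, so N
// appends allocate vectors of size 1, 2, 3, ..., N -- O(N^2) overall.
fn append_by_rebuilding(stream: Vec<u32>, token: u32) -> Vec<u32> {
    let mut combined = Vec::with_capacity(stream.len() + 1);
    combined.extend(stream);
    combined.push(token);
    combined
}

fn main() {
    let n = 10_000;

    // Quadratic: rebuild the whole stream on every append.
    let mut rebuilt = Vec::new();
    for t in 0..n {
        rebuilt = append_by_rebuilding(rebuilt, t);
    }

    // Linear (amortized): push into one growable vector.
    let mut pushed = Vec::new();
    for t in 0..n {
        pushed.push(t);
    }

    assert_eq!(rebuilt, pushed);
}
```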
cc @nnethercote as well
I remember the FIXME, but I also remember this not being that big of a problem for some reason. I guess if the first impl weren't ending up using the second impl, it would be faster?
Maybe the main problem here is that there is no API for creating a builder out of an existing TokenStream. Sadly, the original vision of using ropes falls apart when you can't even slice a TokenStream.
One nice way of doing this (if all else fails) is representing […]
I think you misread that code. It's a token stream with a single element; that element is a 2-tuple.
I have confirmed (via profiling and […])
I tried changing

    pub struct TokenStream(Option<Lrc<Vec<TreeAndJoint>>>);

to this:

    pub struct TokenStream(Vec<TreeAndJoint>);

This drastically sped up the microbenchmark under discussion, because new elements can be trivially appended to an existing TokenStream, rather than having to duplicate it before appending. However, it was a big slowdown for the regular compiler benchmarks. Most of them slowed down 2-20%, but […]
@nnethercote I'm not entirely sure where this bottoms out, so this may be a distracting comment, but would […]
@alexcrichton: Indeed, I had the same thought late last night! I have used […]
#65198 fixes this so that the […]. It would make sense to add a benchmark to […]
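For readers following along, a minimal sketch of the copy-on-write idea (using `std::rc::Rc` in place of rustc's `Lrc`; this illustrates the general technique, not the actual change in #65198):

```rust
use std::rc::Rc;

// Toy stream: a reference-counted vector, with u32 standing in for
// TreeAndJoint.
struct Stream(Rc<Vec<u32>>);

impl Stream {
    fn push(&mut self, token: u32) {
        // Rc::make_mut mutates in place when this Rc is the sole owner,
        // so repeated appends are amortized O(1); only a stream that is
        // actually shared gets cloned first.
        Rc::make_mut(&mut self.0).push(token);
    }
}

fn main() {
    let mut s = Stream(Rc::new(Vec::new()));
    for t in 0..1_000 {
        s.push(t); // no per-token reallocation while uniquely owned
    }
    assert_eq!(s.0.len(), 1_000);
}
```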
Speed up `TokenStream` concatenation

This PR fixes the quadratic behaviour identified in #65080. r? @Mark-Simulacrum
I'd personally prefer the microbenchmark, as it's less likely to stop building in the future, in some sense :) But that PR looks excellent. Do we know if the remaining 10x cost is there for some particular reason, or are strings just that much faster than the token code (Lrc, etc.)?
The 10x is because there's a lot of faffing around: a couple of […]
Oh, I thought the implementation must be using […]. There's likely no benefit in representing a […]
@dtolnay: is the improved performance good enough for your purposes?
I confirmed that the performance is massively improved in nightly-2019-10-10. Thanks!

FYI @illicitonion @anlumo
Context: illicitonion/num_enum#14
Switching a proc macro from being token-based to operating on strings, with just a final conversion from string to TokenStream, can be a 100x improvement in compile time. If we care that people continue to write macros using tokens, the performance needs to be better.
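As a sketch of the two styles being compared (a hypothetical macro emitting constants, living in a proc-macro crate; the names here are made up for illustration):

```rust
use proc_macro::TokenStream;

// String-based: accumulate one String, convert to tokens exactly once.
#[proc_macro]
pub fn consts_via_string(_input: TokenStream) -> TokenStream {
    let mut src = String::new();
    for i in 0..1_000 {
        src.push_str(&format!("const C{}: u32 = {};\n", i, i));
    }
    src.parse().unwrap() // single string -> TokenStream conversion
}

// Token-based: extend the output stream piece by piece. Before the fix,
// every extend rebuilt the stream's backing vector.
#[proc_macro]
pub fn consts_via_tokens(_input: TokenStream) -> TokenStream {
    let mut out = TokenStream::new();
    for i in 0..1_000 {
        let piece: TokenStream =
            format!("const C{}: u32 = {};", i, i).parse().unwrap();
        out.extend(piece); // TokenStream: Extend<TokenTree>
    }
    out
}
```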
I minimized the slow part of the num_enum macro to this benchmark:
https://github.com/alexcrichton/proc-macro2/tree/12bac84dd8d090d2987a57b747c7ae7bbeb8a3d0/benches/bench-libproc-macro
On my machine the string implementation takes 8ms and the token implementation takes 25721ms.
I know that there is a proc macro server that these calls end up talking to, but I wouldn't expect such a huge factor from that. If the server calls are the only thing making this slow, is there maybe a way we could buffer operations in memory to defer and batch the server work?
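One client-side approximation of that batching (a workaround sketch; I'm assuming only that buffering in an ordinary `Vec` avoids per-append stream construction, not that it removes the server round-trips for creating the individual tokens):

```rust
use proc_macro::{Ident, Punct, Spacing, Span, TokenStream, TokenTree};

// Hypothetical helper for a proc-macro crate: buffer tokens in a plain
// Vec and build the TokenStream with a single conversion at the end,
// instead of extending the stream one token at a time.
fn idents_to_stream(names: &[&str]) -> TokenStream {
    let mut buf: Vec<TokenTree> = Vec::with_capacity(2 * names.len());
    for name in names {
        buf.push(TokenTree::Ident(Ident::new(name, Span::call_site())));
        buf.push(TokenTree::Punct(Punct::new(',', Spacing::Alone)));
    }
    // TokenStream implements FromIterator<TokenTree>, so this is one
    // stream construction rather than N appends.
    buf.into_iter().collect()
}
```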
I will file issues in proc-macro2 and quote as well to see if anything can be improved on their end.
FYI @eddyb @petrochenkov @alexcrichton