-
Notifications
You must be signed in to change notification settings - Fork 888
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix exponential execution time by memoizing format_expr #5139
Conversation
Test are failing strangely. I think my |
I can't say definitively whether there actually is a combination that will prove sufficiently unique, so you'll probably have to continue experimenting to gauge feasibility. Perhaps the expression's nodeid? I also wouldn't be terribly keen on introducing a mutexted global, would it be feasible to attach the map to the rewrite context somehow? |
Expression's node id is shared between all expressions of a def id. But I found that if I exclude macros (memoization doesn't help there anyway) it will pass tests. Probably span value is unique outside of macros.
I tried adding a |
Sorry haven't had a chance to dig into this in detail and probably won't for the next couple weeks due to the holidays. I'd suspect that's a case of a cache miss with the former, perhaps due to some slightly different logic needed vs. the global. Overall I think you're on to something here so really appreciate you digging in! Would like to continue iterating on the implementation specifics though, as well as incorporating some tests |
a888951
to
6ff4a71
Compare
The problem was that rewrite context doesn't live long enough, so I moved it one layer up and it is now fixed.
I added examples in those issues as regular tests, so a regression would cause |
Fantastic thank you! |
src/rewrite.rs
Outdated
pub(crate) span: SpanData, | ||
} | ||
|
||
pub(crate) type Memoize = Option<HashMap<QueryId, Option<String>>>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the outer Option
necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for reviewing! Yes, in case of None
function should clear the store at the end, but in Some(empty)
it should fill it and keep it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm wouldn't it be the same to do clean = map.is_empty()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*map = Some(HashMap::default());
clean = true;
If we set clean = map.is_empty()
, then *map = Some(HashMap::default());
should become inserting some dummy thing (which isn't good) to map, otherwise subsequent calls will also receive empty map and clean it, while they shouldn't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I wonder if there is a cleaner solution here, such as clearing the map after each Item
is formatted.
Maybe this is because id's arent assigned until macro expansion (code), which I assume doesn't run for rustfmt? If that is the case, I wonder if it would be worthwhile to make a change upstream so that node ids may be assigned without macro expansion. |
Thanks again @HKalbasi! Few questions/asks:
|
Nothing prevent it from hitting more than once. So if we remove it from map in the first hit, we need to calculate it again. If clones are too bad, I think we can use
I don't think that's an optimization, but it would make code cleaner. I'm not confident where to clean cache so it cleans enough and not more. And in current way all of codes related to memoizing are in the same function, so lets keep it as is. I tried to explain why there is The second one (RefCell vs Cell) was an optimization, and it's done. |
re: my question in #5139 (comment) and 👇
I was trying to think of a scenario where we'd have to recompute a In short, I'm trying to be mindful of time vs. space complexity, especially since the originating issues occur in only the most extreme of circumstances derived from generated code whereas this change is going to impact quite literally everything. As such, I do want to be mindful of clones. If there's no way around it then so be it, but it's certainly worth the thought exercise |
I'm not sure at this point if we hit
Yes, now I see the second clone is pure overhead in normal cases. |
.and_then(|state| { | ||
state.sequence(|state| { | ||
self::space(state) | ||
.and_then(|state| super::hidden::skip(state)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The indentation here is quite off. I can't say offhand whether that's directly related because we don't really have a good frame of reference (as IIRC rustfmt on master is crashing on this input), but it could be indicative of a shape issue. The human-level diff comparison is also tough to do by eye given the heavy ident reuse.
Would you be willing to try to boil this down to a smaller repo, starting with the containing function at the same indentation level to see if we can rule out there being any shape-related oddities in the map entries? If it turns out to just be a chain issue (which wouldn't surprise me in the least) that'll be good to know, but definitely doesn't need to be resolved here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any example should fill the max_width
limit, either with indentation or with decreasing the limit. example in the #4867 is fairly minimal and I tried to minimize it more:
fn func() {
state . rule (Rule :: myrule , | state | { state . sequence (| state | { state . sequence (| state | { state . match_string ("abc") . and_then (| state | { super :: hidden :: skip (state) }) . and_then (| state | { state . match_string ("def") }) }) . and_then (| state | { super :: hidden :: skip (state) }) . and_then (| state | { state . sequence (| state | { state . optional (| state | { state . sequence (| state | { state . match_string ("abc") . and_then (| state | { super :: hidden :: skip (state) }) . and_then (| state | { state . match_string ("def") }) }) . and_then (| state | { state . repeat (| state | { state . sequence (| state | { super :: hidden :: skip (state) . and_then (| state | { state . sequence (| state | { state . match_string ("abc") }) }) }) }) }) }) }) }) }) });
}
this will take 7s on my system with (max_width=70
) and then return no solution. Examples that have solution in the max_width
limit are faster since the solution is not the last thing that we try, so minimizing them is harder because each step halves the execution time. Does this example helps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry but I don't understand your response in relation to the question that was posed. The issue here is that the emitted formatting is wrong. The runtime performance of how long it takes to produce the incorrect formatting is orthogonal.
Do you have any insight into why this is producing an incorrect result?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I completely ignored the selected lines and misunderstood your message. This seems very bad, will look at it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no worries, and thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems in lines 192-207 tabs are 2 spaces but in rest tabs are 4 spaces, any ideas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the input had two spaces iirc, so perhaps it's somehow "successfully" falling back to the span contents of the input? fwiw it might be more helpful on human eyes to butcher the input formatting a bit more and/or use some different idents given how much similarity there is (as that would make things stand out more between input/output)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, if I change the spaces to 1, it will be 1 in the output. And I can reproduce it on rustfmt installed on my machine after ~50s, so it is not caused by this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay thank you for confirming. That strikes me as very odd behavior considering that we're dealing with a chain (a massive one but a chain nevertheless) and how chain formatting behaves in less.. intense scenarios, but so long as it's not something being introduced/modified as part of this PR then we should be good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slightly minimized, takes 3s on rustfmt master: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=d36a8d8e3ddc4b6b0223144a4b73ded9
Increasing max_width
will fix the problem, so there is a possiblity that this problem have the same root with the long running time.
I won't speak to the set vs. map piece, though to the original question, the only way rustfmt would ever format one of these expressions incorrectly would be if we'd done something egregious and were storing the wrong
Will differ to your preference, but I want to note the execution time numbers I'm seeing locally with your current version (clone) and with the adjustment I suggested to remove the entry from the map via a mutable ref so as to avoid the need for the clone in the return case: (note I've elided the Current with the $ time target/debug/rustfmt tests/target/performance/issue-5128.rs --check
.....
real 0m0.632s
user 0m0.619s
sys 0m0.012s With $ time target/debug/rustfmt tests/target/performance/issue-5128.rs --check
.....
real 0m0.635s
user 0m0.622s
sys 0m0.012s And using the original/source input, as-is with $ time target/debug/rustfmt tests/source/performance/issue-5128.rs --check
.....
real 0m8.926s
user 0m8.822s
sys 0m0.069s with $ time target/debug/rustfmt tests/source/performance/issue-5128.rs --check
.....
real 0m8.776s
user 0m8.734s
sys 0m0.041s Realize the above doesn't exactly qualify as a statistically significant analysis but I'm seeing the same pattern consistently across multiple runs, and for the other input files as well. This gives me sufficient confidence the extra clone has no substantial execution time impact, at least not in any context known to us today, and we should go with one of the discussed approaches to avoid it. |
Also just want to say thanks again for all your efforts on this! I think it'll be a great step forward and address several known problems, but also want to be very careful and thoughtful given the code paths that are getting touched. |
Third hits are not impossible, we can see by testing this (it fails): if let Some(r) = map.remove(&query_id) {
map.insert(query_id, Some("we don't hit again".to_string()));
context.memoize.set(Some(map)); // restore map in the memoize cell for other users
return r;
} Your numbers shows that things which we calculate exactly two times are more than things which we calculate more times. Existence of third hits can lead to exponential time theoretically, thought it isn't probably important in practical cases. Anyway, that clone is in cold path in normal codes ( What clone that is IMO more important, and has overhead relative to previous state for normal codes, is the second clone: if let Some(mut map) = context.memoize.take() {
map.insert(query_id, r.clone()); // insert the result in the memoize map
context.memoize.set(Some(map)); // so it won't be computed again
} In normal cases, most of expressions won't even hit once and they will waste. I don't have solution for it. I tried
I completely understand your concerns, and thanks for maintaining rustfmt! |
Yes, though I'd basically accepted that we're going to have to introduce at least one, just as one of the inherent drawbacks with memoization. While I agree that the second clone won't always be at play in normal (non-generated, non-borderline-pathlogical 😆) code, I feel like it may with certain expressions in scenarios such as RHS comparisons.
That's a much more practical way of testing it than my trying to mentally execute code flows! Based on what we're seeing here I'm starting to suspect there's probably a middle ground somewhere that would both be a marked improvement in the generated code cases while also still minimizing unnecessary clones in more standard cases, even if the result is still less than optimal from an asymptotic complexity pov. I wonder if we could use a more advanced type for the map values which included some kind of a hit counter that would be used to conditionally determine whether to clone + attach the result to the map entry, and potentially conditionally remove the entry as well to split the difference on some of the cloning I was previously referring to |
I measured execution time of system and idempotence tests (I know they are not necessarily good data, just it was easy to add a timer in
results are mixed, but I think 3 is slightly faster. The difference is cost of the second clone, minus cost of recalculations in third times. I tried to make a version of 1 with an additional clone, but optimizer seems catch me and remove the clone.
I'm fine with removing from map on first hit, but data above doesn't support it makes a noticeable difference. And I want to mention |
I'd put this on hold for a bit as the release we had pending was already large and I didn't want to try to pull this in on top, but am circling back to it now. I am, however, unsure as to what you were trying to test/assert with your last comment as it seems duplicative of what was established earlier relative to timing: my prior point was that the run/execution time was largely the same regardless of whether there was an extra clone, so I'd rather avoid allocating increasingly long strings that weren't providing any benefit at runtime nor IMO pre-compile time in terms of the source code.
Apologies if I forgot to respond to this thread previously, but I saw it and agreed completely that I don't want to go down this route. |
Also, the review comment I left in #5139 (comment) is still outstanding and is something we'd need clarity on before proceeding. I can't tell if the shape is just wrong or if we've perhaps got a mapping issue, but something is wrong there. If we can come to a resolution on that then I think I'll be good to move forward with this as-is, though I'll probably spend a little time afterwards doing more experimentation. |
I don't think its affect on memory usage would be more than ten megabytes even in machine generated codes (we can measure it) so is it that bad? |
This is an increasingly moot topic of debate because as noted previously the only outstanding issue needing resolution is relative to this change producing incorrect formatting. I'm not requesting that you make any changes on this piece, however, I want to stress that we don't want to just incorporate unhelpful/unnecessary steps into the codebase unless they clear a threshold of being unnecessarily "bad enough"; we've very much a strategy of "only do that which is necessary". There's plenty of cases, both those that could easily be contrived and practical ones within the rustfmt codebase, where some minimally (or even be negligibly) impactful step could be unnecessarily taken, e.g. a requiring clone instead of working with a slice. However, we'd still want to avoid such an unnecessary action regardless, i.e. go with the slice instead of the unnecessary clone. I'm simply pointing out that the exact same principle is applicable here: we need to have a rationalization for the usage of a clone; we don't want to use clones freely/loosely and then have to rationalize the removal/avoidance of them |
Alrighty, let's give this a try! |
@calebcartwright given that we just had to revert these changes in #5403 do we want to reopen all the linked issues? |
Ah, sorry for breaking rustfmt. If I understand correctly, the problem is that the result is not a function of just |
No worries! This is a fairly tricky area and I didn't foresee this either. I suspect there will be more to it and that we may need a more advanced/conditional type of memoization flow like I was alluding to earlier in these discussions (albeit with even more conditions) |
fix #4476
fix #4867
fix #5128
The problem in those issues is, when
max_width
is tight, rustfmt should check all possible formattings, which is exponential relative to input size. By memoization offormat_expr
function, worst case execution time becomesO(n*max_width^3)
. I tried codes in those issues and it completed in a fraction of second.