Skip to content

Resolve stack overflow in tree visitation when having a reduced stack size #205

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 26, 2020

Conversation

ahoppen
Copy link
Member

@ahoppen ahoppen commented Feb 25, 2020

To determine the correct specific visitation function for a syntax node, we need to switch through a huge switch statement that covers all syntax types. In debug builds, the cases of this switch statement do not share stack space (rdar://55929175). Because of this, the switch statement requires allocates about 15KB of stack space. In scenarios with reduced stack size (in particular dispatch queues), this often results in a stack overflow during syntax tree visitation.

To circumvent this problem, this commit moves the retrieval of the specific visitation function to its own function. This way, the stack frame that determines the correct visitiation function will be popped of the stack before the function is being called, making the switch's stack
space transient instead of having it linger in the call stack.

This tackles the stack overflow issue discovered that's blocking swiftlang/swift-format#117

CC: @keith

@ahoppen ahoppen requested a review from akyrtzi February 25, 2020 09:01
@ahoppen ahoppen force-pushed the circumvent-stack-overflow branch from 2929238 to 0d810ae Compare February 25, 2020 09:04
@ahoppen
Copy link
Member Author

ahoppen commented Feb 25, 2020

@swift-ci Please test

@akyrtzi
Copy link
Contributor

akyrtzi commented Feb 25, 2020

Does this have any effect in release build performance?

@ahoppen ahoppen force-pushed the circumvent-stack-overflow branch from 0d810ae to 33b313d Compare February 25, 2020 20:33
@ahoppen
Copy link
Member Author

ahoppen commented Feb 25, 2020

Ah, forgot to check. I just did and it did have a 50% performance decrease in release builds. I stinn think it is valuable for day-to-day development because dispatch queues are much nicer to work with than Threads which are required to adjust the stack size. So, I updated the PR to hide the slow but less stack-hungry implementation behind a #if flag SWIFT_SYNTAX_WORKAROUND_REDUCE_STACK_ALLOCATIONS.

@keith: You would need to specify SWIFT_SYNTAX_WORKAROUND_REDUCE_STACK_ALLOCATIONS when building a swift-format. Would that work for you?
@akyrtzi: What do you think about this approach?

@akyrtzi
Copy link
Contributor

akyrtzi commented Feb 25, 2020

#if SWIFT_SYNTAX_WORKAROUND_REDUCE_STACK_ALLOCATIONS && DEBUG

I'd recommend to use it unconditionally for DEBUG builds, the performance of the debug build does not matter in practice.

@keith
Copy link
Member

keith commented Feb 25, 2020

Yea I think expecting everyone to use that variable for debug builds if we landed my change probably wouldn't be ok

@ahoppen
Copy link
Member Author

ahoppen commented Feb 25, 2020

My idea was that I didn’t want debug builds to start using a completely different implementation than release builds. This might make debugging weird. If you specify the extra flag, you know what you are opting in to. But I’m happy to just make it conditional on DEBUG if you are OK with it @akyrtzi.

@allevato
Copy link
Member

If this is triggered by a compilation condition defined in SwiftSyntax, there's no way for swift-format to "push that down" into SwiftSyntax when it specifies that dependency in the manifest, right? AFAIK, swift-format can only set the defines in Package.swift on the targets that it defines for itself.

If that's the case, if we accept @keith's change, then it would no longer be possible to just clone swift-format and do swift build -c release anymore, because we'd need to specify -Xswiftc -D SWIFT_SYNTAX_WORKAROUND_REDUCE_STACK_ALLOCATIONS to make sure to get the small-stack-safe rewriter. That makes it very easy to build swift-format wrong and get something that breaks easily because the obvious thing no longer works. Instead, we'd need to wrap our build in some other Python/shell/make layer, and it's really unfortunate every time we're forced to do that. Can we avoid it?

@ahoppen
Copy link
Member Author

ahoppen commented Feb 25, 2020

@allevato Because the stack size issue only happens in debug builds swift build -c release woulds still work. In release builds, the compiler is smart enough, to optimise the excessive stack usage away. Only swift build would stop working.

@akyrtzi
Copy link
Contributor

akyrtzi commented Feb 25, 2020

But I’m happy to just make it conditional on DEBUG if you are OK with it @akyrtzi.

I think it is fine. If you change it and it fixes the stack overflow in debug and has no effect in release performance, it is good from my end.

But I have another question, if we reduce stack usage here are we just pushing the problem further, meaning there is no stack overflow with the swift code we are seeing now, but there can always be some hugely deep-nested piece code that ends up exhausting the stack space again. It's not clear to me, is there code recursion in swift-format that could move to being data recursive and thus being secure against deeply-nested code?

In any case, I think we should get this change in and consider code vs data recursion as a separate thing.

@allevato
Copy link
Member

Ok, thanks for confirming!

In that case, I still agree with @akyrtzi then that this should just be conditional on DEBUG. While I agree it would be nice to not have to switch between two different implementations depending on build mode, I think not breaking the obvious workflow of swift build just working, regardless of build mode, is more important.

Especially since this doesn't just affect swift-format. Requiring every client of SwiftSyntax to define a custom flag for debug builds if they use APIs in a dispatch queue would be a huge usability penalty.

@akyrtzi
Copy link
Contributor

akyrtzi commented Feb 25, 2020

It's not 100% clear to me, is it the rewriter doing code recursion, and could we switch it to data recursion without affecting the public APIs?

@allevato
Copy link
Member

I believe it's doing code recursion. The default implementation of visit calls visitChildren, which in turn ends up calling visit on those nodes after some indirection (correct me if I'm reading it wrong). In a SyntaxRewriter subclass, you have to make sure to call super.visit if you want the children to be visited (or otherwise visit them yourself manually), but either way you have to manually recurse.

Because of that, I'm not sure we could unroll the recursion into an explicitly maintained stack/loop while retaining the API contract we have today.

@akyrtzi
Copy link
Contributor

akyrtzi commented Feb 25, 2020

I see. Later on we could look into adding some rewriting API that is safer to use. Not just to protect against stack overflows with deeply nested code but also protect against messing up the tree invariants (with SyntaxRewriter you may end up with unexpected child nodes if you are not careful with what kind of nodes you return, and get a crash later on).

@ahoppen
Copy link
Member Author

ahoppen commented Feb 25, 2020

But I have another question, if we reduce stack usage here are we just pushing the problem further, meaning there is no stack overflow with the swift code we are seeing now, but there can always be some hugely deep-nested piece code that ends up exhausting the stack space again. It's not clear to me, is there code recursion in swift-format that could move to being data recursive and thus being secure against deeply-nested code?

We are, but I think in practice Swift code tends to not be nested too deply + I assume that these super-nested edge cases tend to be hit in real-world and not development scenarios that use a release build and are thus less prone to stack overflow (because the optimiser does a good job at reducing stack usage).
So, I would like to see if we actually hit more stack overflow issues after this change has landed. As a rough estimate, this change should increase the maximum nesting by a factor of 50x. If we don’t stumble across the issue again, I don’t think it’s worth to invest engineering effort to switch to data recursion (which might or might not have its own performance implications).
But AFAICT, we should be able to switch to data recursion without breaking the public API.

It's not 100% clear to me, is it the rewriter doing code recursion, and could we switch it to data recursion without affecting the public APIs?

It is doing code recursion

…duced stack size

To determine the correct specific visitation function for a syntax node,
we need to switch through a huge switch statement that covers all syntax
types. In debug builds, the cases of this switch statement do not share
stack space (rdar://55929175). Because of this, the switch statement
requires allocates about 15KB of stack space. In scenarios with reduced
stack size (in particular dispatch queues), this often results in a
stack overflow during syntax tree visitation.

To circumvent this problem, this commit moves adds a flag that moves the
retrieval of the specific visitation function to its own function. This
way, the stack frame that determines the correct visitiation function
will be popped of the stack before the function is being called, making
the switch's stack space transient instead of having it linger in the
call stack.

The workaround currently has a 50% performance decrease in release
builds, so it is only used in debug builds.
@ahoppen ahoppen force-pushed the circumvent-stack-overflow branch from 33b313d to a0fd756 Compare February 25, 2020 21:35
@ahoppen
Copy link
Member Author

ahoppen commented Feb 25, 2020

I have updated the PR to make it conditional on the DEBUG flag only.

@swift-ci Please test

@swift-ci
Copy link
Contributor

Build failed
Swift Test Linux Platform
Git Sha - 0d810ae

@swift-ci
Copy link
Contributor

Build failed
Swift Test OS X Platform
Git Sha - 0d810ae

Copy link
Contributor

@akyrtzi akyrtzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, assuming it doesn't affect release performance.
Thank you Alex!

@allevato
Copy link
Member

But AFAICT, we should be able to switch to data recursion without breaking the public API.

How would this work when overridden visit methods can choose to call super.visit, not call it, or selectively call visit directly on specific child nodes before returning the node that should be placed back into the tree?

For example, many early swift-format bugs were caused by omitting calls to super or to child nodes, which meant that we would rewrite the top-most node that we saw of a particular type, but then never traverse down to its descendants to rewrite those. So most of our rewriters rely on explicit code recursion.

I haven't thought this completely through, but it might be possible to keep the same API signatures with a bottom-up rewriting implementation, but I'm having trouble seeing how we could also retain the behavior of existing subclasses.

Happy to take this discussion offline though; I'd love to see the recursion be removed, so if there's something I'm missing that makes that possible while also preserving the rest of the contract, please show me the piece I'm missing!

@akyrtzi
Copy link
Contributor

akyrtzi commented Feb 26, 2020

So, I would like to see if we actually hit more stack overflow issues after this change has landed. As a rough estimate, this change should increase the maximum nesting by a factor of 50x. If we don’t stumble across the issue again, I don’t think it’s worth to invest engineering effort to switch to data recursion (which might or might not have its own performance implications).

I agree it is not urgent, but note that you can't really determine how safe nesting is by just examining SyntaxRewriter in isolation. The users of this class are going to add their own stack usage on top of what SyntaxRewriter is doing.

@ahoppen
Copy link
Member Author

ahoppen commented Feb 26, 2020

LGTM, assuming it doesn't affect release performance.

I just double-checked. The change does not have an impact on the testEmptyRewriterPerformance performance test.

How would this work when overridden visit methods can choose to call super.visit, not call it, or selectively call visit directly on specific child nodes before returning the node that should be placed back into the tree?

You’re right of course. I didn’t think it through properly. I believe, I was thinking of the syntax visitor, not the syntax visitor, not the syntax rewriter 🤦🏽‍♂️

I agree it is not urgent, but note that you can't really determine how safe nesting is by just examining SyntaxRewriter in isolation. The users of this class are going to add their own stack usage on top of what SyntaxRewriter is doing.

Yeah, but maybe we have given ourselves enough headroom now, that the issue simply never comes up. Anyway, I think we all agree that it’s not urgent for now.

@ahoppen ahoppen merged commit 7251727 into swiftlang:master Feb 26, 2020
@ahoppen ahoppen deleted the circumvent-stack-overflow branch February 26, 2020 08:23
@dduan
Copy link
Contributor

dduan commented Apr 4, 2020

OMG thanks for fixing this💖

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants