Fix OOM crash, when closing a transaction while Queries are still ongoing #1193

Merged: 2 commits, May 29, 2024

reckter
Contributor

@reckter reckter commented May 16, 2024

When closing a transaction while queries are still running, the `_onErrorCallback` calls cause an infinite loop, which leads to an OOM crash.

Because `FAILED` is only set as a state in this function, I opted to use it as a failsafe check to cut the recursion short.

Without the fix, the test triggers the OOM crash; with the fix, it passes in <1s on my machine.

I hope the test is ok as it is, let me know if you need any changes for this.

EDIT: On a bit of further testing, it's not actually infinite, it just scales exponentially:
3 simultaneous queries get us ~3000 `_onErrorCallback` calls,
4 get us ~32000,
5 get us ~195000,
6 get us ~840000,
7-12 get us "Map maximum size exceeded",
13+ get us the mentioned OOM.

…ng queries

I noticed that closing a transaction while queries are still running eventually makes the driver crash with a Node.js OOM error. This test case triggers that exact error.
A fix will (hopefully) come in the next commit.
…ll running queries.

When a transaction was closed while it still had open queries, the `_onErrorCallback` in core/src/transaction.ts:303 called
`resultStreamObserver.onError` for each result. The default behaviour of `FailedObserver` (one implementation) is to call `this._beforeError` (core/src/internal/observer.ts:179), which is set to `onError`, which in turn happens to be set to the mentioned
`_onErrorCallback`.
This results in an infinite async loop, which gradually consumes all available memory and crashes the process.

Because the `FAILED` state is only set in `_onErrorCallback`, we can use it as a flag to cut off the infinite recursion when we are already in the `FAILED` state.
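
The cycle described in this commit message, and the effect of the `FAILED` check, can be sketched with a toy model. This is only an illustration under stated assumptions, not the driver's code: `ToyObserver`, `ToyTransaction`, `countCallbackCalls` and the `maxDepth` cap are hypothetical names, the model is synchronous whereas the loop in the driver is async, and the cap exists only so that the unguarded variant terminates. The counts it produces are not the measurements quoted in the PR description.

```typescript
// Toy model only: every name below is hypothetical, not the driver's API.
type State = 'ACTIVE' | 'FAILED'

// Stand-in for an observer whose onError first runs a "before error" hook.
class ToyObserver {
  private readonly beforeError: (error: Error) => void

  constructor (beforeError: (error: Error) => void) {
    this.beforeError = beforeError
  }

  onError (error: Error): void {
    this.beforeError(error)
  }
}

class ToyTransaction {
  state: State = 'ACTIVE'
  callbackCalls = 0
  private depth = 0
  private readonly useGuard: boolean
  private readonly maxDepth: number
  private readonly observers: ToyObserver[] = []

  constructor (openResults: number, useGuard: boolean, maxDepth: number) {
    this.useGuard = useGuard
    this.maxDepth = maxDepth
    for (let i = 0; i < openResults; i++) {
      // Each observer's hook points back at the error callback: this closes the cycle.
      this.observers.push(new ToyObserver(error => this.onErrorCallback(error)))
    }
  }

  onErrorCallback (error: Error): void {
    this.callbackCalls++
    if (this.useGuard && this.state === 'FAILED') {
      return // already failed: do not notify the observers again
    }
    if (this.depth >= this.maxDepth) {
      return // artificial cap (assumption) so the unguarded demo stays finite
    }
    this.state = 'FAILED'
    this.depth++
    for (const observer of this.observers) {
      observer.onError(error) // re-enters onErrorCallback through the hook
    }
    this.depth--
  }
}

// Counts how often the error callback is entered for a given number of open results.
function countCallbackCalls (openResults: number, useGuard: boolean, maxDepth: number = 3): number {
  const tx = new ToyTransaction(openResults, useGuard, maxDepth)
  tx.onErrorCallback(new Error('transaction closed'))
  return tx.callbackCalls
}
```

Under these assumptions, `countCallbackCalls(3, true)` yields 4 (one real run plus one short-circuited re-entry per open result), while `countCallbackCalls(3, false, 3)` yields 40, because every level fans out to all observers again until the artificial cap stops it.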
@reckter
Contributor Author

reckter commented May 21, 2024

@bigmontz Sorry for the ping, but this is a server-crashing issue for us, which actually causes errors for our users.

If there is anything I can do to get this merged faster, please let me know :)

Thanks!

@bigmontz bigmontz self-requested a review May 29, 2024 09:41
Contributor

@bigmontz bigmontz left a comment

🥇

Thanks for the contribution and sorry for the late reply, @reckter. The PR is approved and it will be part of the next release (tomorrow).

@bigmontz bigmontz merged commit d5bd032 into neo4j:5.0 May 29, 2024
37 checks passed
@reckter reckter deleted the OOM-while-closing branch May 31, 2024 08:40
2 participants