SortPreservingMerge does not account for memory usage #5885
Comments
)" This reverts commit cf81117.
There is a PR open for this feature -- I think it is valuable to complete. However, it currently doesn't pass the PR tests (I think the tests need some updating).
We saw this cause trouble for us during some internal testing. More details to come.
Specifically, we have pretty good evidence that for dictionary-encoded arrays with high cardinality, the interned dictionary values consume an ever-increasing amount of memory.
Is your feature request related to a problem or challenge?
SortPreservingMerge currently has extremely limited memory accounting functionality, with no accounting for buffered batches or cursors.
The only memory accounting is a static assignment at construction time by ExternalSorter of the size of the in-memory batches, when merging spilled and in-memory data. This assignment is never decremented, and does not take into account any memory usage resulting from loading the spilled data back into memory.
Describe the solution you'd like
SortPreservingMerge should account for the memory usage of the data it has buffered.
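One way to picture the accounting requested here is a reservation that grows when a batch is buffered and shrinks when the batch is consumed. The following is only a minimal sketch using the Rust standard library; MemoryBudget, Batch, and BufferedInput are hypothetical names made up for illustration and are not DataFusion's actual types or API.

```rust
use std::collections::VecDeque;

/// Hypothetical shared memory budget (an illustrative stand-in, not a DataFusion type).
struct MemoryBudget {
    limit: usize,
    used: usize,
}

impl MemoryBudget {
    fn new(limit: usize) -> Self {
        Self { limit, used: 0 }
    }

    /// Reserve `bytes`, failing instead of silently exceeding the limit.
    fn try_grow(&mut self, bytes: usize) -> Result<(), String> {
        let new_used = self
            .used
            .checked_add(bytes)
            .ok_or_else(|| "reservation overflow".to_string())?;
        if new_used > self.limit {
            return Err(format!(
                "cannot reserve {bytes} bytes: {} of {} already used",
                self.used, self.limit
            ));
        }
        self.used = new_used;
        Ok(())
    }

    /// Return `bytes` to the budget.
    fn shrink(&mut self, bytes: usize) {
        self.used = self.used.saturating_sub(bytes);
    }
}

/// Hypothetical stand-in for a record batch; only its size matters here.
struct Batch {
    rows: Vec<i64>,
}

impl Batch {
    /// Approximate size: the bytes occupied by the row values.
    fn memory_size(&self) -> usize {
        self.rows.len() * std::mem::size_of::<i64>()
    }
}

/// Hypothetical per-input buffer of a merge, with accounting on both push and pop.
struct BufferedInput {
    buffered: VecDeque<Batch>,
    reserved: usize,
}

impl BufferedInput {
    fn new() -> Self {
        Self { buffered: VecDeque::new(), reserved: 0 }
    }

    /// Buffer a batch only if its memory can be reserved first.
    fn push(&mut self, batch: Batch, budget: &mut MemoryBudget) -> Result<(), String> {
        let size = batch.memory_size();
        budget.try_grow(size)?;
        self.reserved += size;
        self.buffered.push_back(batch);
        Ok(())
    }

    /// Hand the oldest batch downstream and release its reservation.
    fn pop(&mut self, budget: &mut MemoryBudget) -> Option<Batch> {
        let batch = self.buffered.pop_front()?;
        let size = batch.memory_size();
        budget.shrink(size);
        self.reserved -= size;
        Some(batch)
    }
}
```

The point of the sketch is that the reservation is decremented when data leaves the buffer, which is what the static assignment described above never does.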
Additionally, the various streams created by ExternalSorter, both for in-memory and spilled data, should be accounted for.
Describe alternatives you've considered
No response
Additional context
#5879 tracks unifying the sorting implementations, which may help make this story more consistent.