Reduce memory usage of DELETE operations #11470
Merged
When a delete is executed, we push information about that delete into the `UndoBuffer`. This information allows us to later commit or roll back the actual delete. It is stored in the form of `DeleteInfo` structs. When deleting a lot of data (for example, when running a `DELETE FROM large_tbl` command), many of these structs are created. Since the `UndoBuffer` structure does not support offloading to disk (yet), this could lead to out-of-memory exceptions.

This PR improves the memory efficiency of the `DeleteInfo` struct in two ways:

- We change the type of the `row` field from `row_t` to `uint16_t`. The rows that are stored are relative to the vector they refer to, so these values can never exceed `STANDARD_VECTOR_SIZE` and always fit in a `uint16_t`. This reduces memory usage of the `DeleteInfo` struct by up to 4x.
- When the deleted rows are consecutive (`0, 1, 2, 3, 4, 5, ...`) we avoid storing the row identifiers at all. Instead, we store a boolean `is_consecutive`. If this is set, we reconstruct the row identifiers from the count during actual delete operations. This improves memory usage even further, particularly when entire row groups or tables are deleted (as is the case in the previous `DELETE FROM large_tbl` command).
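The two changes above can be illustrated with a minimal, self-contained sketch. This is not the actual `DeleteInfo` definition from this PR: the names `DeleteInfoSketch`, `MakeDeleteInfo`, `GetRows`, and `kVectorSize` are hypothetical, and the value 2048 for the vector size is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed stand-in for STANDARD_VECTOR_SIZE (hypothetical value for this sketch).
constexpr std::size_t kVectorSize = 2048;

// Illustrative only: NOT the real DeleteInfo layout.
struct DeleteInfoSketch {
    std::uint16_t count = 0;          // number of deleted rows in this vector
    bool is_consecutive = false;      // true if the rows are exactly 0, 1, ..., count - 1
    std::vector<std::uint16_t> rows;  // vector-relative offsets; left empty when is_consecutive
};

// Build the compact representation from vector-relative row offsets.
inline DeleteInfoSketch MakeDeleteInfo(const std::vector<std::int64_t> &relative_rows) {
    assert(relative_rows.size() <= kVectorSize);
    DeleteInfoSketch info;
    info.count = static_cast<std::uint16_t>(relative_rows.size());

    // Check whether the offsets form the sequence 0, 1, 2, ...
    info.is_consecutive = true;
    for (std::size_t i = 0; i < relative_rows.size(); i++) {
        if (relative_rows[i] != static_cast<std::int64_t>(i)) {
            info.is_consecutive = false;
            break;
        }
    }

    // Only materialize the offsets when they cannot be derived from the count.
    if (!info.is_consecutive) {
        info.rows.reserve(relative_rows.size());
        for (std::int64_t row : relative_rows) {
            // Offsets are below the vector size, so they fit in 16 bits.
            info.rows.push_back(static_cast<std::uint16_t>(row));
        }
    }
    return info;
}

// Recover the row offsets, reconstructing them from the count if needed.
inline std::vector<std::uint16_t> GetRows(const DeleteInfoSketch &info) {
    if (!info.is_consecutive) {
        return info.rows;
    }
    std::vector<std::uint16_t> rows(info.count);
    for (std::uint16_t i = 0; i < info.count; i++) {
        rows[i] = i;
    }
    return rows;
}
```

In this sketch, deleting a whole vector stores only the count and the flag, while a sparse delete stores 2 bytes per row instead of the 8 bytes a 64-bit row identifier would take.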