-
Notifications
You must be signed in to change notification settings - Fork 155
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(array): use-after-free in optimized String array #5377
Conversation
Test that potential use after free bugs can be exercised through the flux language by modifying executetest.ProcessTestHelper2 to use a memory allocator that scrubs the memory buffer when freeing.
The optimised string arrays did not always copy values before freeing the backing data, leading to potentially unpredictable behaviour. Because the default allocator, which is presumably used in most circumstances, uses the go runtime allocator underneath it is likely that this wouldn't normally affect the correctness of operations. The more likely problem is that large memory blocks that the execution engine thinks has been freed are not being garbage collected because of refences to strings within the memory block. The array.StringBuilder is changed to always copy the value into an internal memory buffer when a value is appended, The internal buffer will always be allocated from the the memory allocator. When retriving a string value from array.String the returned value will be a copy from the internal buffer. This makes the string much easier to reason about as it fits with the standard go string semantics. This string is allocated by the go runtime rather than the memory allocator. The above changes make the system safer, but are likely to introduce a reduction in performance due to the increase in copying of data in memory. In order to provide a faster path some byte slice based methods have been added to allow for less copying in places where it can easily be determined that values will be finished with before the memory is freed.
Create custom string addition (concatenation) functions that limit the amount of copying required for the operation.
In places where it is clearly safe to do so use the byte-slice oriented string column functions to minimize the amount of data copies that are made when processing string columns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all seems reasonable, @mhilton. We'll need to cut a pre-release build to canary in Cloud 2 so we can assess the performance impact.
@brettbuddin I spoke with @mhilton today and general plan, as you've mentioned, will be to canary in staging tomorrow morning and then in the region affected by the EAR Friday morning (EU morning). Then GA next week if that all goes well. |
Undo the changes introduced in #5377. This is a trade-off where it is considerd better to have as much memory as possible allocated by the allocator, with the risk that it will be freed when still in use. Than to have significant amounts of memory allocated outside of the allocator. It should be noted that because of Go's garbage collector it is perfectly safe for memory to be "freed" when still in use. It just mean that any accounting done by the allocator will be inaccurate.
* feat: stop making string copies Undo the changes introduced in #5377. This is a trade-off where it is considerd better to have as much memory as possible allocated by the allocator, with the risk that it will be freed when still in use. Than to have significant amounts of memory allocated outside of the allocator. It should be noted that because of Go's garbage collector it is perfectly safe for memory to be "freed" when still in use. It just mean that any accounting done by the allocator will be inaccurate. * feat: remove unnecessary StringRef type Remove the recently added StringRef type as it is now unnecessary.
The optimized string array, which is used whan all column values are identical, was caching the string value that was passed to it. If this value came from an arrow String array then the string's data would be in the buffer backing that array. Once the array is finished with and freed the new string array containing the reference to the data buffer remains and prevents the entire buffer from being garbage collected.
The changes to the executetest package show that this is the case by scribbling over a buffer's memory when it is reported as free. This caused a number of tests to fail with strings that contain the data pattern rather than the expected result.
This could cause a correctness error if one uses a more sophisticated allocator that maintains a backing buffer. This would be a legitimate design choice in order to relieve pressure on the garbage collector.
In order to correct this the StringBuilder has been enhanced to always copy the input string into a new buffer whether it is in the optimised mode, or not. This buffer is allocated from the memory allocator. Once the builder has been converted into a string array retrieving a string from this array always makes a string copy of the data, this time allocated by the standard runtime, so that the string can be used with other structures without the possibility that it will keep a larger data buffer from being garbage collected.
These changes are likely to have an impact on the performance of certain operations. Increasing the amount of memory copying will mean that some transform heavy workloads take longer. I don't know how to ascertain the impact yet. In order to alleviate this slightly a new byte array based method (ValueBytes) has been added to the string array. This does not copy the data but instead returns a slice of the data backing the array. This is to be used when the caller knows that they will be using the data without storing a reference, or immediately copying it to a different location.
Checklist
Dear Author 👋, the following checks should be completed (or explicitly dismissed) before merging.
experimental/
docs/Spec.md
has been updatedDear Reviewer(s) 👋, you are responsible (among others) for ensuring the completeness and quality of the above before approval.