[BUG] Field masking has inconsistent memory issues with certain queries #4031
Analysis

Based on these results, here are a couple of findings. Overall, I would recommend avoiding StringQueries when using masked fields, as there appears to be a disproportionate use of memory compared to other search types. There are likely optimizations that could be added for some kinds of fields, such as building a cache of masked values to avoid the allocation tax, but it is unclear whether this would help in high-cardinality scenarios. The following are a couple of sources for this recommendation.

GC Impact

The attempts column increases if GC was called during a test run, since that makes the memory calculation likely to be incorrect. So when attempts is higher, more GCs were performed, meaning higher JVM memory pressure. Baseline performs best, with Term being impacted slightly more often, and StringQuery seems to trip GC most frequently.

Masking Value Type

The complexity of the masking operation (masking a LONG vs. a STRING) did not appear to impact the usage; however, there is a clear increase of between 15-30% in memory usage when a field is masked in either the Baseline or Term scenarios. This overhead is semi-expected, since the masking operations happen as the data is accessed.

Memory expense of
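To make the cache idea above concrete, here is a minimal sketch of what a masked-value cache could look like. The class, its size bound, and the `maskFunction` hook are all assumptions for illustration, not code from the plugin; as noted, on high-cardinality fields the hit rate drops and the cache itself becomes extra heap, so the trade-off is unclear.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: a small LRU cache so each distinct raw value is
// only masked (hashed) once, and repeats are served from the map.
final class MaskedValueCache {
    private static final int MAX_ENTRIES = 10_000;

    private final Map<String, String> cache =
        new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > MAX_ENTRIES; // evict the least-recently-used entry
            }
        };

    private final Function<String, String> maskFunction;

    MaskedValueCache(Function<String, String> maskFunction) {
        this.maskFunction = maskFunction;
    }

    String mask(String rawValue) {
        // computeIfAbsent only invokes the masking function on a miss.
        return cache.computeIfAbsent(rawValue, maskFunction);
    }
}
```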
[Triage] Hi @peternied, thank you for the extremely detailed issue. You provided a lot of good details and we can take a couple of paths forward:
The first task is definitely actionable; the other two tasks can be investigated on an RFC or help-wanted basis, since it may not be clear how to correct them quickly.
@peternied I noticed that field masking is being applied on search requests where the request size is 0 and the only aggregations requested are count or cardinality. This may be an area that can be optimized to reduce the memory footprint of field masking. This PR on my fork contains an example: cwperks#24
Sample Documents
SearchRequest of size 0 with cardinality aggregation
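The request body was stripped from this rendering; a request of the shape being described would plausibly look like the following (index and field names are placeholders, not the ones from the linked PR):

```json
POST /test-index/_search
{
  "size": 0,
  "aggs": {
    "unique_values": {
      "cardinality": { "field": "masked_field" }
    }
  }
}
```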
For these types of queries, is applying the field masking necessary? It looks like extra overhead for a query that is already not requesting the sensitive info. If the size is >0, or there are aggregations other than cardinality and count (such as max or min, where raw data can be exposed), then the field masking should be applied.
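A rough sketch of the kind of guard being proposed follows. This is not the code from cwperks#24; the class and method names are invented, and a real check would also need to recurse into sub-aggregations.

```java
import java.util.Set;

import org.opensearch.action.search.SearchRequest;
import org.opensearch.search.aggregations.AggregationBuilder;
import org.opensearch.search.builder.SearchSourceBuilder;

final class MaskingSkipCheck {

    // Aggregation types that only count values and never echo raw data.
    private static final Set<String> SAFE_AGGS = Set.of("value_count", "cardinality");

    /**
     * Returns true when the request cannot surface raw document values:
     * it asks for zero hits and every top-level aggregation only counts.
     * Aggregations like min/max/top_hits still require masking because
     * they return raw field values. Sub-aggregations are ignored here
     * for brevity; a real implementation would have to walk them too.
     */
    static boolean maskingCanBeSkipped(SearchRequest request) {
        SearchSourceBuilder source = request.source();
        if (source == null || source.size() != 0 || source.aggregations() == null) {
            return false;
        }
        for (AggregationBuilder agg : source.aggregations().getAggregatorFactories()) {
            if (!SAFE_AGGS.contains(agg.getType())) {
                return false;
            }
        }
        return true;
    }
}
```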
Reproduced Steps:

baseline
3 indices with 5000 documents
3 indices with 50000 documents

aggregateFilterTermQueryScenarios
3 indices with 5000 documents
3 indices with 50000 documents

aggregateFilterStringQueryScenarios
3 indices with 5000 documents
3 indices with 50000 documents

With Craig's Changes:

baseline
3 indices with 5000 documents
3 indices with 50000 documents

aggregateFilterTermQueryScenarios
3 indices with 5000 documents
3 indices with 50000 documents

aggregateFilterStringQueryScenarios
3 indices with 5000 documents
3 indices with 50000 documents
In the image below, I have compared the results of the original tables with the results of the tables with Craig's change. Red in the left table means the corresponding value is greater than the value in the equivalent position in the right-hand tables. As shown, the change does directly impact the testing scenarios covered by Peter's test framework for a majority of the measured values.
Focusing just on aggregateFilterStringQueryScenarios

Admin

Isn't lower memory usage better? Admin seems to be doing better in the original vs. the updated change. It was roughly 1 GB vs. 4 GB, which looks like a 4x regression.

Reader with role MASKING_RANDOM_STRING

Looking at the difference between the changes, focusing just on MASKING_RANDOM_STRING, there seems to be only a 5% improvement. While that isn't nothing, it's pretty close to the standard deviation of 111,796.

Original

With Change
@scrawfor99 @peternied The change in cwperks#24 is a very narrow change that applies to
Hi @cwperks, of course. No worries. I am sure the change accomplishes what you intended. I just reran the tests to see whether it would fix everything or not. It is no problem to add some more changes and see where that goes.
I went through the code and found some possible refactorings that may help improve the heap use, based on the implementation. I also found some lines whose purpose I am not sure of...

First, we should avoid expanding the aggregation builder factories if we can help it. I noticed we iterate over the list at least 3 separate times within

We can also avoid the frequent conditionals used when identifying the aggregationBuilder type via

We can avoid storing strings in the heap by logging only when the appropriate logger level is set. This would require a larger logger rework: instead of always building all log lines and only printing based on the configured level, we would only build lines that pass the configured logging level. This matters more than it seems, since in some areas we log complete requests as JSON, which takes up a fair amount of space.

We can avoid checking each request for an update or resize request by adding a flag to the interface which allows us to fetch whether they are cache-able. Likewise, we can have a request signal if it has

We can try to intern strings throughout the code. We can swap to HashSets etc. We can replace streams with for loops, since streams are less memory efficient. Condense the SetFLS, which duplicates the serialization process done in the DLS logic.

It's also not clear why we deserialize in this code:
Avoiding that at the
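For the logging point above, the standard pattern is to guard expensive message construction behind a level check. A generic Log4j 2 sketch follows (not a specific call site in the plugin; `toJson` is a stand-in for whatever serialization the code does):

```java
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

final class GuardedLoggingExample {

    private static final Logger log = LogManager.getLogger(GuardedLoggingExample.class);

    void logRequest(Object request) {
        // Unguarded: the JSON string is built on the heap even when
        // DEBUG is disabled, which is the allocation being criticized.
        // log.debug("evaluating request: " + toJson(request));

        // Guarded: serialization is skipped entirely at higher levels.
        if (log.isDebugEnabled()) {
            log.debug("evaluating request: {}", toJson(request));
        }
    }

    private String toJson(Object request) {
        return request.toString(); // placeholder for real JSON serialization
    }
}
```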
I will start going through making these changes and rerun the tests, leaving the interface and logger changes for last.
Thanks for the deep dive @scrawfor99! How do the performance metrics look with the suggested changes implemented? Should this be split into multiple parts?
Yes please - ideally each change would be in isolation with clean before and after results that prove we are moving in the right direction. |
I just finished going over the

Ultimately, some options I found were:
Now going to break the changes for the DlsFlsValveImpl into the following (see the single-pass sketch below for what Change 1 targets):

| Change Version | Change |
| --- | --- |
| Change 1 | Avoid aggregation builder expansion |

I will share the test results of each change type after I make them. The interface-modifying changes I am not going to test at the moment, since that requires more effort. I will see how much improvement these changes make, and if it is not much, will go forward with the interface changes.
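As a generic before/after sketch of what "avoid repeated expansion" and "replace streams with for loops" could mean in practice (the list contents and predicates are invented for illustration, not taken from DlsFlsValveImpl):

```java
import java.util.List;

final class SinglePassExample {

    // Before: three separate stream passes over the same list, each one
    // allocating lambda and iterator state on the heap.
    static boolean multiPass(List<String> aggTypes) {
        boolean hasCardinality = aggTypes.stream().anyMatch("cardinality"::equals);
        boolean hasCount = aggTypes.stream().anyMatch("value_count"::equals);
        boolean hasOther = aggTypes.stream()
            .anyMatch(t -> !t.equals("cardinality") && !t.equals("value_count"));
        return (hasCardinality || hasCount) && !hasOther;
    }

    // After: one plain loop with an early exit and no intermediate objects.
    static boolean singlePass(List<String> aggTypes) {
        boolean sawSafe = false;
        for (String t : aggTypes) {
            if (t.equals("cardinality") || t.equals("value_count")) {
                sawSafe = true;
            } else {
                return false; // an aggregation that can expose raw values
            }
        }
        return sawSafe;
    }
}
```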
These are the results of running the test suite after making the above change configurations. As we can see, there is not an obvious pattern for what the most effective way to improve the heap performance of the different query types is. I am not sure of the next steps for improving this, so I am going to reach out to @jainankitk to see if they have any recommendations.
What is the bug?
We've seen reports of heap memory usage spiking with field masking enabled. With how masking is implemented at the leaf level, it's possible that certain types of queries cause the masked fields to be materialized even when they are not used.
How can one reproduce the bug?
Steps to reproduce the behavior:
./gradlew integrationTest --tests org.opensearch.security.MaskingTests
Baseline behavior
Baseline query with the following format:
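The query block was stripped from this rendering; a plausible baseline shape would be a plain match_all (the index name is a placeholder):

```json
POST /test-index/_search
{
  "query": { "match_all": {} }
}
```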
Creating 3 Indices with 5000 Documents
Creating 3 Indices with 50000 Documents
Query with Aggregate Filter
Query with the following format:
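Again the query block was stripped; based on the scenario names, one plausible shape is a filter aggregation wrapping a cardinality sub-aggregation (field, value, and index names are placeholders):

```json
POST /test-index/_search
{
  "size": 0,
  "aggs": {
    "filtered": {
      "filter": { "term": { "masked_field": "some-value" } },
      "aggs": {
        "unique": { "cardinality": { "field": "masked_field" } }
      }
    }
  }
}
```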
Creating 3 Indices with 5000 Documents
Creating 3 Indices with 50000 Documents
Term Match Query:
Query with the following format:
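The query block was stripped here as well; a term match query of the kind being described would look like this (field and value are placeholders):

```json
POST /test-index/_search
{
  "query": {
    "term": { "masked_field": "some-value" }
  }
}
```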
Creating 3 Indices with 5000 Documents
Creating 3 Indices with 50000 Documents