
Bound the size of cache in deprecation logger #16724

Merged

Conversation

andrross
Member

@andrross andrross commented Nov 26, 2024

The current implementation of the map used to de-duplicate deprecation log
messages can grow without bound. This adds a simple fixed limit to the data
structure tracking existing loggers. Once the limit is breached new loggers will
not be deduplicated. I also added a check to skip the tracking if the
deprecation logger is disabled.
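
A minimal sketch of the behavior described above, with invented names and an arbitrary limit; this is not the merged code (the backport commit messages further down word the over-limit behavior slightly differently):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Illustration only: a fixed-size de-duplication map for deprecation log messages.
    final class DeprecationDedupeSketch {
        private static final int MAX_DEDUPE_ENTRIES = 16_384; // hypothetical limit

        private final Map<String, Boolean> seenKeys = new ConcurrentHashMap<>();

        /** Returns true if the deprecation message for this key should be emitted. */
        boolean shouldLog(String key, boolean loggerEnabled) {
            if (!loggerEnabled) {
                // Skip tracking entirely when the deprecation logger is disabled.
                return false;
            }
            if (seenKeys.containsKey(key)) {
                return false; // already logged once
            }
            if (seenKeys.size() >= MAX_DEDUPE_ENTRIES) {
                // Limit breached: stop tracking new keys, so their messages are no longer deduplicated.
                return true;
            }
            // putIfAbsent returns null only on the first insertion of this key.
            return seenKeys.putIfAbsent(key, Boolean.TRUE) == null;
        }
    }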

Related Issues

Resolves #16702

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Member

@dblock dblock left a comment

Is not using a concurrent hash map a premature optimization? After all, even at a high volume of requests you're not going to hit a lock often (concurrent maps are very fast and use atomic swaps, as far as I remember).

Contributor

❌ Gradle check result for 1913d2b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@andrross
Member Author

Is not using a concurrent hash map a premature optimization?

@dblock I don't think using the OpenSearch Cache is an optimization. It's definitely slower! The code is simple though, which is why I chose it.

What alternative do you have in mind with a concurrent hash map? The choices I see are: (1) stop deduplicating when the map hits a certain capacity, or (2) track insertion order to know which ones to evict. I could maybe be convinced that option 1 is good enough, but option 2 is what I implemented using the existing Cache data structure.
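
To make option 2 concrete, insertion-order eviction could be sketched with a size-bounded LinkedHashMap (via removeEldestEntry) wrapped in a synchronized view, since LinkedHashMap is not thread-safe on its own. This is only an illustration of the idea with invented names, not the OpenSearch Cache used in the PR:

    import java.util.Collections;
    import java.util.LinkedHashMap;
    import java.util.Map;

    final class InsertionOrderDedupeSketch {
        private static final int MAX_ENTRIES = 16_384; // hypothetical limit

        // removeEldestEntry evicts the oldest insertion once the limit is exceeded.
        private final Map<String, Boolean> seen = Collections.synchronizedMap(
            new LinkedHashMap<String, Boolean>() {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                    return size() > MAX_ENTRIES;
                }
            });

        boolean firstTimeSeen(String key) {
            // put returns the previous value, so null means this key was not tracked yet.
            return seen.put(key, Boolean.TRUE) == null;
        }
    }

The synchronized wrapper serializes every lookup, which is the kind of overhead the concurrent-map discussion in this thread is weighing against the simplicity of option 1.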

@reta
Collaborator

reta commented Nov 27, 2024

The choices I see are: (1) stop deduplicating when the map hits a certain capacity,

I would agree with @dblock here: the Cache does much more than we need (expiration tracking, etc.), plus we actually introduce more boxing operations by using the hash code (int -> Integer). Maybe ConcurrentHashMap is not a bad idea:

    public boolean isAlreadyLogged() {
        return keys.size() < MAX_DEDUPE_CACHE_ENTRIES && keys.putIfAbsent(keyWithXOpaqueId, Boolean.TRUE) == null;
    }

@andrross
Member Author

andrross commented Nov 27, 2024

stop deduplicating when the map hits a certain capacity

@reta Agreed that Cache does a lot more than we need (mostly around stats...time-based expiration is a no-op if you don't configure it), but it does give us the LRU semantics which seem beneficial here. The pattern I worry about with a simple capacity limit is this: if you have clients that periodically rotate their opaque ID, then an OpenSearch server left running long enough will eventually just stop de-duplicating deprecation warnings.

Also, the hashcode thing was to make the memory utilization deterministic, as every entry is the same size (a boxed integer). I'm happy to change this to a limited-capacity concurrent set of strings if you think that is good enough. That solution is definitely simple and fast.
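
For illustration, the hashcode variant being discussed keys the map on the boxed hash of the caller-supplied key rather than the string itself, so every entry has the same fixed size. The class and field names here are invented (and hash collisions could occasionally suppress a distinct warning):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    final class HashKeyedDedupeSketch {
        // Each entry costs a boxed Integer key instead of an arbitrarily long caller-supplied string,
        // which makes per-entry memory usage deterministic.
        private final Map<Integer, Boolean> seenHashes = new ConcurrentHashMap<>();

        boolean firstTimeSeen(String keyWithXOpaqueId) {
            return seenHashes.putIfAbsent(keyWithXOpaqueId.hashCode(), Boolean.TRUE) == null;
        }
    }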

@reta
Collaborator

reta commented Nov 28, 2024

@reta Agreed that Cache does a lot more than we need (mostly around stats...time-based expiration is a no-op if you don't configure it), but it does give us the LRU semantics which seem beneficial here.

Thanks @andrross, I would have expected us to use LRU semantics but Cache is configured without any expiration policies set (please correct me if I am wrong), so LRU is not used. I am also very much aligned with @dblock on the simplicity part: this flow is far from critical, and it looks to me that logging the deprecation warning 65k times (at worst, which I believe would very likely be the same deprecated feature triggered over and over again) is sufficient to be noticeable.

@andrross
Member Author

I would have expected us to use LRU semantics but Cache is configured without any expiration policies set (please correct me if I am wrong), so LRU is not used

@reta Without expiration policies the Cache will track insertion order, so you're correct that it is not LRU, but the basic point is that the Cache will still allow new entries to be deduplicated after the limit is reached. However, I can totally buy the argument that this is not a critical code path and that it is better to be simple and fast than to add overhead to handle edge cases.

@andrross andrross force-pushed the deprecation-logger-cache branch from 1913d2b to 68e9bfc on November 29, 2024 18:41
@andrross
Member Author

andrross commented Nov 29, 2024

@reta I pushed a simpler version that just enforces a size limit in the map. Let me know what you think.

I also removed the hashcode() change, which means we no longer have real control over the amount of memory used, since the size of the keys is controlled by the caller. But in practice I think this map would stay under 10MB unless unreasonably large keys are used.
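
As a rough sanity check on that figure (the entry count and key size here are assumptions, not values taken from the PR): if the limit is on the order of the 65k mentioned by @reta above and a typical key string plus map-entry overhead is around 150 bytes, the worst case is roughly 65,536 × 150 B ≈ 10 MB, consistent with the estimate above.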

@reta
Collaborator

reta commented Nov 29, 2024

@dblock I think you would like it ;-)

Contributor

✅ Gradle check result for 68e9bfc: SUCCESS


codecov bot commented Nov 29, 2024

Codecov Report

Attention: Patch coverage is 83.33333% with 2 lines in your changes missing coverage. Please review.

Project coverage is 72.07%. Comparing base (d2a1477) to head (6d353b6).
Report is 5 commits behind head on main.

Files with missing lines                               | Patch % | Lines
...g/opensearch/common/logging/DeprecatedMessage.java  | 87.50%  | 0 Missing and 1 partial ⚠️
...g/opensearch/common/logging/DeprecationLogger.java  | 75.00%  | 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #16724      +/-   ##
============================================
- Coverage     72.16%   72.07%   -0.09%     
+ Complexity    65257    65214      -43     
============================================
  Files          5318     5318              
  Lines        303988   303993       +5     
  Branches      43987    43990       +3     
============================================
- Hits         219358   219113     -245     
- Misses        66674    66934     +260     
+ Partials      17956    17946      -10     


@andrross andrross force-pushed the deprecation-logger-cache branch from 68e9bfc to 7e3567f on December 3, 2024 04:15
The current implementation of the map used to de-duplicate deprecation log
messages can grow without bound. This adds a simple fixed limit to the data
structure tracking existing loggers. Once the limit is breached new loggers will
not be deduplicated. I also added a check to skip the tracking if the
deprecation logger is disabled.

Signed-off-by: Andrew Ross <andrross@amazon.com>
@andrross andrross force-pushed the deprecation-logger-cache branch from 7e3567f to 6d353b6 on December 3, 2024 04:15
Contributor

github-actions bot commented Dec 3, 2024

❌ Gradle check result for 6d353b6: TIMEOUT

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

github-actions bot commented Dec 3, 2024

✅ Gradle check result for 6d353b6: SUCCESS

Member

@dblock dblock left a comment

I'm good with this, @reta? @shwetathareja?

@andrross andrross merged commit b1bf72f into opensearch-project:main Dec 3, 2024
39 checks passed
@andrross andrross deleted the deprecation-logger-cache branch December 3, 2024 19:59
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-16724-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 b1bf72f26e2681e4dbe726bc9605209675f6ab38
# Push it to GitHub
git push --set-upstream origin backport/backport-16724-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-16724-to-2.x.

andrross added a commit to andrross/OpenSearch that referenced this pull request Dec 3, 2024
The current implementation of the map used to de-duplicate deprecation log
messages can grow without bound. This adds a simple fixed limit to the data
structure tracking existing loggers. Once the limit is breached new loggers will
no longer log deprecation warnings. I also added a check to skip the tracking
if the deprecation logger is disabled.

Signed-off-by: Andrew Ross <andrross@amazon.com>
(cherry picked from commit b1bf72f)
andrross added a commit to andrross/OpenSearch that referenced this pull request Dec 4, 2024
The current implementation of the map used to de-duplicate deprecation log
messages can grow without bound. This adds a simple fixed limit to the data
structure tracking existing loggers. Once the limit is breached new loggers will
no longer log deprecation warnings. I also added a check to skip the tracking
if the deprecation logger is disabled.

Signed-off-by: Andrew Ross <andrross@amazon.com>
(cherry picked from commit b1bf72f)
Signed-off-by: Andrew Ross <andrross@amazon.com>
reta pushed a commit that referenced this pull request Dec 4, 2024
The current implementation of the map used to de-duplicate deprecation log
messages can grow without bound. This adds a simple fixed limit to the data
structure tracking existing loggers. Once the limit is breached new loggers will
no longer log deprecation warnings. I also added a check to skip the tracking
if the deprecation logger is disabled.


(cherry picked from commit b1bf72f)

Signed-off-by: Andrew Ross <andrross@amazon.com>
Labels
backport 2.x (Backport to 2.x branch), backport-failed, bug (Something isn't working), Other
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] DeprecationLogger - Unbounded memory usage
6 participants