Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Group service endpoints into _abandoned when endpoints have high ca… #11588

Merged
merged 3 commits into from
Nov 27, 2023

Conversation

kezhenxu94
Copy link
Member

…rdinality

  • If this pull request closes/resolves/fixes an existing issue, replace the issue number. Closes #.
  • Update the CHANGES log.

Comment on lines 69 to 70
private final Map<String/* service */, Map<String/* uri */, Queue<String>/* candidate patterns */>> cachedHttpUris = new ConcurrentHashMap<>();
private final Map<String/* service */, Set<String>/* unformatted uris */> unformattedHttpUris = new ConcurrentHashMap<>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, these are two groups, one for formated, the other for unformated?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. The old one is previous logic that isn't changed. I just add some comment to the original codes for readability. The new one is only for unformatted endpoints. There is only one new variable, the other is old and never changed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, work for me. Could you run a benchmark to see the performance of the latest version? And we could estimate the memory cost for 200 services all reaching maxHttpUrisNumberPerService limitation, and how the performance when another endpoint comes in at this moment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another question about unformattedHttpUris, there seems no clean up mechanism for it. What would happen after maxHttpUrisNumberPerService limitation is reached? Currently, I feel all following URIs should be _abandoned forever. Ideally, we should be able to reset the cache, and allow some new flowing in, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another question about unformattedHttpUris, there seems no clean up mechanism for it. What would happen after maxHttpUrisNumberPerService limitation is reached? Currently, I feel all following URIs should be _abandoned forever. Ideally, we should be able to reset the cache, and allow some new flowing in, right?

As I said, unformattedHttpUris doesn't impact the original logic if the the endpoints can fit a pattern, let me make it more clear:

  • If the endpoint can be formatted by any of openapi spec, endpoint-name-grouping.yml, or the AI pipeline, everything is the same as before,
  • if the endpoint cannot be formatted, it falls back to the case in this PR, and the endpoint will be counted by service, if the count < maxHttpUrisNumberPerService everything is the same as before
  • if the count >= maxHttpUrisNumberPerService, the endpoint will be formatted as _abandoned

The mechanism added in this PR is a fallback ONLY WHEN the endpoint cannot be formatted by existing rules.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that makes sense. My asking is more about another edge case. Let's say there is a service having 5 URIs, but only one of 5 is parameterized and never formatted. In the current logic, we would begin to see all endpoint traffic end in _abandoned including. I am thinking, whether we should allow some raw URI gets passed to have a little metric in the endpoint traffic with data, rather than all five falls into the _abandoned

Note, in this edge case, other 4 URIs have no grouping rules, just pure raw URIs without parameters.

Copy link
Member Author

@kezhenxu94 kezhenxu94 Nov 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that makes sense. My asking is more about another edge case. Let's say there is a service having 5 URIs, but only one of 5 is parameterized and never formatted. In the current logic, we would begin to see all endpoint traffic end in _abandoned including. I am thinking, whether we should allow some raw URI gets passed to have a little metric in the endpoint traffic with data, rather than all five falls into the _abandoned

Note, in this edge case, other 4 URIs have no grouping rules, just pure raw URIs without parameters.

Looks like you still read it wrong. Let me take your example to explain again.

GIVEN:

  • maxHttpUrisNumberPerService == 3
  • the service has five URIs, products/1, product/2, product/3, product/4, /users/1
  • there are patterns for users/{userId} but not for product/*

SO:

  • products/1, product/2, product/3 are unformatted and they keep the raw endpoint names products/1, product/2, product/3
  • when product/4 comes to OAP, it cannot be formatted, and because maxHttpUrisNumberPerService >= 3, product/4 becomes _abandoned
  • when /users/1 comes to OAP, because it can be formatted into /users/{userId}, it WON'T be grouped into _abandoned

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your cases are correct and as expected from the codes.
My case is product/sale, product/update, info/list, auth/login, and product/view/{id}, and there is no pattern existing by default. So, with URIs of product/view/{id} flow in, all other URIs will have no metrics as well. Because at those moments we don't have setten a pattern to group them.

So, my asking to enhance is that,

  • If product/view/{id} traffic is heavy, then other URIs still should face metrics disappearing or inaccurate, as OAP is overloaded.
  • But if product/view/{id} traffic is not that much, we should allow those metrics, or some new URI, such as profile/view to flow in.

Is this explanation clearer? I hope this limitation could be periodically reset, rather than permanently to protect a service only until OAP reboot. That may face more complain in the end user environments.

@wu-sheng
Copy link
Member

The latest version should be good. Let me know if you will add more.

@wankai123 Please take a look.

Copy link
Member

@wankai123 wankai123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kezhenxu94 kezhenxu94 marked this pull request as ready for review November 26, 2023 06:47
@wu-sheng wu-sheng added this to the 9.7.0 milestone Nov 27, 2023
@wu-sheng wu-sheng added backend OAP backend related. feature New feature labels Nov 27, 2023
@wu-sheng wu-sheng merged commit 173c290 into apache:master Nov 27, 2023
167 checks passed
@kezhenxu94 kezhenxu94 deleted the endpointdefense branch November 27, 2023 06:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend OAP backend related. feature New feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants