-
Notifications
You must be signed in to change notification settings - Fork 6.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Group service endpoints into _abandoned
when endpoints have high ca…
#11588
Conversation
...e/src/main/java/org/apache/skywalking/oap/server/core/config/group/EndpointNameGrouping.java
Show resolved
Hide resolved
...e/src/main/java/org/apache/skywalking/oap/server/core/config/group/EndpointNameGrouping.java
Outdated
Show resolved
Hide resolved
private final Map<String/* service */, Map<String/* uri */, Queue<String>/* candidate patterns */>> cachedHttpUris = new ConcurrentHashMap<>(); | ||
private final Map<String/* service */, Set<String>/* unformatted uris */> unformattedHttpUris = new ConcurrentHashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, these are two groups, one for formated, the other for unformated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. The old one is previous logic that isn't changed. I just add some comment to the original codes for readability. The new one is only for unformatted endpoints. There is only one new variable, the other is old and never changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, work for me. Could you run a benchmark to see the performance of the latest version? And we could estimate the memory cost for 200 services all reaching maxHttpUrisNumberPerService limitation, and how the performance when another endpoint comes in at this moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another question about unformattedHttpUris
, there seems no clean up mechanism for it. What would happen after maxHttpUrisNumberPerService limitation is reached? Currently, I feel all following URIs should be _abandoned
forever. Ideally, we should be able to reset the cache, and allow some new flowing in, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another question about
unformattedHttpUris
, there seems no clean up mechanism for it. What would happen after maxHttpUrisNumberPerService limitation is reached? Currently, I feel all following URIs should be_abandoned
forever. Ideally, we should be able to reset the cache, and allow some new flowing in, right?
As I said, unformattedHttpUris
doesn't impact the original logic if the the endpoints can fit a pattern, let me make it more clear:
- If the endpoint can be formatted by any of
openapi spec
,endpoint-name-grouping.yml
, or theAI pipeline
, everything is the same as before, - if the endpoint cannot be formatted, it falls back to the case in this PR, and the endpoint will be counted by service, if the count <
maxHttpUrisNumberPerService
everything is the same as before - if the count >=
maxHttpUrisNumberPerService
, the endpoint will be formatted as_abandoned
The mechanism added in this PR is a fallback ONLY WHEN the endpoint cannot be formatted by existing rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that makes sense. My asking is more about another edge case. Let's say there is a service having 5 URIs, but only one of 5 is parameterized and never formatted. In the current logic, we would begin to see all endpoint traffic end in _abandoned
including. I am thinking, whether we should allow some raw URI gets passed to have a little metric in the endpoint traffic with data, rather than all five falls into the _abandoned
Note, in this edge case, other 4 URIs have no grouping rules, just pure raw URIs without parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that makes sense. My asking is more about another edge case. Let's say there is a service having 5 URIs, but only one of 5 is parameterized and never formatted. In the current logic, we would begin to see all endpoint traffic end in
_abandoned
including. I am thinking, whether we should allow some raw URI gets passed to have a little metric in the endpoint traffic with data, rather than all five falls into the_abandoned
Note, in this edge case, other 4 URIs have no grouping rules, just pure raw URIs without parameters.
Looks like you still read it wrong. Let me take your example to explain again.
GIVEN:
maxHttpUrisNumberPerService == 3
- the service has five URIs,
products/1
,product/2
,product/3
,product/4
,/users/1
- there are patterns for
users/{userId}
but not forproduct/*
SO:
products/1
,product/2
,product/3
are unformatted and they keep the raw endpoint namesproducts/1
,product/2
,product/3
- when
product/4
comes to OAP, it cannot be formatted, and becausemaxHttpUrisNumberPerService >= 3
,product/4
becomes_abandoned
- when
/users/1
comes to OAP, because it can be formatted into/users/{userId}
, it WON'T be grouped into_abandoned
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your cases are correct and as expected from the codes.
My case is product/sale
, product/update
, info/list
, auth/login
, and product/view/{id}
, and there is no pattern existing by default. So, with URIs of product/view/{id}
flow in, all other URIs will have no metrics as well. Because at those moments we don't have setten a pattern to group them.
So, my asking to enhance is that,
- If
product/view/{id}
traffic is heavy, then other URIs still should face metrics disappearing or inaccurate, as OAP is overloaded. - But if
product/view/{id}
traffic is not that much, we should allow those metrics, or some new URI, such asprofile/view
to flow in.
Is this explanation clearer? I hope this limitation could be periodically reset, rather than permanently to protect a service only until OAP reboot. That may face more complain in the end user environments.
The latest version should be good. Let me know if you will add more. @wankai123 Please take a look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…rdinality
CHANGES
log.