Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Isolate MAL CounterWindow cache by metric name. #11526

Merged

Conversation

weixiang1862
Copy link
Member

@weixiang1862 weixiang1862 commented Nov 9, 2023

Fix MAL CounterWindow Isolation.

  • Add a unit test to verify that the fix works.
  • Explain briefly why the bug exists and how to fix it.

@weixiang1862
Copy link
Member Author

Test not ready, I want confirm code first.

@wu-sheng wu-sheng added backend OAP backend related. metrics Metrics and meter labels Nov 9, 2023
@wu-sheng wu-sheng added this to the 9.7.0 milestone Nov 9, 2023
@wu-sheng wu-sheng added backend OAP backend related. and removed backend OAP backend related. labels Nov 9, 2023
@wu-sheng
Copy link
Member

wu-sheng commented Nov 9, 2023

Please make sure the codes work first. Otherwise, we don't know it is missing change or on purpose.

@weixiang1862
Copy link
Member Author

Please make sure the codes work first. Otherwise, we don't know it is missing change or on purpose.

Sorry, I will complete test first.

@weixiang1862
Copy link
Member Author

Unit Test is ready.

return Optional.ofNullable(CACHE.get());
}

private final static ThreadLocal<ExpressionRunningContext> CACHE = new ThreadLocal<>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, you are forwarding the metric cache into this runtime CACHE for MAL runtime? Is this your basic principle?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, ThreadLocal has sampleFamilies before, I newly add metric name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The metrics name is from the YAML, which is constant instead of a variable value. It's better to add it to the "ExpressionDelegate" as a field. Then inject the metric name into the samplefamilily in "propertyMissing"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hanahmily Ok, thanks. I will try this way.

@wu-sheng
Copy link
Member

wu-sheng commented Nov 9, 2023

Let's wait for others to review this. I am not familiar with this part of the codes.

@weixiang1862
Copy link
Member Author

Let's wait for others to review this. I am not familiar with this part of the codes.

Ok.

@wu-sheng
Copy link
Member

wu-sheng commented Nov 9, 2023

One question, if two expressions share the same metric source(counter), but use different rate, how does the cache keep the longer time windows for the further calculation?

@weixiang1862
Copy link
Member Author

One question, if two expressions share the same metric source(counter), but use different rate, how does the cache keep the longer time windows for the further calculation?

Two expression will create two cache in window, one for shorter and one for longer.

@weixiang1862
Copy link
Member Author

One question, if two expressions share the same metric source(counter), but use different rate, how does the cache keep the longer time windows for the further calculation?

Two expression will create two cache in window, one for shorter and one for longer.

Seems use metric name as id name will cause memory waste, expressions can not share cache in some cases?

@wu-sheng
Copy link
Member

One question, if two expressions share the same metric source(counter), but use different rate, how does the cache keep the longer time windows for the further calculation?

Two expression will create two cache in window, one for shorter and one for longer.

Seems use metric name as id name will cause memory waste, expressions can not share cache in some cases?

Yes, but it seems easier at the implementation level. Do you have idea to optimize it by using the longer range?

@weixiang1862
Copy link
Member Author

Seems use metric name as id name will cause memory waste, expressions can not share cache in some cases?

Yes, but it seems easier at the implementation level. Do you have idea to optimize it by using the longer range?

I have no idea about optimize yet.

@wu-sheng
Copy link
Member

OK, let's keep the status for now. I think this not a big issue of a little more extra memory cost.

@weixiang1862
Copy link
Member Author

weixiang1862 commented Nov 10, 2023

Seems use metric name as id name will cause memory waste, expressions can not share cache in some cases?

Yes, but it seems easier at the implementation level. Do you have idea to optimize it by using the longer range?

How about use expression prefix as Id name and maintain a field in window to record lower bound calculate by long rate,
then metrics can share same cache if they has same expression prefix and window just need keep the longer one cache?

For example:

- name: http_requests_pt1m
  exp: nginx_http_requests_total.sum(['service']).rate('PT1M')
- name: http_requests_pt1h
  exp: nginx_http_requests_total.sum(['service']).rate('PT1H')
- name: http_4xx_requests
  exp: nginx_http_requests_total.tagMatch("status", "400|401|403|404|405").sum(['service']).rate('PT1M')

the 3 metrics above will create window by 2 ids, and the first one window length is 1h, the second one window length is 1M:

ID(name=nginx_http_requests_total, labels={service=nginx::portal})
ID(name=nginx_http_requests_total.tagMatch("status", "400|401|403|404|405"), labels={service=nginx::portal})

@wu-sheng
Copy link
Member

Let's not bring this complexity into the codes this time. Let's keep one expression per cache if it is a counter.

@weixiang1862
Copy link
Member Author

Let's not bring this complexity into the codes this time. Let's keep one expression per cache if it is a counter.

Ok, I will follow this.

@weixiang1862
Copy link
Member Author

The metrics name is from the YAML, which is constant instead of a variable value. It's better to add it to the "ExpressionDelegate" as a field. Then inject the metric name into the samplefamilily in "propertyMissing"

I have add a field named metricName in ExpressionDelegate and inject it to samplefamily runningcontext in "propertyMissing"

@wu-sheng
Copy link
Member

@kezhenxu94 @wankai123 Could any of you run another check?

@wu-sheng wu-sheng merged commit 019c6fe into apache:master Nov 10, 2023
166 checks passed
@weixiang1862
Copy link
Member Author

Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend OAP backend related. metrics Metrics and meter
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants