Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

orca: add application utilization and range checking #6357

Merged
merged 2 commits into from
Jun 7, 2023

Conversation

dfawley
Copy link
Member

@dfawley dfawley commented Jun 7, 2023

Implement the new field described here, and add missing range checks and tests.

RELEASE NOTES: none

@dfawley dfawley added the Type: Feature New features or improvements in behavior label Jun 7, 2023
@dfawley dfawley added this to the 1.56 Release milestone Jun 7, 2023
@dfawley dfawley requested a review from zasweq June 7, 2023 17:14
Comment on lines 221 to 222
// DeleteMemoryUtilization deletes the relevant server metric to prevent it
// from being sent.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit on these docstrings and also the interface docstrings. Most of them define "the relevant server metric", but I feel like it should just be as specific as "for the memory utilization metric" as on line 211 for all docstrings because it maps 1:1 with the name.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it didn't feel like any documentation at all should be necessary, and was redundant...Changed, though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I agree it's redundant but Go lint

}

// SetMemoryUtilization records a measurement for the memory utilization metric.
func (s *serverMetricsRecorder) SetMemoryUtilization(val float64) {
if val < 0 || val > 1 {
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and all other places with a range validation on this serverMetricsRecorder: do we want to plumb a logger in here and log in a non spammy way that we're ignoring an out of range metric, similar to how xDS Client gives info if nack, or say CDS encounters an error in it's serial handling of Client Conn State that returns no error, it logs the reason why the Client Conn State wasn't processed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines +70 to +72
mergeMap(sm.Utilization, o.Utilization)
mergeMap(sm.RequestCost, o.RequestCost)
mergeMap(sm.NamedMetrics, o.NamedMetrics)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if one of the values of this map is also -1 like below? Do we still want to overwrite the value for the key in the map? Should we ignore as below?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For maps, -1 has no special meaning. Deleting from the map is how you delete the value. -1 is unique to the fields that are scalar values.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok.

@@ -63,21 +67,24 @@ func (sm *ServerMetrics) toLoadReportProto() *v3orcapb.OrcaLoadReport {

// merge merges o into sm, overwriting any values present in both.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't correct? You overwrite any values for keys in map unconditionally, but only overwrite sm's memory if it's set. I guess if an explicit -1 means unset 1:1 this is right. I just don't know all the invariants of this system/how this data is plumbed around.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You overwrite any values for keys in map unconditionally, but only overwrite sm's memory if it's set.

No? I range over o.<map field> and add all those key/value pairs into sm.<map field>. It's a merge, not a replacement, so the idea is to keep anything in sm that's not in o.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh my bad. I wrote this assuming there was a distinction on map values between -1 and !-1. I guess values refers to both map values and top level struct values.

@@ -444,8 +444,8 @@ func (s) TestProducerMultipleListeners(t *testing.T) {
// Define a load report to send and expect the client to see.
loadReportWant := &v3orcapb.OrcaLoadReport{
CpuUtilization: 10,
MemUtilization: 100,
Utilization: map[string]float64{"bob": 555},
MemUtilization: 0.1,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that you had to move these floats to be within 0 and 1, can you please add a test for the case it isn't within range and also set to -1 wrt overwriting behavior in merge(). This will also show what happens at this producer layer if these cases happen.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have range checks in the new tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know, but what happens at this layer? Just ignored? I want it tested at this layer too, or are you saying specific behaviors are fine to encapsulate to it's granular components units.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens at this layer is, it directly uses the other layer. So tests there for corner-casey details like that are sufficient. These tests are about testing the ORCA producer, not the semantics of the server metrics recorder.

smr.SetEPS(-0.6)
smr.SetNamedUtilization("x", -2)

// Memory and named utilizations are max of 1.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this should be ignored.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarified.

if d := cmp.Diff(got, want); d != "" {
t.Fatalf("unexpected server metrics: -got +want: %v", d)
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After thinking about my comment on merge() wrt overwriting at each layer (maps and top level metrics), can you also add tests for that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added unit tests for merge.. It is already covered implicitly by the call metrics tests that use it, though.

@@ -60,16 +60,18 @@ func (t *testServiceImpl) UnaryCall(context.Context, *testpb.SimpleRequest) (*te
t.requests++
t.mu.Unlock()

t.smr.SetNamedUtilization(requestsMetricKey, float64(t.requests))
t.smr.SetNamedUtilization(requestsMetricKey, float64(t.requests)*0.01)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like my comment above, what happens in this layer if these metrics aren't within range? Can you please add a test to show what behavior is going to be.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is covered in the unit tests. It's just calls into the same object so if that works right then it must work right here, too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This relates to discussions we've had and I've had with Eric about specific functionality scoped to the granular component for testing, and higher layers just do sanity to avoid excessive higher level tests. I'm actually running into this problem with my od change, since ParseConfig gets invoked at every layer of unit test, whether it's cds, cluster resolver, and obv. on the od ParseConfig test itself.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...Which is why I was saying to test the proto->JSON conversion as a unit test so you can get all the corner cases in there and not worry about it when testing the higher level logic.

Copy link
Contributor

@zasweq zasweq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@@ -193,6 +195,9 @@ func copyMap(m map[string]float64) map[string]float64 {
// SetCPUUtilization records a measurement for the CPU utilization metric.
func (s *serverMetricsRecorder) SetCPUUtilization(val float64) {
if val < 0 {
if logger.V(2) {
logger.Infof("Ignoring CPU Utilization value out of range: %v", val)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Log valid range here and elsewhere?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ehh, they'll figure it out :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"[0, infy)" this is very few characters to add to logs lol


sm2 := &ServerMetrics{
CPUUtilization: -1,
AppUtilization: 0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunate that MemUtilization's zero value = set valid metric which overwrites, but can't do anything about that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the only option would be to use pointers and I don't really want to do that either. We basically control everything around this struct and the values of the fields, though, so it's pretty safe.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah as long as we control it and we make sure it's good. Pointers would add a lot of complexity.

Copy link
Contributor

@zasweq zasweq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@zasweq zasweq assigned dfawley and unassigned zasweq Jun 7, 2023
@dfawley dfawley merged commit 7aeea8f into grpc:master Jun 7, 2023
1 check passed
@dfawley dfawley deleted the app_util branch June 7, 2023 20:31
dfawley added a commit to dfawley/grpc-go that referenced this pull request Jun 7, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 5, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Type: Feature New features or improvements in behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants