-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
orca: add application utilization and range checking #6357
Conversation
// DeleteMemoryUtilization deletes the relevant server metric to prevent it | ||
// from being sent. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit on these docstrings and also the interface docstrings. Most of them define "the relevant server metric", but I feel like it should just be as specific as "for the memory utilization metric" as on line 211 for all docstrings because it maps 1:1 with the name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it didn't feel like any documentation at all should be necessary, and was redundant...Changed, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I agree it's redundant but Go lint
} | ||
|
||
// SetMemoryUtilization records a measurement for the memory utilization metric. | ||
func (s *serverMetricsRecorder) SetMemoryUtilization(val float64) { | ||
if val < 0 || val > 1 { | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and all other places with a range validation on this serverMetricsRecorder: do we want to plumb a logger in here and log in a non spammy way that we're ignoring an out of range metric, similar to how xDS Client gives info if nack, or say CDS encounters an error in it's serial handling of Client Conn State that returns no error, it logs the reason why the Client Conn State wasn't processed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
mergeMap(sm.Utilization, o.Utilization) | ||
mergeMap(sm.RequestCost, o.RequestCost) | ||
mergeMap(sm.NamedMetrics, o.NamedMetrics) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if one of the values of this map is also -1 like below? Do we still want to overwrite the value for the key in the map? Should we ignore as below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For maps, -1
has no special meaning. Deleting from the map is how you delete the value. -1 is unique to the fields that are scalar values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok.
@@ -63,21 +67,24 @@ func (sm *ServerMetrics) toLoadReportProto() *v3orcapb.OrcaLoadReport { | |||
|
|||
// merge merges o into sm, overwriting any values present in both. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't correct? You overwrite any values for keys in map unconditionally, but only overwrite sm's memory if it's set. I guess if an explicit -1 means unset 1:1 this is right. I just don't know all the invariants of this system/how this data is plumbed around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You overwrite any values for keys in map unconditionally, but only overwrite sm's memory if it's set.
No? I range over o.<map field>
and add all those key/value pairs into sm.<map field>
. It's a merge, not a replacement, so the idea is to keep anything in sm
that's not in o
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh my bad. I wrote this assuming there was a distinction on map values between -1 and !-1. I guess values refers to both map values and top level struct values.
@@ -444,8 +444,8 @@ func (s) TestProducerMultipleListeners(t *testing.T) { | |||
// Define a load report to send and expect the client to see. | |||
loadReportWant := &v3orcapb.OrcaLoadReport{ | |||
CpuUtilization: 10, | |||
MemUtilization: 100, | |||
Utilization: map[string]float64{"bob": 555}, | |||
MemUtilization: 0.1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that you had to move these floats to be within 0 and 1, can you please add a test for the case it isn't within range and also set to -1 wrt overwriting behavior in merge(). This will also show what happens at this producer layer if these cases happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have range checks in the new tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know, but what happens at this layer? Just ignored? I want it tested at this layer too, or are you saying specific behaviors are fine to encapsulate to it's granular components units.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens at this layer is, it directly uses the other layer. So tests there for corner-casey details like that are sufficient. These tests are about testing the ORCA producer, not the semantics of the server metrics recorder.
orca/server_metrics_test.go
Outdated
smr.SetEPS(-0.6) | ||
smr.SetNamedUtilization("x", -2) | ||
|
||
// Memory and named utilizations are max of 1. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this should be ignored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarified.
if d := cmp.Diff(got, want); d != "" { | ||
t.Fatalf("unexpected server metrics: -got +want: %v", d) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After thinking about my comment on merge() wrt overwriting at each layer (maps and top level metrics), can you also add tests for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added unit tests for merge.. It is already covered implicitly by the call metrics tests that use it, though.
@@ -60,16 +60,18 @@ func (t *testServiceImpl) UnaryCall(context.Context, *testpb.SimpleRequest) (*te | |||
t.requests++ | |||
t.mu.Unlock() | |||
|
|||
t.smr.SetNamedUtilization(requestsMetricKey, float64(t.requests)) | |||
t.smr.SetNamedUtilization(requestsMetricKey, float64(t.requests)*0.01) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like my comment above, what happens in this layer if these metrics aren't within range? Can you please add a test to show what behavior is going to be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is covered in the unit tests. It's just calls into the same object so if that works right then it must work right here, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This relates to discussions we've had and I've had with Eric about specific functionality scoped to the granular component for testing, and higher layers just do sanity to avoid excessive higher level tests. I'm actually running into this problem with my od change, since ParseConfig gets invoked at every layer of unit test, whether it's cds, cluster resolver, and obv. on the od ParseConfig test itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...Which is why I was saying to test the proto->JSON conversion as a unit test so you can get all the corner cases in there and not worry about it when testing the higher level logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
@@ -193,6 +195,9 @@ func copyMap(m map[string]float64) map[string]float64 { | |||
// SetCPUUtilization records a measurement for the CPU utilization metric. | |||
func (s *serverMetricsRecorder) SetCPUUtilization(val float64) { | |||
if val < 0 { | |||
if logger.V(2) { | |||
logger.Infof("Ignoring CPU Utilization value out of range: %v", val) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Log valid range here and elsewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ehh, they'll figure it out :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"[0, infy)" this is very few characters to add to logs lol
|
||
sm2 := &ServerMetrics{ | ||
CPUUtilization: -1, | ||
AppUtilization: 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunate that MemUtilization's zero value = set valid metric which overwrites, but can't do anything about that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the only option would be to use pointers and I don't really want to do that either. We basically control everything around this struct and the values of the fields, though, so it's pretty safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah as long as we control it and we make sure it's good. Pointers would add a lot of complexity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Implement the new field described here, and add missing range checks and tests.
RELEASE NOTES: none