Updating malachite metric for supporting model inference #368
Codecov Report
Additional details and impacted files
@@ Coverage Diff @@
## main #368 +/- ##
==========================================
- Coverage 53.59% 53.57% -0.03%
==========================================
Files 444 445 +1
Lines 49021 49349 +328
==========================================
+ Hits 26273 26437 +164
- Misses 19791 19916 +125
- Partials 2957 2996 +39
@luomingmeng this PR changes a lot of the fundamental metric info in meta-server; should it be tested carefully before it is merged?
containerInfo.PodNamespace, containerInfo.PodName, containerInfo.ContainerName)
klog.Warningf("getContainerFeatureValue for pod: %s/%s, container: %s failed, err: %v",
	containerInfo.PodNamespace, containerInfo.PodName, containerInfo.ContainerName, err)
goto IgnoreThisContainer
Is anyone against using "goto"?
There are already some goto statements in the code, so I don't mind.
c.InferenceServiceSocketAbsPath = o.InferenceServiceSocketAbsPath
c.NodeFeatureNames = o.NodeFeatureNames
NewBorweinConfiguration has some TODO comments; please delete them.
@@ -31,4 +31,5 @@ type BorweinParameter struct {

// the first key is the pod UID
Add a prefix to this comment.
if respContainersCnt != len(requestContainers) {
	return nil, fmt.Errorf("count of resp containers: %d and request containers: %d are not same",
	klog.Warningf("count of resp containers: %d and request containers: %d are not same",
If respContainersCnt != len(requestContainers), the likelihood of an abnormal state may increase because the denominator may shrink, so we should still return an error here.
And if some containers lack essential features, we still run inference on them forcibly.
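The count check discussed above could look roughly like the following. This is only a sketch: `containerResult` and `checkContainerCounts` are hypothetical names for illustration, not identifiers from this PR.

```go
package main

import (
	"fmt"
)

// containerResult is a hypothetical stand-in for one per-container
// inference result returned by the inference server.
type containerResult struct {
	PodUID        string
	ContainerName string
}

// checkContainerCounts returns an error when the response does not contain
// exactly one result per requested container, instead of only logging a warning.
func checkContainerCounts(requested []string, resp []containerResult) error {
	if len(resp) != len(requested) {
		return fmt.Errorf("count of resp containers: %d and request containers: %d are not same",
			len(resp), len(requested))
	}
	return nil
}

func main() {
	requested := []string{"c1", "c2"}
	resp := []containerResult{{PodUID: "uid-1", ContainerName: "c1"}}
	if err := checkContainerCounts(requested, resp); err != nil {
		fmt.Println("rejecting response:", err)
	}
}
```

Returning the error keeps a partial response from being treated as a full one when later ratios are computed over the container count.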
}

// AddFlags adds flags to the specified FlagSet.
func (o *BorweinOptions) AddFlags(fs *pflag.FlagSet) {
	fs.StringVar(&o.InferenceServiceSocketAbsPath, "borwein-inference-svc-socket-path", o.InferenceServiceSocketAbsPath,
		"socket path which borwein inference server listens at")
	fs.StringSliceVar(&o.NodeFeatureNames, "borwein-node-feature-names", o.NodeFeatureNames,
After discussion, we use a config file to configure NodeFeatureNames and ContainerFeatureNames.
if err != nil {
	return nil, fmt.Errorf("getContainerFeatureValue for pod: %s/%s, container: %s failed",
After discussion:
- getContainerFeatureValue should be refined to return a nil error and a zero value when the input feature isn't a key feature
- if err != nil, we just return err here to avoid sending a meaningless request to the inference server
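The two points agreed on above might be sketched like this. The names `lookupFeature`, `collectFeatures`, `errFeatureMissing`, and the feature names are hypothetical and only illustrate the behavior; the real getContainerFeatureValue signature is not shown in this thread.

```go
package main

import (
	"errors"
	"fmt"
)

// errFeatureMissing is a hypothetical sentinel error used only in this sketch.
var errFeatureMissing = errors.New("feature value not available")

// lookupFeature returns the value of one feature.
// A missing non-key feature yields (0, nil); a missing key feature yields an error.
func lookupFeature(values map[string]float64, keyFeatures map[string]bool, name string) (float64, error) {
	if v, ok := values[name]; ok {
		return v, nil
	}
	if keyFeatures[name] {
		return 0, fmt.Errorf("key feature %q: %w", name, errFeatureMissing)
	}
	return 0, nil
}

// collectFeatures gathers all requested features and stops at the first error,
// so the caller can skip the request to the inference server entirely.
func collectFeatures(values map[string]float64, keyFeatures map[string]bool, names []string) (map[string]float64, error) {
	out := make(map[string]float64, len(names))
	for _, name := range names {
		v, err := lookupFeature(values, keyFeatures, name)
		if err != nil {
			return nil, err
		}
		out[name] = v
	}
	return out, nil
}

func main() {
	values := map[string]float64{"cpu_usage": 0.42}
	keyFeatures := map[string]bool{"cpu_usage": true, "mem_usage": true}
	if _, err := collectFeatures(values, keyFeatures, []string{"cpu_usage", "mem_usage"}); err != nil {
		fmt.Println("skip inference request:", err)
	}
}
```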
Several things have been done in this PR:
NewBorweinConfiguration
cpu.nr.throttled.container, cpu.nr.period.container, cpu.throttled.time.container, etc. These metrics are meaningless as raw data, so they are fed into model training in the form of rates.
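As an illustration of the rate form mentioned above, a cumulative counter can be converted into a per-second rate from two consecutive samples. This is a minimal sketch under assumptions: the `sample` type and `rate` function are hypothetical, and the PR's actual rate calculation may differ (for example in how it handles counter resets).

```go
package main

import (
	"fmt"
)

// sample is a hypothetical reading of a cumulative counter
// (e.g. a throttled-period count) taken at a given time.
type sample struct {
	Value        float64 // cumulative counter value
	TimestampSec float64 // time of the reading, in seconds
}

// rate converts two consecutive samples into a per-second rate.
// It reports false when no rate can be derived: non-increasing time,
// or a counter that went backwards (e.g. after a container restart).
func rate(prev, cur sample) (float64, bool) {
	elapsed := cur.TimestampSec - prev.TimestampSec
	if elapsed <= 0 || cur.Value < prev.Value {
		return 0, false
	}
	return (cur.Value - prev.Value) / elapsed, true
}

func main() {
	prev := sample{Value: 100, TimestampSec: 10}
	cur := sample{Value: 130, TimestampSec: 20}
	if r, ok := rate(prev, cur); ok {
		fmt.Printf("rate per second: %.2f\n", r)
	}
}
```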