Promote Improved multi-numa alignment in Topology Manager to beta #4079
Conversation
PiotrProkop commented on Jun 12, 2023
- One-line PR description: Promote Improved multi-numa alignment in Topology Manager to beta
- Issue link: Improved multi-numa alignment in Topology Manager #3545
- Other comments:
/cc
Minor: you may want to check the Release Signoff Checklist
to see if you can/should check more items.
It seems to me there's a TBD left in the Production Readiness Questionnaire, could you please check?
Thanks for the review. Updated both sections.
👋 Taking a look at this as a PRR shadow, and I had a general question about troubleshooting and about operators understanding that the feature is working correctly.
There is a fair bit of detail on how someone can verify it's working by comparing specific workloads, but there isn't much detail on cluster operators really understanding this (including during upgrade/downgrade). Could you provide a little more thought/insight from that perspective? Specifically on a rollout, if the kubelet fails to start or crashes, how do we determine that those scenarios are due to this enhancement?
```diff
 N/A.

 ###### What are other known failure modes?

-TBD.
+No known failure modes.
```
On line 372, under the rollout / rollback section, this is mentioned:
> Kubelet may fail to start. The kubelet may crash.

Is that statement valid, and if so, could we identify what those failure modes might be? How does someone recover from that failure mode?
Thanks for pointing that out. I think there are only two scenarios where the kubelet can crash due to this feature:
- A bad policy option name. We already log an appropriate error for this; to recover, one just has to provide a correct option name or disable `TopologyManagerPolicyOptions`.
- cadvisor is not exposing distances for NUMA domains. We log this as well; to recover, one has to disable `TopologyManagerPolicyOptions`.

I'll update the KEP with those steps.
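For context, the options discussed here are set through the kubelet configuration file. The following is a sketch of what such a configuration might look like (field names should be verified against your Kubernetes version; `prefer-closest-numa-nodes` is the option introduced by this KEP):

```yaml
# Sketch of a KubeletConfiguration fragment enabling the feature.
# Verify field names and feature-gate names for your Kubernetes version.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  TopologyManagerPolicyOptions: true
topologyManagerPolicy: best-effort
topologyManagerPolicyOptions:
  # A misspelled key here is the "bad policy option name" scenario
  # described above: the kubelet would fail at startup and log an error.
  prefer-closest-numa-nodes: "true"
```

With this option set, the Topology Manager prefers sets of NUMA nodes with shorter inter-node distances when aligning resources, which is where the dependency on cadvisor-reported NUMA distances (the second failure mode above) comes from.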
Section updated.
Signed-off-by: pprokop <pprokop@nvidia.com>
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: dchen1107, johnbelamaric, PiotrProkop. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.