Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add documentation for Scheduler performance tuning #10048

Merged
merged 2 commits into from
Sep 12, 2018

Conversation

bsalamat
Copy link
Member

@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 22, 2018
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 22, 2018
@bsalamat bsalamat changed the base branch from master to release-1.12 August 22, 2018 18:59
@k8sio-netlify-preview-bot
Copy link
Collaborator

Deploy preview for kubernetes-io-master-staging ready!

Built with commit ec7aaf1

https://deploy-preview-10048--kubernetes-io-master-staging.netlify.com

@bsalamat bsalamat force-pushed the scheduler-perf-opt branch from ec7aaf1 to 1cec270 Compare August 22, 2018 19:04
@k8sio-netlify-preview-bot
Copy link
Collaborator

k8sio-netlify-preview-bot commented Aug 22, 2018

Deploy preview for kubernetes-io-vnext-staging processing.

Built with commit 09fb770

https://app.netlify.com/sites/kubernetes-io-vnext-staging/deploys/5b988bac82d3f16944598814

@bsalamat bsalamat force-pushed the scheduler-perf-opt branch from 1cec270 to 597089d Compare August 22, 2018 20:40
@Bradamant3 Bradamant3 added this to the 1.12 milestone Aug 27, 2018
@bsalamat
Copy link
Member Author

bsalamat commented Sep 6, 2018

/assign @ravisantoshgudimetla

Hi Ravi,
could you please help us review this PR?


{{< feature-state for_k8s_version="1.12" >}}

Kube-scheduler is the Kubernetes default scheduler. It is responsible for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this statement?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is in the overview section. It would be helpful for someone unfamiliar with the topic to know what they are reading about.

Kube-scheduler is the Kubernetes default scheduler. It is responsible for
placement of Pods on proper Nodes of a cluster. Nodes of a cluster that meet the
scheduling requirements of a Pod are called feasible Nodes for the Pod. The
scheduler find feasible Nodes for a Pod and then runs a set of functions to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/find/finds

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

{{% capture overview %}}

{{< feature-state for_k8s_version="1.12" >}}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or probably, name this paragraph as background.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is in the overview section. In all of our documents the first section serves as the overview/background.

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 7, 2018
Copy link
Member Author

@bsalamat bsalamat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{{% capture overview %}}

{{< feature-state for_k8s_version="1.12" >}}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is in the overview section. In all of our documents the first section serves as the overview/background.


{{< feature-state for_k8s_version="1.12" >}}

Kube-scheduler is the Kubernetes default scheduler. It is responsible for
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is in the overview section. It would be helpful for someone unfamiliar with the topic to know what they are reading about.

Kube-scheduler is the Kubernetes default scheduler. It is responsible for
placement of Pods on proper Nodes of a cluster. Nodes of a cluster that meet the
scheduling requirements of a Pod are called feasible Nodes for the Pod. The
scheduler find feasible Nodes for a Pod and then runs a set of functions to
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

score a higher value for running the given Pod might not even be passed to the
scoring phase. This would result in a less than ideal placement of the Pod. For
this reason, the value should not be set to very low percentages. A general rule
of thumb is to never set the value to anything lower than 30. Lower values
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lower than 30 in any cluster?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is percentage. So, general rule of thumb is not to set this value below 30% unless you know what you are doing.

Copy link
Contributor

@ravisantoshgudimetla ravisantoshgudimetla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bsalamat Mostly LGTM except for nits.

scheduler continues from the point in the array that it stopped when checking
feasibility of the previous Pod.

If Nodes are in multiple zones, the scheduler iterates over Nodes in various
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, we can mention the side-effect of having unequal number of nodes in zones causing unequal spread.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unequal number of nodes in zones shouldn't be a problem. It of course causes more pods to be scheduled in zones with larger number of zones, but that's expected and not different than before.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Sep 7, 2018
@zparnold
Copy link
Member

@bsalamat @ravisantoshgudimetla Is this doc in a "technical" shape (i.e. ready for my review from a english/grammar perspective?) I'd like to get this merged this week ideally. 😄


{{% capture overview %}}

{{< feature-state for_k8s_version="1.12" >}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we show its version state here ? e.g. :
{{< feature-state for_k8s_version="v1.12" state="alpha" >}}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not include an API change. It is a performance improvement shipped with 1.12. So, the backward compatibility and its associated guarantees that exist with alpha, beta, GA features do not apply to this one.


{{% capture body %}}

## Percentage of Nodes To Score
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/To/to ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

scheduler still checks all the nodes, simply because there are not enough
feasible nodes to stop the scheduler's search early. {{< /note >}}

**To disable this feature**, you can set `percentageOfNodesToScore` to 100.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a feature gate for this feature?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no feature gate, but as is mentioned in the doc, it can be practically disabled by setting the percentage to 100.

Copy link
Contributor

@ravisantoshgudimetla ravisantoshgudimetla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Thanks @bsalamat for the PR. The PR LGTM

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 10, 2018
@ravisantoshgudimetla
Copy link
Contributor

Is this doc in a "technical" shape (i.e. ready for my review from a english/grammar perspective?) I'd like to get this merged this week ideally. smile

@zparnold - This looks good to me from technical perspective. Thanks

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 10, 2018
@justaugustus
Copy link
Member

@bsalamat -- what's the status here? Docs are overdue at this point and we need to merge ASAP.
@ravisantoshgudimetla -- if this still looks good from the technical, please re-/lgtm when you have a chance.

@bsalamat
Copy link
Member Author

The PR is ready. I asked Ravi on Slack to re-lgtm it.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 11, 2018
Copy link
Contributor

@ravisantoshgudimetla ravisantoshgudimetla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 12, 2018
@zparnold
Copy link
Member

@bsalamat I updated this PR with some style/grammar points. Let me know if it still means what you meant for it to. Otherwise it's ready to merge.
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: zparnold

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 12, 2018
@bsalamat
Copy link
Member Author

@zparnold Your changes look good to me. Thanks for the edit!

@zparnold
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 12, 2018
@k8s-ci-robot k8s-ci-robot merged commit 9d3671d into kubernetes:release-1.12 Sep 12, 2018
k8s-ci-robot pushed a commit that referenced this pull request Sep 27, 2018
* Update docs for fields allowed at root of CRD schema (#9973)

* add plugin docs and examples (#10053)

* docs update to promote TaintNodesByCondition to beta (#9626)

* HPA Specificity Improvements (#8757)

Updated the HPA docs to reference the `autoscaling/v2beta2` API version,
and added documentation about the new fields.

* adjust docs for pod ready++ (#10049)

* Remove --cadvisor-port - has been deprecated since v1.10 (#10023)

Change-Id: Id2a685473a243aef492a98ff450759f39e362557

* Add Documentation for Snapshot Feature (#9948)

* Add documentation for snapshot feature

* Update volume-snapshots.md

* Add dry-run to api-concepts (#10033)

* kubeadm-init: Update the offline support section (#10062)

The update includes the following things (in mind with Kubernetes 1.12):

- Remove the 1.8 image versions
- Add the 1.10 image versions that were missing until now
- Include a comment for the missing arch suffixes in 1.12

Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>

* Say bye to `DynamicProvisioningScheduling` (#10157)

The mentioned feature gate is now collapsed into `VolumeScheduling`.

xref: kubernetes/kubernetes#67432

* Update ResourceQuota per PriorityClass state for 1.12 (#10229)

* TokenRequest and TokenRequestProjection now beta (#10161)

xref: kubernetes/kubernetes#67349

* Change feature state for kms provider to beta. (#10230)

KMS Provider will be graduating to beta in v1.12, reflecting this change on the website.

* coredns default (#10200)

* Promote ShareProcessNamespace to beta in docs (#9996)

* Add CoreDNS details to DNS Debug docs (#10201)

* add coredns details

* address nits, add query logging section

* Update docs with topology aware dynamic provisioning (#9939)

* Document topology aware volume binding feature

* update for readability

* Update storage-classes.md

* comma splice

* don't abbreviate

* HPA Algorithm Information Improvements (#9780)

* Update HPA docs with more algorithm details

The HPA docs pointed to an out-of-date document for information on the
algorithm details, which users were finding confusing.  This sticks a
section on the algorithm in the HPA docs instead, documenting both
general behavior and corner cases.

* Add glossary info, HPA docs on quantities

People often ask about the quantity notation when working with the
metrics APIs, so this adds a glossary entry on quantities (since they're
used elsewhere in the system), and a short explantation in the HPA walkthough.

* Information about HPA readiness and stabilization

This adds information about the new changes to HPA readiness and
stabilization from kubernetes/enhancements#591, and other minor changes that
landed in Kubernetes 1.12.

* Update horizontal-pod-autoscale.md

* Audit 1.12 doc (#9953)

* audit 1.12 document

* remove legacy audit feature

kubernetes/kubernetes#65862

* update feature gate doc

* MountPropagation is now GA (#10090)

* RuntimeClass documentation (#10102)

* RuntimeClass documentation

* Update runtime-class.md

* Add documentation for Scheduler performance tuning (#10048)

* Add documentation for Scheduler performance tuning

* Update scheduler-perf-tuning.md

* TTL controller for cleaning up finished resources (#10064)

* TTL controller for cleaning up finished resources

* Address comments

* Update ttlafterfinished.md

* Bump quota configuration api version (#10217)

* Incremental update from master (#10278)

* fix invalid href of cloud controller manager (#10240)

* fix invalid yaml format (#10238)

* update storage-limits doc with Azure disk part (#10224)

update storage-limits doc with Azure disk part

fix comments

* Update kubelet-config-file.md (#10222)

Update link to KubeletConfiguration struct.

* fix a trivial misspelling (#10244)

* Fix cassandra-statefulset.yaml indent level (#10243)

* Mention minimum etcd versions (#10208)

Source: https://groups.google.com/d/msg/kubernetes-dev/jMPA4JzKiY4/HIx2ugvLBAAJ

* fix 404 error (#10250)

* Small verb tweak (#10190)

Present participle, ftw.

* Add AnchorJS logic for header links (#10155)

* Add AnchorJS JavaScript

* Remove existing inpage_heading logic

* Remove underline from anchor tags

* Use single icon and add touch visibility

* Use paragraph link icon for AnchorJS

* Update Sass to use code formatting in docsContent headers

* Update header size coverage to H3-H6

* fix broken link in kubefed.md (#10254)

* Update the version numbers for the X-Remote-Extra- and Impersonate-Extra- key fixes (#9827)

The fix was cherry picked into 1.11.3, 1.10.7, and 1.9.11:

kubernetes/kubernetes#67162
kubernetes/kubernetes#67163
kubernetes/kubernetes#67164

* fix typo (#10168)

* fix typo

* addressing comments.

* Update setup-ha-etcd-with-kubeadm.md

* fix typos (#10252)

* fix description of contribute guide (#10253)

* describe truncate feature about advanced audit (#10236)

* describe truncate feature about advanced audit

* Update audit.md

* docs update to promote ScheduleDaemonSetPods to beta (#9923)

* Dynamic volume limit updates for 1.12 (#10211)

* add a placeholder commit

* Update docs for csi volume limits

* Update storage-limits.md

* Add "MayRunAs" value among other GroupStrategies (#9888)

* Add CoreDNS details to the customize DNS doc (#10228)

* Add CoreDNS details to the customize DNS doc

Rewrite the document to include more details about CoreDNS, since it's now the default from v1.12

* Address comments

* Improve doc wording

* Fix link

* Update dns-custom-nameservers.md

* Update dns-custom-nameservers.md

* Fix secrets docs in 1.12 branch (#10056)

* Fix secrets docs

* Update secret.md

* Revert CoreDNS Docs (#10319)

* Revert "Add CoreDNS details to DNS Debug docs (#10201)"

This reverts commit 462817a.

* Revert "Add CoreDNS details to the customize DNS doc (#10228)"

This reverts commit e7319ee.

* Revert "coredns default (#10200)"

This reverts commit 698e93b.

* Add CRI installation instructions page

Added cri-installation page with CRI installation instructions
Referenced it from kubeadm-init and install-kubeadm pages.

* kubeadm: update API types documentation for 1.12 (#10283)

v1alpha2 -> v1alpha3
MasterConfiguration -> [new-api-types]

* TokenRequest feature documentation (#10295)

* AdvancedAuditing is now GA (#10156)

xref: kubernetes/kubernetes#65862

`AdvancedAuditing` feature is GA in 1.12. This PR adjusts the related
docs.

* update runtime-class.md (#10332)

* update runtime-class.md

* Update runtime-class.md

* Document cross-authorizer permissions for creating RBAC roles (#10015)

* Document cross-authorizer permissions for creating RBAC roles

* Update rbac.md

* kubeadm: update authored content for 1.12 (reference docs and cluster creation) (#10348)

* kubeadm: update authored content in reference docs for 1.12

* kubeadm: add time frame in create-cluster-kubeadm for 1.12

* add AllowedProcMountTypes and ProcMountType to docs (#9911)

Signed-off-by: Jess Frazelle <acidburn@microsoft.com>

* kubeadm: add new command line reference (#10306)

Add:
- placeholder files
- include place holder files
- include "renew" sub command
- add missing tabs for "alpha phase kubelet"

* Documenting SCTP support in Kubernetes (#10279)

* Documenting SCTP support in Kubernetes Service, Endpoint, NetworkPolicy and Pod

* Updates based on comments on the PR

* kubectl expose update with SCTP support

* Updated according to comments in the PR

* Revert "kubectl expose update with SCTP support"

This reverts commit 0d5a1e6.

* TLS Bootstrap and Server Cert Rotation feature documentation (#10232)

* TokenRequest feature documentation

* line wrapping to make review not insane

* update content for GA without major refactor

* Update kubelet-tls-bootstrapping.md

* Add clarifications for volume snapshots (#10296)

* Update kubadm ha installation for 1.12 (#10264)

* Update kubadm ha installation for 1.12

Signed-off-by: Chuck Ha <ha.chuck@gmail.com>

* update stable version

Signed-off-by: Chuck Ha <ha.chuck@gmail.com>

* Update stacked control plane for v1.12 (#2)

* use v1alpha3

Signed-off-by: Chuck Ha <ha.chuck@gmail.com>

* more v1alpha3 (#4)

* updates

Signed-off-by: Chuck Ha <ha.chuck@gmail.com>

* Document how to run in-tree cloud providers with kubeadm (#10357)

Change-Id: Iab6b996a830503d74a6eb0c507c5f8ca7a39235b

* kubeadm reference doc for release 1.12 (#10359)

* Revert "Revert "Add CoreDNS details to DNS Debug docs (#10201)""

This reverts commit bb30f4d.

* Revert "Revert "Add CoreDNS details to the customize DNS doc (#10228)""

This reverts commit bc23d45.

* Revert "Revert "coredns default (#10200)""

This reverts commit 7f4350d.

* add missing instruction for ha guide (#10374)

Signed-off-by: Chuck Ha <ha.chuck@gmail.com>

* kubeadm - Ha upgrade updates (#10340)

* Update HA upgrade docs

* Adds external etcd HA upgrade guide

Signed-off-by: Chuck Ha <ha.chuck@gmail.com>

* copyedit

* more edits

* add runasgroup in psp (#10076)

* update KubeletPluginsWatcher feature gate (#10205)

* generated 1.12 docs

* Building Multi-arch images with Manifests (#10379)

In 1.12, a variety of images used in a typical kubernetes installation
have started to using manifests to better support environments with arm
or ppc64le architectures. For example all images used with kubeadm by
default have manifests, another would be all the tests in the
conformance test suite. Here we capture the best practices for everyone
to start using manifests in their own workflows.

Change-Id: I5ba4c5fe55ffc9486a8251760f3352be4f2e1494

* Upgrade docs for v1.12 (#10344)

* generated assets and docs

* remove 1.7

* update 1.12

* update plugin documentation under docs>tasks>extend-kubectl (#10259)

* update plugin documentation under docs>tasks>extend-kubectl

* Update kubectl-plugins.md
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants