
job controller support modify sync flag #79264

Conversation


@chengxiangwang chengxiangwang commented Jun 21, 2019

What type of PR is this?

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespaces from that line:

/kind api-change
/kind bug
/kind cleanup
/kind design
/kind documentation
/kind failing-test
/kind feature
/kind flake

/kind feature

What this PR does / why we need it:
Our team uses batch.Job extensively for AI training tasks, running roughly 200,000 jobs per day. We need to tune the number of Job controller workers to speed up processing, so this parameter needs to be exposed.
Which issue(s) this PR fixes:

Fixes #
Allows the number of job controller workers to be modified.
Special notes for your reviewer:

Does this PR introduce a user-facing change?:

   minor change:
     A larger number means more responsive jobs, but more CPU and network load
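The trade-off described in the release note (more workers means more responsive jobs, at the cost of more CPU and network load) comes down to how many goroutines drain the controller's work queue in parallel. Below is a minimal, hypothetical sketch of that pattern; it is not the actual Kubernetes controller code, and `syncAll` with its atomic counter stands in for the real per-job sync logic:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// syncAll drains a list of job keys with the given number of concurrent
// workers and returns how many jobs were synced. The worker count plays
// the role of the concurrent-job-syncs setting.
func syncAll(jobs []string, workers int) int {
	queue := make(chan string)
	go func() {
		for _, j := range jobs {
			queue <- j
		}
		close(queue)
	}()

	var synced int64
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range queue {
				// Stand-in for the controller's per-job sync work.
				atomic.AddInt64(&synced, 1)
			}
		}()
	}
	wg.Wait()
	return int(synced)
}

func main() {
	jobs := []string{"job-0", "job-1", "job-2", "job-3"}
	fmt.Println(syncAll(jobs, 2)) // prints 4
}
```

With more workers the queue drains faster, but each worker issues its own API calls, which is where the extra CPU and network load comes from.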

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jun 21, 2019
@k8s-ci-robot
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jun 21, 2019
@k8s-ci-robot
Contributor

Hi @chengxiangwang. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.


@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jun 21, 2019
@chengxiangwang
Author

/assign @lavalamp

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Jun 21, 2019
@chengxiangwang
Author

/approve

@chengxiangwang
Author

/lgtm

@k8s-ci-robot
Contributor

@chengxiangwang: you cannot LGTM your own PR.

In response to this:

/lgtm


@chengxiangwang
Author

/approve no-issue

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: chengxiangwang
To complete the pull request process, please assign lavalamp
You can assign the PR to them by writing /assign @lavalamp in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Member

@neolit123 neolit123 left a comment


please include a sentence under the "user facing" part of the OP that explains the change.
I will defer to api-machinery on whether this inclusion makes sense.

```diff
@@ -32,6 +32,7 @@ func (o *JobControllerOptions) AddFlags(fs *pflag.FlagSet) {
 	if o == nil {
 		return
 	}
+	fs.Int32Var(&o.ConcurrentJobSyncs, "concurrent-job-syncs", o.ConcurrentJobSyncs, "The number of job objects that are allowed to sync concurrently. Larger number = more responsive jobs, but more CPU (and network) load")
```
Member


minor change:
A larger number means more responsive jobs, but more CPU and network load

Author

@chengxiangwang chengxiangwang Jul 4, 2019


@neolit123 done, thanks

Member

@neolit123 neolit123 Jul 4, 2019


I don't see your update. Did you forget to push?

Please use git push -f .... so that this PR only has one commit.

@roycaihw
Member

/sig apps
/remove-sig api-machinery

@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. and removed sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. labels Jun 24, 2019
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jul 4, 2019
@chengxiangwang
Author

/approve

@chengxiangwang
Author

/lgtm

@k8s-ci-robot
Contributor

@chengxiangwang: you cannot LGTM your own PR.

In response to this:

/lgtm


@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 2, 2019
@quinton-hoole
Contributor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 7, 2019
@quinton-hoole
Contributor

/retest

@quinton-hoole
Contributor

quinton-hoole commented Oct 7, 2019

The failure of pull-kubernetes-e2e-gce-100-performance looks like a flake in the test infrastructure, specifically "ERROR: (gcloud.compute.routers.create) Quota 'ROUTERS' exceeded. Limit: 10.0 globally."

It does not appear to have anything to do with this particular PR.

See kubernetes/test-infra#14611, fixed in kubernetes/test-infra#14598.

W1007 18:20:13.785] Created [https://www.googleapis.com/compute/v1/projects/k8s-presubmit-scale/regions/us-east1/subnetworks/e2e-79264-95a39-custom-subnet].
I1007 18:20:14.001] NAME                           REGION    NETWORK          RANGE
I1007 18:20:14.002] e2e-79264-95a39-custom-subnet  us-east1  e2e-79264-95a39  10.40.0.0/22
I1007 18:20:14.065] Created subnetwork e2e-79264-95a39-custom-subnet
I1007 18:20:14.066] Using subnet e2e-79264-95a39-custom-subnet
W1007 18:20:15.447] Creating router [e2e-79264-95a39-nat-router]...
W1007 18:20:19.703] ......................failed.
W1007 18:20:19.869] ERROR: (gcloud.compute.routers.create) Quota 'ROUTERS' exceeded.  Limit: 10.0 globally.
W1007 18:20:19.948] 2019/10/07 18:20:19 process.go:155: Step './hack/e2e-internal/e2e-up.sh' finished in 1m41.565428081s
W1007 18:20:19.948] 2019/10/07 18:20:19 e2e.go:519: Dumping logs from nodes to GCS directly at path: gs://kubernetes-jenkins/pr-logs/pull/79264/pull-kubernetes-e2e-gce-100-performance/1181269224534839296/artifacts
W1007 18:20:19.948] 2019/10/07 18:20:19 process.go:153: Running: ./cluster/log-dump/log-dump.sh /workspace/_artifacts gs://kubernetes-jenkins/pr-logs/pull/79264/pull-kubernetes-e2e-gce-100-performance/1181269224534839296/artifacts
W1007 18:20:20.019] Trying to find master named 'e2e-79264-95a39-master'
W1007 18:20:20.019] Looking for address 'e2e-79264-95a39-master-ip'
I1007 18:20:20.120] Checking for custom logdump instances, if any
I1007 18:20:20.120] Sourcing kube-util.sh
I1007 18:20:20.121] Detecting project
I1007 18:20:20.121] Project: k8s-presubmit-scale
I1007 18:20:20.121] Network Project: k8s-presubmit-scale
I1007 18:20:20.121] Zone: us-east1-b
I1007 18:20:20.121] Dumping logs from master locally to '/workspace/_artifacts'
W1007 18:20:20.751] ERROR: (gcloud.compute.addresses.describe) Could not fetch resource:
W1007 18:20:20.752]  - The resource 'projects/k8s-presubmit-scale/regions/us-east1/addresses/e2e-79264-95a39-master-ip' was not found

@quinton-hoole
Contributor

/retest

@quinton-hoole
Contributor

quinton-hoole commented Oct 8, 2019

Sounds like the underlying quota issue has been fixed:

#83493 (comment)

/retest

@k8s-ci-robot
Contributor

k8s-ci-robot commented Oct 8, 2019

@chengxiangwang: The following tests failed; say /retest to rerun them all:

Test name                  Commit   Rerun command
pull-kubernetes-verify     9b643eb  /test pull-kubernetes-verify
pull-kubernetes-e2e-gce    9b643eb  /test pull-kubernetes-e2e-gce

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.


@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 6, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 5, 2020
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

