Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libvirt: Add Terraform variables for bootstrap node memory/CPU #852

Closed
wants to merge 1 commit into from

Conversation

sjug
Copy link

@sjug sjug commented Dec 10, 2018

Similar to #785, but for the bootstrap node.

@openshift-ci-robot openshift-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Dec 10, 2018
@sjug
Copy link
Author

sjug commented Dec 10, 2018

Would be nice to have the terraform fmt also show -diff.

@wking
Copy link
Member

wking commented Dec 10, 2018

The bootstrap node is done and gone before the heavy monitoring stuff rolls out. Is yours OOMing anyway?

@sjug
Copy link
Author

sjug commented Dec 10, 2018

@wking It's not OOMing, more so I'm just trying to give it more resources to speed things up a bit.

@sjug
Copy link
Author

sjug commented Dec 10, 2018

/test e2e-aws

@wking
Copy link
Member

wking commented Dec 10, 2018

I'd expect we're network-limited by all the images we pull (especially if you aren't installing from a local registry). Can you post some benchmarks of time until bootstrap-complete on you system before and after this commit?

@abhinavdahiya
Copy link
Contributor

I'd expect we're network-limited by all the images we pull (especially if you aren't installing from a local registry). Can you post some benchmarks of time until bootstrap-complete on you system before and after this commit?

+1 I don't think we should add knobs for bootstrap node. If we see that we need more memory lets just use that.

@abhinavdahiya
Copy link
Contributor

@sjug I don't think we can accept this change of adding this tunable. But if you have some metrics that the current setup of bootstrap node is not adequate in terms of cpu / memory we can accept changing the defaults in this PR itself or close this one and open another with change to defaults.
Whatever you think is more appropriate. 😇

@sjug
Copy link
Author

sjug commented Jan 3, 2019

The install time is not very consistent so it was a bit challenging to measure any improvement or lack of. The times are an average of two runs, with large std dev.

As we increase cores:

2 cores 4 cores 6 cores
0:04:59 0:03:52 0:03:45

As we increase memory:

2G 4G 8G
0:04:59 0:04:45 0:04:10

That formatting was painful, apologies. Do we want to change the hardcoded default?

@crawford
Copy link
Contributor

crawford commented Jan 4, 2019

It looks like we should change the number of cores from two to four on the bootstrap domain. Is that valid for hosts which have less than four cores? Either way, I think we can close this PR in favor of changing the default.

@wking
Copy link
Member

wking commented Jan 4, 2019

It looks like we should change the number of cores from two to four on the bootstrap domain.

Memory should have a reasonable cap, but won't CPUs just be "the more, the merrier"? I don't know if bootstrap CPU allocation is something we want to try to fine-tune.

Is that valid for hosts which have less than four cores?

Check by setting it to 256? If that works, maybe we should make 256 the default ;).

@cgwalters
Copy link
Member

It looks like we should change the number of cores from two to four on the bootstrap domain.

Sanest thing is to match the host. On this topic, see https://pagure.io/standard-test-roles/pull-request/223

@mykaul
Copy link
Contributor

mykaul commented Jan 10, 2019

What about the worker nodes, btw? Any thought on their sizing?

@wking
Copy link
Member

wking commented Jan 10, 2019

What about the worker nodes, btw?

Worker nodes are created by the libvirt cluster-API provider, not by Terraform. So adjust their MachineSet objects just like you'd manage other Kubernetes resources (and ping the libvirt-provider maintainers if something isn't working ;).

@mykaul
Copy link
Contributor

mykaul commented Jan 10, 2019

@wking - how do I fix the 1st worker though? It's created and overloaded with 93% of its memory consumed! I can ping them of course.

@wking
Copy link
Member

wking commented Jan 10, 2019

how do I fix the 1st worker though...

We shouldn't distract here. Spin off to a libvirt-provider or separate installer issue? There may already be issues discussing this; I haven't checked

@sjug
Copy link
Author

sjug commented Jan 29, 2019

@cgwalters I'm not aware of any way to make terraform variables dynamic to get the host CPU count

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 16, 2019
@openshift-ci-robot openshift-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels May 17, 2019
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sjug
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: enxebre

If they are not already assigned, you can assign the PR to them by writing /assign @enxebre in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sjug
Copy link
Author

sjug commented May 17, 2019

No longer have any variables for bootstrap node cpu or memory, closing.

@sjug sjug closed this May 17, 2019
@sjug sjug deleted the bootstrap_libvirt_vars branch May 17, 2019 12:36
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented May 17, 2019

@sjug: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/prow/e2e-openstack f82c0ca link /test e2e-openstack

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
platform/libvirt size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants