Add scalability tests #45

Closed
Tracked by #360 ...
ahg-g opened this issue Feb 22, 2022 · 15 comments
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/productionization lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.

Comments

@ahg-g
Contributor

ahg-g commented Feb 22, 2022

This is critical to better understand Kueue's limits and where its bottlenecks are. We should check whether there is a way to use clusterloader for this.

@ahg-g ahg-g added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Feb 22, 2022
@ahg-g
Contributor Author

ahg-g commented Mar 9, 2022

/help

@k8s-ci-robot
Contributor

@ahg-g:
This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.


@k8s-ci-robot k8s-ci-robot added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Mar 9, 2022
@ArangoGutierrez
Contributor

I think the first thing we can do for this issue is to lay down what we want to test/bench for Kueue.
I think a good first approach is to get a baseline of resource utilization vs. scale.

  • How many sample queues can Kueue hold when running with 1 CPU and 200 MB of memory?
  • On an empty cluster with "infinite" resources, how long does it take Kueue to fully admit all the jobs into a ClusterQueue?

Those are initial test ideas.

@alculquicondor
Contributor

I think we should test "infinite" resources but also "limited" (say, the ClusterQueues have 20% of the total required requests for all jobs)
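
One way to read the "limited" scenario as arithmetic: sum the requests of all jobs and give the ClusterQueues a fixed fraction of that total. The sketch below only illustrates that calculation, assuming the 20% figure above. The function name, the shape of the request dicts, and the sample job sizes are hypothetical and are not part of Kueue.

```python
from fractions import Fraction


def limited_quota(job_requests, fraction=Fraction(1, 5)):
    """Return a per-resource quota equal to `fraction` of the summed requests.

    job_requests: list of dicts mapping a resource name to an integer amount
    (for example CPU in millicores and memory in MiB).
    """
    totals = {}
    for request in job_requests:
        for resource, amount in request.items():
            totals[resource] = totals.get(resource, 0) + amount
    # Truncate so the quota never exceeds the intended share.
    return {resource: int(total * fraction) for resource, total in totals.items()}


# Hypothetical load: 1000 identical jobs asking for 500m CPU and 256 MiB each.
jobs = [{"cpu_millicores": 500, "memory_mib": 256}] * 1000
quota = limited_quota(jobs)
```

With a quota like this, only about a fifth of the jobs can be admitted at once, so the test exercises queueing behavior rather than just admission speed.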

@ArangoGutierrez
Contributor

With config

tuned-adm profile virtual-host
minikube start \
        --driver=kvm2 --container-runtime=cri-o \
        --extra-config=kubelet.cgroup-driver=systemd \
        --kubernetes-version=latest \
        --kvm-numa-count=1 \
        --nodes=3 --cpus=3 --memory=4g

10 jobs CPU/Mem usage:
Screenshot from 2022-03-15 16-27-00

@ArangoGutierrez
Contributor

Can't get more out of Minikube :( so I will have to move to something else. Will report more later.

@ArangoGutierrez
Contributor

Empty queues, staging Kueue, after 15 minutes of inactivity (bootstrap stabilization).

Baseline:

Screenshot from 2022-03-15 19-33-30

@ArangoGutierrez
Contributor

To move forward with this issue, we need to create a list of target features to stress test.

@ArangoGutierrez
Contributor

Do we want to do resource utilization testing?
For example:

  • How much CPU and memory Kueue uses while processing X Workloads.
  • CPU and memory usage on a system with "infinite" resources vs. a system with no resources, to see how a high-traffic moment affects Kueue's resource consumption.

@ArangoGutierrez
Contributor

  • In a system with "infinite" resources, how long does it take Kueue to admit X workloads?

@ahg-g
Contributor Author

ahg-g commented Apr 13, 2022

We need to run tests that measure:

  1. Workload admission latency
  2. Workload admission throughput
  3. CPU/memory usage (we don't set limits, so Kueue can compete for whatever resources are available on the master)
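
As a rough illustration of the first two measurements, the sketch below derives admission latency percentiles and throughput from per-workload timestamps. It assumes the test harness has already collected a creation time and an admission time for each workload; how those are obtained is not specified in this thread. The function names, the nearest-rank percentile method, and the sample data are all hypothetical.

```python
import math
from datetime import datetime, timedelta


def percentile(sorted_values, p):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def admission_stats(samples):
    """samples: list of (created, admitted) datetime pairs, one per workload."""
    if not samples:
        raise ValueError("need at least one admitted workload")
    latencies = sorted(
        (admitted - created).total_seconds() for created, admitted in samples
    )
    # Throughput is measured over the window from first creation to last admission.
    window = (
        max(admitted for _, admitted in samples)
        - min(created for created, _ in samples)
    ).total_seconds()
    return {
        "p50_latency_s": percentile(latencies, 50),
        "p99_latency_s": percentile(latencies, 99),
        "throughput_per_s": len(samples) / window if window > 0 else None,
    }


# Hypothetical data: four workloads created together, admitted one second apart.
t0 = datetime(2022, 4, 13, 12, 0, 0)
samples = [(t0, t0 + timedelta(seconds=s)) for s in (1, 2, 3, 4)]
stats = admission_stats(samples)
```

CPU and memory usage would come from a separate source, such as container metrics sampled during the same window.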

The scale test should measure these while varying:

  1. The number of Namespaces/Queues: tens to hundreds.
  2. The number of ClusterQueues: a few to as many as there are Namespaces.
  3. The number of workloads: hundreds, thousands, up to a few tens of thousands.
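
The three dimensions above can be expanded into a test matrix. The following is a minimal sketch, not an agreed plan: the concrete values are placeholders picked from the ranges listed, and the function and key names are hypothetical. Combinations with more ClusterQueues than Namespaces are dropped, following the "as many as Namespaces" upper bound.

```python
from itertools import product

# Placeholder values picked from the ranges listed above.
NAMESPACES = [10, 100]
CLUSTER_QUEUES = [2, 10, 100]
WORKLOADS = [100, 1000, 10000]


def scale_matrix(namespaces, cluster_queues, workloads):
    """Return every combination where ClusterQueues do not exceed Namespaces."""
    return [
        {"namespaces": n, "cluster_queues": cq, "workloads": w}
        for n, cq, w in product(namespaces, cluster_queues, workloads)
        if cq <= n
    ]


matrix = scale_matrix(NAMESPACES, CLUSTER_QUEUES, WORKLOADS)
for case in matrix:
    print(case)
```

Each resulting case would then be run once with "infinite" quota and once with limited quota.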

As for CPU and memory requests, we can use settings similar to kube-scheduler.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 12, 2022
@alculquicondor
Contributor

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 12, 2022
@alculquicondor
Contributor

A first load test was added in #462.

We can create separate issues for follow-ups and performance improvements.

/close

@k8s-ci-robot
Contributor

@alculquicondor: Closing this issue.


5 participants