From c6aa802865a2f35cb9645bd3155a79bf14ee30a9 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Frederico=20Mu=C3=B1oz?= Date: Fri, 26 Jul 2024 15:08:42 +0100 Subject: [PATCH] Add SIG Scheduling spotlight Closes: #519 Co-authored-by: Arvind Parekh --- .../en/blog/2024/sig-scheduling-spotlight.md | 384 ++++++++++++++++++ 1 file changed, 384 insertions(+) create mode 100644 content/en/blog/2024/sig-scheduling-spotlight.md diff --git a/content/en/blog/2024/sig-scheduling-spotlight.md b/content/en/blog/2024/sig-scheduling-spotlight.md new file mode 100644 index 000000000..4764548f9 --- /dev/null +++ b/content/en/blog/2024/sig-scheduling-spotlight.md @@ -0,0 +1,384 @@ +--- +layout: blog +title: "Spotlight on SIG Scheduling" +slug: sig-scheduling-spotlight-2024 +date: 2024-09-06 +author: "Arvind Parekh" +--- + +In this SIG Scheduling spotlight we talked with [Kensei Nakada](https://github.com/sanposhiho/), an +approver in SIG Scheduling. + +## Introductions + +**Arvind:** **Hello, thank you for the opportunity to learn more about SIG Scheduling! Would you +like to introduce yourself and tell us a bit about your role, and how you got involved with +Kubernetes?** + +**Kensei**: Hi, thanks for the opportunity! I’m Kensei Nakada +([@sanposhiho](https://github.com/sanposhiho/)), a software engineer at +[Tetrate.io](https://tetrate.io/). I have been contributing to Kubernetes in my free time for more +than 3 years, and now I’m an approver of SIG-Scheduling in Kubernetes. Also, I’m a founder/owner of +two SIG subprojects, +[kube-scheduler-simulator](https://github.com/kubernetes-sigs/kube-scheduler-simulator) and +[kube-scheduler-wasm-extension](https://github.com/kubernetes-sigs/kube-scheduler-wasm-extension). + +## About SIG Scheduling + +**AP: That's awesome! You've been involved with the project for a long time. 
Can you provide a +brief overview of SIG Scheduling and explain its role within the Kubernetes ecosystem?** + +**K**: As the name implies, our responsibility is to enhance scheduling within +Kubernetes. Specifically, we develop the components that determine which Node is the best place for +each Pod. In Kubernetes, our main focus is on maintaining the +[kube-scheduler](https://kubernetes.io/docs/concepts/scheduling-eviction/kube-scheduler/), along +with other scheduling-related components as part of our SIG subprojects. + +**AP: I see, got it! That makes me curious--what recent innovations or developments has SIG +Scheduling introduced to Kubernetes scheduling?** + +**K**: From a feature perspective, there have been [several +enhancements](https://kubernetes.io/blog/2023/04/17/fine-grained-pod-topology-spread-features-beta/) +to `PodTopologySpread` recently. `PodTopologySpread` is a relatively new feature in the scheduler, +and we are still in the process of gathering feedback and making improvements. + +Most recently, we have been focusing on a new internal enhancement called +[QueueingHint](https://github.com/kubernetes/enhancements/blob/master/keps/sig-scheduling/4247-queueinghint/README.md) +which aims to enhance scheduling throughput. Throughput is one of our crucial metrics in +scheduling. Traditionally, we have primarily focused on optimizing the latency of each scheduling +cycle. QueueingHint takes a different approach, optimizing when to retry scheduling, thereby +reducing the likelihood of wasting scheduling cycles. + +**AP: That sounds interesting! Are there any other interesting topics or projects you are currently +working on within SIG Scheduling?** + +**K**: I’m leading the development of `QueueingHint` which I just shared. Given that it’s a big new +challenge for us, we’ve been facing many unexpected challenges, especially around scalability, +and we’re trying to solve each of them to eventually enable it by default. 
+ +And also, I believe +[kube-scheduler-wasm-extension](https://github.com/kubernetes-sigs/kube-scheduler-wasm-extension) +(a SIG subproject) that I started last year would be interesting to many people. Kubernetes has +various extensions from many components. Traditionally, extensions are provided via webhooks +([extender](https://github.com/kubernetes/design-proposals-archive/blob/main/scheduling/scheduler_extender.md) +in the scheduler) or Go SDK ([Scheduling +Framework](https://kubernetes.io/docs/concepts/scheduling-eviction/scheduling-framework/) in the +scheduler). However, these come with drawbacks - performance issues with webhooks and the need to +rebuild and replace schedulers with Go SDK, posing difficulties for those seeking to extend the +scheduler but lacking familiarity with it. The project is trying to introduce a new solution to +this general challenge - a [WebAssembly](https://webassembly.org/) based extension. Wasm allows +users to build plugins easily, without worrying about recompiling or replacing their scheduler, and +sidestepping performance concerns. + +Through this project, sig-scheduling has been gaining valuable insights about WebAssembly's +interaction with large Kubernetes objects. And I believe the experience that we’re gaining should be +useful broadly within the community, beyond sig-scheduling. + +**AP: Definitely! Now, there are currently 8 subprojects inside SIG Scheduling. Would you like to +talk about them? Are there some interesting contributions by those teams you want to highlight?** + +**K**: Let me pick three subprojects: Kueue, KWOK, and descheduler. + +[Kueue](https://github.com/kubernetes-sigs/kueue): +: Recently, many people have been trying to manage batch workloads with Kubernetes, and in 2022, +the Kubernetes community founded +[WG-Batch](https://github.com/kubernetes/community/blob/master/wg-batch/README.md) for better +support for such batch workloads in Kubernetes. 
[Kueue](https://github.com/kubernetes-sigs/kueue) +is a project that plays a crucial role in it. It’s a job queueing controller, deciding when a job +should wait, when a job should be admitted to start, and when a job should be preempted. Kueue aims +to be installed on a vanilla Kubernetes cluster while cooperating with existing mature controllers +(scheduler, cluster-autoscaler, kube-controller-manager, etc). + +[KWOK](https://github.com/kubernetes-sigs/kwok): +: KWOK is a component in which you can create a cluster of thousands of Nodes in seconds. It’s + mostly useful for simulation/testing as a lightweight cluster, and actually another SIG + subproject [kube-scheduler-simulator](https://github.com/kubernetes-sigs/kube-scheduler-simulator) + uses KWOK in the background. + +[descheduler](https://github.com/kubernetes-sigs/descheduler): +: Descheduler is a component recreating pods that are running on undesired Nodes. In Kubernetes, +scheduling constraints (`PodAffinity`, `NodeAffinity`, `PodTopologySpread`, etc) are honored only at +Pod scheduling time, but it’s not guaranteed that the constraints remain satisfied afterwards. +Descheduler evicts Pods violating their scheduling constraints (or other undesired conditions) so +that they’re recreated and rescheduled. + +[Descheduling Framework](https://github.com/kubernetes-sigs/descheduler/blob/master/keps/753-descheduling-framework/README.md): +: One very interesting on-going project, similar to [Scheduling + Framework](https://kubernetes.io/docs/concepts/scheduling-eviction/scheduling-framework/) in the + scheduler, aiming to make descheduling logic extensible and allow maintainers to focus on building + a core engine of descheduler. + +**AP: Thank you for letting us know! And I have to ask, what are some of your favorite things about +this SIG?** + +**K**: What I really like about this SIG is how actively engaged everyone is. 
We come from various +companies and industries, bringing diverse perspectives to the table. Instead of these differences +causing division, they actually generate a wealth of opinions. Each view is respected, and this +makes our discussions both rich and productive. + +I really appreciate this collaborative atmosphere, and I believe it has been key to continuously +improving our components over the years. + +## Contributing to SIG Scheduling + +**AP: Kubernetes is a community-driven project. Any recommendations for new contributors or +beginners looking to get involved and contribute to SIG scheduling? Where should they start?** + +**K**: Let me start with a general recommendation for contributing to any SIG: a common approach is +to look for +[good-first-issue](https://github.com/kubernetes/kubernetes/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22). +However, you'll soon realize that many people worldwide are trying to contribute to the Kubernetes +repository. + +I suggest starting by examining the implementation of a component that interests you. If you have +any questions about it, ask in the corresponding Slack channel (e.g., #sig-scheduling for the +scheduler, #sig-node for kubelet, etc). Once you have a rough understanding of the implementation, +look at issues within the SIG (e.g., +[sig-scheduling](https://github.com/kubernetes/kubernetes/issues?q=is%3Aopen+is%3Aissue+label%3Asig%2Fscheduling)), +where you'll find more unassigned issues compared to good-first-issue ones. You may also want to +filter issues with the +[kind/cleanup](https://github.com/kubernetes/kubernetes/issues?q=is%3Aopen+is%3Aissue++label%3Akind%2Fcleanup+) +label, which often indicates lower-priority tasks and can be starting points. + +Specifically for SIG Scheduling, you should first understand the [Scheduling +Framework](https://kubernetes.io/docs/concepts/scheduling-eviction/scheduling-framework/), which is +the fundamental architecture of kube-scheduler. 
Most of the implementation is found in +[pkg/scheduler](https://github.com/kubernetes/kubernetes/tree/master/pkg/scheduler). I suggest +starting with +[ScheduleOne](https://github.com/kubernetes/kubernetes/blob/0590bb1ac495ae8af2a573f879408e48800da2c5/pkg/scheduler/schedule_one.go#L66) +function and then exploring deeper from there. + +Additionally, apart from the main kubernetes/kubernetes repository, consider looking into +sub-projects. These typically have fewer maintainers and offer more opportunities to make a +significant impact. Despite being called "sub" projects, many have a large number of users and a +considerable impact on the community. + +And last but not least, remember contributing to the community isn’t just about code. While I +talked a lot about the implementation contribution, there are many ways to contribute, and each one +is valuable. One comment to an issue, one feedback to an existing feature, one review comment in PR, +one clarification on the documentation; every small contribution helps drive the Kubernetes +ecosystem forward. + +**AP: Those are some pretty useful tips! And if I may ask, how do you assist new contributors in +getting started, and what skills are contributors likely to learn by participating in SIG +Scheduling?** + +**K**: Our maintainers are available to answer your questions in the #sig-scheduling Slack +channel. By participating, you'll gain a deeper understanding of Kubernetes scheduling and have the +opportunity to collaborate and network with maintainers from diverse backgrounds. You'll learn not +just how to write code, but also how to maintain a large project, design and discuss new features, +address bugs, and much more. + +## Future Directions + +**AP: What are some Kubernetes-specific challenges in terms of scheduling? Are there any particular +pain points?** + +**K**: Scheduling in Kubernetes can be quite challenging because of the diverse needs of different +organizations with different business requirements. 
Supporting all possible use cases in +kube-scheduler is impossible. Therefore, extensibility is a key focus for us. A few years ago, we +rearchitected kube-scheduler with [Scheduling +Framework](https://kubernetes.io/docs/concepts/scheduling-eviction/scheduling-framework/), which +offers flexible extensibility for users to implement various scheduling needs through plugins. This +allows maintainers to focus on the core scheduling features and the framework runtime. + +Another major issue is maintaining sufficient scheduling throughput. Typically, a Kubernetes cluster +has only one kube-scheduler, so its throughput directly affects the overall scheduling scalability +and, consequently, the cluster's scalability. Although we have an internal performance test +([scheduler_perf](https://github.com/kubernetes/kubernetes/tree/master/test/integration/scheduler_perf)), +unfortunately, we sometimes overlook performance degradation in less common scenarios. It’s +difficult as even small changes, which look irrelevant to performance, can lead to degradation. + +**AP: What are some upcoming goals or initiatives for SIG Scheduling? How do you envision the SIG evolving in the future?** + +**K**: Our primary goal is always to build and maintain _extensible_ and _stable_ scheduling +runtime, and I bet this goal will remain unchanged forever. + +As already mentioned, extensibility is key to solving the challenge of the diverse needs of +scheduling. Rather than trying to support every different use case directly in kube-scheduler, we +will continue to focus on enhancing extensibility so that it can accommodate various use +cases. [kube-scheduler-wasm-extension](https://github.com/kubernetes-sigs/kube-scheduler-wasm-extension) +that I mentioned is also part of this initiative. + +Regarding stability, introducing new optimizations like QueueingHint is one of our +strategies. Additionally, maintaining throughput is also a crucial goal towards the future. 
We’re +planning to enhance our throughput monitoring +([ref](https://github.com/kubernetes/kubernetes/issues/124774)), so that we can notice degradation +as much as possible on our own before releasing. But, realistically, we can't cover every possible +scenario. We highly appreciate any attention the community can give to scheduling throughput and +encourage feedback and alerts regarding performance issues! + +## Closing Remarks + +**AP: Finally, what message would you like to convey to those who are interested in learning more +about SIG Scheduling?** + +**K**: Scheduling is one of the most complicated areas in Kubernetes, and you may find it difficult +at first. But, as I shared earlier, you can find many opportunities for contributions, and many +maintainers are willing to help you understand things. We know your unique perspective and skills +are what makes our open source so powerful :) + +Feel free to reach out to us in Slack +([#sig-scheduling](https://kubernetes.slack.com/archives/C09TP78DV)) or +[meetings](https://github.com/kubernetes/community/blob/master/sig-scheduling/README.md#meetings). +I hope this article interests everyone and we can see new contributors! + +**AP: Thank you so much for taking the time to do this! I'm confident that many will find this +information invaluable for understanding more about SIG Scheduling and for contributing to the SIG.** + + + + + + + + + + + + + + + + + + + + + + + + + + + + +----------------------------------- + +In this SIG Architecture spotlight I talked with [Madhav Jivrajani](https://github.com/MadhavJivrajani) +(VMware), a member of the Code Organization subproject. + +## Introducing the Code Organization subproject + +**Frederico (FSM)**: Hello Madhav, thank you for your availability. Could you start by telling us a +bit about yourself, your role and how you got involved in Kubernetes? + +**Madhav Jivrajani (MJ)**: Hello! 
My name is Madhav Jivrajani, I serve as a technical lead for SIG +Contributor Experience and a GitHub Admin for the Kubernetes project. Apart from that I also +contribute to SIG API Machinery and SIG Etcd, but more recently, I’ve been helping out with the work +that is needed to help Kubernetes [stay on supported versions of +Go](https://github.com/kubernetes/enhancements/tree/cf6ee34e37f00d838872d368ec66d7a0b40ee4e6/keps/sig-release/3744-stay-on-supported-go-versions), +and it is through this that I am involved with the Code Organization subproject of SIG Architecture. + +**FSM**: A project the size of Kubernetes must have unique challenges in terms of code organization +-- is this a fair assumption? If so, what would you pick as some of the main challenges that are +specific to Kubernetes? + +**MJ**: That’s a fair assumption! The first interesting challenge comes from the sheer size of the +Kubernetes codebase. We have ≅2.2 million lines of Go code (which is steadily decreasing thanks to +[dims](https://github.com/dims) and other folks in this sub-project!), and a little over 240 +dependencies that we rely on either directly or indirectly, which is why having a sub-project +dedicated to helping out with dependency management is crucial: we need to know what dependencies +we’re pulling in, what versions these dependencies are at, and tooling to help make sure we are +managing these dependencies across different parts of the codebase in a consistent manner. + +Another interesting challenge with Kubernetes is that we publish a lot of Go modules as part of the +Kubernetes release cycles, one example of this is +[`client-go`](https://github.com/kubernetes/client-go). However, we as a project would also like the +benefits of having everything in one repository to get the advantages of using a monorepo, like +atomic commits... 
so, because of this, code organization works with other SIGs (like SIG Release) to +automate the process of publishing code from the monorepo to downstream individual repositories +which are much easier to consume, and this way you won’t have to import the entire Kubernetes +codebase! + +## Code organization and Kubernetes + +**FSM**: For someone just starting contributing to Kubernetes code-wise, what are the main things +they should consider in terms of code organization? How would you sum up the key concepts? + +**MJ**: I think one of the key things to keep in mind at least as you’re starting off is the concept +of staging directories. In the [`kubernetes/kubernetes`](https://github.com/kubernetes/kubernetes) +repository, you will come across a directory called +[`staging/`](https://github.com/kubernetes/kubernetes/tree/master/staging). The sub-folders in this +directory serve as a bunch of pseudo-repositories. For example, the +[`kubernetes/client-go`](https://github.com/kubernetes/client-go) repository that publishes releases +for `client-go` is actually a [staging +repo](https://github.com/kubernetes/kubernetes/tree/master/staging/src/k8s.io/client-go). + +**FSM**: So the concept of staging directories fundamentally impacts contributions? + +**MJ**: Precisely, because if you’d like to contribute to any of the staging repos, you will need to +send in a PR to its corresponding staging directory in `kubernetes/kubernetes`. Once the code merges +there, we have a bot called the [`publishing-bot`](https://github.com/kubernetes/publishing-bot) +that will sync the merged commits to the required staging repositories (like +`kubernetes/client-go`). This way we get the benefits of a monorepo but we also can modularly +publish code for downstream consumption. PS: The `publishing-bot` needs more folks to help out! 
+ +For more information on staging repositories, please see the [contributor +documentation](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/staging.md). + +**FSM**: Speaking of contributions, the very high number of contributors, both individuals and +companies, must also be a challenge: how does the subproject operate in terms of making sure that +standards are being followed? + +**MJ**: When it comes to dependency management in the project, there is a [dedicated +team](https://github.com/kubernetes/org/blob/a106af09b8c345c301d072bfb7106b309c0ad8e9/config/kubernetes/org.yaml#L1329) +that helps review and approve dependency changes. These are folks who have helped lay the foundation +of much of the +[tooling](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/vendor.md) +that Kubernetes uses today for dependency management. This tooling helps ensure there is a +consistent way that contributors can make changes to dependencies. The project has also worked on +additional tooling to signal statistics of dependencies that are being added or removed: +[`depstat`](https://github.com/kubernetes-sigs/depstat). + +Apart from dependency management, another crucial task that the project does is management of the +staging repositories. The tooling for achieving this (`publishing-bot`) is completely transparent to +contributors and helps ensure that the staging repos get a consistent view of contributions that are +submitted to `kubernetes/kubernetes`. + +Code Organization also works towards making sure that Kubernetes [stays on supported versions of +Go](https://github.com/kubernetes/enhancements/tree/cf6ee34e37f00d838872d368ec66d7a0b40ee4e6/keps/sig-release/3744-stay-on-supported-go-versions). The +linked KEP provides more context on why we need to do this. 
We collaborate with SIG Release to +ensure that we are testing Kubernetes as rigorously and as early as we can on Go releases and +working on changes that break our CI as a part of this. An example of how we track this process can +be found [here](https://github.com/kubernetes/release/issues/3076). + +## Release cycle and current priorities + +**FSM**: Is there anything that changes during the release cycle? + +**MJ**: During the release cycle, specifically before code freeze, there are often changes that go in +that add/update/delete dependencies, or fix code that needs fixing as part of our effort to stay on +supported versions of Go. + +Furthermore, some of these changes are also candidates for +[backporting](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-release/cherry-picks.md) +to our supported release branches. + +**FSM**: Is there any major project or theme the subproject is working on right now that you would +like to highlight? + +**MJ**: I think one very interesting and immensely useful change that +has been recently added (and I take the opportunity to specifically +highlight the work of [Tim Hockin](https://github.com/thockin) on +this) is the introduction of [Go workspaces to the Kubernetes +repo](/blog/2024/03/19/go-workspaces-in-kubernetes/). A lot of our +current tooling for dependency management and code publishing, as well +as the experience of editing code in the Kubernetes repo, can be +significantly improved by this change. + +## Wrapping up + +**FSM**: How would someone interested in the topic start helping the subproject? + +**MJ**: The first step, as is the first step with any project in Kubernetes, is to join our Slack: +[slack.k8s.io](https://slack.k8s.io), and after that join the `#k8s-code-organization` channel. There is also a +[code-organization office +hours](https://github.com/kubernetes/community/tree/master/sig-architecture#meetings) that takes +place that you can choose to attend. 
Timezones are hard, so feel free to also look at the recordings +or meeting notes and follow up on Slack! + +**FSM**: Excellent, thank you! Any final comments you would like to share? + +**MJ**: The Code Organization subproject always needs help! Especially areas like the publishing +bot, so don’t hesitate to get involved in the `#k8s-code-organization` Slack channel.