📖 ✨ 🧑‍🤝‍🧑 add proposal for Node Bootstrapping working group #11407

Open. t-lo wants to merge 1 commit into main from t-lo/propose-wg-node-provisioning

t-lo
Contributor

@t-lo t-lo commented Nov 12, 2024

What this PR does / why we need it:

Propose a working group for node bootstrapping and cluster provisioning.
The need for this working group originated from an ongoing discussion around separating cluster provisioning and node bootstrapping, as stated in the WG's User Story.

Which issue(s) this PR fixes

CC

Tags

/area provider/bootstrap-kubeadm
/area bootstrap

/kind documentation
/kind proposal

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Nov 12, 2024
@k8s-ci-robot k8s-ci-robot added the do-not-merge/needs-area PR is missing an area label label Nov 12, 2024
@k8s-ci-robot
Contributor

Hi @t-lo. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 12, 2024
@t-lo
Contributor Author

t-lo commented Nov 12, 2024

"@elmiko" "@eljohnson92" I took the liberty of adding you as co-stakeholders to the WG proposal - IIRC you expressed interest in participating. I hope that's OK?

@t-lo t-lo force-pushed the t-lo/propose-wg-node-provisioning branch 2 times, most recently from 49bf126 to bf5ce21 Compare November 12, 2024 16:46
@t-lo t-lo force-pushed the t-lo/propose-wg-node-provisioning branch from 22ad278 to 6353aad Compare November 13, 2024 07:29
@t-lo t-lo changed the title docs: add proposal for Node Bootstrapping working group 📖 ✨ 🧑‍🤝‍🧑 add proposal for Node Bootstrapping working group Nov 13, 2024
@t-lo t-lo force-pushed the t-lo/propose-wg-node-provisioning branch from 6353aad to f61f4ee Compare November 15, 2024 15:19
@t-lo
Contributor Author

t-lo commented Nov 15, 2024

Thank you Johanan, Fabrizio, and Stefan for tuning in! This is immensely helpful.
I also love the fact that we're already iterating over design thoughts; this is exactly the momentum we were hoping for with the working group.

Made a few changes to the proposal; reworked the whole user story part to focus on goals instead of implementations, and rephrased the "problem statement" section a bit to not hint at a solution when describing the issue.

Added a new section on stability and compatibility - this really was the proverbial elephant in the room for me since in Flatcar, we put a massive (occasionally painful) focus on never breaking user workloads ever - kudos to Stefan for calling this out. I'll make sure to hold the working group proposals to an equally high standard.

I don't think we're quite there yet but we're definitely making progress. Ready for another round of feedback!

@t-lo t-lo force-pushed the t-lo/propose-wg-node-provisioning branch from f61f4ee to dbc9ed4 Compare November 15, 2024 15:31
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 15, 2024
@t-lo t-lo force-pushed the t-lo/propose-wg-node-provisioning branch from dbc9ed4 to 857dd3d Compare November 15, 2024 16:29
@t-lo
Contributor Author

t-lo commented Nov 19, 2024

@sbueringer , @fabriziopandini what do you think? Could you give it another pass?

Member

@johananl johananl left a comment

Thanks! Added a few comments.

@fabriziopandini
Member

fabriziopandini commented Nov 21, 2024

I will try to come back to this after code freeze next week; I need to focus on getting things merged and on CI signal, and bandwidth is limited 😢

@chrischdi
Member

Also showing up. I will need some more time to read up on all of this, but my focus right now is the upcoming CAPI release!

Contributor

@elmiko elmiko left a comment

in general this makes sense to me, i think it would be nice to have a few more details in the proposal. i left a couple suggestions.

also, i'm just starting to read the comments here.

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Dec 9, 2024
@johananl
Member

johananl commented Dec 9, 2024

@t-lo the CI doesn't like the link to the calendar file. I wonder if it's OK to ignore this test.

@rothgar

rothgar commented Dec 9, 2024

I'm a bit confused by the problem statement in the document.

  1. Is this intended to only apply to configuration after the OS has been installed and booted? Cloud-init and Ignition make a lot of assumptions about what's on the node and cannot be abstracted.
  2. Is this only intended to be used for kubeadm-based provisioning? I know it mentions other options, but the focus and ultimate implementation is based on kubeadm, with others being optional.
  3. Is this meant to re-configure nodes after they've been provisioned? From my understanding this is mostly designed to configure a node from a base, unconfigured state. Ignition and cloud-init don't re-configure or modify nodes, and the only option is to destroy/replace.

I'm trying to figure out if I should be involved at all (right now it seems like the answer is no) because Talos Linux doesn't use ignition, cloud-init, or kubeadm to configure (and re-configure) a node for kubernetes.

@johananl
Member

johananl commented Dec 10, 2024

Hey @rothgar.

This working group is about improving the separation between bootstrap and provisioning. "Bootstrap" means "turning a server into a k8s node". "Provisioning" means "running cloud-init/Ignition to customize a server". "Server" in this context means a physical bare metal server or a VM.

We don't intend to alter the functionality of bootstrap or provisioning in this working group; we're concerned mainly with how we can maintain and evolve multiple provisioner implementations (e.g. cloud-init, Ignition) over time, which hopefully requires only under-the-hood changes with no user impact. This does entail touching CABPK (the kubeadm bootstrap provider) because right now bootstrap and provisioning are tightly coupled, which means we can't touch e.g. cloud-init code without touching kubeadm-related code.

We hope to get to an acceptable state in CABPK which could serve as a "template" for other bootstrap providers; however, there's still a way to go before we get there. We want to avoid making API changes to the extent possible, but of course if we do end up proposing API changes, this would affect all relevant providers, including Talos.

I suggest you keep an eye on the products of this working group to determine if/when you need to be involved (we will provide updates in the CAPI office hours so you can watch the notes/recordings); however, I think that at the moment this work is irrelevant for anything which doesn't use kubeadm, cloud-init or Ignition. We'll be sure to broadcast any significant changes loudly to the community in case we realize they're inevitable.
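
Editorial illustration of the separation described in this comment: a minimal sketch, not CABPK code and not part of the proposal. It assumes a format-agnostic "plan" (files plus commands) produced by the bootstrap side, and interchangeable renderers on the provisioning side. All names here (`BootstrapPlan`, `Provisioner`, `kubeadm_join_plan`, `ToyCloudInit`, `ToyIgnition`, the file path) are hypothetical, and the rendered output is deliberately simplified; in particular, the `ToyIgnition` output does not follow the real Ignition schema.

```python
import json
from dataclasses import dataclass
from typing import Dict, Protocol, Tuple


@dataclass(frozen=True)
class BootstrapPlan:
    """Format-agnostic description of what turns a server into a node."""
    files: Dict[str, str]      # path -> file content
    commands: Tuple[str, ...]  # commands to run once on first boot


class Provisioner(Protocol):
    """Anything that can serialize a plan into a provisioning format."""
    def render(self, plan: BootstrapPlan) -> str: ...


def kubeadm_join_plan(join_config: str) -> BootstrapPlan:
    # Bootstrap side: knows about kubeadm, knows nothing about output formats.
    path = "/etc/kubeadm/join.yaml"  # hypothetical location
    return BootstrapPlan(
        files={path: join_config},
        commands=(f"kubeadm join --config {path}",),
    )


class ToyCloudInit:
    # Provisioning side: simplified cloud-config style output (JSON body).
    def render(self, plan: BootstrapPlan) -> str:
        doc = {
            "write_files": [{"path": p, "content": c} for p, c in plan.files.items()],
            "runcmd": list(plan.commands),
        }
        return "#cloud-config\n" + json.dumps(doc, indent=2)


class ToyIgnition:
    # Provisioning side: illustrative JSON only, NOT the real Ignition schema.
    def render(self, plan: BootstrapPlan) -> str:
        doc = {
            "files": [{"path": p, "contents": c} for p, c in plan.files.items()],
            "oneshot_commands": list(plan.commands),
        }
        return json.dumps(doc, indent=2)


if __name__ == "__main__":
    plan = kubeadm_join_plan("kind: JoinConfiguration\n")
    for provisioner in (ToyCloudInit(), ToyIgnition()):
        print(provisioner.render(plan))
```

In this sketch, changing a renderer never requires touching `kubeadm_join_plan`, and vice versa, which is the kind of decoupling the comment describes as the goal.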

@t-lo
Contributor Author

t-lo commented Dec 11, 2024

I'd like to keep the iCal link - it's very convenient to be able to import the meeting series. Here's the link, and it works (tested on a number of machines): https://calendar.google.com/calendar/ical/90d22cde4972f248d6516a96de05ef62553644fff261e2150f5f229546d59d41%40group.calendar.google.com/public/basic.ics If we could figure out what makes this test fail, that would be awesome...

@rothgar If you'd like to know more, feel free to join our WG office hours Thursday next week (Dec 19), at 5pm UTC - happy to chat and to give an intro to the whole thing.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 12, 2024
@k8s-ci-robot k8s-ci-robot requested a review from schrej December 12, 2024 09:00
@t-lo
Contributor Author

t-lo commented Dec 13, 2024

/ok-to-test

@t-lo t-lo requested a review from chrischdi December 17, 2024 15:46
@johananl
Member

/retest

@t-lo t-lo force-pushed the t-lo/propose-wg-node-provisioning branch from 77ca5e0 to 536ecda Compare December 18, 2024 18:16
@t-lo
Contributor Author

t-lo commented Dec 18, 2024

/retest

Co-authored-by: Johanan Liebermann <j@liebermann.io>
Co-authored-by: Jakob Schrettenbrunner <dev@schrej.net>
Signed-off-by: Thilo Fromm <thilofromm@microsoft.com>
@t-lo t-lo force-pushed the t-lo/propose-wg-node-provisioning branch from 536ecda to ba7442a Compare December 18, 2024 18:25
@schrej
Member

schrej commented Dec 18, 2024

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 18, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 5aed488f3075793a52ddf47fa7eda64d36334ac0

@t-lo
Contributor Author

t-lo commented Jan 17, 2025

@fabriziopandini @elmiko I think this can be merged now? The WG is well established, up and running. First stabs at implementations are in progress, and we also started related contributions to development docs (Johanan is working on a PR to dev docs on using KubeVirt as a dev environment).

Contributor

@elmiko elmiko left a comment

yes, i think so too @t-lo . thanks for the work here =)

/approve
/lgtm

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: elmiko
Once this PR has been reviewed and has the lgtm label, please assign sbueringer for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@johananl
Member

/assign sbueringer

@sbueringer
Member

/lgtm

Would be great if other maintainers can also take a (final) look

/assign @chrischdi @enxebre @fabriziopandini @vincepri

Labels
area/community-meeting (Issues or PRs that should potentially be discussed in a Kubernetes community meeting.)
cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
lgtm ("Looks good to me", indicates that a PR is ready to be merged.)
ok-to-test (Indicates a non-member PR verified by an org member that is safe to test.)
size/L (Denotes a PR that changes 100-499 lines, ignoring generated files.)