Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add KEP 2420: Reducing Kubernetes Build Maintenance #2421

Merged
merged 10 commits into from
Feb 4, 2021

Conversation

BenTheElder
Copy link
Member

see: #2420

previous discussion: kubernetes/kubernetes#88553

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Feb 4, 2021
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Feb 4, 2021
@BenTheElder
Copy link
Member Author

/cc @spiffxp @dims @liggitt @justaugustus
(provisional requested reviewers + approvers from testing + release)

@BenTheElder BenTheElder changed the title add KEP 2420: Reducing Build Maintenance add KEP 2420: Reducing Kubernetes Build Maintenance Feb 4, 2021
@justaugustus
Copy link
Member

/assign @justaugustus
/sig release
/area release-eng
FYI @kubernetes/release-engineering

@k8s-ci-robot k8s-ci-robot added sig/release Categorizes an issue or PR as relevant to SIG Release. area/release-eng Issues or PRs related to the Release Engineering subproject labels Feb 4, 2021

## Production Readiness Review Questionnaire

**TODO**: This entire section seems completely irrelevant for KEPs that do not
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is TBD, I'm not sure what the policy is here, but I don't think PRR is actually relevant to a KEP like this with literally no change to release artifacts.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussing KEP types here: #2311
Drop some notes when you have a chance! :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take another pass at these fields and the metadata in general tomorrow, and definitely take a look at #2311

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggested edits to drop the text under each heading to "n/a" with explanation as to why, but we can address as followup

[conformance tests]: https://git.k8s.io/community/contributors/devel/sig-architecture/conformance-tests.md
-->

This will be declared stable/GA when:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i like the approach outlined here for GA!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liggitt @dims, I think this implies then that stable is 1.23 release / 1.24 milestone (when we phase out 1.20 from support). not sure what to call "beta".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very little of the standard KEP template applies to this proposal. I'd probably go more in terms of phase 1, phase 2, done, etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"beta" would be a bazel-less stable branch? :)

- A few other issues related to CGO / $CC on non-amd64 build hosts have already received pull requests / fixes.
- Remaining issues largely appear trivial, and should be fixed regardless, especially with the proliferation of non-amd64 developer hardware, where contributors will want to build releases matching the official release (with make).

- Some contributors may be used to the Bazel CLI, however indications are that this is not true for the majority of contributors. Existing discussions on the matter of removing Bazel from the Kubernetes project have received overwhelming support, most (but certainly not all) of the few suggestions against it do not appear to be from [community members] / active upstream contributors. See: [kubernetes/kubernetes#88533]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree. we are not going to take the long term pain/maintenance just to keep things going.

information to express the idea and why it was not acceptable.
-->

1. Continue maintaining both build systems
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non-starter! we tried it, it does not work, we cant keep it going, it's too costly. time to rip the bandaid off!

- many of the code generators are optimized to load the entire Go tree, then generate lots of code at once, which is somewhat incompatible with Bazel's approach to working on a package-by-package level. Generating code package-by-package is much slower (due to having to reparse the tree each time).
- separate "hack/tools" go module for linters etc: not only do linters not work well (because any source change busts the cache anyhow), but rules_go does not do multiple go modules well. Multiple go modules allows us to isolate development dependencies (like linters) from release binary dependencies, easing dependency management

2. Improve bazel integration and drop the make based build
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-100000

latest-milestone: "v1.19"

# The milestone at which this feature was, or is targeted to be, at each stage.
milestone:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we will need to fix these milestones @BenTheElder

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put in something provisionally, but I think beta/GA are slightly TBD: #2421 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@justaugustus can we delete the alpha/beta milestones or can they be listed as "n/a"?

@dims
Copy link
Member

dims commented Feb 4, 2021

@BenTheElder AWESOME!!! Yes please. my only nit (other than the unhelpful comments) is we should fix the milestones and i am happy to approve this for 1.21.

owning-sig: sig-testing
participating-sigs:
- sig-release
status: provisional|implementable|implemented|deferred|rejected|withdrawn|replaced
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

start with provisional and we can iterate while we figure out what to do with the non-applicable PRR bits

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@liggitt
Copy link
Member

liggitt commented Feb 4, 2021

lgtm as well

Copy link
Member

@spiffxp spiffxp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall approve, a few suggestions and questions

- Eliminate the friction of generating `BUILD` files from Kubernetes contributors using Go natively
- Simplify Golang upgrades for SIG Release
- Remove qualification gaps caused by duplicate-but-slightly-different binary builds
- Remove testing gaps caused by duplicate-but-slightly-different test-invocation methods
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I consider kubernetes/kubernetes#97921 the umbrella issue for this

- Bazel-related configuration/presets are removed from kubernetes/kubernetes CI jobs in kubernetes/test-infra


As bazel-built artifacts are not built or distributed as part of a kubernetes/kubernetes release, there is no deprecation window required.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote the "no deprecation window" part, but in retrospect I want to be clear that this merits a visible announcement. It's not clear to me that "deprecation" or "action required" are a fit here.

Suggested change
As bazel-built artifacts are not built or distributed as part of a kubernetes/kubernetes release, there is no deprecation window required.
Removal of bazel-related source files from kubernetes/kubernetes will be announced visibly (e.g. in the CHANGELOG, notification to kubernetes-dev@). However, as bazel-built artifacts are not built or distributed as part of a kubernetes/kubernetes release, there is no deprecation window required.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK addressing this in followup, sounds like @justaugustus has thoughts on improving how we would announce something like this (ref: kubernetes/community#5344)

- Removing Bazel from sub-projects other than the core repo
- Bazel comes with tradeoffs, each subproject can make its own decision in this regard. This KEP **only** covers the main Kubernetes repository, and nothing else.
- Improving the previously existing make build
- We should strongly consider improving the implementation and behavior of this build in the future, but this is largely orthogonal to whether we should consider maintaining two build systems.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- We should strongly consider improving the implementation and behavior of this build in the future, but this is largely orthogonal to whether we should consider maintaining two build systems.
- We should strongly consider improving the implementation and behavior of this build in the future, but this is largely orthogonal to whether we should consider maintaining two build systems. An anticipated outcome of this KEP is increased bandwidth available to improve our single build system.

2. Improve bazel integration and drop the make based build

In addition to the points made in 1.) above as to why this is not particularly viable for some of our existing development patterns:
- We would need to improve support for CGO or eliminate CGO in Kubernetes, both of which would be somewhat expensive to develop
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any other technical issue apart from cgo/rules_go integration (and the mentioned below consumed-by-go-get aspect)? It seems like the technical issues left to fix in rules_go/cgo integration are not insurmountable.

If these were to be tracked in an issue somewhere (and referred to here) it would make technical discussion easier.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See other portions of the doc. CGO comments are completely unsupported by rules_go and have no intention of being supported because they don't meet bazel's concepts of "hermetic" (though C/C++ under bazel in general doesn't anyhow).

We depend on hacking around this in brittle ways currently:
https://github.com/kubernetes/kubernetes/blob/81d8ccfa8e641f7295af6dbfe5bbb3c157a10b55/hack/update-bazel.sh#L108-L113

bazel-contrib/bazel-gazelle#203 (comment)
https://github.com/bazelbuild/bazel-gazelle/blob/987c29003d1616c0b2499fa7f5ce0f3bc11bc9a3/language/go/fileinfo.go#L399

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this is not a technical issue to surmount: rules_go has outright rejected support. bazel-contrib/bazel-gazelle#556 (comment)

- CGO pkg-config directives do not work in `rules_go`, requiring brittle work-arounds
- Cross compiling with CGO under bazel is tricky, and despite @ixdy getting close at one point, never shipped in Kubernetes, let alone portably, or capable of shipping a full release.
- @ixdy suspects this would largely have to be reimplemented now
- This is unlikely to be popular with contributors
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't seem like a real problem? If contributors have technical issues with the Bazel workflow, they should be outlined here explicitly. If these are non-technical (eg. dislike for the sake of dislike), then these probably shouldn't be taken into account when making this decision.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SIG Testing serves to improve the workflow of the project's contributors.
SIG Release exists to ensure we provide high quality binary releases.

Reducing developer friction is absolutely a concern. Our developers are not happy with this system (see previous discussion).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but then these friction issues should IMO be explicitly stated within this KEP. It should obviously show that Bazel-only is a no-go.

Without knowing what the current pain points with Bazel are (and not these with two buildsystems in parallel!), it's impossible to discuss specifics.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would re-frame as: this is less likely to grow the amount of potential contributor bandwidth available to improve the kubernetes/kubernetes build system. We have a larger pool of contributors today who have demonstrated experience updating our existing make build system vs. updating bazel's components and our usage of it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kubernetes/kubernetes#88553 contains links to specific technical issues we've had in the past with bazel, I don't think it's necessary to re-enumerate here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @spiffxp, that's the intent of this admittedly poorly written bullet, the "popularity" contest is surely not relevant, but the availability of experience and support within the project is / interest in doing so. I also think kubernetes/kubernetes#88553 has sufficient technical details and is referenced in the doc.

I will update this bullet.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been replaced with a variant of @spiffxp's comment.

- Cross compiling with CGO under bazel is tricky, and despite @ixdy getting close at one point, never shipped in Kubernetes, let alone portably, or capable of shipping a full release.
- @ixdy suspects this would largely have to be reimplemented now
- This is unlikely to be popular with contributors
- Large portions of Kubernetes are intended for dual use internally and as exported go libraries ("staging") to be consumed by other projects, where we'd still need to support use without bazel (i.e. checked-in generated code etc.).
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this already a solved problem even with the current Bazel build? go building kuberntes/kubernetes is atotally different can of worms from go building re-exported libraries anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this already a solved problem even with the current Bazel build? go building kuberntes/kubernetes is atotally different can of worms from go building re-exported libraries anyway.

This means that we don't get to gain from integrating deeply with bazel, because all of this code must already work without it. (so for example code generation must not be bazel style, it needs to be checked in / on disk).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure you could integrate deeply with Bazel here, by continuing to use code-generating rules within kubernetes/kubernetes, and then leveraging these same rules to export generated code to staging/ repositories when publishing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regardless we must ensure this code builds for all supported targets with just go, it is not feasible to have a world where we only qualify with bazel, unless the entire go / Kubernetes ecosystem adopts bazel, which is unrealistic.

Testing these with bazel has lead to surprises in the past, see the linked issue about moving away from it in kubernetes/kubernetes.

@justaugustus
Copy link
Member

This has been a long time coming and it's a welcome improvement to maintainer overhead.
Of course, there will be finer details to work out as we move towards transition/deprecation, let's get this in as provisional and iterate between SIG Testing and SIG Release on implementation details and communications plan.

For SIG Release:
/lgtm
/approve
cc: @kubernetes/sig-release-leads

/hold since my approves have superpowers here
(remove when ready)

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Feb 4, 2021
Copy link
Member

@spiffxp spiffxp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve
/lgtm
as SIG Testing approver

@dims
Copy link
Member

dims commented Feb 4, 2021

/approve
/lgtm

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: BenTheElder, dims, justaugustus, spiffxp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@justaugustus
Copy link
Member

And away we go to provisional! 🚀
/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 4, 2021
@k8s-ci-robot k8s-ci-robot merged commit 4db97fd into kubernetes:master Feb 4, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.21 milestone Feb 4, 2021
@BenTheElder BenTheElder deleted the KEP-2420 branch February 4, 2021 23:29
@spiffxp spiffxp mentioned this pull request Feb 8, 2021
12 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/release-eng Issues or PRs related to the Release Engineering subproject cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/release Categorizes an issue or PR as relevant to SIG Release. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants