-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial provider implementation for nerdctl (+finch) #3429
Conversation
Welcome @estesp! |
Hi @estesp. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
||
// ensure the pre-requisite network exists | ||
networkName := fixedNetworkName | ||
if n := os.Getenv("KIND_EXPERIMENTAL_DOCKER_NETWORK"); n != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this should be still called "DOCKER"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer that we not support this in the nerdctl provider, I consider it to have been a mistake in the docker provider that has caused no end of headaches (e.g. users trying to run in host network (!))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good; happy to remove this piece
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in latest commit
// note: requires API v1.41+ from Dec 2020 in Docker 20.10.0 | ||
// this is the default with cgroups v2 but not with cgroups v1, unless | ||
// overridden in the daemon --default-cgroupns-mode | ||
// https://github.com/docker/cli/pull/3699#issuecomment-1191675788 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't make a sense for nerdctl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed in latest commit
pkg/internal/runtime/runtime.go
Outdated
@@ -18,6 +18,8 @@ func GetDefault(logger log.Logger) cluster.ProviderOption { | |||
case "docker": | |||
logger.Warn("using docker due to KIND_EXPERIMENTAL_PROVIDER") | |||
return cluster.ProviderWithDocker() | |||
case "nerdctl", "finch": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a CI?
ref:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I saw that PR.. should definitely collaborate on getting this into CI. Wanted to make sure there weren't any glaring misses here before getting CI set up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is finch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is finch?
Amazon's distribution of nerdctl wrapped in Lima
https://aws.amazon.com/jp/blogs/opensource/ready-for-flight-announcing-finch-1-0-ga/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there compatibility issues such that we need to expand the test matrix that far as both upstream nerdctl and the finch distro?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there should be no compat issues; Finch is a distribution that includes nerdctl. The only minor issue could be point-in-time discrepancies where nerdctl has released a version, but Finch has no release (yet) with the new version of nerdctl.
) | ||
|
||
// NewProvider returns a new provider based on executing `nerdctl ...` | ||
func NewProvider(logger log.Logger, binaryName string) providers.Provider { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering if the nerdctl
package can be merged into the docker
package. and the docker
package should have something like NewProviderWithVariant()
that takes a custom Docker-like binary path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are close; I also see some of the changes I had to make to a few capabilities around inspect
in this PR are fixed (at least one around Names
JSON object in container inspect) in recent PRs to nerdctl. There are also inconsistencies in a few other Go template responses that I worked around with index
; probably could be fixed in nerdctl as it is essentially Docker compatibility mismatches at the moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those inconsistencies should have been fixed in nerdctl v1.7
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice; so the options are 1) try and use the docker provider with "binary name customization" and require nerdctl 1.7.0 or greater, or 2) create this provider and assume that over time it could be deprecated when any/all differences are resolved. Interested to hear from the kind maintainers what they would prefer. I think both are potentially reasonable, but obviously there is code duplication in creating yet-another-provider.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I appreciate that nerdctl considers incompatibilities to be bugs and that there should be minimal skew, eventually we're bound to run into another incompatibility when developing the docker provider and need to work around these differences until a nerdctl update is released (and we don't know how long users will take to upgrade).
It's not a big deal to request that incoming changes be ported between essentially two copies of the docker provider, and in this PR we already have an example of something that should only be tested for docker and not nerdctl, #3429 (comment)
We already need to ensure that any new features target podman as well, often the code is nearly the same but not quite.
The node providers are intended to be very small shims that isolate most of the rest of the logic from which runtime we're using for nodes. We can afford another copy and it will give us more room to handle quirks even if nerdctl patches them in later releases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree; that makes sense to me.
released: https://github.com/containerd/nerdctl/releases/tag/v1.7.0 |
Oops! Well, you can tell how well I'm paying attention! I'm testing using Finch 1.0 which hasn't updated to 1.7.0 yet. |
// filter for nodes with the cluster label | ||
"--filter", fmt.Sprintf("label=%s=%s", clusterLabelKey, cluster), | ||
// format to include the cluster name | ||
"--format", `{{range .Names}}{{println .}}{{end}}`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HI @estesp
It's awsome PR :-)
The nerdctl ps -a --filter label=io.x-k8s.kind.cluster=kind --format '{{range .Names}}{{println .}}{{end}}'
can not work in nerdctl , but nerdctl ps -a --filter label=io.x-k8s.kind.cluster=kind --format '{{.Names}}'
same with docker can work correctly in v1.7.0 :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed to match behavior from nerdctl 1.7.0 and above
HI @kubernetes-sigs/kind-maintainers , @estesp would you please review the PR , and give some advice? |
Sorry, I'm back from conference / post-conference-cold / holidays but I'm still going to have to come back to this later: https://kubernetes.slack.com/archives/C2C40FMNF/p1701112863247509 At a high level: +1 to doing this and we've had some other conversations and PRs related to CI. |
/ok-to-test |
I see Akihiro, already got there https://github.com/kubernetes-sigs/kind/pull/3429/files#r1397788447 |
"sigs.k8s.io/kind/pkg/internal/integration" | ||
) | ||
|
||
func TestIntegrationEnsureNetworkConcurrent(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test fails on the CI because the job pod does not contain the nerdctl binary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably don't need this test for nerdctl and eventually even docker won't need it as the docker bug/quirk we were working around was fixed recently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed the test in latest commit
Is the user action to |
+1 on this |
I'm thinking for simplicity to make nerdctl 1.7.0 the lowest supported version; given there has been no support until now, that seems reasonable to me, but curious if anyone has strong opinions. I also would like input on whether to simply allow the provider to fail if the user has an older version, or do version checks during provider initialization? |
I really don't understand why the container is exiting with error; although ending with At least for now, this is mitigated in the latest pushed code in the PR and several create, delete, create, delete attempts properly kill and remove nodes and allow for clusters to be created and deleted properly. |
/retest |
They're already approved? Sorry, last week was enhancements freeze so pretty much all my time went to PRR reviews https://www.kubernetes.dev/resources/release/#timeline aojea is OOO currently. I'm going to be working on a bug fix release related to #3510 and some other work but aim to come back to this.
SGTM
The current behavior is to fallback to docker if we don't detect another working provider unless the user explicitly sets Detecting providers is supposed to be fast and cheap because it happens on every invocation unless explicitly set, we're just looking for binaries in path. In the podman provider we version check during
I think we should keep that pattern for now.
sounds OK for now, long term would love to get reboot working on all the providers but it's been thorny for podman as well and not perfect under docker either. re: image-loading, we've had an open issue to outline a better command that is multi-provider, would love some more input there. the intention is to leave |
Oh cool! when I first commented I had made another small fix and re-pushed and they were stuck waiting on approval--looks like someone did that and they ran successfully, so yay!
No worries at all; no rush on this. Hopefully it is in good enough shape for review and has basic CI now for nerdctl. |
@@ -18,6 +18,8 @@ func GetDefault(logger log.Logger) cluster.ProviderOption { | |||
case "docker": | |||
logger.Warn("using docker due to KIND_EXPERIMENTAL_PROVIDER") | |||
return cluster.ProviderWithDocker() | |||
case "nerdctl", "finch", "nerdctl.lima": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logger.Warn("using nerdctl due to KIND_EXPERIMENTAL_PROVIDER")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch; repushed. Thanks!
one nit, missing a log on the provider detection, the rest looks ok, the code is isolated and is almost the same with the |
@aojea fixed the missing logger; should be ready to go then! |
@estesp: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
Adds implementation for a provider based on nerdctl. Several todos in the code but the core functionality of creating/deleting clusters is working and a simple application deployed works properly Signed-off-by: Phil Estes <estesp@gmail.com>
Signed-off-by: Kay Yan <kay.yan@daocloud.io> Signed-off-by: Phil Estes <estesp@gmail.com>
/lgtm /hold just if @BenTheElder wants to check something, we can unhold tomorrow if no answer |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: aojea, estesp The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@BenTheElder any thoughts/concerns? |
/hold cancel lazy consensus :) |
great to see this @estesp! |
Thank you all! |
Thanks @estesp 🎉🎉🎉 |
Adds implementation for a provider based on nerdctl. Several todos in the code but the core functionality of creating/deleting clusters is working and a simple application deployed works properly.
Fixes: #2317
I don't like that it's a bit messy to pass around
binaryName
to support finch alongside nerdctl. Clearly users could alias finch -> nerdctl and this code could be simplified but I would love to find a clean way to support any nerdctl wrapper/implementation if possible without requiring user action.(Updated the following on 14 Dec 2023)
Some TODOs in the code around IPv6 support (coming in nerdctl 1.7.0 but isn't released yet, and would be good to support nerdctl < 1.7.0 as well) and restart policy (supported, but containers are exiting with 137 withkind delete cluster
so the one-time restart is actually happening, causing a recreate of the same cluster to fail without user action.This PR now relies on nerdctl 1.7.0 or above, and/or finch 1.0.1 or above. That solves the lack of IPv6 support, although I have not tested IPv6 yet. The delete cluster flow was updated to use stop/wait/rm and works properly now, and leaves the restart policy on create matching the Docker provider.
Update on 7 Feb 2024
Now added the initial CI testing matrix for
nerdctl
from #3408 ; carried that PR's commit and modified it with recent versions and not aliasing todocker
and instead using the implementation from this PR with experimental provider set.