-
Notifications
You must be signed in to change notification settings - Fork 198
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Gate CNI support behind the cni
build tag
#1767
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dcermak, rhatdan The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are two cases to consider:
First of if the cni build tag is set it should keep behaving like it does today. Although I could also see a use case for dropping this awkward cni vs netavark detection logic and only keep reading the defaultNetworkBackend file and if the file does not exists (new users) assume empty == netavark. That would mean an update to 3.X to 5.X (with CNI) could break but I don't think that is a problem given that most 5.X build should not have cni enabled anyway.
Second if the cni build tag is not set then we have the network_backend key from the config, assuming it is set to netavark all is good and we can just use it, if it is set to cni we should return an error. However if it is empty then it reads the defaultNetworkBackend file and if the content is cni it is not clear to me that we should force an error here. This means the user (or distro) never explicitly configured anything and throwing a hard error because the users just kept updating from 3.X seems harsh to me. Assuming there are no CNI networks created and the user only ever used the default network then a update should not require manually intervention. A reboot would be enough to fix the old temporary CNI state to be cleanup up and stuff should work again. And there are also cases were people only run with --network host in which case this choice here shouldn't matter so breaking them will be unjustified (this is very common in the toolbox use case).
Lastly we need to fix the containers.conf man page to document these changes. Unfortunately I don't think we have the infra to properly build man pages based on build tags used. Thus the option is either to document both options or only document netavark only and then require distros that compile with cni support to patch the man pages. I would prefer the first option.
if val == string(types.CNI) { | ||
return types.CNI, nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will break backwards compatibility pretty hard. I haven't made up my mind yet what exactly should happen but this alone is broken because it will break even if CNI build tag is specified you will reject this old cni configuration which should not happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is why I have introduced the alternativeNetworkBackend
function:
https://github.com/containers/common/pull/1767/files#diff-d9ace9efdaad94d582a127750587831dff5b06bd443107ba64557bd243bbcd72R57-R99
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you will never end up in there because it now returns unknown network backend value "cni" ...
driectly below here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, you're right, my bad!
e9dfdf7
to
3c3e841
Compare
@Luap99 PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late reply, added some comments but I would like to have another look when I am back from the holiday in January.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR, @dcermak!
|
||
aardvarkBin, _ := conf.FindHelperBinary(aardvarkBinary, false) | ||
func defaultNetworkBackend(store storage.Store, conf *config.Config) (backend types.NetworkBackend, err error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this function would be a good candidate to have different build tagged versions - it would alleviate the need for the CNI supported const. The CNI unsupported version would just return netavark and do the update check/warn, while the supported version would go through the logic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either this, or even one level above, the NetworkBackend() function, could be the diverging point for the buildtag. I think either of these might work better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that is a good idea, although I am not sure the additional duplication of code is worth it.
Either way is fine for me so I wouldn't block on this.
b0a482e
to
57b951f
Compare
@ashley-cui PTANL |
Open comment still applies, but I'm open to discussion on it. Would like @Luap99 to take another look after the holidays though. |
@Luap99 waiting on you. |
I try to take a look tomorrow, regardless this needs buildah/podman test PR first to ensure CI is passing there before merging here. I guess this would mostly be disabling all the CNI tests there. Or at least temporarily add the CNI build tag there to ensure it continues working and we can rip it out later (I don't want to force you to deal with all of this work). |
I can take a look at the podman/buildah portion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI you have to rebase as you have a conflict due #1780. Should be easy to solve as you only need to drop the old go build tags.
@ashley-cui Can you open the podman and buildah test PRs. I don't think there should be any problems with buildah and we can just build test without the cni tag there but in podman will need some test changes as well. The open question is do we want to continue to test CNI in podman CI? |
@dcermak @Luap99 @ashley-cui Where do we stand on this one? |
Getting back onto the podman side of things after last week - will get back to this tomorrow. |
41dde62
to
8ce464a
Compare
} | ||
return "", fmt.Errorf("unknown network backend value %q in %q", val, file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this line is tripping our podman tests.
This test is running netavark with the netavark only build.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To help in debugging: the failure is in f39 rootless and only in two of the e2e tests: add ssh socket path and kube play using a user namespace, and both failures look like this:
→ Enter [It] (test name)
Error: Error: failed to get default network backend: unknown network backend value "" in "[share]/containers/storage/defaultNetworkBackend"
...
No future change is possible. Bailing out early after 0.741s.
I have no idea what "No future change is possible" means, it seems to be coming from ginkgo itself. All other e2e tests pass (1985 of them). I have no idea what is special about those two tests. I'm just posting direct links in hopes that someone else can understand the failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@edsantiago The test run that @ashley-cui linked includes a lot more test failures. Could it be that there is some leftover polluted cache maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fact that it says unknown network backend value ""
seems to suggest there is a conditioning in which it writes (or reads) an empty file which seems very bad.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dcermak @edsantiago FWIW I changed the test pr to always build CNI, and it looks like all the rootless runs are failing. I think this file should only be created by this network backend code? I can try to help dig a little failure but @Luap99 's diagnosis seems correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ashley-cui could you please rebase the test PRs? I've fixed the issue that @Luap99 found
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rebased!
libnetwork/network/interface.go
Outdated
// cache the network backend to make sure always the same one will be used | ||
defer writeBackendToFile(backend) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the issue is that this backend arg is evaluated here right away and not in the defer context as last step so backend ends up being empty all the time.
Given you restructured the function anyway you can just move this call directly before the return and not call it with defer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What a silly mistake, thank you for spotting it!
I have removed the defer
and adjusted the code. Now with the defer
gone, we could actually properly treat the error if the defaultNetworkBackend
file cannot be written. Although I am honestly unsure whether we should not just continue the current approach and just log it. Worst case: the autodetection will run again next time.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logging is fine
8ce464a
to
33c65f4
Compare
Looks like the Podman CI stuff passed - the other failures are unrelated to this PR, but need some fixing. We'll have to wait for those to be fixed to merge this. Thanks for your patience with this PR! LGTM |
linter is complaining |
33c65f4
to
a261c1c
Compare
@dcermak Thanks so much for your patience! Could you rebase once more, and I'll kick off the podman tests one more time as we just fixed out podman CI. Then, this should be good to go! |
- add cni build tag to libnetwork/cni - split libnetwork/network into multiple files so that cni support can be made optionally available - add -cni build targets to Makefile and build for amd64 with and without cni - add a simple upgrade mechanism if the user never set the network backend explicitly - add cni build tag to .golangci.yml to prevent false positives See also https://issues.redhat.com/browse/RUN-1943 Signed-off-by: Dan Čermák <dcermak@suse.com>
a261c1c
to
609a9be
Compare
Done |
containers/podman#21179 is green. LGTM from my side, and ready to merge. |
/lgtm |
cni
build tag tolibnetwork/cni
libnetwork/network
into multiple files so that cni support can be made optionally availableSee also https://issues.redhat.com/browse/RUN-1943