split controller and apiserver start #14775
Conversation
oh joy, conflict
Force-pushed from 4b01eba to 2221823
[test]
@stevekuznetsov how hard would it be to have verify failures fail the job, but allow the unit tests to run anyway?
Right now the job runs two separate shell scripts for the verify and test steps, but if we squashed them together into one stage we could probably Bash our way around it.
Force-pushed from a602e60 to db3e978
re[test]
// TODO once the cloudProvider moves, move the configs out of here to where they need to be constructed
persistentVolumeController := PersistentVolumeControllerConfig{
	RecyclerImage: c.RecyclerImage,
	// TODO: In 3.7 this is renamed to 'Cloud' and is part of kubernetes ControllerContext
guess we can remove these todos since you already have a todo in the func godoc (which, btw, should be in godoc format ;-)
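To illustrate the "godoc format" nit, a hypothetical sketch (the wording and placement are invented; only the struct and field names come from the diff above): the doc comment opens with the name it documents and carries the TODO, instead of the TODO sitting inline in the struct literal.

```go
// Sketch only: this is not the real package, just an illustration of godoc-style
// placement for the TODO that currently lives inline in the struct literal.
package controllers

// PersistentVolumeControllerConfig holds the settings for the persistent volume
// recycler controller.
//
// TODO: once the cloudProvider moves, construct this config closer to where it
// is used instead of in the master config.
type PersistentVolumeControllerConfig struct {
	RecyclerImage string
}
```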
	ClientEnvVars: vars,
}
ret.DeploymentConfigControllerConfig = origincontrollers.DeploymentConfigControllerConfig{
	Codec: annotationCodec,
did we lose the todo about moving the codec to the controller context? (maybe I never added that todo ;-)
Having now gone through it, we don't want it generically available. It turns out that it is being used improperly and will cause us versioning pain. We should strive to eliminate it instead.
fine with that
}
ret.ImageImportControllerOptions = origincontrollers.ImageImportControllerOptions{
	MaxScheduledImageImportsPerMinute: options.ImagePolicyConfig.MaxScheduledImageImportsPerMinute,
	ResyncPeriod: 10 * time.Minute,
wonder if we want to make the resyncPeriods configurable, or whether we can just hardcode them in the controllers
Ultimately, they should be configurable. This is a move of existing hardcodedness.
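If they do become configurable, a minimal sketch of the shape that could take (ImageImportResyncPeriod and the standalone types below are invented for illustration; today the value is simply hardcoded to 10 minutes, as the diff above shows):

```go
package main

import (
	"fmt"
	"time"
)

// Sketch only: ImageImportResyncPeriod is an invented config field, not part of
// the real ImagePolicyConfig; today the resync period is simply hardcoded.
type imagePolicyConfig struct {
	MaxScheduledImageImportsPerMinute int
	ImageImportResyncPeriod           time.Duration
}

type imageImportControllerConfig struct {
	MaxScheduledImageImportsPerMinute int
	ResyncPeriod                      time.Duration
}

func newImageImportControllerConfig(opts imagePolicyConfig) imageImportControllerConfig {
	resync := 10 * time.Minute // the current hardcoded default
	if opts.ImageImportResyncPeriod > 0 {
		resync = opts.ImageImportResyncPeriod
	}
	return imageImportControllerConfig{
		MaxScheduledImageImportsPerMinute: opts.MaxScheduledImageImportsPerMinute,
		ResyncPeriod:                      resync,
	}
}

func main() {
	cfg := newImageImportControllerConfig(imagePolicyConfig{MaxScheduledImageImportsPerMinute: 60})
	fmt.Println(cfg.ResyncPeriod) // falls back to 10m0s when nothing is configured
}
```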
	HasStatefulSetsEnabled: options.DisabledFeatures.Has("triggers.image.openshift.io/statefulsets"),
	HasCronJobsEnabled: options.DisabledFeatures.Has("triggers.image.openshift.io/cronjobs"),
}
ret.ImageImportControllerOptions = origincontrollers.ImageImportControllerOptions{
s/ImageImportControllerOptions/ImageImportControllerConfig/ for consistency
}

ret.OriginToRBACSyncControllerConfig = origincontrollers.OriginToRBACSyncControllerConfig{
	PrivilegedRBACClient: kubeInternal.Rbac(),
add a comment about why this needs a privileged client?
//c.Options.ProjectConfig.SecurityAllocator
SecurityAllocator *configapi.SecurityAllocator

//c.RESTOptionsGetter
nuke this and the one above (I was doing the same when I was moving these ;-)
LGTM (with some nits), also great stuff!
Force-pushed from 05b129a to 9e38e36
re[test]
re[test]
re[test]
Comments were minor and addressed. I plan to merge on green.
@openshift/networking Can I get some help looking at the networking test here: https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin_extended_networking_minimal/3543 ? I'm assuming my problem is
Force-pushed from 9e38e36 to 6d4f687
[merge]
Removed the severity:blocker tag at the request of eparis
It means that openshift-sdn hasn't been able to get the ClusterNetwork object from the apiserver, or that it hasn't been able to read its own HostNetwork allocation from the master. In both of these cases the plugin cannot initialize. Initialization consists of writing a config file to /etc/cni/net.d, which kubelet looks for in a loop; when it finds one, it magically updates node network readiness. You should see log messages about the SDN reading the cluster network, but I'm not sure how to get node messages out of AWS.
It should be fairly easy to reproduce locally, though. With your branch, just run:
hack/dind-cluster.sh start
and then, if it fails the same way the extended test does, do:
docker exec -it openshift-node-2 bash
and then, once that's dropped you into the node:
journalctl -b -u openshift-node > /tmp/node.log
and then somehow get node.log to one of us.
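As a rough sketch of the handshake described above (this is not kubelet's actual code, just the shape of the loop): the node side waits for any CNI config to appear under /etc/cni/net.d and only then treats the node network as ready.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// configPresent reports whether any CNI config file exists in dir.
func configPresent(dir string) bool {
	matches, err := filepath.Glob(filepath.Join(dir, "*.conf*"))
	return err == nil && len(matches) > 0
}

func main() {
	// Sketch only: kubelet's real readiness logic is more involved; this just
	// mirrors the loop described in the comment above.
	dir := "/etc/cni/net.d"
	for !configPresent(dir) {
		fmt.Fprintln(os.Stderr, "waiting for the SDN to write a CNI config to", dir)
		time.Sleep(5 * time.Second)
	}
	fmt.Println("CNI config found; node network can be marked ready")
}
```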
Click on "S3 Artifacts" on the side of the test run, scroll down to near the bottom, click "scripts/networking-minimal/logs/multitenant/nettest-node-1/systemd.log.gz". The nodes are failing with:
which means the master isn't correctly running the SDN controller. Actually, the master logs don't seem to show the openshift master running at all... the only "openshift-master"-related thing I see is:
The logs go another 2 minutes after that but have nothing from the openshift master. That seems really weird. If it had crashed, systemd should log something, so I guess it didn't crash, but maybe it deadlocked or something? I'm going to try running this PR locally...
oh, nm, apparently already fixed
re[merge]
re[test]
re[test]
re[test]
re[test]
Needs a rebase, looks like.
Force-pushed from 6d4f687 to f80e65c
Evaluated for origin test up to f80e65c
continuous-integration/openshift-jenkins/merge Waiting: You are in the build queue at position: 12
Evaluated for origin merge up to f80e65c
continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin/2690/) (Base Commit: bbb9647) (PR Branch Commit: f80e65c)
Flake, merged at head
This builds on @mfojtik's refactoring pull and separates the construction of the controller part of the process from the apiserver part of the process. More will be required, but this gets the informers under control and moves us in the correct direction.
@liggitt
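As a minimal sketch of the direction this describes (all names below are hypothetical stand-ins, not the actual origin types): the controller half and the apiserver half are each built from their own config and started independently, sharing only the informers.

```go
package main

import "fmt"

// Informers stands in for the shared informer factories both halves consume.
type Informers struct{}

// ControllerConfig and APIServerConfig are hypothetical stand-ins for the two
// independently constructed halves of the process.
type ControllerConfig struct{ Informers *Informers }
type APIServerConfig struct{ Informers *Informers }

func (c ControllerConfig) Start() error { fmt.Println("controllers started"); return nil }
func (c APIServerConfig) Start() error  { fmt.Println("apiserver started"); return nil }

func main() {
	informers := &Informers{}

	// Each half is constructed on its own; neither reaches into the other's state.
	controllers := ControllerConfig{Informers: informers}
	apiserver := APIServerConfig{Informers: informers}

	if err := controllers.Start(); err != nil {
		panic(err)
	}
	if err := apiserver.Start(); err != nil {
		panic(err)
	}
}
```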