split controller and apiserver start #14775

Merged (1 commit, Jun 27, 2017)

Conversation

@deads2k (Contributor) commented Jun 20, 2017

This builds on @mfojtik's refactoring pull and separates the construction of the controller part of the process from the apiserver part of the process. More will be required, but this gets the informers under control and moves us in the correct direction.

@liggitt
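
As a rough illustration of the split described above, the Go sketch below shows the general pattern of constructing the controller config and the apiserver config separately while sharing one informer setup. It is a hypothetical, simplified example; the type and function names are illustrative and do not correspond to the PR's actual code.

```go
package main

import (
	"fmt"
	"time"
)

// InformerFactory is a hypothetical stand-in for the shared informer machinery.
type InformerFactory struct {
	Resync time.Duration
}

// ControllerConfig holds only what the controller process needs.
type ControllerConfig struct {
	Informers     *InformerFactory
	RecyclerImage string
}

// APIServerConfig holds only what the apiserver process needs.
type APIServerConfig struct {
	Informers   *InformerFactory
	BindAddress string
}

// BuildControllerConfig constructs the controller half independently of the apiserver.
func BuildControllerConfig(informers *InformerFactory) *ControllerConfig {
	return &ControllerConfig{Informers: informers, RecyclerImage: "recycler:latest"}
}

// BuildAPIServerConfig constructs the apiserver half independently of the controllers.
func BuildAPIServerConfig(informers *InformerFactory) *APIServerConfig {
	return &APIServerConfig{Informers: informers, BindAddress: "0.0.0.0:8443"}
}

func main() {
	// The informers are created once and handed to both halves, instead of
	// being owned by a single monolithic server config.
	informers := &InformerFactory{Resync: 10 * time.Minute}

	controllers := BuildControllerConfig(informers)
	apiserver := BuildAPIServerConfig(informers)

	fmt.Println(controllers.RecyclerImage, apiserver.BindAddress)
}
```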

@deads2k (Contributor, Author) commented Jun 20, 2017

oh joy, conflict

@deads2k (Contributor, Author) commented Jun 20, 2017

[test]

@deads2k (Contributor, Author) commented Jun 20, 2017

@stevekuznetsov how hard would it be to have verify failures fail the job, but allow the unit tests to run anyway?

@stevekuznetsov (Contributor):

Right now the job runs two separate shell scripts for the verify and test steps, but if we squashed them together into one stage we could probably Bash our way around it.
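
One hypothetical way to get that behavior, sketched in Go rather than the job's actual Bash (the script paths here are made up for illustration): run the verify step, remember whether it failed, still run the unit tests, and only then decide the job's exit status.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// run executes a script, streaming its output, and reports whether it passed.
func run(script string) bool {
	cmd := exec.Command("bash", script)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd.Run() == nil
}

func main() {
	// Hypothetical script names; the real job's scripts may differ.
	verifyOK := run("hack/verify.sh")
	testOK := run("hack/test-go.sh")

	fmt.Printf("verify passed: %v, unit tests passed: %v\n", verifyOK, testOK)
	if !verifyOK || !testOK {
		// Verify failures still fail the job, but only after the tests have run.
		os.Exit(1)
	}
}
```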

@openshift-bot openshift-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 20, 2017
@deads2k deads2k force-pushed the start-01-controllers branch 2 times, most recently from a602e60 to db3e978 Compare June 21, 2017 12:06
@deads2k (Contributor, Author) commented Jun 21, 2017

re[test]

// TODO once the cloudProvider moves, move the configs out of here to where they need to be constructed
persistentVolumeController := PersistentVolumeControllerConfig{
RecyclerImage: c.RecyclerImage,
// TODO: In 3.7 this is renamed to 'Cloud' and is part of kubernetes ControllerContext
Contributor:

guess we can remove these TODOs since you already have a TODO in the func godoc (which btw should be in godoc format ;-)

ClientEnvVars: vars,
}
ret.DeploymentConfigControllerConfig = origincontrollers.DeploymentConfigControllerConfig{
Codec: annotationCodec,
Contributor:

did we lose the todo about moving the codec to the controller context? (maybe I never added that todo ;-)

Contributor (Author):

did we lose the todo about moving the codec to the controller context? (maybe I never added that todo ;-)

Having now gone through it, we don't want it generically available. It turns out that it is being used improperly and will cause us versioning pain. We should strive to eliminate it instead.

Contributor:

fine with that

}
ret.ImageImportControllerOptions = origincontrollers.ImageImportControllerOptions{
MaxScheduledImageImportsPerMinute: options.ImagePolicyConfig.MaxScheduledImageImportsPerMinute,
ResyncPeriod: 10 * time.Minute,
Contributor:

wonder if we want to make the resyncPeriods configurable or if we can just hardcode them in the controllers

Contributor (Author):

wonder if we want to make the resyncPeriods configurable or if we can just hardcode them in the controllers

Ultimately, they should be configurable. This just moves the existing hardcoded values.

HasStatefulSetsEnabled: options.DisabledFeatures.Has("triggers.image.openshift.io/statefulsets"),
HasCronJobsEnabled: options.DisabledFeatures.Has("triggers.image.openshift.io/cronjobs"),
}
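
The reply above notes that the resync periods should eventually be configurable. As a hypothetical sketch (not code from this PR), one way to move in that direction is to carry the resync period as a config field with a default instead of hardcoding it at the construction site:

```go
package main

import (
	"fmt"
	"time"
)

// ImageImportControllerConfig is illustrative only; the field set does not
// mirror the PR's actual struct.
type ImageImportControllerConfig struct {
	MaxScheduledImageImportsPerMinute int
	// ResyncPeriod is hardcoded by the caller today; exposing it as a
	// field is one path toward making it configurable later.
	ResyncPeriod time.Duration
}

// applyDefaults fills in the hardcoded value only when nothing was configured.
func (c *ImageImportControllerConfig) applyDefaults() {
	if c.ResyncPeriod == 0 {
		c.ResyncPeriod = 10 * time.Minute
	}
}

func main() {
	cfg := ImageImportControllerConfig{MaxScheduledImageImportsPerMinute: 60}
	cfg.applyDefaults()
	fmt.Println(cfg.ResyncPeriod) // 10m0s unless overridden
}
```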
ret.ImageImportControllerOptions = origincontrollers.ImageImportControllerOptions{
Contributor:
s/ImageImportControllerOptions/ImageImportControllerConfig/ for consistency

}

ret.OriginToRBACSyncControllerConfig = origincontrollers.OriginToRBACSyncControllerConfig{
PrivilegedRBACClient: kubeInternal.Rbac(),
Contributor:

add a comment about why this needs to be privileged?

//c.Options.ProjectConfig.SecurityAllocator
SecurityAllocator *configapi.SecurityAllocator

//c.RESTOptionsGetter
Contributor:

nuke this and the above (I was doing the same when I was moving these ;-)

@mfojtik (Contributor) commented Jun 21, 2017

LGTM (with some nits)

also great stuff!

@openshift-bot openshift-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 21, 2017
@liggitt liggitt added this to the 3.6.0 milestone Jun 21, 2017
@deads2k deads2k force-pushed the start-01-controllers branch 4 times, most recently from 05b129a to 9e38e36 Compare June 21, 2017 19:40
@deads2k (Contributor, Author) commented Jun 21, 2017

re[test]

2 similar comments
@deads2k (Contributor, Author) commented Jun 21, 2017

re[test]

@deads2k (Contributor, Author) commented Jun 22, 2017

re[test]

@deads2k (Contributor, Author) commented Jun 22, 2017

Comments were minor and addressed. I plan to merge on green.

@deads2k (Contributor, Author) commented Jun 22, 2017

@openshift/networking Can I get some help looking at the networking test here: https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin_extended_networking_minimal/3543 ? I'm assuming my problem is Reason:KubeletNotReady Message:runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized, but I don't know where to go from there.

@deads2k (Contributor, Author) commented Jun 22, 2017

[merge]

@deads2k (Contributor, Author) commented Jun 22, 2017

severity:blocker

removed tag at request of eparis

@dcbw (Contributor) commented Jun 24, 2017

@openshift/networking Can I get some help looking at the networking test here: https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin_extended_networking_minimal/3543 ? I'm assuming my problem is Reason:KubeletNotReady Message:runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized, but I don't know where to go from there.

It means that openshift-sdn hasn't been able to get the ClusterNetwork object from the apiserver, or that it hasn't been able to read its own HostNetwork allocation from the master. In both cases, the plugin cannot initialize; initialization consists of writing a config file to /etc/cni/net.d, which the kubelet looks for in a loop and, when it finds one, magically updates node network readiness.

You should see log messages about the SDN reading the cluster network. But I'm not sure how to get node messages out of AWS. It should be fairly easy to reproduce locally though; with your branch, can you just run:

hack/dind-cluster.sh start

and then if it fails the same way the extended test does, do:

docker exec -it openshift-node-2 bash

and then once that's dropped you into the node:

journalctl -b -u openshift-node > /tmp/node.log
exit
docker cp openshift-node-2:/tmp/node.log /tmp/node.log

and then somehow get node.log to one of us.
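
The readiness handshake described above comes down to a file poll: the SDN plugin drops a config file into /etc/cni/net.d and the kubelet keeps checking for one. The Go sketch below is a simplified illustration of that idea only, not the kubelet's actual code.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// cniConfigPresent reports whether any CNI config file exists in dir.
func cniConfigPresent(dir string) bool {
	matches, err := filepath.Glob(filepath.Join(dir, "*.conf*"))
	return err == nil && len(matches) > 0
}

func main() {
	const cniDir = "/etc/cni/net.d"
	for {
		if cniConfigPresent(cniDir) {
			// Once the SDN plugin has written its config, the node can be marked ready.
			fmt.Println("NetworkReady=true: found CNI config in", cniDir)
			return
		}
		// Mirrors the symptom seen in the failing test: cni config uninitialized.
		fmt.Fprintln(os.Stderr, "network plugin is not ready: cni config uninitialized")
		time.Sleep(5 * time.Second)
	}
}
```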

@danwinship (Contributor):

But I'm not sure how to get node messages out of AWS

Click on "S3 Artifacts" on the side of the test run, scroll down to near the bottom, click "scripts/networking-minimal/logs/multitenant/nettest-node-1/systemd.log.gz".

The nodes are failing with:

node.go:325] error: SDN node startup failed: failed to get subnet for this host: nettest-node-1, error: timed out waiting for the condition

which means the master isn't correctly running the SDN controller. Actually, the master logs don't seem to show the openshift master running at all... the only "openshift-master"-related thing I see is:

Jun 22 12:53:06 nettest-master systemd-journald[18]: Suppressed 3069 messages from /system.slice/openshift-master.service

The logs go another 2 minutes after that but have nothing from the openshift master. That seems really weird. If it had crashed, systemd should log something, so I guess it didn't crash, but maybe it deadlocked or something?

I'm going to try running this PR locally...

@danwinship (Contributor):

oh, never mind, apparently already fixed

@deads2k (Contributor, Author) commented Jun 26, 2017

re[merge]

@deads2k (Contributor, Author) commented Jun 26, 2017

re[test]

3 similar comments
@deads2k (Contributor, Author) commented Jun 26, 2017

re[test]

@deads2k (Contributor, Author) commented Jun 26, 2017

re[test]

@deads2k (Contributor, Author) commented Jun 26, 2017

re[test]

@smarterclayton (Contributor):

Needs rebase looks like

@openshift-bot openshift-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 26, 2017
@openshift-bot (Contributor):

Evaluated for origin test up to f80e65c

@openshift-bot (Contributor) commented Jun 27, 2017

continuous-integration/openshift-jenkins/merge Waiting: You are in the build queue at position: 12

@openshift-bot (Contributor):

Evaluated for origin merge up to f80e65c

@openshift-bot (Contributor):

continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin/2690/) (Base Commit: bbb9647) (PR Branch Commit: f80e65c)

@smarterclayton smarterclayton merged commit ede15e3 into openshift:master Jun 27, 2017
@smarterclayton (Contributor):

Flake, merged at head

@deads2k deads2k deleted the start-01-controllers branch August 3, 2017 19:27
Labels: needs-rebase (Indicates a PR cannot be merged because it has merge conflicts with HEAD.)
8 participants