Quit all processes when "apiserver-boot run local" failed #255

Closed
wants to merge 3 commits into from

Conversation

interma
Contributor

@interma interma commented Jul 20, 2018

Fixes #253

Glad to open my first PR, thanks for your review.

@k8s-ci-robot added the size/M and cncf-cla: yes labels Jul 20, 2018
@ghost

ghost commented Jul 20, 2018

I can't review this, but I'd suggest you do the following:

  • Create a new context.Context
  • Derive a cancel function from it (for example with context.WithCancel)
  • Pass the context and the cancel function to every process
  • In each run function, create a new chan error and call cmd.Run in a new goroutine that sends its return value to the channel as soon as the command exits
  • Outside the goroutine, select on ctx.Done() and the channel you created.
    • If your channel case is hit, cancel the context and return
    • If the context was canceled because another process exited, kill the process and return
  • Make sure your context is also canceled when the process receives an interrupt or termination signal (Ctrl+C in the console, for instance):
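import (
	"context"
	"os"
	"os/signal"
	"syscall"
)

// cancelWhenSignaled returns a child of parent that is canceled when the
// process receives SIGINT (Ctrl+C) or SIGTERM.
func cancelWhenSignaled(parent context.Context) context.Context {
	ctx, cancel := context.WithCancel(parent)

	go func() {
		// signal.Notify never blocks on send, so the channel must be buffered.
		signalChannel := make(chan os.Signal, 1)
		// os.Kill (SIGKILL) cannot be caught, so listen for SIGTERM instead.
		signal.Notify(signalChannel, os.Interrupt, syscall.SIGTERM)
		<-signalChannel
		cancel()
	}()

	return ctx
}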
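
The steps suggested above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: runCommand and demoCancelOnExit are hypothetical names, the sh and sleep commands assume a Unix-like system, and the sketch swaps cmd.Run for cmd.Start plus cmd.Wait so that cmd.Process is already set before the select can reach Kill.

```go
package main

import (
	"context"
	"os/exec"
)

// runCommand (hypothetical name) starts cmd and blocks until it exits or
// ctx is canceled. If cmd exits first, it cancels ctx so siblings stop too.
func runCommand(ctx context.Context, cancel context.CancelFunc, cmd *exec.Cmd) error {
	if err := cmd.Start(); err != nil {
		cancel() // a command that cannot start should stop the others as well
		return err
	}

	done := make(chan error, 1) // buffered so the goroutine never blocks on send
	go func() { done <- cmd.Wait() }()

	select {
	case err := <-done:
		// This command exited (err may be nil); tell the others to stop.
		cancel()
		return err
	case <-ctx.Done():
		// Another command exited or a signal arrived; stop this one.
		_ = cmd.Process.Kill()
		<-done // wait for the killed process to be reaped
		return ctx.Err()
	}
}

// demoCancelOnExit (hypothetical) runs a short-lived command next to a
// long-lived one and returns both results. Assumes sh and sleep exist.
func demoCancelOnExit() (shortErr, longErr error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	longDone := make(chan error, 1)
	go func() {
		longDone <- runCommand(ctx, cancel, exec.Command("sleep", "30"))
	}()

	shortErr = runCommand(ctx, cancel, exec.Command("sh", "-c", "exit 0"))
	return shortErr, <-longDone
}
```

Under these assumptions, demoCancelOnExit should return a nil error for the short command and context.Canceled for the long one, because the first exit cancels the shared context and the remaining process is killed.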

@interma
Contributor Author

interma commented Jul 23, 2018

Thanks @damongant
I will update later.

@k8s-ci-robot added the size/L label and removed the size/M label Jul 23, 2018
@interma
Contributor Author

interma commented Jul 23, 2018

@damongant
I have refined the quit logic as you advised (yes, it makes more sense). Did I understand your idea correctly?

Thanks for your help.

- log.Printf("Failed to run controller-manager %v", err)
- stopCh <- struct{}{}
+ log.Printf("Failed to run %s, error: %v\n", cmdName, err)
+ stopCh <- err

You can drop this line and the else branch and just send stopCh <- err; err will be nil in the else case anyway.

Contributor Author

ok, I will update this.
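
The simplification discussed in this review comment might look like the sketch below. It assumes stopCh carries error values, as in the updated diff; reportExit and demoReportExit are hypothetical names, and the error text is made up for illustration.

```go
package main

import (
	"errors"
	"log"
)

// reportExit (hypothetical name) logs a failure, if any, and always forwards
// err to stopCh. err is nil on a clean exit, so no else branch is needed.
func reportExit(cmdName string, err error, stopCh chan<- error) {
	if err != nil {
		log.Printf("Failed to run %s, error: %v", cmdName, err)
	}
	stopCh <- err
}

// demoReportExit (hypothetical) shows that a clean exit and a failure both
// travel through the same unconditional send.
func demoReportExit() (clean, failed error) {
	stopCh := make(chan error, 2) // buffered so both sends complete immediately
	reportExit("controller-manager", nil, stopCh)
	reportExit("controller-manager", errors.New("exit status 1"), stopCh)
	return <-stopCh, <-stopCh
}
```

The receiver can then tell a clean exit from a failure by checking the received value against nil, without a separate code path on the sending side.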

cancel()
case <-ctx.Done():
// another command has quit
cmd.Process.Kill()

Do we need to check whether cmd.Process is nil here? With unlucky timing, two processes exiting at once could cause a nil pointer dereference here, as far as I know, but I'm not sure.

Contributor Author

Yes, a nil check is better. I will update this later.
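
One way to write the nil check discussed here is sketched below. killIfStarted and demoKillIfStarted are hypothetical names, and the sleep command assumes a Unix-like system. Note that the check only guards against a command that was never started; if cmd.Run is still executing in another goroutine, reading cmd.Process concurrently remains a data race that this helper does not address.

```go
package main

import "os/exec"

// killIfStarted (hypothetical name) kills cmd's process if one exists and
// reports whether a kill was attempted. cmd.Process stays nil until Start
// (or Run) has successfully launched the process.
func killIfStarted(cmd *exec.Cmd) bool {
	if cmd == nil || cmd.Process == nil {
		return false
	}
	_ = cmd.Process.Kill()
	return true
}

// demoKillIfStarted (hypothetical) calls the helper before and after Start.
// Assumes a sleep binary is available on PATH.
func demoKillIfStarted() (beforeStart, afterStart bool) {
	cmd := exec.Command("sleep", "30")
	beforeStart = killIfStarted(cmd) // Process is still nil here
	if err := cmd.Start(); err != nil {
		return beforeStart, false
	}
	afterStart = killIfStarted(cmd)
	_ = cmd.Wait() // reap the killed process
	return beforeStart, afterStart
}
```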

@ghost

ghost commented Jul 23, 2018

LGTM overall, didn't have a chance to test yet.

@interma
Contributor Author

interma commented Jul 24, 2018

@pwittrock Could you help review this?
Thanks.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Apr 24, 2019
@ghost

ghost commented Apr 25, 2019

I'd hate for this contribution to go to waste because of inactive maintainers.

wink wink nudge nudge

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label Apr 25, 2019
Member

@yue9944882 yue9944882 left a comment

Apologies for the delayed response; I'm having trouble finding the active threads in the pool 🧐

@@ -139,3 +142,16 @@ func CheckInstall() {
strings.Join(missing, ","))
}
}

func CancelWhenSignaled(parent context.Context) context.Context {
Member

@interma any plan to bump the thread?

double-receiving should be a fine addition to this piece of code if that's what you mean, but the context is also used to support cancelling when one of the processes unexpectedly quits.

Member

> double-receiving should be a fine addition to this piece of code if that's what you mean

For now we're not catching signals in the code, so there's no double-receiving. I mean the pull request LGTM overall, but we should reuse the existing upstream code as much as possible.

Contributor

I've attempted to address this final code-review comment in #410

It'd be great to get this fixed in the next release.

Contributor

Trouble is that importing SetupSignalHandler from k8s.io/apiserver causes a clash with the k8s.io/kube-openapi dependency.
So I copied the code across instead.
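
For readers unfamiliar with the "double-receiving" pattern mentioned in this thread, here is a rough sketch of the idea: the first signal requests a graceful stop and a second one forces an exit. It is not the upstream k8s.io/apiserver code nor the code copied in #410; handleSignals, setupSignalHandler, and demoDoubleSignal are hypothetical names, and the injectable forceExit callback is an assumption added so the behavior can be exercised without terminating the process.

```go
package main

import (
	"os"
	"os/signal"
	"syscall"
)

// handleSignals (hypothetical name) closes the returned channel on the first
// signal and calls forceExit on the second.
func handleSignals(sigs <-chan os.Signal, forceExit func()) <-chan struct{} {
	stop := make(chan struct{})
	go func() {
		<-sigs
		close(stop) // first signal: ask everything to shut down gracefully
		<-sigs
		forceExit() // second signal: stop waiting and exit immediately
	}()
	return stop
}

// setupSignalHandler (hypothetical) wires handleSignals to real OS signals.
func setupSignalHandler() <-chan struct{} {
	sigs := make(chan os.Signal, 2) // room for both signals
	signal.Notify(sigs, os.Interrupt, syscall.SIGTERM)
	return handleSignals(sigs, func() { os.Exit(1) })
}

// demoDoubleSignal (hypothetical) feeds two fake signals through
// handleSignals and reports which stages were reached.
func demoDoubleSignal() (stopped, forced bool) {
	sigs := make(chan os.Signal, 2)
	exited := make(chan struct{})
	stop := handleSignals(sigs, func() { close(exited) })

	sigs <- os.Interrupt
	<-stop // closed after the first signal
	stopped = true

	sigs <- os.Interrupt
	<-exited // closed by forceExit after the second signal
	forced = true
	return stopped, forced
}
```

The stop channel returned by setupSignalHandler could be bridged to a context cancel function, which would keep the cancel-on-process-exit behavior described earlier in the thread.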

@interma
Contributor Author

interma commented Apr 26, 2019

I remember apiserver-builder was deprecated last year; has it been revived?
What are its purpose and roadmap now?
Thanks.

@yue9944882
Member

> I remember apiserver-builder was deprecated last year; has it been revived?

For now, we don't see a better replacement tooling/SDK for AA servers, so we're still delivering it. There was supposed to be a replacement for AA-builder, which we planned and named apiserver-runtime, but for some reason it was delayed and we don't have a clear roadmap for it yet.

> What are its purpose and roadmap now?

Generally, we're aiming to converge this project with kubebuilder (which is now actively maintained by Phil and his team) so that both share the same framework, such as controller-runtime. FWIW, the point of this project is to reuse the wheels from kubebuilder as much as possible and to extend features beyond what CRDs can do. CRDs are becoming more lightweight and better integrated with the OpenAPI schema, while AA is becoming more extensible for customization.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Jul 25, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Aug 24, 2019
@wallrj
Contributor

wallrj commented Aug 31, 2019

@interma I forked your branch and addressed the remaining code review feedback in #410, which has now been merged, so this can be closed. I hope you don't mind.

@interma
Contributor Author

interma commented Sep 2, 2019

@wallrj Ok, no problem, thanks!

@interma interma closed this Sep 2, 2019
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
5 participants