add timeout set freeze to fix exec with update make set freeze step o… #2767

Closed
wants to merge 1 commit into from

@gaopeiliang gaopeiliang commented Jan 26, 2021

With runc rc92, we sometimes find runc init and the container processes stuck in the Disk Sleep (D) state.

We found that running runc exec concurrently with runc update can leave the cgroup freeze incomplete. As the kernel's freezer-subsystem.txt says, we should either retry setting the freezer state or cancel the freeze, rather than block forever.

The kernel is not supposed to stay in the FREEZING state for long, but on Ubuntu 16.04 with kernel 4.19 we found it stuck there forever.

When a process replaces itself via execv(2) while the freeze is being written to the cgroup, the cgroup stays in FREEZING forever. This may be a kernel issue, but we should work around it rather than stop the world.
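
To confirm that a process is in the Disk Sleep state mentioned above, one option is to read the state field from /proc/<pid>/stat. The following is a minimal sketch, not part of this PR; procState and isDiskSleep are hypothetical helper names, and the sample line in main is made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// procState extracts the one-character process state from the contents
// of /proc/<pid>/stat. The layout is "pid (comm) state ...". Because comm
// may itself contain spaces or ')', we search for the last ')' instead of
// splitting on spaces.
func procState(stat string) (byte, error) {
	i := strings.LastIndexByte(stat, ')')
	if i < 0 || i+2 >= len(stat) || stat[i+1] != ' ' {
		return 0, fmt.Errorf("malformed stat line: %q", stat)
	}
	return stat[i+2], nil
}

// isDiskSleep reports whether the state is 'D' (uninterruptible sleep).
func isDiskSleep(state byte) bool {
	return state == 'D'
}

func main() {
	// Made-up sample line; on a real host, read /proc/<pid>/stat instead.
	line := "4242 (runc:[2:INIT]) D 4200 4242 4242 0"
	state, err := procState(line)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("state=%c diskSleep=%v\n", state, isDiskSleep(state))
}
```

On an affected node, the same check could be run over every PID listed in the container's cgroup to see which tasks are stuck.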

@kolyshkin
Contributor

Thank you for your contribution @gaopeiliang. Can you please describe the issue you're trying to fix in more detail?

@kolyshkin
Contributor

Can you please describe the issue you're trying to fix in more detail?

Ideally, a code / test case that demonstrates it.

@thaJeztah
Member

related to / addresses #2753

@kolyshkin
Contributor

we should either retry setting the freezer state or cancel the freeze, rather than block forever, as the kernel's freezer-subsystem.txt says

Here's the relevant portion:

It's important to note that freezing can be incomplete. In that case we return EBUSY. This means that some tasks in the cgroup are busy doing something that prevents us from completely freezing the cgroup at this time. After EBUSY, the cgroup will remain partially frozen -- reflected by freezer.state reporting "FREEZING" when read. The state will remain "FREEZING" until one of these things happens:
1) Userspace cancels the freezing operation by writing "THAWED" to
	the freezer.state file
2) Userspace retries the freezing operation by writing "FROZEN" to
	the freezer.state file (writing "FREEZING" is not legal
	and returns EINVAL)
3) The tasks that blocked the cgroup from entering the "FROZEN"
	state disappear from the cgroup's set of tasks.
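
The retry and cancel options listed above can be sketched as a bounded loop. This is only an illustration under stated assumptions, not the implementation from this PR or from runc: stateFile is a hypothetical abstraction over freezer.state, fakeFile is a test double standing in for a real cgroup, and the attempt count and delay are arbitrary.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
	"time"
)

// Values of freezer.state in the cgroup v1 freezer.
const (
	stateThawed   = "THAWED"
	stateFreezing = "FREEZING"
	stateFrozen   = "FROZEN"
)

// errFreezeTimeout signals that the freeze was cancelled after all retries.
var errFreezeTimeout = errors.New("cgroup did not reach FROZEN; freeze cancelled")

// stateFile is a hypothetical abstraction over the freezer.state file.
type stateFile interface {
	Read() (string, error)
	Write(state string) error
}

// freeze writes FROZEN up to `attempts` times (option 2 in the kernel text).
// If the state never becomes FROZEN, it writes THAWED to cancel (option 1)
// so the caller does not block forever.
func freeze(f stateFile, attempts int, delay time.Duration) error {
	for i := 0; i < attempts; i++ {
		// EBUSY means the freeze is incomplete, so it is not treated as fatal.
		if err := f.Write(stateFrozen); err != nil && !errors.Is(err, syscall.EBUSY) {
			return err
		}
		s, err := f.Read()
		if err != nil {
			return err
		}
		if s == stateFrozen {
			return nil
		}
		time.Sleep(delay)
	}
	if err := f.Write(stateThawed); err != nil {
		return err
	}
	return errFreezeTimeout
}

// fakeFile imitates freezer.state. It reports FROZEN only after
// frozenAfter writes of FROZEN; with frozenAfter == 0 it stays FREEZING.
type fakeFile struct {
	frozenAfter  int
	frozenWrites int
	state        string
	writes       []string
}

func (f *fakeFile) Read() (string, error) { return f.state, nil }

func (f *fakeFile) Write(s string) error {
	f.writes = append(f.writes, s)
	switch s {
	case stateFrozen:
		f.frozenWrites++
		if f.frozenAfter > 0 && f.frozenWrites >= f.frozenAfter {
			f.state = stateFrozen
		} else {
			f.state = stateFreezing
		}
	case stateThawed:
		f.state = stateThawed
	}
	return nil
}

func main() {
	slow := &fakeFile{frozenAfter: 2}
	fmt.Println(freeze(slow, 5, time.Millisecond), slow.state)

	stuck := &fakeFile{}
	fmt.Println(freeze(stuck, 3, time.Millisecond), stuck.state)
}
```

Cancelling with THAWED on timeout trades a failed freeze for a running container, which matches the goal of working around the stuck state rather than blocking.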

@gaopeiliang gaopeiliang force-pushed the master branch 3 times, most recently from 0e683fb to 7fb93bf Compare January 29, 2021 03:08
@kolyshkin
Contributor

@gaopeiliang I'm still working on that stuff, so there is no need to replace your implementation with mine. Once it's ready I'll open a PR.

Signed-off-by: gaopeiliang <964911957@qq.com>
@gaopeiliang
Author

@gaopeiliang I'm still working on that stuff, so there is no need to replace your implementation with mine. Once it's ready I'll open a PR.

OK, hopefully it will come soon. We have about 2k nodes at risk.

@gaopeiliang
Author

#2774

@gaopeiliang gaopeiliang closed this Feb 2, 2021