Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix runc's poststart behaviour doesn't match the runtime-spec #4348

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ningmingxiao
Copy link

@ningmingxiao ningmingxiao commented Jul 15, 2024

fix:#4347

@ningmingxiao ningmingxiao force-pushed the fix_poststart branch 2 times, most recently from 6ffc50a to 307c2b6 Compare July 15, 2024 21:50
Copy link
Member

@cyphar cyphar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code should be moved to container.Exec (keep it inside libcontainer).

@kolyshkin
Copy link
Contributor

Please write a more detailed commit message. "fix poststart error" is not a good one.

@kolyshkin kolyshkin marked this pull request as draft July 16, 2024 01:48
@ningmingxiao ningmingxiao changed the title draft: fix poststart error draft: fix runc's poststart behaviour doesn't match the runtime-spec Jul 16, 2024
@ningmingxiao
Copy link
Author

ningmingxiao commented Jul 16, 2024

Please write a more detailed commit message. "fix poststart error" is not a good one.

done.thank you. please review my pr. @kolyshkin @cyphar

@ningmingxiao ningmingxiao force-pushed the fix_poststart branch 2 times, most recently from c6c42bc to b766494 Compare July 16, 2024 06:57
@ningmingxiao ningmingxiao changed the title draft: fix runc's poststart behaviour doesn't match the runtime-spec fix runc's poststart behaviour doesn't match the runtime-spec Jul 16, 2024
libcontainer/container_linux.go Outdated Show resolved Hide resolved
libcontainer/container_linux.go Outdated Show resolved Hide resolved
libcontainer/container_linux.go Outdated Show resolved Hide resolved
@lifubang lifubang added this to the 1.1.14 milestone Jul 16, 2024
@lifubang lifubang added the backport/1.1-todo A PR in main branch which needs to be backported to release-1.1 label Jul 16, 2024
@lifubang lifubang modified the milestones: 1.1.14, 1.2.0 Jul 16, 2024
Copy link
Member

@lifubang lifubang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add a test case to avoid the similar error in future.
You can see:
#4323 (comment)

libcontainer/container_linux.go Outdated Show resolved Hide resolved
return err
}
if c.config.Hooks != nil {
if err := c.config.Hooks.Run(configs.Poststart, s); err != nil {
Copy link
Member

@lifubang lifubang Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the runtime-spec said:
The poststart hooks MUST be invoked by the runtime. If any poststart hook fails, the runtime MUST log a warning, but the remaining hooks and lifecycle continue as if the hook had succeeded.

So maybe this is another bug, we should write a new function to run poststart and poststop hooks.
@kolyshkin WDYT

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If yes, please help to open a new issue to track this bug, thanks.

This comment was marked as outdated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the current code logs a warning (like the spec says) but also terminates container's init and returns an error (unlike the spec says).

I'm not sure whether we should change the spec or runc behavior. In theory spec should be left as is, and runc should be changed to follow the spec. In practice, though, changing runc might result in backward incompatibilities and complexities in upper runtime. So yes, we have a choice to make here.

It's complicated 😿

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I researched the history a little bit.

  1. The initial implementation of poststart hooks was done by @mrunalp in 2015 (commit 452e8a7, PR Add poststart hooks #392, merged Nov 2015). If a hook was returning an error, the start was terminated and the error was returned.
  2. The "start must continue even if poststart hook failed" clause was added to runtime-spec by @RenaudWasTaken in Add create-runtime, create-container, start-container hooks runtime-spec#1008 (merged Dec 30 2019).
  3. More hooks were added to runc by @RenaudWasTaken (commit ccdd757, PR Add CreateRuntime, CreateContainer and StartContainer Hooks #2229, merged Jun 2020). The same commit touched the poststart code but didn't change it to not return an error if a hook failed.

In my mind, this is the spec that needs to be changed. The reasons are:

  • initial implementation predates the spec wording by a few years;
  • the wording in the spec was never implemented in runc;
  • returning an error (and stopping the container) seems like a more versatile approach (a hook can choose whether to return an error or not).

Opened opencontainers/runtime-spec#1262

Copy link
Author

@ningmingxiao ningmingxiao Jul 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how to stop container by using SIGTERM ,SIGKILL or both use?

@ningmingxiao
Copy link
Author

already backport #4351

@kolyshkin
Copy link
Contributor

The commit description still doesn't say what it does and why.

Copy link
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR removes the part where the container is killed when a hook fails. This should have been caught by an integration test, alas, it doesn't check that.

Opened #4352

@ningmingxiao ningmingxiao force-pushed the fix_poststart branch 2 times, most recently from d2fac01 to adcf828 Compare July 18, 2024 06:39
@ningmingxiao ningmingxiao force-pushed the fix_poststart branch 5 times, most recently from 5d066a8 to df2c44a Compare July 18, 2024 11:42
Signed-off-by: ningmingxiao <ning.mingxiao@zte.com.cn>
@cyphar
Copy link
Member

cyphar commented Oct 21, 2024

Moved the milestone to 1.3.0. Dealing with this is going to take a while, and we should just get 1.2.0 out (this is a pre-existing issue anyway).

@cyphar cyphar modified the milestones: 1.2.0, 1.3.0 Oct 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.1-todo A PR in main branch which needs to be backported to release-1.1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants