This repository has been archived by the owner on May 12, 2021. It is now read-only.

k8s: fix wrong number cpus after killing a container #2251

Merged: 2 commits merged into kata-containers:master from topic/k8s/fixWrongNumberCPUs on Nov 26, 2019

Conversation

@devimc commented Nov 22, 2019

Don't hot add the resources of non-running containers again, to avoid
having extra, useless resources

fixes #2186

NOTE: this PR won't break backward compatibility
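
To make the problem concrete, here is a minimal, self-contained Go sketch of the idea (simplified stand-in types and functions, not the actual kata-runtime code): a killed-but-not-deleted container stays in the sandbox configuration, so a naive resource sum counts it alongside its replacement unless non-running containers are skipped.

```go
// Simplified, hypothetical illustration (not the kata-runtime code): a
// stopped-but-not-deleted container still sits in the sandbox config, so a
// naive resource sum counts it alongside its replacement.
package main

import "fmt"

type containerState string

const (
	stateRunning containerState = "running"
	stateStopped containerState = "stopped"
)

type container struct {
	id    string
	vCPUs uint32
	state containerState
}

// sandboxCPUs sums the vCPUs of the sandbox's containers. With skipStopped
// set, stopped containers are ignored, mirroring the guard added by this PR.
func sandboxCPUs(containers []container, skipStopped bool) uint32 {
	var total uint32
	for _, c := range containers {
		if skipStopped && c.state == stateStopped {
			continue // do not hot add again non-running containers' resources
		}
		total += c.vCPUs
	}
	return total
}

func main() {
	// A killed container stays in the sandbox until delete is called, while
	// k8s creates a replacement container with the same CPU request.
	containers := []container{
		{id: "killed", vCPUs: 2, state: stateStopped},
		{id: "replacement", vCPUs: 2, state: stateRunning},
	}
	fmt.Println("without the guard:", sandboxCPUs(containers, false)) // 4 vCPUs (wrong)
	fmt.Println("with the guard:   ", sandboxCPUs(containers, true))  // 2 vCPUs (expected)
}
```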

@devimc (Author) commented Nov 22, 2019

/test

codecov bot commented Nov 22, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@dd15db3).
The diff coverage is 50%.

@@            Coverage Diff            @@
##             master    #2251   +/-   ##
=========================================
  Coverage          ?   48.85%           
=========================================
  Files             ?      111           
  Lines             ?    16135           
  Branches          ?        0           
=========================================
  Hits              ?     7882           
  Misses            ?     7274           
  Partials          ?      979

devimc pushed a commit to devimc/kata-tests that referenced this pull request Nov 22, 2019
Run the test that checks the number of cpus after killing a
container

Depends-on: github.com/kata-containers/runtime#2251

fixes kata-containers#2116

Signed-off-by: Julio Montes <julio.montes@intel.com>

@jodh-intel (Contributor) left a comment


Thanks @devimc - is there any way you can add a unit test for this?

@@ -1951,6 +1955,12 @@ func (s *Sandbox) updateResources() error {
func (s *Sandbox) calculateSandboxMemory() int64 {
	memorySandbox := int64(0)
	for _, c := range s.config.Containers {
		// Do not hot add again non-running containers resources
		if cont, ok := s.containers[c.ID]; ok && cont.state.State == types.StateStopped {

A Member left a comment:

Then you need to recalculate resources in s.StartContainer(), right?

@devimc (Author) replied:

Good point! Let me do it and run the tests.
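
A minimal sketch of that follow-up, in the same spirit as the simplified model above (again, not the actual kata-runtime code; the types and method names are stand-ins): starting a previously stopped container triggers a recalculation so its resources are counted again.

```go
// Self-contained sketch (simplified model, not kata-runtime code) of the point
// raised above: starting a stopped container must trigger a resource
// recalculation, otherwise its vCPUs stay excluded.
package main

import "fmt"

type container struct {
	vCPUs   uint32
	stopped bool
}

type sandbox struct {
	containers map[string]*container
	vCPUs      uint32 // vCPUs currently hot plugged into the VM
}

// updateResources recomputes the sandbox vCPU count, skipping stopped
// containers (the guard added by this PR).
func (s *sandbox) updateResources() {
	var total uint32
	for _, c := range s.containers {
		if c.stopped {
			continue
		}
		total += c.vCPUs
	}
	s.vCPUs = total
}

// startContainer marks a container running again and recalculates resources,
// which is the counterpart of the guard suggested in this review comment.
func (s *sandbox) startContainer(id string) {
	if c, ok := s.containers[id]; ok {
		c.stopped = false
		s.updateResources()
	}
}

func main() {
	s := &sandbox{containers: map[string]*container{
		"killed":      {vCPUs: 2, stopped: true},
		"replacement": {vCPUs: 2},
	}}
	s.updateResources()
	fmt.Println("before restart:", s.vCPUs, "vCPUs") // 2
	s.startContainer("killed")
	fmt.Println("after restart: ", s.vCPUs, "vCPUs") // 4
}
```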

@lifupan (Member) commented Nov 25, 2019

Hi @devimc, I have a question about the issue this PR fixes: once the container process is killed, containerd/cri gets a task exit event and tries to delete the container from https://github.com/containerd/cri/blob/master/pkg/server/events.go#L320, which triggers kata shimv2 to delete the container from the sandbox. But from your fix, it seems the killed container still exists in the sandbox. I'm puzzled about why the stopped container wasn't deleted from the sandbox, or am I missing something?

@devimc (Author) commented Nov 25, 2019

@jodh-intel there is already an integration test - https://github.com/kata-containers/tests/blob/master/integration/kubernetes/k8s-number-cpus.bats

@lifupan yes, you are right: the container status is stopped but delete is never called; instead k8s creates a new container. Something weird: if this new container is killed, its status changes to stopped and delete is called. I guess because k8s doesn't want to delete the original containers? 🤔

@devimc added the do-not-merge (PR has problems or depends on another) label Nov 25, 2019
Julio Montes added 2 commits November 25, 2019 18:42
The status of the container should be known prior to calculating the
number of CPUs and the amount of memory

Signed-off-by: Julio Montes <julio.montes@intel.com>
Don't hot add the resources of non-running containers again, to avoid
having extra, useless resources

fixes kata-containers#2186

Signed-off-by: Julio Montes <julio.montes@intel.com>
@devimc force-pushed the topic/k8s/fixWrongNumberCPUs branch from dbfae48 to b7731e9 on November 25, 2019 18:42
@devimc (Author) commented Nov 25, 2019

@bergwolf change applied, thanks

/test

@devimc removed the do-not-merge (PR has problems or depends on another) label Nov 25, 2019
@lifupan (Member) commented Nov 26, 2019

> @lifupan yes, you are right: the container status is stopped but delete is never called; instead k8s creates a new container. Something weird: if this new container is killed, its status changes to stopped and delete is called. I guess because k8s doesn't want to delete the original containers? 🤔

Actually, k8s isn't involved in keeping or deleting the stopped container in the runtime; that's containerd/cri's responsibility. As you said, it's weird; maybe there is another bug embedded somewhere.

@devimc (Author) commented Nov 26, 2019

can we merge this?

@amshinde (Member) commented:

Did you test this with crio or containerd? Do you see the same behaviour with both? It would be good to track down any cri error.

@amshinde merged commit d054556 into kata-containers:master Nov 26, 2019
@devimc (Author) commented Nov 26, 2019

@amshinde no, I just tested with crio

@amshinde (Member) commented Dec 2, 2019

@devimc Looks like a candidate for backporting. Can you open a PR to do so?

@devimc (Author) commented Dec 3, 2019

@amshinde done #2313
