fix the race in case sandbox start failed during pod creating #600
Conversation
Signed-off-by: Wang Xu <gnawux@gmail.com>
And this should be part of #462
retest this please, hykins
safe quit from #557 init error:
@Crazykev could you also review it pls?
@laijs Cool, glad to help.
daemon/pod/provision.go
Outdated
p.Log(ERROR, "init chan broken")
return false, nil
}
p.initChan <- res
I'm not sure I understand why we need to send this result back here?
In case there are several operations waiting for the pod to run, only one of them can get the notification from the chan, so it sends the result back to let the others get the event as well.
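To make the pattern concrete, here is a minimal standalone sketch, not the actual hyperd code (all names are hypothetical): each waiter re-sends the one buffered result so every waiter observes it, even though a channel send wakes only a single receiver.

```go
package main

import (
	"fmt"
	"sync"
)

// waitInit sketches the rebroadcast pattern: a send on a channel wakes
// only one receiver, so each waiter re-sends the result it received
// for the next waiter to pick up.
func waitInit(initChan chan error) error {
	res := <-initChan // only one waiter is woken by the original send
	initChan <- res   // re-send so the remaining waiters see the event too
	return res
}

func main() {
	// Capacity 1 so the re-send never blocks, even with no waiter left.
	initChan := make(chan error, 1)

	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			fmt.Printf("waiter %d got: %v\n", id, waitInit(initChan))
		}(i)
	}

	initChan <- nil // the start path publishes its result once
	wg.Wait()
}
```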
OK, that should be fine. I was worried about whether we need to send this during restore.
I don't think so. But I am wondering whether I should change this to Cond/Wait here....
LGTM. @laijs Do you want to have another look?
Wait, let me check whether I could use Cond/Wait instead of a chan, which would make this easier to understand.
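For comparison, a minimal sketch of what a Cond/Wait version could look like; the types and names below are stand-ins, not the real XPod. Broadcast wakes every waiter at once, so no result has to be re-sent the way it does with a channel.

```go
package podsketch

import "sync"

// podState is a stand-in for the pod's status-holding struct.
type podState struct {
	mu     sync.Mutex
	cond   *sync.Cond
	status int
}

func newPodState() *podState {
	p := &podState{}
	p.cond = sync.NewCond(&p.mu)
	return p
}

// setStatus publishes a new status and wakes all waiters at once.
func (p *podState) setStatus(s int) {
	p.mu.Lock()
	p.status = s
	p.mu.Unlock()
	p.cond.Broadcast()
}

// waitStatus blocks until the pod reaches the wanted status.
func (p *podState) waitStatus(want int) {
	p.mu.Lock()
	for p.status != want {
		p.cond.Wait() // releases mu while blocked, reacquires on wakeup
	}
	p.mu.Unlock()
}
```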
OK, I rechecked the current implementation: if we start a running pod, there will be an ugly error.
Could you also fix this within this patch?
@Crazykev do you mean just return success if the pod is running?
maybe need get current IPs for save?
Signed-off-by: Wang Xu <gnawux@gmail.com>
Returning success or reporting that the pod is already running is fine; just don't try to add containers to that pod again.
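A sketch of the suggested guard under stated assumptions: IsRunning() is a hypothetical helper by analogy with the IsStopped() used later in this diff, and the constants are stand-ins so the snippet compiles on its own.

```go
package podsketch

import (
	"errors"
	"fmt"
)

const (
	S_POD_STOPPED = iota
	S_POD_STARTING
	S_POD_RUNNING
)

var ErrPodAlreadyRunning = errors.New("pod is already running")

// XPod here is a minimal stand-in; the real one lives in daemon/pod.
type XPod struct {
	id     string
	status int
}

func (p *XPod) IsStopped() bool { return p.status == S_POD_STOPPED }
func (p *XPod) IsRunning() bool { return p.status == S_POD_RUNNING }

// Start sketches the suggested behavior: a running pod is not an error
// path that re-adds containers; it either succeeds as a no-op or reports
// that the pod is already running.
func (p *XPod) Start() error {
	if p.IsRunning() {
		return nil // or: return ErrPodAlreadyRunning
	}
	if !p.IsStopped() {
		return fmt.Errorf("pod %s is in status %d, cannot start", p.id, p.status)
	}
	// ... normal start path: create sandbox, start containers ...
	return nil
}
```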
updated
daemon/pod/container.go
Outdated
-if cs.State != S_CONTAINER_CREATED {
+if cs.State == S_CONTAINER_RUNNING {
+	return errors.ErrContainerAlreadyRunning
+} else if cs.State != S_CONTAINER_CREATED {
 	return fmt.Errorf("only CREATING container could be set to creatd, current: %d", cs.State)
Seems this is a wrong error message; could you also fix it here?
I think this should be: only CREATED container could be set to RUNNING
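For illustration, a sketch of the corrected check with the message fixed as suggested (stand-in constants and function name, not the actual patch):

```go
package podsketch

import (
	"errors"
	"fmt"
)

const (
	S_CONTAINER_CREATING = iota
	S_CONTAINER_CREATED
	S_CONTAINER_RUNNING
)

var ErrContainerAlreadyRunning = errors.New("container is already running")

// checkSetRunning sketches the transition check with the corrected
// message: only a CREATED container may move to RUNNING.
func checkSetRunning(state int) error {
	if state == S_CONTAINER_RUNNING {
		return ErrContainerAlreadyRunning
	}
	if state != S_CONTAINER_CREATED {
		return fmt.Errorf("only CREATED container could be set to RUNNING, current: %d", state)
	}
	return nil
}
```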
hmm.... I didn't change this line, but it looks like you are right....
daemon/pod/provision.go
Outdated
func (p *XPod) waitPodRun(activity string) error {
	p.statusLock.RLock()
	for {
		if p.status == S_POD_RUNNING || p.statusLock == S_POD_PAUSED {
p.statusLock == S_POD_PAUSED
-> p.status == S_POD_PAUSED
daemon/pod/provision.go
Outdated
			p.Log(DEBUG, "pod is running, proceed %s", activity)
			return nil
		}
		if p.statusLock != S_POD_STARTING {
p.statusLock != S_POD_STARTING
-> p.status != S_POD_STARTING
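Putting the two corrections together, the fixed loop would read roughly like the sketch below. The Cond-based wakeup between rechecks is an assumption prompted by the Cond/Wait discussion above; the stand-in types exist only so the snippet compiles.

```go
package podsketch

import (
	"fmt"
	"sync"
)

const (
	S_POD_STARTING = iota
	S_POD_RUNNING
	S_POD_PAUSED
)

// XPod is a stand-in; the Cond tied to the read lock is an assumed
// wakeup mechanism, not necessarily what the real code uses.
type XPod struct {
	statusLock    sync.RWMutex
	status        int
	statusChanged *sync.Cond
}

func newXPod() *XPod {
	p := &XPod{status: S_POD_STARTING}
	p.statusChanged = sync.NewCond(p.statusLock.RLocker())
	return p
}

// waitPodRun with both typos fixed: every comparison is against
// p.status, not p.statusLock.
func (p *XPod) waitPodRun(activity string) error {
	p.statusLock.RLock()
	defer p.statusLock.RUnlock()
	for {
		if p.status == S_POD_RUNNING || p.status == S_POD_PAUSED {
			return nil // pod is running, proceed with the activity
		}
		if p.status != S_POD_STARTING {
			return fmt.Errorf("pod is in status %d, cannot %s", p.status, activity)
		}
		p.statusChanged.Wait() // sleep until the status changes, then recheck
	}
}
```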
@@ -211,7 +214,7 @@ func (p *XPod) ContainerStart(cid string) error {
// Start() means start a STOPPED pod.
func (p *XPod) Start() error {

-	if p.status == S_POD_STOPPED {
+	if p.IsStopped() {
I remember there is a S_POD_ERROR state, should we filter it here?
Let's leave this status to a later issue/PR.
In principle, if a pod failed but could be restarted, it should be set back to STOPPED status. Only those that hit a fatal exception and cannot start any more should stay in ERROR status.
However, sometimes the daemon cannot judge the situation accurately, and I think it might be better if we allowed a user to manually set the status back from ERROR to STOPPED.
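Purely as an illustration of that manual recovery idea (hypothetical, not part of this patch), the reset could be as simple as:

```go
package podsketch

import "fmt"

const (
	S_POD_STOPPED = iota
	S_POD_ERROR
)

// resetError sketches the suggested manual recovery: a user explicitly
// moves a pod from ERROR back to STOPPED so it may be started again.
func resetError(status int) (int, error) {
	if status != S_POD_ERROR {
		return status, fmt.Errorf("pod is not in ERROR status, current: %d", status)
	}
	return S_POD_STOPPED, nil
}
```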
Yep, more complicated than I thought.
Force-pushed from f48b73c to 4c4585d
@laijs would you like to look at this test result: http://ci.hypercontainer.io:8080/job/hyperd-auto/338/console ? It looks like runv can't get the hyperstart version.
@gnawux Changes LGTM, although there is still a little flaw in that error message; we can fix it later. This patch is more important.
.... one line needs two fixes...
Signed-off-by: Wang Xu <gnawux@gmail.com>
if sandbox failed, the cleanup and pod provision may race
Signed-off-by: Wang Xu <gnawux@gmail.com>
updated again
CI passed, can I merge this?
Just do it
For #598