clean ${PAI_WORK_DIR} before mv content to this folder #3695
In our production environment, we saw that when a node goes down and comes back up, a pod running on that node may be restarted by kubelet even though its `restartPolicy` is set to `Never`. If that happens, the init container script fails when executing the `mv` command. To prevent this, we clean `${PAI_WORK_DIR}` before moving content into this folder.
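A minimal sketch of the clean-then-move idea, not the actual init script: `populate_work_dir`, `make_payload`, and the temporary directory layout are hypothetical names used only for illustration. It assumes the failure comes from leftovers of the earlier run still sitting in the destination (for example, `mv` refuses to move a directory onto an existing non-empty directory of the same name).

```shell
#!/bin/bash
set -euo pipefail

# populate_work_dir SRC DEST (hypothetical helper, not from the PR):
# empty DEST, then move everything from SRC into it.
populate_work_dir() {
  local src="$1" dest="$2"
  mkdir -p "$dest"
  # Remove leftovers from a previous run, dotfiles included.
  # ${dest:?} aborts if dest is empty, so this never runs against "/".
  find "${dest:?}" -mindepth 1 -maxdepth 1 -exec rm -rf {} +
  # Move the payload into the now-empty destination.
  find "$src" -mindepth 1 -maxdepth 1 -exec mv {} "$dest"/ \;
}

# Demo in a throwaway directory; make_payload stands in for the fresh
# content that each start of the init container would bring along.
demo_root=$(mktemp -d)
make_payload() {
  rm -rf "$demo_root/src"
  mkdir -p "$demo_root/src/runtime.d"
  echo "echo hello" > "$demo_root/src/runtime.d/run.sh"
}

make_payload
populate_work_dir "$demo_root/src" "$demo_root/work"   # first execution

touch "$demo_root/work/stale.txt"                      # simulated leftover
make_payload
populate_work_dir "$demo_root/src" "$demo_root/work"   # restart: still succeeds
```

Using `find` instead of `rm -rf ${PAI_WORK_DIR}/*` is a design choice of this sketch: it also removes hidden files and does not depend on glob expansion.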
We should make our runtime idempotent, since k8s `restartPolicy=Never` seems to give at-least-once execution, like FC. Please confirm this and refine the other places in the runtime code. At least for the code below, it seems that after creating a container, kubelet does not record the container ID for recovery. So if kubelet is killed just after StartContainer, it will create another container even with `restartPolicy=Never`, because it does not know that it has already created one.
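To illustrate what idempotent means for a script that may run at least once, here is a small sketch. `init_step` and the file names under the temporary directory are hypothetical and do not come from the runtime code; each command is chosen so that a second execution leaves the same state instead of failing.

```shell
#!/bin/bash
set -euo pipefail

# init_step WORK_DIR (hypothetical): safe to run any number of times.
init_step() {
  local work_dir="$1"
  # mkdir -p succeeds if the directory exists; plain mkdir would fail.
  mkdir -p "$work_dir/runtime.d"
  # ln -sfn replaces an existing link; plain ln -s would fail.
  ln -sfn "$work_dir/runtime.d" "$work_dir/current"
  # Overwrite (>) rather than append (>>) so reruns do not accumulate lines.
  printf 'ready\n' > "$work_dir/state"
}

work_dir=$(mktemp -d)
init_step "$work_dir"
init_step "$work_dir"   # simulates kubelet creating the container a second time
```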
src/kube-runtime/src/init
Outdated
@@ -78,6 +79,10 @@ PAI_RUNTIME_DIR=${PAI_WORK_DIR}/runtime.d

PAI_LOG_DIR=${PAI_WORK_DIR}/logs/${FC_POD_UID}

# Clean ${PAI_WORK_DIR} and ${PAI_LOG_DIR} since they may contain last execution content. (rarely happen, but seen in real world)
rm -rf ${PAI_WORK_DIR}/*
rm -rf ${PAI_LOG_DIR}/*
Do you need to clean the log dir? Keeping the previous logs may be helpful. #Resolved
# Move previous logs to another folder. Notice: for init.log, some part will append to previous log file
LOG_FILES=$(find $PAI_LOG_DIR -maxdepth 1 -type f)
if [[ ! -z $LOG_FILES ]]; then
RANDOM=$$
Can we print a warning log here?
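A sketch of how the log-preserving step could look with a warning added. This is not the code from the PR: `archive_previous_logs` and the `previous.XXXXXX` folder name are hypothetical, and `mktemp -d` is swapped in for the `RANDOM`-based naming that the snippet above starts to set up.

```shell
#!/bin/bash
set -euo pipefail

# archive_previous_logs LOG_DIR (hypothetical helper):
# move log files left by an earlier run into a fresh subfolder and warn.
archive_previous_logs() {
  local log_dir="$1"
  local old_logs backup_dir
  # Regular files directly under log_dir are treated as previous-run logs.
  old_logs=$(find "$log_dir" -maxdepth 1 -type f)
  if [[ -n "$old_logs" ]]; then
    # mktemp -d creates a uniquely named folder inside log_dir.
    backup_dir=$(mktemp -d "$log_dir/previous.XXXXXX")
    echo "[WARN] logs from a previous execution found in $log_dir," \
         "moving them to $backup_dir" >&2
    find "$log_dir" -maxdepth 1 -type f -exec mv {} "$backup_dir"/ \;
  fi
}

# Demo: one stale log file, then two invocations.
log_dir=$(mktemp -d)
echo "old run" > "$log_dir/init.log"
archive_previous_logs "$log_dir"   # prints the warning and moves init.log
archive_previous_logs "$log_dir"   # nothing left to move, no warning
```

Writing the warning to stderr keeps it visible in the container output even when stdout is redirected into a log file.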
0931eac to f162631