Can not sync job status correctly when upgrading from v1.5 #3640

Open
yuyue9284 opened this issue Jul 30, 2024 · 4 comments · May be fixed by #3652
Labels: good first issue, help wanted, kind/bug

Comments

@yuyue9284

What happened:

When upgrading Volcano from v1.5 to a later version, jobs that are already running and were created by Volcano v1.5 are not handled correctly.

Anything else we need to know?:

v1.5 changed the naming logic of pod groups by adding the UID to the name (#2140), and there is another fix that handles already-created pod groups without the UID in create or update (#2400). But a similar fix does not exist in the syncJob function.

So in the following part of syncJob, the lookup by the UID-suffixed name finds no pod group for such a job, and syncTask is never set to true.

var syncTask bool
pgName := job.Name + "-" + string(job.UID)
if pg, _ := cc.pgLister.PodGroups(job.Namespace).Get(pgName); pg != nil {
	if pg.Status.Phase != "" && pg.Status.Phase != scheduling.PodGroupPending {
		syncTask = true
	}
	for _, condition := range pg.Status.Conditions {
		if condition.Type == scheduling.PodGroupUnschedulableType {
			cc.recorder.Eventf(job, v1.EventTypeWarning, string(batch.PodGroupPending),
				fmt.Sprintf("PodGroup %s:%s unschedule,reason: %s", job.Namespace, job.Name, condition.Message))
		}
	}
}
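
One possible direction, shown only as a minimal sketch and not as the actual fix: look up the UID-suffixed name first, then fall back to the name without the UID. The sketch assumes the legacy pod group name is the bare job name. The types `podGroup` and `pgStore` and the helpers `findPodGroup` and `shouldSyncTask` are hypothetical stand-ins for illustration and are not part of Volcano.

```go
package main

import "fmt"

// podGroup is a simplified stand-in for Volcano's PodGroup type; only the
// fields needed for this sketch are modeled.
type podGroup struct {
	Name  string
	Phase string
}

// podGroupPending stands in for scheduling.PodGroupPending.
const podGroupPending = "Pending"

// pgStore is a hypothetical in-memory replacement for cc.pgLister,
// keyed by pod group name within a single namespace.
type pgStore map[string]*podGroup

// findPodGroup looks up the UID-suffixed name first and falls back to the
// bare job name, which is assumed here to be the legacy naming scheme.
func findPodGroup(store pgStore, jobName, jobUID string) *podGroup {
	if pg, ok := store[jobName+"-"+jobUID]; ok {
		return pg
	}
	return store[jobName] // nil when neither name exists
}

// shouldSyncTask mirrors the phase check from the snippet above.
func shouldSyncTask(pg *podGroup) bool {
	return pg != nil && pg.Phase != "" && pg.Phase != podGroupPending
}

func main() {
	// A pod group that exists only under the name without the UID.
	store := pgStore{"job-a": {Name: "job-a", Phase: "Running"}}

	byUIDOnly := store["job-a-1234"]
	withFallback := findPodGroup(store, "job-a", "1234")

	fmt.Println(shouldSyncTask(byUIDOnly))    // false
	fmt.Println(shouldSyncTask(withFallback)) // true
}
```

With the UID-only lookup the existing pod group is never found, so the check stays false. With the fallback the same pod group is found and the check passes.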

@yuyue9284 yuyue9284 added the kind/bug Categorizes issue or PR as related to a bug. label Jul 30, 2024
@Monokaix
Member

Yeah, you're right, we haven't considered the old pg format in syncJob :)

@Monokaix
Member

/good-first-issue

@volcano-sh-bot
Contributor

@Monokaix:
This request has been marked as suitable for new contributors.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-good-first-issue command.

In response to this:

/good-first-issue

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@volcano-sh-bot volcano-sh-bot added good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. labels Jul 30, 2024
@QingyaFan

/assign

QingyaFan added a commit to QingyaFan/volcano that referenced this issue Aug 5, 2024
v1.5 changed the naming logics of pod group by adding UID into the name: volcano-sh#2140, and there is also another fix regarding handling the already created pod group without UID in create or update: volcano-sh#2400. But a similar fix does not exist in the syncJob function.

Fixes volcano-sh#3640
QingyaFan added a commit to QingyaFan/volcano that referenced this issue Aug 5, 2024
v1.5 changed the naming logics of pod group by adding UID into the name,
and there is a fix handling the already created pod group without UID in
create or update: volcano-sh#2400. But similar fix does not exist in syncJob function.

Fixes volcano-sh#3640
QingyaFan added a commit to QingyaFan/volcano that referenced this issue Aug 5, 2024
v1.5 changed the naming logics of pod group by adding UID into the name,
and there is a fix handling the already created pod group without UID in
create or update: volcano-sh#2400. But similar fix does not exist in syncJob function.

Fixes volcano-sh#3640

Signed-off-by: cheerfun <qingyafan@outlook.com>
QingyaFan added a commit to QingyaFan/volcano that referenced this issue Aug 6, 2024
v1.5 changed the naming logics of pod group by adding UID into the name,
and there is a fix handling the already created pod group without UID in
create or update: volcano-sh#2400. But similar fix does not exist in syncJob function.

Fixes volcano-sh#3640

Signed-off-by: cheerfun <qingyafan@outlook.com>
QingyaFan added a commit to QingyaFan/volcano that referenced this issue Aug 8, 2024
v1.5 changed the naming logics of pod group by adding UID into the name,
and there is a fix handling the already created pod group without UID in
create or update. But similar fix does not exist in syncJob function.

Fixes volcano-sh#3640

Signed-off-by: cheerfun <qingyafan@outlook.com>
QingyaFan added a commit to QingyaFan/volcano that referenced this issue Aug 8, 2024
…#3640

v1.5 changed the naming logics of pod group by adding UID into the
name: volcano-sh#2140, and there is also another fix regarding handling the
already created pod group without UID in create or update: volcano-sh#2400.
But a similar fix does not exist in the syncJob function.

Fixes volcano-sh#3640
QingyaFan added a commit to QingyaFan/volcano that referenced this issue Aug 8, 2024
…#3640

v1.5 changed the naming logics of pod group by adding UID into the
name: volcano-sh#2140, and there is also another fix regarding handling the
already created pod group without UID in create or update: volcano-sh#2400.
But a similar fix does not exist in the syncJob function.

Fixes volcano-sh#3640

Signed-off-by: cheerfun <qingyafan@outlook.com>
QingyaFan added a commit to QingyaFan/volcano that referenced this issue Aug 8, 2024
…#3640

v1.5 changed the naming logics of pod group by adding UID into the
name: volcano-sh#2140, syncJob function should change some logic.

Fixes volcano-sh#3640

Signed-off-by: cheerfun <qingyafan@outlook.com>
QingyaFan added a commit to QingyaFan/volcano that referenced this issue Aug 8, 2024
…#3640

v1.5 changed the naming logics of pod group by adding UID into the
name: volcano-sh#2140, syncJob function should change some logic.

Signed-off-by: cheerfun <qingyafan@outlook.com>