the states in PodGroup is not accurate #578

Closed
zwpaper opened this issue Apr 15, 2023 · 20 comments

Labels
kind/bug Categorizes issue or PR as related to a bug.
@zwpaper
Member

zwpaper commented Apr 15, 2023

Area

  • Scheduler
  • Controller
  • Helm Chart
  • Documents

Other components

No response

What happened?

After migrating to controller-runtime, the podGroup.status.scheduled count is no longer updated by PostBind, and the phase transitions do not seem to work as expected.

What did you expect to happen?

The PodGroup should reflect pod state changes and update PodGroup.status.phase accordingly.

However, it is currently inaccurate and sometimes wrong; we need to discuss the expected state transitions and make them always correct.

Let's discuss the state flow before working on it.

Currently, we have the following states:

  • Pending: the pod group has been accepted by the system
  • Running: minMember pods of the pod group are in the running phase
  • PreScheduling: all pods of the pod group have been enqueued and are waiting to be scheduled
  • Scheduling: some pods of the pod group have been scheduled and are running, but minMember is not yet met
  • Scheduled: minMember pods have been scheduled and are in the running phase. @Huang-Wei, is this right? It seems to duplicate Running
  • Unknown: some pods have been scheduled and others have not
  • Finished: minMember pods have finished successfully
  • Failed: at least one pod has failed

// PodGroupPending means the pod group has been accepted by the system, but scheduler can not allocate
// enough resources to it.
PodGroupPending PodGroupPhase = "Pending"
// PodGroupRunning means the `spec.minMember` pods of the pod group are in running phase.
PodGroupRunning PodGroupPhase = "Running"
// PodGroupPreScheduling means all pods of the pod group have enqueued and are waiting to be scheduled.
PodGroupPreScheduling PodGroupPhase = "PreScheduling"
// PodGroupScheduling means partial pods of the pod group have been scheduled and are in running phase
// but the number of running pods has not reached the `spec.minMember` pods of PodGroups.
PodGroupScheduling PodGroupPhase = "Scheduling"
// PodGroupScheduled means the `spec.minMember` pods of the pod group have been scheduled and are in running phase.
PodGroupScheduled PodGroupPhase = "Scheduled"
// PodGroupUnknown means a part of `spec.minMember` pods of the pod group have been scheduled but the others can not
// be scheduled due to, e.g. not enough resource; scheduler will wait for related controllers to recover them.
PodGroupUnknown PodGroupPhase = "Unknown"
// PodGroupFinished means the `spec.minMember` pods of the pod group are successfully finished.
PodGroupFinished PodGroupPhase = "Finished"
// PodGroupFailed means at least one of `spec.minMember` pods have failed.
PodGroupFailed PodGroupPhase = "Failed"

Please note: the following diagram only reflects my understanding of the phases defined in the code; it may contain misunderstandings and is open for discussion.

stateDiagram-v2
	state if_minMember <<choice>>
    [*] --> Pending
    Pending --> PreScheduling: pods added
    PreScheduling --> Scheduling: some of the pods scheduled
    Scheduling --> Scheduled: minMember pods scheduled, but not running
    Scheduled --> Running: minMember pods scheduled and running
    Running --> Failed: at least one of the pods failed
    Failed --> if_minMember: failed fixed
    if_minMember --> Scheduling: minMember does not meet
    if_minMember --> Scheduled: minMember meet
    Running --> Finished: all pods successfully finished
    Finished --> [*]

How can we reproduce it (as minimally and precisely as possible)?

  1. Create a podGroup with minMember 3
  2. Create 3 pods in the podGroup
  3. Change one of the pods to make it unschedulable
  4. Observe that the phase does not transition as expected and the scheduled count is not correct

Anything else we need to know?

No response

Kubernetes version

$ kubectl version
1.25.7

Scheduler Plugins version

0.25.7
@zwpaper added the kind/bug label Apr 15, 2023
@Gekko0114
Member

Thank you for the summary. It's very clear. Here are my opinions:

  1. Since Running and Scheduled are redundant, we can remove Scheduled.
  2. Let's define the following statuses:

Pending: the pod group has been accepted by the system
Running: minMember pods of the pod group are in the running phase
PreScheduling: all pods of the pod group have been enqueued and are waiting to be scheduled
Scheduling: some pods of the pod group have been scheduled and are running, but minMember is not yet met
Unknown: some pods have been scheduled and others have not
Finished: minMember pods have finished successfully
Failed: at least one pod has failed

  3. With the above definitions, PodGroupStatus.Scheduled becomes unnecessary and can be removed.

@zwpaper
Member Author

zwpaper commented Apr 16, 2023

  1. Since Running and Scheduled are redundant, we can remove Scheduled.

Scheduled could mean scheduled but not yet running? That may have been the original design intention.

With the above definitions, PodGroupStatus.Scheduled becomes unnecessary and can be removed

If we keep the Scheduled phase, then status.scheduled would also need to be kept.

The main question may be whether there is a phase where pods are scheduled but not yet running, and whether we should expose that phase to users.

For example, pods that are scheduled but stuck in ContainerCreating or some other status.
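
For illustration, here is a minimal sketch (a hypothetical helper, not existing project code) of how "scheduled but not running" pods could be counted from pod status:

package sketch

import v1 "k8s.io/api/core/v1"

// countScheduledNotRunning is a hypothetical helper: it counts pods that have
// been assigned to a node (i.e. scheduled) but have not reached the Running
// phase yet, e.g. pods stuck in ContainerCreating.
func countScheduledNotRunning(pods []v1.Pod) int {
	n := 0
	for _, p := range pods {
		if p.Spec.NodeName != "" && p.Status.Phase != v1.PodRunning {
			n++
		}
	}
	return n
}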

@Gekko0114
Member

Hi @Huang-Wei,
I would like to hear your suggestions regarding this issue. Would it be possible for you to comment at your convenience?

@Huang-Wei
Contributor

I may delegate the review work to @denkensk as he was the original author. My 2 cents about the status revamp work:

  • if some status is transient, there is no point keeping it
  • if some status is nuanced to calculate while exposing only trivial information to the end user, we should eliminate it as well.

/assign @denkensk as primary reviewer.

@zwpaper
Member Author

zwpaper commented May 2, 2023

/assign @denkensk

Please take a look when you are free. The main points we were discussing are:

  1. Is the status flow we figured out from the code correct?
  2. Is it still suitable after we migrated to controller-runtime?
  3. If we need to change it, what is your idea?

@Gekko0114
Member

Hi @denkensk,
I would like to hear your suggestions regarding this issue. Could you comment at your convenience?

@denkensk
Member

After migrating to controller-runtime, the podGroup.status.scheduled count is no longer updated by PostBind, and the phase transitions do not seem to work as expected.

Can you describe why it doesn't work as expected in detail to help me understand this issue? @Gekko0114

@denkensk
Member

With the above definitions, PodGroupStatus.Scheduled becomes unnecessary and can be removed.

+1

Also, can we discuss whether the following two statuses can be removed as well?

  • Scheduling
  • PreScheduling

@Gekko0114
Member

Gekko0114 commented May 11, 2023

Can you describe why it doesn't work as expected in detail to help me understand this issue?

@denkensk
Sure.
The transition of pg.Status.Phase to Scheduled is driven by checking the pg.Status.Scheduled count, as in this code.

However, pg.Status.Scheduled was only updated in the PostBind logic.

That's why this problem occurred.
I've provided more detail here.
If you have any questions, please ask me.
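
To make the coupling concrete, here is a simplified, self-contained sketch; the type and field names only loosely mirror the real API and are illustrative, not the project's actual code:

package sketch

// PodGroupPhase mirrors the phase string type used by the PodGroup API.
type PodGroupPhase string

const (
	PodGroupPending   PodGroupPhase = "Pending"
	PodGroupScheduled PodGroupPhase = "Scheduled"
)

// PodGroup is a stripped-down stand-in for the real object.
type PodGroup struct {
	MinMember int32         // spec.minMember
	Scheduled int32         // status.scheduled, only bumped in PostBind
	Phase     PodGroupPhase // status.phase
}

// Controller side: the phase is only promoted once status.scheduled reaches
// spec.minMember.
func nextPhase(pg PodGroup) PodGroupPhase {
	if pg.Scheduled >= pg.MinMember {
		return PodGroupScheduled
	}
	return pg.Phase
}

// Scheduler side: status.scheduled was only incremented in PostBind, so if that
// update is lost (as after the controller-runtime migration), nextPhase never
// promotes the group.
func onPostBind(pg *PodGroup) {
	pg.Scheduled++
}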

@Gekko0114
Member

Also, can we discuss whether the following two statuses can be removed as well?

Scheduling
PreScheduling

I agree with the idea of removing the Scheduling phase.
In the current implementation, pg.Status.Phase would never transition to the Scheduling phase, so it can be removed.

However, PreScheduling is used when the number of pods is bigger than minMember, so it might be useful to keep it.

pgCopy.Status.Phase = schedv1alpha1.PodGroupPreScheduling
fillOccupiedObj(pgCopy, &pods[0])

@denkensk , @zwpaper
What do you think?

@zwpaper
Member Author

zwpaper commented May 15, 2023

Hi @Gekko0114, removing the Scheduling phase sounds good to me, but it would seem odd to keep PreScheduling without having a Scheduling phase.

So how about we keep the Scheduling name but let it work like PreScheduling?

In conclusion, the status flow would look like this:

stateDiagram-v2
	state if_minMember <<choice>>
    [*] --> Pending
    Pending --> Scheduling: pods added
    Scheduling --> Running: minMember pods scheduled
    Running --> Failed: at least one of the pods failed
    Failed --> if_minMember: failed fixed
    if_minMember --> Scheduling: minMember does not meet
    if_minMember --> Running: minMember meet
    Running --> Finished: all pods successfully finished
    Finished --> [*]

@denkensk
Member

In the current implementation, pg.Status.Phase would never transition to the Scheduling phase, so it can be removed.

Thanks for your explanation @Gekko0114

@denkensk
Member

OK. With @zwpaper's suggestion, we will have 6 phases. It looks good to me, and it will be clean. WDYT? @Gekko0114 @Huang-Wei

  • Pending: pod group is created and accepted by the system
  • Scheduling: the number of pods is bigger than minMember
  • Running
  • Failed
  • Finished
  • Unknown
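
For reference, a hypothetical sketch of what the reduced phase set could look like as constants; the comments paraphrase the definitions discussed in this thread and are not the final implementation:

package sketch

// PodGroupPhase is the phase string type of the PodGroup status.
type PodGroupPhase string

const (
	// PodGroupPending means the pod group has been created and accepted by the system.
	PodGroupPending PodGroupPhase = "Pending"
	// PodGroupScheduling means pods have been created for the group but fewer than
	// minMember of them are running yet (per the state flow above).
	PodGroupScheduling PodGroupPhase = "Scheduling"
	// PodGroupRunning means at least minMember pods of the group are running.
	PodGroupRunning PodGroupPhase = "Running"
	// PodGroupFailed means at least one pod of the group has failed.
	PodGroupFailed PodGroupPhase = "Failed"
	// PodGroupFinished means the pods of the group have finished successfully.
	PodGroupFinished PodGroupPhase = "Finished"
	// PodGroupUnknown means the state of the pod group cannot be determined.
	PodGroupUnknown PodGroupPhase = "Unknown"
)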

@Gekko0114
Member

@zwpaper, @denkensk
Sure, I agree with you. Thanks for clarifying the discussion. I will implement it.

@Gekko0114
Member

Updated the PR: #574

@Gekko0114
Member

/close

@k8s-ci-robot
Contributor

@Gekko0114: You can't close an active issue/PR unless you authored it or you are a collaborator.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Gekko0114
Member

We can close it since I completed the PR.

@zwpaper
Member Author

zwpaper commented Jul 1, 2023

thanks @Gekko0114

/close

@k8s-ci-robot
Contributor

@zwpaper: Closing this issue.

In response to this:

thanks @Gekko0114

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
