Statefulsets need a moment to stabilize / not abort skaffold dev during launch #6912

Closed
DGollings opened this issue Nov 26, 2021 · 6 comments
Labels: area/deploy, help wanted, kind/bug, priority/p3

Comments

@DGollings

Expected behavior

skaffold dev starts and tolerates a few transient failures while the workloads stabilize

Actual behavior

Skaffold sees an error and terminates everything

Information

  • Skaffold version: 1.35

Steps to reproduce the behavior

  1. Have a statefulset (I can paste the contents of nats/stan if requested)
  2. skaffold dev
  • statefulset/stan: container stan is backing off waiting to restart
    • pod/stan-0: container stan is backing off waiting to restart

      [stan-0 stan] [1] 2021/11/26 09:11:31.878330 [INF] STREAM: Starting nats-streaming-server[stan] version 0.16.2
      [stan-0 stan] [1] 2021/11/26 09:11:31.878355 [INF] STREAM: ServerID: DDROjri7DXdxYBnHLqvfWF
      [stan-0 stan] [1] 2021/11/26 09:11:31.878356 [INF] STREAM: Go version: go1.11.13
      [stan-0 stan] [1] 2021/11/26 09:11:31.878357 [INF] STREAM: Git commit: [910d6e1]
      [stan-0 stan] [1] 2021/11/26 09:11:31.881090 [INF] STREAM: Shutting down.
      [stan-0 stan] [1] 2021/11/26 09:11:31.881121 [FTL] STREAM: Failed to start: nats: no servers available for connection

  • statefulset/stan failed. Error: container stan is backing off waiting to restart.

Because stan depends on nats, this is perfectly normal behaviour: stan simply dies and tries again, so the failure is 'impossible' to fix on the workload side

Downgrading to a version below 1.35 immediately fixes the issue

Related to #4158, #6205 and in particular #6828

@aaron-prindle added the kind/bug, area/deploy, priority/p2, and triage/discuss labels on Nov 29, 2021
@aaron-prindle
Contributor

aaron-prindle commented Nov 29, 2021

@gsquared94 can you add any information here on whether this is intended behaviour from #6828, and what possible short-term/long-term fixes there might be for this issue?

@tejal29 removed the triage/discuss label on Jan 10, 2022
@tejal29
Contributor

tejal29 commented Jan 10, 2022

@DGollings thanks for the issue. We recently added a status check for StatefulSets.
It looks like the ask is to ignore this failure.
Does skaffold dev exit on the first occurrence of this failure?
If not, have you tried using the statusCheckDeadlineSeconds config field and bumping the value?
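
For reference, a minimal sketch of where this field sits in the configuration. It assumes the field is set under the `deploy` section of skaffold.yaml. The fragment file name and the 600-second value are illustrative only, and a real skaffold.yaml also needs its `apiVersion`, `kind`, and build/deploy settings.

```shell
# Sketch only: write a fragment showing where the field lives.
# "status-check-fragment.yaml" is a hypothetical file name, not a Skaffold convention.
cat > status-check-fragment.yaml <<'EOF'
deploy:
  # Seconds Skaffold waits for deployed resources to stabilize.
  statusCheckDeadlineSeconds: 600
EOF

# Print the fragment so it can be copied into an existing skaffold.yaml.
cat status-check-fragment.yaml
```

Note that, as reported later in this thread, raising the deadline alone did not prevent the immediate exit in 1.35.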

@DGollings
Author

DGollings commented Jan 17, 2022

> Looks like the ask is to ignore this failure.
> Does skaffold dev exit on the first occurrence of this failure?

yes

> If not, have you tried using the statusCheckDeadlineSeconds config field and bumping the value?

It was already set to 600 seconds, but skaffold dev still dies instantly

@aaron-prindle added the priority/p3 label and removed the priority/p2 label on Feb 22, 2022
@tejal29 added the help wanted label on Feb 28, 2022
@tejal29
Contributor

tejal29 commented May 9, 2022

Assigning this to @aaron-prindle. They are looking into it.

@tejal29
Contributor

tejal29 commented Sep 2, 2022

We made a fix for Autopilot clusters, which was released in v2.0.0-beta2. Note: it is not available in Cloud Code.

@ericzzzzzzz
Contributor

This issue can be fixed by adding the --tolerate-failures-until-deadline flag when running skaffold dev; see the implementation in #8047.
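
A minimal sketch of that invocation, assuming a Skaffold release that includes the flag from #8047 is installed and on the PATH. The `SKAFFOLD_FLAGS` variable and the fallback message are illustrative, not part of Skaffold.

```shell
#!/bin/sh
# Sketch: start the dev loop with the flag named in the comment above.
# SKAFFOLD_FLAGS is a hypothetical helper variable used only in this example.
SKAFFOLD_FLAGS="--tolerate-failures-until-deadline"

if command -v skaffold >/dev/null 2>&1; then
  # With the flag set, a restarting pod such as stan-0 should no longer abort the session at once.
  skaffold dev $SKAFFOLD_FLAGS
else
  echo "skaffold not found on PATH; would run: skaffold dev $SKAFFOLD_FLAGS"
fi
```

As the flag name suggests, failures are tolerated only until the status-check deadline, so a workload that never stabilizes should still be reported as failed.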
