
Skaffold apply deployments never stabilize on Skaffold 1.35 #6870

Closed
edwardthiele opened this issue Nov 17, 2021 · 4 comments · Fixed by #6875
Assignees: gsquared94
Labels: area/status-check, kind/bug, priority/p0

Comments

@edwardthiele

Using the manifest file "manifest.yaml":

apiVersion: v1
kind: Pod
metadata:
  name: getting-started
spec:
  containers:
  - image: k8s.gcr.io/echoserver:1.4
    name: echoserver

with the command:

/usr/local/bin/skaffold apply 'manifest.yaml'

skaffold apply 'manifest.yaml'
Starting deploy...

  • pod/getting-started configured
    Waiting for deployments to stabilize...
  • pods: could not stabilize within 10m0s
  • pods failed. Error: could not stabilize within 10m0s.

The actual deployment appears to work as expected

Expected behavior

The deployment reports success

Actual behavior

The command times out

Information

  • Skaffold version: 1.35
  • Operating system: Linux
  • Installed via: skaffold.dev
@gsquared94 gsquared94 self-assigned this Nov 17, 2021
@gsquared94 gsquared94 added the kind/bug, area/status-check, needs-reproduction, and priority/p0 labels and removed the needs-reproduction label on Nov 17, 2021
@briandealwis
Member

I've been able to reproduce this by deploying to a GKE Autopilot cluster. Digging into the issue, I found that we call into checkStandalonePodsStatus with len(r.resources) == 0, so we cycle:

func (r *Resource) checkStandalonePodsStatus(ctx context.Context, cfg kubectl.Config) *proto.ActionableErr {
    if len(r.resources) == 0 {
        return &proto.ActionableErr{ErrCode: proto.StatusCode_STATUSCHECK_STANDALONE_PODS_PENDING}
    }

DEBU[0015] about to call CheckStatus on pods             subtask=-1 task=DevLoop
DEBU[0015] checkStandalonePodsStatus                     subtask=-1 task=DevLoop
DEBU[0015] update status: errCode:STATUSCHECK_STANDALONE_PODS_PENDING  subtask=-1 task=DevLoop
DEBU[0015] finished call to CheckStatus on pods          subtask=-1 task=DevLoop
DEBU[0016] about to call CheckStatus on pods             subtask=-1 task=DevLoop
DEBU[0016] checkStandalonePodsStatus                     subtask=-1 task=DevLoop
DEBU[0016] update status: errCode:STATUSCHECK_STANDALONE_PODS_PENDING  subtask=-1 task=DevLoop
DEBU[0016] finished call to CheckStatus on pods          subtask=-1 task=DevLoop
DEBU[0017] about to call CheckStatus on pods             subtask=-1 task=DevLoop
DEBU[0017] checkStandalonePodsStatus                     subtask=-1 task=DevLoop
DEBU[0017] update status: errCode:STATUSCHECK_STANDALONE_PODS_PENDING  subtask=-1 task=DevLoop
DEBU[0017] finished call to CheckStatus on pods          subtask=-1 task=DevLoop
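For context on why this surfaces as a 10-minute timeout: the status check keeps re-polling as long as the resource reports a pending status, and with len(r.resources) == 0 the check above returns STATUSCHECK_STANDALONE_PODS_PENDING on every iteration. A rough sketch of that polling shape, with hypothetical names and a simplified loop rather than Skaffold's actual checker:

package statuscheck

import (
    "context"
    "fmt"
    "time"
)

// waitForStable is a simplified, hypothetical sketch of the polling loop
// described above (not Skaffold's real implementation): it re-runs the check
// every second and only exits when the status is no longer pending or the
// deadline expires.
func waitForStable(ctx context.Context, check func() string, deadline time.Duration) error {
    ctx, cancel := context.WithTimeout(ctx, deadline)
    defer cancel()
    ticker := time.NewTicker(time.Second)
    defer ticker.Stop()
    for {
        select {
        case <-ctx.Done():
            // The "pods: could not stabilize within 10m0s" path from the report.
            return fmt.Errorf("could not stabilize within %v", deadline)
        case <-ticker.C:
            // With len(r.resources) == 0, checkStandalonePodsStatus always
            // returns STATUSCHECK_STANDALONE_PODS_PENDING, so this never succeeds.
            if check() != "STATUSCHECK_STANDALONE_PODS_PENDING" {
                return nil
            }
        }
    }
}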

@briandealwis
Member

briandealwis commented Nov 17, 2021

Steps to reproduce:

  1. Use an Autopilot cluster
  2. cd examples/getting-started
  3. copy the manifest above into echo.yaml
  4. skaffold apply -vdebug echo.yaml

@gsquared94
Contributor

I can actually consistently repro this locally against a kind cluster. I was able to debug it down to a skaffold apply issue: it doesn't set namespaces correctly. For instance, running skaffold dev on Edward's manifest.yaml file sets the namespace "default" in the status.Monitor.

But running skaffold apply manifest.yaml sets the namespace to "", which gets filtered out here, so the status check just hangs.
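To illustrate the failure mode being described (hypothetical names, not the actual Skaffold filter referenced above): the monitor only tracks resources in known namespaces, so a resource recorded with an empty namespace is dropped before it is ever checked.

package statuscheck

// filterByNamespace is a hypothetical sketch of the filtering described above,
// not Skaffold's actual code: only resources in an allowed namespace are kept
// for monitoring.
func filterByNamespace(resources map[string]string, allowed []string) map[string]string {
    allow := make(map[string]bool, len(allowed))
    for _, ns := range allowed {
        allow[ns] = true
    }
    kept := make(map[string]string)
    for name, ns := range resources {
        // skaffold dev records ns == "default" and the pod is kept;
        // skaffold apply records ns == "", so the pod is silently dropped
        // and the status check has nothing to watch.
        if allow[ns] {
            kept[name] = ns
        }
    }
    return kept
}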

@gsquared94
Contributor

The difference between apply and dev is caused by these lines:

if !offline {
    return k.kubectl.ReadManifests(ctx, manifests)
}

In apply we don't run kubectl create --dry-run, so the ManifestList doesn't have metadata.namespace populated.
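A minimal sketch of the kind of namespace defaulting that would avoid the empty value in the offline/apply path (hypothetical helper; not necessarily how #6875 fixes it):

package statuscheck

// defaultNamespace is a hypothetical helper illustrating the fix direction:
// when a parsed manifest has no metadata.namespace (as in the apply path,
// which skips `kubectl create --dry-run`), fall back to a known namespace so
// the status check watches the right place instead of filtering the
// resource out.
func defaultNamespace(manifestNS, fallback string) string {
    if manifestNS == "" {
        return fallback // e.g. "default" or the active kube-context namespace
    }
    return manifestNS
}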
