
PipelineRun hangs in Pending state #1068

Closed
castlemilk opened this issue Jul 11, 2019 · 4 comments · Fixed by #1071
castlemilk commented Jul 11, 2019

Expected Behavior

Pipeline should run the single task, as it did pre-v0.5.0.

Actual Behavior

Pipeline is stuck in the Pending state, waiting for a PVC that the controller never created.

The following warning is observed in the logs:

{"level":"info","logger":"controller.taskrun-controller","caller":"taskrun/taskrun.go:334","msg":"Successfully reconciled taskrun demo-pipeline-run-1-build-skaffold-web-lvgrc/default with status: &apis.Condition{Type:\"Succeeded\", Status:\"Unknown\", Severity:\"\", LastTransitionTime:apis.VolatileTime{Inner:v1.Time{Time:time.Time{wall:0xbf42083cb35e1ed1, ext:4499218940429, loc:(*time.Location)(0x27a75a0)}}}, Reason:\"Pending\", Message:\"pod status \\\"PodScheduled\\\":\\\"False\\\"; message: \\\"persistentvolumeclaim \\\\\\\"demo-pipeline-run-1-pvc\\\\\\\" not found\\\"\"}","knative.dev/controller":"taskrun-controller"}

and

{"level":"warn","logger":"controller.pipeline-controller","caller":"artifacts/artifacts_storage.go:149","msg":"the configmap has no data","knative.dev/controller":"pipeline-controller"}
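For reference, this warning only means the artifact-storage ConfigMap is empty, so Tekton falls back to its defaults. A minimal sketch of populating it, assuming the `config-artifact-pvc` ConfigMap and `size` key documented by Tekton (illustrative, not something this issue depends on):

```yaml
# Sketch only: config-artifact-pvc controls the PVC Tekton creates for
# linking task outputs to inputs. Name and key are per the Tekton docs;
# leaving it empty just means defaults apply.
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-artifact-pvc
  namespace: tekton-pipelines
data:
  size: 5Gi
```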

Steps to Reproduce the Problem

Deploy the following pipeline:

---
# This demo modifies the cluster (deploys to it) you must use a service
# account with permission to admin the cluster (or make your default user an admin
# of the `default` namespace with default-cluster-admin.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: default-cluster-admin
subjects:
  - kind: ServiceAccount
    name: default
    namespace: default
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: tekton.dev/v1alpha1
kind: PipelineResource
metadata:
  name: skaffold-image-leeroy-web
spec:
  type: image
  params:
  - name: url
    value: gcr.io/christiewilson-catfactory/leeroy-web
---
apiVersion: tekton.dev/v1alpha1
kind: PipelineResource
metadata:
  name: skaffold-git
spec:
  type: git
  params:
  - name: revision
    value: v0.32.0
  - name: url
    value: https://github.com/GoogleContainerTools/skaffold
---
apiVersion: tekton.dev/v1alpha1
kind: Task
metadata:
  name: build-push
spec:
  inputs:
    resources:
    - name: workspace
      type: git
    params:
    - name: pathToDockerFile
      description: The path to the dockerfile to build
      default: /workspace/workspace/Dockerfile
    - name: pathToContext
      description: The build context used by Kaniko (https://github.com/GoogleContainerTools/kaniko#kaniko-build-contexts)
      default: /workspace/workspace
  outputs:
    resources:
    - name: builtImage
      type: image
  steps:
  - name: build-and-push
    image: gcr.io/kaniko-project/executor:v0.9.0
    # specifying DOCKER_CONFIG is required to allow kaniko to detect docker credential
    env:
    - name: "DOCKER_CONFIG"
      value: "/builder/home/.docker/"
    command:
    - /kaniko/executor
    args:
    - --dockerfile=${inputs.params.pathToDockerFile}
    - --destination=${outputs.resources.builtImage.url}
    - --context=${inputs.params.pathToContext}
---
apiVersion: tekton.dev/v1alpha1
kind: Pipeline
metadata:
  name: demo-pipeline
spec:
  resources:
  - name: source-repo
    type: git
  - name: web-image
    type: image
  tasks:
  - name: build-skaffold-web
    taskRef:
      name: build-push
    params:
    - name: pathToDockerFile
      value: Dockerfile
    - name: pathToContext
      value: /workspace/workspace/examples/microservices/leeroy-web
    resources:
      inputs:
      - name: workspace
        resource: source-repo
      outputs:
      - name: builtImage
        resource: web-image
---
apiVersion: tekton.dev/v1alpha1
kind: PipelineRun
metadata:
  name: demo-pipeline-run-1
spec:
  pipelineRef:
    name: demo-pipeline
  serviceAccount: 'default'
  resources:
  - name: source-repo
    resourceRef:
      name: skaffold-git
  - name: web-image
    resourceRef:
      name: skaffold-image-leeroy-web

The Pod for the pipeline's task is created, but the PVC it references is not; consequently the Pod is stuck in a Pending state, waiting for a PVC that the tekton-controller never created.

The full pipeline defined here, which contains multiple tasks in one pipeline, then works.

Tekton Version - v0.5.0
Kubernetes Version - 1.13


bobcatfish commented Jul 11, 2019

Oh noes, I'm seeing the same thing :(((( Not sure how this slipped by our tests but I guess we'll find out!!

    startTime: 2019-07-11T21:46:50Z
    taskRuns:
      demo-pipeline-run-1-build-skaffold-web-8nvhj:
        pipelineTaskName: build-skaffold-web
        status:
          conditions:
          - lastTransitionTime: 2019-07-11T21:46:51Z
            message: 'pod status "PodScheduled":"False"; message: "persistentvolumeclaim
              \"demo-pipeline-run-1-pvc\" not found"'
            reason: Pending
            status: Unknown
            type: Succeeded
          podName: demo-pipeline-run-1-build-skaffold-web-8nvhj-pod-1f1a43
          startTime: 2019-07-11T21:46:50Z

I wanted to make a new 0.5.1 release with #1061 but we should fix this first!

I'm guessing this is a bug with #1007, so an (icky) way to work around this would be to link the resources 😓 nvm, there's only one Task, that's probably the problem like you said @castlemilk - and why the CI didn't catch it 😩

@bobcatfish bobcatfish self-assigned this Jul 11, 2019

castlemilk commented Jul 11, 2019

Awesome! Yeah, I found that if you create a second task that links to or depends on the first, a PVC gets created and the pipeline can run.
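A minimal sketch of that workaround, assuming a second task that consumes the first task's output via `from` is enough to trigger PVC creation (the `echo-image` task and its step below are illustrative, not from this thread):

```yaml
# Illustrative only: a throwaway task that consumes build-push's image output.
apiVersion: tekton.dev/v1alpha1
kind: Task
metadata:
  name: echo-image
spec:
  inputs:
    resources:
    - name: image
      type: image
  steps:
  - name: echo
    image: busybox
    command: ["echo"]
    args: ["${inputs.resources.image.url}"]
---
# Added to demo-pipeline's spec.tasks, after build-skaffold-web:
  - name: echo-web-image
    taskRef:
      name: echo-image
    resources:
      inputs:
      - name: image
        resource: web-image
        from:
        - build-skaffold-web
```

The `from` clause is what makes the controller treat `web-image` as output-to-input linking and provision the `demo-pipeline-run-1-pvc` that the Pod is waiting on.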

bobcatfish added a commit to bobcatfish/pipeline that referenced this issue Jul 12, 2019
In tektoncd#1007 @dlorenc and I tried to fix the case where a PVC wasn't needed
for output -> input linking and it was being created anyway. What we
forgot to do was check to see where that PVC was being mounted. It turns
out that if a TaskRun has an output and is created by a PipelineRun
(this is checked via the owner reference), the TaskRun assumes it needs
to mount the volume and further adds containers to copy the output data
to the (possibly) non-existent PVC. @castlemilk caught this problem in tektoncd#1068.

The real fix here is probably going to involve an interface change b/c
we can't assume that just being owned by a PipelineRun means that a
linking PVC is going to be involved. This commit is a terrible and
probably race condition hack to make it so that if the PVC the TaskRun
is expecting doesn't exist, it doesn't attempt to add containers that
will copy data to it.

Making the hack even worse is that instead of adding more actual unit
tests, I updated the test to run all the existing unit tests twice, once
with this PVC existing and once with it not, and I manipulated the test
so that in the case where it doesn't exist, the expected outcome is
different. This is a terrible way to write tests and I hope we either
don't merge this or we fix it quickly afterward. @dlorenc and I are
going to work on a better fix tomorrow.

I also modified our end to end PipelineRun test to include an output
resource so we could reproduce the issue that @castlemilk reported.
@vdemeester vdemeester added this to the Pipelines 0.5 🐱 milestone Jul 12, 2019
@bobcatfish

Sorry again about this @castlemilk and thanks for the quick report! I've created https://github.com/tektoncd/pipeline/tree/release-v0.5.1 with our 2 bug fixes and I'll make the 0.5.1 release first thing Monday EST :) 🙏

@bobcatfish

Okay https://github.com/tektoncd/pipeline/releases/tag/v0.5.1 should have this fixed! (It's back to making an extra PVC unfortunately, but we'll have a better fix for that via #937 hopefully!)

Thanks again for the report and your patience @castlemilk ! ❤️
