Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve e2e test flakyness #1825

Closed
wants to merge 3 commits into from
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 13 additions & 0 deletions test/dag_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -62,6 +62,12 @@ func TestDAGPipelineRun(t *testing.T) {
t.Fatalf("Failed to create echo Task: %s", err)
}

// Make sure the Pipeline has been created (wait for it)
if err := WaitForTaskCreated(c, "echo-task", "TaskCreated"); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be worth adding a helper method to create-and-wait-for-existence a resource? I'm worried we'll forget to wait for existence every time we create a Task/Pipeline/etc., and have to chase flakes forever as a result.

Do we know why Create returns before the resource is actually reliably created? That seems like a problem for lots of tests. Is there somewhere we're using a lister in the reconciler to retrieve a Task/Pipeline, where we should be using a direct client to get them, bypassing caching? We've seen listers being slow to notice new objects in the past, and have (mostly?) replaced them with regular k8s clients as a result.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not so sure and more I read your comment and more I think that my patch may not address the fundamental problem, it doesn't matter if we know it has been created if it's actually the reconciler being slow to be notified.

Is there somewhere we're using a lister in the reconciler to retrieve a

I am not so sure about it (and probably will start to ping you about if i have to figure out :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

c := &Reconciler{
Base: reconciler.NewBase(opt, pipelineRunAgentName, images),
pipelineRunLister: pipelineRunInformer.Lister(),
pipelineLister: pipelineInformer.Lister(),
taskLister: taskInformer.Lister(),
clusterTaskLister: clusterTaskInformer.Lister(),
taskRunLister: taskRunInformer.Lister(),
resourceLister: resourceInformer.Lister(),
conditionLister: conditionInformer.Lister(),
timeoutHandler: timeoutHandler,
metrics: metrics,
?

t.Errorf("Error waiting for Task echo-task to be created: %s", err)
t.Fatal("Pipeline execution failed, Task echo-task has not been created")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Don't need a t.Error and a t.Fatal, you could just combine these into one descriptive t.Fatal.

}

// Create the repo PipelineResource (doesn't really matter which repo we use)
repoResource := tb.PipelineResource("repo", namespace, tb.PipelineResourceSpec(
v1alpha1.PipelineResourceTypeGit,
Expand Down Expand Up @@ -105,6 +111,13 @@ func TestDAGPipelineRun(t *testing.T) {
if _, err := c.PipelineClient.Create(pipeline); err != nil {
t.Fatalf("Failed to create dag-pipeline: %s", err)
}

// Make sure the Pipeline has been created (wait for it)
if err := WaitForPipelineCreated(c, "dag-pipeline", "PipelineCreated"); err != nil {
t.Errorf("Error waiting for Pipeline dag-pipeline to be created: %s", err)
t.Fatal("Pipeline execution failed, Pipeline dag-pipeline has not been created")
}

pipelineRun := tb.PipelineRun("dag-pipeline-run", namespace, tb.PipelineRunSpec("dag-pipeline",
tb.PipelineRunResourceBinding("repo", tb.PipelineResourceBindingRef("repo")),
))
Expand Down
6 changes: 6 additions & 0 deletions test/git_checkout_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -65,6 +65,12 @@ func TestGitPipelineRun(t *testing.T) {
t.Fatalf("Failed to create Pipeline `%s`: %s", gitTestPipelineName, err)
}

// Make sure the Pipeline has been created (wait for it)
if err := WaitForPipelineCreated(c, gitTestPipelineName, "PipelineCreated"); err != nil {
t.Errorf("Error waiting for Pipeline %s to be created: %s", gitTestPipelineName, err)
t.Fatalf("Pipeline execution failed, Pipeline %s has not been created", gitTestPipelineName)
}

t.Logf("Creating PipelineRun %s", gitTestPipelineRunName)
if _, err := c.PipelineRunClient.Create(getGitCheckPipelineRun(namespace)); err != nil {
t.Fatalf("Failed to create Pipeline `%s`: %s", gitTestPipelineRunName, err)
Expand Down
6 changes: 6 additions & 0 deletions test/sidecar_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -86,6 +86,12 @@ func TestSidecarTaskSupport(t *testing.T) {
t.Errorf("Failed to create Task %q: %v", sidecarTaskName, err)
}

// Make sure the Task has been created (wait for it)
if err := WaitForTaskCreated(clients, sidecarTaskName, "TaskCreated"); err != nil {
t.Errorf("Error waiting for Task %s to be created: %s", sidecarTaskName, err)
t.Fatalf("Task execution failed, Task %s has not been created", sidecarTaskName)
}

t.Logf("Creating TaskRun %q", sidecarTaskRunName)
if _, err := clients.TaskRunClient.Create(taskRun); err != nil {
t.Errorf("Failed to create TaskRun %q: %v", sidecarTaskRunName, err)
Expand Down
30 changes: 30 additions & 0 deletions test/wait.go
Original file line number Diff line number Diff line change
Expand Up @@ -68,6 +68,36 @@ type TaskRunStateFn func(r *v1alpha1.TaskRun) (bool, error)
// PipelineRunStateFn is a condition function on TaskRun used polling functions
type PipelineRunStateFn func(pr *v1alpha1.PipelineRun) (bool, error)

// WaitForPipelineCreated wait until a pipeline has been created
func WaitForPipelineCreated(c *clients, name, desc string) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice <3

metricName := fmt.Sprintf("WaitForPipelineCreated/%s/%s", name, desc)
_, span := trace.StartSpan(context.Background(), metricName)
defer span.End()

return wait.PollImmediate(interval, timeout, func() (bool, error) {
pc, err := c.PipelineClient.Get(name, metav1.GetOptions{})
if pc.GetName() == name {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why we need this check? Is it a way to check if no result was returned?
Wouldn't we get then an err =! nil?

return true, err
}
return false, nil
})
}

// WaitForTaskCreated wait until a task has been created
func WaitForTaskCreated(c *clients, name, desc string) error {
metricName := fmt.Sprintf("WaitForTaskCreated/%s/%s", name, desc)
_, span := trace.StartSpan(context.Background(), metricName)
defer span.End()

return wait.PollImmediate(interval, timeout, func() (bool, error) {
tc, err := c.TaskClient.Get(name, metav1.GetOptions{})
if tc.GetName() == name {
return true, err
}
return false, err
})
}

// WaitForTaskRunState polls the status of the TaskRun called name from client every
// interval until inState returns `true` indicating it is done, returns an
// error or timeout. desc will be used to name the metric that is emitted to
Expand Down