Intermittent "InvalidTaskResultReference" when running tasks with big results #4529

Closed
skaegi opened this issue Jan 27, 2022 · 9 comments
Labels: kind/bug (Categorizes issue or PR as related to a bug), priority/important-soon (Must be staffed and worked on either currently, or very soon, ideally in time for the next release)

Comments

@skaegi (Contributor) commented Jan 27, 2022

We've been seeing an increasing number of teams reporting InvalidTaskResultReference problems in pipelines that might otherwise run successfully. We see this infrequently (roughly 1 in 40 runs), but we've now seen it in a number of different, unrelated pipelines. Something like this...


[screenshot: InvalidTaskResultReference error reported by the PipelineRun]


The "results" in question are generally large-ish -- 2K+ (correction here thanks @pritidesai) and are present when we look ;)

We suspect that what's happening is that there is a race somewhere. Perhaps a race between the "next" task running and the current TaskRun being updated with the Result. That's just a guess, but does this seem possible/likely?

I'll try to create a good test case, but this sort of race condition is not easy to trigger on demand, even though it clearly occurs with some regularity.
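
For whoever picks up the test case: below is a minimal sketch of a step that produces a result in the size range reported above. It assumes the default /tekton/results/<name> result path and a hypothetical result named big-result, so treat it as illustrative rather than a ready-made reproducer.

// Hypothetical reproducer step: emit a ~2 KB result the way a Tekton step would,
// by writing it to the file that $(results.big-result.path) is assumed to resolve to.
package main

import (
	"os"
	"strings"
)

func main() {
	// Build a payload a little over 2 KB, comparable to the results reported above.
	payload := strings.Repeat("0123456789abcdef", 160) // 2560 bytes

	// Tekton's entrypoint reads this file after the step exits and carries its
	// content forward as the step result.
	if err := os.WriteFile("/tekton/results/big-result", []byte(payload), 0o644); err != nil {
		panic(err)
	}
}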

@skaegi skaegi added the kind/bug Categorizes issue or PR as related to a bug. label Jan 27, 2022
@pritidesai (Member) commented Feb 7, 2022

I think it's very likely possible, and I'm surprised to see this happening intermittently.

The taskRun status is updated once all containers in the pod have terminated (pod succeeded or failed).

pipeline/pkg/pod/status.go, lines 109 to 115 in 6cb0f4c:

	complete := areStepsComplete(pod) || pod.Status.Phase == corev1.PodSucceeded || pod.Status.Phase == corev1.PodFailed

	if complete {
		updateCompletedTaskRunStatus(logger, trs, pod)
	} else {
		updateIncompleteTaskRunStatus(trs, pod)
	}

updateCompletedTaskRunStatus includes updating the condition and the completion time.

But more updates are done after the status is marked completed.

In the end, the taskRun status is updated with the results:

trs.TaskRunResults = removeDuplicateResults(trs.TaskRunResults)

So if it's possible for the pipelineRun controller to pick up the completed status before the task results are populated, it's also possible to run into this issue.
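
To make the ordering hazard concrete, here is a minimal sketch with simplified stand-in types (not the actual Tekton controller code) showing how a status whose condition has already flipped to succeeded, but whose TaskRunResults have not yet been written back, yields exactly this error:

// A minimal sketch of the race described above: "succeeded" is only a safe
// signal that results are ready if the results land in the same status update.
package main

import "fmt"

// Simplified stand-ins for the TaskRun status fields mentioned above.
type TaskRunResult struct {
	Name  string
	Value string
}

type TaskRunStatus struct {
	Succeeded      bool            // condition set by updateCompletedTaskRunStatus
	TaskRunResults []TaskRunResult // populated later in the same reconcile
}

// resolveResult mimics what the pipelineRun controller has to do when the next
// task references $(tasks.<name>.results.<result>). If it only checks the
// condition, a status that is "succeeded but results not yet populated"
// produces the InvalidTaskResultReference symptom.
func resolveResult(trs TaskRunStatus, name string) (string, error) {
	if !trs.Succeeded {
		return "", fmt.Errorf("task not finished yet")
	}
	for _, r := range trs.TaskRunResults {
		if r.Name == name {
			return r.Value, nil
		}
	}
	// The window described above: condition already flipped,
	// TaskRunResults not yet written back.
	return "", fmt.Errorf("InvalidTaskResultReference: result %q not found", name)
}

func main() {
	racy := TaskRunStatus{Succeeded: true} // results not populated yet
	if _, err := resolveResult(racy, "big-result"); err != nil {
		fmt.Println(err)
	}
}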

@lbernick (Member) commented Feb 7, 2022

/priority important-soon

@tekton-robot tekton-robot added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Feb 7, 2022
@pritidesai (Member)

@lbernick thanks for adding the priority 👍 Let me know if you want to take a stab at it; otherwise I'm happy to try fixing it 🙏

/assign

@chitrangpatel (Contributor)

/assign

@vdemeester (Member)

@skaegi @pritidesai if you manage to reproduce this, can you look at the container's termination message? We have some similar failures that happen because, in the failing case, the termination message content (JSON) is cut off in the middle (and is thus invalid).
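
In case it helps with that, here is a small client-go sketch for dumping each container's termination message from the TaskRun's pod (the kubeconfig path, namespace, and pod name are placeholders):

// Fetch the TaskRun's pod and print every terminated container's message.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	pod, err := client.CoreV1().Pods("my-namespace").Get(context.TODO(), "my-taskrun-pod", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}

	// Tekton carries step results in the step containers' termination messages,
	// so a truncated or invalid JSON payload here would explain the missing result.
	for _, cs := range pod.Status.ContainerStatuses {
		if cs.State.Terminated != nil {
			fmt.Printf("container %s (%d bytes):\n%s\n\n",
				cs.Name, len(cs.State.Terminated.Message), cs.State.Terminated.Message)
		}
	}
}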

@vdemeester (Member)

/assign
I think I know why this is happening. I'll create an additional issue around this, but in short, it's because of the termination message behavior in Kubernetes with multiple containers.

The total size of the termination messages of all containers in a pod cannot exceed 12 KB. If the total size exceeds 12 KB, the Kubernetes state manager limits the termination message size of each container. For example, if a pod contains four init containers and eight application containers, each container's termination message is limited to 1 KB, meaning only the first 1 KB of each container's termination message is retained.
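
To illustrate the failure mode, here is a rough sketch of the size math quoted above (purely illustrative numbers, and only an approximation of the JSON that ends up in the termination message, not Tekton's exact serialization): a JSON payload cut at the per-container budget no longer unmarshals, which is exactly the "cut in the middle (and thus invalid)" case.

package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

func main() {
	const totalBudget = 12 * 1024 // 12 KiB across the whole pod, per the docs quoted above
	containers := 12              // e.g. 4 init containers + 8 application containers
	perContainer := totalBudget / containers
	fmt.Printf("per-container budget: %d bytes\n", perContainer) // 1024

	// A result payload shaped roughly like the key/value entries Tekton writes
	// into the termination message.
	payload, _ := json.Marshal([]map[string]string{
		{"key": "big-result", "value": strings.Repeat("x", 2500)},
	})
	fmt.Printf("payload size: %d bytes\n", len(payload))

	// Truncate at the per-container budget, as described above.
	truncated := payload[:perContainer]

	var parsed []map[string]string
	if err := json.Unmarshal(truncated, &parsed); err != nil {
		// The truncated JSON is invalid, so the result can no longer be read back.
		fmt.Println("unmarshal failed:", err)
	}
}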

@vdemeester (Member)

@skaegi @pritidesai @chitrangpatel see #4808

@vdemeester (Member)

@skaegi #4826 has a "regression" test that reproduces this consistently between 0.32 and 0.35 😉

@dibyom (Member) commented Jun 14, 2022

Closing in favor of #4808, which tracks larger results in general.

dibyom closed this as completed Jun 14, 2022