Test Flake: Unit test cmd/entrypoint TestRealRunnerTimeout #4643

lbernick · 2022-03-03T15:47:39Z

Sadly not much info present in log file:

=== RUN   TestRealRunnerTimeout
    runner_test.go:37: step didn't timeout
--- FAIL: TestRealRunnerTimeout (0.03s)

/kind flake
/priority important-soon

lbernick · 2022-03-03T15:59:21Z

This could be fixed by rewriting waiter.Wait to use the k8s Clock.Sleep, and the test to use FakeClock.Sleep.

tekton-robot · 2022-07-24T19:56:41Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

abayer · 2022-08-11T15:34:14Z

/lifecycle frozen

I hit this one while trying to reproduce the various events-related flakes, so it's still a thing. I thought that maybe increasing the timeout in the test would help, but I seem to still be hitting it...

abayer · 2022-08-11T15:35:27Z

Oh, and I reproduced it with this:

while go test -race -count 100 -v ./cmd/entrypoint > flake-log; do :; done

abayer · 2022-08-11T15:49:37Z

...and now I can't reproduce it at all! Fun times.

abayer · 2022-08-11T16:00:47Z

Yeah, it failing or not depended on what I had for -count and the overall load on my system, not on anything flaky. Sigh. Well, it's still a flake, just one I can't come up with a fix for...

Yongxuanzhang · 2022-12-02T16:48:07Z

I'm looking into this issue and I'm not sure what the testcase wants to test. From the test description it wants to test that the runner will be killed after a millisecond (though it wants to sleep 10 ms?),

pipeline/cmd/entrypoint/runner_test.go

Line 123 in 0b8349b

    // TestRealRunnerTimeout tests whether cmd is killed after a millisecond even though it's supposed to sleep for 10 milliseconds.

and the DeadlineExceeded error can be returned in 2 places:

pipeline/cmd/entrypoint/runner.go

Lines 120 to 125 in f83cd1f

    if err := cmd.Start(); err != nil {
    	if ctx.Err() == context.DeadlineExceeded {
    		return context.DeadlineExceeded
    	}
    	return err
    }

pipeline/cmd/entrypoint/runner.go

Lines 154 to 158 in f83cd1f

    if err := cmd.Wait(); err != nil {
    	if ctx.Err() == context.DeadlineExceeded {
    		return context.DeadlineExceeded
    	}
    	return err

It seems that DeadlineExceeded can still be returned if the runner really sleeps for 10 ms?

lbernick · 2022-12-02T17:11:15Z

> I'm looking into this issue and I'm not sure what the testcase wants to test. From the test description it wants to test that the runner will be killed after a millisecond (though it wants to sleep 10 ms?),

The command that the entrypoint is supposed to be running here is "sleep 10ms". This test ensures that the runner can be timed out at the appropriate time, i.e. that the runner times out after 1ms even though the command is not done.

> It seems that the DeadlineExceeded can still be returned if the runner really sleep 10 ms?

Maybe? If so, that would mean that our tests could pass because the runner finished running and not because it got timed out correctly, which isn't great. (I suspect our timeout logic works correctly but ideally our tests should tell us this!) To test out whether or not this could happen, you could set the timeout to a very long time, and see if the tests still pass when the "sleep" command completes after 10ms.

Yongxuanzhang · 2022-12-02T17:47:12Z

From the test and code, the runner uses a timeout context created with context.WithTimeout(context.Background(), timeout) to track the deadline, and that ctx is also attached to the cmd. Right now the error is returned here:

pipeline/cmd/entrypoint/runner.go

Line 120 in 09c7285

if err := cmd.Start(); err != nil {

And if we step into Start(), it returns before the process that would execute the cmd is even created, because the ctx is already done with a timeout error. Is this expected?

So I'm curious whether the test is really testing what it says. Or maybe my understanding is wrong.

Yongxuanzhang · 2023-01-25T18:58:08Z

/assign

This commit increases the timer for TestRealRunnerTimeout in the hope of reducing the flake in tektoncd#4643. Some thoughts about why tektoncd#4643 happened: the flaky test fails with "step didn't timeout", which means that rr.Run did not return any error, including DeadlineExceeded. It could be that the context timeout accidentally ended up longer than the sleep time, so Run finished before the context timed out. So I think we can increase the sleep time to avoid this flake, even though it is already a rare case. Signed-off-by: Yongxuan Zhang yongxuanzhang@google.com

tekton-robot added kind/flake Categorizes issue or PR as related to a flakey test priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Mar 3, 2022

dibyom added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Apr 25, 2022

tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 24, 2022

tekton-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 11, 2022

pritidesai mentioned this issue Sep 16, 2022

hardening looksLikeResultRef - params and when expressions #5465

Merged

7 tasks

xchapter7x added this to Tekton Community Roadmap Sep 20, 2022

xchapter7x moved this to Todo in Tekton Community Roadmap Sep 20, 2022

jerop mentioned this issue Oct 24, 2022

[TEP-0115] Support Artifact Hub in Hub Resolver #5666

Merged

7 tasks

JeromeJu mentioned this issue Jan 9, 2023

Add taskrun.status.cloudEvents to deprecation.md #5937

Merged

6 tasks

tekton-robot assigned Yongxuanzhang Jan 25, 2023

lbernick mentioned this issue Feb 3, 2023

[TEP074] Remove Pullrequest-init Image #6078

Merged

7 tasks

jerop mentioned this issue Feb 3, 2023

Bump k8s.io/api from 0.25.4 to 0.26.1 in /test/custom-task-ctrls/wait-task-beta #6084

Closed

Yongxuanzhang mentioned this issue Feb 8, 2023

add feature gate check for param array indexing #6120

Closed

7 tasks

Yongxuanzhang mentioned this issue Mar 21, 2023

increase timer for TestRealRunnerTimeout #6409

Closed

7 tasks

JeromeJu mentioned this issue Jun 12, 2023

Add instructions for cherry-picking commits for patch releases #6788

Merged

4 tasks

JeromeJu mentioned this issue Oct 10, 2023

Rename test cases for beta feature validations #7198

Merged
