Test Flake: Unit test cmd/entrypoint TestRealRunnerTimeout #4643
This could be fixed by rewriting waiter.Wait to use the k8s Clock.Sleep, and the test to use FakeClock.Sleep.
Issues go stale after 90d of inactivity.
/lifecycle stale
Send feedback to tektoncd/plumbing.
/lifecycle frozen
I hit this one while trying to reproduce the various events-related flakes, so it's still a thing. I thought that maybe increasing the timeout in the test would help, but I seem to still be hitting it...
Oh, and I reproduced it with this:
...and now I can't reproduce it at all! Fun times.
Yeah, whether it failed or not depended on what I had for
I'm looking into this issue, and I'm not sure what the test case wants to test. From the test description, it wants to verify that the runner is killed after a millisecond (even though the command sleeps for 10 ms?): pipeline/cmd/entrypoint/runner_test.go Line 123 in 0b8349b
and the DeadlineExceeded error can be returned in two places: pipeline/cmd/entrypoint/runner.go Lines 120 to 125 in f83cd1f
pipeline/cmd/entrypoint/runner.go Lines 154 to 158 in f83cd1f
It seems that DeadlineExceeded can still be returned even if the runner really sleeps for 10 ms?
The command here that the entrypoint is supposed to be running is "sleep 10ms". This test ensures that the runner can be timed out in the appropriate time, i.e. it wants to ensure that the runner times out after 1 ms even though the command is not yet done.
Maybe? If so, that would mean that our tests could pass because the runner finished running, not because it got timed out correctly, which isn't great. (I suspect our timeout logic works correctly, but ideally our tests should tell us this!) To test whether this could happen, you could set the timeout to a very long time and see if the tests still pass when the "sleep" command completes after 10 ms.
From the test and code, it uses pipeline/cmd/entrypoint/runner.go Line 120 in 09c7285
And if we jump into Start(), it will return before the process is created to execute the cmd, because the ctx is already done with a timeout error. Is this expected? So I'm curious whether the test is really testing what it says. Or maybe my understanding is wrong.
/assign |
This commit increases the timer for TestRealRunnerTimeout in the hope that it reduces the flake in tektoncd#4643. Some thoughts about why tektoncd#4643 happened: the flaky test got "step didn't timeout", which means that rr.Run doesn't return any error, including the DeadlineExceeded error. It could be that the context timeout is accidentally larger than the sleep time, so Run finishes without the context timing out. So I think we may increase the sleep time to avoid this flake, even though it is already a rare case. Signed-off-by: Yongxuan Zhang yongxuanzhang@google.com
Sadly, not much info is present in the log file:
/kind flake
/priority important-soon