Expect exit code enhancement #14672

Merged: 1 commit merged on Nov 14, 2022

Conversation

@tjungblu (Contributor) commented Nov 1, 2022

ExpectProcess and ExpectFunc now take the exit code of the process into account, not just the matching of the tty output.

Signed-off-by: Thomas Jungblut tjungblu@redhat.com

I'm still fixing some of the tests that are now failing due to this.
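
For reference, a minimal sketch of the idea, not the actual pkg/expect API: run a command, match its output against a predicate, and also fail when the exit code is non-zero. All names here are illustrative.

```go
package expectsketch

import (
	"bufio"
	"errors"
	"fmt"
	"os/exec"
)

// expectFunc is a simplified stand-in for an expect helper: it returns an
// error both when no output line matches and when the process exits non-zero.
func expectFunc(name string, args []string, accept func(string) bool) (string, error) {
	cmd := exec.Command(name, args...)
	out, err := cmd.StdoutPipe()
	if err != nil {
		return "", err
	}
	if err := cmd.Start(); err != nil {
		return "", err
	}

	var matched string
	scanner := bufio.NewScanner(out)
	for scanner.Scan() {
		if line := scanner.Text(); matched == "" && accept(line) {
			matched = line
		}
	}

	// Wait reports a non-zero exit status as *exec.ExitError.
	if err := cmd.Wait(); err != nil {
		var exitErr *exec.ExitError
		if errors.As(err, &exitErr) {
			return matched, fmt.Errorf("process exited with code %d", exitErr.ExitCode())
		}
		return matched, err
	}
	if matched == "" {
		return "", errors.New("no output line matched")
	}
	return matched, nil
}
```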

@serathius (Member) commented Nov 1, 2022

Nice. One additional option to consider is returning the process status so the caller can decide based on the exit code. For example, the linearizability tests expect the process to fail when a failpoint is injected.
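
As a sketch of that option (assumed names, not the actual pkg/expect API), the helper could surface the exit status and let the test decide whether a non-zero code is a failure or exactly what it was waiting for:

```go
package expectsketch

import (
	"errors"
	"os/exec"
)

// expectProcessToFail runs a command and reports whether it exited with a
// non-zero code, leaving the pass/fail decision to the caller. A test that
// injects a failpoint would treat (true, nil) as the expected outcome.
func expectProcessToFail(name string, args ...string) (bool, error) {
	err := exec.Command(name, args...).Run()
	if err == nil {
		return false, nil // exited cleanly
	}
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		return exitErr.ExitCode() != 0, nil // caller decides which codes are acceptable
	}
	return false, err // the process could not be started at all
}
```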

@tjungblu (Contributor, Author) commented Nov 1, 2022

Added the status code as well. I think this is going to become a bigger refactor, since there was no exit-code assumption beforehand. Some tests still don't run, and I also think there's a locking issue. Gotta continue tomorrow.

@ahrtr (Member) commented Nov 2, 2022

Thanks for working on this.

It seems the PR caused lots of workflow failures. Usually one or two failures might be caused by flaky tests, but six failures most likely mean they are caused by the PR. Please take a look at the failures.

@tjungblu (Contributor, Author) commented Nov 2, 2022

Yeah, absolutely. This is caused by several commands now failing the expect check due to their exit codes.

@tjungblu force-pushed the etcd-14638 branch 10 times, most recently from fae5399 to e564641 on November 2, 2022 at 16:37
pkg/expect/expect.go: review thread (outdated, resolved)
@tjungblu force-pushed the etcd-14638 branch 5 times, most recently from b7e3021 to 5289e73 on November 3, 2022 at 12:48
@tjungblu (Contributor, Author) commented Nov 3, 2022

@ahrtr @serathius I finally managed to get everything to green. There are a couple of odd choices I've made around the error assertions, so let me know what you think about them.

if l != "" {
if printDebugLines {
fmt.Printf("%s (%s) (%d): %s", ep.cmd.Path, ep.cfg.name, ep.cmd.Process.Pid, l)
err := func() error {
Reviewer (Member): No need for an anonymous function.

@tjungblu (Contributor, Author) replied Nov 3, 2022:

Disagree; this makes the locking more obvious by keeping the unlock scoped to the block:

    ep.mu.Lock()
    defer ep.mu.Unlock()

I think this could be another func however, if that's a more viable approach for you?

Reviewer (Member): Sounds good.
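
For illustration, a small sketch of the two shapes under discussion (type and field names are made up): the anonymous function keeps the deferred unlock scoped to the critical section inside a longer read loop, while extracting a named method achieves the same scoping more explicitly.

```go
package expectsketch

import "sync"

type proc struct {
	mu    sync.Mutex
	lines []string
}

// appendInline uses an immediately-invoked anonymous function so the deferred
// Unlock runs when the block finishes, not when the enclosing loop iteration
// or function eventually returns.
func (p *proc) appendInline(l string) error {
	return func() error {
		p.mu.Lock()
		defer p.mu.Unlock()
		p.lines = append(p.lines, l)
		return nil
	}()
}

// appendLocked is the extracted-function alternative with identical scoping.
func (p *proc) appendLocked(l string) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.lines = append(p.lines, l)
	return nil
}
```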

Review thread context:

    return ep.exitCode
    }

    return math.MinInt
Reviewer (Member): This is a weird decision; I would prefer to return an error or some known error code like 255.

@tjungblu (Contributor, Author): Agreed, will change this.
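
A possible shape for that change, sketched with assumed field names: report "not exited yet" as an error instead of a sentinel value, so callers never have to compare against math.MinInt (or 255).

```go
package expectsketch

import "errors"

// ErrProcessRunning signals that no exit code is available yet.
var ErrProcessRunning = errors.New("process is still running")

type process struct {
	exited   bool
	exitCode int
}

// ExitCode returns the recorded exit code, or an error while the process is
// still running, avoiding magic sentinel values entirely.
func (p *process) ExitCode() (int, error) {
	if !p.exited {
		return 0, ErrProcessRunning
	}
	return p.exitCode, nil
}
```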

Review thread context (defrag test):

    t.Fatalf("defrag_test: defrag error (%v)", err)
    err = cc.Defragment(ctx, options)
    if err != nil {
    require.ErrorContains(t, err, "Finished defragmenting etcd")
Reviewer (Member): Why does the defragment command fail here?

@tjungblu (Contributor, Author): Odd, it works again. Reverted that.

Review thread context (downgrade test):

    @@ -47,13 +48,19 @@ func testDowngradeUpgrade(t *testing.T, clusterSize int) {
        t.Skipf("%q does not exist", lastReleaseBinary)
    }

    currentVersion := semver.New(version.Version)
    lastVersion := semver.Version{Major: currentVersion.Major, Minor: currentVersion.Minor - 1}
    currentVersion, err := getVersionFromBinary(currentEtcdBinary)
Reviewer (Member): This looks like unrelated changes from another PR. Please rebase.

@tjungblu (Contributor, Author): I can split this into another commit, but it was never passing for me locally. Not sure if it works on your box.

Reviewer (Member): I expect that after a rebase those changes will not be needed/present.

@tjungblu (Contributor, Author): I don't understand; the version inconsistency I run into still exists in main:
https://github.com/etcd-io/etcd/blob/main/tests/e2e/cluster_downgrade_test.go#L50-L55

The issue stems from the fact that the logic takes the version that is assumed to be in the binary, not what the binary actually reports. If you check the diff, it's already against the latest revision of the test.

Reviewer (Member): Can you create a separate PR to fix the downgrade test?

@tjungblu (Contributor, Author): Sure, I'll pull this out of that PR.

@tjungblu (Contributor, Author): Created it: #14710
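
On the version-reporting point above, a hedged sketch of what "ask the binary" could look like; the exact `etcd --version` output format and the helper name are assumptions, not code from this PR.

```go
package expectsketch

import (
	"fmt"
	"os/exec"
	"regexp"

	"github.com/coreos/go-semver/semver"
)

// versionLine assumes the binary prints a line such as "etcd Version: 3.5.5".
var versionLine = regexp.MustCompile(`etcd Version: (\d+\.\d+\.\d+)`)

// versionFromBinary parses the version the binary actually reports instead of
// trusting the version constant compiled into the test.
func versionFromBinary(binaryPath string) (*semver.Version, error) {
	out, err := exec.Command(binaryPath, "--version").CombinedOutput()
	if err != nil {
		return nil, fmt.Errorf("could not run %s --version: %w", binaryPath, err)
	}
	m := versionLine.FindSubmatch(out)
	if m == nil {
		return nil, fmt.Errorf("could not parse version from output %q", string(out))
	}
	return semver.NewVersion(string(m[1]))
}
```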

@serathius (Member):

Great work overall. I expect this will be crucial to reducing the flakiness of e2e tests. One thing I would like to fix before we merge is the separation between Wait and Close; they should behave like their process-level counterparts.
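
A rough sketch of that separation under assumed names: Wait only waits for the child to exit on its own (mirroring exec.Cmd.Wait), while Close actively stops it and then reaps it.

```go
package expectsketch

import (
	"os/exec"
	"syscall"
)

type expectProcess struct {
	cmd *exec.Cmd
}

// Wait blocks until the process exits by itself; it never sends a signal.
func (ep *expectProcess) Wait() error {
	return ep.cmd.Wait()
}

// Close terminates the process (Unix-style SIGTERM, an assumption here) and
// then waits for it, so the pair mirrors its process-level counterparts.
func (ep *expectProcess) Close() error {
	if err := ep.cmd.Process.Signal(syscall.SIGTERM); err != nil {
		return err
	}
	return ep.cmd.Wait()
}
```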

@tjungblu force-pushed the etcd-14638 branch 2 times, most recently from bb5724c to ea584b2 on November 9, 2022 at 12:51
pkg/expect/expect.go: review thread (outdated, resolved)
@serathius (Member) left a review:

LGTM. This is a big change, so let's also get feedback from other maintainers.
cc @ahrtr @spzala @ptabor

@tjungblu (Contributor, Author) commented Nov 9, 2022

Thanks Marek, I'm still fixing the remainder of the tests that are failing due to the changes.

@tjungblu force-pushed the etcd-14638 branch 4 times, most recently from 82f22b4 to 38ff2ce on November 9, 2022 at 17:51
@ahrtr self-requested a review on November 10, 2022 at 00:49
@tjungblu (Contributor, Author):

oh wow, almost all green. The last check seems to be an unrelated flake:

2022-11-09T18:00:33.8660828Z logger.go:130: 2022-11-09T17:57:58.788Z ERROR m0 failed to update storage version {"member": "m0", "cluster-version": "3.6.0", "error": "cannot detect storage schema version: missing confstate information"}

I'm just testing whether the e2e tests take a lot longer than before; otherwise this is good to go from my side.

@serathius (Member):

> oh wow, almost all green.

Great job!

> The last check seems to be an unrelated flake:

Rerun the failed scenario.

> I'm just testing whether the e2e tests take a lot longer than before; otherwise this is good to go from my side.

Correctness here is more important than time. Let's revisit it when time becomes a problem.

@tjungblu (Contributor, Author):

Thanks for the rerun. It's almost the same e2e runtime on my local box. I'll keep an eye on the execution times in the workflow over the next couple of days.

Review comment on lines 138 to 142:

    if err != nil {
        return err
    }

    return nil

Reviewer (Member) suggested change: replace the block with

    return err

@tjungblu (Contributor, Author): Fixed.

Review thread context:

    return fmt.Errorf("failed to remove member: %w", err)

    if !memberRemoved {
    return fmt.Errorf("failed to remove member after 10 tries")
Reviewer (Member): Minor comment: if there is no parameter to format, you'd better use errors.New instead.

@tjungblu (Contributor, Author): Fixed.
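
For completeness, the distinction the comment is pointing at (function and variable names here are illustrative): wrap with fmt.Errorf when there is something to format or an error to wrap, and use errors.New for a constant message.

```go
package expectsketch

import (
	"errors"
	"fmt"
)

func removeMemberError(err error, memberRemoved bool) error {
	if err != nil {
		return fmt.Errorf("failed to remove member: %w", err) // formatting/wrapping needed
	}
	if !memberRemoved {
		return errors.New("failed to remove member after 10 tries") // constant message
	}
	return nil
}
```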

@ahrtr (Member) commented Nov 14, 2022

This PR is based on an old commit tjungblu@0d4a516, please rebase this PR, then I will take another look.

Commit pushed by @tjungblu:

ExpectProcess and ExpectFunc now take the exit code of the process into account, not just the matching of the tty output.

This also refactors the many tests that were previously succeeding on matching an output from a failing cmd execution.

Signed-off-by: Thomas Jungblut <tjungblu@redhat.com>
@tjungblu (Contributor, Author):

@ahrtr just rebased, PTAL.

@ahrtr (Member) left a review:

Overall looks good to me.

Great work. Thank you @tjungblu

@tjungblu (Contributor, Author):

Thanks @ahrtr. Please ping me on any flakes and perf regressions; I'll monitor this throughout the week.
