Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Merged by Bors] - fix: fluvio-test harness bug #2790

Closed
wants to merge 12 commits into from

Conversation

davidbeesley
Copy link
Contributor

@davidbeesley davidbeesley commented Nov 8, 2022

Addresses #2695

A race condition existed in the logic to evaluate whether or not tests run with fluvio-test passed, making it possible for a non-passing-test to appear to succeed.

Also, tests that timed-out were erroneously set as passing. Note that this effect was obscured as default timeout of one-hour was higher than github action timeout.

  1. Added test harness validation tests that "failed to fail" using previous logic.
  2. Added --expect-fail and --expect-timeout flags for tests that are supposed to fail or timeout and added --expect-timeout to longevity tests.
  3. Tightened up logic so as to eliminate race condition.

Note: This fix exposes genuine failing tests, so it will not consistently pass CI pipeline.

validate-test-harness example output (ignoring stderr)

./build-scripts/install_target.sh
cargo build --bin fluvio-test -p fluvio-test   
cargo  build --bin fluvio -p fluvio-cli    
cargo build --bin fluvio-run -p fluvio-run    
echo "clean up previous installation"
clean up previous installation
./target/debug/fluvio cluster delete
./target/debug/fluvio-test expected_pass --local  --cluster-start
Start running fluvio test runner
Starting cluster and testing connection
remove cluster skipped
installing cluster
starting test in child process
Creating the topic: unused
topic "unused" created

Starting example test that passes
test complete, signaling to parent
Test passed
+----------+--------------+
| Pass?    | true         |
|----------+--------------|
| Duration | 827.659539ms |
+----------+--------------+
./target/debug/fluvio-test expected_fail --expect-fail
Start running fluvio test runner
Testing connection to Fluvio cluster in profile
starting test in child process
Creating the topic: unused
topic "unused" already exists

Starting example test that fails
test failed as expected, signaling parent
test complete, signaling to parent
Test passed
+----------+--------------+
| Pass?    | true         |
|----------+--------------|
| Duration | 2.731820019s |
+----------+--------------+
./target/debug/fluvio-test expected_fail_join_fail_first --expect-fail
Start running fluvio test runner
Testing connection to Fluvio cluster in profile
starting test in child process
Creating the topic: unused
topic "unused" already exists

Starting example test that fails
test failed as expected, signaling parent
test complete, signaling to parent
Test passed
+----------+--------------+
| Pass?    | true         |
|----------+--------------|
| Duration | 2.511258335s |
+----------+--------------+
./target/debug/fluvio-test expected_fail_join_success_first --expect-fail
Start running fluvio test runner
Testing connection to Fluvio cluster in profile
starting test in child process
Creating the topic: unused
topic "unused" already exists

Starting example test that fails
test failed as expected, signaling parent
test complete, signaling to parent
Test passed
+----------+--------------+
| Pass?    | true         |
|----------+--------------|
| Duration | 925.028775ms |
+----------+--------------+
./target/debug/fluvio-test expected_timeout --timeout 5sec --expect-timeout
Start running fluvio test runner
Testing connection to Fluvio cluster in profile
starting test in child process
Creating the topic: unused
topic "unused" already exists

Starting example test that timeouts
Test timed out as expected after 5 seconds
killing child test pid 1250657 name fluvio-test
killing child test pid 1250686 name fluvio-test
+----------+--------------+
| Pass?    | true         |
|----------+--------------|
| Duration | 5.405322312s |
+----------+--------------+

@davidbeesley davidbeesley changed the title Ci reliability fix: fluvio-test harness bug Nov 8, 2022
@davidbeesley davidbeesley added the CI label Nov 8, 2022
@davidbeesley davidbeesley marked this pull request as ready for review November 8, 2022 22:36
Copy link
Contributor

@sehz sehz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sehz
Copy link
Contributor

sehz commented Nov 8, 2022

bors r+

bors bot pushed a commit that referenced this pull request Nov 8, 2022
A race condition existed in the logic to evaluate whether or not tests run with `fluvio-test` passed, making it possible for a non-passing-test to appear to succeed. 

Also, tests that timed-out were erroneously set as passing. Note that this effect was obscured as default timeout of one-hour was higher than github action timeout.

1. Added test harness validation tests that "failed to fail" using previous logic.
2. Added `--expect-fail` and `--expect-timeout` flags for tests that are supposed to fail or timeout and added `--expect-timeout` to longevity tests.
3. Tightened up logic so as to eliminate race condition.


**Note: This fix exposes genuine failing tests, so it will not consistently pass CI pipeline.**


### validate-test-harness example output (ignoring stderr)

```
./build-scripts/install_target.sh
cargo build --bin fluvio-test -p fluvio-test   
cargo  build --bin fluvio -p fluvio-cli    
cargo build --bin fluvio-run -p fluvio-run    
echo "clean up previous installation"
clean up previous installation
./target/debug/fluvio cluster delete
./target/debug/fluvio-test expected_pass --local  --cluster-start
Start running fluvio test runner
Starting cluster and testing connection
remove cluster skipped
installing cluster
starting test in child process
Creating the topic: unused
topic "unused" created

Starting example test that passes
test complete, signaling to parent
Test passed
+----------+--------------+
| Pass?    | true         |
|----------+--------------|
| Duration | 827.659539ms |
+----------+--------------+
./target/debug/fluvio-test expected_fail --expect-fail
Start running fluvio test runner
Testing connection to Fluvio cluster in profile
starting test in child process
Creating the topic: unused
topic "unused" already exists

Starting example test that fails
test failed as expected, signaling parent
test complete, signaling to parent
Test passed
+----------+--------------+
| Pass?    | true         |
|----------+--------------|
| Duration | 2.731820019s |
+----------+--------------+
./target/debug/fluvio-test expected_fail_join_fail_first --expect-fail
Start running fluvio test runner
Testing connection to Fluvio cluster in profile
starting test in child process
Creating the topic: unused
topic "unused" already exists

Starting example test that fails
test failed as expected, signaling parent
test complete, signaling to parent
Test passed
+----------+--------------+
| Pass?    | true         |
|----------+--------------|
| Duration | 2.511258335s |
+----------+--------------+
./target/debug/fluvio-test expected_fail_join_success_first --expect-fail
Start running fluvio test runner
Testing connection to Fluvio cluster in profile
starting test in child process
Creating the topic: unused
topic "unused" already exists

Starting example test that fails
test failed as expected, signaling parent
test complete, signaling to parent
Test passed
+----------+--------------+
| Pass?    | true         |
|----------+--------------|
| Duration | 925.028775ms |
+----------+--------------+
./target/debug/fluvio-test expected_timeout --timeout 5sec --expect-timeout
Start running fluvio test runner
Testing connection to Fluvio cluster in profile
starting test in child process
Creating the topic: unused
topic "unused" already exists

Starting example test that timeouts
Test timed out as expected after 5 seconds
killing child test pid 1250657 name fluvio-test
killing child test pid 1250686 name fluvio-test
+----------+--------------+
| Pass?    | true         |
|----------+--------------|
| Duration | 5.405322312s |
+----------+--------------+

```
@bors
Copy link

bors bot commented Nov 8, 2022

Build failed:

@sehz
Copy link
Contributor

sehz commented Nov 8, 2022

bors r+

bors bot pushed a commit that referenced this pull request Nov 8, 2022
Addresses #2695 

A race condition existed in the logic to evaluate whether or not tests run with `fluvio-test` passed, making it possible for a non-passing-test to appear to succeed. 

Also, tests that timed-out were erroneously set as passing. Note that this effect was obscured as default timeout of one-hour was higher than github action timeout.

1. Added test harness validation tests that "failed to fail" using previous logic.
2. Added `--expect-fail` and `--expect-timeout` flags for tests that are supposed to fail or timeout and added `--expect-timeout` to longevity tests.
3. Tightened up logic so as to eliminate race condition.


**Note: This fix exposes genuine failing tests, so it will not consistently pass CI pipeline.**


### validate-test-harness example output (ignoring stderr)

```
./build-scripts/install_target.sh
cargo build --bin fluvio-test -p fluvio-test   
cargo  build --bin fluvio -p fluvio-cli    
cargo build --bin fluvio-run -p fluvio-run    
echo "clean up previous installation"
clean up previous installation
./target/debug/fluvio cluster delete
./target/debug/fluvio-test expected_pass --local  --cluster-start
Start running fluvio test runner
Starting cluster and testing connection
remove cluster skipped
installing cluster
starting test in child process
Creating the topic: unused
topic "unused" created

Starting example test that passes
test complete, signaling to parent
Test passed
+----------+--------------+
| Pass?    | true         |
|----------+--------------|
| Duration | 827.659539ms |
+----------+--------------+
./target/debug/fluvio-test expected_fail --expect-fail
Start running fluvio test runner
Testing connection to Fluvio cluster in profile
starting test in child process
Creating the topic: unused
topic "unused" already exists

Starting example test that fails
test failed as expected, signaling parent
test complete, signaling to parent
Test passed
+----------+--------------+
| Pass?    | true         |
|----------+--------------|
| Duration | 2.731820019s |
+----------+--------------+
./target/debug/fluvio-test expected_fail_join_fail_first --expect-fail
Start running fluvio test runner
Testing connection to Fluvio cluster in profile
starting test in child process
Creating the topic: unused
topic "unused" already exists

Starting example test that fails
test failed as expected, signaling parent
test complete, signaling to parent
Test passed
+----------+--------------+
| Pass?    | true         |
|----------+--------------|
| Duration | 2.511258335s |
+----------+--------------+
./target/debug/fluvio-test expected_fail_join_success_first --expect-fail
Start running fluvio test runner
Testing connection to Fluvio cluster in profile
starting test in child process
Creating the topic: unused
topic "unused" already exists

Starting example test that fails
test failed as expected, signaling parent
test complete, signaling to parent
Test passed
+----------+--------------+
| Pass?    | true         |
|----------+--------------|
| Duration | 925.028775ms |
+----------+--------------+
./target/debug/fluvio-test expected_timeout --timeout 5sec --expect-timeout
Start running fluvio test runner
Testing connection to Fluvio cluster in profile
starting test in child process
Creating the topic: unused
topic "unused" already exists

Starting example test that timeouts
Test timed out as expected after 5 seconds
killing child test pid 1250657 name fluvio-test
killing child test pid 1250686 name fluvio-test
+----------+--------------+
| Pass?    | true         |
|----------+--------------|
| Duration | 5.405322312s |
+----------+--------------+

```
@bors
Copy link

bors bot commented Nov 8, 2022

Build failed:

@davidbeesley
Copy link
Contributor Author

bors r+

bors bot pushed a commit that referenced this pull request Nov 8, 2022
Addresses #2695 

A race condition existed in the logic to evaluate whether or not tests run with `fluvio-test` passed, making it possible for a non-passing-test to appear to succeed. 

Also, tests that timed-out were erroneously set as passing. Note that this effect was obscured as default timeout of one-hour was higher than github action timeout.

1. Added test harness validation tests that "failed to fail" using previous logic.
2. Added `--expect-fail` and `--expect-timeout` flags for tests that are supposed to fail or timeout and added `--expect-timeout` to longevity tests.
3. Tightened up logic so as to eliminate race condition.


**Note: This fix exposes genuine failing tests, so it will not consistently pass CI pipeline.**


### validate-test-harness example output (ignoring stderr)

```
./build-scripts/install_target.sh
cargo build --bin fluvio-test -p fluvio-test   
cargo  build --bin fluvio -p fluvio-cli    
cargo build --bin fluvio-run -p fluvio-run    
echo "clean up previous installation"
clean up previous installation
./target/debug/fluvio cluster delete
./target/debug/fluvio-test expected_pass --local  --cluster-start
Start running fluvio test runner
Starting cluster and testing connection
remove cluster skipped
installing cluster
starting test in child process
Creating the topic: unused
topic "unused" created

Starting example test that passes
test complete, signaling to parent
Test passed
+----------+--------------+
| Pass?    | true         |
|----------+--------------|
| Duration | 827.659539ms |
+----------+--------------+
./target/debug/fluvio-test expected_fail --expect-fail
Start running fluvio test runner
Testing connection to Fluvio cluster in profile
starting test in child process
Creating the topic: unused
topic "unused" already exists

Starting example test that fails
test failed as expected, signaling parent
test complete, signaling to parent
Test passed
+----------+--------------+
| Pass?    | true         |
|----------+--------------|
| Duration | 2.731820019s |
+----------+--------------+
./target/debug/fluvio-test expected_fail_join_fail_first --expect-fail
Start running fluvio test runner
Testing connection to Fluvio cluster in profile
starting test in child process
Creating the topic: unused
topic "unused" already exists

Starting example test that fails
test failed as expected, signaling parent
test complete, signaling to parent
Test passed
+----------+--------------+
| Pass?    | true         |
|----------+--------------|
| Duration | 2.511258335s |
+----------+--------------+
./target/debug/fluvio-test expected_fail_join_success_first --expect-fail
Start running fluvio test runner
Testing connection to Fluvio cluster in profile
starting test in child process
Creating the topic: unused
topic "unused" already exists

Starting example test that fails
test failed as expected, signaling parent
test complete, signaling to parent
Test passed
+----------+--------------+
| Pass?    | true         |
|----------+--------------|
| Duration | 925.028775ms |
+----------+--------------+
./target/debug/fluvio-test expected_timeout --timeout 5sec --expect-timeout
Start running fluvio test runner
Testing connection to Fluvio cluster in profile
starting test in child process
Creating the topic: unused
topic "unused" already exists

Starting example test that timeouts
Test timed out as expected after 5 seconds
killing child test pid 1250657 name fluvio-test
killing child test pid 1250686 name fluvio-test
+----------+--------------+
| Pass?    | true         |
|----------+--------------|
| Duration | 5.405322312s |
+----------+--------------+

```
@bors
Copy link

bors bot commented Nov 9, 2022

Build failed:

@davidbeesley
Copy link
Contributor Author

bors r+

bors bot pushed a commit that referenced this pull request Nov 9, 2022
Addresses #2695 

A race condition existed in the logic to evaluate whether or not tests run with `fluvio-test` passed, making it possible for a non-passing-test to appear to succeed. 

Also, tests that timed-out were erroneously set as passing. Note that this effect was obscured as default timeout of one-hour was higher than github action timeout.

1. Added test harness validation tests that "failed to fail" using previous logic.
2. Added `--expect-fail` and `--expect-timeout` flags for tests that are supposed to fail or timeout and added `--expect-timeout` to longevity tests.
3. Tightened up logic so as to eliminate race condition.


**Note: This fix exposes genuine failing tests, so it will not consistently pass CI pipeline.**


### validate-test-harness example output (ignoring stderr)

```
./build-scripts/install_target.sh
cargo build --bin fluvio-test -p fluvio-test   
cargo  build --bin fluvio -p fluvio-cli    
cargo build --bin fluvio-run -p fluvio-run    
echo "clean up previous installation"
clean up previous installation
./target/debug/fluvio cluster delete
./target/debug/fluvio-test expected_pass --local  --cluster-start
Start running fluvio test runner
Starting cluster and testing connection
remove cluster skipped
installing cluster
starting test in child process
Creating the topic: unused
topic "unused" created

Starting example test that passes
test complete, signaling to parent
Test passed
+----------+--------------+
| Pass?    | true         |
|----------+--------------|
| Duration | 827.659539ms |
+----------+--------------+
./target/debug/fluvio-test expected_fail --expect-fail
Start running fluvio test runner
Testing connection to Fluvio cluster in profile
starting test in child process
Creating the topic: unused
topic "unused" already exists

Starting example test that fails
test failed as expected, signaling parent
test complete, signaling to parent
Test passed
+----------+--------------+
| Pass?    | true         |
|----------+--------------|
| Duration | 2.731820019s |
+----------+--------------+
./target/debug/fluvio-test expected_fail_join_fail_first --expect-fail
Start running fluvio test runner
Testing connection to Fluvio cluster in profile
starting test in child process
Creating the topic: unused
topic "unused" already exists

Starting example test that fails
test failed as expected, signaling parent
test complete, signaling to parent
Test passed
+----------+--------------+
| Pass?    | true         |
|----------+--------------|
| Duration | 2.511258335s |
+----------+--------------+
./target/debug/fluvio-test expected_fail_join_success_first --expect-fail
Start running fluvio test runner
Testing connection to Fluvio cluster in profile
starting test in child process
Creating the topic: unused
topic "unused" already exists

Starting example test that fails
test failed as expected, signaling parent
test complete, signaling to parent
Test passed
+----------+--------------+
| Pass?    | true         |
|----------+--------------|
| Duration | 925.028775ms |
+----------+--------------+
./target/debug/fluvio-test expected_timeout --timeout 5sec --expect-timeout
Start running fluvio test runner
Testing connection to Fluvio cluster in profile
starting test in child process
Creating the topic: unused
topic "unused" already exists

Starting example test that timeouts
Test timed out as expected after 5 seconds
killing child test pid 1250657 name fluvio-test
killing child test pid 1250686 name fluvio-test
+----------+--------------+
| Pass?    | true         |
|----------+--------------|
| Duration | 5.405322312s |
+----------+--------------+

```
@bors
Copy link

bors bot commented Nov 9, 2022

Pull request successfully merged into master.

Build succeeded:

@bors bors bot changed the title fix: fluvio-test harness bug [Merged by Bors] - fix: fluvio-test harness bug Nov 9, 2022
@bors bors bot closed this Nov 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants