Chan.Part.API: reduce flakiness after removing system channel #2111

Merged
merged 1 commit into from
Nov 16, 2020

Conversation

wlahti
Contributor

@wlahti wlahti commented Nov 11, 2020

Type of change

  • Test update

Description

Perform the transaction submission for the application channel that existed before removing the system channel at the end of the test. This gives that channel more time to find a raft leader before we check that it's functioning properly. This seems to reduce the flakiness but it still needs work.

Related issues

FAB-18305

@wlahti wlahti requested a review from a team as a code owner November 11, 2020 21:00
@wlahti wlahti changed the title Chan.Part.API: avoid flakiness after removing system channel Chan.Part.API: reduce flakiness after removing system channel Nov 11, 2020
@wlahti wlahti force-pushed the fab-18305 branch 2 times, most recently from 0b5d95f to b7a1f71 Compare November 12, 2020 12:52
@wlahti
Contributor Author

wlahti commented Nov 12, 2020

This isn't a fix yet, but it does reduce the frequency of the flake. The issue is that we don't have a reliable way to ensure a Raft leader has been elected in the channel participation tests. The existing findLeader helper only works with a single channel, and only when that channel stays around. The channel participation API lets us remove channels, including the system channel, and removing the system channel stops and restarts all of the channels and triggers a new wave of leader elections. In that situation we can't reliably use that helper, so we need some other way to verify that a leader has been elected to fully avoid this (and similar) flakes.

For the record, this has run successfully three times so far. Going to try one more and then see if we can get this in.

@wlahti
Contributor Author

wlahti commented Nov 12, 2020

Seems like part of the flakiness is around orderer3 sometimes not realizing it's been evicted from the channel (removed from the consenter set) in time. Currently seeing if reducing the EvictionSuspicion from 10s to 5s helps there.

Rolling success count: 5

@wlahti
Contributor Author

wlahti commented Nov 12, 2020

Alright, the test itself passed 5 times in a row with the EvictionSuspicion config change. Going to clean this up and move it to review. This still isn't the complete long-term fix, but it should greatly reduce the frequency of the flakes in this test for now.

@wlahti wlahti marked this pull request as ready for review November 12, 2020 18:27
jyellick previously approved these changes Nov 16, 2020
Perform the transaction submission for the application channel that
existed before removing the system channel at the end of the test. This
gives those chains more time to find a raft leader before they are used.

Also change the EvictionSuspicion time from 10s to 5s as the orderer
occasionally doesn't notice it's been evicted before the test times out.

FAB-18305

Signed-off-by: Will Lahti <wtlahti@us.ibm.com>
@jyellick jyellick merged commit cb683e5 into hyperledger:master Nov 16, 2020