fix: multi nat initialization causing dead lock in waku tests + serialize test runs to avoid timing and port occupied issues #2799
You can find the image built from this PR at
Built from 7ade6c2
Amazingg, thanks so much!
Co-authored-by: gabrielmer <101006718+gabrielmer@users.noreply.github.com>
a0ec207 to 64db37a
…ming and port allocation issues
@Ivansete-status, @gabrielmer In addition to the original PR, I still encountered flaky tests, mostly on macOS.
Zoltan the 🕵️.
Very nice!
LGTM! Thanks! Just added a comment to split it in separate PRs
sudo docker run --rm -d -e POSTGRES_PASSWORD=test123 -p 5432:5432 postgres:15.4-alpine3.18
postgres_enabled=1
fi

export MAKEFLAGS="-j1"
export NIMFLAGS="--colors:off -d:chronicles_colors:none"
That LGTM, but shall we add it in a separate PR so that it is clear which commit applied it?
@NagyZoltanPeter - maybe easier to just update the PR title and description to also reflect the CI test change :)
:-) Agreed! I'll do it.
…lize test runs to avoid timing and port occupied issues (#2799)
* Prevent multiple NAT module initialization, which causes a deadlock in the NAT refresh thread teardown during tests.
* Set NPROC to 1, since parallel test runs can lead to timing and port allocation issues.
Co-authored-by: gabrielmer <101006718+gabrielmer@users.noreply.github.com>
Description
Multiple NAT module initialization causes deadlock in tests
When NAT setup finds a suitable device for port mapping (which does not happen in Jenkins CI), the nim-eth/nat module can be initialized multiple times.
That module is not designed for this, and changing it would require a larger rework of the module.
The root cause is that multiple initializations within one application run create multiple remapping threads. That thread is responsible for refreshing the port mapping on the router when needed.
Multiple threads are created but only the last one is tracked. This deadlocks the shutdown mechanism, because the locking mechanism, a single module-level Channel[bool] variable, does not handle this situation and remains blocked.
A simple workaround is applied: the waku NAT module prevents multiple initialization of the nim-eth/nat module and behaves as if no suitable device was found (which is still an acceptable case for testing).
Reduce test flakiness
To reduce the probability of timing issues during CI test runs, and of tests failing because ports are already in use, we made test execution sequential.
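The "port already in use" failure mode can be reproduced in isolation. The sketch below is a Python illustration, not part of the waku test suite, and `port_is_free` is a hypothetical helper: while one process (or one parallel test) holds a listening TCP socket on a port, a second bind to the same address and port fails with EADDRINUSE.

```python
import errno
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can bind to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                # Another socket (for example, a parallel test) holds it.
                return False
            raise
        return True


if __name__ == "__main__":
    # Simulate "test A" holding a port while "test B" wants the same one.
    holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    holder.bind(("127.0.0.1", 0))  # let the OS pick a free port
    holder.listen(1)
    busy_port = holder.getsockname()[1]
    print("free while held:", port_is_free(busy_port))
    holder.close()
    print("free after close:", port_is_free(busy_port))
```

Running tests one at a time means a fixed port is released by one test before the next test tries to bind it, which is the effect the sequential execution aims for.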
How to test
Issue
#2628