testing: fix multiple race conditions in simulated time tests #12527
cc @jmarantz @wrowe @sunjayBhatia This is not done and I'm still working through various issues but I wanted to let you see my current progress. I think the idea here is sound, however see my comment around
Cross-referencing #12539, which tries to switch integration tests to use await() rather than a condvar, for more robust operation.
Force-pushed from b8e85f7 to 3b90839.
@jmarantz I'm about to quit for today, but this passes except for 2 tests that don't compile, which should not be difficult to fix. From a test perspective this is a pretty scary change, but overall I think it makes everything much simpler to reason about and cleans up a bunch of stuff. Feel free to start reviewing and helping me fix things.
Running TSAN and seeing some errors. None of them look too bad to fix, so I will work on that next.
Force-pushed from 3b90839 to 9add2b7.
No good deed goes unpunished. The TSAN issues are internal to abseil. I think they were recently fixed with:
But now when I pull current abseil there are TSAN errors without any other changes: see abseil/abseil-cpp#760
Force-pushed from 9add2b7 to 3d947c9.
@jmarantz this passes all tests for me locally now under fastbuild and TSAN. I have hit some flakes. It's unclear whether they are new, or pre-existing and exacerbated by the alternate TSAN lock implementation we are now using. It will be better to merge the other PR with the abseil bump first and see how that goes.
Force-pushed from 3d947c9 to 30c6ef4.
Force-pushed from 30c6ef4 to af6e1fd.
Signed-off-by: Matt Klein <mklein@lyft.com>
@jmarantz this is passing all tests on fastbuild and I think is ready for real review. I'm going to start looking for flakes.
flushing comments; mostly nits
looks great; just a few nits, mostly about clarity and comments.
@jmarantz updated. Great suggestion about the time bounds class. Much cleaner!
The ARM flake is a known, different issue: #12638
Awesome, thank you for finally cleaning this up!
Up to you if you want to apply the syntactic tweaks or just leave that for next time.
auto thread = Thread::threadFactoryForTest().createThread([this, &mutex, &done]() {
  for (;;) {
    {
      absl::MutexLock lock(&mutex);
taste test, as this syntax works now (for you golang fans):
for (;;) {
  if (absl::MutexLock lock(&mutex); done) {
    return;
  }
  base_scheduler_.run(Dispatcher::RunType::Block);
}
Looking at this code is the first time it occurred to me to use it in C++.
Oh yeah, that's good. I will fix that in a follow-up. I want to get this merged so we can see how we are doing with flakes.
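For readers who have not used the C++17 "if with initializer" form suggested above, here is a minimal single-threaded sketch of the same loop shape. It is an illustration under assumptions, not the code from this PR: `loopUntilDone` and `run_once` are hypothetical names, and `std::mutex` with `std::lock_guard` stands in for `absl::MutexLock` so the example needs only the standard library.

```cpp
#include <functional>
#include <mutex>

// Hypothetical helper (not part of the PR): loops until `done`, which is
// guarded by `mutex`, is observed to be true. `run_once` stands in for the
// blocking scheduler run in the reviewed code. Returns the number of times
// `run_once` was called.
int loopUntilDone(std::mutex& mutex, const bool& done,
                  const std::function<void()>& run_once) {
  int iterations = 0;
  for (;;) {
    // C++17 if-with-initializer: `lock` is constructed first, `done` is then
    // read while the mutex is held, and `lock` is destroyed at the end of the
    // if statement.
    if (std::lock_guard<std::mutex> lock(mutex); done) {
      return iterations;
    }
    // The mutex is no longer held here, so `run_once` may lock it itself.
    run_once();
    ++iterations;
  }
}
```

In the reviewed code another thread would set `done` under the same mutex; in this sketch the callback can do so itself, which only works because the lock from the `if` has already been released by the time the callback runs.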
auto thread = Thread::threadFactoryForTest().createThread([this, &mutex, &done]() {
  for (;;) {
    {
      absl::MutexLock lock(&mutex);
golang syntax here if you like.
public:
  template <class D>
  RealTimeBound(const D& duration)
      : end_time_(std::chrono::steady_clock::now() + duration) // NO_CHECK_FORMAT(real_time)
I'm guessing you were swayed to use this style by the convenience of not bothering to pass timeSystem() into the ctor here.
Regardless, this turned out really well. Thanks!
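To make the time-bound idea in the hunk above concrete, here is a small self-contained sketch. Only the constructor shape (a duration added to `std::chrono::steady_clock::now()`) comes from the diff; the class name `DeadlineSketch` and the `timeLeft()` and `withinBound()` accessors are assumptions made for illustration and may not match the real class.

```cpp
#include <chrono>

// Illustrative sketch of a real-time bound. The constructor mirrors the hunk
// above; the two accessors are hypothetical.
class DeadlineSketch {
public:
  template <class D>
  explicit DeadlineSketch(const D& duration)
      : end_time_(std::chrono::steady_clock::now() + duration) {}

  // Time remaining until the bound expires; zero or negative once expired.
  std::chrono::milliseconds timeLeft() const {
    return std::chrono::duration_cast<std::chrono::milliseconds>(
        end_time_ - std::chrono::steady_clock::now());
  }

  // True while the bound has not yet been reached.
  bool withinBound() const { return std::chrono::steady_clock::now() < end_time_; }

private:
  const std::chrono::steady_clock::time_point end_time_;
};
```

A wait loop can then ask the bound how long is left on each iteration instead of recomputing a deadline by hand. Reading the real clock directly, as the review comment notes, avoids having to pass a time system into the constructor.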
* master: (67 commits)
  logger: support log control in admin interface and command line option for Fancy Logger (envoyproxy#12369)
  test: fix http_timeout_integration_test flake (envoyproxy#12654)
  [fuzz]added an input check in writefilter fuzzer and added test cases (envoyproxy#12628)
  add 'explicit' restriction. (envoyproxy#12643)
  scoped_rds_integration_test migrate from api v2 to api v3. (envoyproxy#12633)
  fuzz: added fuzz test for listener filter tls_inspector (envoyproxy#12617)
  testing: fix multiple race conditions in simulated time tests (envoyproxy#12527)
  [tls] Move handshaking behavior into SslSocketInfo. (envoyproxy#12571)
  header: getting rid of exception-throwing behaviors in header files [the rest] (envoyproxy#12611)
  router: add new ratelimited retry backoff strategy (envoyproxy#12202)
  [redis_proxy] added a constraint for route.prefix().size() (envoyproxy#12637)
  network: add tcp listener backlog config (envoyproxy#12625)
  runtime: debug log that condition is always true when fractionalPercent numerator > denominator (envoyproxy#12068)
  WatchDog Extension hook (envoyproxy#12416)
  router: add dynamic metadata header formatter (envoyproxy#11858)
  statsd: revert visibility to public (envoyproxy#12621)
  Fix regression of /build_* in gitignore (envoyproxy#12630)
  Added a missing extension point to documentation. (envoyproxy#12620)
  Reverts proxy protocol test on windows (envoyproxy#12619)
  caching: Improved the tests and coverage of the CacheFilter tree (envoyproxy#12544)
  ...
Signed-off-by: Michael Puncel <mpuncel@squareup.com>
This PR fixes multiple race conditions in tests. The summary is:
all time systems. This means that all network operations are now
"instantaneous" and makes all time advances for alarms explicit. This
required fixes in a few tests but should make simulated time much easier
to reason about.
Fixes #12480
Fixes #10568
Risk Level: None for prod code, high for tests
Testing: Existing and fixed tests
Docs Changes: N/A
Release Notes: N/A
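As a rough illustration of what "explicit" time advances for alarms mean in the description above, here is a toy simulated clock in which an alarm fires only when the test advances time past its deadline. This is a sketch under assumptions, not the simulated time system from this PR: `SimulatedClockSketch`, `scheduleAlarm`, and `advanceTimeAndRun` are hypothetical names, and there is no dispatcher or threading involved.

```cpp
#include <chrono>
#include <functional>
#include <map>
#include <utility>

// Toy simulated clock (hypothetical, single-threaded). Time never moves on its
// own; alarms run only from inside advanceTimeAndRun().
class SimulatedClockSketch {
public:
  using Duration = std::chrono::milliseconds;

  // Registers `cb` to fire once simulated time reaches now() + delay.
  void scheduleAlarm(Duration delay, std::function<void()> cb) {
    alarms_.emplace(now_ + delay, std::move(cb));
  }

  // Moves simulated time forward by `delta`, firing due alarms in deadline order.
  void advanceTimeAndRun(Duration delta) {
    const Duration target = now_ + delta;
    while (!alarms_.empty() && alarms_.begin()->first <= target) {
      auto it = alarms_.begin();
      now_ = it->first; // the callback observes the time of its own deadline
      std::function<void()> cb = std::move(it->second);
      alarms_.erase(it); // erase first so the callback may schedule new alarms
      cb();
    }
    now_ = target;
  }

  Duration now() const { return now_; }

private:
  Duration now_{0};
  std::multimap<Duration, std::function<void()>> alarms_;
};
```

With a clock like this, a test that schedules a 100 ms alarm sees nothing happen until it explicitly advances time by at least 100 ms, which is the property that makes simulated-time tests easier to reason about.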