Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deflaked: Guarddog_impl_test #14475

Merged
merged 3 commits into from
Dec 22, 2020
Merged

Conversation

KBaichoo
Copy link
Contributor

@KBaichoo KBaichoo commented Dec 17, 2020

Deflaked guarddog_impl_test; on overloaded machines it would sometimes flake under the real time system with both threads to included in the miss / megamiss events when only one should be.

Tested via 2000 runs, of which it all passed. Prior it would flake a few (~1-5)/1000 on slower / overloaded machines.

Signed-off-by: Kevin Baichoo kbaichoo@google.com

For an explanation of how to fill out the fields, please see the relevant section
in PULL_REQUESTS.md

Commit Message: Deflaked guarddog_impl_test under real time system.
Additional Description:
Risk Level: low
Testing: test change.
Docs Changes: N/A
Release Notes: N/A
Platform Specific Features: N/A
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Deprecated:]

…s flake with both threads to included in the miss / megamiss events when only one should be.

Signed-off-by: Kevin Baichoo <kbaichoo@google.com>
@KBaichoo
Copy link
Contributor Author

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #14475 (comment) was created by @KBaichoo.

see: more, trace.

@@ -686,8 +692,14 @@ TEST_P(GuardDogActionsTest, MegaMissShouldOnlyReportRelevantThreads) {
// This will reset the loop interval timer, and should help us
// synchronize with the guard dog.
guard_dog_->forceCheckForTest();
// Touch the second_dog in case we overslept in the real time system
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we do this extra petting only when this test is run in real-time mode?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do that if you really want, but the extra petting it shouldn't affect the simulated time test much as we're really just trying to get one of the watch dogs petted and have the other not petted over the given threshold, so that when we do the final check the latter one triggers the event.

Petting the former more often doesn't change that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I suggest that is that part of the goal of the simulated time test is that it should be completely deterministic in its behavior regardless of how loaded the testing machine is.

I think this workaround that's specific to real-time means we don't gain the confidence that the simtime version is 100% deterministic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@jmarantz
Copy link
Contributor

/wait

Signed-off-by: Kevin Baichoo <kbaichoo@google.com>
…impl

Signed-off-by: Kevin Baichoo <kbaichoo@google.com>
@jmarantz jmarantz merged commit 484a10d into envoyproxy:master Dec 22, 2020
mpuncel added a commit to mpuncel/envoy that referenced this pull request Dec 22, 2020
* master: (30 commits)
  Deflaked: Guarddog_impl_test (envoyproxy#14475)
  [fuzz] add fuzz tests for hpack encoding and decoding (envoyproxy#13315)
  [filters] Prevent a filter from sending local reply and continue (envoyproxy#14416)
  oauth2: improving coverage (envoyproxy#14479)
  owners: Change dio email address (envoyproxy#14498)
  macos build: Fix ninja install (envoyproxy#14495)
  http: use OptRef helper to reduce some boilerplate (envoyproxy#14361)
  doc: update test/integration/README.md (envoyproxy#14485)
  server: wait workers to start before draining parent. (envoyproxy#14319)
  api: relax inline_string length limitation in DataSource (envoyproxy#14461)
  oauth: properly stop filter chain when a response was sent (envoyproxy#14476)
  listener: deprecate use_proxy_proto (envoyproxy#14406)
  deps: update cel and remove a patch (envoyproxy#14473)
  preconnect: rename: (envoyproxy#14474)
  coverage: ratcheting limits (envoyproxy#14472)
  grpc mux: fix sending node again after stream is reset (envoyproxy#14080)
  [test] Replace printers_include with printers_lib. (envoyproxy#14442)
  tcp: nodelay in the new pool (envoyproxy#14453)
  test: replace mock_methodn macros with mock_method (envoyproxy#14450)
  tcp: extending tcp integration test (envoyproxy#14451)
  ...

Signed-off-by: Michael Puncel <mpuncel@squareup.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants