Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simple "Error 13" message improvement #666

Merged
merged 78 commits into from
Apr 1, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
78 commits
Select commit Hold shift + click to select a range
8ea442d
Merge pull request #5 from envoyproxy/master
eric846 Jun 1, 2020
5ac755a
Merge pull request #6 from envoyproxy/master
eric846 Jun 28, 2020
b8c25a5
Merge pull request #7 from envoyproxy/master
eric846 Jul 7, 2020
1c19c68
initial commit
eric846 Jul 9, 2020
7050686
fix comments
eric846 Jul 9, 2020
0776563
fix format
eric846 Jul 9, 2020
16fd8f6
rename adaptive_rps to adaptive_load
eric846 Jul 10, 2020
c383010
add field_selector in example
eric846 Jul 10, 2020
6e1a483
fix example comment
eric846 Jul 10, 2020
4ef1140
fix format
eric846 Jul 10, 2020
4111bf4
add support for fault injection headers
eric846 Jul 10, 2020
871a959
replace linear and binary search with exponential search
eric846 Jul 10, 2020
1fd77c1
add InputVariableSetter mechanism
eric846 Jul 11, 2020
edc36b2
add input variable setter to build file
eric846 Jul 11, 2020
4d0364e
fix syntax errors
eric846 Jul 11, 2020
aed6d94
rename samples/adaptive_rps
eric846 Jul 11, 2020
d9ae87d
improve comments, change step controller initial value from int64 to …
eric846 Jul 12, 2020
a05a6f5
add proto validation rules, fix comments, make rps the default input_…
eric846 Jul 13, 2020
8cd4d21
fix comment wording
eric846 Jul 13, 2020
d814a96
simplify protos, add defaults, specify required or optional
eric846 Jul 14, 2020
5f5a885
add missing newline
eric846 Jul 14, 2020
7e20a78
Kick CI
eric846 Jul 14, 2020
9048267
simplify protos
eric846 Jul 15, 2020
306c0ec
fix format
eric846 Jul 15, 2020
d33f543
fix some optional field comments and rules
eric846 Jul 15, 2020
442cca9
Merge pull request #10 from envoyproxy/master
eric846 Jul 16, 2020
677b783
add Nighthawk status field in BenchmarkResult as nested nighthawk.cli…
eric846 Jul 19, 2020
cefb366
switch to standard Envoy plugin config proto, add prefix to internal …
eric846 Jul 22, 2020
f3684df
Merge remote-tracking branch 'upstream/master' into adaptive-rps-protos2
eric846 Jul 22, 2020
5463051
create headers
eric846 Jul 22, 2020
46e0e25
fix format
eric846 Jul 22, 2020
f634642
use docstring format
eric846 Jul 22, 2020
3c39faa
fix typos in comments
eric846 Jul 23, 2020
b9c8f2b
split build target, get rid of ostream, change InputValueSetter to us…
eric846 Jul 24, 2020
5fc4db4
remove nested namespace, remove redundant _include in target names
eric846 Jul 26, 2020
64e7852
merge from upstream
eric846 Jul 29, 2020
12807f1
Merge remote-tracking branch 'upstream/master' into adaptive-rps-headers
eric846 Jul 29, 2020
e8e960f
merge from upstream
eric846 Aug 27, 2020
6306b4e
Merge remote-tracking branch 'upstream/master' into master2
eric846 Aug 27, 2020
1ece783
Merge remote-tracking branch 'upstream/master' into master2
eric846 Aug 28, 2020
70705e9
Merge remote-tracking branch 'upstream/master' into master2
eric846 Aug 31, 2020
e576bc1
Merge remote-tracking branch 'upstream/master' into master2
eric846 Sep 1, 2020
1fca528
Merge remote-tracking branch 'upstream/master' into master2
eric846 Sep 3, 2020
ed32856
Merge remote-tracking branch 'upstream/master' into master2
eric846 Sep 3, 2020
eecf00d
Merge remote-tracking branch 'upstream/master' into master2
eric846 Sep 8, 2020
13179fb
Merge remote-tracking branch 'upstream/master' into master2
eric846 Sep 10, 2020
3eb89e6
Merge remote-tracking branch 'upstream/master' into master2
eric846 Sep 11, 2020
fae60ad
Merge remote-tracking branch 'upstream/master' into master2
eric846 Sep 12, 2020
9fec200
Merge remote-tracking branch 'upstream/master' into master2
eric846 Sep 14, 2020
9d1cf53
Merge remote-tracking branch 'upstream/master' into master2
eric846 Sep 15, 2020
cfad844
Merge remote-tracking branch 'upstream/master' into master2
eric846 Sep 19, 2020
bdde910
Merge remote-tracking branch 'upstream/master' into master2
eric846 Sep 22, 2020
8100376
Merge remote-tracking branch 'upstream/master' into master2
eric846 Sep 28, 2020
694fda6
Merge remote-tracking branch 'upstream/master' into master2
eric846 Dec 29, 2020
1df6137
Merge remote-tracking branch 'upstream/main'
eric846 Jan 21, 2021
2b9e1e8
Merge remote-tracking branch 'upstream/main'
eric846 Jan 26, 2021
a6da150
Merge remote-tracking branch 'upstream/main'
eric846 Feb 3, 2021
4b8ba84
Merge remote-tracking branch 'upstream/main'
eric846 Feb 4, 2021
eaa20de
Merge remote-tracking branch 'upstream/main'
eric846 Feb 11, 2021
aedd520
Merge remote-tracking branch 'upstream/main'
eric846 Feb 11, 2021
4ec202a
Merge remote-tracking branch 'upstream/main'
eric846 Feb 23, 2021
f42c374
Merge remote-tracking branch 'upstream/main'
eric846 Feb 25, 2021
be5c0fa
Merge remote-tracking branch 'upstream/main'
eric846 Mar 8, 2021
7da182a
Merge remote-tracking branch 'upstream/main'
eric846 Mar 10, 2021
d3e5545
Merge remote-tracking branch 'upstream/main'
eric846 Mar 11, 2021
2f44065
Merge remote-tracking branch 'upstream/main'
eric846 Mar 15, 2021
190a691
Merge remote-tracking branch 'upstream/main'
eric846 Mar 17, 2021
74de764
Merge remote-tracking branch 'upstream/main'
eric846 Mar 18, 2021
c2baebd
Merge remote-tracking branch 'upstream/main'
eric846 Mar 18, 2021
da83218
Merge remote-tracking branch 'upstream/main'
eric846 Mar 22, 2021
e78adac
Merge remote-tracking branch 'upstream/main'
eric846 Mar 25, 2021
f79c078
Merge remote-tracking branch 'upstream/main'
eric846 Mar 29, 2021
7e6a810
log troubleshooting tips on URI and failure predicate errors; add fai…
eric846 Mar 29, 2021
8f163b4
mention Nighthawk Service logs in Error 13 message
eric846 Mar 29, 2021
39a68ad
return tips in Nighthawk Service response
eric846 Mar 29, 2021
a27dfbb
Merge remote-tracking branch 'upstream/main'
eric846 Mar 30, 2021
3a171ee
Merge branch 'main' into error13b
eric846 Mar 30, 2021
5520728
split large log text into small error and large info
eric846 Apr 1, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions source/client/options_impl.cc
Original file line number Diff line number Diff line change
Expand Up @@ -672,6 +672,7 @@ void OptionsImpl::setNonTrivialDefaults() {
failure_predicates_["benchmark.http_4xx"] = 0;
failure_predicates_["benchmark.http_5xx"] = 0;
failure_predicates_["benchmark.pool_connection_failure"] = 0;
failure_predicates_["benchmark.stream_resets"] = 0;
// Also, fail fast when a remote request source is specified that we can't connect to or otherwise
// fails.
failure_predicates_["requestsource.upstream_rq_5xx"] = 0;
Expand Down
24 changes: 22 additions & 2 deletions source/client/process_impl.cc
Original file line number Diff line number Diff line change
Expand Up @@ -598,7 +598,23 @@ bool ProcessImpl::runInternal(OutputCollector& collector, const std::vector<UriP
StatisticFactoryImpl statistic_factory(options_);
collector.addResult("global", mergeWorkerStatistics(workers_), counters,
total_execution_duration / workers_.size(), first_acquisition_time);
return counters.find("sequencer.failed_terminations") == counters.end();
if (counters.find("sequencer.failed_terminations") == counters.end()) {
return true;
} else {
ENVOY_LOG(error, "Terminated early because of a failure predicate.");
ENVOY_LOG(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(optional) How do you feel about splitting this into two log statements - one error log "Terminated early because of a failure predicate" and a second log (info/debug/verbose/something) with all of these extra details?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

info,
"Check the output for problematic counter values. The default Nighthawk failure predicates "
"report failure if (1) Nighthawk could not connect to the target (see "
"'benchmark.pool_connection_failure' counter; check the address and port number, and try "
"explicitly setting --address-family v4 or v6, especially when using DNS; instead of "
"localhost try 127.0.0.1 or ::1 explicitly), (2) the protocol was not supported by the "
"target (see 'benchmark.stream_resets' counter; check http/https in the URI, --h2), (3) "
"the target returned a 4xx or 5xx HTTP response code (see 'benchmark.http_4xx' and "
"'benchmark.http_5xx' counters; check the URI path and the server config), or (4) a custom "
"gRPC RequestSource failed. --failure-predicate can be used to relax expectations.");
return false;
}
}

bool ProcessImpl::run(OutputCollector& collector) {
Expand Down Expand Up @@ -632,7 +648,11 @@ bool ProcessImpl::run(OutputCollector& collector) {
tracing_uri->resolve(*dispatcher_,
Utility::translateFamilyOptionString(options_.addressFamily()));
}
} catch (const UriException&) {
} catch (const UriException& ex) {
ENVOY_LOG(error,
"URI exception (for example, malformed URI syntax, bad "
"MultiTarget path, unresolvable host DNS): {}",
ex.what());
return false;
}

Expand Down
13 changes: 12 additions & 1 deletion source/client/service_impl.cc
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,18 @@ void ServiceImpl::handleExecutionRequest(const nighthawk::client::ExecutionReque
response.mutable_error_detail()->set_code(grpc::StatusCode::INTERNAL);
// TODO(https://github.com/envoyproxy/nighthawk/issues/181): wire through error descriptions, so
// we can do better here.
response.mutable_error_detail()->set_message("Unknown failure");
response.mutable_error_detail()->set_message(
"Unknown failure. See Nighthawk Service logs. Make sure the URI is well formed and the DNS "
"name resolves (if applicable). Check the output for problematic counter values. The "
"default Nighthawk failure predicates report failure if (1) Nighthawk could not connect to "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know there was a lot of churn in this investigation, so please feel free to push back.

Were we unable to identify where the predicates are resulting in failures? This feels like it'd be much better if we could instead log at the point of a predicate failing what predicate was failing and what its value was set to. Is that not possible?

Copy link
Contributor Author

@eric846 eric846 Mar 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Termination predicates are based on counters which don't seem to store their own names. See termination_predicate_impl.h. We could log the first time each counter was incremented in benchmark_client_impl.cc, without knowing whether it will trigger a failure predicate. Logging in either place could impact performance unless we add logic to bypass logging on subsequent times. The user would have to check the log for the detailed info since it would take more code to propagate the information to the point where we return from the service.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is ENVOY_LOG_ONCE these days... but thinking about this, maybe if we'd allow wiring in some sort of callback when constructing predicates, which would fire when one indicates termination, that could be helpful here. There is one gotcha to that, which is that one part of the chain might conditionally override the return value of another. This is hypothetical with the current predicate implementations, but conceptually it doesn't feel right to me. So thinking out loud some more, maybe this callback should be specific to a new top level predicate implementation, which never indicates termination all by itself - it relies on the next links in the chain to figure that out -- but is just there to serialize what happened in the chain when termination commenced, for the purpose of communicating that upwards. It think it would be useful to be able to communicate this in a structured way, especially in the gRPC flows.
TLDR; the current state seems like a good enhancement to me, perhaps punt improving predicates to do a better job at facilitating UX to an issue and discuss further out of the context of this issue if need be?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree that we can go forward without fixing this in this PR. Creating an issue for improving termination predicates (with logging or something) sounds like a good idea to me.

"the target (see 'benchmark.pool_connection_failure' counter; check the address and port "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regardless of whether or not we make the above change, can we create a TODO/issue a bug for moving this to a documentation location for common trouble areas (similar to our internal playbooks)? This whole log feels like it really belongs in a markdown document rather than in a log statement.

(Afterwards, we can instead recommend that people reference our troubleshooting document, instead of including everything in this log statement)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #668.

"number, and try explicitly setting --address-family v4 or v6, especially when using DNS; "
"instead of localhost try 127.0.0.1 or ::1 explicitly), (2) the protocol was not supported "
"by the target (see 'benchmark.stream_resets' counter; check http/https in the URI, --h2), "
"(3) the target returned a 4xx or 5xx HTTP response code (see 'benchmark.http_4xx' and "
"'benchmark.http_5xx' counters; check the URI path and the server config), or (4) a custom "
"gRPC RequestSource failed. To relax expectations, set explicit failure predicates in the "
"benchmark request.");
}
*(response.mutable_output()) = output_collector.toProto();
process.shutdown();
Expand Down
3 changes: 3 additions & 0 deletions test/options_test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -297,6 +297,7 @@ TEST_P(RequestSourcePluginTestFixture, CreatesOptionsImplWithRequestSourceConfig
// comparison below.
EXPECT_EQ(1, command->mutable_failure_predicates()->erase("benchmark.http_4xx"));
EXPECT_EQ(1, command->mutable_failure_predicates()->erase("benchmark.http_5xx"));
EXPECT_EQ(1, command->mutable_failure_predicates()->erase("benchmark.stream_resets"));
EXPECT_EQ(1, command->mutable_failure_predicates()->erase("requestsource.upstream_rq_5xx"));

// TODO(#433)
Expand Down Expand Up @@ -438,6 +439,7 @@ TEST_F(OptionsImplTest, TlsContext) {
// comparison below.
EXPECT_EQ(1, cmd->mutable_failure_predicates()->erase("benchmark.http_4xx"));
EXPECT_EQ(1, cmd->mutable_failure_predicates()->erase("benchmark.http_5xx"));
EXPECT_EQ(1, cmd->mutable_failure_predicates()->erase("benchmark.stream_resets"));
EXPECT_EQ(1, cmd->mutable_failure_predicates()->erase("requestsource.upstream_rq_5xx"));
// TODO(#433)
OptionsImpl options_from_proto(*cmd);
Expand Down Expand Up @@ -500,6 +502,7 @@ TEST_F(OptionsImplTest, MultiTarget) {
// textual comparison below.
EXPECT_EQ(1, cmd->mutable_failure_predicates()->erase("benchmark.http_4xx"));
EXPECT_EQ(1, cmd->mutable_failure_predicates()->erase("benchmark.http_5xx"));
EXPECT_EQ(1, cmd->mutable_failure_predicates()->erase("benchmark.stream_resets"));
EXPECT_EQ(1, cmd->mutable_failure_predicates()->erase("requestsource.upstream_rq_5xx"));
// TODO(#433)
OptionsImpl options_from_proto(*cmd);
Expand Down