-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simple "Error 13" message improvement #666
Conversation
update from master
merge from upstream
merge from upstream
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
…double Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
…variable_setter Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
merge from upstream
…ent.Output turns out not to include the status Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
…plugin names, log thresholds only once per session Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
…lure predicate for stream_resets Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, LGTM
@wjuan-AFK Please review and assign back to me once done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
if (counters.find("sequencer.failed_terminations") == counters.end()) { | ||
return true; | ||
} else { | ||
ENVOY_LOG( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(optional) How do you feel about splitting this into two log statements - one error log "Terminated early because of a failure predicate" and a second log (info/debug/verbose/something) with all of these extra details?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
response.mutable_error_detail()->set_message( | ||
"Unknown failure. See Nighthawk Service logs. Make sure the URI is well formed and the DNS " | ||
"name resolves (if applicable). Check the output for problematic counter values. The " | ||
"default Nighthawk failure predicates report failure if (1) Nighthawk could not connect to " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know there was a lot of churn in this investigation, so please feel free to push back.
Were we unable to identify where the predicates are resulting in failures? This feels like it'd be much better if we could instead log at the point of a predicate failing what predicate was failing and what its value was set to. Is that not possible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Termination predicates are based on counters which don't seem to store their own names. See termination_predicate_impl.h. We could log the first time each counter was incremented in benchmark_client_impl.cc, without knowing whether it will trigger a failure predicate. Logging in either place could impact performance unless we add logic to bypass logging on subsequent times. The user would have to check the log for the detailed info since it would take more code to propagate the information to the point where we return from the service.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is ENVOY_LOG_ONCE
these days... but thinking about this, maybe if we'd allow wiring in some sort of callback when constructing predicates, which would fire when one indicates termination, that could be helpful here. There is one gotcha to that, which is that one part of the chain might conditionally override the return value of another. This is hypothetical with the current predicate implementations, but conceptually it doesn't feel right to me. So thinking out loud some more, maybe this callback should be specific to a new top level predicate implementation, which never indicates termination all by itself - it relies on the next links in the chain to figure that out -- but is just there to serialize what happened in the chain when termination commenced, for the purpose of communicating that upwards. It think it would be useful to be able to communicate this in a structured way, especially in the gRPC flows.
TLDR; the current state seems like a good enhancement to me, perhaps punt improving predicates to do a better job at facilitating UX to an issue and discuss further out of the context of this issue if need be?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree that we can go forward without fixing this in this PR. Creating an issue for improving termination predicates (with logging or something) sounds like a good idea to me.
"Unknown failure. See Nighthawk Service logs. Make sure the URI is well formed and the DNS " | ||
"name resolves (if applicable). Check the output for problematic counter values. The " | ||
"default Nighthawk failure predicates report failure if (1) Nighthawk could not connect to " | ||
"the target (see 'benchmark.pool_connection_failure' counter; check the address and port " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regardless of whether or not we make the above change, can we create a TODO/issue a bug for moving this to a documentation location for common trouble areas (similar to our internal playbooks)? This whole log feels like it really belongs in a markdown document rather than in a log statement.
(Afterwards, we can instead recommend that people reference our troubleshooting document, instead of including everything in this log statement)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened #668.
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
When there is a URI exception or failing failure predicate, log troubleshooting tips and return the tips in the Nighthawk Service error description.
Add a default failure predicate for
benchmark.stream_resets
> 0, since stream resets usually indicate a misconfigured protocol.NOTE: If you expect stream resets in a benchmark session, run with
--failure-predicate benchmark.stream_resets:1000000000
to avoid early termination.Related to #181 and #613.