health check: structured active healthcheck logging #3176
Changes from 18 commits
@@ -0,0 +1,9 @@
load("//bazel:api_build_system.bzl", "api_proto_library")

licenses(["notice"])  # Apache 2

api_proto_library(
    name = "health_check_event",
    srcs = ["health_check_event.proto"],
    deps = ["//envoy/api/v2/core:base"],
)

@@ -0,0 +1,57 @@
syntax = "proto3";

package envoy.data.core.v2alpha;

import "envoy/api/v2/core/base.proto";

import "google/protobuf/duration.proto";
import "google/protobuf/wrappers.proto";

import "validate/validate.proto";
import "gogoproto/gogo.proto";

option (gogoproto.equal_all) = true;

// [#protodoc-title: Health check logging events]
Review comment: If this is logging, doesn't it belong in
Review comment: It is in
Review comment: My bad, I was focusing on the
// :ref:`Health check logging <arch_overview_health_check_logging>`.

message HealthCheckEvent {
  HealthCheckerType health_checker_type = 1;
Review comment: can we add enum validation here
  string host_address = 2 [(validate.rules).string.min_bytes = 1];
Review comment: Yes, full address preferable. The only situation it might not be the preferred choice is if we don't have the possibility of there being a port encoded (i.e. not
Review comment: One thing from #3478... what happens when we have hosts that appear at multiple levels and priorities? Do we have unique HC events, do we have locality/priority information associated with them?
  string cluster_name = 3 [(validate.rules).string.min_bytes = 1];

  oneof event {
    option (validate.required) = true;

    // Host ejection.
    HealthCheckEjectUnhealthy eject_unhealthy_event = 4;

    // Host addition.
    HealthCheckAddHealthy add_healthy_event = 5;
  }
}

enum HealthCheckFailureType {
  ACTIVE = 0;
  PASSIVE = 1;
  NETWORK = 2;
Review comment: Please also consider the discussion in #3478 (comment)
}

enum HealthCheckerType {
  HTTP = 0;
  TCP = 1;
  GRPC = 2;
  REDIS = 3;
}

message HealthCheckEjectUnhealthy {
  // The type of failure that caused this ejection.
  HealthCheckFailureType failure_type = 1;
Review comment: enum validation
}

message HealthCheckAddHealthy {
  // Whether this addition is the result of the first ever health check on a host, in which case
  // the configured :ref:`healthy threshold <envoy_api_field_core.HealthCheck.healthy_threshold>`
  // is bypassed and the host is immediately added.
  bool first_check = 1;
}
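To make the validation annotations in `health_check_event.proto` concrete, here is a plain C++17 mirror of `HealthCheckEvent` that applies the same three rules: `host_address` and `cluster_name` need at least one byte, and one member of the `event` oneof must be set. This is a sketch, not the protoc- or protoc-gen-validate-generated code; the struct layout, the use of `std::variant` for the oneof, and the `validateEvent()` function are assumptions made for illustration.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <variant>

// Hypothetical plain-C++ mirror of health_check_event.proto; these are NOT the
// protoc-generated classes. Enum values match the proto definitions.
enum class HealthCheckerType { HTTP = 0, TCP = 1, GRPC = 2, REDIS = 3 };
enum class HealthCheckFailureType { ACTIVE = 0, PASSIVE = 1, NETWORK = 2 };

struct HealthCheckEjectUnhealthy {
  HealthCheckFailureType failure_type = HealthCheckFailureType::ACTIVE;
};

struct HealthCheckAddHealthy {
  bool first_check = false;
};

struct HealthCheckEvent {
  HealthCheckerType health_checker_type = HealthCheckerType::HTTP;
  std::string host_address;
  std::string cluster_name;
  // The `event` oneof; std::monostate stands for "no member set".
  std::variant<std::monostate, HealthCheckEjectUnhealthy, HealthCheckAddHealthy> event;
};

// Applies the same checks as the (validate.rules) annotations in the proto:
// min_bytes = 1 on both strings and (validate.required) on the oneof.
// Returns a description of the first violation, or std::nullopt if valid.
std::optional<std::string> validateEvent(const HealthCheckEvent& e) {
  if (e.host_address.empty()) {
    return "host_address: must be at least 1 byte";
  }
  if (e.cluster_name.empty()) {
    return "cluster_name: must be at least 1 byte";
  }
  if (std::holds_alternative<std::monostate>(e.event)) {
    return "event: exactly one of eject_unhealthy_event or add_healthy_event is required";
  }
  return std::nullopt;
}
```

`std::monostate` plays the role of an unset oneof, which is exactly the state `(validate.required) = true` rejects.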

@@ -0,0 +1,8 @@
Core data
=========

.. toctree::
  :glob:
  :maxdepth: 2

  v2alpha/health_check_event.proto

@@ -6,4 +6,5 @@ Envoy data
   :maxdepth: 2
 
   accesslog/accesslog
+  core/core
   tap/tap

@@ -24,6 +24,15 @@ unhealthy, successes required before marking a host healthy, etc.):
 maintenance by setting the specified key to any value and waiting for traffic to drain. See
 :ref:`redis_key <config_cluster_manager_cluster_hc_redis_key>`.
 
+.. _arch_overview_health_check_logging:
+
+Health check event logging
+--------------------------
+
+A per-healthchecker log of ejection and addition events can optionally be produced by Envoy by
+specifying a log file path in :ref:`the HealthCheckConfig <envoy_api_field_core.HealthCheck.event_log_path>`.
Review comment: nit: use snake_case for field names
+The log is structured as JSON dumps of :ref:`HealthCheckEvent messages <envoy_api_msg_core.HealthCheckEvent>`.
+
 Passive health checking
 -----------------------
 

@@ -3,6 +3,7 @@
 #include <functional>
 #include <memory>
 
+#include "envoy/data/core/v2alpha/health_check_event.pb.h"
 #include "envoy/upstream/upstream.h"
 
 namespace Envoy {
@@ -59,5 +60,36 @@ typedef std::shared_ptr<HealthChecker> HealthCheckerSharedPtr;
 std::ostream& operator<<(std::ostream& out, HealthState state);
 std::ostream& operator<<(std::ostream& out, HealthTransition changed_state);
 
+/**
+ * Sink for health check event logs.
+ */
+class HealthCheckEventLogger {
+public:
+  virtual ~HealthCheckEventLogger() {}
+
+  /**
+   * Log an unhealthy host ejection event.
+   * @param health_checker_type supplies the type of health checker that generated the event.
+   * @param host supplies the host that generated the event.
Review comment: docs for additional params
Review comment: 👍
+   * @param failure_type supplies the type of health check failure.
+   */
+  virtual void
+  logEjectUnhealthy(envoy::data::core::v2alpha::HealthCheckerType health_checker_type,
+                    const HostDescriptionConstSharedPtr& host,
+                    envoy::data::core::v2alpha::HealthCheckFailureType failure_type) PURE;
+
+  /**
+   * Log a healthy host addition event.
+   * @param health_checker_type supplies the type of health checker that generated the event.
+   * @param host supplies the host that generated the event.
Review comment: docs for additional params
Review comment: 👍
+   * @param healthy_threshold supplies the configured healthy threshold for this health check.
+   * @param first_check whether this is a fast path success on the first health check for this host.
+   */
+  virtual void logAddHealthy(envoy::data::core::v2alpha::HealthCheckerType health_checker_type,
+                             const HostDescriptionConstSharedPtr& host, bool first_check) PURE;
+};
+
+typedef std::unique_ptr<HealthCheckEventLogger> HealthCheckEventLoggerPtr;
+
 } // namespace Upstream
 } // namespace Envoy
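One way to satisfy the `HealthCheckEventLogger` contract is a sink that writes one JSON object per event. The sketch below assumes newline-delimited output and replaces the Envoy types with hypothetical stand-ins (`StubHost` for `HostDescriptionConstSharedPtr`, local enums for the generated `envoy::data::core::v2alpha` enums, `EventLoggerSketch` for the interface). A real implementation would more likely build the generated `HealthCheckEvent` proto and serialize it with `MessageUtil::getJsonStringFromMessage`; this hand-written printer only approximates that output, using the proto's snake_case field names and printing enums by name as the proto3 JSON mapping does.

```cpp
#include <cassert>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>

// Local stand-ins for the generated enums; values match health_check_event.proto.
enum class HealthCheckerType { HTTP = 0, TCP = 1, GRPC = 2, REDIS = 3 };
enum class HealthCheckFailureType { ACTIVE = 0, PASSIVE = 1, NETWORK = 2 };

// Hypothetical stand-in for Envoy's HostDescription.
struct StubHost {
  std::string address;  // full "ip:port" address, as preferred in review
  std::string cluster;
};
using StubHostConstSharedPtr = std::shared_ptr<const StubHost>;

// Proto3 JSON prints enum fields by value name.
const char* enumName(HealthCheckerType t) {
  switch (t) {
  case HealthCheckerType::HTTP: return "HTTP";
  case HealthCheckerType::TCP: return "TCP";
  case HealthCheckerType::GRPC: return "GRPC";
  case HealthCheckerType::REDIS: return "REDIS";
  }
  return "UNKNOWN";
}

const char* enumName(HealthCheckFailureType t) {
  switch (t) {
  case HealthCheckFailureType::ACTIVE: return "ACTIVE";
  case HealthCheckFailureType::PASSIVE: return "PASSIVE";
  case HealthCheckFailureType::NETWORK: return "NETWORK";
  }
  return "UNKNOWN";
}

// Same two operations as HealthCheckEventLogger, minus the Envoy types.
class EventLoggerSketch {
public:
  virtual ~EventLoggerSketch() = default;
  virtual void logEjectUnhealthy(HealthCheckerType type, const StubHostConstSharedPtr& host,
                                 HealthCheckFailureType failure_type) = 0;
  virtual void logAddHealthy(HealthCheckerType type, const StubHostConstSharedPtr& host,
                             bool first_check) = 0;
};

// Writes one JSON object per line. Strings are NOT JSON-escaped; a real sink must escape them.
class JsonLinesEventLogger : public EventLoggerSketch {
public:
  explicit JsonLinesEventLogger(std::ostream& out) : out_(out) {}

  void logEjectUnhealthy(HealthCheckerType type, const StubHostConstSharedPtr& host,
                         HealthCheckFailureType failure_type) override {
    assert(host != nullptr);
    out_ << commonFields(type, *host) << "\"eject_unhealthy_event\":{\"failure_type\":\""
         << enumName(failure_type) << "\"}}\n";
  }

  void logAddHealthy(HealthCheckerType type, const StubHostConstSharedPtr& host,
                     bool first_check) override {
    assert(host != nullptr);
    out_ << commonFields(type, *host) << "\"add_healthy_event\":{\"first_check\":"
         << (first_check ? "true" : "false") << "}}\n";
  }

private:
  // Opens the object and emits the fields shared by every event, in proto field order.
  static std::string commonFields(HealthCheckerType type, const StubHost& host) {
    std::ostringstream s;
    s << "{\"health_checker_type\":\"" << enumName(type) << "\",\"host_address\":\""
      << host.address << "\",\"cluster_name\":\"" << host.cluster << "\",";
    return s.str();
  }

  std::ostream& out_;
};
```

Keeping the interface abstract, as the PR does, lets tests substitute a mock sink while production code writes to a file.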

@@ -87,7 +87,8 @@ void MessageUtil::loadFromFile(const std::string& path, Protobuf::Message& messa
 }
 
 std::string MessageUtil::getJsonStringFromMessage(const Protobuf::Message& message,
-                                                  const bool pretty_print) {
+                                                  const bool pretty_print,
+                                                  const bool always_print_primitive_fields) {
Review comment: humorously, I was just looking at adding this in a different change. Cool!
   Protobuf::util::JsonPrintOptions json_options;
   // By default, proto field names are converted to camelCase when the message is converted to JSON.
   // Setting this option makes debugging easier because it keeps field names consistent in JSON
@@ -96,6 +97,11 @@ std::string MessageUtil::getJsonStringFromMessage(const Protobuf::Message& messa
   if (pretty_print) {
     json_options.add_whitespace = true;
   }
+  // Primitive types such as int32s and enums will not be serialized if they have the default value.
+  // This flag disables that behavior.
+  if (always_print_primitive_fields) {
+    json_options.always_print_primitive_fields = true;
+  }
   ProtobufTypes::String json;
   const auto status = Protobuf::util::MessageToJsonString(message, &json, json_options);
   // This should always succeed unless something crash-worthy such as out-of-memory.
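The new `always_print_primitive_fields` flag matters for the health check event log because `HTTP` and `ACTIVE` are the zero values of their enums and `first_check` defaults to `false`. The toy printer below (hypothetical; it is not the protobuf library's `JsonPrintOptions` machinery) applies the rule stated in the comment in `getJsonStringFromMessage` to a flat list of fields, to show what gets dropped when the flag is off.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of one primitive (scalar or enum) field to be printed.
struct PrimitiveField {
  std::string name;        // proto field name, e.g. "failure_type"
  std::string json_value;  // already-encoded JSON value, e.g. "\"ACTIVE\"" or "false"
  bool is_default;         // holds the proto3 default: 0, false, "", or enum value 0
};

// Toy printer for a flat message: skips default-valued primitive fields unless
// always_print_primitive_fields is set.
std::string printObject(const std::vector<PrimitiveField>& fields,
                        bool always_print_primitive_fields) {
  std::string json = "{";
  bool first = true;
  for (const auto& field : fields) {
    assert(!field.name.empty());
    if (field.is_default && !always_print_primitive_fields) {
      continue;  // dropped: a reader cannot tell "ACTIVE" apart from "field missing"
    }
    if (!first) {
      json += ",";
    }
    json += "\"" + field.name + "\":" + field.json_value;
    first = false;
  }
  return json + "}";
}
```

With the flag off, an ejection from an HTTP health checker caused by an active failure would print neither `health_checker_type` nor `failure_type`, which is presumably why the PR threads the flag through `getJsonStringFromMessage`.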

@@ -218,10 +218,13 @@ class MessageUtil {
    * Extract JSON as string from a google.protobuf.Message.
    * @param message message of type type.googleapis.com/google.protobuf.Message.
    * @param pretty_print whether the returned JSON should be formatted.
+   * @param always_print_primitive_fields whether to include primitive fields set to their default
+   * values, e.g. an int32 set to 0 or a bool set to false.
    * @return std::string of formatted JSON object.
    */
   static std::string getJsonStringFromMessage(const Protobuf::Message& message,
-                                              bool pretty_print = false);
+                                              bool pretty_print = false,
+                                              bool always_print_primitive_fields = false);
Review comment: nit: update doc comment
 
   /**
    * Extract JSON object from a google.protobuf.Message.

Review comment: Can we cross link to relevant docs here about the event log, and also specify that if empty, no event log will be written?
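The final review comment asks that an empty path mean that no event log is written. Under that assumption, the behavior could be sketched as a factory that returns no sink for an empty path. `FileEventSink` and `createEventSink` are hypothetical names, not Envoy APIs, and the real PR may wire this differently.

```cpp
#include <cassert>
#include <fstream>
#include <memory>
#include <string>

// Hypothetical file-backed sink; not an Envoy class.
class FileEventSink {
public:
  explicit FileEventSink(const std::string& path) : file_(path, std::ios::app) {}

  bool isOpen() const { return file_.is_open(); }

  // Appends one already-serialized event and flushes so the line is visible promptly.
  void writeLine(const std::string& json) {
    assert(isOpen());
    file_ << json << '\n' << std::flush;
  }

private:
  std::ofstream file_;
};

// Sketch of the semantics proposed in review: an empty event_log_path means no
// event log is written, signaled by returning nullptr.
std::unique_ptr<FileEventSink> createEventSink(const std::string& event_log_path) {
  if (event_log_path.empty()) {
    return nullptr;
  }
  auto sink = std::make_unique<FileEventSink>(event_log_path);
  if (!sink->isOpen()) {
    return nullptr;  // a real implementation would more likely report a config error
  }
  return sink;
}
```

Returning a null pointer lets the health checker skip logging with a single null check instead of writing to a stream that is later discarded.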