RFC: Managed Ingestion of Envoy Telemetry #262
Labels
App Mesh Envoy
App Mesh
RFC
Roadmap: Awaiting Customer Feedback
We need to get more information in order to understand how we will implement this feature.
Today, the App Mesh control plane has very little insight into the health of a customer's Mesh. While we are aware of sudden changes in the number of Envoys connected for given Virtual Nodes, and of cases where configuration is unusable by Envoy, we have no automatic feedback mechanism that tells us how this configuration is actually working out for customers.
What we're proposing is an optional integration that allows customers to egress operational data about their Envoys to App Mesh. This data would start by covering operational metrics such as request success/failure rates and latency, and would have granularity at the Envoy cluster (Virtual Node) and endpoint level. In combination with endpoint information
At a high level, some features we could build off this information
At the same time, gathering this information would also help App Mesh improve the aggregate customer experience. As we've observed in #151 and #227, particular combinations of configuration and state change can lead to a surprising and poor customer experience. Even with our continued improvements in comprehensive testing, a change can still impact customers and require them to reach out to us for support. By integrating customer telemetry into our release process, we can monitor behavior changes as new configuration is deployed and roll back in the event of customer impact. This will also allow us to analyze the trends and behavior customers are seeing and proactively improve configuration.
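To make the release-time monitoring idea more concrete, here is a minimal sketch of how a rollback decision could be derived from per-cluster telemetry. Everything in it is an assumption for illustration: `ClusterSample`, `should_roll_back`, the field names, and the threshold values are hypothetical and do not reflect any existing App Mesh API or a committed design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSample:
    """Hypothetical telemetry aggregate for one Envoy cluster (Virtual Node)."""
    virtual_node: str
    requests: int
    failures: int
    p99_latency_ms: float

    @property
    def failure_rate(self) -> float:
        # Treat a window with no traffic as having no failures.
        return self.failures / self.requests if self.requests else 0.0


def should_roll_back(
    before: ClusterSample,
    after: ClusterSample,
    max_failure_increase: float = 0.02,  # assumed threshold: +2 percentage points
    max_latency_ratio: float = 1.5,      # assumed threshold: 1.5x p99 latency
) -> bool:
    """Compare a window before a config deployment with one after it."""
    if after.requests == 0:
        # No traffic after the deployment means there is no signal to act on.
        return False
    failure_regressed = (
        after.failure_rate - before.failure_rate > max_failure_increase
    )
    latency_regressed = (
        before.p99_latency_ms > 0
        and after.p99_latency_ms > before.p99_latency_ms * max_latency_ratio
    )
    return failure_regressed or latency_regressed


baseline = ClusterSample("checkout-vn", requests=1000, failures=10, p99_latency_ms=100.0)
degraded = ClusterSample("checkout-vn", requests=1000, failures=50, p99_latency_ms=110.0)
print(should_roll_back(baseline, degraded))
```

A real check would likely need per-endpoint granularity, statistical significance testing, and aggregation across many customers; this sketch only shows the shape of the comparison.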