Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opencensus: rewrite span exporter using async/await #789

Merged
merged 7 commits into from
Jan 31, 2021

Conversation

hawkw
Copy link
Contributor

@hawkw hawkw commented Dec 22, 2020

This branch rewrites the linkerd2_opencensus crate's SpanExporter to
use async/await rather than a manual Future implementation.

The new code is a bit clearer and easier to read, and may also be
slightly more efficient, since it can use Tokio 1.0's mpsc without
allocating to poll. Additionally, the new code adds a timer after which
any batched spans are flushed, thanks to @olix0r.

@hawkw hawkw requested review from olix0r, adleong and a team December 22, 2020 21:09
Copy link
Member

@olix0r olix0r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is generally good, but I have to admit that I find the control flow with nested loops to be pretty difficult to follow. I have a feeling that it can be made simpler, but i'd need to give it a try myself to know what I'm actually talking about.

linkerd/opencensus/src/lib.rs Outdated Show resolved Hide resolved
linkerd/opencensus/src/lib.rs Outdated Show resolved Hide resolved
@hawkw
Copy link
Contributor Author

hawkw commented Dec 22, 2020

I have to admit that I find the control flow with nested loops to be pretty difficult to follow

Yeah, this is probably because I tried to translate the behavior of the previous implementation's state machine futures, which was surprisingly complex. I can add some comments?

@hawkw
Copy link
Contributor Author

hawkw commented Dec 22, 2020

Got rid of one of the innermost loops, hopefully this is a bit clearer now?

@olix0r
Copy link
Member

olix0r commented Dec 23, 2020

Took a stab at trying to make things clearer: #790. The primary difference with this approach is that we don't try to process any spans until the client has an open stream from the server. I think this is functionally equivalent because we'd only be able to process one batch while disconnected, anyway.

@olix0r olix0r marked this pull request as draft January 21, 2021 18:30
hawkw and others added 5 commits January 29, 2021 13:49
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Co-authored-by: Oliver Gould <ver@buoyant.io>
* Simplify opencensus client control flow

* +trace

* Flush traces after idle timeout

Adds a 10s idle timeout so that traces are flushed when traffic stops.

Increases the span buffer to 1000 spans and minimizes the initial
allocation so that the buffer grows with use.

Improves opencensus factoring by splitting more methods.

* More yakshaving: make arguments explicit, avoid cloning the client

* Remove unused IntoService helper

* Drive both the response future and export stream together
@hawkw hawkw force-pushed the eliza/rewrite-opencensus branch from 2c95313 to 26f4a15 Compare January 29, 2021 22:10
@hawkw hawkw marked this pull request as ready for review January 29, 2021 22:30
@hawkw
Copy link
Contributor Author

hawkw commented Jan 29, 2021

@olix0r okay, i've merged your changes and rebased onto main, this should be ready for a review!

@olix0r olix0r merged commit 3995dc8 into main Jan 31, 2021
@olix0r olix0r deleted the eliza/rewrite-opencensus branch January 31, 2021 18:18
olix0r added a commit to linkerd/linkerd2 that referenced this pull request Feb 3, 2021
This release changes HTTP protocol detection to prevent timeout errors
in two ways:

1. HTTP detection no longer blocks until a newline is read. We've
   reverted to relying on a single read to make a determination.
2. Detection timeouts are no longer terminal. When a timeout is
   encountered, we continue forwarding the connection as an opaque TCP
   connection.

These changes may lead to false-negatives--we may fail to detect some
HTTP streams--but it should prevent many avoidable detection errors.

This release also makes improvements for multicluster gateways,
improving caching so that profile lookups are only performed
once per target service.

Diagnostic `stack_*` metrics have been moved so that they track
underlying services, ignoring fail-fast. This should help us get better
insights into services that are in failfast.

Finally, the opencensus exporter has been improved to ensure that trace
events are flushed if the trace buffer is not filled within a timeout.

---

* actions: Update actions to use full SHAs (linkerd/linkerd2-proxy#885)
* http: Parameterize normalize_ur::DefaultAuthority (linkerd/linkerd2-proxy#886)
* http: Parameterize the HTTP server (linkerd/linkerd2-proxy#887)
* opencensus: rewrite span exporter using async/await (linkerd/linkerd2-proxy#789)
* Update http::Insert to use `Param` (linkerd/linkerd2-proxy#889)
* Update crate dependencies (linkerd/linkerd2-proxy#892)
* stack: Make the router fallible (linkerd/linkerd2-proxy#888)
* Track stack metrics within failfast (linkerd/linkerd2-proxy#891)
* outbound: Avoid building balancers when no concrete name (linkerd/linkerd2-proxy#890)
* inbound: Cache HTTP gateways per destination (linkerd/linkerd2-proxy#893)
* Reorganize the gateway crate (linkerd/linkerd2-proxy#897)
* Bias HTTP detection towards availability (linkerd/linkerd2-proxy#894)
* inbound: Use ALPN to determine transport header (linkerd/linkerd2-proxy#895)
* detect: Return unknown protocol on detection timeout (linkerd/linkerd2-proxy#896)
* Extract protocol detection into the gateway crate (linkerd/linkerd2-proxy#898)
olix0r added a commit to linkerd/linkerd2 that referenced this pull request Feb 3, 2021
This release changes HTTP protocol detection to prevent timeout errors
in two ways:

1. HTTP detection no longer blocks until a newline is read. We've
   reverted to relying on a single read to make a determination.
2. Detection timeouts are no longer terminal. When a timeout is
   encountered, we continue forwarding the connection as an opaque TCP
   connection.

These changes may lead to false-negatives--we may fail to detect some
HTTP streams--but it should prevent many avoidable detection errors.

This release also makes improvements for multicluster gateways,
improving caching so that profile lookups are only performed
once per target service.

Diagnostic `stack_*` metrics have been moved so that they track
underlying services, ignoring fail-fast. This should help us get better
insights into services that are in failfast.

Finally, the opencensus exporter has been improved to ensure that trace
events are flushed if the trace buffer is not filled within a timeout.

---

* actions: Update actions to use full SHAs (linkerd/linkerd2-proxy#885)
* http: Parameterize normalize_ur::DefaultAuthority (linkerd/linkerd2-proxy#886)
* http: Parameterize the HTTP server (linkerd/linkerd2-proxy#887)
* opencensus: rewrite span exporter using async/await (linkerd/linkerd2-proxy#789)
* Update http::Insert to use `Param` (linkerd/linkerd2-proxy#889)
* Update crate dependencies (linkerd/linkerd2-proxy#892)
* stack: Make the router fallible (linkerd/linkerd2-proxy#888)
* Track stack metrics within failfast (linkerd/linkerd2-proxy#891)
* outbound: Avoid building balancers when no concrete name (linkerd/linkerd2-proxy#890)
* inbound: Cache HTTP gateways per destination (linkerd/linkerd2-proxy#893)
* Reorganize the gateway crate (linkerd/linkerd2-proxy#897)
* Bias HTTP detection towards availability (linkerd/linkerd2-proxy#894)
* inbound: Use ALPN to determine transport header (linkerd/linkerd2-proxy#895)
* detect: Return unknown protocol on detection timeout (linkerd/linkerd2-proxy#896)
* Extract protocol detection into the gateway crate (linkerd/linkerd2-proxy#898)
jijeesh pushed a commit to jijeesh/linkerd2 that referenced this pull request Mar 23, 2021
This release changes HTTP protocol detection to prevent timeout errors
in two ways:

1. HTTP detection no longer blocks until a newline is read. We've
   reverted to relying on a single read to make a determination.
2. Detection timeouts are no longer terminal. When a timeout is
   encountered, we continue forwarding the connection as an opaque TCP
   connection.

These changes may lead to false-negatives--we may fail to detect some
HTTP streams--but it should prevent many avoidable detection errors.

This release also makes improvements for multicluster gateways,
improving caching so that profile lookups are only performed
once per target service.

Diagnostic `stack_*` metrics have been moved so that they track
underlying services, ignoring fail-fast. This should help us get better
insights into services that are in failfast.

Finally, the opencensus exporter has been improved to ensure that trace
events are flushed if the trace buffer is not filled within a timeout.

---

* actions: Update actions to use full SHAs (linkerd/linkerd2-proxy#885)
* http: Parameterize normalize_ur::DefaultAuthority (linkerd/linkerd2-proxy#886)
* http: Parameterize the HTTP server (linkerd/linkerd2-proxy#887)
* opencensus: rewrite span exporter using async/await (linkerd/linkerd2-proxy#789)
* Update http::Insert to use `Param` (linkerd/linkerd2-proxy#889)
* Update crate dependencies (linkerd/linkerd2-proxy#892)
* stack: Make the router fallible (linkerd/linkerd2-proxy#888)
* Track stack metrics within failfast (linkerd/linkerd2-proxy#891)
* outbound: Avoid building balancers when no concrete name (linkerd/linkerd2-proxy#890)
* inbound: Cache HTTP gateways per destination (linkerd/linkerd2-proxy#893)
* Reorganize the gateway crate (linkerd/linkerd2-proxy#897)
* Bias HTTP detection towards availability (linkerd/linkerd2-proxy#894)
* inbound: Use ALPN to determine transport header (linkerd/linkerd2-proxy#895)
* detect: Return unknown protocol on detection timeout (linkerd/linkerd2-proxy#896)
* Extract protocol detection into the gateway crate (linkerd/linkerd2-proxy#898)

Signed-off-by: Jijeesh <jijeesh.ka@gmail.com>
jijeesh pushed a commit to jijeesh/linkerd2 that referenced this pull request Apr 21, 2021
This release changes HTTP protocol detection to prevent timeout errors
in two ways:

1. HTTP detection no longer blocks until a newline is read. We've
   reverted to relying on a single read to make a determination.
2. Detection timeouts are no longer terminal. When a timeout is
   encountered, we continue forwarding the connection as an opaque TCP
   connection.

These changes may lead to false-negatives--we may fail to detect some
HTTP streams--but it should prevent many avoidable detection errors.

This release also makes improvements for multicluster gateways,
improving caching so that profile lookups are only performed
once per target service.

Diagnostic `stack_*` metrics have been moved so that they track
underlying services, ignoring fail-fast. This should help us get better
insights into services that are in failfast.

Finally, the opencensus exporter has been improved to ensure that trace
events are flushed if the trace buffer is not filled within a timeout.

---

* actions: Update actions to use full SHAs (linkerd/linkerd2-proxy#885)
* http: Parameterize normalize_ur::DefaultAuthority (linkerd/linkerd2-proxy#886)
* http: Parameterize the HTTP server (linkerd/linkerd2-proxy#887)
* opencensus: rewrite span exporter using async/await (linkerd/linkerd2-proxy#789)
* Update http::Insert to use `Param` (linkerd/linkerd2-proxy#889)
* Update crate dependencies (linkerd/linkerd2-proxy#892)
* stack: Make the router fallible (linkerd/linkerd2-proxy#888)
* Track stack metrics within failfast (linkerd/linkerd2-proxy#891)
* outbound: Avoid building balancers when no concrete name (linkerd/linkerd2-proxy#890)
* inbound: Cache HTTP gateways per destination (linkerd/linkerd2-proxy#893)
* Reorganize the gateway crate (linkerd/linkerd2-proxy#897)
* Bias HTTP detection towards availability (linkerd/linkerd2-proxy#894)
* inbound: Use ALPN to determine transport header (linkerd/linkerd2-proxy#895)
* detect: Return unknown protocol on detection timeout (linkerd/linkerd2-proxy#896)
* Extract protocol detection into the gateway crate (linkerd/linkerd2-proxy#898)

Signed-off-by: Jijeesh <jijeesh.ka@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants