-
-
Notifications
You must be signed in to change notification settings - Fork 719
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spans skeleton #7862
Spans skeleton #7862
Conversation
|
||
@contextmanager | ||
def span(*tags: str) -> Iterator[None]: | ||
"""Tag group of tasks to be part of a certain group, called a span. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This docstring is not rendered anywhere at the moment.
We should wait until we have something actually useful before we advertise it in sphinx.
Unit Test ResultsSee test report for an extended history of previous test failures. This is useful for diagnosing flaky tests. 20 files ± 0 20 suites ±0 11h 29m 15s ⏱️ + 11m 52s For more details on these failures, see this check. Results for commit 2098b9a. ± Comparison against base commit 3286d2f. ♻️ This comment has been updated with latest results. |
7d31b76
to
c9398a7
Compare
distributed/tests/test_spans.py
Outdated
def assert_span(d, *expect): | ||
span_id = (c.id, *expect) | ||
assert s.tasks[d.key].annotations["span"] == span_id | ||
assert a.state.tasks[d.key].annotations["span"] == span_id | ||
assert ext.spans[span_id].id == span_id | ||
|
||
assert_span(x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it the case that every task gets a span defined by the client ID?
If so, my sense is that multi-client workloads are rare, and we're probably already able to attribute the client ID. My inclination would be to drop Client ID from spans, which leaves the common case of no spans unchanged / clean and without annotations, which seems good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it the case that every task gets a span defined by the client ID?
Yes, there's always a root span if the user does not explicitly call span()
on the client workflow, and it's identified by the client ID.
Last week, @fjetter, @hendrikmakait, @ntabris and I reached consensus in this direction during a high-bandwidth design discussion. The conclusion was that adding an annotation to all tasks is cheap and is a small cost to pay to support for multi-tenancy. The overhead becomes even smaller if the user actually uses the span()
context manager, as this adds just a single element to a tuple.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The conclusion was that adding an annotation to all tasks is cheap and is a small cost to pay to support for multi-tenancy.
Can I ask to learn more about the benefits of multi-tenancy that were discussed? I appreciate that the cost is small. I'm confused about the benefit, it seems very small to me.
Adding an annotation to every task that includes the client ID doesn't make much sense to me, especially not in support of a new feature that hasn't demonstrated that it will get much use. I'd love to learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, I'd like to propose that we drop the client id from the span annotations. I suspect that that won't have any bad effects (please correct me if I'm not thinking of something) and will also isolate the effects of the proposed change to only when the new feature is used.
As an example, I can imagine changes in the future which turn off certain optimizations if any annotations are present, just because we don't know how to handle them. In this case (which seems not unlikely to me) it would have low impact if annotations are rare (as they are today) but would have very high impact if this change goes through.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not against turning the implicit span creation by clients into a config value that's off by default and assuming a global span instead. If we found a way to attach tasks to an implicit client span without annotations, even better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main point is that we want to support overlapping span
s to tag independent workloads. This also means that we want to handle workloads from different clients independently.
Setting one default span per client was almost an afterthought to me: When discussing how to pick default spans, we decided to drop any heuristics and simply use static defaults. (This is the important decision.) Using one default span per client felt like a better default on a multi-tenant setup as it would allow users to always differentiate between independent work triggered by different clients (scheduled jobs/users) without any/much perceived cost. I still think having such a feature would be nice, but I personally don't care if that's hidden behind some flag to avoid some cost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Multi-client workloads are exceedingly rare. I recommend that we don't worry about them until they come up (I'll bet lots of beers that they don't come up). If they're the only motivation behind adding annotations to all tasks then, to me, the choice is clear that we don't think much about multi-client workloads and also get to not add annotations in the common case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That works for me, spans can still be set explicitly in multi-client environments to handle workloads independently and whether or not we add annotations in the common case is essentially an implementation detail to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed client ID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @crusaderky . I appreciate it.
This is again ready for review and merge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @crusaderky for getting this off the ground! The code looks good to me.
Outside the scope of this PR, I'd like to ideate on the representation of spans and how we communicate them to the scheduler. I think we should move our representation closer to something like the trace/span specification of OpenTelemetry (https://opentelemetry.io/docs/concepts/signals/traces/). In particular, I see value in relying on UUIDs to differentiate between different "runs" of the same span (in particular, now that we've decided against creating a default span per client) and using a richer representation that would allow us to add additional attributes. This is probably best done in a high-bandwidth conversation.
Sounds like update_graph should prepend a unique ID to the span annotations. |
Two possible differences:
|
In scope
Out of scope