-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Instrument ConcurrencyLimitedChannel #443
Conversation
Generate changelog in
|
.map(channel -> new FixedLimitedChannel(channel, MAX_REQUESTS_PER_CHANNEL, clientMetrics)) | ||
.collect(ImmutableList.toImmutableList()); | ||
ImmutableList.Builder<LimitedChannel> limitedChannels = ImmutableList.builder(); | ||
for (int hostIndex = 0; hostIndex < channels.size(); hostIndex++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we sort the input uris and index based on sorted order? That way if the input list changes order we don't shuffle metrics, and I think it's cleaner to consume a collection/iterable here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the moment we just get 'Channel' as the input type - it's not comparable and we can't correlate between any particular instances.
Thoughts on asking people to provide a ChannelWithId so we can do that sorting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah right. Not sure channel IDs are helpful outside of this use case. We can sort by identity hash code and index from that list ¯_(ツ)_/¯
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think they'd also be useful for staying pinned on the same host after uris live-reload. Let me try something with identifyhashcode.
hostIndex.ifPresent(index -> { | ||
DialogueConcurrencylimiterMetrics metrics = DialogueConcurrencylimiterMetrics.of(taggedMetrics); | ||
// TODO(dfox): hook up the service-name somehow? also when nodes reshuffle these metrics will look odd. | ||
metrics.utilization().hostIndex(Integer.toString(index)).build(this::getUtilization); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not confident this works when we have multiple discovered services, each combined channel (alta both gatekeeper, for example) will have separate concurrency limited channels using the same gauge for index zero. We don't know which one we're measuring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep that's why I added the do not merge
. I think the thing we're really interested in here is some stable identifier per host.
I don't think we can actually use hashcode because then a ton of uri reloads will result in an unbounded number of tags, even for the uris that don't change.
@ellisjoe I think you were keen to avoid plumbing uris/channel ids deep into the internals of dialogue, so I'm trying to think of a way to produce these graphs (especially for the initial rollout), so that we can really clearly explain exactly what dialogue is doing at any point in time. What if at Channels#create time, in addition to a List, we asked users to pass in an opaque List or something, and then just called methods on that interface to report stuff about a specific channel? (Also I think it would be really helpful to articulate the dangerous you see in plumbing uris / channeld identifiers in some kind of architecture decision record). |
This PR has been automatically marked as stale because it has not been touched in the last 14 days. If you'd like to keep it open, please leave a comment or add the 'long-lived' label, otherwise it'll be closed in 7 days. |
Two open questions:
|
static ConcurrencyLimitedChannel create(LimitedChannel delegate, DialogueClientMetrics metrics) { | ||
return new ConcurrencyLimitedChannel( | ||
delegate, ConcurrencyLimitedChannel.createLimiter(SYSTEM_NANOTIME), metrics); | ||
interface Instrumentation { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need this extra interface? can we just pass a DialogueConcurrencyLimiterMetrics
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had this originally, but it required passing in the string uri which was only used for instrumentation purposes (here's the commit where you can see this: 8a8139d).
This felt a bit gross, as the string uri could in theory be re-purposed for other stuff. Also felt odd to always be passing in the only possible tag value rather than just pre-combining them.
dialogue-core/src/main/java/com/palantir/dialogue/core/ConcurrencyLimitedChannel.java
Outdated
Show resolved
Hide resolved
dialogue-core/src/main/java/com/palantir/dialogue/core/ConcurrencyLimitedChannel.java
Show resolved
Hide resolved
weakGauge( | ||
taggedMetrics, | ||
MetricName.builder() | ||
.safeName("dialogue.concurrencylimiter.utilization") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kinda sad that we can't use metric schema :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, but I think it's ok to experiment with new APIs by just manually writing them. If we find this weak reference pattern is actually widespread and worth recommending, then we could bake it into tritium/metric-schema.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could almost recommend everyone express gauges in terms of a T
and a Function<T,Number>
(and make this a first-class tritium) because in the current world I bet the majority of gauges are actually preventing things being GC'd!
dialogue-core/src/main/java/com/palantir/dialogue/core/ConcurrencyLimitedChannel.java
Show resolved
Hide resolved
@@ -97,13 +98,17 @@ public void updateUris(Collection<String> uris) { | |||
Sets.SetView<String> newUris = Sets.difference(uniqueUris, limitedChannelByUri.keySet()); | |||
|
|||
staleUris.forEach(limitedChannelByUri::remove); | |||
newUris.forEach(uri -> limitedChannelByUri.put(uri, createLimitedChannel(uri))); | |||
ImmutableList<String> allUris = ImmutableList.<String>builder() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These indices aren't stable across updates so it will probably pretty confusing to look at individual metrics. Not sure of a better solution, but worth flagging
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. Given that it's not recommended to just hash things, I think this is the best we can do. In practise, I hope it'll be reasonably easy to see n coloured lines go into a live-reload event and then either n+1 or n-1 lines come out. People should be able to visually connect em up.
👍 |
Released 0.19.12 |
Before this PR
'Concurrency Limiters' is a scary word to most people, and people fear what they don't understand. I'd like to have graphs to show when concurrency limiters are actually affecting people.
After this PR
==COMMIT_MSG==
Two new gauges,
dialogue.concurrencylimiter.max
anddialogue.concurrencylimiter.utilization
, describe the state of each host's concurrency limiter.==COMMIT_MSG==
do not merge
until we've decided how to plumb in the service-name (otherwise if a service creates multiple dialogue clients each with 3 hosts, these gauges will flicker horribly).Possible downsides?