Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Instrument ConcurrencyLimitedChannel #443

Merged
merged 11 commits into from
Mar 16, 2020
Merged

Conversation

iamdanfox
Copy link
Contributor

@iamdanfox iamdanfox commented Feb 26, 2020

Before this PR

'Concurrency Limiters' is a scary word to most people, and people fear what they don't understand. I'd like to have graphs to show when concurrency limiters are actually affecting people.

After this PR

==COMMIT_MSG==
Two new gauges, dialogue.concurrencylimiter.max and dialogue.concurrencylimiter.utilization, describe the state of each host's concurrency limiter.
==COMMIT_MSG==

do not merge until we've decided how to plumb in the service-name (otherwise if a service creates multiple dialogue clients each with 3 hosts, these gauges will flicker horribly).

Possible downsides?

  • more metrics = more DD $
  • Witchcraft's metric sampling is roughly every 30 seconds, so even if a DataDog graph shows low utilization, there's a possibility that it was spiking in between samples, which is kinda misleading.

@changelog-app
Copy link

changelog-app bot commented Feb 26, 2020

Generate changelog in changelog/@unreleased

Type

  • Feature
  • Improvement
  • Fix
  • Break
  • Deprecation
  • Manual task
  • Migration

Description

Two new gauges, dialogue.concurrencylimiter.max and dialogue.concurrencylimiter.utilization, describe the state of each host's concurrency limiter.

Check the box to generate changelog(s)

  • Generate changelog entry

.map(channel -> new FixedLimitedChannel(channel, MAX_REQUESTS_PER_CHANNEL, clientMetrics))
.collect(ImmutableList.toImmutableList());
ImmutableList.Builder<LimitedChannel> limitedChannels = ImmutableList.builder();
for (int hostIndex = 0; hostIndex < channels.size(); hostIndex++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we sort the input uris and index based on sorted order? That way if the input list changes order we don't shuffle metrics, and I think it's cleaner to consume a collection/iterable here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment we just get 'Channel' as the input type - it's not comparable and we can't correlate between any particular instances.

Thoughts on asking people to provide a ChannelWithId so we can do that sorting?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right. Not sure channel IDs are helpful outside of this use case. We can sort by identity hash code and index from that list ¯_(ツ)_/¯

Copy link
Contributor Author

@iamdanfox iamdanfox Feb 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think they'd also be useful for staying pinned on the same host after uris live-reload. Let me try something with identifyhashcode.

hostIndex.ifPresent(index -> {
DialogueConcurrencylimiterMetrics metrics = DialogueConcurrencylimiterMetrics.of(taggedMetrics);
// TODO(dfox): hook up the service-name somehow? also when nodes reshuffle these metrics will look odd.
metrics.utilization().hostIndex(Integer.toString(index)).build(this::getUtilization);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not confident this works when we have multiple discovered services, each combined channel (alta both gatekeeper, for example) will have separate concurrency limited channels using the same gauge for index zero. We don't know which one we're measuring.

Copy link
Contributor Author

@iamdanfox iamdanfox Feb 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep that's why I added the do not merge. I think the thing we're really interested in here is some stable identifier per host.

I don't think we can actually use hashcode because then a ton of uri reloads will result in an unbounded number of tags, even for the uris that don't change.

@iamdanfox
Copy link
Contributor Author

@ellisjoe I think you were keen to avoid plumbing uris/channel ids deep into the internals of dialogue, so I'm trying to think of a way to produce these graphs (especially for the initial rollout), so that we can really clearly explain exactly what dialogue is doing at any point in time.

What if at Channels#create time, in addition to a List, we asked users to pass in an opaque List or something, and then just called methods on that interface to report stuff about a specific channel?

(Also I think it would be really helpful to articulate the dangerous you see in plumbing uris / channeld identifiers in some kind of architecture decision record).

@stale
Copy link

stale bot commented Mar 13, 2020

This PR has been automatically marked as stale because it has not been touched in the last 14 days. If you'd like to keep it open, please leave a comment or add the 'long-lived' label, otherwise it'll be closed in 7 days.

@stale stale bot added the stale label Mar 13, 2020
@stale stale bot removed the stale label Mar 16, 2020
@iamdanfox
Copy link
Contributor Author

Two open questions:

  • if people create clients with a high number of uris then this will create a high tag DD tag cardinality.
  • if the same URI appears in multiple service-discovery-blocks e.g. one for multipass and one for gatekeeper, then the metrics will likely be confusing.

static ConcurrencyLimitedChannel create(LimitedChannel delegate, DialogueClientMetrics metrics) {
return new ConcurrencyLimitedChannel(
delegate, ConcurrencyLimitedChannel.createLimiter(SYSTEM_NANOTIME), metrics);
interface Instrumentation {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need this extra interface? can we just pass a DialogueConcurrencyLimiterMetrics?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had this originally, but it required passing in the string uri which was only used for instrumentation purposes (here's the commit where you can see this: 8a8139d).

This felt a bit gross, as the string uri could in theory be re-purposed for other stuff. Also felt odd to always be passing in the only possible tag value rather than just pre-combining them.

weakGauge(
taggedMetrics,
MetricName.builder()
.safeName("dialogue.concurrencylimiter.utilization")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kinda sad that we can't use metric schema :(

Copy link
Contributor Author

@iamdanfox iamdanfox Mar 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, but I think it's ok to experiment with new APIs by just manually writing them. If we find this weak reference pattern is actually widespread and worth recommending, then we could bake it into tritium/metric-schema.

Copy link
Contributor Author

@iamdanfox iamdanfox Mar 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could almost recommend everyone express gauges in terms of a T and a Function<T,Number> (and make this a first-class tritium) because in the current world I bet the majority of gauges are actually preventing things being GC'd!

@@ -97,13 +98,17 @@ public void updateUris(Collection<String> uris) {
Sets.SetView<String> newUris = Sets.difference(uniqueUris, limitedChannelByUri.keySet());

staleUris.forEach(limitedChannelByUri::remove);
newUris.forEach(uri -> limitedChannelByUri.put(uri, createLimitedChannel(uri)));
ImmutableList<String> allUris = ImmutableList.<String>builder()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These indices aren't stable across updates so it will probably pretty confusing to look at individual metrics. Not sure of a better solution, but worth flagging

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. Given that it's not recommended to just hash things, I think this is the best we can do. In practise, I hope it'll be reasonably easy to see n coloured lines go into a live-reload event and then either n+1 or n-1 lines come out. People should be able to visually connect em up.

@ferozco
Copy link
Contributor

ferozco commented Mar 16, 2020

👍

@bulldozer-bot bulldozer-bot bot merged commit 19c392c into develop Mar 16, 2020
@bulldozer-bot bulldozer-bot bot deleted the dfox/instrument-limiter branch March 16, 2020 17:17
@svc-autorelease
Copy link
Collaborator

Released 0.19.12

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants