Skip to content

Conversation

@Aitozi
Copy link
Contributor

@Aitozi Aitozi commented May 15, 2019

What is the purpose of the change

This PR is to fix the bug in the calculation of inputBufferUsage. It does't now take the exclusive buffer which is assigned from global buffer pool into account.

Brief change log

(for example:)

  • The TaskInfo is stored in the blob store on job creation time as a persistent artifact
  • Deployments RPC transmits only the blob storage reference
  • TaskManagers retrieve the TaskInfo from the blob cache

Verifying this change

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (100MB)
  • Extended integration test for recovery after master (JobManager) failure
  • Added test that validates that TaskInfo is transferred only once across recoveries
  • Manually verified the change by running a 4 node cluser with 2 JobManagers and 4 TaskManagers, a stateful streaming program, and killing one JobManager and two TaskManagers during the execution, verifying that recovery happens correctly.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@Aitozi
Copy link
Contributor Author

Aitozi commented May 15, 2019

Hi @zhijiangW could you help take a look on this PR when you are free, thanks.

@flinkbot
Copy link
Collaborator

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into to Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.

Details
The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

Copy link
Contributor

@zhijiangW zhijiangW left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for concerning this issue and opening the PR @Aitozi !

Actually I also considered this issue before, but have not confirmed the proper way to fix this. I could share some my thoughts:

  • The current inputBufferUsage is only reflecting the usage of floating buffers, and it could give an indicator whether the setting of floating-buffers-per-gate is enough for performance tuning.

  • We could always assume if the exclusive buffers are enough for one channel, it will not request floating buffers. In other words, if we see the floating buffers are used, that means the exclusive buffers should also be used.

  • The semantics of inputBufferUsage is also changed after credit-based mode. The previous usage indicates the buffer is actually filled with network data. Now the usage is only indicating the floating buffer is removed from LocalBufferPool to RemoteInputChannel, but the buffer might still be available, not filled with data.

  • The metric purpose should be guidable for performance tuning I think. If we mixed the usages of exclusive and floating buffers together, it might not bring any help for performance tuning. E.g. if the usage is 100% after merging, we could easy to estimate exclusive/floating are both used. But now we could also think so based on my second point. If the usage is 50% or other, we still need to further calculate how many floating are used in all.

It is indeed loss of some metrics to help performance tuning better in credit-based mode. Such as

  • how to measure the exclusive usages

  • how to measure the floating distribution among channels which could give feedback whether the current mechanism of floating buffers is proper.

  • how to measure whether the sender in network is blocked by insufficient credit feedback which could help further improve the setting and internal rule.

I plan to focus on these issues future, maybe not covered in next release. For the issue of this PR I would confirm with other guys and feedback to you later.

FYI, the commit title should cover [network, metrics] for better recognization, and exists netowork typo. :)

@Aitozi
Copy link
Contributor Author

Aitozi commented May 16, 2019

Thanks @zhijiangW for sharing your thoughts on this issue.
Although the semantics of inputBufferUsage means the buffer occupied by RemoteChannel which may fill with data or not as you mentioned in credit-based mode in the moment, But I think this PR taking the exclusive buffer into account, which will make the semantics of this more close to the original intention.
I'am also ok to stay the same until the time of introducing new network related metrics. I think we are short of more detail metric about credit-based network mode too.
Thanks for remind of the PR title format, I will pay more attention to this.:)

@Aitozi Aitozi changed the title [FLINK-12284]Fix the incorrect inputBufferUsage metric in credit-based netowork mode [FLINK-12284][Network,Metrics]Fix the incorrect inputBufferUsage metric in credit-based network mode May 16, 2019
@zhijiangW
Copy link
Contributor

@pnowojski What do you think of this issue?

@pnowojski
Copy link
Contributor

The current inputBufferUsage is only reflecting the usage of floating buffers, and it could give an indicator whether the setting of floating-buffers-per-gate is enough for performance tuning.

I agree, there is a value of the current semantic.

I would even build upon that. If you want to detect bottleneck, current semantic of tracking just floating buffers is perfect for that. If current operator (reading from this input) is causing a bottleneck it well can be that only a single one input channel is back-pressured. At the moment we could detect this situation with "all floating buffers are exhausted/used".

We could always assume if the exclusive buffers are enough for one channel, it will not request floating buffers. In other words, if we see the floating buffers are used, that means the exclusive buffers should also be used.

This is not true for even slight data skew.

I think that there is a value of tracking separately floating and exclusive buffers.

Maybe we should make inPoolUsage track floating + exclusive buffers (it would make it consistent with non credit based mode), while we could add inFloatingPoolUsage metric with current semantic? What do you think?

@Aitozi
Copy link
Contributor Author

Aitozi commented May 24, 2019

Hi @pnowojski , what is the reason for the second point ?

This is not true for even slight data skew

And +1 for adding the inFloatingPoolUsage specialized for detecting bottleneck.

@zhijiangW
Copy link
Contributor

Thanks for the confirmation and good suggestions. @pnowojski

In other words, if we see the floating buffers are used, that means the exclusive buffers should also be used.
This is not true for even slight data skew

Maybe my previous expression is not very clear. If floating buffers are used, I only mean the exclusive buffers for one/some RemoteInputChannels are also expected to be used eventually, but not indicate the exclusive buffers for all the channels are used.

For example, if the producer's backlog is 1, we would always request another 1 floating buffer even though the 2 exclusive buffers for this channel are available atm. Because we want to feedback some overhead credits beforehand in order not to block the network transport after producing more backlog soon. So it is not strong consistent for exclusive buffers used in time, might be eventual consistent within our expectation. If the backlog is becoming 0 from 1, the previous requested floating buffer would also be released by this channel if the 2 exclusive buffers are still available. So from the aspect of one input channel, it would not occupy extra floating buffers if its available exclusive buffers are enough.

I also agree with the above suggestions for distinguishing the total inPoolUsage and floatingInPoolUsage, or we could only retain exclusiveInPoolUsage and floatingInPoolUsage for credit-based mode.

@Aitozi
Copy link
Contributor Author

Aitozi commented May 27, 2019

Hi, @zhijiangW thanks for your detail explanation, I make a conclusion below
Non-credit-based mode:

  • InPoolUsage

Credit-based mode:

  • exclusiveInPoolUsage exclusiveBufferUsed/exclusiveBufferTotal(all initial credit)
  • floatingInPoolUsage floatingBufferUsed/floatingBufferTotal
  • InPoolUsage combine with exclusiveInPoolUsage and floatingInPoolUsage

I think this looks better to distinguish the buffer usage between floating and exclusive in credit-based mode and also give a view of the overall perspective with inPoolUsage metric. WDYT @pnowojski @zhijiangW

@pnowojski
Copy link
Contributor

pnowojski commented May 27, 2019

@Aitozi :

I think your proposal makes sense. Having 3 metrics for credit based flow control sounds as the best, the most explicit solution for me.

Hi @pnowojski , what is the reason for the second point ?

Which second point did you mean @Aitozi ? Or did @zhijiangW answer your question?

@zhijiangW :

Maybe my previous expression is not very clear. If floating buffers are used, I only mean the exclusive buffers for one/some RemoteInputChannels are also expected to be used eventually, but not indicate the exclusive buffers for all the channels are used.

👍

I meant that from the perspective of such data skew, just knowledge of floating buffers usage is not enough for performance tuning.

If you see that floating buffers usage is constantly jumping, between empty & full, it means that you should probably increase the either floating buffer pool or exclusive buffer pool. Now depending if you have a data skew and mostly unused exclusive buffers, you would be better off increasing floating buffer pool and potentially shrinking exclusive. While if you do not have data skew (and mostly used exclusive buffer pool % a couple of idling channels), you would be better off increasing exclusive buffer pool.

@Aitozi
Copy link
Contributor Author

Aitozi commented May 27, 2019

OK, I will open another issue additionally to track and will update this PR accordingly, thanks.

@Aitozi
Copy link
Contributor Author

Aitozi commented May 27, 2019

Have added the three metric for the credit based mode, please take a look when you are free @zhijiangW @pnowojski

Copy link
Contributor

@pnowojski pnowojski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @Aitozi for the update :) Generally speaking code looks conceptually OK. Left couple of comments.

import org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate;

/**
* Gauge metric measuring the input buffer pool usage gauge for {@link SingleInputGate}s under credit based mode.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: metrics?

private static final String METRIC_OUTPUT_POOL_USAGE = "outPoolUsage";
private static final String METRIC_INPUT_QUEUE_LENGTH = "inputQueueLength";
private static final String METRIC_INPUT_POOL_USAGE = "inPoolUsage";
private static final String METRIC_CREDIT_BASED_FLOATING_BUFFER_POOL_USAGE = "floatingBufferUsage";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a nit:

I'm not a native english speaker so I might be wrong, but shouldn't this be floatingBuffersUsage? And ditto in other places? It seems to me like the correct spelling are the following forms/contexts:

  • buffer pool usage (singular since its a single "buffer pool")
  • red buffers usage (multiple buffers)

/**
* Gauge metric measuring the floating buffer usage for {@link SingleInputGate}.
*/
public static class FloatingBufferPoolUsage implements Gauge<Float> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we deduplicate the code here a bit? For example we could introduce InputPoolUsageGauge that would contain the following code/loop:

    protected abstract int calculateUsedBuffers(InputGate);

    protected abstract int calculateBufferPoolSize(InputGate);

	@Override
	public Float getValue() {
		int usedBuffers = 0;
		int bufferPoolSize = 0;

		for (SingleInputGate inputGate : inputGates) {
			usedBuffers += calculateUsedBuffers(inputGate)
			bufferPoolSize += calculateBufferPoolSize(inputGate);
		}

		if (bufferPoolSize != 0) {
			return ((float) usedBuffers) / bufferPoolSize;
		} else {
			return 0.0f;
		}
	}

That could be extended/implemented by InputBufferPoolUsageGauge and the classes that you introduced.

Also it would be nice to express somehow the total usage as FloatingBufferPoolUsage + ExclusiveBufferPoolUsage instead of duplicating the logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good suggestion, I will follow it.

return initialCredit;
}

public int unsafeGetExclusiveBufferUsed() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unsynchronized instead of unsafe to match existing unsynchronizedGetNumberOfQueuedBuffers?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ExclusiveBuffers

/**
* Gauge metric measuring the input buffer pool usage gauge for {@link SingleInputGate}s under credit based mode.
*/
public class CreditBasedInputBufferPoolMetric {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add tests for those classes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

return Math.max(0, initialCredit - bufferQueue.exclusiveBuffers.size());
}

public int unsafeGetFloatingBufferLeft() {
Copy link
Contributor

@zhijiangW zhijiangW May 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FloatingBuffers
Left->Available?

if (requestedFloatingBuffer == 0) {
return 0.0f;
} else {
return 1 - ((float) leftFloatingBuffer) / requestedFloatingBuffer;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is not very accurate here.
It should be (requestedFloatingBuffer - leftFloatingBuffer) / LocalBufferPool.getNumBuffers

}

@Override
public Float getValue() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might extract the logics of above exclusiveBuffersUsage and floatingBuffersUsage into separate methods, then these methods could be reused here to get final results directly, to avoid below detail logics again.

buffersGroup.gauge(METRIC_INPUT_QUEUE_LENGTH, new InputBuffersGauge(inputGates));
buffersGroup.gauge(METRIC_INPUT_POOL_USAGE, new InputBufferPoolUsageGauge(inputGates));

if (config.isCreditBased()) {
Copy link
Contributor

@zhijiangW zhijiangW May 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For exclusive case there is no buffer pool actually. So maybe we could break down the new class names into ExclusiveBuffersUsageGauge, FloatingBuffersUsageGauge, InputBuffersUsageGauge instead?

But considering the metric name compatibility with before, we might still use METRIC_INPUT_POOL_USAGE in credit-based mode to correspond with InputBuffersUsageGauge.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong feelings, but maybe keeping the name of the gauge in sync with the metric name is a better idea? (I'm fine either way)

private static final String METRIC_OUTPUT_POOL_USAGE = "outPoolUsage";
private static final String METRIC_INPUT_QUEUE_LENGTH = "inputQueueLength";
private static final String METRIC_INPUT_POOL_USAGE = "inPoolUsage";
private static final String METRIC_CREDIT_BASED_FLOATING_BUFFER_POOL_USAGE = "floatingBufferUsage";
Copy link
Contributor

@zhijiangW zhijiangW May 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to not add the keywords CREDIT-BASED here, because after we remove the deprecated network codes finally, it is no need to emphasize this term any more. The terms of exclusive/floating already indicate that.
Also it should be consistent between variable name and value, like METRIC_FLOATING_BUFFERS_USAGE = "floatingBuffersUsage". ditto for the below case of exclusive.

@zhijiangW
Copy link
Contributor

Thanks for the updates @Aitozi ! I left some inline comments.

@Aitozi Aitozi force-pushed the inpoolusage_fix branch from 2b78438 to 7b39e0b Compare June 2, 2019 10:07
@Aitozi
Copy link
Contributor Author

Aitozi commented Jun 2, 2019

Hi @zhijiangW @pnowojski I have addressed your comments. About the inPoolUsage name, I use the name for both two mode to be consist with previous style. And i squash the commit history solving the conflicts, please help review when you are free, I will continue to solve another PR after this one has been approved to avoid conflicts.

@Aitozi Aitozi changed the title [FLINK-12284][Network,Metrics]Fix the incorrect inputBufferUsage metric in credit-based network mode [FLINK-12284,FLINK-12637][Network,Metrics]Fix the incorrect inputBufferUsage metric in credit-based network mode Jun 2, 2019
Copy link
Contributor

@pnowojski pnowojski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of more comments from my side :)

buffersGroup.gauge(METRIC_INPUT_QUEUE_LENGTH, new InputBuffersGauge(inputGates));
buffersGroup.gauge(METRIC_INPUT_POOL_USAGE, new InputBufferPoolUsageGauge(inputGates));

if (config.isCreditBased()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong feelings, but maybe keeping the name of the gauge in sync with the metric name is a better idea? (I'm fine either way)

@Aitozi
Copy link
Contributor Author

Aitozi commented Jun 14, 2019

Sorry for the late response @pnowojski , I will solve the comments you mentioned.

@Aitozi Aitozi force-pushed the inpoolusage_fix branch from 971d6a7 to 65465bc Compare June 15, 2019 17:36
@Aitozi
Copy link
Contributor Author

Aitozi commented Jun 15, 2019

Hi @pnowojski , I have adjust the test case according to your suggestion, please take a look when you have time, Thanks.

@Aitozi Aitozi force-pushed the inpoolusage_fix branch from f1d2f8c to 5b846b3 Compare July 1, 2019 13:41
@Aitozi Aitozi changed the title [FLINK-12284,FLINK-12637][Network,Metrics]Fix the incorrect inputBufferUsage metric in credit-based network mode [FLINK-12284][Network,Metrics]Fix the incorrect inputBufferUsage metric in credit-based network mode Jul 1, 2019
@Aitozi Aitozi force-pushed the inpoolusage_fix branch 2 times, most recently from 220d01e to a43de92 Compare July 1, 2019 13:50
@Aitozi
Copy link
Contributor Author

Aitozi commented Jul 1, 2019

Ping @zhijiangW

@zhijiangW
Copy link
Contributor

Thanks for the updates @Aitozi ! I left some other comments for tests.

@Aitozi
Copy link
Contributor Author

Aitozi commented Jul 2, 2019

Very thanks for @zhijiangW 's carefully review, please take a look again, hoping this is the last look :)

}
}

private Tuple3<SingleInputGate, List<RemoteInputChannel>, List<LocalInputChannel>> buildInputGate(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could only return Tuple2<SingleInputGate, List<RemoteInputChannel>> instead because the list of local channels would not be used any more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

.buildRemoteAndSetToGate(inputGate);
}

private LocalInputChannel buildLocalChannel(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make it void method after adjusting tuple2 above.

@zhijiangW
Copy link
Contributor

Thanks for the patience and addressing my comments @Aitozi !

I think you might miss one previous comment for using tuple2 instead of tuple3. After fixing this I have no other concerns. 👍 LGTM!

@Aitozi Aitozi force-pushed the inpoolusage_fix branch from c3bc86a to e465536 Compare July 3, 2019 03:29
@Aitozi
Copy link
Contributor Author

Aitozi commented Jul 3, 2019

Ping @pnowojski

@pnowojski
Copy link
Contributor

Thanks for the fix @Aitozi and your patience with our reviews (especially that the scope of this ticket has grown significantly from your initial version). Merging as I can see that on your private travis this is already green.

@pnowojski pnowojski merged commit 36a938a into apache:master Jul 3, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants