-
Notifications
You must be signed in to change notification settings - Fork 13.8k
[FLINK-11392][network] Introduce ShuffleEnvironment interface #8608
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community Review Progress
Please see the Pull Request Review Guide for a full explanation of the review process. DetailsThe Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commandsThe @flinkbot bot supports the following commands:
|
|
@flinkbot attention @zhijiangW |
d92a811 to
e57ce16
Compare
| long timerServiceShutdownTimeout, | ||
| RetryingRegistrationConfiguration retryingRegistrationConfiguration, | ||
| Optional<Time> systemResourceMetricsProbingInterval) { | ||
| InetAddress taskManagerAddress, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep the previous indentation
| MemoryType memoryType, | ||
| boolean preAllocateMemory, | ||
| float memoryFraction, | ||
| int pageSize, long timerServiceShutdownTimeout, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
separate line for parameters
|
|
||
| nettyConfig = new NettyConfig(taskManagerInetSocketAddress.getAddress(), taskManagerInetSocketAddress.getPort(), | ||
| getPageSize(configuration), ConfigurationParserUtils.getSlot(configuration), configuration); | ||
| ConfigurationParserUtils.getPageSize(configuration), ConfigurationParserUtils.getSlot(configuration), configuration); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
separate line for all the parameters
| Configuration config) throws IOException { | ||
| return TaskManagerServicesConfiguration.fromConfiguration( | ||
| config, MEM_SIZE_PARAM, InetAddress.getLocalHost(), true); | ||
| config, InetAddress.getLocalHost()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
separate line for every parameter or we do not need break line here because it might not be long after removing above parameters.
|
|
||
| /** | ||
| * Verifies that {@link TaskManagerServicesConfiguration#fromConfiguration(Configuration, long, InetAddress, boolean)} | ||
| * Verifies that {@link TaskManagerServicesConfiguration#fromConfiguration(Configuration, InetAddress)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this test is only for verifying the generation of NetworkEnvironmentConfiguration. The scope of TaskManagerServicesConfiguration#fromConfiguration covers many things. Also this class name TaskManagerServicesConfigurationTest might also be renamed to NetworkEnvironmentConfigurationTest or something else.
| this.isShutdown = false; | ||
| } | ||
|
|
||
| public static NetworkEnvironment fromConfiguration( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be better to put this static factory at the end of this class, also for the following create method.
| maxJvmHeapMemory, | ||
| localTaskManagerCommunication, | ||
| taskManagerAddress); | ||
| return NetworkEnvironment.create(networkConfig, taskEventPublisher, metricGroup, ioManager); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be done in previous hotfix commit while introducing fromConfiguration firstly.
|
|
||
| public static NetworkEnvironment create( | ||
| NetworkEnvironmentConfiguration config, | ||
| NetworkEnvironmentConfiguration config, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the changes for commit [FLINK-11392][network] Introduce ShuffleEnvironment interface, this indentation should not be changed.
| public boolean updatePartitionInfo( | ||
| ExecutionAttemptID consumerID, | ||
| PartitionInfo partitionInfo) throws IOException, InterruptedException { | ||
| ExecutionAttemptID consumerID, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto: keep previous indentation
| } | ||
| } | ||
|
|
||
| @Override |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ATM this method is only used for testing, so it seems no reasonable to define an interface method only for testing. If the previous test could be refactored not relying on this method. I ever considered this issue to remove this method.
| final TaskManagerRuntimeInfo taskManagerRuntimeInfo = new TestingTaskManagerRuntimeInfo(); | ||
|
|
||
| final NetworkEnvironment networkEnv = new NetworkEnvironmentBuilder().build(); | ||
| final ShuffleEnvironment networkEnv = new NetworkEnvironmentBuilder().build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename shuffleEnvironment for networkEnv
| Executor executor = mock(Executor.class); | ||
|
|
||
| NetworkEnvironment network = new NetworkEnvironmentBuilder().build(); | ||
| ShuffleEnvironment network = new NetworkEnvironmentBuilder().build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto: rename network as shuffleEnvironment
| mock(MemoryManager.class), | ||
| mock(IOManager.class), | ||
| networkEnvironment, | ||
| shuffleEnvironment, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indentation issue here.
| taskConfig); | ||
|
|
||
| NetworkEnvironment network = new NetworkEnvironmentBuilder().build(); | ||
| ShuffleEnvironment network = new NetworkEnvironmentBuilder().build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename network as shuffleEnvironment
| import java.util.Collection; | ||
|
|
||
| /** | ||
| * Interface for the implementation of shuffle service locally on task executor. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shuffle service -> shuffle environment
| /** | ||
| * Interface for the implementation of shuffle service locally on task executor. | ||
| * | ||
| * <p>Input/Output interface of local shuffle service is based on memory {@link org.apache.flink.runtime.io.network.buffer.Buffer}s. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shuffle service -> shuffle environment
| * Interface for the implementation of shuffle service locally on task executor. | ||
| * | ||
| * <p>Input/Output interface of local shuffle service is based on memory {@link org.apache.flink.runtime.io.network.buffer.Buffer}s. | ||
| * The task can request next available memory buffers from created here {@link ResultPartitionWriter}s to write shuffle data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove next?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then available as well if we do not want to provide a lot of details here
| * | ||
| * <p>Input/Output interface of local shuffle service is based on memory {@link org.apache.flink.runtime.io.network.buffer.Buffer}s. | ||
| * The task can request next available memory buffers from created here {@link ResultPartitionWriter}s to write shuffle data | ||
| * and buffers from created here {@link InputGate}s to read it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually the buffers are not requested from InputGate by task, and it is done by network thread.
Maybe and the task can read buffers from created here {@link InputGate}s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think requests has similar meaning in this context because task still gets the buffers from gates (network thread is an internal detail how to give buffers to the Task) but I will rephrase it a bit.
| * The task can request next available memory buffers from created here {@link ResultPartitionWriter}s to write shuffle data | ||
| * and buffers from created here {@link InputGate}s to read it. | ||
| * | ||
| * <h2>Service lifecycle management.</h2> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove Service here, only Lifecycle management
| * | ||
| * <p>The interface implements a factory of result partition writers for the task output: {@code createResultPartitionWriters}. | ||
| * The created writers are grouped per task and handed over to the task thread upon its startup. | ||
| * The task is responsible for the writers lifecycle from that moment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
writers -> writers'
| * <li>{@code releasePartitions(Collection<ResultPartitionID>)} is called outside of the task thread, | ||
| * e.g. to manage the local resource lifecycle of external partitions which outlive the task production.</li> | ||
| * </ol> | ||
| * The partitions, which currently still occupy local resources, can be queried with {@code updatePartitionInfo}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
queried -> updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually the method is wrong :) should be updatePartitionInfo -> getUnreleasedPartitions
| * | ||
| * <p>The interface implements a factory for the task input gates: {@code createInputGates}. | ||
| * The created gates are grouped per task and handed over to the task thread upon its startup. | ||
| * The task is responsible for the gates lifecycle from that moment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gates -> gates'
| public interface ShuffleEnvironment { | ||
|
|
||
| /** | ||
| * Starts the internal related services upon {@link TaskExecutor}'s startup. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Starts -> Start to keep the same form with the following methods.
| * Interface for the implementation of shuffle service local environment. | ||
| * | ||
| * <p>Input/Output interface of local shuffle service environment is based on memory {@link Buffer Buffers}. | ||
| * A producer can write shuffle data into the buffers, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced that putting subordinate clauses on separate lines increase readability. If it does, then we break this convention later in the docs by putting multiple sentences on the same line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, let's stick to keeping similar line lengths as in other classes.
| * | ||
| * <p>Input/Output interface of local shuffle service environment is based on memory {@link Buffer Buffers}. | ||
| * A producer can write shuffle data into the buffers, | ||
| * obtained from the created here {@link ResultPartitionWriter ResultPartitionWriters} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here seems wrong
| * <p>Input/Output interface of local shuffle service environment is based on memory {@link Buffer Buffers}. | ||
| * A producer can write shuffle data into the buffers, | ||
| * obtained from the created here {@link ResultPartitionWriter ResultPartitionWriters} | ||
| * and a consumer read the buffers from the created here {@link InputGate InputGates}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
read -> reads and here seems wrong
| * if the production has failed.</li> | ||
| * <li>{@link ResultPartitionWriter#finish()} and {@link ResultPartitionWriter#close()} are called | ||
| * if the production is done. The actual release can take some time | ||
| * if 'the end of consumption' confirmation is being awaited implicitly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, now I got it. You are talking here about the ResultPartition and not the ResultPartitionWriter. This makes sense now.
tillrohrmann
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments about the JavaDocs of ShuffleEnvironment.
tillrohrmann
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for resolving all of my comments @azagrebin. The changes now look really good. I'll be merging the PR now. While merging I'll address my own last two comments.
| this.hostAddress = checkNotNull(hostAddress); | ||
| this.eventPublisher = eventPublisher; | ||
| this.parentMetricGroup = parentMetricGroup; | ||
| this.ioManager = ioManager; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not doing the checkNotNull for the last three parameters?
| RetryingRegistrationConfiguration retryingRegistrationConfiguration, | ||
| Optional<Time> systemResourceMetricsProbingInterval) { | ||
| this.configuration = configuration; | ||
| this.resourceID = resourceID; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checkNotNull missing for these parameters.
Move <Flink configuration> -> NetworkEnvironmentConfiguration parsing from TaskManagerServicesConfiguration to NetworkEnviroment.fromConfiguration(flinkConfiguration) factory method and use it to create NetworkEnviroment in TaskManagerServices. This way TaskManagerServices does not depend on network internals. Memory page size is additionally parsed for TaskManagerServicesConfiguration to decouple it from NetworkEnvironmentConfiguration because the page size is used also for memory manager outside of NetworkEnviroment. Theoretically shuffle implementations can have their own page size in future, different from TaskManagerOptions.MEMORY_SEGMENT_SIZE.
|
Thanks for the review @tillrohrmann, I've pushed the final version with the last changes including yours. |
| return parentMetricGroup; | ||
| } | ||
|
|
||
| public IOManager getIoManager() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be getIOManager, upper case for both I and O
| * @param localCommunicationOnly True if only local communication is possible. | ||
| * Use only in cases where only one task manager runs. | ||
| * | ||
| * @return TaskExecutorConfiguration that wrappers InstanceConnectionInfo, etc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return TaskManagerServicesConfiguration
| return memoryType; | ||
| } | ||
|
|
||
| public long getFreeHeapMemoryWithDefrag() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
package private
| return freeHeapMemoryWithDefrag; | ||
| } | ||
|
|
||
| public long getMaxJvmHeapMemory() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto: package private
| return taskManagerAddress; | ||
| } | ||
|
|
||
| public boolean isLocalCommunicationOnly() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
package private
|
@azagrebin @tillrohrmann sorry for leaving several nit comments at the point of merging this PR. |
|
Thanks @zhijiangW , addressed |
|
Thanks for the review @zhijiangW and the good work @azagrebin. Merging now. |
What is the purpose of the change
Extract
ShuffleEnvironmentinterface fromNetworkEnvironment.Brief change log
TaskManagerServicesConfigurationbecause this general option is also used forMemoryManagerindependently fromNetworkEnvironment. Any otherShuffleEnvironmentcan potentially use its own page size.ShuffleEnvironmentinterface fromNetworkEnvironmentwith its public methods.NetworkEnvironmentits default implementationNetworkEnvironmenttoNettyShuffleEnvironment.Verifying this change
Refactoring covered by existing tests.
Does this pull request potentially affect one of the following parts:
@Public(Evolving): (no)Documentation