-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[NEMO-472] Implement Intermediate Combine #318
base: master
Are you sure you want to change the base?
Conversation
implemented compile time pass
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did my first pass. Overall, looks good to me. Thanks for the work @Kangji !
common/src/main/java/org/apache/nemo/common/ir/IRDAGChecker.java
Outdated
Show resolved
Hide resolved
common/src/main/java/org/apache/nemo/common/ir/IRDAGChecker.java
Outdated
Show resolved
Hide resolved
...main/java/org/apache/nemo/common/ir/vertex/executionproperty/ShuffleExecutorSetProperty.java
Outdated
Show resolved
Hide resolved
@@ -45,9 +45,9 @@ | |||
* @param <InputT> input type | |||
* @param <OutputT> output type | |||
*/ | |||
public final class GBKTransform<K, InputT, OutputT> | |||
public final class CombineTransform<K, InputT, OutputT> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain why the class name is changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's because GroupByKey Transform in Beam semantics can be represented as Combine PerKey Transform, and I thought that this class represents Combine PerKey rather than GroupByKey.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I know, GroupByKey Transform is not always be represented as Combine PerKey Transform, so changing the name is confusing to me. For instance, CoGroupByKey is not combining, but it is also represented as GroupByKey as far as I know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We actually have a separate GroupByKeyTransform. We have renamed the GBKTransform since it actually works as a CombineTransform. It was misnamed in the first place.
Kudos, SonarCloud Quality Gate passed! 0 Bugs No Coverage information |
@taegeonum Thanks for the review! I've addressed your comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Kangji I did another pass.
...main/java/org/apache/nemo/common/ir/vertex/executionproperty/ShuffleExecutorSetProperty.java
Outdated
Show resolved
Hide resolved
...main/java/org/apache/nemo/common/ir/vertex/executionproperty/ShuffleExecutorSetProperty.java
Outdated
Show resolved
Hide resolved
...main/java/org/apache/nemo/common/ir/vertex/executionproperty/ShuffleExecutorSetProperty.java
Outdated
Show resolved
Hide resolved
...on/src/main/java/org/apache/nemo/common/ir/vertex/utility/IntermediateAccumulatorVertex.java
Outdated
Show resolved
Hide resolved
import java.util.HashSet; | ||
|
||
/** | ||
* List of set of node names to limit the scheduling of the tasks of the vertex to while shuffling. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This explanation is not clear to me. Does this property set the destination executor for the output of intermediate vertex?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It limits the sources of the data that each task reads from, depending on where the task is located at. I'll add the explanation.
@@ -45,9 +45,9 @@ | |||
* @param <InputT> input type | |||
* @param <OutputT> output type | |||
*/ | |||
public final class GBKTransform<K, InputT, OutputT> | |||
public final class CombineTransform<K, InputT, OutputT> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I know, GroupByKey Transform is not always be represented as Combine PerKey Transform, so changing the name is confusing to me. For instance, CoGroupByKey is not combining, but it is also represented as GroupByKey as far as I know.
...nemo/compiler/optimizer/pass/compiletime/reshaping/IntermediateAccumulatorInsertionPass.java
Outdated
Show resolved
Hide resolved
...nemo/compiler/optimizer/pass/compiletime/reshaping/IntermediateAccumulatorInsertionPass.java
Outdated
Show resolved
Hide resolved
@Kangji Any update? |
not yet... :( It has been delayed due to the fall semester, even though i'm trying to do asap. I'll let you know. |
Kudos, SonarCloud Quality Gate passed! 0 Bugs No Coverage information |
@taegeonum Can you take a final look? |
JIRA: NEMO-472: Fix and Implement Hierarchical Aggregation
Major changes:
[NEMO-472: Implement Hierarchical Aggregation] aims to add additional intermediate accumulation operator in front of final combine operator that accumulates data among physically nearby containers prior to shuffling across WAN, when needed. It is expected that data aggregation among nearby containers will reduce the data size that must be transferred across WAN. To achieve it,
Minor changes to note:
Tests for the changes:
Other comments: