[BEAM-5041] Java Fn SDK Harness use pTransform to track processed graph #6093

angoenka · 2018-07-30T03:13:47Z

The Java SDK harness used pCollections to keep track of computed consumers. This is incorrect: consumers are derived from pTransforms, so pTransforms should be used to track which consumers have already been computed.

In the case of Flatten, this creates an issue where pTransforms that share an input with the Flatten are not executed. This causes testFlattenMultiplePCollectionsHavingMultipleConsumers() (beam/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FlattenTest.java, line 316 in ff95a82) to fail.
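
To illustrate the failure mode, here is a minimal, self-contained sketch. It is a simplified model and not the actual ProcessBundleHandler code: the class `ProcessedGraphSketch`, the `Transform` type, the toy graph, and every id in it are hypothetical. It assumes that creating a transform registers it as a consumer on all of its inputs, which is what lets a Flatten mark a pCollection as "done" before its other consumers have been visited.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ProcessedGraphSketch {

  /** Hypothetical stand-in for a pTransform: an id plus input and output pCollection ids. */
  static final class Transform {
    final String id;
    final List<String> inputs;
    final List<String> outputs;

    Transform(String id, List<String> inputs, List<String> outputs) {
      this.id = id;
      this.inputs = inputs;
      this.outputs = outputs;
    }
  }

  private final boolean trackByTransform;
  private final Map<String, List<Transform>> consumersOf = new HashMap<>();
  private final Set<String> pCollectionsWithConsumers = new HashSet<>();
  private final Set<String> processedTransformIds = new HashSet<>();
  private final List<String> created = new ArrayList<>();

  private ProcessedGraphSketch(boolean trackByTransform) {
    this.trackByTransform = trackByTransform;
  }

  /** Builds the toy graph and returns the ids of the transforms that were created. */
  static List<String> build(boolean trackByTransform) {
    List<String> none = Collections.<String>emptyList();
    // sourceA -> pcA, sourceB -> pcB; flatten reads both; parDoX reads pcA, parDoY reads pcB.
    List<Transform> graph = Arrays.asList(
        new Transform("sourceA", none, Arrays.asList("pcA")),
        new Transform("sourceB", none, Arrays.asList("pcB")),
        new Transform("flatten", Arrays.asList("pcA", "pcB"), Arrays.asList("pcOut")),
        new Transform("parDoX", Arrays.asList("pcA"), none),
        new Transform("parDoY", Arrays.asList("pcB"), none));

    ProcessedGraphSketch sketch = new ProcessedGraphSketch(trackByTransform);
    for (Transform t : graph) {
      for (String input : t.inputs) {
        sketch.consumersOf.computeIfAbsent(input, k -> new ArrayList<>()).add(t);
      }
    }
    // Start the recursive descent from every transform that has no inputs.
    for (Transform t : graph) {
      if (t.inputs.isEmpty()) {
        sketch.visit(t);
      }
    }
    return sketch.created;
  }

  private void visit(Transform t) {
    // New check: skip a transform only if that same transform was already handled.
    if (trackByTransform && !processedTransformIds.add(t.id)) {
      return;
    }
    for (String output : t.outputs) {
      // Old check: skip the whole pCollection if any consumer was registered on it.
      if (!trackByTransform && pCollectionsWithConsumers.contains(output)) {
        continue;
      }
      for (Transform consumer
          : consumersOf.getOrDefault(output, Collections.<Transform>emptyList())) {
        visit(consumer);
      }
    }
    created.add(t.id);
    // Creating a transform registers it as a consumer on all of its inputs.
    pCollectionsWithConsumers.addAll(t.inputs);
  }
}
```

In this model, `ProcessedGraphSketch.build(false)` never creates `parDoY`, because the Flatten has already registered a consumer on `pcB` by the time `sourceB` is visited. `ProcessedGraphSketch.build(true)` creates all five transforms.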


angoenka · 2018-07-30T03:18:49Z

Run Java PreCommit

angoenka · 2018-08-01T20:36:53Z

R: @bsidhom @ryan-williams @tweise

ryan-williams · 2018-08-02T14:49:15Z

I'm not familiar with this part of the code, but the change makes sense based on your explanation in the OSS Runners chat.

Slightly obligatory: is there a test that could be added to verify the new/correct behavior?

bsidhom

Looks good, but just echoing Ryan's question: is it possible to validate that this does what we want? Or is this already covered by ValidatesRunner tests that fail before this change and succeed after it?

bsidhom · 2018-08-02T20:07:05Z

sdks/java/harness/src/main/java/org/apache/beam/fn/harness/control/ProcessBundleHandler.java

+              processBundleDescriptor.getPcollectionsMap(),
+              processBundleDescriptor.getCodersMap(),
+              processBundleDescriptor.getWindowingStrategiesMap(),
+              pCollectionIdsToConsumers,


How and when was pCollectionIdsToConsumers modified previously? I would like to know whether this is the right place to update processedPTransformIds.

angoenka · 2018-08-02

pCollectionIdsToConsumers is consumed at bundle execution time to look up the consumers for a pCollection.

beam/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/DoFnPTransformRunnerFactory.java

Line 282 in 45ea59f

localNameToConsumerBuilder.putAll(

bsidhom · 2018-08-03

Sorry, I meant when were elements added?

angoenka · 2018-08-02T23:55:59Z

There are no unit tests specific to ProcessBundleHandler.java.
Most of the functionality is tested using ValidatesRunner (VR) tests at the moment.
We should add unit tests, but I don't want to do it in this PR if adding them is not trivial.
I can file a JIRA to track it.

lukecwik · 2018-08-06T19:50:55Z

This change looks good to me. Thanks for figuring out the flaw in the existing logic since it assumed consumers were only added to the current PCollection that was being iterated over.

Ben, IMO the ValidatesRunner test is a better way to make sure that this isn't broken end to end across runners.

I looked at the Python SDK, and it builds the transforms by using the topological height of the transforms. We could do something very similar and rely on QueryablePipeline#getTopologicallyOrderedTransforms instead of the current recursive descent logic. This would be a good follow-up change if you're interested.

lukecwik · 2018-08-06T19:51:27Z

Run Java PreCommit

angoenka · 2018-08-06T19:59:37Z

Yes, a topological sort is a more logical way to tackle this issue. The current implementation is very close to what a topological sort would do, but since we already have a means of doing one, we should use QueryablePipeline#getTopologicallyOrderedTransforms.
JIRA to track this: https://issues.apache.org/jira/browse/BEAM-5090
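
For reference, a rough sketch of what a topological ordering over transform ids could look like, using Kahn's algorithm. This is not the implementation behind QueryablePipeline#getTopologicallyOrderedTransforms; the class `TopologicalOrderSketch`, the method name, and the adjacency-map input are assumptions made for illustration only.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TopologicalOrderSketch {

  /**
   * Returns the transform ids so that every producer appears before its consumers.
   * The input maps each transform id to the ids of the transforms that consume its outputs.
   */
  static List<String> topologicalOrder(Map<String, List<String>> downstream) {
    // Count incoming edges for every transform, including ones that only appear as consumers.
    Map<String, Integer> inDegree = new LinkedHashMap<>();
    for (String id : downstream.keySet()) {
      inDegree.putIfAbsent(id, 0);
    }
    for (List<String> consumers : downstream.values()) {
      for (String consumer : consumers) {
        inDegree.merge(consumer, 1, Integer::sum);
      }
    }

    // Transforms with no producers upstream are ready immediately.
    Deque<String> ready = new ArrayDeque<>();
    for (Map.Entry<String, Integer> entry : inDegree.entrySet()) {
      if (entry.getValue() == 0) {
        ready.add(entry.getKey());
      }
    }

    List<String> order = new ArrayList<>();
    while (!ready.isEmpty()) {
      String id = ready.remove();
      order.add(id);
      for (String consumer : downstream.getOrDefault(id, Collections.<String>emptyList())) {
        // A consumer becomes ready once all of its producers have been emitted.
        if (inDegree.merge(consumer, -1, Integer::sum) == 0) {
          ready.add(consumer);
        }
      }
    }

    if (order.size() != inDegree.size()) {
      throw new IllegalStateException("Graph contains a cycle");
    }
    return order;
  }
}
```

If the harness needs each consumer to exist before its producer is wired up, the returned list can be walked in reverse. Each transform is then visited exactly once, so no separate "already processed" bookkeeping is needed.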

lukecwik · 2018-08-06T20:45:23Z

Run Java PreCommit

lukecwik · 2018-08-06T21:21:03Z

Run Java PreCommit

bsidhom

Looks good!

angoenka force-pushed the fix_java_sdk_process_bundle branch from 171b1e0 to bd55ae5 Compare July 30, 2018 03:16

angoenka force-pushed the fix_java_sdk_process_bundle branch from bd55ae5 to 7959444 Compare July 30, 2018 23:48

Use pTransforms to keep track of processed graph.

8d96644

angoenka force-pushed the fix_java_sdk_process_bundle branch from 7959444 to 8d96644 Compare August 1, 2018 20:36

bsidhom reviewed Aug 2, 2018

Minor method parameter type change.

5b9fa2d

lukecwik self-requested a review August 6, 2018 18:21

lukecwik approved these changes Aug 6, 2018

bsidhom approved these changes Aug 7, 2018

lukecwik merged commit 2be98cd into apache:master Aug 8, 2018

kennknowles mentioned this pull request Jun 3, 2022

Use topological sort during ProcessBundle in Java SDKHarness #19062

Open
