[BEAM-5041] Java Fn SDK Harness use pTransform to track processed graph #6093
Run Java PreCommit
I'm not familiar with this part of the code, but the change makes sense based on your explanation in the OSS Runners chat. Slightly obligatory: is there a test that could be added to verify the new/correct behavior?
bsidhom left a comment
Looks good, but just echoing Ryan's question: is it possible to validate that this does what we want? Or is this already covered by ValidatesRunner tests that fail before this change and pass after it?
processBundleDescriptor.getPcollectionsMap(),
processBundleDescriptor.getCodersMap(),
processBundleDescriptor.getWindowingStrategiesMap(),
pCollectionIdsToConsumers,
How/when was pCollectionIdsToConsumers modified previously? I would like to know whether this is the right place to update processedPTransformIds.
pCollectionIdsToConsumers is read at bundle execution time to look up the consumers of a PCollection.
beam/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/DoFnPTransformRunnerFactory.java, line 282 in 45ea59f:
localNameToConsumerBuilder.putAll(
Sorry, I meant: when were elements added to it?
There are no unit tests specific to ProcessBundleHandler.java.
This change looks good to me. Thanks for figuring out the flaw in the existing logic: it assumed consumers were only added to the PCollection currently being iterated over. Ben, IMO the ValidatesRunner tests are a better way to make sure this isn't broken end to end across runners. I looked at the Python SDK, and it builds the transforms using their topological height. We could do something very similar and rely on
Yes, a topological sort is a better, more principled way to tackle this issue. The current implementation is very close to what a topological sort would do, but since we already have a means of doing a topological sort, we should use
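The topological-sort approach discussed here can be sketched with Kahn's algorithm. This is a minimal illustration under stated assumptions: the `Transform` class, `topologicalOrder`, `sampleTransforms`, and the transform names are hypothetical, and the sketch is neither the Python SDK's topological-height approach nor any existing Beam Java utility. Each transform is placed after the transforms that produce its inputs, and inputs with no producer in the graph are treated as external.

```java
import java.util.*;

public class TopoSortSketch {
  /** Hypothetical, simplified stand-in for a PTransform: an ID plus input/output PCollection IDs. */
  static final class Transform {
    final String id;
    final List<String> inputs;
    final List<String> outputs;

    Transform(String id, List<String> inputs, List<String> outputs) {
      this.id = id;
      this.inputs = inputs;
      this.outputs = outputs;
    }
  }

  /** Kahn's algorithm: returns transform IDs so that every transform follows the producers of its inputs. */
  static List<String> topologicalOrder(List<Transform> transforms) {
    // Map each PCollection ID to the transform that produces it.
    Map<String, String> producerOf = new HashMap<>();
    for (Transform t : transforms) {
      for (String out : t.outputs) producerOf.put(out, t.id);
    }
    // Count producer edges per transform and record the reverse (producer -> dependent) edges.
    Map<String, Integer> inDegree = new LinkedHashMap<>();
    Map<String, List<String>> dependents = new HashMap<>();
    for (Transform t : transforms) {
      inDegree.putIfAbsent(t.id, 0);
      for (String in : t.inputs) {
        String producer = producerOf.get(in);
        if (producer == null) continue; // input comes from outside this graph
        inDegree.merge(t.id, 1, Integer::sum);
        dependents.computeIfAbsent(producer, k -> new ArrayList<>()).add(t.id);
      }
    }
    // Start with transforms that have no unresolved producers.
    Deque<String> ready = new ArrayDeque<>();
    for (Map.Entry<String, Integer> e : inDegree.entrySet()) {
      if (e.getValue() == 0) ready.add(e.getKey());
    }
    List<String> order = new ArrayList<>();
    while (!ready.isEmpty()) {
      String id = ready.poll();
      order.add(id);
      for (String dep : dependents.getOrDefault(id, List.of())) {
        // A dependent becomes ready once all of its producers have been emitted.
        if (inDegree.merge(dep, -1, Integer::sum) == 0) ready.add(dep);
      }
    }
    if (order.size() != inDegree.size()) throw new IllegalStateException("cycle detected");
    return order;
  }

  /** Deliberately listed out of order; the sort must recover producer-before-consumer order. */
  static List<Transform> sampleTransforms() {
    return List.of(
        new Transform("Flatten", List.of("pcA", "pcB"), List.of("pcOut")),
        new Transform("ParDoB", List.of("pc0"), List.of("pcB")),
        new Transform("Read", List.of(), List.of("pc0")),
        new Transform("ParDoA", List.of("pc0"), List.of("pcA")));
  }

  public static void main(String[] args) {
    System.out.println(topologicalOrder(sampleTransforms())); // [Read, ParDoB, ParDoA, Flatten]
  }
}
```

Since every transform appears after its producers, walking this list in reverse would visit each consumer before the transform that feeds it, which is the order a harness that wires up consumers first would need.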
bsidhom left a comment
Looks good!
The Java SDK harness used PCollections to keep track of which consumers had already been created here. This is incorrect: consumers are created per PTransform, so PTransforms should be used to track them.
In the case of Flatten, this means that PTransforms sharing an input with the Flatten are not executed. This causes
beam/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FlattenTest.java, line 316 in ff95a82
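To make the failure mode concrete, here is a minimal sketch over a hypothetical, simplified graph model. The `Graph` class, the method names, and the `ReadA`/`ReadB`/`Flatten`/`ParDo` transforms are illustrative assumptions, not the harness's actual code. Consumers are created recursively from the root transforms, and each created transform registers itself as a consumer of its input PCollections. The first variant skips any PCollection that already has a registered consumer: when `Flatten` is reached through `pcA`, it also registers on `pcB`, so `ParDo`, which shares `pcB` with the Flatten, is never created. The second variant tracks processed PTransform IDs instead.

```java
import java.util.*;

public class ConsumerTrackingSketch {
  /** Hypothetical graph: each transform ID maps to its input and output PCollection IDs. */
  static final class Graph {
    final Map<String, List<String>> inputs = new LinkedHashMap<>();
    final Map<String, List<String>> outputs = new LinkedHashMap<>();
    final List<String> roots = new ArrayList<>();

    void add(String id, List<String> in, List<String> out) {
      inputs.put(id, in);
      outputs.put(id, out);
      if (in.isEmpty()) roots.add(id); // transforms without inputs start the traversal
    }

    /** Transforms that read the given PCollection, in insertion order. */
    List<String> consumersOf(String pCollectionId) {
      List<String> result = new ArrayList<>();
      for (Map.Entry<String, List<String>> e : inputs.entrySet()) {
        if (e.getValue().contains(pCollectionId)) result.add(e.getKey());
      }
      return result;
    }
  }

  /** Registers transformId as a consumer of each of its input PCollections. */
  private static void register(Graph g, String transformId, Map<String, List<String>> consumers) {
    for (String in : g.inputs.get(transformId)) {
      consumers.computeIfAbsent(in, k -> new ArrayList<>()).add(transformId);
    }
  }

  /** Flawed variant: uses "does this PCollection already have consumers?" as the visited check. */
  static Map<String, List<String>> buildTrackingByPCollection(Graph g) {
    Map<String, List<String>> consumers = new LinkedHashMap<>();
    for (String root : g.roots) createByPCollection(g, root, consumers);
    return consumers;
  }

  private static void createByPCollection(Graph g, String id, Map<String, List<String>> consumers) {
    for (String out : g.outputs.get(id)) {
      // Wrong assumption: consumers of 'out' can only have been added while visiting 'out'.
      if (consumers.containsKey(out)) continue;
      for (String next : g.consumersOf(out)) createByPCollection(g, next, consumers);
    }
    register(g, id, consumers); // downstream transforms were registered first
  }

  /** Fixed variant: remembers which PTransforms have already been processed. */
  static Map<String, List<String>> buildTrackingByPTransform(Graph g) {
    Map<String, List<String>> consumers = new LinkedHashMap<>();
    Set<String> processed = new HashSet<>();
    for (String root : g.roots) createByPTransform(g, root, consumers, processed);
    return consumers;
  }

  private static void createByPTransform(
      Graph g, String id, Map<String, List<String>> consumers, Set<String> processed) {
    if (!processed.add(id)) return; // already created via another path
    for (String out : g.outputs.get(id)) {
      for (String next : g.consumersOf(out)) createByPTransform(g, next, consumers, processed);
    }
    register(g, id, consumers);
  }

  /** ReadB's output feeds both the Flatten and a ParDo. */
  static Graph sampleGraph() {
    Graph g = new Graph();
    g.add("ReadA", List.of(), List.of("pcA"));
    g.add("ReadB", List.of(), List.of("pcB"));
    g.add("Flatten", List.of("pcA", "pcB"), List.of("pcF"));
    g.add("ParDo", List.of("pcB"), List.of("pcP"));
    return g;
  }

  public static void main(String[] args) {
    Graph g = sampleGraph();
    System.out.println(buildTrackingByPCollection(g)); // {pcA=[Flatten], pcB=[Flatten]}
    System.out.println(buildTrackingByPTransform(g)); // {pcA=[Flatten], pcB=[Flatten, ParDo]}
  }
}
```

In the first map `pcB` has only `Flatten` as a consumer, so elements read by `ReadB` would never reach `ParDo`. In the second map both transforms are registered.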