Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-14101] [CdapIO] Add ReceiverBuilder for SparkReceiverIO #17111

Merged
merged 1 commit into from
Jul 15, 2022

Conversation

Amar3tto
Copy link
Contributor

@Amar3tto Amar3tto commented Mar 17, 2022

Resolves #24960
Resolves #21673


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests

See CI.md for more information about GitHub Actions CI.

@codecov
Copy link

codecov bot commented Mar 17, 2022

Codecov Report

Merging #17111 (423e973) into master (e94f33a) will decrease coverage by 0.06%.
The diff coverage is n/a.

❗ Current head 423e973 differs from pull request most recent head 87267d5. Consider uploading reports for the commit 87267d5 to get more accurate results

@@            Coverage Diff             @@
##           master   #17111      +/-   ##
==========================================
- Coverage   74.00%   73.94%   -0.07%     
==========================================
  Files         703      700       -3     
  Lines       92936    92471     -465     
==========================================
- Hits        68779    68375     -404     
+ Misses      22891    22847      -44     
+ Partials     1266     1249      -17     
Flag Coverage Δ
python 83.58% <ø> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
sdks/go/pkg/beam/core/runtime/graphx/user.go 0.00% <0.00%> (-42.31%) ⬇️
sdks/go/pkg/beam/testing/passert/hash.go 0.00% <0.00%> (-28.00%) ⬇️
.../go/pkg/beam/core/runtime/harness/worker_status.go 51.16% <0.00%> (-21.21%) ⬇️
sdks/go/pkg/beam/core/graph/fn.go 76.91% <0.00%> (-8.43%) ⬇️
sdks/go/pkg/beam/runners/dataflow/dataflow.go 53.02% <0.00%> (-6.76%) ⬇️
...ks/go/pkg/beam/runners/dataflow/dataflowlib/job.go 16.26% <0.00%> (-5.29%) ⬇️
...o/pkg/beam/io/rtrackers/offsetrange/offsetrange.go 75.70% <0.00%> (-4.93%) ⬇️
sdks/go/pkg/beam/testing/passert/passert.shims.go 57.23% <0.00%> (-4.80%) ⬇️
sdks/go/pkg/beam/pardo.go 43.75% <0.00%> (-3.67%) ⬇️
sdks/go/pkg/beam/testing/passert/count.go 76.19% <0.00%> (-2.98%) ⬇️
... and 28 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e94f33a...87267d5. Read the comment docs.

@Amar3tto
Copy link
Contributor Author

retest this please

1 similar comment
@Amar3tto
Copy link
Contributor Author

retest this please

@Amar3tto
Copy link
Contributor Author

Run Java PreCommit

3 similar comments
@Amar3tto
Copy link
Contributor Author

Run Java PreCommit

@Amar3tto
Copy link
Contributor Author

Run Java PreCommit

@Amar3tto
Copy link
Contributor Author

Run Java PreCommit

@Amar3tto Amar3tto changed the title [BEAM-14101] Add ProxyReceiverBuilder [BEAM-14101] [CdapIO] Add ProxyReceiverBuilder Mar 28, 2022
@Amar3tto
Copy link
Contributor Author

Run Java PreCommit

1 similar comment
@Amar3tto
Copy link
Contributor Author

Run Java PreCommit

@Amar3tto Amar3tto changed the title [BEAM-14101] [CdapIO] Add ProxyReceiverBuilder [BEAM-14101] [CdapIO] Add ReceiverBuilder May 10, 2022
@Amar3tto
Copy link
Contributor Author

Run Java_PVR_Flink_Batch PreCommit

1 similar comment
@Amar3tto
Copy link
Contributor Author

Run Java_PVR_Flink_Batch PreCommit

@Amar3tto
Copy link
Contributor Author

Run Java PreCommit

@Amar3tto
Copy link
Contributor Author

Run Java_PVR_Flink_Batch PreCommit

2 similar comments
@elizaveta-lomteva
Copy link
Contributor

Run Java_PVR_Flink_Batch PreCommit

@Amar3tto
Copy link
Contributor Author

Run Java_PVR_Flink_Batch PreCommit

@Amar3tto Amar3tto changed the title [BEAM-14101] [CdapIO] Add ReceiverBuilder [BEAM-14101] [CdapIO] Add ReceiverBuilder for SparkReceiverIO May 16, 2022
@elizaveta-lomteva
Copy link
Contributor

Run Java_PVR_Flink_Batch PreCommit

@Amar3tto Amar3tto marked this pull request as ready for review May 16, 2022 18:52
@Amar3tto
Copy link
Contributor Author

retest this please

@elizaveta-lomteva
Copy link
Contributor

Run Java_PVR_Flink_Batch PreCommit

1 similar comment
@Amar3tto
Copy link
Contributor Author

Run Java_PVR_Flink_Batch PreCommit

@elizaveta-lomteva
Copy link
Contributor

Run Java_PVR_Flink_Docker PreCommit

@elizaveta-lomteva
Copy link
Contributor

Run Java_PVR_Flink_Batch PreCommit

@elizaveta-lomteva
Copy link
Contributor

Run Java_Examples_Dataflow PreCommit

@elizaveta-lomteva
Copy link
Contributor

Run Java PreCommit

@Amar3tto
Copy link
Contributor Author

Run Java PreCommit

@elizaveta-lomteva
Copy link
Contributor

Run Java_PVR_Flink_Docker PreCommit

@elizaveta-lomteva
Copy link
Contributor

Run Java_PVR_Flink_Batch PreCommit

@elizaveta-lomteva
Copy link
Contributor

Run Java PreCommit

@elizaveta-lomteva
Copy link
Contributor

"Run Website_Stage_GCS PreCommit

@Amar3tto
Copy link
Contributor Author

Amar3tto commented Jun 2, 2022

Run Website_Stage_GCS PreCommit

@aaltay
Copy link
Member

aaltay commented Jun 6, 2022

@aromanenko-dev - do you know anyone who might be able to review this?

@aromanenko-dev
Copy link
Contributor

@aaltay I'll take a look

@aromanenko-dev aromanenko-dev self-requested a review June 7, 2022 10:30
Copy link
Contributor

@aromanenko-dev aromanenko-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I left my comments, ptal.

PS: Please, squash multiple commits into one or several atomic ones (if it's needed to have more than one) and avoid merge commits to keep a commit history clean (use git rebase instead).


/** Transforms for reading and writing from streaming CDAP plugins. */
@Experimental(Kind.SOURCE_SINK)
package org.apache.beam.sdk.io.sparkreceiver;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need a dedicated IO package for this since this builder, IIUC, is mostly a part of CdapIO and doesn't provide any read/write IO API for users.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As part of the CdapIO Connector project, we are implementing another IO, this is SparkReceiverIO. It will provide an API to read from the Custom Spark Receiver, which we will call in CdapIO. The SparkReceiverIO will be independent of the CdapIO, and this builder is part of the SparkReceiverIO, so we've put this code in a dedicated sparkreceiver IO package.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will SparkReceiverIO provide a separate user API for read and/or write in this case that is not dependent on CdapIO?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right. The SparkReceiverIO will provide a read API that can be used with the custom receiver independently of the Cdap IO

@aromanenko-dev
Copy link
Contributor

aromanenko-dev commented Jun 20, 2022

Kind ping on this. What is a state for this PR? Is it ready for another round of review?

@johnjcasey
Copy link
Contributor

@aromanenko-dev I asked, and they said this is ready for another round of review. The only issue they had ran into is that they are getting nullness warnings when they use checkArgument instead of using an if statement

@elizaveta-lomteva
Copy link
Contributor

@aromanenko-dev

PS: Please, squash multiple commits into one or several atomic ones (if it's needed to have more than one) and avoid merge commits to keep a commit history clean (use git rebase instead).

Could you tell me please what you mean exactly? Do we need to squash the published comments into one or do we need to merge them before pushing them to a remote repository?

Could the person who will be merging the PR use the squash and merge option?

@elizaveta-lomteva
Copy link
Contributor

Run Java PreCommit

@aromanenko-dev
Copy link
Contributor

aromanenko-dev commented Jun 23, 2022

@Lizzfox

Could you tell me please what you mean exactly?

You need to rebase your feature branch with master branch (e.g. git checkout <feature-branch> && git rebase). It will put your commit(s), that you committed in your feature, on top of master branch.

Also, it's usually better to have your local master up-to-date with origin one (e.g. git checkout master && git fetch origin && git pull).

And, please, avoid merge master branch into your feature branch, use git rebase instead, as I mentioned above.

Do we need to squash the published comments into one or do we need to merge them before pushing them to a remote repository?

Yes, you need to squash all commits into one or into several logical ones with a proper commit message. That will help to keep commits history more clear and evident. Also, it should be much easier to rollback such changes if needed.

Could the person who will be merging the PR use the squash and merge option?

Yes, it's possible, but it will squash all your commits into only one and the squashed commits should not contain merge commits (like it has for now, as I can see).

Feel free to ping me on the-asf channel on Slack if you have more questions on this!

@aromanenko-dev
Copy link
Contributor

aromanenko-dev commented Jun 23, 2022

@Lizzfox Java PreCommit job fails because of this:

18:41:01 > Task :sdks:java:io:sparkreceiver:compileJava
18:41:01 /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Java_Phrase/src/sdks/java/io/sparkreceiver/src/main/java/org/apache/beam/sdk/io/sparkreceiver/ReceiverBuilder.java:84: error: [dereference.of.nullable] dereference of possibly-null reference currentConstructor
18:41:01     currentConstructor.setAccessible(true);

Could you fix it?

You can test it locally with this command:

./gradlew :sdks:java:io:sparkreceiver:check

@Amar3tto
Copy link
Contributor Author

Run Go PreCommit

@Amar3tto
Copy link
Contributor Author

Run Java PreCommit

@aromanenko-dev
Copy link
Contributor

@Amar3tto Could you rebase your feature branch against master and squash your commits? Thanks!

@Amar3tto
Copy link
Contributor Author

Amar3tto commented Jun 24, 2022

@Amar3tto Could you rebase your feature branch against master and squash your commits? Thanks!

@aromanenko-dev I've squashed all commits into one.

@Amar3tto
Copy link
Contributor Author

Run Java PreCommit

@aromanenko-dev
Copy link
Contributor

@Amar3tto @Lizzfox
I'm sorry for delay with a review, I was on holidays.

Thanks for addressing the comments! It looks ok for me in general but it's still not clear why do we need a new dedicated IO for that since it doesn't provide any IO user API? Could you give more details about the next steps on this IO?

@elizaveta-lomteva
Copy link
Contributor

@aromanenko-dev

why do we need a new dedicated IO for that since it doesn't provide any IO user API? Could you give more details about the next steps on this IO?

The SparkReceiverIO will provide a user interface for reading data from the Spark Streaming Custom Receiver in Apache Beam pipelines. Users will be able to use this IO to read data from their own Spark Receivers.

The next step is to implement the read interface. We have SparkReceiverIO Read draft PR that is in progress for now. I hope it would clarify the details.

We just decided to divide PRs to make them easier to review. So the current PR doesn't implement any user interface 🙂

Copy link
Contributor

@aromanenko-dev aromanenko-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@aromanenko-dev aromanenko-dev merged commit b0225ca into apache:master Jul 15, 2022
@aromanenko-dev
Copy link
Contributor

@Lizzfox Thanks for details! Feel free to ping me for other PRs when they are ready

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants