
🎉 New Destination: Kafka #3746

Merged
merged 30 commits into from
Jul 22, 2021


mmolimar
Contributor

@mmolimar mmolimar commented May 30, 2021

Destination for Apache Kafka.

Related with #1855

Checklist

  • Issue acceptance criteria met
  • PR name follows PR naming conventions
  • Secrets are annotated with airbyte_secret in the connector's spec
  • Credentials added to GitHub CI if needed and not already present (see the instructions for injecting secrets into CI).
  • Unit & integration tests added as appropriate (and are passing)
    • Community members: please provide proof of this succeeding locally e.g: screenshot or copy-paste acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • /test connector=connectors/<name> command as documented here is passing.
    • Community members can skip this, Airbyters will run this for you.
  • Code reviews completed
  • Documentation updated
    • README.md
    • docs/SUMMARY.md if it's a new connector
    • Created or updated reference docs in docs/integrations/<source or destination>/<name>.
    • Changelog in the appropriate page in docs/integrations/.... See changelog example
    • docs/integrations/README.md contains a reference to the new connector
    • Build status added to build page
  • Build is successful
  • Connector version bumped as described here
  • New Connector version released on Dockerhub by running the /publish command described here
  • No major blockers
  • PR merged into master branch
  • Follow up tickets have been created
  • Associated tickets have been closed & stakeholders notified

@auto-assign auto-assign bot requested review from ChristopheDuong and jrhizor May 30, 2021 17:02
@marcosmarxm
Member

that's amazing @mmolimar

@marcosmarxm marcosmarxm requested review from sherifnada and subodh1810 and removed request for ChristopheDuong May 30, 2021 18:33
@michel-tricot
Contributor

Thank you @mmolimar !!

@sherifnada
Contributor

@mmolimar incredible, thank you for sharing your connector! Will review very soon -- need to bootstrap my Kafka knowledge. Are you blocked on merging this to Airbyte to be able to use it in your instance?

Contributor

@subodh1810 subodh1810 left a comment


@mmolimar This looks great. I'm just curious about the setup options the end user will have. Chances are the end user might not know the ideal value of every option for their use case. For instance, what are the right Batch size, Buffer memory and Max request size? In that case they might just go with the default options. Are the defaults ideal for all use cases? Also, can we add more information to the docs about these setup options to help people figure out the right values for their use case?

@mmolimar
Contributor Author

mmolimar commented Jun 2, 2021

Thanks @sherifnada
Actually I'm not blocked right now, but it'd be great to know your thoughts on this so we can get it merged ;-)

@mmolimar
Contributor Author

mmolimar commented Jun 2, 2021

Thanks for your comments @subodh1810!
The producer is highly configurable to fit your specific use case (delivery guarantees, throughput, latency, etc.), so there is no config that is ideal for all scenarios. The default values are the Kafka producer's own defaults from its API.
Btw, I added a link to the Kafka producer configs in the README (not all of them are in the spec, just the most relevant ones).
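For readers wondering what these options look like in practice, here is a minimal sketch that collects the three settings discussed above (`batch.size`, `buffer.memory`, `max.request.size`) into a `java.util.Properties` object, the form the `KafkaProducer` constructor accepts. The numeric values are assumed to be the Kafka producer's documented defaults (check the producer config reference for your client version); the helper name `producerProps` and the `localhost:9092` address are hypothetical.

```java
import java.util.Properties;

public class ProducerTuning {

  // Hypothetical helper: builds producer settings using the standard Kafka config keys.
  // Pass the result to new KafkaProducer<>(props) together with key/value serializers.
  static Properties producerProps(String bootstrapServers, int batchSizeBytes,
                                  long bufferMemoryBytes, int maxRequestSizeBytes) {
    Properties props = new Properties();
    props.setProperty("bootstrap.servers", bootstrapServers);
    // Target size, in bytes, for batching records bound for the same partition.
    props.setProperty("batch.size", Integer.toString(batchSizeBytes));
    // Total memory the producer may use to buffer records waiting to be sent.
    props.setProperty("buffer.memory", Long.toString(bufferMemoryBytes));
    // Maximum size, in bytes, of a single produce request.
    props.setProperty("max.request.size", Integer.toString(maxRequestSizeBytes));
    return props;
  }

  public static void main(String[] args) {
    // Assumed producer defaults: 16 KiB batches, 32 MiB buffer, 1 MiB max request.
    Properties defaults = producerProps("localhost:9092", 16_384, 33_554_432L, 1_048_576);
    // A throughput-oriented variant: larger batches, same buffer and request limits.
    Properties highThroughput = producerProps("localhost:9092", 131_072, 33_554_432L, 1_048_576);
    System.out.println(defaults.getProperty("batch.size"));       // 16384
    System.out.println(highThroughput.getProperty("batch.size")); // 131072
  }
}
```

Raising `batch.size` (often together with `linger.ms`) tends to trade latency for throughput, which is why, as noted above, no single set of values suits every use case.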

@marcosmarxm
Member

Thanks @mmolimar, do you think this is ready to review, or do you need any assistance?

@mmolimar
Contributor Author

I think it's ready. I just need to resolve this little conflict.

@github-actions github-actions bot added the area/documentation Improvements or additions to documentation label Jun 12, 2021
@github-actions github-actions bot added the area/connectors Connector related issues label Jun 16, 2021
@jrhizor jrhizor removed their request for review June 30, 2021 22:04
Contributor

@sherifnada sherifnada left a comment


@mmolimar Thanks again for creating the PR. This is going to be huge. Left a review!

protected void setup(TestDestinationEnv testEnv) {
kafka = new KafkaContainer(DockerImageName.parse("confluentinc/cp-kafka:6.1.1"));
kafka.start();
try (var ignored = AdminClient.create(Map.of(
Contributor


Why do we need this block? Could you add a source code comment to help clarify?

Contributor Author


Initially I was creating the topics via the AdminClient, but I've now removed it.


private void sendRecordInTransaction(ProducerRecord<String, JsonNode> record) throws Exception {
try {
producer.beginTransaction();
Contributor


What is the value of creating a transaction for a single record? Isn't that equivalent to having no transactionality?

Contributor Author


Just removed this. Having acks=all and sync=true is enough.
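The point of this exchange: a transaction around one record groups nothing, so the connector relies instead on `acks=all` plus blocking on each send. A minimal sketch of the "block on each send" part follows, assuming only that `KafkaProducer.send(record)` returns a `java.util.concurrent.Future<RecordMetadata>`; the helper `awaitAck` and the timeout value are hypothetical and not taken from the connector.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class SyncSend {

  // Hypothetical helper: makes an asynchronous send synchronous by blocking until the
  // broker acknowledges the write. With acks=all, the acknowledgement arrives only after
  // all in-sync replicas have the record. A failed send surfaces as ExecutionException.
  static <T> T awaitAck(Future<T> pendingSend, long timeoutMs)
      throws InterruptedException, ExecutionException, TimeoutException {
    return pendingSend.get(timeoutMs, TimeUnit.MILLISECONDS);
  }

  public static void main(String[] args) throws Exception {
    // In the connector this would be producer.send(record); an already-completed
    // future stands in for it here so the sketch runs without a broker.
    Future<String> ack = CompletableFuture.completedFuture("topic-0@42");
    System.out.println(awaitAck(ack, 5_000)); // topic-0@42
  }
}
```

With a real producer the call would look like `RecordMetadata meta = awaitAck(producer.send(record), 30_000);`, at the cost of one broker round-trip per record.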


@Override
protected void startTracked() {
Map<AirbyteStreamNameNamespacePair, String> mapped = catalog.getStreams().stream()
Contributor


Could we refactor this logic into a method that returns topicMap, and then write some unit tests for it?

Contributor Author


Done!
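A minimal sketch of the kind of extracted, unit-testable method being asked for: given the catalog's streams and a topic pattern, return the stream-to-topic map. It assumes the pattern uses `{namespace}` and `{stream}` placeholders and uses a hypothetical `StreamKey` record in place of Airbyte's `AirbyteStreamNameNamespacePair`; the merged connector's actual method name and signature may differ.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class TopicMapping {

  // Hypothetical stand-in for AirbyteStreamNameNamespacePair.
  record StreamKey(String namespace, String name) {}

  // Resolves one topic per stream by substituting the placeholders in the pattern.
  // A null namespace is treated as an empty string.
  static Map<StreamKey, String> buildTopicMap(List<StreamKey> streams, String topicPattern) {
    Map<StreamKey, String> topics = new LinkedHashMap<>();
    for (StreamKey stream : streams) {
      String topic = topicPattern
          .replace("{namespace}", Objects.toString(stream.namespace(), ""))
          .replace("{stream}", stream.name());
      topics.put(stream, topic);
    }
    return topics;
  }

  public static void main(String[] args) {
    List<StreamKey> streams = List.of(
        new StreamKey("public", "users"),
        new StreamKey(null, "orders"));
    Map<StreamKey, String> topics = buildTopicMap(streams, "airbyte.{namespace}.{stream}");
    System.out.println(topics.get(new StreamKey("public", "users"))); // airbyte.public.users
    System.out.println(topics.get(new StreamKey(null, "orders")));    // airbyte..orders
  }
}
```

Because the method is a pure function of its inputs, each mapping rule can be pinned down by a plain unit test without a broker. A fuller version might also sanitise characters that Kafka does not allow in topic names (legal characters are ASCII letters, digits, `.`, `_` and `-`).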

final JsonNode value = Jsons.jsonNode(ImmutableMap.of(
JavaBaseConstants.COLUMN_NAME_AB_ID, key,
JavaBaseConstants.COLUMN_NAME_EMITTED_AT, recordMessage.getEmittedAt(),
JavaBaseConstants.COLUMN_NAME_DATA, recordMessage.getData()));
Contributor


If a user is writing to a hardcoded stream, they have no way of knowing which stream a record came from. Should we include the stream name in the output record?

Contributor Author


Good point! I just added it
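For illustration, a sketch of the record value after this change, built as a plain ordered map rather than a Jackson `JsonNode` so it runs without dependencies. The `_airbyte_ab_id`, `_airbyte_emitted_at` and `_airbyte_data` names are assumed to be the values of the `JavaBaseConstants` columns in the snippet above; the `_airbyte_stream` field name for the newly added stream and the `buildValue` helper are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

public class RecordValue {

  // Hypothetical builder for the Kafka record value. Field names mirror the
  // JavaBaseConstants columns; "_airbyte_stream" is an assumed name for the new field.
  static Map<String, Object> buildValue(String id, String stream, long emittedAt,
                                        Map<String, Object> data) {
    Map<String, Object> value = new LinkedHashMap<>();
    value.put("_airbyte_ab_id", id);
    value.put("_airbyte_stream", stream);
    value.put("_airbyte_emitted_at", emittedAt);
    value.put("_airbyte_data", data);
    return value;
  }

  public static void main(String[] args) {
    String key = UUID.randomUUID().toString();
    Map<String, Object> value = buildValue(key, "users", 1_626_912_000_000L, Map.of("id", 1));
    // Consumers reading one shared topic can now tell which stream a record belongs to.
    System.out.println(value.get("_airbyte_stream")); // users
  }
}
```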

@marcosmarxm marcosmarxm changed the title New destination: Kafka 🎉 New Destination: Kafka Jul 9, 2021
@mmolimar
Contributor Author

mmolimar commented Jul 17, 2021

Hey @sherifnada !
I've addressed all your comments and made the changes. Let me know your thoughts ;-)

Contributor

@sherifnada sherifnada left a comment


@mmolimar thanks for the great PR! LGTM - just a few suggestions for docs/wording but I think we can release this pretty soon!

mmolimar and others added 9 commits July 20, 2021 16:10
…tions.yaml

Co-authored-by: Sherif A. Nada <snadalive@gmail.com>
…a/io/airbyte/integrations/destination/kafka/KafkaRecordConsumer.java

Co-authored-by: Sherif A. Nada <snadalive@gmail.com>
Co-authored-by: Sherif A. Nada <snadalive@gmail.com>
Co-authored-by: Sherif A. Nada <snadalive@gmail.com>
@sherifnada sherifnada linked an issue Jul 22, 2021 that may be closed by this pull request
@sherifnada sherifnada merged commit fc3c692 into airbytehq:master Jul 22, 2021
@sherifnada
Contributor

Thank you @mmolimar ! Amazing contribution 🎉 🎉 🎉 🎉 🎉 🎉

Development

Successfully merging this pull request may close these issues.

New Destination: Kafka
6 participants