added data cache in receiver #608

Merged
merged 1 commit into from
Aug 12, 2021

Conversation

nyaghma
Contributor

@nyaghma nyaghma commented Jul 28, 2021

In some situations, Spark re-executes a stream multiple times (e.g. when multiple actions or writers use the same reader stream). This results in multiple receivers using the same consumer group and partition combination, which goes against the connector's best-practice guidelines. To avoid this, this PR adds a data cache to the cachedReceiver class that keeps the data received in the latest batch so it can be reused if Spark re-executes the stream.
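
To illustrate the idea, here is a minimal sketch in Python rather than the connector's own code. The `CachedReceiver` class, its `receive` method, the `fetch` callback, and the `(start_seq, count)` cache key are all hypothetical names chosen for this example. The actual change in cachedReceiver may key and store the batch differently.

```python
from typing import Callable, List, Optional, Tuple


class CachedReceiver:
    """Hypothetical per-partition receiver that remembers its latest batch."""

    def __init__(self, fetch: Callable[[int, int], List[str]]) -> None:
        # Assumption: fetch(start_seq, count) reads `count` events from the service.
        self._fetch = fetch
        self._cached_key: Optional[Tuple[int, int]] = None
        self._cached_batch: List[str] = []
        self.fetch_calls = 0  # counts real reads, for illustration only

    def receive(self, start_seq: int, count: int) -> List[str]:
        key = (start_seq, count)
        if key == self._cached_key:
            # Same batch requested again (e.g. the stream was re-executed):
            # serve it from the cache instead of reading from the service.
            return list(self._cached_batch)
        self.fetch_calls += 1
        batch = self._fetch(start_seq, count)
        # Only the latest batch is kept, so a new request replaces the old entry.
        self._cached_key = key
        self._cached_batch = list(batch)
        return list(batch)


def fake_fetch(start_seq: int, count: int) -> List[str]:
    # Stand-in for the real service call.
    return [f"event-{i}" for i in range(start_seq, start_seq + count)]


receiver = CachedReceiver(fake_fetch)
first = receiver.receive(0, 3)
second = receiver.receive(0, 3)  # re-execution of the same batch
third = receiver.receive(3, 2)   # next batch replaces the cached one
```

In this sketch the second call for the same range is served from memory, so the underlying read happens once per distinct batch instead of once per execution.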

@nyaghma nyaghma requested a review from sjkwak July 28, 2021 22:01
@sjkwak sjkwak merged commit 6b1b0eb into master Aug 12, 2021
2 participants