Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-31236][DSTREAMS][Kinesis] KCL 2 support added to solve few ongoing issue with KCL 1 implementation #28581

Closed
wants to merge 4 commits into from

Conversation

tprabh509
Copy link

What changes were proposed in this pull request?

Currently KCL 1 is no longer supported by AWS and they already moved to KCL 2. Since Spark kinesis asl library user KCL 1 it can cause issues reported by AWS for KCL 1. awslabs/amazon-kinesis-client#391
The issue I have already reported in JIRA
https://issues.apache.org/jira/browse/SPARK-31236

Why are the changes needed?

Added KCL 2 support. Did not remove KCL 1. With current KCL 1 implementation the user can run into few limitation

  1. Application cannot use kinesis direct end. We can use custom URL with KCL 2
  2. dynamoDB proxy cannot be used. With KCL 2 added support for dynamoDB proxy
  3. cloud watch direct endpoint cannot be used. We can use custom URL
    Application cannot run without internet connection or firewall restrictions.

Does this PR introduce any user-facing change?

Added KCL 2 support. Did not remove KCL 1. With current KCL 1 implementation the user can run into few limitation

  1. Application cannot use kinesis direct end.
  2. dynamoDB proxy cannot be used
  3. cloud watch direct endpoint cannot be used
    Application cannot run without internet connection or firewall restrictions.

How was this patch tested?

Tested with our application and test client updated with KCL 2 testing.

import org.apache.spark.streaming.kinesis2.KinesisInputDStream;
import org.apache.spark.streaming.kinesis2.SparkAWSCredentials;

SparkAWSCredentials credentials = SparkAWSCredentials.builder().basicCredentials(awsKey, awsSecret).build();

URI uri = new URI(endpointURL);
URI cloudWatchURI = new URI(cloudWatchURL);

InitialPositionInStream initPosition = InitialPositionInStream.TRIM_HORIZON;

KinesisInputDStream<byte[]> kinStream =KinesisInputDStream.builder()
.streamingContext(jssc)
.checkpointAppName(applicationName)
.streamName(streamName)
.regionName(regionName)
.endpointUrl(uri)
.cloudWatchUrl(cloudWatchURI)
.kinesisCreds(credentials)
.dynamoDBCreds(credentials)
.maxRecords(maxRecords)
.protocol(httpProtocol)
.initialPositionInStream(initPosition)
.cloudWatchCreds(credentials)
.dynamoProxyHost(proxyHost)
.dynamoProxyPort(proxyPort)
.checkpointInterval(checkpointInterval)
.storageLevel(StorageLevel.MEMORY_AND_DISK_2()).build();

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@tprabh509 tprabh509 changed the title KCL 2 support added to solve few ongoing issue with KCL 1 implementation [SPARK-31236][DSTREAMS][Kinesis] KCL 2 support added to solve few ongoing issue with KCL 1 implementation May 19, 2020
val initialPositionInStream: Option[InitialPositionInStream],
val dynamoProxyHost: Option[String],
val dynamoProxyPort: Option[Integer],
val _storageLevel: StorageLevel,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why have underscore?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was copied from KCL 1 parameter list. Anyway fixes for new version. _ is removed.

@github-actions
Copy link

github-actions bot commented Sep 4, 2020

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Sep 4, 2020
@pthukar
Copy link

pthukar commented Sep 4, 2020

Hi. This PR is required since KCL 1 implementation is already out of support from AWS and also there are already known issues reported in AWS support site. Our application already faced issue using current API implementation so we moved with implementation of KCL 2. This pull request useful and needs to be merged so others teams can make use of this implementation.

@github-actions github-actions bot closed this Sep 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants