Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perform load balancing when communicating with a cluster of Alternators #132

Closed
wants to merge 5 commits into from

Conversation

julienrf
Copy link
Collaborator

@julienrf julienrf commented May 1, 2024

Customize the emr-dynamodb-connector to configure the DynamoDB client to use the load balancing request handler from https://github.com/scylladb/alternator-load-balancing.

Unfortunately, the project emr-dynamodb-connector has not been designed to support customization of the underlying DynamoDB client. Therefore, I had to copy-paste code from the project and change it to customize the underlying DynamoDB client. This led to creating a parallel class hierarchy co-existing with the existing one, but with names all prefixed by LoadBalanced.

The PR should be reviewed commit-by-commit and ignoring the content of 424141b and 2b7f076 which copy-paste verbatim content from emr-dynamodb-connector.

Blocked by scylladb/alternator-load-balancing#18
Fixes #117

* @param item The item retrieved from DynamoDB
* @param toValue Where the converted item should be stored
*/
protected abstract void convertDynamoDBItemToValue(DynamoDBItemWritable item, V toValue);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what other classes implemented this?
Aren't we loosing support for other key types than Text ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only implementation in emr-dynamodb-connector was DefaultDynamoDBRecordReader, which also fixes the key type to Text. Maybe there are other implementations in the wild…

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to focus just on what is in emr-dynamodb-connector

@@ -38,26 +39,23 @@
* convertDynamoDBItemToValue with a conversion from the DynamoDB item to the desired value type. As
* an alternative, you can also use DefaultDynamoDBRecordReader, and do the conversion to a
* DynamoDBItemWritable inside of the reducer.
*
* @param <K> The type of Key that will be passed to the Mapper
* @param <V> The type of Value that will be passed to the Mapper
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand you are simplifying, but I don't get how only Text can be the key
(I didn't check DynamoDBItemWritable but I assume it's an interface to most data types value can have in Dynamo)

Copy link
Collaborator Author

@julienrf julienrf May 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hadoop requires a key, but here in emr-dynamodb-connector they always use an empty Text instance as a key. So, it is not really used. DynamoDBItemWritable is a class that only contains a Map<String, AttributeValue> (where AttributeValue comes from the AWS Java SDK and models a column in a DynamoDB row).

@julienrf
Copy link
Collaborator Author

julienrf commented May 6, 2024

Unfortunately, the PR contains a lot of changes, but in my opinion this is still the best approach to go for now.

I’ve also considered using reflection-based approaches to access the private DynamoDB client of the Hadoop job to configure it to supply our custom request handler, but this is not simple to achieve because we don’t have access to the instance that holds that private DynamoDB client. This means that we would have to instrument the bytecode to impact the places that create and configure the DynamoDB client. This could be achieved with tools like AspectJ or Java Instrumentation, but in my opinion using those tools would not make our maintenance work simpler because they require some specific expertise.

On the other hand, with simple changes in emr-dynamodb-connector, we could easily achieve our goal. For instance, if awslabs/emr-dynamodb-connector#196 is merged, we will be able to set up our custom request handler like so:

// in DynamoUtils.setDynamoDBJobConf
jobConf.set(DynamoDBConstants.CUSTOM_CLIENT_BUILDER_TRANSFORMER, "WithLoadBalancing")
jobConf.set("scylladb.alternator.endpoint", "https://127.0.0.1:8043")

// Where the class WithLoadBalancing would be defined as follows
class WithLoadBalancing extends DynamoDbClientBuilderTransformer with Configurable {
  private var conf: Configuration = null
  def apply(builder: DynamoDbClientBuilder): DynamoDbClientBuilder =
    builder.endpointProvider(
      new AlternatorEndpointProvider(URI.create(conf.get("scylladb.alternator.endpoint")))
    )
  def setConf(conf: Configuration): Unit = {
    this.conf = conf
  }
  def getConf: Configuration = this.conf
}

(note that we would also have to upgrade to v2 of AWS SDK to use the latest version of the connector)

@tarzanek
Copy link
Contributor

tarzanek commented May 6, 2024

we have no power over AWS code, so if they will accept it or not is a dice roll ;-)
so we can worst case go temporarily with this big fork and then cleanup once AWS merges and releases your PR ?

@julienrf
Copy link
Collaborator Author

Superseded by #173

@julienrf julienrf closed this Jul 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

support load balancing for alternator access when doing migration for DynamoDB
2 participants