@ivanyu
Contributor

@ivanyu ivanyu commented Oct 8, 2020

This commit adds a new replication policy for MirrorMaker 2, LegacyReplicationPolicy. This policy imitates the MirrorMaker 1 behavior of not renaming replicated topics. An exception is made for the heartbeats topic, which is replicated according to DefaultReplicationPolicy.

Not renaming topics brings a number of limitations, the most important of which is that replication cycles cannot be detected. This makes cross-replication using LegacyReplicationPolicy effectively impossible. See the LegacyReplicationPolicy Javadoc for details.
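To illustrate the limitation, here is a minimal sketch of why cycle detection depends on renaming. It does not use the real Kafka classes: `NamingPolicy`, `PrefixingPolicy`, `IdentityPolicy`, and `CycleCheck` are hypothetical stand-ins, the heartbeats exception is omitted, and the cycle check is reduced to a single hop.

```java
import java.util.Objects;

// Hypothetical, trimmed-down stand-in for a replication policy (not the real Kafka interface).
interface NamingPolicy {
    String formatRemoteTopic(String sourceAlias, String topic);

    // Returns the source cluster alias encoded in the topic name, or null if none can be derived.
    String topicSource(String topic);
}

// Prefixing policy in the style of DefaultReplicationPolicy: "primary" + "." + "orders".
class PrefixingPolicy implements NamingPolicy {
    public String formatRemoteTopic(String sourceAlias, String topic) {
        return sourceAlias + "." + topic;
    }

    public String topicSource(String topic) {
        int idx = topic.indexOf('.');
        return idx < 0 ? null : topic.substring(0, idx);
    }
}

// Identity policy in the style described for LegacyReplicationPolicy: names are left untouched.
class IdentityPolicy implements NamingPolicy {
    public String formatRemoteTopic(String sourceAlias, String topic) {
        return topic;
    }

    public String topicSource(String topic) {
        return null; // the name carries no information about where the topic came from
    }
}

class CycleCheck {
    // Simplified single-hop check: the topic is a cycle if its name says it came from the target.
    static boolean isCycle(NamingPolicy policy, String topic, String targetAlias) {
        return Objects.equals(policy.topicSource(topic), targetAlias);
    }
}
```

With the prefixing policy, a topic named `primary.orders` on the backup cluster is recognized as a cycle when replicating back to `primary`. With the identity policy the same topic is just `orders`, so the check cannot tell it apart from a local topic and it would be copied back.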

A new method, canTrackSource, is added to ReplicationPolicy. Its result indicates whether the replication policy can trace a topic back to its source topic. It is needed so that detection of target topics works when LegacyReplicationPolicy is used.

On the testing side, the tests follow the same strategy as for DefaultReplicationPolicy, with the necessary adjustments (e.g. active/active replication is not tested).

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@ivanyu ivanyu force-pushed the legacy-replication-policy branch 6 times, most recently from 909c5c2 to 409a20b Compare October 8, 2020 17:59
@ivanyu ivanyu marked this pull request as ready for review October 8, 2020 18:00
@ivanyu ivanyu force-pushed the legacy-replication-policy branch from 409a20b to 160719a Compare October 9, 2020 03:47
|| topic.startsWith(".");
}

/** Checks if the policy can track back to the source of the topic. */
Contributor

I'm not sure what you mean by "track back to the source of the topic". The word "track" might mean a few things here, and it's not obvious what you mean. Can you clarify?

}

/** Checks if the policy can track back to the source of the topic. */
default boolean canTrackSource(String topic) {
Contributor

If a public API change like this is required, you will need to propose a small KIP. I'm unclear why it's required tho, and ideally we would not alter the existing API if possible.

If a new method is required, I think "track" is too ambiguous and should not be used here.

if (isOriginalTopicHeartbeats(topic)) {
return heartbeatTopicReplicationPolicy.topicSource(topic);
} else {
return null;
Contributor

I've seen alternative solutions floating around that use a configurable source here. Basically, the configuration passed to configure() is consulted to find the "source cluster", rather than looking at the topic name. That approach lets you return an actual source here, which obviates the new canTrackSource() method etc.
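A rough sketch of that configurable-source idea, for concreteness. This is not code from any of those alternative solutions: the class name `ConfiguredSourcePolicy` and the config key `replication.policy.source.alias` are invented for illustration, and the class does not implement the real `ReplicationPolicy` interface so that it stays self-contained.

```java
import java.util.Map;

// Hypothetical identity-naming policy whose source comes from configuration, not from topic names.
class ConfiguredSourcePolicy {
    // Hypothetical config key, made up for this sketch.
    static final String SOURCE_ALIAS_CONFIG = "replication.policy.source.alias";

    private String sourceAlias;

    // Reads the source cluster alias from the supplied configuration; missing or empty means "none".
    public void configure(Map<String, ?> props) {
        Object value = props.get(SOURCE_ALIAS_CONFIG);
        sourceAlias = (value == null || value.toString().isEmpty()) ? null : value.toString();
    }

    // Topics keep their original names on the target cluster.
    public String formatRemoteTopic(String sourceClusterAlias, String topic) {
        return topic;
    }

    // The answer is the configured alias, regardless of the topic name.
    public String topicSource(String topic) {
        return sourceAlias;
    }
}
```

Because `topicSource` no longer returns `null` on the configured side, callers get an actual source alias without a separate capability method.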

Contributor Author

I've explored this possibility, too. The main problem with it is that the replication policy should answer differently for source and target clusters. It's essential for methods like MirrorSourceConnector.isCycle and MirrorClient.remoteTopics. For a source, topicSource should return null; for a target, a predefined value.

That leaves two possibilities. In the first, we set up two different replication policy instances with different configurations, e.g.:

replication.policy.source.class=org.apache.kafka.connect.mirror.LegacyReplicationPolicy
replication.policy.source.source=
replication.policy.target.class=org.apache.kafka.connect.mirror.LegacyReplicationPolicy
replication.policy.target.source=primary-cluster

Of course, we can make sure that existing configurations keep working as before.

Another possibility is to modify the ReplicationPolicy interface to allow it to pass additional information out (like canTrackSource or similar) or in (like topicSource(String topic, boolean isSourceCluster)).

What do you think would be the best approach?
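For comparison, the two interface-level options mentioned above (passing information "out" or "in") could look roughly like the sketch below. `SketchPolicy` and `IdentitySketchPolicy` are hypothetical names, the default method bodies are assumptions rather than the code in this PR, and the real `ReplicationPolicy` interface has more methods than shown.

```java
// Hypothetical interface for illustration only; not the real ReplicationPolicy.
interface SketchPolicy {
    String topicSource(String topic);

    // "Out" option: the policy reports whether topic names carry source information.
    // Defaulting to true is an assumption made for this sketch.
    default boolean canTrackSource(String topic) {
        return true;
    }

    // "In" option: the caller states which side of the replication flow it is asking about.
    default String topicSource(String topic, boolean isSourceCluster) {
        return topicSource(topic);
    }
}

// Identity-naming policy that answers differently for the source and target clusters.
class IdentitySketchPolicy implements SketchPolicy {
    private final String configuredSource;

    IdentitySketchPolicy(String configuredSource) {
        this.configuredSource = configuredSource;
    }

    @Override
    public String topicSource(String topic) {
        return null; // nothing can be derived from the topic name
    }

    @Override
    public boolean canTrackSource(String topic) {
        return false;
    }

    @Override
    public String topicSource(String topic, boolean isSourceCluster) {
        // null on the source side, the predefined alias on the target side
        return isSourceCluster ? null : configuredSource;
    }
}
```

Both options are default methods, so existing policy implementations would compile unchanged; the difference is whether callers branch on a capability flag or pass their own context to the policy.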

@mdedetrich
Contributor

@ivanyu Can you close this since a new PR has been created that is up to date with the latest trunk?

@ivanyu
Contributor Author

ivanyu commented May 7, 2021

Yep, I'm closing this one

@ivanyu ivanyu closed this May 7, 2021
@ivanyu ivanyu deleted the legacy-replication-policy branch May 22, 2025 18:33