Add MockResolverIT#replace_cluster_test() #335

Merged
merged 4 commits into from
Sep 6, 2024

Conversation

Collaborator

@Bouncheck Bouncheck commented Sep 2, 2024

Adds MockResolverIT#replace_cluster_test(), a test method that runs the replace-cluster scenario and checks whether the driver manages to reconnect.

In the test we create a three-node cluster and replace it with a completely new one. The hostname is mocked and points to all 3 nodes.
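To illustrate what "the hostname is mocked and points to all 3 nodes" can mean, here is a minimal sketch of an in-memory resolver. `StaticHostResolver`, the hostname `test.cluster.fake`, and the loopback addresses are hypothetical names chosen for illustration; this is not the driver's `MockResolverFactory`.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory resolver: maps a hostname to a fixed list of addresses.
final class StaticHostResolver {
  private final Map<String, List<InetAddress>> records = new HashMap<>();

  // Points the hostname at the given addresses, replacing any previous record.
  void put(String hostname, InetAddress... addresses) {
    records.put(hostname, Arrays.asList(addresses));
  }

  // Returns the recorded addresses, or fails like a real lookup would.
  List<InetAddress> resolve(String hostname) throws UnknownHostException {
    List<InetAddress> found = records.get(hostname);
    if (found == null) {
      throw new UnknownHostException(hostname);
    }
    return found;
  }

  // Builds an address from raw octets, so no real DNS lookup happens.
  static InetAddress ip(int a, int b, int c, int d) throws UnknownHostException {
    return InetAddress.getByAddress(new byte[] {(byte) a, (byte) b, (byte) c, (byte) d});
  }
}

final class StaticHostResolverDemo {
  public static void main(String[] args) throws UnknownHostException {
    StaticHostResolver resolver = new StaticHostResolver();
    // Old cluster: one hostname, three nodes.
    resolver.put("test.cluster.fake",
        StaticHostResolver.ip(127, 0, 1, 1),
        StaticHostResolver.ip(127, 0, 1, 2),
        StaticHostResolver.ip(127, 0, 1, 3));
    // Replacement cluster: same hostname, three new addresses.
    resolver.put("test.cluster.fake",
        StaticHostResolver.ip(127, 0, 1, 11),
        StaticHostResolver.ip(127, 0, 1, 12),
        StaticHostResolver.ip(127, 0, 1, 13));
    System.out.println(resolver.resolve("test.cluster.fake"));
  }
}
```

Re-pointing the same hostname at new addresses is what lets a client that only knows the hostname find the replacement cluster.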

getNodeInetAddress(ccmBridge, 2),
getNodeInetAddress(ccmBridge, 3)
}));
ResolverProvider.setDefaultResolverFactory(mockResolverFactory);
Collaborator

We need to make sure that only one instance of MockResolverFactory is created and registered,
the same way we do for CCM_RULE.
I would recommend adding a method such as CreateAndRegisterMockResolverFactory that does that.

We also need to move the MockResolverFactory initialization to the class level.

Collaborator Author

@Bouncheck Bouncheck Sep 3, 2024

I'm not sure I understand the intent here. Don't we want different test methods in this class that test different scenarios? In that case we should be able to set a different resolver for each test method.

And for the scenario where we kill the old cluster and start a new one with different IPs, don't we want the ability to point the resolver at the new IPs too? I think this change would prevent that.

Collaborator

@dkropachev dkropachev Sep 3, 2024

> I'm not sure I understand the intent here. Don't we want different test methods in this class that test different scenarios? In that case we should be able to set a different resolver for each test method.

Yes, we do.
But we don't need different resolvers: each test can work with a resolver that targets only the DNS records relevant to that test.

> And for the scenario where we kill the old cluster and start a new one with different IPs, don't we want the ability to point the resolver at the new IPs too? I think this change would prevent that.

How so? I don't see anything that would block you from doing it.

ResolverProvider is a global factory: when you call ResolverProvider.setDefaultResolverFactory, it changes resolverFactory globally.
If you do that twice, the second call overrides the first factory, but the test that registered the first one won't be aware of that and will fail; you will then have to debug that test to figure out what happened.
Not only that: if some classes were already initialized with the old resolver factory, you may end up with half of the classes using the old factory and half using the new one, which is going to create lots of confusion.
To catch this issue early, we need ResolverFactory to throw an exception when this happens.
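A minimal sketch of the fail-fast behaviour requested above, assuming a registry that accepts exactly one factory. `SingleRegistration` and `NameResolverFactory` are hypothetical stand-ins, not the driver's `ResolverProvider` or `ResolverFactory`, and the sketch is instance-based only so it can be exercised in isolation.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a resolver factory; not the driver's interface.
interface NameResolverFactory {
  String describe();
}

// Hypothetical registry that allows exactly one factory registration.
final class SingleRegistration {
  private final AtomicReference<NameResolverFactory> factory = new AtomicReference<>();

  // Registers the factory, or throws if another one was registered before.
  void setDefaultResolverFactory(NameResolverFactory candidate) {
    if (candidate == null) {
      throw new NullPointerException("candidate");
    }
    // compareAndSet succeeds only for the first caller.
    if (!factory.compareAndSet(null, candidate)) {
      throw new IllegalStateException(
          "Resolver factory already registered: " + factory.get().describe());
    }
  }

  NameResolverFactory current() {
    return factory.get();
  }
}

final class SingleRegistrationDemo {
  public static void main(String[] args) {
    SingleRegistration registry = new SingleRegistration();
    registry.setDefaultResolverFactory(() -> "mock-A");
    try {
      registry.setDefaultResolverFactory(() -> "mock-B");
    } catch (IllegalStateException e) {
      // prints "rejected: Resolver factory already registered: mock-A"
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

With this shape, a second registration surfaces immediately at the call site instead of as an unrelated test failure later.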

Collaborator Author

@Bouncheck Bouncheck Sep 4, 2024

I believe it's now as we discussed.

@Bouncheck Bouncheck force-pushed the scylla-4.x-215-test-3node branch 2 times, most recently from 4c6e0b1 to 6281ff1 Compare September 3, 2024 17:17
Collaborator Author

Bouncheck commented Sep 3, 2024

I'll squash the review-adjustments commit into the others before merge.
Actually, we probably don't want to merge this until reconnection to the new cluster works, which means fixing the unresolved socket getting overwritten.

@Bouncheck Bouncheck force-pushed the scylla-4.x-215-test-3node branch 2 times, most recently from 5a302d9 to 18ac7ff Compare September 4, 2024 11:58
@Bouncheck Bouncheck mentioned this pull request Sep 4, 2024
@Bouncheck Bouncheck marked this pull request as ready for review September 4, 2024 15:48
@@ -16,7 +19,8 @@ public class ResolverProvider {
  * @param clazz Class that is requesting the {@link Resolver}.
  * @return new {@link Resolver}.
  */
-public static Resolver getResolver(Class<?> clazz) {
+public static synchronized Resolver getResolver(Class<?> clazz) {
Collaborator

To make it work correctly in 100% of cases you'd better use a ReadWriteLock; synchronized does not solve any problem here.
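For reference, the ReadWriteLock variant suggested here could look roughly like the sketch below. `RwLockedProvider` is a hypothetical class and a `Supplier<String>` stands in for the real factory type; the thread does not show this variant being adopted.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Hypothetical provider guarded by a read/write lock.
final class RwLockedProvider {
  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  // Stand-in for a real factory object; guarded by the lock.
  private Supplier<String> factory = () -> "default";

  // Many readers may hold the read lock at the same time.
  String getResolver() {
    lock.readLock().lock();
    try {
      return factory.get();
    } finally {
      lock.readLock().unlock();
    }
  }

  // The write lock excludes all readers while the factory is swapped.
  void setDefaultResolverFactory(Supplier<String> next) {
    lock.writeLock().lock();
    try {
      factory = next;
    } finally {
      lock.writeLock().unlock();
    }
  }
}

final class RwLockedProviderDemo {
  public static void main(String[] args) {
    RwLockedProvider provider = new RwLockedProvider();
    System.out.println(provider.getResolver()); // prints "default"
    provider.setDefaultResolverFactory(() -> "mock");
    System.out.println(provider.getResolver()); // prints "mock"
  }
}
```

Compared with marking both methods synchronized, this allows concurrent reads, at the cost of more code and explicit lock handling.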

Collaborator Author

I think it solves the problem I described. With synchronized, getResolver cannot race with setDefaultResolverFactory, and it's simpler.

Collaborator Author

Actually, with synchronized I think we can use normal booleans instead.

Collaborator

> I think it solves the problem I described. With synchronized, getResolver cannot race with setDefaultResolverFactory, and it's simpler.

I am pretty sure that is not the case. Can you write a test to check it?

Collaborator

You are correct; let's convert all of them to regular attributes.
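A minimal sketch of the agreed direction, assuming both accessors are synchronized on the same monitor so that plain boolean fields are enough. The class name and the flag names `alreadySet` and `alreadyInUse` are hypothetical, and a `String` stands in for the factory; the merged code may differ.

```java
// Hypothetical provider: both methods share one monitor, so plain fields suffice.
final class MonitorGuardedProvider {
  private String factoryName = "default"; // stand-in for a real factory object
  private boolean alreadySet = false;      // hypothetical flag, guarded by the monitor
  private boolean alreadyInUse = false;    // hypothetical flag, guarded by the monitor

  // Hands out a resolver description and records that the factory is in use.
  synchronized String getResolver(Class<?> clazz) {
    alreadyInUse = true;
    return factoryName + ":" + clazz.getSimpleName();
  }

  // Swaps the factory only if nothing was set or handed out before.
  synchronized void setDefaultResolverFactory(String newFactoryName) {
    if (alreadySet || alreadyInUse) {
      throw new IllegalStateException("factory already set or in use");
    }
    factoryName = newFactoryName;
    alreadySet = true;
  }
}

final class MonitorGuardedProviderDemo {
  public static void main(String[] args) {
    MonitorGuardedProvider provider = new MonitorGuardedProvider();
    provider.setDefaultResolverFactory("mock");
    System.out.println(provider.getResolver(String.class)); // prints "mock:String"
    try {
      provider.setDefaultResolverFactory("other");
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Because every read and write of the flags happens while holding the same monitor, no atomic types are needed for visibility or mutual exclusion.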

Collaborator Author

done

@Bouncheck Bouncheck force-pushed the scylla-4.x-215-test-3node branch 2 times, most recently from 3b40118 to 1d8f6f2 Compare September 5, 2024 14:23
@dkropachev
Collaborator

@Bouncheck, can you please address the test failures in CI/CD?

@Bouncheck
Collaborator Author

Created #340. It should solve the 6.1.1 failures.

should_connect_with_mocked_hostname() will use CcmBridge.Builder builder
instead.
@Bouncheck Bouncheck force-pushed the scylla-4.x-215-test-3node branch 3 times, most recently from 55e1557 to 77bca82 Compare September 6, 2024 14:05
Adds another method that runs a scenario in which we replace a three-node
cluster with a completely new three-node cluster. This method is run 20 times
by the `run_replace_test_20_times()` test method.
Adds "--wait-other-notice", "--wait-for-binary-proto" if missing.
@dkropachev
Collaborator

@Bouncheck, is it ready?

@Bouncheck
Collaborator Author

Yes, but it was a little flaky with Cassandra. Should I reduce the number of nodes to 2?

@dkropachev
Collaborator

> Yes, but it was a little flaky with Cassandra. Should I reduce the number of nodes to 2?

I see. It is fine; let's address it later.

@dkropachev dkropachev merged commit 275bade into scylladb:scylla-4.x Sep 6, 2024
11 of 12 checks passed
@Bouncheck Bouncheck self-assigned this Sep 9, 2024