[Bug]: CCR plugin remoteIntegTest are failing in deb and rpm distribution due to multiple clusters are not forming #4610

Open
nisgoel-amazon opened this issue Apr 8, 2024 · 8 comments
Labels
bug Something isn't working

@nisgoel-amazon

### Describe the bug

In the Cross Cluster Replication (CCR) plugin, the remoteIntegTest tests have been failing from the 2.12 release onwards. During release activity, these tests fail with java.net.ConnectException: Connection refused. The tests need multiple clusters, but the pre-test setup creates only one cluster at a time. The logs show that when the 2nd cluster is coming up, opensearch-build removes the previously created cluster.
https://build.ci.opensearch.org/blue/rest/organizations/jenkins/pipelines/integ-test/runs/7981/nodes/122/steps/765/log/?start=0

In the log above, after the 1st cluster returns 200 and before the 2nd cluster is created, the pre-removal script of the Debian distribution removes the 1st cluster.
The log file prints the following lines:

Removing opensearch (2.13.0) ...
Running OpenSearch Pre-Removal Script
Stop existing opensearch.service
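
To check a saved copy of the console log for this pattern, a minimal sketch along the following lines might help. It only searches for the two marker lines quoted above; the helper name `find_package_removals` is hypothetical and is not part of opensearch-build.

```python
import re

# Marker strings copied from the log excerpt above.
REMOVAL_MARKER = "Running OpenSearch Pre-Removal Script"
REMOVING_RE = re.compile(r"^Removing opensearch \((?P<version>[^)]+)\)")


def find_package_removals(log_text):
    """Return (line_number, version) for each package removal that is
    followed within two lines by the pre-removal script marker.

    Hypothetical helper for illustration only.
    """
    lines = log_text.splitlines()
    hits = []
    for idx, line in enumerate(lines):
        match = REMOVING_RE.match(line.strip())
        if not match:
            continue
        # Inspect the next two lines for the pre-removal marker.
        following = lines[idx + 1: idx + 3]
        if any(REMOVAL_MARKER in nxt for nxt in following):
            hits.append((idx + 1, match.group("version")))
    return hits
```

If this returns a hit between the startup of the 1st and 2nd cluster, the 1st cluster was most likely stopped by the package removal rather than by the test itself.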

### To reproduce

We can reproduce this by running the following command:

./test.sh integ-test manifests/2.13.0/opensearch-2.13.0-test.yml --paths opensearch=https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/2.13.0/latest/linux/x64/deb --component cross-cluster-replication

### Expected behavior

The above command should create all the required clusters before running the cross-cluster integration tests.

### Screenshots

If applicable, add screenshots to help explain your problem.

### Host / Environment

_No response_

### Additional context

_No response_

### Relevant log output

_No response_
@nisgoel-amazon nisgoel-amazon added bug Something isn't working untriaged Issues that have not yet been triaged labels Apr 8, 2024
@nisgoel-amazon nisgoel-amazon changed the title [Bug]: CCR plugin remoteIntegTest are failing in deb and rpm distribution due to multi cluster are not forming [Bug]: CCR plugin remoteIntegTest are failing in deb and rpm distribution due to multiple clusters are not forming Apr 8, 2024
@bbarani
Member

bbarani commented Apr 8, 2024

Thanks for opening an issue. We will look into it when we have bandwidth. CCR is a unique use case, so I would appreciate it if your team could contribute the fix as before. We prioritize closing gaps for generic use cases but need your team's support to close specialized ones. Let us know if you need any help.

@peterzhuamazon
Member

After discussion with Nandan Kumar, we will go ahead and ignore the CCR test for deb and rpm; he will raise a PR.

@peterzhuamazon
Member

Hi @nisgoel-amazon, is there any progress on getting CCR testing on remote clusters to work for deb and rpm?

Thanks.

@nisgoel-amazon
Author

@peterzhuamazon This needs an infra-side change. We need help from the infra team to understand why multiple clusters are not coming up on the same node in deb and rpm.
We have analysed why the CCR repo tests are failing on deb and rpm. Can you help us scope down the effort for this issue?

Then I think @ankitkala can align someone to pick up the change.

@nisgoel-amazon
Author

@peterzhuamazon can you confirm one thing: as of today, can we create a multi-node cluster on the same node in deb and rpm? That is, can ES processes run on different ports to form a cluster on a single node in deb and rpm?
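
As a minimal sketch of how the same-host scenario could be probed, the snippet below checks whether anything is listening on each expected port. A refused connection here corresponds to the Connection refused error seen in the tests. The ports 9200 and 9201 mentioned in the comments are assumptions for illustration, not values taken from opensearch-build.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError and timeouts.
        return False


def check_endpoints(endpoints):
    """Map "host:port" to whether the port accepts connections.

    Example input (assumed ports, for illustration only):
    [("localhost", 9200), ("localhost", 9201)]
    """
    return {f"{host}:{port}": is_port_open(host, port) for host, port in endpoints}
```

If both entries come back True after setup, two processes are listening on the same host; if the first flips to False once the second install starts, the first instance was stopped.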

@peterzhuamazon
Member

Not unless you heavily modify the existing deb/rpm package; you can't run multiple instances of it on a single host. You have to run them on multiple hosts, which would probably require a CDK to set things up just for CCR on deb/rpm.

If you modify the package, it defeats the purpose of integTest, because you would be testing something that the customer will not use in the same way.

@nisgoel-amazon
Author

No, that would not defeat the purpose of the integ test, as we need 2 clusters to run the CCR plugin. It doesn't matter whether we run the 2 clusters on different hosts or configure them on different ports on the same host.

We do the same thing in the win and tar distributions too, and that serves our purpose.

Can you suggest how we can set up CDK to run CCR on deb/rpm?

@peterzhuamazon
Member

peterzhuamazon commented Sep 17, 2024

You misunderstand. Our current integTest framework specifically runs every test on 1 host, which you cannot do for CCR on deb and rpm.

If you want it to work for CCR, you have to:

  1. Either heavily modify the deb/rpm pkg so that multiple instances can run on the same host
  2. Or implement multi-host in our integTest architecture. Using CDK is just an example, since our opensearch-build code is designed to run on a single host.
  3. Or create a specific Jenkins workflow and modify opensearch-build so that you can deploy to multiple Jenkins agents/containers, retrieve the IP of each agent/container, and test remotely.

The reason I suggest CDK is that it makes it easy to retrieve separate host IPs, so you can run the test remotely. I am still not sure what changes would be needed to make this happen, as the CCR team has more expertise in how the CCR tests work.
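
To make the multi-host idea a bit more concrete, here is a rough sketch of a readiness check that takes one IP per cluster and queries the `_cluster/health` API on each before the tests start. The host names, IPs, port 9200, and plain HTTP are all assumptions for illustration; a distribution with the security plugin enabled would need HTTPS and credentials, and none of this reflects how opensearch-build is actually wired.

```python
import json
import urllib.request


def cluster_health_url(host, port=9200, scheme="http"):
    """Build the _cluster/health URL for one host (port and scheme are assumed)."""
    return f"{scheme}://{host}:{port}/_cluster/health"


def fetch_status(url, timeout=5.0):
    """Return the "status" field ("green", "yellow", "red") from the health API."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp).get("status")


def all_clusters_ready(hosts, fetch=fetch_status):
    """Check every cluster in `hosts` (name -> IP) and report readiness.

    Returns (ready, statuses); a cluster that cannot be reached gets None.
    """
    statuses = {}
    for name, host in hosts.items():
        try:
            statuses[name] = fetch(cluster_health_url(host))
        except OSError:
            # URLError and ConnectionRefusedError are both OSError subclasses.
            statuses[name] = None
    ready = bool(statuses) and all(s in ("green", "yellow") for s in statuses.values())
    return ready, statuses
```

A test driver could call `all_clusters_ready({"leader": leader_ip, "follower": follower_ip})` with the IPs retrieved from the agents/containers and only start the CCR tests once it reports ready.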

Happy to discuss this further on a call.

Thanks.

Projects
Status: Now(This Quarter)
Development

No branches or pull requests

3 participants