Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support multinode cluster logging #4097

Merged
merged 3 commits into from
Oct 16, 2023

Conversation

monusingh-1
Copy link
Contributor

@monusingh-1 monusingh-1 commented Oct 3, 2023

Description

Allow logging of multiple nodes if a component uses multi node configuration to run integtest.

Fixes following when there are multiple clusters being created for integ test run:
FileExistsError: [Errno 17] File exists: '/var/jenkins/workspace/integ-test@4/test-results/5327/integ-test/cross-cluster-replication/with-security/local-cluster-logs/opensearch-service-logs'

Issues Resolved

List any issues this PR will resolve, e.g. Closes [...].

#3448

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: monusingh-1 <msnghgw@amazon.com>
@monusingh-1 monusingh-1 marked this pull request as draft October 3, 2023 11:46
@monusingh-1 monusingh-1 marked this pull request as ready for review October 9, 2023 18:57
@monusingh-1
Copy link
Contributor Author

Will work on fixing integration tests

Signed-off-by: monusingh-1 <msnghgw@amazon.com>
@peterzhuamazon
Copy link
Member

Hi @monusingh-1 could you add some quick log or list of dir to showcase this function work in action?

Thanks.

@monusingh-1
Copy link
Contributor Author

2023-10-10 17:14:14 INFO     ===============================================
2023-10-10 17:14:14 INFO     Running integration tests for cross-cluster-replication with-security
2023-10-10 17:14:14 INFO     ===============================================
2023-10-10 17:14:14 INFO     Executing "ls" in /tmp/tmpo7i2z2u7/cross-cluster-replication
2023-10-10 17:14:14 INFO     Sending SIGKILL to PID 145104
2023-10-10 17:14:14 INFO     Process killed with exit code None
2023-10-10 17:14:14 INFO     Recording local cluster logs for cross-cluster-replication with test configuration as with-security at /home/ec2-user/monu_test/os_build/test-results/b5334a012b9646618ccd4fe9dbd345eb/integ-test/cross-cluster-replication/with-security/local-cluster-logs/id-0
2023-10-10 17:14:14 INFO     Cleanup /tmp/tmpo7i2z2u7/1/local-test-cluster/opensearch-2.10.0 content after the test
2023-10-10 17:14:15 INFO     Sending SIGKILL to PID 145500
2023-10-10 17:14:15 INFO     Process killed with exit code None
2023-10-10 17:14:15 INFO     Recording local cluster logs for cross-cluster-replication with test configuration as with-security at /home/ec2-user/monu_test/os_build/test-results/b5334a012b9646618ccd4fe9dbd345eb/integ-test/cross-cluster-replication/with-security/local-cluster-logs/id-1
2023-10-10 17:14:15 INFO     Cleanup /tmp/tmpo7i2z2u7/2/local-test-cluster/opensearch-2.10.0 content after the test

2023-10-10 17:15:35 INFO     ===============================================
2023-10-10 17:15:35 INFO     Running integration tests for cross-cluster-replication without-security
2023-10-10 17:15:35 INFO     ===============================================
2023-10-10 17:15:35 INFO     Executing "ls" in /tmp/tmpo7i2z2u7/cross-cluster-replication
2023-10-10 17:15:36 INFO     Sending SIGKILL to PID 145904
2023-10-10 17:15:36 INFO     Process killed with exit code None
2023-10-10 17:15:36 INFO     Recording local cluster logs for cross-cluster-replication with test configuration as without-security at /home/ec2-user/monu_test/os_build/test-results/b5334a012b9646618ccd4fe9dbd345eb/integ-test/cross-cluster-replication/without-security/local-cluster-logs/id-2
2023-10-10 17:15:36 INFO     Cleanup /tmp/tmpo7i2z2u7/1/local-test-cluster/opensearch-2.10.0 content after the test
2023-10-10 17:15:36 INFO     Sending SIGKILL to PID 146204
2023-10-10 17:15:36 INFO     Process killed with exit code None
2023-10-10 17:15:36 INFO     Recording local cluster logs for cross-cluster-replication with test configuration as without-security at /home/ec2-user/monu_test/os_build/test-results/b5334a012b9646618ccd4fe9dbd345eb/integ-test/cross-cluster-replication/without-security/local-cluster-logs/id-3
2023-10-10 17:15:36 INFO     Cleanup /tmp/tmpo7i2z2u7/2/local-test-cluster/opensearch-2.10.0 content after the test

Comment on lines +102 to +103
dest_directory = os.path.join(base, "local-cluster-logs/id-" + str(self.number_of_nodes))
self.number_of_nodes += 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with this design as per your log, is that once with-security test complete, the id will not reset and result in id-2/id-3 in without-security dir.

Would suggest reset the number after each run. So it will results in this:

with-security/id-0 id-1
without-security/id-0 id-1

instead of:

with-security/id-0 id-1
without-security/id-2 id-3

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests are triggered separately for with and without security so we should be good I believe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current behavior continues to increase the id throughout different components.


def save_test_result_data(self, test_result_data: TestResultData) -> None:
base = self.parent_class._create_base_folder_structure(test_result_data.component_name, test_result_data.component_test_config)
dest_directory = os.path.join(base, "local-cluster-logs")
dest_directory = os.path.join(base, "local-cluster-logs/id-" + str(self.number_of_nodes))
self.number_of_nodes += 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be a generic change. Can you make sure that this doesn't impact other plugins? CC: @peterzhuamazon @gaiksaya

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might impact the resports workflows that @zelinh was working on.

Need confirmation on how he retrieve the logs.

Thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. The test report just dumps grabs and dumps everything in integ-test folder to manifest. It does not parse each folder for logs I believe. @zelinh can confirm.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was worried this might impact the logs path for component yml file we generated during test record; but i just checked and confirmed that we are walking through the test folder here. The inside file/directory structure should also be recorded properly.
I also checked that we aren't hardcorded anywhere local-cluster-logs so I think this should be fine.

Copy link
Member

@gaiksaya gaiksaya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rishabh6788 @bbarani Need to know if it is okay to merge this PR now or if we should hold on until 2.11 is done.
Thanks!

@bbarani
Copy link
Member

bbarani commented Oct 10, 2023

@rishabh6788 @bbarani Need to know if it is okay to merge this PR now or if we should hold on until 2.11 is done. Thanks!

I think the tests have been disabled for CCR now. Lets merge it and enable tests for CCR once 2.11.0 is released?


def save_test_result_data(self, test_result_data: TestResultData) -> None:
base = self.parent_class._create_base_folder_structure(test_result_data.component_name, test_result_data.component_test_config)
dest_directory = os.path.join(base, "local-cluster-logs")
dest_directory = os.path.join(base, "local-cluster-logs/id-" + str(self.number_of_nodes))
self.number_of_nodes += 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was worried this might impact the logs path for component yml file we generated during test record; but i just checked and confirmed that we are walking through the test folder here. The inside file/directory structure should also be recorded properly.
I also checked that we aren't hardcorded anywhere local-cluster-logs so I think this should be fine.

@gaiksaya gaiksaya merged commit dbfc699 into opensearch-project:main Oct 16, 2023
10 checks passed
@monusingh-1 monusingh-1 deleted the test_rec_fix branch October 17, 2023 05:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants