This repository has been archived by the owner on Feb 8, 2024. It is now read-only.
CORTX-33787: [v0.9.0][2.0.0-880] Kafka errors UNKNOWN_TOPIC_OR_PART during build deployment #721
Problem:
HA mini provisioning fails because it cannot connect to Kafka. This happens because the Kafka and HA pods are deployed simultaneously, so HA needs to reconnect/retry, but it does not. As a result, the Kafka topics, Consul keys, and other keys are not created, and the HA pod is running but not functional.
Solution:
To reconnect/retry, the init container must be restarted, because mini provisioning runs as part of the init container. For the init container to restart, a proper failure code must be returned to the caller. Here, the exception must be re-raised to the caller, which already handles returning the error code.
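A minimal Python sketch of the fix described above, assuming a Python entry point for mini provisioning (the HA code itself is not shown in this PR). All names here (`KafkaNotReadyError`, `create_kafka_topics`, `mini_provision`, `main`) are hypothetical; the sketch only shows the pattern: log the exception, re-raise it, and let the caller turn it into a non-zero return code.

```python
import logging

# Hypothetical logger name, for illustration only.
logger = logging.getLogger("ha.mini_provisioner")


class KafkaNotReadyError(Exception):
    """Hypothetical error: Kafka is running but not yet ready to serve."""


def create_kafka_topics() -> None:
    """Hypothetical stand-in for topic creation during mini provisioning."""
    raise KafkaNotReadyError("broker not ready")


def mini_provision() -> None:
    try:
        create_kafka_topics()
    except Exception:
        # Log for diagnostics, then re-raise so the caller sees the failure.
        # Swallowing the exception here is what led to a return code of 0.
        logger.exception("Mini provisioning failed")
        raise


def main() -> int:
    """Caller: maps any provisioning failure to a non-zero return code."""
    try:
        mini_provision()
    except Exception:
        return 1
    return 0
```

In a real entry point, `main()` would be wired up as `sys.exit(main())`, so the init container process exits with status 1 on failure and Kubernetes restarts it (subject to the pod's `restartPolicy`).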
Signed-off-by: Madhura Mande <madhura.mande@seagate.com>
Problem Statement
https://jts.seagate.com/browse/CORTX-33787
Design
The HA and third-party Kafka pods are now deployed simultaneously. HA connects to Kafka during its init stage (mini provisioning) to create topics. When HA tries to connect, Kafka is already running but is not yet ready to serve requests. Hence HA fails at the mini provisioning stage and does not create the Consul keys. To recover, the init container needs to be restarted so that the Kafka connection is retried. An init container is meant to run only once; it is restarted only if it fails, and that failure is propagated in the form of a return code.
On the HA side, the exception was not re-raised, so the return code was always 0 and the init container was never restarted. The fix is therefore to re-raise the exception and handle the return code properly.
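The restart behaviour this design relies on can be illustrated with a small Python simulation. `FakeKafka`, `init_step`, `buggy_init_step`, and `run_init_container` are all hypothetical: `run_init_container` only mimics the kubelet rerunning a failed init container and omits the exponential back-off the real kubelet applies between restarts. `buggy_init_step` reproduces the pre-fix behaviour, where the error is swallowed and the exit code is always 0.

```python
from typing import Callable


class FakeKafka:
    """Hypothetical broker that becomes ready only after a few attempts."""

    def __init__(self, ready_after: int) -> None:
        self.ready_after = ready_after
        self.attempts = 0
        self.topics: set[str] = set()

    def create_topic(self, name: str) -> None:
        self.attempts += 1
        if self.attempts < self.ready_after:
            raise ConnectionError("Kafka is running but not ready to serve")
        self.topics.add(name)


def init_step(kafka: FakeKafka) -> int:
    """Fixed behaviour: one init container run, exit code 1 on failure."""
    try:
        kafka.create_topic("ha_events")  # hypothetical topic name
    except ConnectionError:
        return 1
    return 0


def buggy_init_step(kafka: FakeKafka) -> int:
    """Pre-fix behaviour: the error is swallowed, so the exit code is always 0."""
    try:
        kafka.create_topic("ha_events")
    except ConnectionError:
        pass
    return 0


def run_init_container(step: Callable[[], int], max_restarts: int) -> int:
    """Rerun `step` while it exits non-zero; return the number of runs used."""
    for run in range(1, max_restarts + 2):
        if step() == 0:
            return run
    raise RuntimeError("init container never succeeded")
```

For example, with `FakeKafka(ready_after=3)` the fixed step is run three times and the topic gets created, while the buggy step "succeeds" after a single run and the topic is never created, which matches the non-functional HA pod described in the problem statement.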
Coding
Testing
https://jts.seagate.com/secure/attachment/532015/CORTX-33787_test_results.txt
Review Checklist
Documentation
Checklist for Author