This repository has been archived by the owner on Feb 8, 2024. It is now read-only.

CORTX-33787: [v0.9.0][2.0.0-880] Kafka errors UNKNOWN_TOPIC_OR_PART d… #721

Merged
merged 1 commit into from
Aug 24, 2022

Conversation

Madhura-08
Contributor

@Madhura-08 Madhura-08 commented Aug 17, 2022

…uring build deployment

Problem:
HA mini provisioning fails because it cannot connect to Kafka. This happens because the Kafka and HA pods are deployed simultaneously, so HA needs to reconnect/retry, but that does not happen. As a result, the Kafka topics, Consul keys, and other keys are not created, and the HA pod is running but not functional.

Solution:
To reconnect/retry, the init container needs to be restarted, because mini provisioning is executed as part of the init container. For the init container to restart, a proper failure code must be returned to the caller. To achieve this, the exception needs to be re-raised to the caller, where returning the error code is already handled.

Signed-off-by: Madhura Mande madhura.mande@seagate.com

Problem Statement

https://jts.seagate.com/browse/CORTX-33787

Design

HA and the third-party Kafka pods are now deployed simultaneously. HA connects to Kafka at its init stage (mini provisioning) to create topics. When HA tries to connect, Kafka is already running but is not yet ready to serve. Hence HA fails at the mini provisioning stage and fails to create the Consul keys. To recover, the init container needs to be restarted so that the Kafka connection is retried. Ideally, an init container is meant to be executed only once; it is restarted only if a failure occurs, and a failure is propagated in the form of a return code.
On the HA side, the exception was not re-raised, so the return code was always 0, which did not trigger a restart of the init container. So, re-raising the exception and proper return code handling are needed.
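
The re-raise described above can be sketched roughly as follows. This is only an illustration, not the actual HA code: `SetupError`, `create_topics`, `main`, and `kafka_not_ready` are hypothetical names, and the Kafka connection is replaced by a plain callable so the snippet runs on its own.

```python
import sys


class SetupError(Exception):
    """Hypothetical error type for a failed mini-provisioning step."""


def create_topics(connect):
    """Hypothetical init step: connect to Kafka and create topics."""
    try:
        connect()
    except Exception as err:
        print(f"mini provisioning failed: {err}", file=sys.stderr)
        # Re-raise instead of swallowing the error, so the caller
        # can turn it into a failure return code.
        raise SetupError("Kafka is not ready") from err


def main(connect) -> int:
    """Return 0 on success and a non-zero code on failure."""
    try:
        create_topics(connect)
    except SetupError:
        return 1
    return 0


def kafka_not_ready():
    """Stand-in for a Kafka call made before the broker can serve."""
    raise ConnectionError("UNKNOWN_TOPIC_OR_PART")
```

A script entry point would then call something like `sys.exit(main(connect))`. If the `except` block only logged the error without `raise`, `main` would return 0 and the init container would be treated as successful, so it would never be restarted to retry the connection.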

Coding

  • Coding conventions are followed and code is consistent

Testing

Review Checklist

  • PR is self reviewed
  • JIRA number/GitHub Issue added to PR
  • Jira and state/status is updated and JIRA is updated with PR link
  • Check if the description is clear and explained
  • Is there a change in filename/package/module or signature? [Y/N]:
  • If yes for above point, is a notification sent to all other cortx components? [Y/N]
  • Side effects on other features (deployment/upgrade)? [Y/N]
  • Dependencies on other component(s)? [Y/N]
  • If yes for above point, post link to the corresponding PR.
    

Review Checklist

  • Is perfline test run and the report with and without the changes updated in the PR? [Y/N]:

Documentation

Checklist for Author

  • Changes done to WIKI / Confluence page / Quick Start Guide

…uring build deployment

- re-raise the exception in order to properly propagate the script return code
  to caller

Signed-off-by: Madhura Mande <madhura.mande@seagate.com>
@mssawant

@Madhura-08, the following rules apply to Hare commit messages,

CORTX-33787: [v0.9.0][2.0.0-880] Kafka errors UNKNOWN_TOPIC_OR_PART during build deployment

- re-raise the exception in order to properly propagate the script return code
  to caller

Signed-off-by: Madhura Mande <madhura.mande@seagate.com>
  1. Keep the summary line short (80 cols).
  2. It's good to describe the problem a bit. From the commit message I cannot tell what the problem was or why we are implementing the fix.
  3. It's good to set the solution apart with a Solution tag.

Please consider the following commit message format,

CORTX-33787: ha deployment fails due to kafka errors

<Describe the problem, e.g. unknown topic exception not handled>

Solution:
<Describe the solution, mainly how solution fixes the problem>

Signed-off-by: Madhura Mande <madhura.mande@seagate.com>

Contributor Author

@Madhura-08 Madhura-08 left a comment


@mssawant Sorry, I missed following the commit message format. I fixed that now. Please take a look. Thanks!

@mssawant mssawant merged commit ede7903 into Seagate:main Aug 24, 2022
5 participants