Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

splunk_cluster_master : Apply cluster bundle tasks fails when cluster is in process of applying another bundle. Causes kubernetes pod to go in a crashback loop. #767

Open
cderocco5 opened this issue Dec 6, 2023 · 4 comments

Comments

@cderocco5
Copy link

cderocco5 commented Dec 6, 2023

When restarting a splunk cluster manager/master kubernetes pod. The pod restart will fail if there is already a bundle in the process of applying to indexers.

Error message:

TASK [splunk_cluster_master : Apply cluster bundle] ****************************
fatal: [localhost]: FAILED! => {
    "changed": false,
    "cmd": [
        "/opt/splunk/bin/splunk",
        "apply",
        "cluster-bundle",
        "-auth",
        "admin:CQL5A/oeVbda/I711kFP2PhFXvIv3k2w",
        "--skip-validation",
        "--answer-yes"
    ],
    "delta": "0:00:01.015543",
    "end": "2023-12-06 15:28:24.842204",
    "failed_when_result": true,
    "rc": 0,
    "start": "2023-12-06 15:28:23.826661"
}

STDOUT:


Encountered some errors while applying the bundle.


STDERR:

WARNING: Server Certificate Hostname Validation is disabled. Please see server.conf/[sslConfig]/cliVerifyServerName for details.
Cannot apply (or) validate configuration settings. Rolling restart of the peers is in progress.

Steps to recreate:

  1. apply cluster bundle from the cluster manager/master. splunk apply cluster-bundle
  2. delete the kubernetes splunk cluster manager/master pod
  3. pod will try to restart and fail with the above error message.

Expected Behavior:

Ansible task is able to detect that a bundle is already being applied and does not run the "Apply Cluster Bundle" task or the "Apply cluster bundle" should ignore the error and not cause the pod to crash on a restart. Or have a default.yml key that disables the "Apply Cluster Bundle" task. This will prevent any unexpected indexer rolling restarts from happening in a pod or node dies.

@martinr103
Copy link

I believe that this is very much related to another issue that was already closed 3 years ago, without actually being resolved.

You might want to review my comments to the closed issue, that I wrote 2 weeks ago:
#35 (comment)

@adityapinglesf
Copy link
Contributor

thanks @martinr103 and @cderocco5 for reporting the concern again. I am also in touch with the support team @cderocco5 has likely interacted with.
Going over the issue and possible resolution. Will get back to you soon.

@cderocco5
Copy link
Author

Thanks @adityapinglesf . When you have 455 indexers and 6 PBs of data. Rolling restarts take over 30 hours. We need a way for the cluster manager docker image to stop doing a one indexer at a time rolling restart when the cluster manager pod restarts.

@cderocco5
Copy link
Author

Any progress on this issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants