[docs] Move troubleshooting section to top level of ToC (#8145)
This moves the Troubleshoot ECK section up to the top level of the ECK Guide to make it more prominent, and because the troubleshooting topics may apply more broadly than "operating ECK" alone.
kilfoyle authored Oct 28, 2024
1 parent 41cae15 commit a50d41d
Showing 9 changed files with 29 additions and 15 deletions.
Binary file added .DS_Store
Binary file not shown.
2 changes: 1 addition & 1 deletion docs/help.asciidoc
@@ -1,4 +1,4 @@
If you are an existing Elastic customer with an active support contract, you can create a case in the link:https://support.elastic.co/[Elastic Support Portal]. Kindly attach an <<{p}-take-eck-dump,ECK diagnostic>> when opening your case.
If you are an existing Elastic customer with an active support contract, you can create a case in the link:https://support.elastic.co/[Elastic Support Portal]. Kindly attach an <<{p}-run-eck-diagnostics,ECK diagnostic>> when opening your case.

Alternatively, or if you do not have a support contract, and if you are unable to find a solution to your problem with the information provided in these documents, ask for help:

2 changes: 1 addition & 1 deletion docs/index.asciidoc
@@ -17,7 +17,7 @@ include::quickstart.asciidoc[]
include::operating-eck/operating-eck.asciidoc[]
include::orchestrating-elastic-stack-applications/orchestrating-elastic-stack-applications.asciidoc[]
include::advanced-topics/advanced-topics.asciidoc[]
include::troubleshooting.asciidoc[]
include::reference/reference.asciidoc[]

include::release-notes/highlights.asciidoc[]
include::release-notes.asciidoc[]
2 changes: 1 addition & 1 deletion docs/operating-eck/air-gapped.asciidoc
@@ -73,6 +73,6 @@ For example, if your private registry is `my.registry` and all Elastic images ar
[id="{p}-eck-diag-air-gapped"]
== ECK Diagnostics in air-gapped environments

The <<{p}-take-eck-dump,eck-diagnostics tool>> optionally runs diagnostics for Elastic Stack applications in a separate container that is deployed into the Kubernetes cluster.
The <<{p}-run-eck-diagnostics,eck-diagnostics tool>> optionally runs diagnostics for Elastic Stack applications in a separate container that is deployed into the Kubernetes cluster.

In air-gapped environments with no access to the `docker.elastic.co` registry, you should copy the latest support-diagnostics container image to your internal image registry and then run the tool with the additional flag `--diagnostic-image <custom-support-diagnostics-image-name>`. To find out which support-diagnostics container image matches your version of eck-diagnostics, run the tool once without arguments; it prints the default image in use.
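
For illustration, a minimal invocation sketch; the registry path and tag below are placeholders, not values from this commit:

[source,sh]
----
# Point eck-diagnostics at a copy of the support-diagnostics image
# hosted in a private registry. Registry path and tag are placeholders.
eck-diagnostics --diagnostic-image my.registry/eck-support-diagnostics:<tag>
----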
2 changes: 0 additions & 2 deletions docs/operating-eck/operating-eck.asciidoc
@@ -15,7 +15,6 @@ endif::[]
- <<{p}-configure-operator-metrics>>
- <<{p}-restrict-cross-namespace-associations>>
- <<{p}-licensing>>
- <<{p}-troubleshooting>>
- <<{p}-installing-eck>>
- <<{p}-upgrading-eck>>
- <<{p}-uninstalling-eck>>
@@ -28,7 +27,6 @@ include::webhook.asciidoc[leveloffset=+1]
include::configure-operator-metrics.asciidoc[leveloffset=+1]
include::restrict-cross-namespace-associations.asciidoc[leveloffset=+1]
include::licensing.asciidoc[leveloffset=+1]
include::troubleshooting.asciidoc[leveloffset=+1]
include::installing-eck.asciidoc[leveloffset=+1]
include::upgrading-eck.asciidoc[leveloffset=+1]
include::uninstalling-eck.asciidoc[leveloffset=+1]
docs/troubleshooting.asciidoc
@@ -5,13 +5,14 @@ link:https://www.elastic.co/guide/en/cloud-on-k8s/master/k8s-{page_id}.html[View
****
endif::[]
[id="{p}-{page_id}"]
= Troubleshoot ECK
= Troubleshooting ECK

- <<{p}-common-problems>>
- <<{p}-troubleshooting-methods>>
- <<{p}-run-eck-diagnostics>>
include::../help.asciidoc[]
include::./help.asciidoc[]

include::troubleshooting/common-problems.asciidoc[leveloffset=+1]
include::troubleshooting/troubleshooting-methods.asciidoc[leveloffset=+1]
include::troubleshooting/take-eck-dump.asciidoc[leveloffset=+1]
include::troubleshooting/run-eck-diagnostics.asciidoc[leveloffset=+1]
docs/troubleshooting/common-problems.asciidoc
@@ -7,6 +7,7 @@ endif::[]
[id="{p}-{page_id}"]
= Common problems

[float]
[id="{p}-{page_id}-operator-oom"]
== Operator crashes on startup with `OOMKilled`

@@ -59,6 +60,7 @@ kubectl patch sts elastic-operator -n elastic-system -p '{"spec":{"template":{"s

NOTE: Set limits (`spec.containers[].resources.limits`) that match requests (`spec.containers[].resources.requests`) to prevent the operator's Pod from being terminated during link:https://kubernetes.io/docs/concepts/scheduling-eviction/node-pressure-eviction/[node-pressure eviction].
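
As a sketch of what that note implies, assuming the default container name and illustrative memory values; adjust both to your deployment:

[source,sh]
----
# Set matching memory requests and limits on the operator StatefulSet.
# "manager" and "1Gi" are illustrative; use your container name and sizing.
kubectl patch sts elastic-operator -n elastic-system -p \
  '{"spec":{"template":{"spec":{"containers":[{"name":"manager","resources":{"requests":{"memory":"1Gi"},"limits":{"memory":"1Gi"}}}]}}}}'
----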

[float]
[id="{p}-{page_id}-webhook-timeout"]
== Timeout when submitting a resource manifest

@@ -71,6 +73,7 @@ Error from server (Timeout): error when creating "elasticsearch.yaml": Timeout:

This error is usually an indication of a problem communicating with the validating webhook. If you are running ECK on a private Google Kubernetes Engine (GKE) cluster, you may need to link:https://cloud.google.com/kubernetes-engine/docs/how-to/private-clusters#add_firewall_rules[add a firewall rule] allowing port 9443 from the API server. Another possible cause for failure is if a strict network policy is in effect. Refer to the <<{p}-webhook-troubleshooting-timeouts,webhook troubleshooting>> documentation for more details and workarounds.

[float]
[id="{p}-{page_id}-owner-refs"]
== Copying secrets with Owner References

@@ -128,6 +131,7 @@ type: Opaque

Failure to do so can cause data loss.

[float]
[id="{p}-{page_id}-scale-down"]
== Scale down of Elasticsearch master-eligible Pods seems stuck

@@ -160,6 +164,7 @@ Then, scale down the StatefulSet to the right size `m`, removing the pending Pod

CAUTION: Do not use this method to scale down Pods that have already joined the Elasticsearch cluster, as additional data loss protection that ECK applies is sidestepped.
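
A hedged sketch of that scale-down step; the StatefulSet name and target size are placeholders, not values from this page:

[source,sh]
----
# Scale the master StatefulSet down to the intended size m (here m=3).
# The StatefulSet name is illustrative; find yours with "kubectl get sts".
kubectl scale sts quickstart-es-master --replicas=3
----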

[float]
[id="{p}-{page_id}-pod-updates"]
== Pods are not replaced after a configuration update

@@ -235,6 +240,7 @@ In this case, you have to add more K8s nodes, or free up resources.

For more information, check <<{p}-troubleshooting-methods>>.

[float]
[id="{p}-{page_id}-olm-upgrade"]
== ECK operator upgrade stays pending when using OLM

@@ -254,13 +260,15 @@ If you are using one of the affected versions of OLM and upgrading OLM to a newe
can still be upgraded by uninstalling and reinstalling it. This can be done by removing the `Subscription` and both `ClusterServiceVersion` resources and adding them again.
On OpenShift the same workaround can be performed in the UI by clicking on "Uninstall Operator" and then reinstalling it through OperatorHub.
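
A minimal sketch of that workaround, assuming OLM's CRDs are installed; every name and the namespace below are placeholders:

[source,sh]
----
# Remove the Subscription and both ClusterServiceVersion resources,
# then re-add them. All names and the namespace are placeholders.
kubectl delete subscription elastic-cloud-eck -n operators
kubectl delete csv elastic-cloud-eck.v2.13.0 elastic-cloud-eck.v2.14.0 -n operators
# Recreate the Subscription afterwards so OLM reinstalls the operator.
----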

[float]
[id="{p}-{page_id}-version-downgrade"]
== If you upgraded Elasticsearch to the wrong version
If you accidentally upgrade one of your Elasticsearch clusters to a version that does not exist, or to a version to which a direct upgrade is not possible from your currently deployed version, a validation prevents you from going back to the previous version.
The reason for this validation is that ECK does not allow downgrades, as they are not supported by Elasticsearch; once the Elasticsearch data directory has been upgraded, there is no way back to the old version without a link:https://www.elastic.co/guide/en/elasticsearch/reference/current/setup-upgrade.html[snapshot restore].

These two upgrading scenarios, however, are exceptions because Elasticsearch never started up successfully. If you annotate the Elasticsearch resource with `eck.k8s.elastic.co/disable-downgrade-validation=true` ECK allows you to go back to the old version at your own risk. If you also attempted an upgrade of other related Elastic Stack applications at the same time you can use the same annotation to go back. Remove the annotation afterwards to prevent accidental downgrades and reduced availability.
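
For illustration, using the annotation named above; the cluster name `quickstart` is a placeholder:

[source,sh]
----
# Allow the downgrade at your own risk...
kubectl annotate elasticsearch quickstart eck.k8s.elastic.co/disable-downgrade-validation=true
# ...and remove the annotation once the cluster is back on the old version.
kubectl annotate elasticsearch quickstart eck.k8s.elastic.co/disable-downgrade-validation-
----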

[float]
[id="{p}-{page_id}-815-reconfigure-role-mappings"]
== Reconfigure stack config policy based role mappings after an upgrade to 8.15.3 from 8.14.x or 8.15.x

docs/troubleshooting/run-eck-diagnostics.asciidoc
@@ -1,4 +1,4 @@
:page_id: take-eck-dump
:page_id: run-eck-diagnostics
ifdef::env-github[]
****
link:https://www.elastic.co/guide/en/cloud-on-k8s/master/k8s-{page_id}.html[View this document on the Elastic website]
docs/troubleshooting/troubleshooting-methods.asciidoc
@@ -17,14 +17,15 @@ Most common issues can be identified and resolved by following these instruction
- <<{p}-exclude-resource,Exclude a resource from reconciliation>>
- <<{p}-get-k8s-events,Get Kubernetes events>>
- <<{p}-exec-into-containers,Exec into containers>>
- <<{p}-resize-pv>>
- <<{p}-suspend-elasticsearch>>
- <<{p}-capture-jvm-heap-dumps>>
If you are still unable to find a solution to your problem, ask for help:

include::../../help.asciidoc[]

include::./../help.asciidoc[]

[float]
[id="{p}-get-resources"]
== View the list of resources

@@ -55,6 +56,7 @@ elasticsearch-sample-es-http ClusterIP 10.19.248.93 <none> 9200/TC
kibana-sample-kb-http ClusterIP 10.19.246.116 <none> 5601/TCP 3d
----
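
A hedged sketch of how such a listing is produced; `elastic` is the resource category ECK registers for its CRDs, but verify it against your installation:

[source,sh]
----
# List all ECK-managed resources (Elasticsearch, Kibana, APM Server, ...)
kubectl get elastic
----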

[float]
[id="{p}-describe-failing-resources"]
== Describe failing resources

@@ -101,6 +103,7 @@ Events:

If you get an error with unbound persistent volume claims (PVCs), it means there is currently no persistent volume that can satisfy the claim. If you are using automatically provisioned storage (for example the Amazon EBS provisioner), the storage provider can sometimes take a few minutes to provision a volume, so the issue may resolve itself. You can also check the status by running `kubectl describe persistentvolumeclaims` to monitor events of the PVCs.
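
For example (the claim name is a placeholder following ECK's usual `elasticsearch-data-<pod-name>` naming, not a value from this page):

[source,sh]
----
# Inspect a single claim; events at the bottom show provisioning progress.
kubectl describe pvc elasticsearch-data-quickstart-es-default-0
----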

[float]
[id="{p}-eck-debug-logs"]
== Enable ECK debug logs

@@ -124,6 +127,7 @@ change the `args` array as follows:

Once your change is saved, the operator is automatically restarted by the StatefulSet controller to apply the new settings.
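
A sketch of the kind of change meant here, assuming the operator accepts a `--log-verbosity` flag; verify the flag against your ECK version:

[source,yaml]
----
# Excerpt of the operator StatefulSet Pod template; only args shown.
args:
- "manager"
- "--log-verbosity=1"   # assumed: 0 = info (default), 1 = debug
----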

[float]
[id="{p}-view-logs"]
== View logs

@@ -172,7 +176,7 @@ Logs with `ERROR` level indicate something is not going as expected.
Due to link:https://github.com/eBay/Kubernetes/blob/master/docs/devel/api-conventions.md#concurrency-control-and-consistency[optimistic locking],
you can get errors reporting a conflict while updating a resource. You can ignore them, as the update goes through at the next reconciliation attempt, which will happen almost immediately.
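
A hedged example of tailing those operator logs; the Pod name and namespace follow the default installation and may differ in yours:

[source,sh]
----
# Follow the ECK operator logs from the default install namespace.
kubectl logs -f elastic-operator-0 -n elastic-system
----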


[float]
[id="{p}-resource-level-config"]
== Configure Elasticsearch timeouts

@@ -188,7 +192,7 @@ To set the Elasticsearch client timeout to 60 seconds for a cluster named `quick
kubectl annotate elasticsearch quickstart eck.k8s.elastic.co/es-client-timeout=60s
----


[float]
[id="{p}-exclude-resource"]
== Exclude resources from reconciliation

@@ -212,6 +216,7 @@ Or in one line:
kubectl annotate elasticsearch quickstart --overwrite eck.k8s.elastic.co/managed=false
----

[float]
[id="{p}-get-k8s-events"]
== Get Kubernetes events

@@ -246,7 +251,7 @@ LAST SEEN FIRST SEEN COUNT NAME KIND
You can set filters for Kibana and APM Server too.
Note that the default TTL for events in Kubernetes is 1h, so unless your cluster settings have been modified you will not get events older than 1h.
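
A hedged filter sketch; the field selector syntax is standard Kubernetes, while the kind below is just one example of what you might filter on:

[source,sh]
----
# Show only events emitted for Elasticsearch resources.
kubectl get events --field-selector involvedObject.kind=Elasticsearch
----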


[float]
[id="{p}-resize-pv"]
== Resizing persistent volumes

@@ -307,6 +312,7 @@ spec:

and ECK will automatically create a new StatefulSet and begin migrating data into it.
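
A sketch of the rename being described, assuming a nodeSet originally called `default`; names and count are illustrative:

[source,yaml]
----
spec:
  nodeSets:
  - name: default-v2   # renamed from "default" to force a new StatefulSet
    count: 3
----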

[float]
[id="{p}-exec-into-containers"]
== Exec into containers

@@ -319,6 +325,7 @@ kubectl exec -ti elasticsearch-sample-es-p45nrjch29 bash

This can also be done for Kibana and APM Server.

[float]
[id="{p}-suspend-elasticsearch"]
== Suspend Elasticsearch

@@ -348,7 +355,7 @@ Once you are done with troubleshooting the node, you can resume normal operation
kubectl annotate es quickstart eck.k8s.elastic.co/suspend-
----
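
The suspend counterpart, for context; this assumes the annotation takes a comma-separated list of Pod names, and the Pod name here is a placeholder:

[source,sh]
----
# Suspend a single Elasticsearch node for troubleshooting.
kubectl annotate es quickstart eck.k8s.elastic.co/suspend=quickstart-es-default-0
----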


[float]
[id="{p}-capture-jvm-heap-dumps"]
== Capture JVM heap dumps

