[Docs] Move Troubleshooting section to top level of ToC #8145

Merged (4 commits, Oct 28, 2024)
Changes from 3 commits
2 changes: 1 addition & 1 deletion docs/index.asciidoc
@@ -17,7 +17,7 @@ include::quickstart.asciidoc[]
include::operating-eck/operating-eck.asciidoc[]
include::orchestrating-elastic-stack-applications/orchestrating-elastic-stack-applications.asciidoc[]
include::advanced-topics/advanced-topics.asciidoc[]
include::troubleshooting.asciidoc[]
include::reference/reference.asciidoc[]

include::release-notes/highlights.asciidoc[]
include::release-notes.asciidoc[]
2 changes: 0 additions & 2 deletions docs/operating-eck/operating-eck.asciidoc
@@ -15,7 +15,6 @@ endif::[]
- <<{p}-configure-operator-metrics>>
- <<{p}-restrict-cross-namespace-associations>>
- <<{p}-licensing>>
- <<{p}-troubleshooting>>
- <<{p}-installing-eck>>
- <<{p}-upgrading-eck>>
- <<{p}-uninstalling-eck>>
@@ -28,7 +27,6 @@ include::webhook.asciidoc[leveloffset=+1]
include::configure-operator-metrics.asciidoc[leveloffset=+1]
include::restrict-cross-namespace-associations.asciidoc[leveloffset=+1]
include::licensing.asciidoc[leveloffset=+1]
include::troubleshooting.asciidoc[leveloffset=+1]
include::installing-eck.asciidoc[leveloffset=+1]
include::upgrading-eck.asciidoc[leveloffset=+1]
include::uninstalling-eck.asciidoc[leveloffset=+1]
@@ -5,12 +5,12 @@ link:https://www.elastic.co/guide/en/cloud-on-k8s/master/k8s-{page_id}.html[View
****
endif::[]
[id="{p}-{page_id}"]
= Troubleshoot ECK
= Troubleshooting ECK

- <<{p}-common-problems>>
- <<{p}-troubleshooting-methods>>

include::../help.asciidoc[]
include::./help.asciidoc[]

include::troubleshooting/common-problems.asciidoc[leveloffset=+1]
include::troubleshooting/troubleshooting-methods.asciidoc[leveloffset=+1]
@@ -7,6 +7,7 @@ endif::[]
[id="{p}-{page_id}"]
= Common problems

[float]
[id="{p}-{page_id}-operator-oom"]
== Operator crashes on startup with `OOMKilled`

@@ -59,6 +60,7 @@ kubectl patch sts elastic-operator -n elastic-system -p '{"spec":{"template":{"s

NOTE: Set limits (`spec.containers[].resources.limits`) that match requests (`spec.containers[].resources.requests`) to prevent the operator's Pod from being terminated during link:https://kubernetes.io/docs/concepts/scheduling-eviction/node-pressure-eviction/[node-pressure eviction].
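
A minimal sketch of such a patch, assuming the operator container is named `manager` and using `1Gi` purely as an example value:

[source,sh]
----
# Strategic merge patch: set requests and limits to the same value so the Pod
# keeps the Guaranteed QoS class and is not evicted under node pressure.
# The container name "manager" and the 1Gi value are assumptions; adjust both
# for your environment.
kubectl patch sts elastic-operator -n elastic-system -p '{
  "spec": {
    "template": {
      "spec": {
        "containers": [{
          "name": "manager",
          "resources": {
            "requests": {"memory": "1Gi"},
            "limits": {"memory": "1Gi"}
          }
        }]
      }
    }
  }
}'
----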

[float]
[id="{p}-{page_id}-webhook-timeout"]
== Timeout when submitting a resource manifest

@@ -71,6 +73,7 @@ Error from server (Timeout): error when creating "elasticsearch.yaml": Timeout:

This error is usually an indication of a problem communicating with the validating webhook. If you are running ECK on a private Google Kubernetes Engine (GKE) cluster, you may need to link:https://cloud.google.com/kubernetes-engine/docs/how-to/private-clusters#add_firewall_rules[add a firewall rule] allowing port 9443 from the API server. Another possible cause for failure is if a strict network policy is in effect. Refer to the <<{p}-webhook-troubleshooting-timeouts,webhook troubleshooting>> documentation for more details and workarounds.
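
If a private GKE cluster is affected, a firewall rule along the following lines usually unblocks the webhook. This is only a sketch: the rule name, network, and control-plane CIDR are placeholders for your environment.

[source,sh]
----
# Allow the GKE control plane to reach the ECK webhook on TCP 9443.
gcloud compute firewall-rules create allow-eck-webhook \
  --network=<CLUSTER_NETWORK> \
  --direction=INGRESS \
  --source-ranges=<CONTROL_PLANE_CIDR> \
  --allow=tcp:9443
----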

[float]
[id="{p}-{page_id}-owner-refs"]
== Copying secrets with Owner References

@@ -128,6 +131,7 @@ type: Opaque

Failure to do so can cause data loss.
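
One way to strip the problematic metadata while copying, sketched here with `jq` (the secret name, namespaces, and the exact field list are examples; adapt them to your case):

[source,sh]
----
# Copy a secret to another namespace without its ownerReferences (and other
# instance-specific metadata), so it is not garbage-collected with the original.
kubectl get secret elasticsearch-sample-es-elastic-user -o json \
  | jq 'del(.metadata.ownerReferences, .metadata.namespace, .metadata.resourceVersion, .metadata.uid, .metadata.creationTimestamp)' \
  | kubectl apply -n other-namespace -f -
----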

[float]
[id="{p}-{page_id}-scale-down"]
== Scale down of Elasticsearch master-eligible Pods seems stuck

@@ -160,6 +164,7 @@ Then, scale down the StatefulSet to the right size `m`, removing the pending Pod

CAUTION: Do not use this method to scale down Pods that have already joined the Elasticsearch cluster, as additional data loss protection that ECK applies is sidestepped.
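
A sketch of the scale-down itself, assuming the usual `<cluster>-es-<nodeSet>` StatefulSet naming and a target size of 3 (both are examples, not taken from this guide):

[source,sh]
----
# Remove only the Pods that are stuck Pending and never joined the cluster.
kubectl scale statefulset quickstart-es-default --replicas=3
----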

[float]
[id="{p}-{page_id}-pod-updates"]
== Pods are not replaced after a configuration update

@@ -235,6 +240,7 @@ In this case, you have to add more K8s nodes, or free up resources.

For more information, check <<{p}-troubleshooting-methods>>.
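
A sketch of a first check in this situation, assuming a cluster named `quickstart` and the standard ECK cluster-name label:

[source,sh]
----
# List the cluster's Pods and inspect any that are stuck Pending to see the
# scheduler's explanation (insufficient memory, unbound PVC, and so on).
kubectl get pods -l elasticsearch.k8s.elastic.co/cluster-name=quickstart
kubectl describe pod <pending-pod-name>
----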

[float]
[id="{p}-{page_id}-olm-upgrade"]
== ECK operator upgrade stays pending when using OLM

@@ -254,13 +260,15 @@ If you are using one of the affected versions of OLM and upgrading OLM to a newe
can still be upgraded by uninstalling and reinstalling it. This can be done by removing the `Subscription` and both `ClusterServiceVersion` resources and adding them again.
On OpenShift the same workaround can be performed in the UI by clicking on "Uninstall Operator" and then reinstalling it through OperatorHub.
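
A sketch of the same workaround from the command line. The namespace and resource names below are examples and depend on how the operator was installed through OLM:

[source,sh]
----
# Remove the Subscription and both ClusterServiceVersions, then re-create the
# Subscription to reinstall the operator.
kubectl delete subscription elastic-cloud-eck -n operators
kubectl delete clusterserviceversion <old-eck-csv> <new-eck-csv> -n operators
----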

[float]
[id="{p}-{page_id}-version-downgrade"]
== If you upgraded Elasticsearch to the wrong version
If you accidentally upgrade one of your Elasticsearch clusters to a version that does not exist or a version to which a direct upgrade is not possible from your currently deployed version, a validation will prevent you from going back to the previous version.
The reason for this validation is that ECK does not allow downgrades, as they are not supported by Elasticsearch: once the Elasticsearch data directory has been upgraded, there is no way back to the old version without a link:https://www.elastic.co/guide/en/elasticsearch/reference/current/setup-upgrade.html[snapshot restore].

These two upgrading scenarios, however, are exceptions because Elasticsearch never started up successfully. If you annotate the Elasticsearch resource with `eck.k8s.elastic.co/disable-downgrade-validation=true` ECK allows you to go back to the old version at your own risk. If you also attempted an upgrade of other related Elastic Stack applications at the same time you can use the same annotation to go back. Remove the annotation afterwards to prevent accidental downgrades and reduced availability.
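
A minimal sketch, using the `quickstart` cluster name from the rest of this guide as an example:

[source,sh]
----
# Allow the downgrade at your own risk, then remove the annotation once the
# cluster is healthy again.
kubectl annotate elasticsearch quickstart eck.k8s.elastic.co/disable-downgrade-validation=true
kubectl annotate elasticsearch quickstart eck.k8s.elastic.co/disable-downgrade-validation-
----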

[float]
[id="{p}-{page_id}-815-reconfigure-role-mappings"]
== Reconfigure stack config policy based role mappings after an upgrade to 8.15.3 from 8.14.x or 8.15.x

@@ -17,14 +17,15 @@ Most common issues can be identified and resolved by following these instruction
- <<{p}-exclude-resource,Exclude a resource from reconciliation>>
- <<{p}-get-k8s-events,Get Kubernetes events>>
- <<{p}-exec-into-containers,Exec into containers>>
- <<{p}-resize-pv>>
- <<{p}-suspend-elasticsearch>>
- <<{p}-capture-jvm-heap-dumps>>
If you are still unable to find a solution to your problem, ask for help:

include::../../help.asciidoc[]

include::./../help.asciidoc[]

[float]
[id="{p}-get-resources"]
== View the list of resources

@@ -55,6 +56,7 @@ elasticsearch-sample-es-http ClusterIP 10.19.248.93 <none> 9200/TC
kibana-sample-kb-http ClusterIP 10.19.246.116 <none> 5601/TCP 3d
----
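
The Elastic custom resources themselves can be listed in one go; a minimal sketch, assuming the installed ECK CRDs expose the shared `elastic` category:

[source,sh]
----
# Lists Elasticsearch, Kibana, APM Server and other Elastic resources at once.
kubectl get elastic
----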

[float]
[id="{p}-describe-failing-resources"]
== Describe failing resources

@@ -101,6 +103,7 @@ Events:

If you get an error with unbound persistent volume claims (PVCs), it means there is not currently a persistent volume that can satisfy the claim. If you are using automatically provisioned storage (for example Amazon EBS provisioner), sometimes the storage provider can take a few minutes to provision a volume, so this may resolve itself in a few minutes. You can also check the status by running `kubectl describe persistentvolumeclaims` to monitor events of the PVCs.
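
A sketch of that check, assuming a cluster named `elasticsearch-sample` and the standard ECK cluster-name label:

[source,sh]
----
# Show the claims of one cluster and their recent provisioning events.
kubectl describe persistentvolumeclaims -l elasticsearch.k8s.elastic.co/cluster-name=elasticsearch-sample
----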

[float]
[id="{p}-eck-debug-logs"]
== Enable ECK debug logs

@@ -124,6 +127,7 @@ change the `args` array as follows:

Once your change is saved, the operator is automatically restarted by the StatefulSet controller to apply the new settings.
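
The same change can be applied without opening an editor; a sketch, assuming the operator container is the first in the Pod template and accepts the `--log-verbosity` flag:

[source,sh]
----
# Append --log-verbosity=1 to the operator's arguments with a JSON patch.
kubectl patch statefulset elastic-operator -n elastic-system --type=json \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--log-verbosity=1"}]'
----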

[float]
[id="{p}-view-logs"]
== View logs

@@ -172,7 +176,7 @@ Logs with `ERROR` level indicate something is not going as expected.
Due to link:https://github.com/eBay/Kubernetes/blob/master/docs/devel/api-conventions.md#concurrency-control-and-consistency[optimistic locking],
you can get errors reporting a conflict while updating a resource. You can ignore them, as the update goes through at the next reconciliation attempt, which will happen almost immediately.
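
A sketch for tailing the operator logs, using the default operator Pod name and namespace:

[source,sh]
----
kubectl logs -f elastic-operator-0 -n elastic-system
----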


[float]
[id="{p}-resource-level-config"]
== Configure Elasticsearch timeouts

@@ -188,7 +192,7 @@ To set the Elasticsearch client timeout to 60 seconds for a cluster named `quick
kubectl annotate elasticsearch quickstart eck.k8s.elastic.co/es-client-timeout=60s
----


[float]
[id="{p}-exclude-resource"]
== Exclude resources from reconciliation

@@ -212,6 +216,7 @@ Or in one line:
kubectl annotate elasticsearch quickstart --overwrite eck.k8s.elastic.co/managed=false
----

[float]
[id="{p}-get-k8s-events"]
== Get Kubernetes events

@@ -246,7 +251,7 @@ LAST SEEN FIRST SEEN COUNT NAME KIND
You can set filters for Kibana and APM Server too.
Note that the default TTL for events in Kubernetes is 1h, so unless your cluster settings have been modified you will not get events older than 1h.


[float]
[id="{p}-resize-pv"]
== Resizing persistent volumes

@@ -307,6 +312,7 @@ spec:

and ECK will automatically create a new StatefulSet and begin migrating data into it.
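
Before renaming a nodeSet it is worth checking whether your storage class supports in-place expansion; a sketch of that check:

[source,sh]
----
# If ALLOWVOLUMEEXPANSION is true for the class in use, in-place expansion may
# be an option instead of migrating data to a new StatefulSet.
kubectl get storageclass
----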

[float]
[id="{p}-exec-into-containers"]
== Exec into containers

@@ -319,6 +325,7 @@ kubectl exec -ti elasticsearch-sample-es-p45nrjch29 bash

This can also be done for Kibana and APM Server.
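
The same approach for a Kibana Pod, sketched with the explicit `--` separator that current kubectl versions expect (the Pod name is an example):

[source,sh]
----
kubectl exec -ti kibana-sample-kb-696f868f69-bzkrc -- bash
----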

[float]
[id="{p}-suspend-elasticsearch"]
== Suspend Elasticsearch

@@ -348,7 +355,7 @@ Once you are done with troubleshooting the node, you can resume normal operation
kubectl annotate es quickstart eck.k8s.elastic.co/suspend-
----
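
For reference, a sketch of the corresponding suspend step; the annotation value is assumed to be a comma-separated list of Pod names, here a single node of the `quickstart` cluster:

[source,sh]
----
kubectl annotate es quickstart eck.k8s.elastic.co/suspend=quickstart-es-default-0
----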


[float]
[id="{p}-capture-jvm-heap-dumps"]
== Capture JVM heap dumps
