Fix kuma issues and heartbeat monitoring #104

Open
wants to merge 81 commits into
base: main
Changes from 7 commits
ff6f6a5
update k8s version
rlratcliffe Jun 29, 2023
6fa19d4
update sealed secrets version
rlratcliffe Jun 29, 2023
02ded59
fix postgresql using new chart
rlratcliffe Jun 29, 2023
940de0b
add resource group/storage account to create/delete
rlratcliffe Jun 30, 2023
1f11e13
maybe fix kong
rlratcliffe Jul 1, 2023
b09860c
upgrade cert-manager
rlratcliffe Jul 1, 2023
f64552a
update keycloak
rlratcliffe Jul 3, 2023
d145cbd
add command to create dns zone in aks
rlratcliffe Jul 3, 2023
051e5e9
maybe fix elastic
rlratcliffe Jul 3, 2023
1908ae5
update cronjob for jaeger, but leave operator same for now
rlratcliffe Jul 3, 2023
f5832e4
part 1 of fixing jenkins
rlratcliffe Jul 4, 2023
1ba160b
part 2 of fixing jenkins (still broken)
rlratcliffe Jul 5, 2023
c0b1f8b
docs: add start of manual backups info
rlratcliffe Jul 6, 2023
2957299
update postgres sql init image
rlratcliffe Jul 6, 2023
1a4512c
part 3 of fixing jenkins
rlratcliffe Jul 6, 2023
b19cee4
update default_vars
rlratcliffe Jul 6, 2023
c0676ed
fix formatting
rlratcliffe Jul 6, 2023
8b7b16d
docs: add nexus backup info
rlratcliffe Jul 6, 2023
ec21d01
add wait after dns
rlratcliffe Jul 8, 2023
b30cf81
docs: modify version
rlratcliffe Jul 8, 2023
97879bc
docs: add access info and acceptance tests
rlratcliffe Jul 8, 2023
29b8c97
allow retries of adding kibana assets
rlratcliffe Jul 8, 2023
15dda7a
docs: add additional tests
rlratcliffe Jul 9, 2023
178835a
don't validate certs in kibana call to match old curl call
rlratcliffe Jul 9, 2023
217d026
docs: add some more info
rlratcliffe Jul 9, 2023
3e5842b
update acceptance test results, will delete results later
rlratcliffe Jul 9, 2023
db3c746
add todo
rlratcliffe Jul 9, 2023
d8abcaf
update TODO
rlratcliffe Jul 9, 2023
ddc5c11
add keycloak roles
rlratcliffe Jul 9, 2023
0cd4f76
dunno if this is a real issue yet
rlratcliffe Jul 9, 2023
84c2e2d
#101 remove todos, moved to issue
rlratcliffe Jul 9, 2023
fb5c628
docs: update acceptance tests
rlratcliffe Jul 12, 2023
b95d74f
fix keycloak issue with kong
rlratcliffe Jul 13, 2023
596d104
reset acceptance tests
rlratcliffe Jul 13, 2023
e7e3d38
add helm repo and troubleshooting shell to namespaces
rlratcliffe Jul 14, 2023
0f336c8
add test
rlratcliffe Jul 14, 2023
dedd482
fix issue with jenkins-ci not able to pull images/access registry
rlratcliffe Jul 14, 2023
5b59a6b
docs: add gitea backup info
rlratcliffe Jul 15, 2023
dc7c361
revert postgres changes, use older bitnami index
rlratcliffe Jul 15, 2023
9349785
update acceptance tests
rlratcliffe Jul 16, 2023
0afb651
update acceptance tests again
rlratcliffe Jul 16, 2023
0b23687
add comment for clarity of separating out all-in-one
rlratcliffe Jul 22, 2023
0322bc8
add gitshell
rlratcliffe Jul 23, 2023
352a257
update flux version
rlratcliffe Jul 23, 2023
015723b
Revert "update flux version"
rlratcliffe Jul 23, 2023
805d2a3
update acceptance tests
rlratcliffe Aug 7, 2023
20f7bcc
update aks version
rlratcliffe Aug 7, 2023
a9c51b8
update jenkins (working)
rlratcliffe Aug 7, 2023
f7d986d
update acceptance tests again
rlratcliffe Aug 8, 2023
d2fbf85
ignore vscode folder
rlratcliffe Aug 13, 2023
2591f71
fix jaeger (partially working)
rlratcliffe Aug 13, 2023
f5e2654
misc keycloak changes
rlratcliffe Aug 13, 2023
a488f80
fix issues with elastic init-runner
rlratcliffe Aug 13, 2023
b1b2012
fix init of ca.crt secret for jaeger
rlratcliffe Aug 14, 2023
e6cfd8f
add main tavros host env var
rlratcliffe Aug 14, 2023
80bf6b8
update postgres backup instructions
rlratcliffe Aug 17, 2023
0be68a5
reduce duplication of jenkins vars
rlratcliffe Aug 17, 2023
33f61a0
Revert "reduce duplication of jenkins vars"
rlratcliffe Aug 17, 2023
9388d43
update acceptance tests
rlratcliffe Aug 22, 2023
451543a
prevent troubleshooting shell from running as root
rlratcliffe Aug 27, 2023
ef12f5d
remove unnecessary ansible step
rlratcliffe Aug 27, 2023
1cb75e5
update docs and default_vars
rlratcliffe Aug 27, 2023
921111f
add info about rate limits
rlratcliffe Aug 27, 2023
6311ff1
cleanup
rlratcliffe Sep 1, 2023
645bc20
update acceptance tests
rlratcliffe Sep 2, 2023
2129ed4
update main readme
rlratcliffe Sep 2, 2023
599e43c
bump jenkins versions/attempt to lock other versions
rlratcliffe Sep 3, 2023
9decda6
update acceptance tests again
rlratcliffe Sep 3, 2023
0d7a2a4
fix tests except for .stdout issue
rlratcliffe Sep 3, 2023
b81cd1f
fix enterprise tests
rlratcliffe Sep 3, 2023
cf500f1
change how elastic cert is retrieved & secret is created
rlratcliffe Sep 4, 2023
b22b1f6
update acceptance tests again
rlratcliffe Sep 4, 2023
2a92ec5
update ansible config otherwise docker image doesn't work
rlratcliffe Sep 4, 2023
6aa1c69
move sidecar injection to label, per kuma upgrade
rlratcliffe Oct 17, 2023
729d172
replace heartbeat instance with instance per api namespace
rlratcliffe Oct 17, 2023
2e446e0
move sidecar injection to label for kong
rlratcliffe Oct 17, 2023
e078a78
update kuma version
rlratcliffe Oct 17, 2023
bccfa74
update flux
rlratcliffe Oct 17, 2023
bc68e00
fix tests
rlratcliffe Oct 21, 2023
737b661
update acceptance tests
rlratcliffe Oct 21, 2023
348cda5
add retries for kong and elastic steps
rlratcliffe Feb 8, 2024
4 changes: 2 additions & 2 deletions .github/workflows/buildtools-build.sh
@@ -5,9 +5,9 @@ set -o errexit

 KUBECTL_VERSION=1.21.0

-# https://github.com/fluxcd/kustomize-controller/blob/v0.10.0/go.mod#L34
+# https://github.com/fluxcd/kustomize-controller/blob/v0.41.2/go.mod#L34
 # https://github.com/kubernetes-sigs/kustomize/blob/kustomize/v3.9.4/kustomize/go.mod#L11
-FLUX_VERSION=0.10.0
+FLUX_VERSION=0.41.2
 KUSTOMIZE_VERSION=3.9.4

 KUBESEAL_VERSION=0.15.0
14 changes: 7 additions & 7 deletions docs/AcceptanceTests.md
@@ -105,6 +105,7 @@ These are in order. Refer to the above access chart for how to login to each com
 | Is secure | | | Checks cert manager is working correctly |
 | Assets are correctly installed | | | |
 | Can login with admin creds from and view logs. All expected dashboards (including 'Dashboard' -> 'Tavros - Logs Dashboard') are there. | | | |
+| Observability -> Uptime shows UP for prod, dev, and test pod | | | |

 #### Jaeger

@@ -118,13 +119,12 @@ These are in order. Refer to the above access chart for how to login to each com

 | Expected result | Actual result | PASS/FAIL | Purpose of test |
 |---|---|---|---|
-| Can't curl across namespaces (ex: from prod to test) | | | Tests mesh is working correctly |
-
-Notes:
-
-In prod troubleshooting shell:
-- curl http://api-repo.prod.svc.cluster.local:9000/api/pet/123 should return` {"opId":"get-pet-petId"}`
-- curl http://api-repo.dev.svc.cluster.local:9000/api/pet/123 should return `(52) Empty reply from server`
+| From PROD shell `curl 'http://api-repo.prod.svc.cluster.local:8080/actuator/health/liveness'` returns {"status":"UP"} | | | |
+| From PROD shell `curl 'http://api-repo.dev.svc.cluster.local:8080/actuator/health/liveness'` & `curl 'http://api-repo.test.svc.cluster.local:8080/actuator/health/liveness'` returns Empty reply from server | | |
+| From DEV shell `curl 'http://api-repo.prod.svc.cluster.local:8080/actuator/health/liveness'` returns Empty reply from server | | | |
+| From DEV shell `curl 'http://api-repo.dev.svc.cluster.local:8080/actuator/health/liveness'` & `curl 'http://api-repo.test.svc.cluster.local:8080/actuator/health/liveness'` returns {"status":"UP"} | | | |
+| From TEST shell `curl 'http://api-repo.prod.svc.cluster.local:8080/actuator/health/liveness'` returns Empty reply from server | | | |
+| From TEST shell `curl 'http://api-repo.dev.svc.cluster.local:8080/actuator/health/liveness'` & `curl 'http://api-repo.test.svc.cluster.local:8080/actuator/health/liveness'` returns {"status":"UP"} | | | |

 #### Postgres

150 changes: 116 additions & 34 deletions roles/elastic_cloud/files/heartbeat.yaml
@@ -1,35 +1,3 @@
-apiVersion: beat.k8s.elastic.co/v1beta1
-kind: Beat
-metadata:
-  name: tavros-heartbeat
-  namespace: elastic-system
-spec:
-  type: heartbeat
-  version: 7.13.4
-  elasticsearchRef:
-    name: tavros
-  config:
-    heartbeat:
-      autodiscover:
-        providers:
-        - type: kubernetes
-          node: ${NODE_NAME}
-          hints.enabled: true
-  daemonSet:
-    podTemplate:
-      spec:
-        serviceAccountName: heartbeat
-        automountServiceAccountToken: true
-        securityContext:
-          runAsUser: 0
-        containers:
-        - name: heartbeat
-          env:
-          - name: NODE_NAME
-            valueFrom:
-              fieldRef:
-                fieldPath: spec.nodeName
----
 apiVersion: rbac.authorization.k8s.io/v1
 kind: ClusterRole
 metadata:
@@ -61,7 +29,7 @@ rules:
 apiVersion: v1
 kind: ServiceAccount
 metadata:
-  name: heartbeat
+  name: heartbeat-svc
   namespace: elastic-system
 ---
 apiVersion: rbac.authorization.k8s.io/v1
@@ -70,9 +38,123 @@ metadata:
   name: heartbeat
 subjects:
 - kind: ServiceAccount
-  name: heartbeat
+  name: heartbeat-svc
   namespace: elastic-system
 roleRef:
   kind: ClusterRole
   name: heartbeat
   apiGroup: rbac.authorization.k8s.io
+---
+apiVersion: beat.k8s.elastic.co/v1beta1
+kind: Beat
+metadata:
+  name: tavros-heartbeat-prod
+  namespace: elastic-system
+spec:
+  type: heartbeat
+  version: 7.13.4
+  elasticsearchRef:
+    name: tavros
+  config:
+    heartbeat:
+      autodiscover:
+        providers:
+        - type: kubernetes
+          node: ${NODE_NAME}
+          namespace: "prod"
+          hints.enabled: true
+  daemonSet:
+    podTemplate:
+      metadata:
+        labels:
+          kuma.io/sidecar-injection: enabled
+        annotations:
+          kuma.io/mesh: prod
+      spec:
+        serviceAccountName: heartbeat-svc
+        automountServiceAccountToken: true
+        securityContext:
+          runAsUser: 0
+        containers:
+        - name: heartbeat
+          env:
+          - name: NODE_NAME
+            valueFrom:
+              fieldRef:
+                fieldPath: spec.nodeName
+---
+apiVersion: beat.k8s.elastic.co/v1beta1
+kind: Beat
+metadata:
+  name: tavros-heartbeat-dev
+  namespace: elastic-system
+spec:
+  type: heartbeat
+  version: 7.13.4
+  elasticsearchRef:
+    name: tavros
+  config:
+    heartbeat:
+      autodiscover:
+        providers:
+        - type: kubernetes
+          node: ${NODE_NAME}
+          namespace: "dev"
+          hints.enabled: true
+  daemonSet:
+    podTemplate:
+      metadata:
+        labels:
+          kuma.io/sidecar-injection: enabled
+        annotations:
+          kuma.io/mesh: sandbox
+      spec:
+        serviceAccountName: heartbeat-svc
+        automountServiceAccountToken: true
+        securityContext:
+          runAsUser: 0
+        containers:
+        - name: heartbeat
+          env:
+          - name: NODE_NAME
+            valueFrom:
+              fieldRef:
+                fieldPath: spec.nodeName
+---
+apiVersion: beat.k8s.elastic.co/v1beta1
+kind: Beat
+metadata:
+  name: tavros-heartbeat-test
+  namespace: elastic-system
+spec:
+  type: heartbeat
+  version: 7.13.4
+  elasticsearchRef:
+    name: tavros
+  config:
+    heartbeat:
+      autodiscover:
+        providers:
+        - type: kubernetes
+          node: ${NODE_NAME}
+          namespace: "test"
+          hints.enabled: true
+  daemonSet:
+    podTemplate:
+      metadata:
+        labels:
+          kuma.io/sidecar-injection: enabled
+        annotations:
+          kuma.io/mesh: sandbox
+      spec:
+        serviceAccountName: heartbeat-svc
+        automountServiceAccountToken: true
+        securityContext:
+          runAsUser: 0
+        containers:
+        - name: heartbeat
+          env:
+          - name: NODE_NAME
+            valueFrom:
+              fieldRef:
+                fieldPath: spec.nodeName
2 changes: 1 addition & 1 deletion roles/fluxtoolkit/tasks/main.yaml
@@ -5,7 +5,7 @@

 - name: Generate Flux GitOps Toolkit Resources
   shell: |
-    flux install --version=v0.10.0 \
+    flux install --version=v0.41.2 \
       --export > /tmp/{{ cluster_fqdn }}/platform/flux-system/gotk-components.yaml

 - name: Template Files
5 changes: 4 additions & 1 deletion roles/kong/templates/release.j2
@@ -114,7 +114,6 @@ spec:
 {% endif %}
     podAnnotations:
 {% if item[0].kuma_mesh_name is defined %}
-      kuma.io/sidecar-injection: enabled
       kuma.io/gateway: {{ 'enabled' if ((item[0].hybrid) and (item[1].role == 'control_plane')) else 'enabled' }}
       kuma.io/mesh: {{ item[0].kuma_mesh_name }}
 {% endif %}
@@ -123,6 +122,10 @@ spec:
       co.elastic.logs.proxy/module: nginx
       co.elastic.logs.proxy/fileset.stdout: access
       co.elastic.logs.proxy/fileset.stderr: error
 {% endif %}
+{% if item[0].kuma_mesh_name is defined %}
+    podLabels:
+      kuma.io/sidecar-injection: enabled
+{% endif %}
     admin:
       enabled: {{ false if ((item[0].hybrid) and (item[1].role == 'data_plane')) else true }}
4 changes: 2 additions & 2 deletions roles/kuma/files/release.yaml
@@ -12,8 +12,8 @@ spec:
         kind: HelmRepository
         name: kuma
         namespace: flux-system
-      # https://github.com/kumahq/kuma/blob/1.2.0/deployments/charts/kuma/values.yaml
-      version: 0.6.0
+      # https://github.com/kumahq/kuma/blob/2.4.3/deployments/charts/kuma/values.yaml
+      version: 2.4.3
   interval: 30m
   install:
     remediation:
5 changes: 3 additions & 2 deletions roles/namespace/templates/ns.j2
@@ -3,7 +3,8 @@ kind: Namespace
 metadata:
   name: {{ item.name }}
 {% if item.kuma_mesh_name is defined %}
+  labels:
+    kuma.io/sidecar-injection: enabled
   annotations:
     kuma.io/mesh: {{ item.kuma_mesh_name }}
-    kuma.io/sidecar-injection: enabled
-{% endif %}
+{% endif %}
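
The cross-namespace rows added to docs/AcceptanceTests.md appear to follow from the mesh assignments in heartbeat.yaml, where the prod Beat joins the `prod` mesh and the dev and test Beats both join `sandbox`. The sketch below is one way to write that expectation matrix down. It assumes a request succeeds only when caller and target share a mesh; the names `MESH_BY_NAMESPACE`, `liveness_url`, and `expected_reply` are hypothetical and not part of this PR, and nothing here contacts a cluster.

```python
# Hypothetical mapping, inferred from the kuma.io/mesh annotations in heartbeat.yaml.
MESH_BY_NAMESPACE = {"prod": "prod", "dev": "sandbox", "test": "sandbox"}

# Reply strings as written in the acceptance-test table.
UP = '{"status":"UP"}'
BLOCKED = "Empty reply from server"


def liveness_url(namespace: str) -> str:
    """Build the liveness URL the acceptance-test rows curl for a namespace."""
    return (
        f"http://api-repo.{namespace}.svc.cluster.local:8080"
        "/actuator/health/liveness"
    )


def expected_reply(source: str, target: str) -> str:
    """Assumption: traffic passes only when both namespaces are in the same mesh."""
    same_mesh = MESH_BY_NAMESPACE[source] == MESH_BY_NAMESPACE[target]
    return UP if same_mesh else BLOCKED


if __name__ == "__main__":
    # Print the full source/target matrix for manual comparison with the table.
    for source in MESH_BY_NAMESPACE:
        for target in MESH_BY_NAMESPACE:
            reply = expected_reply(source, target)
            print(f"{source.upper():5} -> {liveness_url(target)}: {reply}")
```

Running the script prints one line per source and target pair, which can be compared by hand against the Actual result column when filling in the table.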