Commit 1edba9d

updating files

committed Nov 10, 2017
1 parent 0df3fcd commit 1edba9d

4 files changed: +199 -9 lines changed

examples/gluster-statefulset-cloud/README.md

Lines changed: 11 additions & 1 deletion
````diff
@@ -34,10 +34,20 @@ This example is initial development and research that utilizes the following:
 but wondering about what happens when a pod is killed by a user - not sure the liveness probe can recover from that
 
 
-# Issues
+# Potential and Real Issues
 
 1. Node goes down. Since StatefulSets keep consistent DNS naming (but not guaranteed IPs - although, so far, they seem to be mostly retained),
    when we bring the node back up OR bring up a new node in its place, what happens with the TSP?
 
+2. What is the best way to handle the `peer rejected` status, which happens when you delete a pod outside of a normal, healthy scale up or down?
+   - We could use a preStop lifecycle hook, which fires before termination, to remove the peer from the TSP and let the normal liveness probe add it back in.
+   - Alternatively, add another condition to our liveness probe to remedy the situation (about 6 steps).
+
+3. I'm using hostNetwork because the pods otherwise can't communicate with each other, so the container's hostname is the node's hostname rather than
+   the pod name (i.e. ip-172-18-12-34.ec2.internal vs. glusterfs-2). This makes it hard to tie the pod name back to the host.
+   If I could get that info, it would help with some recovery decisions.
+
+
+
 # Experimentation
 1. After the initial cluster is running (make sure to give it time for the liveness probe's initial delay), check the TSP
 ```
````
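Handling issue 2 would first need a probe that can detect the `peer rejected` state. A minimal sketch, assuming the `State:` line format that `gluster peer status` prints; the sample output below is canned for illustration:

```shell
#!/bin/bash
# Canned `gluster peer status` output (illustrative); a real probe would
# invoke the command instead of this function.
sample_status() {
cat <<'EOF'
Number of Peers: 2

Hostname: glusterfs-1.glusterfs.default.svc.cluster.local
State: Peer in Cluster (Connected)

Hostname: glusterfs-2.glusterfs.default.svc.cluster.local
State: Peer Rejected (Connected)
EOF
}

# Count peers stuck in the rejected state; a liveness probe (or preStop
# hook) could branch on a nonzero count.
count_rejected() {
  sample_status | grep -c 'State: Peer Rejected'
}

count_rejected
```

The count alone is enough for a pass/fail liveness decision; the remediation steps themselves (detach, wipe, re-probe) would still need the roughly 6 steps mentioned above.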

examples/gluster-statefulset-cloud/gluster-post.sh

Lines changed: 56 additions & 7 deletions
```diff
@@ -5,19 +5,61 @@ IFS=$'\n\t'
 
 if systemctl status glusterd | grep -q '(running) since'
 then
 
-  # Using API let's get the current replica count
-  # curl -v --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" https://kubernetes.default.svc.cluster.local/apis/apps/v1beta1/statefulsets
-  # Using API let's get the current replica count
-  # curl -v --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" https://kubernetes.default.svc.cluster.local/apis/apps/v1beta1/statefulsets
+  # Run some API commands to figure out who we are and our state
   CURL_COMMAND="curl -v"
   K8_CERTS="--cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
   GET_TOKEN="$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)"
   K8_TOKEN="-H \"Authorization: Bearer $GET_TOKEN\""
+
+  # StatefulSet calls
   STATEFULSET_API_CALL="https://kubernetes.default.svc.cluster.local/apis/apps/v1beta1/namespaces/$NAMESPACE/statefulsets/$BASE_NAME"
-  POD_API_CALL="https://kubernetes.default.svc.cluster.local/api/v1/namespaces/default/pods/glusterfs-1"
   STATEFULSET_API_COMMAND="$CURL_COMMAND $K8_CERTS $K8_TOKEN $STATEFULSET_API_CALL"
-  POD_API_COMMAND="$CURL_COMMAND $K8_CERTS $K8_TOKEN $POD_API_CALL"
   REPLICA_COUNT=`eval $STATEFULSET_API_COMMAND | grep 'replicas' | cut -f2 -d ":" | cut -f2 -d "," | tr -d '[:space:]'`
+  echo "replica count = $REPLICA_COUNT"
+
+  # Get the pod hostname annotations (overwritten by the labelSelector query below)
+  PODS_API_CALL="https://kubernetes.default.svc.cluster.local/api/v1/namespaces/default/pods"
+  PODS_API_COMMAND="$CURL_COMMAND $K8_CERTS $K8_TOKEN $PODS_API_CALL"
+  MY_PODS=`eval $PODS_API_COMMAND | grep 'pod.beta.kubernetes.io/hostname' | cut -f2 -d ":" | tr -d '[:space:]'`
+
+  # Get the glusterfs pods by label
+  PODS_API_CALL="https://kubernetes.default.svc.cluster.local/api/v1/namespaces/default/pods?labelSelector=app=glusterfs"
+  PODS_API_COMMAND="$CURL_COMMAND $K8_CERTS $K8_TOKEN $PODS_API_CALL"
+  MY_PODS=`eval $PODS_API_COMMAND | grep 'pod.beta.kubernetes.io/hostname' | cut -f2 -d ":" | tr -d '[:space:]' | tr -d '"'`
+
+  # Get the hosts the pods are running on
+  HOSTS_API_CALL="https://kubernetes.default.svc.cluster.local/api/v1/namespaces/default/pods?labelSelector=app=glusterfs"
+  HOSTS_API_COMMAND="$CURL_COMMAND $K8_CERTS $K8_TOKEN $HOSTS_API_CALL"
+  MY_HOSTS=`eval $HOSTS_API_COMMAND | grep 'nodeName' | cut -f2 -d ":" | tr -d '[:space:]' | tr -d '"'`
+
+  # Find the pod running on this particular host
+  HOSTCOUNT=0
+  HOSTPOD=""
+  mycount=0
+
+  for host in $(echo $MY_HOSTS | tr ',' '\n')
+  do
+    mycount=$(( $mycount + 1 ))
+    if [ "$HOSTNAME" == "$host" ]
+    then
+      # get index
+      HOSTCOUNT=$mycount
+    fi
+  done
+
+  echo " --- NEXT ---"
+  mycount=0
+  for pod in $(echo $MY_PODS | tr ',' '\n')
+  do
+    mycount=$(( $mycount + 1 ))
+    if [ "$HOSTCOUNT" -eq "$mycount" ]
+    then
+      # get the pod
+      HOSTPOD=$pod
+    fi
+  done
 
   # Figure state of cluster
   # Keeps track of initial peer count; only run on original starting cluster
@@ -46,8 +88,15 @@ then
   echo "expected_replica_count = $EXPECTED_REPLICA_COUNT" >> /usr/share/bin/gluster.log
   echo "replica_count = $REPLICA_COUNT" >> /usr/share/bin/gluster.log
   echo "initial run? $INITIAL_RUN" >> /usr/share/bin/gluster.log
+  echo "MY_HOSTS = $MY_HOSTS" >> /usr/share/bin/gluster.log
+  echo "MY_PODS = $MY_PODS" >> /usr/share/bin/gluster.log
+  echo "HOSTCOUNT = $HOSTCOUNT" >> /usr/share/bin/gluster.log
+  echo "HOSTPOD = $HOSTPOD" >> /usr/share/bin/gluster.log
 
-
+
+  # TODO: Add "peer rejected" status and mitigation
+  # TODO: test volume management
+
   if [ "$INITIAL_RUN" == "yes" ]
   then
     echo "Initial Run on host" >> /usr/share/bin/gluster.log
```
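The two `for` loops added here implement a positional join between the comma-separated node list and pod list: find the 1-based index of `$HOSTNAME` in `MY_HOSTS`, then take the pod at the same index in `MY_PODS`. A self-contained sketch with made-up sample values:

```shell
#!/bin/bash
set -uo pipefail

# Sample parallel lists, shaped like the ones the script scrapes from the
# API (the values themselves are illustrative).
MY_HOSTS="ip-172-18-12-34.ec2.internal,ip-172-18-12-35.ec2.internal"
MY_PODS="glusterfs-0,glusterfs-1"
HOSTNAME="ip-172-18-12-35.ec2.internal"

# Pass 1: find this host's 1-based position in MY_HOSTS.
HOSTCOUNT=0
mycount=0
for host in $(echo "$MY_HOSTS" | tr ',' '\n')
do
  mycount=$(( mycount + 1 ))
  if [ "$HOSTNAME" = "$host" ]; then
    HOSTCOUNT=$mycount
  fi
done

# Pass 2: take the pod at the same position in MY_PODS.
HOSTPOD=""
mycount=0
for pod in $(echo "$MY_PODS" | tr ',' '\n')
do
  mycount=$(( mycount + 1 ))
  if [ "$HOSTCOUNT" -eq "$mycount" ]; then
    HOSTPOD=$pod
  fi
done

echo "$HOSTPOD"
```

Note the join only holds if the two API responses list pods in the same order; sorting both lists, or reading each pod's `spec.nodeName` next to its name, would make it robust.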
New file

Lines changed: 125 additions & 0 deletions

```bash
#! /bin/bash
set -euo pipefail
IFS=$'\n\t'

if systemctl status glusterd | grep -q '(running) since'
then

  # Run some API commands to figure out who we are and our state
  CURL_COMMAND="curl -v"
  K8_CERTS="--cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
  GET_TOKEN="$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)"
  K8_TOKEN="-H \"Authorization: Bearer $GET_TOKEN\""

  # StatefulSet calls
  STATEFULSET_API_CALL="https://kubernetes.default.svc.cluster.local/apis/apps/v1beta1/namespaces/$NAMESPACE/statefulsets/$BASE_NAME"
  STATEFULSET_API_COMMAND="$CURL_COMMAND $K8_CERTS $K8_TOKEN $STATEFULSET_API_CALL"
  REPLICA_COUNT=`eval $STATEFULSET_API_COMMAND | grep 'replicas' | cut -f2 -d ":" | cut -f2 -d "," | tr -d '[:space:]'`
  echo "replica count = $REPLICA_COUNT"

  # Get the pod hostname annotations (overwritten by the labelSelector query below)
  PODS_API_CALL="https://kubernetes.default.svc.cluster.local/api/v1/namespaces/default/pods"
  PODS_API_COMMAND="$CURL_COMMAND $K8_CERTS $K8_TOKEN $PODS_API_CALL"
  MY_PODS=`eval $PODS_API_COMMAND | grep 'pod.beta.kubernetes.io/hostname' | cut -f2 -d ":" | tr -d '[:space:]'`

  # Get the glusterfs pods by label
  PODS_API_CALL="https://kubernetes.default.svc.cluster.local/api/v1/namespaces/default/pods?labelSelector=app=glusterfs"
  PODS_API_COMMAND="$CURL_COMMAND $K8_CERTS $K8_TOKEN $PODS_API_CALL"
  MY_PODS=`eval $PODS_API_COMMAND | grep 'pod.beta.kubernetes.io/hostname' | cut -f2 -d ":" | tr -d '[:space:]' | tr -d '"'`

  # Get the hosts the pods are running on
  HOSTS_API_CALL="https://kubernetes.default.svc.cluster.local/api/v1/namespaces/default/pods?labelSelector=app=glusterfs"
  HOSTS_API_COMMAND="$CURL_COMMAND $K8_CERTS $K8_TOKEN $HOSTS_API_CALL"
  MY_HOSTS=`eval $HOSTS_API_COMMAND | grep 'nodeName' | cut -f2 -d ":" | tr -d '[:space:]' | tr -d '"'`

  # Find the pod running on this particular host
  HOSTCOUNT=0
  HOSTPOD=""
  mycount=0

  for host in $(echo $MY_HOSTS | tr ',' '\n')
  do
    mycount=$(( $mycount + 1 ))
    if [ "$HOSTNAME" == "$host" ]
    then
      # get index
      HOSTCOUNT=$mycount
    fi
  done

  echo " --- NEXT ---"
  mycount=0
  for pod in $(echo $MY_PODS | tr ',' '\n')
  do
    mycount=$(( $mycount + 1 ))
    if [ "$HOSTCOUNT" -eq "$mycount" ]
    then
      # get the pod
      HOSTPOD=$pod
    fi
  done
  echo $HOSTPOD

  # For this to work we need to be able to determine what host we are on
  # search on this pod.beta.kubernetes.io/hostname=

  # Figure state of cluster
  # Keeps track of initial peer count; only run on original starting cluster
  numpeers="$(gluster peer status | grep -oP 'Peers:\s\K\w+')"
  EXPECTED_REPLICA_COUNT=$(( $numpeers + 1 )) # should match REPLICA_COUNT after script runs
  ORIGINAL_PEER_COUNT=$numpeers
  CURRENT_NODE_COUNT=$(( $numpeers + 1 ))
  EXPECTED_PEER_COUNT=$(( $REPLICA_COUNT - 1 ))
  PEER_COUNT=$(( $REPLICA_COUNT - 1 ))
  VOLUME_LIST=""
  INITIAL_RUN="no"

  echo "Pre Termination Script Executed" > /usr/share/bin/gluster-stop.log
  echo "" >> /usr/share/bin/gluster-stop.log
  echo "" >> /usr/share/bin/gluster-stop.log
  echo "****** LOG ******" >> /usr/share/bin/gluster-stop.log
  echo "original_peer_count = $ORIGINAL_PEER_COUNT" >> /usr/share/bin/gluster-stop.log
  echo "expected_peer_count = $EXPECTED_PEER_COUNT" >> /usr/share/bin/gluster-stop.log
  echo "peer_count = $PEER_COUNT" >> /usr/share/bin/gluster-stop.log
  echo "expected_replica_count = $EXPECTED_REPLICA_COUNT" >> /usr/share/bin/gluster-stop.log
  echo "replica_count = $REPLICA_COUNT" >> /usr/share/bin/gluster-stop.log
  echo "initial run? $INITIAL_RUN" >> /usr/share/bin/gluster-stop.log
  echo "MY_HOSTS = $MY_HOSTS" >> /usr/share/bin/gluster-stop.log
  echo "MY_PODS = $MY_PODS" >> /usr/share/bin/gluster-stop.log
  echo "HOSTCOUNT = $HOSTCOUNT" >> /usr/share/bin/gluster-stop.log
  echo "HOSTPOD = $HOSTPOD" >> /usr/share/bin/gluster-stop.log

  if [ "${ORIGINAL_PEER_COUNT}" -eq "0" ] && [ "$INITIAL_RUN" == "no" ]
  then
    echo "nothing in the pool, probably should do nothing" >> /usr/share/bin/gluster-stop.log

  else
    echo "Someone is terminating our pod" >> /usr/share/bin/gluster-stop.log

    # Let's proactively remove from the TSP??
    # Remove from TSP
    if (gluster peer status | grep -q "Hostname: $HOSTPOD.$SERVICE_NAME.$NAMESPACE.svc.cluster.local")
    then
      result=`eval gluster peer detach $HOSTPOD.$SERVICE_NAME.$NAMESPACE.svc.cluster.local`
      wait
      echo "... Removed $HOSTPOD from TSP" >> /usr/share/bin/gluster-stop.log
    else
      echo "...Nothing to do here" >> /usr/share/bin/gluster-stop.log
    fi

  else
    echo "Why did we hit this, what is our state at this point" >> /usr/share/bin/gluster.log
  fi
  echo "pre-termination script executed" >> /usr/share/bin/gluster-stop.log
  exit 0
else
  echo "glusterd not running...fail" >> /usr/share/bin/gluster-stop.log
  exit 1
fi
```
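Note that the branching near the end of this new script attaches two `else` clauses to one `if` (the "Why did we hit this" branch), which bash rejects as a syntax error, and that branch also logs to `gluster.log` where the rest of the script uses `gluster-stop.log`. A sketch of one way to restructure it, with the `gluster` CLI stubbed out so the flow can run outside a real cluster (the stub and variable values are illustrative):

```shell
#!/bin/bash
# Sketch of the pre-termination branch logic with a valid if/elif/else.
HOSTPOD="glusterfs-1"
SERVICE_NAME="glusterfs"
NAMESPACE="default"
PEER_FQDN="$HOSTPOD.$SERVICE_NAME.$NAMESPACE.svc.cluster.local"
ORIGINAL_PEER_COUNT=2
INITIAL_RUN="no"

# Stub for the gluster CLI (assumption: replaces the real binary here).
gluster() {
  if [ "$1" = "peer" ] && [ "$2" = "status" ]; then
    echo "Hostname: $PEER_FQDN"
  fi
}

if [ "$ORIGINAL_PEER_COUNT" -eq 0 ] && [ "$INITIAL_RUN" = "no" ]; then
  RESULT="nothing in the pool, probably should do nothing"
elif gluster peer status | grep -q "Hostname: $PEER_FQDN"; then
  # Peer is in the TSP: proactively detach before the pod terminates.
  gluster peer detach "$PEER_FQDN" > /dev/null
  RESULT="... Removed $HOSTPOD from TSP"
else
  # Covers both "nothing to do" and the original's unreachable third branch.
  RESULT="...Nothing to do here"
fi

echo "$RESULT"
```

Folding the detach check into an `elif` keeps all the branches reachable while satisfying the shell grammar; the real script would append `$RESULT` to `gluster-stop.log` instead of echoing it.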

examples/gluster-statefulset-cloud/glusterfs-statefulset.yaml

Lines changed: 7 additions & 1 deletion
```diff
@@ -62,7 +62,13 @@ spec:
               command:
               - "chmod"
               - "+x"
-              - "/usr/share/bin/gluster-post.sh"
+              - "/usr/share/bin/gluster-post.sh /usr/share/bin/gluster-stop.sh"
+          preStop:
+            exec:
+              command:
+              - "/bin/sh"
+              - "-c"
+              - "source ./usr/share/bin/gluster-stop.sh"
         ports:
         - containerPort: 24007
         - containerPort: 24008
```
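One thing to watch in this hook change: in an exec-style `command`, each argument must be its own list item. As written, `chmod` receives the single argument `"/usr/share/bin/gluster-post.sh /usr/share/bin/gluster-stop.sh"` and looks for one file whose name contains a space. A quick shell demonstration (the file names are illustrative stand-ins):

```shell
#!/bin/bash
# Create two stand-in script files in a scratch directory.
dir=$(mktemp -d)
touch "$dir/gluster-post.sh" "$dir/gluster-stop.sh"

# One argument containing a space: chmod looks for a single file literally
# named "<dir>/gluster-post.sh <dir>/gluster-stop.sh" and fails.
if chmod +x "$dir/gluster-post.sh $dir/gluster-stop.sh" 2>/dev/null; then
  SINGLE_ARG="worked"
else
  SINGLE_ARG="failed"
fi

# Separate arguments: both files become executable.
chmod +x "$dir/gluster-post.sh" "$dir/gluster-stop.sh"
if [ -x "$dir/gluster-post.sh" ] && [ -x "$dir/gluster-stop.sh" ]; then
  SEPARATE_ARGS="worked"
else
  SEPARATE_ARGS="failed"
fi

echo "single argument: $SINGLE_ARG, separate arguments: $SEPARATE_ARGS"
rm -rf "$dir"
```

The same care applies to the preStop command: `source ./usr/share/bin/gluster-stop.sh` resolves relative to the container's working directory, so the absolute path `/usr/share/bin/gluster-stop.sh` would be safer.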
