Merge pull request #7 from kuzmik/kuzmik/pre-stop-hook
started work on pre-stop hook
kuzmik committed Nov 29, 2023
2 parents 13cb1de + c591b39 commit f9610d8
Showing 13 changed files with 291 additions and 82 deletions.
5 changes: 1 addition & 4 deletions .dockerignore
@@ -1,7 +1,3 @@
# normally we'd ignore this, but i want the repo in the image so that we can
# run git commands in the build process
#/.git

/.github
/tmp

@@ -11,4 +7,5 @@
/proxysql-agent

# docs don't need to be in the image
/docs
*.md
2 changes: 1 addition & 1 deletion .github/dependabot.yml
@@ -5,7 +5,7 @@

version: 2
updates:
- package-ecosystem: "" # See documentation for possible values
- package-ecosystem: "gomod" # See documentation for possible values
directory: "/" # Location of package manifests
schedule:
interval: "weekly"
1 change: 0 additions & 1 deletion .github/workflows/go.yml
@@ -5,7 +5,6 @@ name: Go

on:
push:
branches: [ "main" ]
pull_request:

jobs:
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,3 @@
## 0.9.0

Initial beta release.
8 changes: 4 additions & 4 deletions Makefile
@@ -4,15 +4,15 @@ SHELL := /bin/bash
TARGET := $(shell echo $${PWD\#\#*/})

# These will be provided to the target
VERSION := 0.1.0
VERSION := 0.9.0
BUILD_SHA := `git rev-parse HEAD`
BUILD_TIME := `date +%FT%T%z`

# Use linker flags to provide version/build settings to the target.
# If we don't need debugging symbols, add -s and -w to make a smaller binary
LDFLAGS=-ldflags "-X 'main.Version=$(VERSION)' -X 'main.Build=$(BUILD_SHA)' -X 'main.BuildTime=$(BUILD_TIME)'"
LDFLAGS=-ldflags "-s -w -X 'main.version=$(VERSION)' -X 'main.build=$(BUILD_SHA)' -X 'main.builddate=$(BUILD_TIME)'"

# go source files, ignore vendor directory
# go source files
SRC=$(shell find . -type f -name '*.go')

all: clean lint build
@@ -46,4 +46,4 @@ run: build
@./$(TARGET)

docker: clean lint
@docker build --build-arg="VERSION=${VERSION}" --build-arg="BUILD_TIME=${BUILD_TIME}" --build-arg="BUILD_SHA=${BUILD_SHA}" . -t proxysql-agent
@docker build --build-arg="VERSION=${VERSION}" --build-arg="BUILD_TIME=${BUILD_TIME}" --build-arg="BUILD_SHA=${BUILD_SHA}" -f build/Dockerfile . -t proxysql-agent
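The `-X` flags in the new `LDFLAGS` only take effect if package `main` declares string variables with exactly those names. A minimal sketch of the matching declarations, assuming placeholder defaults (`"dev"`, `"unknown"`) and a hypothetical `versionInfo` helper. Note that `build/Dockerfile` in this commit passes `-X 'main.built=${BUILD_TIME}'` while the Makefile uses `main.builddate`, so only one of the two can match a variable declared this way.

```go
package main

import "fmt"

// Package-level string variables that the Makefile's LDFLAGS overrides at link
// time via -X 'main.version=...', -X 'main.build=...' and -X 'main.builddate=...'.
// The defaults are hypothetical placeholders, used when building without those
// flags (e.g. a plain `go run .`).
var (
	version   = "dev"
	build     = "unknown"
	builddate = "unknown"
)

// versionInfo formats the linked-in build metadata, e.g. for a startup log line.
func versionInfo() string {
	return fmt.Sprintf("version=%s build=%s builddate=%s", version, build, builddate)
}

func main() {
	fmt.Println(versionInfo())
}
```

With these defaults, a binary built with `go build -ldflags "-X 'main.version=0.9.0'" .` would print `version=0.9.0 build=unknown builddate=unknown`.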
84 changes: 51 additions & 33 deletions README.md
@@ -4,55 +4,56 @@

## About

A small, statically compiled go binary for use in maintaining the state of a [ProxySQL](https://github.com/sysown/proxysql) cluster. Also includes a [Dockerfile](Dockerfile) to generate an alpine based image, for use as a kubernetes sidecar.

**NB**: I'd like to open source this ASAP, provided we get signoff from legal. There is so little tooling for ProxySQL out there, this might be useful to someone.
A small, statically compiled go binary for use in maintaining the state of a [ProxySQL](https://github.com/sysown/proxysql) cluster, for use as a kubernetes sidecar container. The repo includes a [Dockerfile](build/Dockerfile) to generate an alpine based image.

### "Self healing" the ProxySQL cluster

TODO: diagram of core/satellite pods
TODO: link to example deployment

This is mainly useful in a kubernetes deployment if you have a horizontal pod autoscaler defined for satellite and/or core pods; as these pods scale in and out, the state of the ProxySQL cluster needs to be maintained. If you are running a static cluster on VMs and the hosts rarely change, or you don't use an HPA, this probably won't be as useful to you (though there are some features coming that might help with even that).

Some examples of where this is necessary:

- As satellite pods scale in, the core pods need to run `LOAD PROXYSQL SERVERS TO RUNTIME` in order to accept the new pods to the cluster; until that is done, the satellite pod will not receive configuration from the core pods
- As satellite pods scale in, one of the core pods needs to run `LOAD PROXYSQL SERVERS TO RUNTIME` in order to accept the new pods to the cluster; until that is done, the satellite pod will not receive configuration from the core pods
- As core pods recycle (or all core pods are recycled) and IPs to them change, the satellites need to run some commands to load the new core pods into runtime
- If _all_ core pods recycle, the satellite pods will run `LOAD PROXYSQL SERVERS FROM CONFIG` which points them to the `proxysql-core` service, and once the core pods are up the satellites should receive configuration again
- Note that if your cluster is running fine and the core pods all go away, the satellites will continue to function with the settings they already had; in other words, even if the core pods vanish, you will still serve proxied MySQL traffic as long as the satellites have fetched the configuration once

You can see the code for this in `proxysql.go` in the `Core()` and `Satellite()` functions.
### Why did you pick golang, if you work at a Ruby shop?

Note that if your cluster is running fine, and the core pods all go away, the satellites will continue to function with the settings they already had; in other words, even if the core pods vanish, you will still serve proxied MySQL traffic.
I looked into using ruby, and in fact the "agents" we are currently running **are** written in ruby, but there have been some issues:

#### Why did you pick golang, if you work at a Ruby shop?
- If the proxysql admin interface gets wedged, the ruby and mysql processes still continue to spawn and spin, which will eventually lead to either inode exhaustion or a container OOM
- The scheduler spawns a new ruby process every 10s
- Each ruby process shells out to the mysql binary several times per script invocation
- In addition to the scheduler process, the health probe is a separate ruby script that also spawns several mysql processes per run
- Two script invocations every 10s, one for liveness and one for readiness

I looked into using ruby, and in fact the "agents" we are currently running **are** written in ruby, but there have been some issues:

- The scheduler spawns a new ruby process every 10s
- Each ruby process shells out to the mysql binary one or more times per script run; I wanted to avoid installing mysql gems
- If the proxysql admin interface gets wedged, the ruby and mysql processes still continue to spawn and block, which will eventually lead to either inode exhaustion or an OOM
- We can statically compile this, and don't need to mess with a bunch of ruby gems. And I mean a _bunch_ of ruby gems
- k8s tooling is generally written in Golang, and it shows. The ruby k8s gems are not as good as the golang libraries, unfortunately
We wanted to avoid having to install a bunch of ruby gems in the container, so we decided shelling out to mysql was fine; we got most of the patterns from existing ProxySQL tooling and figured it'd work short term. The ruby has worked fine, though there have been enough instances of OOM'd containers that it's become worrisome. This usually happens if someone is in a pod doing any kind of work (modifying mysql query rules, etc), but we haven't been able to figure out what causes the admin interface to become wedged.

Because k8s tooling is generally written in Golang, the ruby k8s gems didn't seem to be as maintained or as easy to use as the golang libraries. And because the go process is statically compiled, we won't need to deal with a bunch of external dependencies at runtime.


## Design

In the [example repo](https://github.com/kuzmik/local-proxysql), there are two separate deployments; the `core` and the `satellite` deployments. The agent is responsible for maintaining this cluster.

I will say, I _am_ more comfortable with Ruby and am still learning all of the Go differences, so I am more than open to feedback here. However, since this is such a simple application I have no qualms about the choice of language.
![image](docs/infra.png)

### Design
On boot, the agent will connect to the ProxySQL admin interface on `127.0.0.1:6032` (default address). It will maintain the connection throughout the life of the pod, and will periodically run the commands necessary to maintain the cluster, depending on the run mode specified on boot.

N/A, as yet
Additionally, the agent exposes a simple HTTP API used for the pod's k8s health checks, as well as a `/shutdown` endpoint, which can be used in a `container.lifecycle.preStop.httpGet` hook to gracefully drain traffic from a pod before stopping it.

### Status - Alpha

This is currently in alpha. Do not use it in production yet.
## Status - Beta

### TODOs
This is currently in beta. We are running this in staging.

There are some linear tickets, but here's a high level overview of what I have in mind.

- *P1* - Health checks; replace the ruby health probe with this
- Use an HTTP endpoint for health checks, because the proxysql container can call the agent container; meaning, if we configure k8s to load `localhost:8080/status` in the proxysql container, and that endpoint is running in the sidecar, it will work just fine
- *P2* - Replace the pre-stop ruby script with this
- same deal as the health check, use the shared FS for this
## TODOs

There are some internal linear tickets, but here's a high level overview of what we have in mind.

- *P2* - Better test coverage
- *P3* - Leader election; elect one core pod and have it be responsible for managing cluster state
- *P3* - "plugin" support; we don't necessarily need to add all the Persona specific cases to the main agent, as they won't likely apply to most people
- "chaosmonkey" feature
@@ -63,19 +64,36 @@ There are some linear tickets, but here's a high level overview of what I have i
- force a satellite resync (if running in satellite mode)
- etc
- Now I'm not sure this is that important; we can just add more commands to the agent, and run said commands from the CLI

- *P5* - If possible, clean up the errors that are thrown when the `preStop` hook runs. This might not be possible due to how k8s kills containers, but if it is, these errors need to go away:
```
time=2023-11-29T02:32:22.422Z level=INFO msg="Pre-stop called, starting shutdown process" shutdownDelay=120
time=2023-11-29T02:32:24.341Z level=INFO msg="Pre-stop commands ran" commands="UPDATE global_variables SET variable_value = 120000 WHERE variable_name in ('mysql-connection_max_age_ms', 'mysql-max_transaction_idle_time', 'mysql-max_transaction_time'); UPDATE global_variables SET variable_value = 1 WHERE variable_name = 'mysql-wait_timeout'; LOAD MYSQL VARIABLES TO RUNTIME; PROXYSQL PAUSE;"
time=2023-11-29T02:32:24.343Z level=INFO msg="No connected clients remaining, proceeding with shutdown"
[mysql] 2023/11/29 02:32:24 packets.go:37: unexpected EOF
time=2023-11-29T02:32:24.348Z level=ERROR msg="KILL command failed" commands="PROXYSQL KILL" error="invalid connection"
rpc error: code = Unknown desc = Error: No such container: e3153c34e0ad525c280dd26695b78d917b1cb377a545744bffb9b31ad1c90670%
```
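The log above records the batch the pre-stop hook ran with `shutdownDelay=120`; the delay reappears as `120000`, i.e. converted to milliseconds. A minimal sketch that rebuilds the same batch from the delay. `preStopCommands` is a hypothetical name; only the SQL text is taken from the log.

```go
package main

import "fmt"

// preStopCommands rebuilds the logged pre-stop batch for a shutdown delay in
// seconds: cap connection age and transaction times at the delay (in ms),
// set mysql-wait_timeout to 1 so idle client sessions are dropped, load the
// variables into runtime, then PROXYSQL PAUSE.
func preStopCommands(shutdownDelaySeconds int) string {
	ms := shutdownDelaySeconds * 1000
	return fmt.Sprintf("UPDATE global_variables SET variable_value = %d WHERE variable_name in ('mysql-connection_max_age_ms', 'mysql-max_transaction_idle_time', 'mysql-max_transaction_time'); "+
		"UPDATE global_variables SET variable_value = 1 WHERE variable_name = 'mysql-wait_timeout'; "+
		"LOAD MYSQL VARIABLES TO RUNTIME; "+
		"PROXYSQL PAUSE;", ms)
}

func main() {
	fmt.Println(preStopCommands(120))
}
```

Assuming the agent talks to the admin interface through `go-sql-driver/mysql` (the `[mysql] packets.go` line in the log suggests it), sending this as one `Exec` call would require `multiStatements=true` in the DSN; issuing the statements one at a time avoids that.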
### MVP Requirements
1. ✅ Cluster management (ie: core and satellite agents) (completed)
1. 🏗️ Health checks via an HTTP endpoint, specifically for the ProxySQL container (in progress)
1. Pre-stop hook replacement
1. ✅ Cluster management (ie: core and satellite agents)
1. Health checks via an HTTP endpoint, specifically for the ProxySQL container
1. Pre-stop hook replacement
#### Done
### Done
- *P1* - ~~Dump the contents of `stats_mysql_query_digests` to a file on disk; will be used to get the data into snowflake. File format is CSV~~
- *P1* - ~~Health checks; replace the ruby health probe with this~~
- *P2* - ~~Replace the pre-stop ruby script with this~~
## Releasing a new version
1. Update version in Makefile (and anywhere that calls `go build`, like pipelines)
1. Update the CHANGELOG.md with the changes
### See also
## See also
Libraries in use:
10 changes: 6 additions & 4 deletions Dockerfile → build/Dockerfile
@@ -12,22 +12,24 @@ ENV GO111MODULE=on
# Set destination for COPY
WORKDIR /build

COPY go.sum go.mod .
COPY go.sum go.mod ./

RUN go mod download

COPY . .

RUN apk update && apk add --no-cache git && rm -rf /var/cache/apk/*
RUN apk update \
&& apk add --no-cache git=2.40.1-r0 \
&& rm -rf /var/cache/apk/* /lib/apk/db/*

RUN CGO_ENABLED="0" go build -ldflags "-s -w -X 'main.version=${VERSION}' -X 'main.build=${BUILD_SHA}' -X 'main.built=${BUILD_TIME}'" -o proxysql-agent .

# Stage 2
FROM alpine:3.18.4 as runner

# add mysql-client to apk add when we're ready
# add mysql-client, curl, jq, etc to apk add when we're ready
RUN apk update \
&& apk add --no-cache bash bind-tools \
&& apk add --no-cache bash=5.2.15-r5 \
&& rm -rf /var/cache/apk/* \
&& addgroup agent \
&& adduser -S agent -u 1000 -G agent
Binary file added docs/infra.png
52 changes: 26 additions & 26 deletions go.mod
@@ -10,22 +10,22 @@ require (
k8s.io/client-go v0.28.4
)

require github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 // indirect

require (
github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 // indirect
github.com/stretchr/objx v0.5.0 // indirect
github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc // indirect
github.com/emicklei/go-restful/v3 v3.9.0 // indirect
github.com/fsnotify/fsnotify v1.6.0 // indirect
github.com/go-logr/logr v1.2.4 // indirect
github.com/go-openapi/jsonpointer v0.19.6 // indirect
github.com/emicklei/go-restful/v3 v3.11.0 // indirect
github.com/fsnotify/fsnotify v1.7.0 // indirect
github.com/go-logr/logr v1.3.0 // indirect
github.com/go-openapi/jsonpointer v0.20.0 // indirect
github.com/go-openapi/jsonreference v0.20.2 // indirect
github.com/go-openapi/swag v0.22.3 // indirect
github.com/go-openapi/swag v0.22.4 // indirect
github.com/gogo/protobuf v1.3.2 // indirect
github.com/golang/protobuf v1.5.3 // indirect
github.com/google/gnostic-models v0.6.8 // indirect
github.com/google/go-cmp v0.5.9 // indirect
github.com/google/go-cmp v0.6.0 // indirect
github.com/google/gofuzz v1.2.0 // indirect
github.com/google/uuid v1.3.0 // indirect
github.com/google/uuid v1.4.0 // indirect
github.com/hashicorp/hcl v1.0.0 // indirect
github.com/josharian/intern v1.0.0 // indirect
github.com/json-iterator/go v1.1.12 // indirect
@@ -39,31 +39,31 @@ require (
github.com/sagikazarmark/locafero v0.3.0 // indirect
github.com/sagikazarmark/slog-shim v0.1.0 // indirect
github.com/sourcegraph/conc v0.3.0 // indirect
github.com/spf13/afero v1.10.0 // indirect
github.com/spf13/cast v1.5.1 // indirect
github.com/spf13/afero v1.11.0 // indirect
github.com/spf13/cast v1.6.0 // indirect
github.com/stretchr/testify v1.8.4
github.com/subosito/gotenv v1.6.0 // indirect
go.uber.org/atomic v1.9.0 // indirect
go.uber.org/multierr v1.9.0 // indirect
golang.org/x/exp v0.0.0-20230905200255-921286631fa9 // indirect
golang.org/x/net v0.17.0 // indirect
golang.org/x/oauth2 v0.12.0 // indirect
golang.org/x/sys v0.13.0 // indirect
golang.org/x/term v0.13.0 // indirect
golang.org/x/text v0.13.0 // indirect
golang.org/x/time v0.3.0 // indirect
google.golang.org/appengine v1.6.7 // indirect
go.uber.org/atomic v1.11.0 // indirect
go.uber.org/multierr v1.11.0 // indirect
golang.org/x/exp v0.0.0-20231127185646-65229373498e // indirect
golang.org/x/net v0.19.0 // indirect
golang.org/x/oauth2 v0.15.0 // indirect
golang.org/x/sys v0.15.0 // indirect
golang.org/x/term v0.15.0 // indirect
golang.org/x/text v0.14.0 // indirect
golang.org/x/time v0.5.0 // indirect
google.golang.org/appengine v1.6.8 // indirect
google.golang.org/protobuf v1.31.0 // indirect
gopkg.in/DATA-DOG/go-sqlmock.v2 v2.0.0-20180914054222-c19298f520d0
gopkg.in/inf.v0 v0.9.1 // indirect
gopkg.in/ini.v1 v1.67.0 // indirect
gopkg.in/yaml.v2 v2.4.0 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect
k8s.io/api v0.28.4 // indirect
k8s.io/klog/v2 v2.100.1 // indirect
k8s.io/kube-openapi v0.0.0-20230717233707-2695361300d9 // indirect
k8s.io/utils v0.0.0-20230406110748-d93618cff8a2 // indirect
k8s.io/klog/v2 v2.110.1 // indirect
k8s.io/kube-openapi v0.0.0-20231113174909-778a5567bc1e // indirect
k8s.io/utils v0.0.0-20231127182322-b307cd553661 // indirect
sigs.k8s.io/json v0.0.0-20221116044647-bc3834ca7abd // indirect
sigs.k8s.io/structured-merge-diff/v4 v4.2.3 // indirect
sigs.k8s.io/yaml v1.3.0 // indirect
sigs.k8s.io/structured-merge-diff/v4 v4.4.1 // indirect
sigs.k8s.io/yaml v1.4.0 // indirect
)