Merge pull request #1370 from wireapp/release_2021_02_16

Release 2021-02-16

smatting authored Feb 18, 2021
2 parents 064fd77 + 99748bf commit 175de67
Showing 28 changed files with 552 additions and 231 deletions.
11 changes: 8 additions & 3 deletions CHANGELOG.md
@@ -14,17 +14,19 @@
-->

# [2021-02-15]
# [2021-02-16]

## Release Notes

This release requires recreating brig's ES index *before* deployment. See [instructions](https://github.com/wireapp/wire-server/blob/e3064d101ef8e9074431049135d2319335de3117/docs/reference/elasticsearch-migration-2021-02-15.md).
This release might require manual migration steps; see [ElasticSearch migration instructions for release 2021-02-16](https://github.com/wireapp/wire-server/blob/c81a189d0dc8916b72ef20d9607888618cb22598/docs/reference/elasticsearch-migration-2021-02-16.md).

## Features

* Team search: Add search by email (#1344) (#1286)
* Add endpoint to get client metadata for many users (#1345)
* Public end-point for getting the team size. (#1295)
* sftd: add support for multiple SFT servers (#1325) (#1377)
* SAML allow enveloped signatures (#1375)

## Bug fixes and other updates

@@ -48,7 +50,10 @@ This release requires recreating brig's ES index *before* deployment. See [instr
* Add missing internal qa routes (#1336)
* Extract and rename PolyLog to a library for reusability (#1329)
* Fix: Spar integration tests misconfigured on CI (#1343)

* Bump ormolu version (#1366, #1368)
* Update ES upgrade path (#1339) (#1376)
* Bump saml2-web-sso version to latest upstream (#1369)
* Add docs for deriving-swagger2 (#1373)

# [2021-01-15]

2 changes: 1 addition & 1 deletion charts/cannon/templates/statefulset.yaml
@@ -65,7 +65,7 @@ spec:
{{ toYaml .Values.resources | indent 12 }}
initContainers:
- name: cannon-configurator
image: alpine
image: alpine:3.13.1
command:
- /bin/sh
args:
2 changes: 1 addition & 1 deletion charts/demo-smtp/values.yaml
@@ -1,6 +1,6 @@
fullnameOverride: demo-smtp
replicaCount: 1
image: "namshi/smtp@sha256:aa63b8de68ce63dfcf848c56f3c1a16d81354f4accd4242a0086c57dd5a91d77"
image: "quay.io/wire/namshi-smtp:aa63b8"

service:
port: 25
8 changes: 6 additions & 2 deletions charts/elasticsearch-index/templates/create-index.yaml
@@ -33,10 +33,14 @@ spec:
- --elasticsearch-server
- "http://{{ required "missing elasticsearch-index.elasticsearch.host!" .Values.elasticsearch.host }}:{{ .Values.elasticsearch.port }}"
- --elasticsearch-index
- "{{ .Values.elasticsearch.index }}"
- "{{ or (.Values.elasticsearch.additionalWriteIndex) (.Values.elasticsearch.index) }}"
- --elasticsearch-shards=5
- --elasticsearch-replicas=2
- --elasticsearch-refresh-interval=5
{{- if .Values.elasticsearch.delete_template }}
- --delete-template
- "{{ .Values.elasticsearch.delete_template }}"
{{- end}}
containers:
- name: brig-index-update-mapping
image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
@@ -45,4 +49,4 @@ spec:
- --elasticsearch-server
- "http://{{ required "missing elasticsearch-index.elasticsearch.host!" .Values.elasticsearch.host }}:{{ .Values.elasticsearch.port }}"
- --elasticsearch-index
- "{{ .Values.elasticsearch.index }}"
- "{{ or (.Values.elasticsearch.additionalWriteIndex) (.Values.elasticsearch.index) }}"
2 changes: 1 addition & 1 deletion charts/elasticsearch-index/templates/migrate-data.yaml
@@ -33,7 +33,7 @@ spec:
- --elasticsearch-server
- "http://{{ required "missing elasticsearch-index.elasticsearch.host!" .Values.elasticsearch.host }}:{{ .Values.elasticsearch.port }}"
- --elasticsearch-index
- "{{ .Values.elasticsearch.index }}"
- "{{ or (.Values.elasticsearch.additionalWriteIndex) (.Values.elasticsearch.index) }}"
- --cassandra-host
- "{{ required "missing elasticsearch-index.cassandra.host!" .Values.cassandra.host }}"
- --cassandra-port
1 change: 1 addition & 0 deletions charts/elasticsearch-index/values.yaml
@@ -3,6 +3,7 @@ elasticsearch:
#host: # elasticsearch-client|elasticsearch-ephemeral
port: 9200
index: directory
delete_template: directory
cassandra:
# host:
port: 9042
2 changes: 1 addition & 1 deletion charts/reaper/templates/deployment.yaml
@@ -23,7 +23,7 @@ spec:
containers:
- name: reaper
imagePullPolicy: Always
image: roffe/kubectl:v1.13.2
image: bitnami/kubectl:1.19.7
command: ["bash"]
args:
- -c
176 changes: 161 additions & 15 deletions charts/sftd/README.md
@@ -1,26 +1,80 @@
# SFTD Chart

In theory the `sftd` chart can be installed on its own, but it's usually
installed as part of the `wire-server` umbrella chart.

## Parameters

### Required
| Parameter | Description |
|-----------------|---------------------------------------------------------------------------------------------|
| `host` | The domain name on which the SFT will be reachable. Should point to your ingress controller |
| `allowOrigin` | Allows CORS requests on this domain. Set this to the domain of your wire webapp. |


### Bring your own certificate
| Parameter | Description |
|-----------------|---------------------------------------------------------------------------------------------|
| `tls.key` | Private key of the TLS certificate for `host` |
| `tls.crt` | TLS certificate for `host` |
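
If you deploy via the `wire-server` umbrella chart, the certificate material can also be inlined in a values file instead of being passed with `--set-file` (see the standalone example below). A minimal sketch with placeholder PEM contents:

```yaml
sftd:
  tls:
    crt: |
      -----BEGIN CERTIFICATE-----
      ...
      -----END CERTIFICATE-----
    key: |
      -----BEGIN PRIVATE KEY-----
      ...
      -----END PRIVATE KEY-----
```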

### Cert-manager certificate

| Parameter | Description |
|-----------------|----------------------------------------------------------------------------------------------------------------------------------------------------|
| `tls.issuerRef` | Describes which [Issuer](https://cert-manager.io/docs/reference/api-docs/#meta.cert-manager.io/v1.ObjectReference) to use to request a certificate |


### Other (optional) parameters

| Parameter | Default | Description |
|---------------------------------|---------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `terminationGracePeriodSeconds` | `10` | How long to wait after a pod is told to terminate before shutting it down forcibly. Useful for letting ongoing calls drain; the pod won't take new calls whilst terminating |
| `replicaCount` | `1` | Number of SFT servers to run. Only one SFT server can run per node, so `replicaCount <= nodeCount` |
| `nodeSelector`, `affinity` | `{}` | Used to constrain SFT servers to run only on specific nodes |

Please see [values.yaml](./values.yaml) for an overview of other parameters that can be configured.
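
For instance, the optional parameters above could be combined in an umbrella-chart values file as follows (a sketch; the values are illustrative, not recommendations):

```yaml
sftd:
  terminationGracePeriodSeconds: 3600
  replicaCount: 3
  nodeSelector:
    wire.com/role: sftd
```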

## Deploy

Replace `example.com` with your own domain here.

#### As part of `wire-server` umbrella chart

`sftd` can be deployed as part of the `wire-server` umbrella chart. You can
edit the `values.yaml` of your `wire-server` chart to configure sftd.

```yaml
tags:
sftd: true

sftd:
host: sftd.example.com
allowOrigin: https://webapp.example.com
tls:
# The https://cert-manager.io issuer to use to retrieve a certificate
issuerRef:
kind: ClusterIssuer
name: letsencrypt-prod
```
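
After editing the values, roll the change out as usual for the umbrella chart, e.g. (a sketch, assuming a release named `wire-server` and a values file `values.yaml`):

```
helm upgrade --install wire-server wire/wire-server -f values.yaml
```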
#### Standalone

You can also install `sftd` standalone. This is useful if you want to be more
careful with releases and decouple the release lifecycle of `sftd` from that of
`wire-server`. For example, if you set `terminationGracePeriodSeconds` to a
large value (say, a few hours) to allow calls to drain, it would make
deployments of the `wire-server` umbrella chart, which are usually snappy, run
very slowly.


Using your own certificates:
```
helm install sftd wire/sftd \
--set host=sftd.example.com \
--set allowOrigin=https://webapp.example.com \
--set-file tls.crt=/path/to/tls.crt \
--set-file tls.key=/path/to/tls.key
```
Using Cert-manager:
```
helm install sftd wire/sftd \
--set host=example.com \
--set allowOrigin=https://webapp.example.com \
--set tls.issuerRef.name=letsencrypt-staging
```
The `host` option will be used to set up an `Ingress` object.
@@ -31,12 +85,78 @@ You can switch between `cert-manager` and own-provided certificates at any
time. Helm will delete the `sftd` secret automatically and then cert-manager
will create it instead.
It is important that `allowOrigin` is synced with the domain where the web app is hosted
`allowOrigin` MUST be in sync with the domain where the web app is hosted
as configured in the `wire-server` chart or the webapp will not be able to contact the SFT
server.
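
For example (hypothetical domains): if the webapp is served at `https://webapp.example.com`, the chart values must contain exactly that origin:

```yaml
sftd:
  allowOrigin: https://webapp.example.com
```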
You should configure `brig` to hand out the SFT server to clients by setting
`brig.optSettings.setSftStaticUrl=https://sftd.example.com:443` on the `wire-server` chart
You MUST configure `brig` to hand out the SFT server to clients, in order for clients to be
able to use the new conference calling features:
```yaml
brig:
# ...
optSettings:
# ...
setSftStaticUrl: https://sftd.example.com:443
```

## Routability

We currently require network connectivity between clients and the SFT server,
and between the SFT server and the restund servers. In other words: the SFT
server needs to be directly reachable by clients on its public IP, and it needs
to be able to reach the restund servers on their public IPs.

More exotic setups *are* possible but are currently *not* officially supported. Please
contact us if you have different constraints.

## Rollout

Kubernetes will shut down pods and start new ones when rolling out a release. Any calls
in progress on a pod that is shut down will be terminated, causing those calls to drop.

Kubernetes can be configured to wait for a certain amount of seconds before
stopping the pod. During this timeframe new calls will not be initiated on the
pod, but existing calls will also not be disrupted. If you want to roll out a
release with minimal impact you can set the
[`terminationGracePeriodSeconds`](./values.yaml#L18) option to the maximum
length you want to wait before cutting off calls.

For example, to cordon SFTs for one hour before dropping calls:
```
helm upgrade sftd wire/sftd --set terminationGracePeriodSeconds=3600
```

Currently, because we use a `StatefulSet` to orchestrate update rollouts, and a
`StatefulSet` replaces pods one at a time (a *rolling update*) rather than all
at once, rolling out a release takes `oldReplicas * terminationGracePeriodSeconds`
to complete.
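For example, with 3 old replicas and `terminationGracePeriodSeconds=3600`, a
rollout can take up to `3 * 3600` seconds, i.e. three hours.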


## Scaling up or down

You can scale up and down by specifying `replicaCount`:

```yaml
sftd:
replicaCount: 3
```
By default we provision *1* replica.
Note that due to the usage of `hostNetwork` there can only be _one_ instance of
`sftd` per Kubernetes node. You will need as many nodes available as you have
replicas.
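
When `sftd` is installed standalone, the same scale-up can be done from the command line (a sketch, assuming a release named `sftd`):

```
helm upgrade sftd wire/sftd --set replicaCount=3
```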

As a rule of thumb we support *50* concurrent connections per *1 vCPU*. These
numbers might improve as we work on optimizing the SFTD code. You should adjust
the number of replicas based on your expected usage patterns and your
Kubernetes node specifications.
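For example, to serve around 200 concurrent connections you would want
replicas backed by a total of at least 4 vCPUs.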

If you're using a Kubernetes cloud offering, we recommend setting up cluster
auto-scaling so that you automatically provision new Kubernetes nodes when the
number of replicas grows beyond the number of nodes available.


## Multiple sftd deployments in a single cluster
@@ -69,8 +189,8 @@ node4
Then we can make two `sftd` deployments and make sure Kubernetes schedules them on distinct sets of nodes:
```
helm install sftd-prod charts/sftd --set 'nodeSelector.wire\.com/role=sftd-prod' ...other-flags
helm install sftd-staging charts/sftd --set 'nodeSelector.wire\.com/role=sftd-staging' ...other-flags
helm install wire-prod charts/wire-server --set 'nodeSelector.wire\.com/role=sftd-prod' ...other-flags
helm install wire-staging charts/wire-server --set 'nodeSelector.wire\.com/role=sftd-staging' ...other-flags
```
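
The `wire.com/role` labels used in the `nodeSelector`s above can be applied with `kubectl` (hypothetical node names):

```
kubectl label node node1 wire.com/role=sftd-prod
kubectl label node node2 wire.com/role=sftd-staging
```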
## No public IP on default interface
@@ -110,3 +230,29 @@ kernel for free ports, which by default are in the `32768-61000` range
On a default installation these ranges do not overlap, and sftd should never have
conflicts with Kubernetes components. You should, however, check that these
ranges aren't configured differently on your OS.
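
On Linux you can inspect the ephemeral port range as follows; the output should roughly match the `32768-61000` range mentioned above:

```
sysctl net.ipv4.ip_local_port_range
```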



## Future work

We're (ab-)using a `StatefulSet` to give each pod a stable DNS name and use
that to route call join requests to the right calling service.

The downside of a `StatefulSet` is that rollouts are slow -- proportional to
how high you set `terminationGracePeriodSeconds`.

However, it seems that `coredns` can be configured to provide the same DNS
behaviour for any pod, not just pods in `StatefulSet`s
(https://github.com/kubernetes/kubernetes/issues/47992#issuecomment-499580692).

This requires a deployer of wire to edit their cluster's CoreDNS config to set
the
[`endpoint_pod_names`](https://github.com/coredns/coredns/tree/master/plugin/kubernetes)
option, which they might not have the ability to do.
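
For reference, `endpoint_pod_names` goes inside the `kubernetes` block of the cluster's Corefile; a sketch (your cluster's existing Corefile will differ):

```
kubernetes cluster.local in-addr.arpa ip6.arpa {
    pods insecure
    endpoint_pod_names
    fallthrough in-addr.arpa ip6.arpa
}
```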

If you are able to set this option, you could use a `Deployment` instead of a
`StatefulSet`. The benefit of a `Deployment` is that it replaces all pods at
once, so you do not have to wait `replicaCount *
terminationGracePeriodSeconds` for a rollout to finish but just
`terminationGracePeriodSeconds`. This drastically improves operations. We
should expose this as an option in a future release.
9 changes: 9 additions & 0 deletions charts/sftd/templates/_helpers.tpl
@@ -41,6 +41,11 @@ app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
{{- end }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end }}
{{- define "sftd.join-call.labels" -}}
helm.sh/chart: {{ include "sftd.chart" . }}
{{ include "sftd.join-call.selectorLabels" . }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end }}

{{/*
Selector labels
@@ -49,3 +54,7 @@
app.kubernetes.io/name: {{ include "sftd.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}
{{- define "sftd.join-call.selectorLabels" -}}
app.kubernetes.io/name: join-call
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}
20 changes: 20 additions & 0 deletions charts/sftd/templates/configmap-join-call.yaml
@@ -0,0 +1,20 @@
apiVersion: v1
kind: ConfigMap
metadata:
name: {{ include "sftd.fullname" . }}-join-call
labels:
{{- include "sftd.join-call.labels" . | nindent 4 }}

data:
default.conf.template: |
server {
listen 8080;
resolver ${NAMESERVER};
location /healthz { return 204; }
location ~ ^/sfts/([a-z0-9\-]+)/(.*) {
proxy_pass http://$1.sftd.${POD_NAMESPACE}.svc.cluster.local:8585/$2;
}
}