release checklist #1164
I have tagged everyone who has done a release. Please add anything I have missed that you tested during the release process.
Branch updated from `5519dca` to `6c3d592`.
Some suggestions
docs/release-process/CHECKLIST.md
- Check out the old release tag
  - e.g. `git checkout v0.1.0`

- Build the `lokoctl` binary from the last release
  - e.g. `make build`
  - Copy the `lokoctl` binary to your assets directory.
Maybe use release binary instead?
docs/release-process/CHECKLIST.md
This section checks whether components work as desired.

- Check that all certificates are valid.
  - e.g. `kubectl get certificates -A`
  - Certificates for all your components are valid.

- Check that an external IP is assigned to the Contour service. This verifies that MetalLB assigns IPs to services of type `LoadBalancer`.
  - `kubectl get svc -n projectcontour`

- Check that DNS records for your components are added to AWS. If you used the Route 53 DNS provider, you can check them [here](https://console.aws.amazon.com/route53/v2/home#Dashboard). Make sure to check the correct hosted zone.

- Check that the Gangway Ingress host URL you configured works.

- Check that the httpbin Ingress host URL you configured works.

- Do some **blackbox testing** by sending HTTP requests through MetalLB + Contour + cert-manager.

- Check metrics for your cluster by visiting the Prometheus Ingress host URL.

- Check that the Velero component works by testing it on a namespace.
  - Run the following commands:
    ```sh
    # Create test namespace.
    kubectl create ns test

    # Create a serviceaccount inside the test namespace.
    kubectl create sa test -n test

    # Create a velero backup and wait until it has finished.
    velero backup create serviceaccount-backup --include-namespaces test --wait

    # Delete namespace test.
    kubectl delete ns test

    # Restore the namespace using velero and wait until it has finished.
    velero restore create --from-backup serviceaccount-backup --wait

    # Check that serviceaccount test exists again.
    kubectl get sa test -n test
    ```

- Check that the web-ui Ingress host URL you configured works.

**IMPORTANT**: Follow the whole process again with a multi-controller cluster.

If everything works fine, continue with the release process.
Hm, perhaps we could run our e2e test suite to cover all that.
This can be added as an extra step too, to make sure everything works fine. What do you think?
Okay, but I think we should prefer automated tests rather than doing this by hand.
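Until the e2e suite covers it, the MetalLB step could at least be scripted rather than read off a table. A minimal sketch, assuming `kubectl` access to the cluster: the `jsonpath` query in the usage comment prints each service in `projectcontour` with its type and ingress IP, and the hypothetical helper `check_lb_ips` fails if any `LoadBalancer` service has no IP yet.

```sh
# check_lb_ips: read lines of the form "<service> <type> <ip>" on stdin and
# fail if any LoadBalancer service has no external IP assigned.
check_lb_ips() {
  awk 'BEGIN { bad = 0 }
       $2 == "LoadBalancer" && $3 == "" { print "no external IP: " $1; bad = 1 }
       END { exit bad }'
}

# Hypothetical usage against a live cluster:
#   kubectl get svc -n projectcontour \
#     -o jsonpath='{range .items[*]}{.metadata.name} {.spec.type} {.status.loadBalancer.ingress[*].ip}{"\n"}{end}' \
#     | check_lb_ips
```

Using `ingress[*]` instead of `ingress[0]` keeps the query from erroring on services that have no ingress entries yet.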
docs/release-process/CHECKLIST.md
  - Copy the `lokoctl` binary to your assets directory.

- Deploy `lokomotive` with the old release
  - e.g. `./lokoctl cluster apply`
The previous step says to build using `make build`, and the current one says `./lokoctl cluster apply`, but you don't have the `lokoctl` binary and the lokocfg files in the same place.
We can simply say `make install` in the previous step. In this step, we then just run `lokoctl cluster apply` from this directory.
I think previous step should be changed to use a release binary, so I think it should be OK to build + copy here.
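The build-and-deploy steps discussed in this thread (check out the old tag, `make build`, copy `lokoctl` into the assets directory, then run `./lokoctl cluster apply` from there) could be wrapped in one helper. This is only a sketch: `deploy_release` is a hypothetical name, and it assumes that `make build` leaves a `lokoctl` binary in the repository root, as the checklist implies.

```sh
# deploy_release TAG ASSETS_DIR: build lokoctl from TAG and apply the cluster
# defined in ASSETS_DIR. Run it from the root of the lokomotive checkout.
# Assumes `make build` writes ./lokoctl into the repository root.
deploy_release() {
  tag=$1
  assets_dir=$2
  git checkout "$tag" || return 1
  make build || return 1
  cp lokoctl "$assets_dir/" || return 1
  # Run from the assets directory, where the lokocfg files live.
  (cd "$assets_dir" && ./lokoctl cluster apply)
}

# Hypothetical usage: deploy the old release first, then repeat with the
# new tag to exercise the upgrade path.
#   deploy_release v0.1.0 ~/lokomotive-assets
```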
docs/release-process/CHECKLIST.md
0.2.0 (`v0.2.0`).

- Check out the old release tag
  - e.g. `git checkout v0.1.0`
All the commands follow the same pattern: an "e.g." followed by a command. For commands like the one above, I understand that the user needs to make changes. But for the other commands, where we know exactly what the command is, we should put them in a code block rather than in inline backticks.
docs/release-process/CHECKLIST.md
@@ -0,0 +1,84 @@
## Check list
This PR should also create a sample lokocfg file with all the cluster and component features, parameterised using HCL variables, plus a sample lokocfg.vars file with empty values. That way the user (a fellow developer) only has to fill in those values to get going.
Now that we have different platforms, we can have lokocfg files for two platforms, viz. Packet and AWS. For that, we might have to put the files in two different subdirectories.
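The empty `lokocfg.vars` file proposed here could be generated from the variable declarations themselves, so it never drifts from the sample lokocfg. A rough sketch: `gen_empty_vars` is a hypothetical helper that reads lokocfg content and prints one `name = ""` line per `variable "name"` block. It assumes each declaration starts on its own line and writes every value as an empty string, so non-string variables still need editing by hand.

```sh
# gen_empty_vars: read lokocfg (HCL) content on stdin and print an empty
# assignment for every `variable "<name>"` declaration at line start.
gen_empty_vars() {
  sed -n 's/^[[:space:]]*variable[[:space:]]*"\([^"]*\)".*/\1 = ""/p'
}

# Hypothetical usage, one sample per platform subdirectory:
#   cat packet/*.lokocfg | gen_empty_vars > packet/lokocfg.vars
#   cat aws/*.lokocfg | gen_empty_vars > aws/lokocfg.vars
```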
docs/release-process/CHECKLIST.md
- Check that all certificates are valid.
  - e.g. `kubectl get certificates -A`
  - Certificates for all your components are valid.
What does valid mean in this case?
Perhaps he meant `Ready`, as in the column that shows up when you list the certificates.
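If "valid" means `Ready`, the certificate step can check the `Ready` condition of every cert-manager `Certificate` instead of eyeballing the table. A sketch, assuming `kubectl` access and a cert-manager version that sets a `Ready` condition on `Certificate` objects; `check_certs_ready` is a hypothetical name.

```sh
# check_certs_ready: read lines of the form "<namespace>/<name> <ready-status>"
# on stdin and fail if any certificate's Ready status is not "True".
check_certs_ready() {
  awk 'BEGIN { bad = 0 }
       $2 != "True" { print "not ready: " $1; bad = 1 }
       END { exit bad }'
}

# Hypothetical usage against a live cluster:
#   kubectl get certificates -A \
#     -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name} {.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}' \
#     | check_certs_ready
```

A certificate with no `Ready` condition at all prints an empty status and is reported as not ready. For a single namespace, `kubectl wait --for=condition=Ready certificates --all -n <namespace>` is an alternative that blocks until the condition holds.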
docs/release-process/CHECKLIST.md
velero restore create --from-backup serviceaccount-backup --wait

# Check that serviceaccount test exists again.
kubectl get sa test -n test
For velero testing, we should ideally point the user to the velero usage doc: https://github.com/kinvolk/lokomotive/blob/master/docs/how-to-guides/backup-rook-ceph-volumes.md
User? This is developer documentation. But I agree on pointing to the docs until we make this testing automated.
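Until this is automated, the manual Velero check could at least confirm that the backup actually completed before the namespace is deleted. A sketch, assuming Velero runs in a namespace named `velero` (overridable through a hypothetical `VELERO_NAMESPACE` variable) and reports progress in the `.status.phase` field of its Backup and Restore objects; `velero_phase_is_completed` is a hypothetical name.

```sh
# velero_phase_is_completed KIND NAME: succeed only if the Velero object
# (e.g. a backup or restore) reports phase "Completed".
velero_phase_is_completed() {
  phase=$(kubectl get "$1" "$2" -n "${VELERO_NAMESPACE:-velero}" \
    -o jsonpath='{.status.phase}') || return 1
  if [ "$phase" = "Completed" ]; then
    return 0
  fi
  echo "$1/$2 is in phase '${phase:-<none>}'" >&2
  return 1
}

# Hypothetical usage, between the backup and the namespace deletion:
#   velero_phase_is_completed backups.velero.io serviceaccount-backup
```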
docs/release-process/CHECKLIST.md
- Do some **blackbox testing** by sending HTTP requests through MetalLB + Contour + cert-manager.

- Check metrics for your cluster by visiting the Prometheus Ingress host URL.
- We can point the user to the Prometheus doc and ask them to verify that all the scenarios described there work.
- Verify that the Grafana dashboards have data.
- Verify that the Prometheus targets are loaded correctly.
- Verify that the alerts are loaded correctly.
Most of the points here would already be covered by the e2e test suite, right? Prometheus data and targets.
Perhaps we should add tests for alerts and Grafana data sources then.
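Meanwhile, the "targets are loaded correctly" point can be checked against the Prometheus HTTP API instead of the UI: `/api/v1/targets` lists active targets, each with a `health` of `up`, `down` or `unknown`. A sketch, assuming the Prometheus Ingress host is reachable as `$PROMETHEUS_URL` (a hypothetical variable) and that the response is compact JSON, as Prometheus returns it; `count_unhealthy_targets` is a hypothetical name, and with `jq` available a real JSON query would be more robust than `grep`.

```sh
# count_unhealthy_targets: read a /api/v1/targets JSON response on stdin and
# print how many active targets report health "down" or "unknown".
count_unhealthy_targets() {
  grep -oE '"health":"(down|unknown)"' | wc -l | tr -d ' '
}

# Hypothetical usage:
#   n=$(curl -fsS "$PROMETHEUS_URL/api/v1/targets" | count_unhealthy_targets)
#   [ "$n" -eq 0 ] || echo "$n Prometheus targets are not up"
```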
docs/release-process/CHECKLIST.md
- Check that the web-ui Ingress host URL you configured works.

**IMPORTANT**: Follow the whole process again with a multi-controller cluster.
I think the general question to ponder is which tests are single-node or multi-node specific. We should test a multi-controller setup by default, then figure out what is single-node specific and run only those tests there; there is no need to test everything all over again.
I agree. For controller nodes, we should only test the upgrade path. I like the `Components test` section title; perhaps we can add something like `Controlplane testing` for the first points in the document, then move this sentence there.
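For the control-plane upgrade path, one cheap check after `lokoctl cluster apply` is that every node, controllers included, reports `Ready`. A sketch, assuming `kubectl` access; `check_nodes_ready` is a hypothetical name, and it relies on `STATUS` being the second column of `kubectl get nodes --no-headers`.

```sh
# check_nodes_ready: read `kubectl get nodes --no-headers` output on stdin and
# fail if any node's STATUS column is not exactly "Ready".
check_nodes_ready() {
  awk 'BEGIN { bad = 0 }
       $2 != "Ready" { print "not ready: " $1 " (" $2 ")"; bad = 1 }
       END { exit bad }'
}

# Hypothetical usage after applying the new release:
#   kubectl get nodes --no-headers | check_nodes_ready
```

Because the match is exact, a cordoned node (`Ready,SchedulingDisabled`) is also reported, which is usually worth knowing right after an upgrade.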
docs/release-process/CHECKLIST.md
- Check that the httpbin Ingress host URL you configured works.

- Do some **blackbox testing** by sending HTTP requests through MetalLB + Contour + cert-manager.
There could be a standard tool for blackbox testing. Maybe curl, plus visiting each URL once in a browser?
An e2e test should easily cover that. We could have one that is run manually rather than by the CI, and which expects MetalLB on Packet to have a correctly configured EIP.
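As a stopgap for the manual run, plain `curl` over HTTPS already exercises the whole path: MetalLB provides the external IP, Contour routes the request, and the cert-manager certificate is verified, because `curl` rejects an untrusted certificate unless `-k` is passed (so a Let's Encrypt staging certificate will fail). A sketch; `check_url` and `is_success_status` are hypothetical helpers, the host names are placeholders for your configured Ingress hosts, and treating any 2xx/3xx status as healthy is an assumption (Gangway, for instance, may redirect to the login provider).

```sh
# is_success_status CODE: succeed for HTTP 2xx and 3xx status codes.
is_success_status() {
  case $1 in
    2[0-9][0-9]|3[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# check_url URL: fetch URL with TLS verification on (no -k, no redirects
# followed) and report whether the response status looks healthy.
check_url() {
  code=$(curl -sS -o /dev/null -w '%{http_code}' --max-time 10 "$1") || code=000
  if is_success_status "$code"; then
    echo "ok $code $1"
  else
    echo "FAIL $code $1"
    return 1
  fi
}

# Hypothetical usage (replace with your Ingress hosts):
#   for url in https://httpbin.example.com https://gangway.example.com; do
#     check_url "$url"
#   done
```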
docs/release-process/CHECKLIST.md
- Check that DNS records for your components are added to AWS. If you used the Route 53 DNS provider, you can check them [here](https://console.aws.amazon.com/route53/v2/home#Dashboard). Make sure to check the correct hosted zone.

- Check that the Gangway Ingress host URL you configured works.
Gangway testing should actually verify that the authentication workflow works. This involves going to the website, authenticating with GitHub/Google, and then using the token to talk to the API server.
To extend it a bit further, we could assign a role to the user email, e.g. one that allows reading pods, and verify that reading pods works while other access does not.
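The RBAC half of this idea can be checked without a browser by impersonating the user. A sketch, assuming cluster-admin access, a hypothetical user `dev@example.com` (standing in for the email Gangway authenticates) and a namespace `test`. It binds the built-in `view` ClusterRole, which allows reading pods but not deleting them, and compares both answers of `kubectl auth can-i`; `expect_can_i` is a hypothetical name. The OIDC login through Gangway itself still has to be done by hand.

```sh
# expect_can_i EXPECTED USER VERB RESOURCE NAMESPACE: impersonate USER and
# compare the answer of `kubectl auth can-i` (yes/no) with EXPECTED.
expect_can_i() {
  # can-i exits non-zero for "no", so ignore its status and read the answer.
  answer=$(kubectl auth can-i "$3" "$4" -n "$5" --as="$2") || true
  answer=${answer%% *}
  if [ "$answer" = "$1" ]; then
    echo "ok $2 can-i $3 $4: $answer"
  else
    echo "FAIL $2 can-i $3 $4: got '$answer', want '$1'"
    return 1
  fi
}

# Hypothetical usage:
#   kubectl create rolebinding dev-view --clusterrole=view --user=dev@example.com -n test
#   expect_can_i yes dev@example.com list pods test
#   expect_can_i no dev@example.com delete pods test
```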
I'll take it over for now, as @knrt10 is on holidays.
It is no longer needed.
Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>

So that we can have all information about the release in a single place.
Signed-off-by: knrt10 <kautilya@kinvolk.io>

closes: #999
Signed-off-by: knrt10 <kautilya@kinvolk.io>
Branch updated from `6c3d592` to `9176931`.
Actually, I'd rather take care of #1031, so if someone wants to pick this up, go ahead.
This PR creates a new folder, `release-process`, which contains all the docs required for releasing. It also adds a manual checklist, which we need to follow before making a release.

closes: #999