This repository has been archived by the owner on Jun 29, 2022. It is now read-only.

release checklist #1164

Open. Wants to merge 3 commits into master from knrt10/release-checklist.
Conversation

@knrt10 (Member) commented Nov 5, 2020

This PR creates a new folder, `release-process`, which contains all the docs required for making a release. It also adds a manual checklist to follow before making a release.

closes: #999

@knrt10 (Member, Author) commented Nov 5, 2020

I have tagged everyone who has done a release. Please add anything I have missed that you tested during the release process.

@knrt10 force-pushed the knrt10/release-checklist branch 2 times, most recently from 5519dca to 6c3d592, on November 5, 2020 08:57
@invidian (Member) left a comment

Some suggestions

docs/installer/lokoctl.md (outdated review thread, resolved)
docs/release-process/RELEASING.md (outdated review thread, resolved)
Comment on lines 8 to 13
- Check out the old release tag
  - e.g. `git checkout v0.1.0`

- Build the `lokoctl` binary from the last release
  - e.g. `make build`
- Copy the `lokoctl` binary to your assets directory.
Member: Maybe use a release binary instead?
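A minimal sketch of what using a release binary could look like; the asset name and URL pattern here are assumptions, so verify them against the actual releases page:

```sh
# Sketch only: the asset naming is an assumption; check the releases page.
VERSION="v0.1.0"
curl -LO "https://github.com/kinvolk/lokomotive/releases/download/${VERSION}/lokoctl_${VERSION}_linux_amd64.tar.gz"
tar -xzf "lokoctl_${VERSION}_linux_amd64.tar.gz"
cp lokoctl ~/lokoctl-assets/   # hypothetical assets directory
```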

Comment on lines 38 to 84
This section checks that components work as desired.

- Check all certificates are valid
  - e.g. `kubectl get certificates -A`
  - Certificates for all your components are valid.

- Check that an external IP is assigned to the contour service. This verifies that MetalLB assigns IPs to services of type `LoadBalancer`.
  - e.g. `kubectl get svc -n projectcontour`

- Check that DNS records are added in AWS for your components. If you used the route53 DNS provider, you can check them [here](https://console.aws.amazon.com/route53/v2/home#Dashboard). Make sure to check the correct hosted zone.

- Check that the Gangway ingress host URL you configured works.

- Check that the httpbin ingress host URL you configured works.

- Do some **blackbox testing** by sending HTTP requests through MetalLB + Contour + cert-manager.

- Check metrics for your cluster by visiting the Prometheus ingress host URL.

- Check that the velero component works, by testing a backup and restore of a namespace.
  - Run the following commands:
```sh
# Create the test namespace.
kubectl create ns test

# Create a serviceaccount in the test namespace.
kubectl create sa test -n test

# Create a velero backup of the namespace.
velero backup create serviceaccount-backup --include-namespaces test

# Delete the test namespace.
kubectl delete ns test

# Restore the namespace using velero.
velero restore create --from-backup serviceaccount-backup

# Check that the serviceaccount exists again.
kubectl get sa test -n test
```

- Check that the web-ui ingress host URL you configured works.

**IMPORTANT**: Follow the whole process again with a multi-controller cluster.

If everything works fine, continue with the release process.
Member: Hm, perhaps we could run our e2e test suite to cover all that.

@knrt10 (Member, Author) commented Nov 5, 2020:

This could be added as an extra step too, to make sure everything works fine. What do you think?

Member: Okay, but I think we should prefer automated tests rather than doing this by hand.
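For reference, a hedged sketch of what a manual e2e run might look like; the build tags, test path, and kubeconfig location are assumptions rather than something confirmed in this PR:

```sh
# Sketch only: the build tags and test directory are assumptions.
export KUBECONFIG=~/lokoctl-assets/cluster-assets/auth/kubeconfig  # hypothetical path
go test -tags "packet e2e" -count 1 -v ./test/...
```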

docs/release-process/CHECKLIST.md (outdated review thread, resolved)
- Copy the `lokoctl` binary to your assets directory.

- Deploy `lokomotive` with the old release
  - e.g. `./lokoctl cluster apply`
Member: The previous step says to build using `make build`, and this one says `./lokoctl cluster apply`. You won't have the `lokoctl` binary and the lokocfg files in the same place.

We could simply say `make install` in the previous step, and in this step just run `lokoctl cluster apply` from this directory.

Member: I think the previous step should be changed to use a release binary, so I think it should be OK to build + copy here.
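If we keep build + copy, the flow would be something like the sketch below; the assets directory path is a placeholder:

```sh
# Build the previous release from source and copy it next to the cluster assets.
git checkout v0.1.0
make build
cp lokoctl ~/lokoctl-assets/   # placeholder path for your assets directory
```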

0.2.0 (`v0.2.0`).

- Check out the old release tag
  - e.g. `git checkout v0.1.0`
Member: All the commands follow this pattern of "e.g." followed by a command. For commands like the one above, I understand the user needs to make changes. But for commands where we know exactly what the command is, we should put it in a code block rather than in backticks.

@@ -0,0 +1,84 @@
## Checklist
Member: This PR should also create a sample lokocfg file with all the cluster and component features, all parameterised using HCL variables, plus a sample lokocfg.vars file with empty vars. That way the user (a fellow developer) just has to edit those values and get going.

Member: Now that we have different platforms, we could have lokocfg files for two platforms, viz. Packet and AWS. For that we might have to put the files in two different subdirectories.
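As a rough illustration of the "empty vars" idea, a skeleton lokocfg.vars could be generated for the developer to fill in; the variable names below are hypothetical, not taken from the repo:

```sh
# Hypothetical sketch: write a lokocfg.vars skeleton with empty values.
# The variable names are examples only.
cat > lokocfg.vars <<'EOF'
cluster_name = ""
asset_dir    = ""
EOF
```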


- Check all certificates are valid
  - e.g. `kubectl get certificates -A`
  - Certificates for all your components are valid.
Member: What does valid mean in this case?

Member: Perhaps he meant Ready, the column that shows up when you list the certificates.
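In that case the check could be made explicit, e.g. waiting on the Ready condition (a sketch; cert-manager Certificates do expose a Ready condition):

```sh
# Inspect the READY column for all certificates...
kubectl get certificates -A
# ...or block until every certificate reports Ready.
kubectl wait --for=condition=Ready certificate --all --all-namespaces --timeout=120s
```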

velero restore create --from-backup serviceaccount-backup

# Check that the serviceaccount exists again.
kubectl get sa test -n test
Member: For velero testing we should ideally point the user to the velero usage doc: https://github.com/kinvolk/lokomotive/blob/master/docs/how-to-guides/backup-rook-ceph-volumes.md

Member: User? This is developer documentation. But I agree on pointing to the docs until we make this testing automated.


- Do some **blackbox testing** by sending HTTP requests through MetalLB + Contour + cert-manager.

- Check metrics for your cluster by visiting the Prometheus ingress host URL.
Member:

- We can point the user to the prometheus doc and ask them to verify that all the different scenarios in there work.
- Verify that the grafana dashboards have data.
- Verify that the prometheus targets are loaded correctly.
- Verify that the alerts are loaded correctly.

Member: Most of the points here would already be covered by the e2e test suite, right? Prometheus data and targets.

Perhaps we should add tests for alerts and grafana data sources then.
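For the targets and alerts checks above, a rough sketch of querying the Prometheus HTTP API directly; the service name and namespace are assumptions, so adjust them to the actual monitoring setup:

```sh
# Sketch only: the service name and namespace are assumptions.
kubectl -n monitoring port-forward svc/prometheus-operated 9090 &

# List scrape targets and their health.
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | {job: .labels.job, health}'

# List the loaded alerting/recording rule groups.
curl -s http://localhost:9090/api/v1/rules | jq '.data.groups[].name'
```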


- Check that the web-ui ingress host URL you configured works.

**IMPORTANT**: Follow the whole process again with a multi-controller cluster.
Member: I think the general question to ponder is: which tests are single-node or multi-node specific? We should test the multi-controller setup by default, then figure out what is single-node specific and run only those tests there; no need to test everything all over again.

Member: I agree. For controller nodes we should only test the upgrade path. I like the "Components test" section title; perhaps we could add something like "Control plane testing" for the first points in the document, then move this sentence there.


- Check that the httpbin ingress host URL you configured works.

- Do some **blackbox testing** by sending HTTP requests through MetalLB + Contour + cert-manager.
Member: There could be a standard tool used to do the blackbox testing. Maybe curl, plus visiting once in a browser?

Member: An e2e test should easily cover that. We could have one which can be run manually and won't be executed by CI, which expects that MetalLB on Packet has a correctly configured EIP.
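Something as small as the sketch below would do; the hostname is a placeholder for whatever ingress host was configured:

```sh
# Placeholder hostname: substitute your configured httpbin ingress host.
curl -sv https://httpbin.example.com/get
# -v shows the TLS handshake (the cert-manager certificate), and a 200
# response confirms routing through MetalLB + Contour.
```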


- Check that DNS records are added in AWS for your components. If you used the route53 DNS provider, you can check them [here](https://console.aws.amazon.com/route53/v2/home#Dashboard). Make sure to check the correct hosted zone.

- Check that the Gangway ingress host URL you configured works.
Member: Gangway testing should actually verify that the authentication workflow works. This involves going to the website, doing GitHub/Google auth, and then using the token to talk to the API server.

To extend it a bit further, we could assign a role to the user email, such as pod reading, and verify that pod reading works and other access does not, etc.
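A sketch of the RBAC part, assuming you have already obtained a token through the Gangway flow (the token value is a placeholder):

```sh
# TOKEN is a placeholder for the token obtained through Gangway.
TOKEN="<token-from-gangway>"
kubectl --token="$TOKEN" auth can-i get pods      # expect: yes
kubectl --token="$TOKEN" auth can-i delete nodes  # expect: no
```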

@invidian (Member) commented

I'll take it over for now, as @knrt10 is on holidays.

invidian and others added 3 commits on November 17, 2020 10:16:

- It is no longer needed.
  Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
- So that we can have all information about the release in a single place.
  Signed-off-by: knrt10 <kautilya@kinvolk.io>
- closes: #999
  Signed-off-by: knrt10 <kautilya@kinvolk.io>
@invidian force-pushed the knrt10/release-checklist branch from 6c3d592 to 9176931 on November 17, 2020 09:16
@invidian added the priority/P3 (Low priority) label on Nov 17, 2020
@invidian (Member) commented

Actually, I'd rather take care of #1031, so if someone wants to pick this up, go ahead.

@invidian removed their assignment on Jun 7, 2022
Labels: priority/P3 (Low priority)
Projects: none yet
Linked issues (may be closed by merging this PR): Create a "manual testing" checklist before release
3 participants