Replace current testnet-preview deployment with new k8s deployment #1659
Comments
The current k8s deployment provides TLS access to the Tendermint RPC endpoint (load-balancing over fullnodes). We should provide an additional endpoint that gives TLS access to the We are not in a position to use a TLS endpoint from |
Took a look at what's required here for cut-over. We currently run two discrete testnets:
Right now, the k8s deployment logic assumes there's only one testnet, and it destructively resets on updates. That's already a great match for how we manage testnet-preview, but we want to do both on k8s. I'll work on adding a few more knobs to the new deployment logic, so we can set HELM_RELEASE or similar and touch only the proper set of testnet resources during CI runs. |
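A minimal sketch of the kind of knob described above, assuming CI exports `HELM_RELEASE` and that the chart applies the conventional `app.kubernetes.io/instance` label to its resources. The release names and the `release_selector` helper are hypothetical, not the project's actual scripts.

```python
import os

# Hypothetical release names; the thread only says there are two testnets.
KNOWN_RELEASES = {"testnet", "testnet-preview"}


def release_selector(env=None):
    """Return a label selector that matches only one release's resources."""
    env = os.environ if env is None else env
    release = env.get("HELM_RELEASE", "")
    if release not in KNOWN_RELEASES:
        # Refuse to guess: an unset or unknown release must not touch anything.
        raise ValueError(
            f"HELM_RELEASE must be one of {sorted(KNOWN_RELEASES)}, got {release!r}"
        )
    # Assumption: the chart sets the app.kubernetes.io/instance label.
    return f"app.kubernetes.io/instance={release}"
```

Failing hard on an unset value is deliberate here: since the deployment resets destructively, a missing variable should stop the run rather than fall back to a default environment.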
WIP branch coming together at https://github.com/penumbra-zone/penumbra/tree/1659-testnet-preview-via-k8s. Mostly that diff is adding comments, docs, and some refactoring of the test scripts to make more space for multiple environments. I haven't created a separate cluster, but the Terraform logic is already present to do so. Currently working on:
Once those problems are resolved, I'll move on to creating side-by-side environments, and touch up the script as necessary to make sure that subsequent deployments don't clobber unwanted resources. |
Cluster config for "testnet" setup is solid, will PR in some housekeeping changes with more docs, comments, and labels throughout. Encountered a problem when I tried to deploy "testnet-preview":
So it appears we've exhausted our account limit on global reserved IPs. I'll see if we can raise that limit, but more likely we'll need to switch to a lower tier of reserved IP to sidestep that limit. |
Looked into the IP quota issue. There are actually two limits in play: For posterity: |
Comparing the two testnet deployments for disparities, it looks like |
Not yet implemented on #1719; relevant docs are here https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#workflow_run |
Currently working on sorting out the implicit workflow dependencies, and making them explicit. For instance, do we want to build container images if the tests fail? We do not! However, that's currently how things work: the container images get built regardless of the state of other workflows. Similarly, we must strictly order the workflows so that 1) tests pass; then 2) container image is built; then 3) a deploy is made to the relevant environment. GitHub Actions will allow us to chain up to a maximum of three (3) workflows:
Other potential footguns include the need to manually check whether a previous workflow run failed: by default, a failed dependency workflow will still trigger execution of the dependent workflow, which I still find surprising. Additionally, it may not be possible to tell whether a dependency workflow was triggered by a tag or a branch change (which is important for us because it's how we gate testnet vs testnet-preview deploys). In the short term, I may opt to copy/paste several workflows and embed them as jobs, to take advantage of more fine-grained control of trigger events. Next testnet is due Monday, 2022-12-12, and I'd very much like to use the new setup. Today, 2022-12-08, I plan to cut over |
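The gating logic described above can be sketched in Python as follows. This is only an illustration of the ordering and the explicit failure check, not the project's workflow files: the stage names and the `next_stage` helper are hypothetical, and the event is assumed to carry a `workflow_run` object with `name` and `conclusion` fields.

```python
# Order taken from the comment above: tests, then container image, then deploy.
# The stage names themselves are hypothetical labels.
CHAIN = ["tests", "container", "deploy"]


def next_stage(event):
    """Return the stage to start after a workflow_run event, or None."""
    run = event.get("workflow_run", {})
    # A failed dependency still fires the event, so check the conclusion explicitly.
    if run.get("conclusion") != "success":
        return None
    name = run.get("name")
    if name not in CHAIN:
        return None
    idx = CHAIN.index(name)
    # The last stage in the chain has nothing after it.
    return CHAIN[idx + 1] if idx + 1 < len(CHAIN) else None
```

With this shape, a failed `tests` run yields `None`, so no container image would be built. The sketch does not attempt to distinguish tag-triggered from branch-triggered runs.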
This is done: testnet-preview.penumbra.zone now points to the new k8s deployment. Post-merge it was automatically updated. terminal output monitoring rollout, for those interested
Still more work to do on the workflow dependencies for Monday's deployment; I'll pick that back up tomorrow. |
Calling this done for now. Here's a recent automatic deploy of testnet-preview to the k8s cluster: https://github.com/penumbra-zone/penumbra/actions/runs/3661278131 Come Monday, we'll need to update the A record for
We'll do that as part of the testnet deploy. Already lowered the TTL 30m -> 5m in prep for the cut-over. |
Was not able to use the new cluster setup for testnet 038 today (#1743). In the interest of The root cause of the botched cluster deployment was my mistake last week of deploying the testnet tag to the preview environment (#1744); this was fixed this morning in 17a3267, but because the misconfiguration was discovered late, we did not have an adequate "preview" environment to observe the most recent cluster config. As such, I suspect we missed some recent breaking changes, and our deployments are a bit brittle right now. To wit:
Starting tomorrow, I'll focus on unbreaking testnet-preview, since that's our canary in the coal mine. Once preview is happy again, I'll resume deploys of testnet-on-k8s, and provide updates here. |
This is done: testnet-preview is now served via k8s, and has been since 2022-12-12, via 5b42c45. I'll open another issue tracking the transition of |
Is your feature request related to a problem? Please describe.
We should try to move over to the new k8s deployment system built by Strangelove, and start with replacing `testnet-preview`. The goal of `testnet-preview` is that it should be an exact preview of what would be deployed if the current state of the `main` branch were tagged as a release. This ensures that there are no deployment surprises when tagging a release, and allows testing client protocols against the current state of the `main` branch.
The only difference between `testnet-preview.` and `testnet.` should be that when deploying `testnet.`, we pass the `--preserve-chain-id` parameter to `pd testnet generate` to avoid randomizing the chain ID (since there should only be one deployment per tag).
Describe the solution you'd like
`main` and uses the latest container images (does it need to wait for them to be built?) - ci: set explicit workflow dependencies #1730
`/status` endpoints between e.g. http://testnet-preview.penumbra.zone:26657/status & http://fullnode.testnet-preview.penumbra.zone:26657/status
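One way to compare the two `/status` endpoints mentioned above is sketched below. It assumes the usual Tendermint RPC response shape, with `result.node_info.network` and `result.sync_info.latest_block_height`; the helper names are hypothetical, and the choice of fields to compare is an illustration rather than an agreed checklist.

```python
import json
from urllib.request import urlopen


def fetch_status(url, timeout=10):
    """Fetch and parse the JSON body of a Tendermint RPC /status endpoint."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize(status):
    """Pull out the fields worth comparing; missing fields become None."""
    result = status.get("result", {})
    return {
        "network": result.get("node_info", {}).get("network"),
        "latest_block_height": result.get("sync_info", {}).get("latest_block_height"),
    }


def compare(a, b):
    """Return the names of summary fields that differ between two status docs."""
    sa, sb = summarize(a), summarize(b)
    return sorted(k for k in sa if sa[k] != sb[k])
```

Calling `compare(fetch_status(url_a), fetch_status(url_b))` with the two URLs above would list the differing fields. A difference in `latest_block_height` alone may just mean a block landed between the two requests, whereas a difference in `network` would suggest the endpoints serve different chains.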