ci: separate deploy logic for networks and nodes #2983

conorsch · 2023-09-08T16:53:59Z

Is your feature request related to a problem? Please describe.
Our CI logic currently bundles all deploy-related tasks for network provisioning and node deployment into a single action. There are actually subtle distinctions that we should manage separately:

create new network from fresh chain id (happens on preview deploys & testnet deploys)
spin up validators based on pd testnet generate output (currently the ci.sh clobbers this data)
join fullnodes to an existing network of validators (should be a repeatable action)
handle metrics per deployment

Describe the solution you'd like
Separating out these actions will enable us to manage longer-lived deployments with more confidence (for example, to support upgrade testing as described in #1804) and will ease the deployment of ad-hoc networks to test-drive new functionality. In the past, we've done this manually, but there's no reason we can't have a point-and-click CI workflow to do it. Handling this problem would also resolve #1783 and, as a side effect, make recovery of a failed deployment possible.

Describe alternatives you've considered
We could treat the existing deployments as "good enough", but that will likely pose problems with upgrade testing.

Additional context
Three logical charts jump out at me:

penumbra-network (essentially wrapping pd testnet generate and spinning up initial validators)
penumbra-node (essentially wrapping pd testnet join and spinning up full nodes against an existing network)
penumbra-metrics (long-lived deployments to scrape pd & tm metrics endpoints)
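To make the split concrete, here is a minimal sketch of how the three charts could be driven as separate, repeatable steps. It only builds the command lines and runs nothing. The `charts` directory, the release names, and the `helm_command` helper are assumptions made up for illustration, not the repository's actual layout; `helm upgrade --install` is used because it installs a release if it is missing and upgrades it otherwise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chart:
    name: str   # chart name as proposed in the issue
    wraps: str  # what the chart is responsible for


# Listed in deploy order: the network must exist before nodes can join it.
CHARTS = (
    Chart("penumbra-network", "pd testnet generate + initial validators"),
    Chart("penumbra-node", "pd testnet join + full nodes"),
    Chart("penumbra-metrics", "scraping pd & tm metrics endpoints"),
)


def helm_command(chart: Chart, release: str, chart_dir: str = "charts") -> list[str]:
    """Build an install-or-upgrade command for one chart (hypothetical paths)."""
    return ["helm", "upgrade", "--install", release, f"{chart_dir}/{chart.name}"]


if __name__ == "__main__":
    for chart in CHARTS:
        # Release names are invented for the example.
        print(" ".join(helm_command(chart, f"example-{chart.name}")))
```

Keeping one command per chart means the node step can be re-run to add full nodes without touching the data generated for the validators.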

We can ignore the provisioning logic for helper services like the bots and relayer for now.

Relevant tickets
The following should be resolved by the rewrite:

Helm release manifest too large #1783
k8s: default validator config should use PL definition #1832
Migrate to CometBFT 0.37 #2263 (at least to v0.34.x)

Progress checklist

For tracking follow-up tasks toward completion.


conorsch · 2023-09-22T15:37:38Z

This is done, except for

generate ips for all envs

which I'll do as part of the teardown/release process on Monday for #3046.

Removes reserved IPv4 addresses that are no longer used. For HTTPS services, we now use a single entry IP to a Traefik daemonset to handle traffic for all the various endpoints [0]. Regenerates the public IPs for the P2P services and commits them to version control. We do this as part of release prep for Testnet 61 [1], building on the deploy overhaul described in [2].

[0] #2341
[1] #3046
[2] #2983

Makes changes encountered while deploying Testnet 61 on the new deploy logic for the first time:

* fixes a YAML whitespace error on the testnet external IPs
* makes sure that strategy=recreate for metrics, otherwise config changes may encounter a failed concurrent bind on the pvc
* also cleans up jobs, which can get stuck if there are pvc errors

Made these changes locally and ran the deploy logic from my workstation to finalize the Testnet 61 setup. Couldn't use the GHA in this scenario, because of a chicken-or-egg problem: we need the change in the .0 tag, but that tag was already pushed; we can't use a .1 tag because that only modifies an existing deployment.

Refs #2983, #3046.
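The strategy=recreate fix can be illustrated with a small sketch. It assumes the metrics volume can only be bound by one pod at a time, in which case a rolling update starts the new pod while the old one still holds the volume. The `with_recreate_strategy` helper and the sample manifest are hypothetical; the real change presumably lives in the chart templates rather than in Python.

```python
import copy


def with_recreate_strategy(deployment: dict) -> dict:
    """Return a copy of a Deployment manifest whose strategy is Recreate.

    The whole strategy block is replaced, so any leftover rollingUpdate
    settings are dropped along with the old strategy type.
    """
    patched = copy.deepcopy(deployment)
    patched.setdefault("spec", {})["strategy"] = {"type": "Recreate"}
    return patched


if __name__ == "__main__":
    # Hypothetical manifest fragment, for illustration only.
    metrics = {
        "kind": "Deployment",
        "metadata": {"name": "example-metrics"},
        "spec": {"strategy": {"type": "RollingUpdate", "rollingUpdate": {"maxSurge": 1}}},
    }
    print(with_recreate_strategy(metrics)["spec"]["strategy"])
```

With Recreate, the old pod is terminated before the replacement starts, trading a short gap in metrics collection for a clean handover of the volume.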

Dusted off the compose setup and updated it to use an initcontainer, same as with the recent overhaul of deploy logic (#2983). This change also removes the requirement for the host machine to use `pd` to bootstrap the config: now, docker-compose is all that's required. The goal is to make the Penumbra containers easier to work with, for example for the block explorer push.

github-project-automation bot added this to Testnets Sep 8, 2023

conorsch moved this to Next (Steal from here) in Testnets Sep 8, 2023

conorsch changed the title ~~Separate deploy logic for networks and nodes~~ ci: separate deploy logic for networks and nodes Sep 14, 2023

conorsch moved this from Next (Steal from here) to In Progress (Already claimed) in Testnets Sep 14, 2023

conorsch self-assigned this Sep 14, 2023

This was referenced Sep 15, 2023

ci: separate deploy logic into charts #3033

Merged

deployment: support comparative performance measurement #3058

Open

conorsch moved this from In Progress (Already claimed) to Testnet 61: Dione in Testnets Sep 22, 2023

conorsch closed this as completed Sep 25, 2023

conorsch mentioned this issue Oct 12, 2023

deploy: update docker-compose setup #3184

Merged
