Upena (OohPenAh) and Routing Bird Overview
Routing Bird’s Initial Requirements:
- Need a way to do canary deployments.
- Need a way for developers to rapidly iterate.
- Need a way to test config changes before the service has live load.
Basic system:
Basic system augmented with Routing Bird:
Developer and Routes:
Tenants and Routes:
Green Tenant Example use cases:
- Developer wants to add new features to service ‘B’.
- Service B is having issues in prod so a second instance is stood up that takes ‘Green Tenant’ traffic allowing exploration and troubleshooting.
- Deploying a new version of service ‘B’ to production.
- Continuous deployment can deploy a candidate instance of ‘B’.
- A host image, BIOS, etc needs to be upgraded so service ‘B’ is deployed on upgraded host before the upgrade is applied to all hosts.
Benefits of a system that lets us route based on Tenant:
- Continuous deployment
- Ability to roll sideways instead of rolling back when things go wrong.
- Developer Productivity
- Developers do not need to run the entire system locally.
- Removes the need to stand up an entire cluster remotely for a single developer or a group of developers.
- Developers can safely and iteratively run their code in an environment before committing.
- Reduces the total number of clusters we have to run.
UPENA
Upena (Hawaiian word for ‘net’, pron. OohPenAh) is an ensemble of conventions that allows safe, iterative deployments of services to a cluster as well as dynamically routing traffic between these services based on tenancy. For developers to be effective they need a way to interactively develop their services. They need a system that allows them to safely start an instance of a service on a particular node and then route some subset of traffic to/through that instance. To support this we created Upena. What follows is an explanation of how Upena works. Upena can be broken down into two services, a library, and an on-disk convention. The two services are Upena-declaration and Upena-nanny. The library is called Upena-routing, and the on-disk convention is Upena-deployable. Here is a description of what they are and do.
Upena-declaration: is not a service discovery solution; it is a service declaration solution. This service supports creation, retrieval, updating, and deletion of cluster-ids, host-ids, service-ids, release-group-ids, and instance-ids. Upena-declaration answers the following two questions:
- InstancesForHost: for a given host-id, which instance-ids should be running on that host.
- WhoCanIConnectTo: for a given tenant-id + instance-id + service-id (the service to connect to), which instance-ids are available.
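A minimal sketch of what these two questions could look like as a plain Java interface; the names and shapes below are illustrative assumptions, not the actual Upena API:

```java
// Hypothetical sketch of the two questions Upena-declaration answers.
// Names and signatures are illustrative only; they are not the real Upena API.
import java.util.List;

interface UpenaDeclaration {

    // InstancesForHost: given a host-id, which instance-ids should be running there.
    List<String> instancesForHost(String hostId);

    // WhoCanIConnectTo: given a tenant-id, the calling instance-id, and the
    // service-id it wants to reach, which instance-ids are available to connect to.
    List<String> whoCanIConnectTo(String tenantId, String callerInstanceId, String desiredServiceId);
}
```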
Upena-routing: is a collection of tenant-aware proxy implementations for the following clients: HTTP, Kafka, HDFS, NGINX, ZooKeeper, and HBase (others will be implemented as needed). A upena-routing proxy implementation is provided four things at startup: 1. an instance-id, 2. a hostname to access upena-declaration, 3. a port to access upena-declaration, and 4. the service-id that is desired to be connected to. At runtime the tenant-id is provided, which completes the required state to ask upena-declaration “WhoCanIConnectTo”. It isn’t reasonable to ask who can be connected to on every call, so the answer is cached, and this cache can be cleared by calling a RESTful endpoint or via a periodic clear (see the sketch after the deployable layout below).

Upena-deployable: is a tar.gz file that has a very specific directory layout, which allows it to be managed by upena-nanny:
- ./bin/download // An executable that will pull the desired artifacts.
- ./bin/init // Called only once the first time the deployable is laid down.
- ./bin/config // Called before every start.
- ./bin/start // Starts all desired processes for said deployable.
- ./bin/status // tells nanny if the process is happy or not via exitCode 0 or 1.
- ./bin/kill // kills all the processes related to said deployable
- ./etc/…
- ./var/log/...
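Returning to Upena-routing, here is a minimal sketch of a tenant-aware connection provider in the spirit described above. It reuses the hypothetical UpenaDeclaration interface from the earlier sketch; all names are illustrative assumptions, not the real library API.

```java
// Hypothetical sketch of a tenant-aware connection provider in the spirit of
// Upena-routing. It is configured at startup with the four values described
// above and caches "who can I connect to" answers per tenant.
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

class TenantRoutingProvider {

    private final String instanceId;        // 1. this instance's id
    private final String declarationHost;   // 2. hostname of upena-declaration
    private final int declarationPort;      // 3. port of upena-declaration
    private final String desiredServiceId;  // 4. service-id we want to connect to

    // In a real client the host/port above would be used to reach upena-declaration
    // over the wire; here that call is abstracted behind the sketched interface.
    private final UpenaDeclaration declaration;
    private final ConcurrentHashMap<String, List<String>> cache = new ConcurrentHashMap<>();

    TenantRoutingProvider(String instanceId, String declarationHost, int declarationPort,
                          String desiredServiceId, UpenaDeclaration declaration) {
        this.instanceId = instanceId;
        this.declarationHost = declarationHost;
        this.declarationPort = declarationPort;
        this.desiredServiceId = desiredServiceId;
        this.declaration = declaration;
    }

    // At runtime the tenant-id completes the question; the answer is cached so we
    // do not ask upena-declaration on every call.
    List<String> connectableInstances(String tenantId) {
        return cache.computeIfAbsent(tenantId,
            t -> declaration.whoCanIConnectTo(t, instanceId, desiredServiceId));
    }

    // Cleared by a RESTful endpoint or a periodic timer, per the description above.
    void clearCache() {
        cache.clear();
    }
}
```

In this sketch a REST endpoint or a scheduled task would call clearCache() to force re-resolution after routing changes.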
Upena-nanny: is a daemon process, associated with a given host-id, that runs on every node in the cluster. Upena-nanny pulls its updates from upena-declaration via “InstancesForHost” and then makes a best effort to deploy/undeploy and start/stop the provided instances. If for any reason upena-declaration is unreachable, upena-nanny will maintain the last provided set of instances. You can ask upena-nanny for a nanny-report; this report contains the states of all the services that upena-nanny is responsible for. A key requirement the nanny fulfills is providing the appropriate instance-id to each service at startup time. So far upena-nanny has been described as a service that pulls changes. It is also possible to run Upena-nanny in a passive role where an external solution coordinates when instances are pushed to Upena-nanny.
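A rough sketch of the active (pull) nanny loop described above, again using the hypothetical UpenaDeclaration interface; the method names and reconciliation details are assumptions meant only to illustrate the behavior:

```java
// Hypothetical sketch of an active (pull) nanny loop. It polls upena-declaration
// for the instances that should run on this host, makes a best effort to converge
// on that set, and keeps the last known set if upena-declaration is unreachable.
// Names are illustrative, not the actual Upena-nanny implementation.
import java.util.HashSet;
import java.util.Set;

class NannyLoop {

    private final String hostId;
    private final UpenaDeclaration declaration;   // interface sketched earlier
    private Set<String> lastKnownInstances = new HashSet<>();

    NannyLoop(String hostId, UpenaDeclaration declaration) {
        this.hostId = hostId;
        this.declaration = declaration;
    }

    void pollOnce() {
        Set<String> desired;
        try {
            desired = new HashSet<>(declaration.instancesForHost(hostId));
        } catch (Exception unreachable) {
            desired = lastKnownInstances;          // keep the last provided set
        }
        for (String instanceId : desired) {
            if (!lastKnownInstances.contains(instanceId)) {
                deployAndStart(instanceId);        // e.g. ./bin/download, ./bin/init, ./bin/config, ./bin/start
            }
        }
        for (String instanceId : lastKnownInstances) {
            if (!desired.contains(instanceId)) {
                stopAndUndeploy(instanceId);       // e.g. ./bin/kill
            }
        }
        lastKnownInstances = desired;
    }

    private void deployAndStart(String instanceId) { /* best effort */ }
    private void stopAndUndeploy(String instanceId) { /* best effort */ }
}
```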
Here is a diagram that shows, for the active nanny (pull), who depends on whom based on the state needed to make it all work.
Here is a diagram that shows, for the passive nanny (push), who depends on whom based on the state needed to make it all work.
It turns out that continuous deployment, canary testing, and upgrades to prod are a natural subset of the system required to support developers. When we want to do a canary test we create a new release group and deploy all the changed services under this release group. We can then point our integration tests and bots at this release group for one or more test tenants. When the release group tests out, we grow the number of instances to support all of the load and switch the production traffic over to the release group by updating the cluster definition. The same pattern is applied to support continuous deployment to the dev cluster.
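As a sketch of that cutover step, assuming a cluster definition is essentially a mapping from service-id to default release-group-id (the types and ids below are hypothetical, not Upena’s actual model):

```java
// Hypothetical sketch of the canary cutover described above: a cluster definition
// maps each service-id to its default release-group-id, and switching production
// traffic to a vetted release group is an update to that mapping.
import java.util.HashMap;
import java.util.Map;

class ClusterDefinition {

    // service-id -> default release-group-id for tenants with no explicit release group
    private final Map<String, String> defaultReleaseGroupByService = new HashMap<>();

    void setDefaultReleaseGroup(String serviceId, String releaseGroupId) {
        defaultReleaseGroupByService.put(serviceId, releaseGroupId);
    }

    String defaultReleaseGroupFor(String serviceId) {
        return defaultReleaseGroupByService.get(serviceId);
    }
}

class CanaryCutoverExample {
    public static void main(String[] args) {
        ClusterDefinition prod = new ClusterDefinition();
        prod.setDefaultReleaseGroup("service-B", "release-group-1");  // current production

        // Test tenants are pinned to "release-group-2" while it is vetted; once it
        // tests out, production traffic is switched by updating the cluster definition.
        prod.setDefaultReleaseGroup("service-B", "release-group-2");
    }
}
```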
Process boundaries:
Definitions and terminology:
Tenant-id - a globally unique id used to identify a customer. A tenant-id can be associated with one and only one release-group-id. This release-group-id is typically not specified.
Service-id - a globally unique id used to identify a service. A service-id should be thought of as the ‘class’ of service and does not represent a running instance. A running instance of a given service is identified using an instance-id. Service-ids are the identity that is used to allow a given instance to locate another service without having knowledge of the actual instances.
Cluster-id - a globally unique id that is used to segregate a collection of service-ids. A tenant-id will only ever be routed to other instances that are in the same cluster. A cluster-id references a cluster definition, which declares the default release-group-id for a given service-id within said cluster.
Host-id - a globally unique id that is used to identify a given node within a cluster-id. Host-ids are associated with one and only one cluster-id.
Release-group-id - a globally unique id that creates a subcluster of segregated services. Once a tenant enters a given release group, said tenant will be routed within that release group for the declared services. If a service within a release group talks to a service that is not part of said release group, then the release-groups declared in the cluster definition will be used.
Instance-id - a globally unique id used to identify an instance of a given service within a given release group within a given cluster on a given host. Instance-ids are composite ids produced by combining a cluster-id, a host-id, a service-id, a release-group-id, and a number which is typically 1-indexed and monotonically increasing.
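To tie the definitions together, here is a hypothetical sketch of an instance-id as the composite the last definition describes (illustrative only, not Upena’s actual type):

```java
// Hypothetical sketch of an instance-id as the composite described above:
// cluster-id + host-id + service-id + release-group-id + a number.
class InstanceId {

    final String clusterId;
    final String hostId;
    final String serviceId;
    final String releaseGroupId;
    final int number;   // typically 1-indexed and monotonically increasing

    InstanceId(String clusterId, String hostId, String serviceId,
               String releaseGroupId, int number) {
        this.clusterId = clusterId;
        this.hostId = hostId;
        this.serviceId = serviceId;
        this.releaseGroupId = releaseGroupId;
        this.number = number;
    }

    @Override
    public String toString() {
        return clusterId + "/" + hostId + "/" + serviceId + "/" + releaseGroupId + "/" + number;
    }
}
```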