
Customer Cluster Chronical

Jorge Silva edited this page Apr 27, 2016 · 1 revision

Problems

To solve any problem we must first fully understand the problem.

Here at Runnable we care about the following things, and so far we have prioritized them in this order:

1. Build Speed
2. Stability
3. Cost Optimization
4. Performance
5. Security

Definitions

Now that I have listed the priorities, I will define what each entails.

Build Speed

Here are the factors that go into build speed:

  • docker cache
  • FROM image availability
  • cpu shares
  • available ram

Docker Cache

Docker cache is the availability of Docker layers from previous builds. The more previous layers that are on the box, the more cache will be used. If we have cache misses, the RUN commands will have to be re-run. The limitation here is disk space.

FROM Image Availability

If the FROM image is not on the box, it will have to be pulled. The limitation here is disk space.

CPU Shares

The more containers that are on the box, the less CPU each container gets. The limitation here is the number of CPUs.
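To make the effect concrete, here is a minimal sketch of how CPU is divided when every container on a dock is busy at the same time. It assumes Docker-style relative CPU share weights (1024 is Docker's default weight) and a proportional split under contention. The function name and the container names are made up for illustration, and real scheduling is messier than this model.

```python
def cpu_under_contention(dock_cpus, weights):
    """Approximate CPUs each container gets when all are busy at once.

    weights maps a container name to its relative CPU share weight.
    This is a proportional-split model, not a measurement.
    """
    total = sum(weights.values())
    if total == 0:
        return {name: 0.0 for name in weights}
    return {name: dock_cpus * weight / total for name, weight in weights.items()}


# Two equally weighted containers on a 4 CPU dock get about 2 CPUs each.
two = cpu_under_contention(4, {"build": 1024, "app": 1024})

# Adding more containers shrinks every share.
four = cpu_under_contention(4, {"build": 1024, "app": 1024, "db": 1024, "cache": 1024})
```

With two containers each gets roughly 2 CPUs; with four, roughly 1 CPU each.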

Available RAM

The more RAM a container has available, the more RAM cache it will have. The limitation here is RAM.

Stability

Here are the factors that go into stability:

  • built image repository
  • services
  • dock availability
  • limiting ram
  • disk space / inodes

Built Image Repository

When a dock becomes unhealthy and we need to roll over, we need to migrate the image. Having images stored in a repository allows us to recover the image at the cost of only the pull time.

Services

In order to provide the best experience, all of our services need to be robust. Networking, DNS, file tree, and registry all need to be up.

Dock Availability

We need to ensure we have enough docks to run builds and containers.

Limiting RAM

For builds and running containers to run smoothly, they need enough RAM. We need to limit RAM so that one container does not use all of the system's RAM.

Disk Space / inodes

Each container needs enough disk space and inodes to perform its task. If we run out of disk space or inodes, containers cannot function properly.
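Both limits can be checked from the dock itself. The sketch below uses Python's standard `os.statvfs` call (Unix only) to report free bytes and free inodes for a path. The function name and the default path are assumptions for illustration; this is not a description of how our docks are monitored today.

```python
import os


def disk_headroom(path="/"):
    """Return free bytes and free inodes available to unprivileged users."""
    stats = os.statvfs(path)
    return {
        # f_bavail counts free blocks; f_frsize is the size of one block.
        "free_bytes": stats.f_bavail * stats.f_frsize,
        # f_favail counts free inodes; some filesystems report 0 here.
        "free_inodes": stats.f_favail,
    }


headroom = disk_headroom("/")
```

A dock can have plenty of free bytes and still be out of inodes, so both numbers need to be watched.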

Cost Optimization

Here are the factors that go into cost optimization:

  • number and size of docks
  • size of disks
  • disk and network IO

Number and Size of Docks

The more docks we have, the higher our cost. The bigger the docks, the higher our cost.

Size of Disks

The bigger the disk we put on each dock, the higher our cost.

Disk and Network IO

The more IO a user container or build uses, the higher our cost.

Performance

Here are the factors that go into performance:

  • ram
  • cpu

RAM

The more RAM a container has, the more performant it will be.

CPU

The less the CPU is shared, the more performant the container will be.

Security

Here are the factors that go into security:

  • isolated access

Isolated Access

Containers should not be able to access anything they are not supposed to. People should not be able to access containers they are not supposed to.

Solutions

Now that we know what each problem entails, I will detail how things can be improved.

Build Speed

Docker Cache

To improve Docker cache hits, we need layers to be available:

  • ensure layers are on the docks that builds are scheduled on
  • distribute layers so we have high availability
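One way to act on the first point is to schedule each build on the dock that already holds the most usable cache. The sketch below is an illustration under assumptions, not our scheduler: the dock names, layer IDs, and function names are hypothetical. It relies on the fact that a Docker build stops using the cache at the first miss, so only the unbroken prefix of cached layers counts.

```python
def cached_prefix_length(build_layers, dock_layers):
    """Count how many leading build layers are already on the dock."""
    count = 0
    for layer in build_layers:
        if layer not in dock_layers:
            break  # first cache miss: every later step must be rebuilt
        count += 1
    return count


def pick_dock(build_layers, docks):
    """Return the dock name with the longest cached prefix.

    docks maps a dock name to the set of layer IDs stored on it.
    Ties go to the first dock in iteration order.
    """
    return max(docks, key=lambda name: cached_prefix_length(build_layers, docks[name]))


docks = {
    "dock-a": {"l1"},
    "dock-b": {"l1", "l2"},
    "dock-c": {"l2", "l3"},  # holds later layers but misses the first one
}
best = pick_dock(["l1", "l2", "l3"], docks)
```

Here `best` is `"dock-b"`: `dock-c` holds two of the three layers, but without the first layer none of them can be reused.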

FROM Image Availability

  • ensure FROM images are on the docks that builds are scheduled on
  • distribute FROM images so we have high availability

CPU Shares

Ensure we run as few containers per dock as we can.

Available RAM

Ensure we run as few containers per dock as we can.

Stability

Built Image Repository

  • localhost registry
  • Amazon ECR (or other hosted solutions like quay.io)

Services

  • ensure services are always up and robust

Dock Availability

  • autoscale groups
  • correct scaling in / out
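As a rough sketch of what "correct" scaling could mean, the function below computes a target dock count from the number of containers to run, a per-dock container cap, and a buffer of spare docks. All three parameters and the function name are assumptions chosen for illustration; the real policy would be set on the autoscale group.

```python
import math


def desired_dock_count(containers, containers_per_dock, spare_docks=1):
    """Return how many docks to keep running for the current load."""
    if containers_per_dock <= 0:
        raise ValueError("containers_per_dock must be positive")
    needed = math.ceil(containers / containers_per_dock)
    # Spare docks absorb new builds while a scale out is still in progress.
    return needed + spare_docks


target = desired_dock_count(containers=25, containers_per_dock=10)
```

With 25 containers and a cap of 10 per dock, `target` is 4: three docks for the load plus one spare. Comparing the target to the current count tells us whether to scale out or in.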

Limiting RAM

  • limit each container's RAM to a reasonable amount
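Docker can enforce this per container with the `--memory` flag of `docker run`. The sketch below only builds the command line; the helper name, the image, and the `1g` default are placeholders, not our actual settings.

```python
def docker_run_args(image, memory_limit="1g"):
    """Build a `docker run` command that caps the container's RAM."""
    return [
        "docker", "run",
        "--detach",
        f"--memory={memory_limit}",  # hard RAM cap for this container
        image,
    ]


args = docker_run_args("ubuntu:14.04", memory_limit="512m")
```

The resulting list can be passed to `subprocess.run` on a dock that has Docker installed.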

Disk Space / inodes

  • ensure we provide enough disk space
  • clean old images
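Cleaning old images trades disk space against the Docker cache, so a reasonable starting point is to remove the least recently used images first and stop as soon as enough space is freed. The sketch below assumes we track a size and a last-used timestamp per image; the tuple layout and function name are hypothetical. Because images share layers, the bytes actually freed may be lower than this estimate.

```python
def images_to_remove(images, bytes_to_free):
    """Pick least recently used images until the target is reached.

    images is a list of (image_id, size_bytes, last_used_timestamp).
    """
    selected = []
    freed = 0
    for image_id, size, _last_used in sorted(images, key=lambda item: item[2]):
        if freed >= bytes_to_free:
            break
        selected.append(image_id)
        freed += size
    return selected


images = [("a", 100, 3), ("b", 200, 1), ("c", 50, 2)]
stale = images_to_remove(images, bytes_to_free=220)
```

In this example `stale` is `["b", "c"]`: the two oldest images free 250 bytes, so the most recently used image `a` is kept.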

Cost Optimization / Performance / Security

I will stop here, as Build Speed and Stability are our highest priorities.

Example Tradeoffs

  • scheduling fewer containers per dock gives more CPU and RAM per container: increases stability, increases performance, decreases cost optimization, decreases build speed
  • build, push, pull scheduling: decreases build speed, increases stability, increases performance, increases cost optimization

Data

[Screenshot, 2016-04-27 at 12:15:29 AM] [Screenshot, 2016-04-27 at 12:16:48 AM]
