Customer Cluster Chronicle
To solve any problem we must first fully understand the problem.
Here at Runnable we care about the following things, and so far we have prioritized them in this order:
1. Build Speed
2. Stability
3. Cost Optimization
4. Performance
5. Security
Now that I have listed the priorities, I will define what each entails.
Here are the factors that go into build speed:
- docker cache
- FROM image availability
- cpu shares
- available ram
Docker cache is the availability of Docker layers from previous builds.
The more previous layers that are on the box, the more cache will be used.
If we have cache misses, the RUN commands will have to be rerun.
The limitation here is disk space.
If the FROM image is not on the box, it will have to be pulled.
The limitation here is disk space.
The more containers that are on the box, the less CPU each container gets.
The limitation here is CPUs.
The more RAM a container has available, the more RAM cache it will have.
The limitation here is RAM.
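To make the CPU and RAM trade-off concrete, here is a minimal sketch. It assumes every container on a dock has equal weight and that all of them are busy at the same time, so resources split evenly. The function name and the dock sizes are hypothetical and do not come from our actual configuration.

```python
def per_container_share(dock_cpus: float, dock_ram_mb: float, containers: int) -> tuple:
    """Return the (cpus, ram_mb) each container gets under an even split.

    Assumption: equal weights and full contention, which is a worst case.
    """
    if containers < 1:
        raise ValueError("containers must be at least 1")
    return dock_cpus / containers, dock_ram_mb / containers


if __name__ == "__main__":
    # Hypothetical dock: 8 CPUs and 32 GB of RAM.
    for n in (1, 4, 16):
        cpus, ram_mb = per_container_share(8, 32768, n)
        print(f"{n:>2} containers: {cpus:.2f} CPUs, {ram_mb:.0f} MB RAM each")
```

The even split is only an upper bound on contention: idle containers leave their share to the busy ones.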
Here are the factors that go into stability:
- built image repository
- services
- dock availability
- limiting ram
- disk space / inodes
When a dock becomes unhealthy and we need to roll over, we need to migrate its images. Having images stored in a repository allows us to recover an image at the cost of pull time only.
To provide the best experience, all of our services need to be robust: networking, DNS, file tree, and registry all need to be up.
To run containers, we need to ensure we have enough docks to run builds and containers.
For builds and running containers to run smoothly, they need enough RAM. We need to limit RAM so that one container does not use all of the system's RAM.
Each container needs enough disk space and inodes to perform its task. If we run out of disk space or inodes, containers cannot function properly.
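The disk space and inode check described above could be sketched as follows. This is only an illustration: it uses Python's `os.statvfs` (Unix only), and the function names and the 10% thresholds are assumptions, not values we have settled on.

```python
import os


def dock_disk_headroom(path: str = "/") -> dict:
    """Report free disk space and free inodes for the filesystem holding `path`."""
    st = os.statvfs(path)
    total_bytes = st.f_blocks * st.f_frsize
    free_bytes = st.f_bavail * st.f_frsize
    return {
        "free_bytes": free_bytes,
        "free_space_ratio": free_bytes / total_bytes if total_bytes else 0.0,
        "free_inodes": st.f_favail,
        # Some filesystems report 0 total inodes; treat that as "no inode limit".
        "free_inode_ratio": st.f_favail / st.f_files if st.f_files else 1.0,
    }


def is_healthy(headroom: dict, min_space_ratio: float = 0.10,
               min_inode_ratio: float = 0.10) -> bool:
    """Hypothetical health rule: both ratios must stay above their thresholds."""
    return (headroom["free_space_ratio"] >= min_space_ratio
            and headroom["free_inode_ratio"] >= min_inode_ratio)


if __name__ == "__main__":
    report = dock_disk_headroom("/")
    print(report, "healthy" if is_healthy(report) else "unhealthy")
```

Checking inodes separately matters because a disk with many small files can run out of inodes while still showing free space.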
Here are the factors that go into cost optimization:
- number and size of docks
- size of disks
- disk and network IO
The more docks we have, the higher our cost. The bigger our docks are, the higher our cost.
The bigger the disk we put on each dock, the higher our cost.
The more IO a user container or build uses, the higher our cost.
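As a rough illustration of how these cost factors add up, here is a simple linear model. Every field name and every price below is a hypothetical placeholder; real pricing depends on the provider and is not stated in this document.

```python
from dataclasses import dataclass


@dataclass
class DockFleet:
    dock_count: int               # number of docks
    dock_hourly_price: float      # hypothetical price per dock-hour (grows with dock size)
    disk_gb_per_dock: int         # size of the disk attached to each dock
    disk_gb_monthly_price: float  # hypothetical price per GB of disk per month
    io_gb_per_month: float        # disk and network IO across the fleet
    io_gb_price: float            # hypothetical price per GB of IO

    def monthly_cost(self, hours_per_month: float = 730.0) -> float:
        """Sum compute, disk, and IO cost under a purely linear pricing assumption."""
        compute = self.dock_count * self.dock_hourly_price * hours_per_month
        disk = self.dock_count * self.disk_gb_per_dock * self.disk_gb_monthly_price
        io = self.io_gb_per_month * self.io_gb_price
        return compute + disk + io


if __name__ == "__main__":
    fleet = DockFleet(dock_count=10, dock_hourly_price=0.20, disk_gb_per_dock=500,
                      disk_gb_monthly_price=0.10, io_gb_per_month=2000, io_gb_price=0.01)
    print(f"estimated monthly cost: {fleet.monthly_cost():.2f}")
```

Each term grows with one of the factors listed above, so the model is mainly useful for comparing two fleet layouts against each other.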
Here are the factors that go into performance:
- ram
- cpu
The more RAM a container has, the more performant it will be.
The less its CPU is shared, the more performant the container will be.
Here are the factors that go into security:
- isolated access
Containers should not be able to access anything they are not supposed to. People should not be allowed to access containers they are not supposed to.
Now that we know what each problem entails, I will detail how things can be improved.
To improve docker cache, we need layers to be available:
- ensure layers are on the docks that builds are scheduled on
- distribute layers so we have high availability
To improve FROM image availability:
- ensure FROM images are on the docks that builds are scheduled on
- distribute FROM images so we have high availability
To improve cpu shares:
- ensure we run as few containers per dock as we can
To improve available ram:
- ensure we run as few containers per dock as we can
To improve the built image repository:
- localhost registry
- Amazon ECR (or another hosted solution such as quay.io)
To improve services:
- ensure services are always up and robust
To improve dock availability:
- autoscale groups
- correct scaling in / out
To improve limiting ram:
- limit RAM to a reasonable limit
To improve disk space / inodes:
- ensure we provide enough disk space
- clean old images
I will stop here, as those are our highest priorities.
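The idea of scheduling builds onto docks that already hold the needed layers and FROM image could be sketched like this. It is an illustration only, not our scheduler: the `Dock` type, the `pick_dock` function, and the scoring order (FROM image first, then cached layers, then fewest running containers) are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Dock:
    name: str
    images: set = field(default_factory=set)   # FROM images already pulled
    layers: set = field(default_factory=set)   # cached layer IDs from earlier builds
    running_containers: int = 0


def pick_dock(docks, from_image, build_layers):
    """Pick the dock most likely to give cache hits for a build."""
    if not docks:
        raise ValueError("no docks available")
    wanted = set(build_layers)

    def score(dock):
        has_from = from_image in dock.images   # avoids pulling the FROM image
        cached = len(dock.layers & wanted)     # layers that will not be rebuilt
        return (has_from, cached, -dock.running_containers)

    return max(docks, key=score)


if __name__ == "__main__":
    docks = [
        Dock("dock-a", {"node:4"}, {"l1"}, running_containers=5),
        Dock("dock-b", {"node:4"}, {"l1", "l2"}, running_containers=9),
        Dock("dock-c", set(), {"l1", "l2", "l3"}, running_containers=0),
    ]
    print(pick_dock(docks, "node:4", ["l1", "l2", "l3"]).name)
```

Putting container count last in the score favors build speed over per-container CPU and RAM. Reordering the tuple would shift that balance toward performance.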
- scheduling fewer containers on each dock: more CPU and RAM per container, increased stability, increased performance, decreased cost optimization, decreased build speed
- build, push, and pull scheduling: decreased build speed, increased stability, increased performance, increased cost optimization