MVP Smart load balancer / API Gateway #13

Closed
rdallman opened this issue Jul 27, 2017 · 9 comments

@rdallman (Contributor)

In gitlab by @treeder on May 11, 2017, 11:15

In order to optimize various things such as:

  • reducing image pulls
  • reducing disk space for image cache
  • streaming inputs to running containers (Fix app updating #214) - hot functions

the load balancer will need to be smart about routing requests for a specific function to a subset of machines.

IronLB

The Problem

IronFunctions requires a load balancer to route requests to IronFunctions nodes. The problem is that a regular load balancer spreads requests across all the nodes, which is very suboptimal: every machine then needs to store every function image, and we can't take advantage of hot/streaming containers.

The Solution

If we route requests for a particular function to a subset of
machines, we get the following benefits:

  • reducing image pulls
  • reducing disk space for image cache
  • streaming inputs to running containers, AKA hot containers (Fix app updating #214)

See iron-io/functions#151

We can extend an existing load balancer like Vulcand
to solve the problem. At a minimum, the load balancer(s) should be able to route function X to a fixed set of nodes (say 3 by default).

[Image: lb-drawing]

Usage

Like any other load balancer, users will start X number of IronLB nodes to route traffic to IronFunctions nodes.
The logic to route traffic to specific nodes will be baked in, so there shouldn't be much more configuration than
telling the load balancers where to route traffic.

It will be delivered via a Docker image.

High Level Implementation

The Docker image will start the LB and etcd.

  • For each request, get app_name and path (both can be obtained from the URL).
  • Check etcd for app_name.path; if it exists, send traffic to one node from the stored list. Otherwise continue:
  • Use consistent hashing or similar to route the request, keyed on app_name.path, to MAX_NODES (3 by default) nodes
    • How to consistent hash to multiple nodes? One option is sketched below.
  • Store the mapping app_name.path -> set of node IPs in etcd
  • Send traffic to one of the nodes.
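One answer to the open question above is rendezvous (highest-random-weight) hashing: score every node against the route key and keep the top MAX_NODES scorers, so each key maps to a stable subset of nodes and adding or removing a node only remaps the keys that node wins or loses. A minimal sketch in Go, with illustrative names only (none of this is the project's actual code):

```go
// Rendezvous (highest-random-weight) hashing sketch: score every node
// against the route key and take the top maxNodes. Adding or removing a
// node only remaps the keys that node scores highest for.
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

const maxNodes = 3 // MAX_NODES: how many IronFunctions nodes serve one route

// score hashes the route key together with a node address.
func score(key, node string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(key))
	h.Write([]byte(node))
	return h.Sum64()
}

// nodesFor returns up to maxNodes nodes for an app_name.path key.
func nodesFor(key string, nodes []string) []string {
	ranked := append([]string(nil), nodes...)
	sort.Slice(ranked, func(i, j int) bool {
		return score(key, ranked[i]) > score(key, ranked[j])
	})
	if len(ranked) > maxNodes {
		ranked = ranked[:maxNodes]
	}
	return ranked
}

func main() {
	nodes := []string{"10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080", "10.0.0.4:8080"}
	// key is app_name.path taken from the request URL
	fmt.Println(nodesFor("myapp./hello", nodes))
}
```

The result of nodesFor would then be cached in etcd under app_name.path, per the steps above.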

Try to implement this as a Vulcand middleware.
Not sure if that's possible; I don't see a way to route to a specific server, so we'll have to dig in. Otherwise,
fork Vulcand and add this feature to the "Backend" that handles the servers.

Also consider https://github.com/containous/traefik instead of Vulcand.

First Deliverable

Working MVP for use with IronFunctions.

Future Improvements - not part of initial scope

  • Additional configuration could control how many nodes to route a function's traffic to; for really high-load
    functions, you may want a particular function to go to 10 nodes instead of the default 3.
  • If we knew stats on particular routes, we could start by putting all of a function's requests on one node and
    increase the node count as traffic grows.
@rdallman self-assigned this Jul 27, 2017
@rdallman (Contributor, Author)

In gitlab by @treeder on May 11, 2017, 11:23

changed title from "API gateway / smart load balancer" to "MVP Smart load balancer / API Gateway"

@rdallman (Contributor, Author)

In gitlab by @treeder on May 11, 2017, 11:23

changed milestone to %3

@rdallman (Contributor, Author)

In gitlab by @treeder on May 17, 2017, 12:06

assigned to @rdallman

@rdallman (Contributor, Author)

In gitlab by @treeder on May 17, 2017, 12:08

I like TJ's idea of the function server returning info about its current state to tell LBs to back off and scale.

Return a 503 with a JSON body explaining the status.
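As a rough illustration of that idea, a function node could answer with a 503 and a small status document. The shape below is a made-up assumption, not a settled format:

```go
// Sketch of a function node signaling back-pressure: reply 503 with a JSON
// body the LB can use to back off and scale. All names here are assumptions.
package main

import (
	"encoding/json"
	"net/http"
)

// nodeStatus is a hypothetical shape for the status body.
type nodeStatus struct {
	Status     string  `json:"status"`      // e.g. "overloaded"
	Load       float64 `json:"load"`        // rough load ratio, 0..1
	RetryAfter int     `json:"retry_after"` // seconds the LB should back off
}

func handler(w http.ResponseWriter, r *http.Request) {
	overloaded := true // placeholder: a real node would check queue depth, memory, etc.
	if overloaded {
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusServiceUnavailable) // 503
		json.NewEncoder(w).Encode(nodeStatus{Status: "overloaded", Load: 0.95, RetryAfter: 2})
		return
	}
	w.Write([]byte("ok"))
}

func main() {
	http.HandleFunc("/", handler)
	http.ListenAndServe(":8080", nil)
}
```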

cc @rdallman

@rdallman (Contributor, Author)

In gitlab by @treeder on May 18, 2017, 13:09

changed milestone to %1

@rdallman (Contributor, Author)

In gitlab by @treeder on May 30, 2017, 14:55

added ~6 label

@rdallman (Contributor, Author)

In gitlab by @carimura on May 31, 2017, 08:36

removed ~6 label

@rdallman (Contributor, Author)

In gitlab by @rdallman on May 31, 2017, 16:41

gonna not edit the body, to leave those ideas lying around, but there were some deviations taken. have a working version so long as there's only 1 LB; mostly just need to fix that and then beef things up a bit (posting PR soon-ish). going to add a checklist of things to do as a second brain:

  • integrate the LB with the data store, to store a list of fn nodes (can share the db with fn; right now it's just in RAM). each LB polls the list of nodes every ~1s so that multiple LBs all have the same list of nodes [and the same hash function, so routing is the same]. this doesn't (shouldn't?) have to be perfect; a 1s window should be plenty. possibly covered by swapping to more robust lb/proxy code. a rough polling sketch follows this list.
  • move the routing logic into a robust LB that handles connection draining, re-routing failed in-flight requests, and circuit breaking. tbd; if we don't need many more features then it could be better to just implement what we actually want (better maintainability, possible initial robustness hit).
  • tinker with load shedding ratios, and add the ability to drop a request altogether (504 it) if wait times are really high. since we aren't limiting concurrency now we can probably get pretty aggressive; still need to account for hot vs. not, though, and don't want to look it up in the lb (when can we make them all hot? ;))
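For the first item, a minimal sketch of the ~1s polling loop, assuming a hypothetical listNodes query against the shared db (illustrative only, not the actual PR code):

```go
// Each LB polls the shared data store about once per second so every LB
// converges on the same node list (and, with the same hash function, the
// same routing). A stale list for one tick is acceptable here.
package main

import (
	"log"
	"sync"
	"time"
)

// nodeList is the LB's in-memory view of fn nodes, refreshed by polling.
type nodeList struct {
	mu    sync.RWMutex
	nodes []string
}

func (n *nodeList) set(nodes []string) {
	n.mu.Lock()
	defer n.mu.Unlock()
	n.nodes = nodes
}

func (n *nodeList) get() []string {
	n.mu.RLock()
	defer n.mu.RUnlock()
	return append([]string(nil), n.nodes...)
}

// listNodes stands in for a query against the shared db; hypothetical.
func listNodes() ([]string, error) {
	return []string{"10.0.0.1:8080", "10.0.0.2:8080"}, nil
}

func poll(n *nodeList) {
	for range time.Tick(time.Second) { // ~1s window should be plenty
		nodes, err := listNodes()
		if err != nil {
			log.Printf("node poll failed, keeping stale list: %v", err)
			continue
		}
		n.set(nodes)
	}
}

func main() {
	var n nodeList
	go poll(&n)
	select {} // the real LB would serve traffic, routing via n.get()
}
```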

@rdallman (Contributor, Author)

In gitlab by @treeder on Jun 13, 2017, 10:14

closed
