MVP Smart load balancer / API Gateway #13

Closed
rdallman opened this issue Jul 27, 2017 · 9 comments

@rdallman (Contributor)

In gitlab by @treeder on May 11, 2017, 11:15

In order to optimize various things such as:

  • reducing image pulls
  • reducing disk space for image cache
  • streaming inputs to running containers (Fix app updating #214) - hot functions

the load balancer will need to be smart about routing requests for a specific function to a subset of machines.

IronLB

The Problem

IronFunctions requires a load balancer to route requests to IronFunctions nodes. The problem is that a regular load balancer spreads requests across all the nodes, which is very suboptimal: every machine then needs to store every function image, and we can't take advantage of hot/streaming containers.

The Solution

If we route requests for a particular function to a subset of
machines, we get the following benefits:

  • reducing image pulls
  • reducing disk space for image cache
  • streaming inputs to running containers, AKA hot containers (Fix app updating #214)

See iron-io/functions#151

We can extend an existing load balancer like Vulcand
to solve the problem. At a minimum, the load balancer(s) should be able to route function X to a fixed set of nodes (say 3 by default).

[Image: lb-drawing]

Usage

Like any other load balancer, users will start X number of IronLB nodes to route traffic to IronFunctions nodes.
The logic to route traffic to specific nodes will be baked in, so there shouldn't be much more configuration than
telling the load balancers where to route traffic.

It will be delivered via a Docker image.

High Level Implementation

The Docker image will start the LB and etcd.

  • For each request, get app_name and path (both can be obtained from the URL).
  • Check etcd for app_name.path; if it exists, send traffic to one node from the stored list. Otherwise continue:
  • Use consistent hashing or similar to route the request, keyed on app_name.path, to MAX_NODES (3 by default) nodes
    • How to consistent hash to multiple nodes? One option is sketched below.
  • Store the mapping app_name.path -> set of node IPs in etcd
  • Send traffic to one of the nodes.
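One answer to the open question above is rendezvous (highest-random-weight) hashing: score every node against the route key and keep the top MAX_NODES scorers, so each key maps to a stable subset of nodes and adding or removing a node only remaps the keys that node wins or loses. A minimal sketch in Go, with illustrative names only (none of this is the project's actual code):

```go
// Rendezvous (highest-random-weight) hashing sketch: score every node
// against the route key and take the top maxNodes. Adding or removing a
// node only remaps the keys that node scores highest for.
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

const maxNodes = 3 // MAX_NODES: how many IronFunctions nodes serve one route

// score hashes the route key together with a node address.
func score(key, node string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(key))
	h.Write([]byte(node))
	return h.Sum64()
}

// nodesFor returns up to maxNodes nodes for an app_name.path key.
func nodesFor(key string, nodes []string) []string {
	ranked := append([]string(nil), nodes...)
	sort.Slice(ranked, func(i, j int) bool {
		return score(key, ranked[i]) > score(key, ranked[j])
	})
	if len(ranked) > maxNodes {
		ranked = ranked[:maxNodes]
	}
	return ranked
}

func main() {
	nodes := []string{"10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080", "10.0.0.4:8080"}
	// key is app_name.path taken from the request URL
	fmt.Println(nodesFor("myapp./hello", nodes))
}
```

The result of nodesFor would then be cached in etcd under app_name.path, per the steps above.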

Try to implement this as a Vulcand middleware.
Not sure if that's possible; I don't see a way to route to a specific server, so we'll have to dig in. Otherwise,
fork Vulcand and add this feature to the "Backend" that handles the servers.

Also consider https://github.com/containous/traefik instead of Vulcand.

First Deliverable

Working MVP for use with IronFunctions.

Future Improvements - not part of initial scope

  • Additional configuration could control how many nodes to route a function's traffic to; for really high-load
    functions, you may want a particular function to go to 10 nodes instead of the default 3.
  • If we knew stats on particular routes, we could start by putting all of a function's requests on one node and
    increase the node count as traffic grows.
@rdallman self-assigned this Jul 27, 2017
@rdallman (Contributor, Author)

In gitlab by @treeder on May 11, 2017, 11:23

changed title from "API gateway / smart load balancer" to "MVP Smart load balancer / API Gateway"

@rdallman (Contributor, Author)

In gitlab by @treeder on May 11, 2017, 11:23

changed milestone to %3

@rdallman (Contributor, Author)

In gitlab by @treeder on May 17, 2017, 12:06

assigned to @rdallman

@rdallman (Contributor, Author)

In gitlab by @treeder on May 17, 2017, 12:08

I like TJ's idea of the function server returning info about its current state to tell LBs to back off and scale.

Return a 503 with a JSON body explaining the status.
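As a rough illustration of that idea, a function node could answer with a 503 and a small status document. The shape below is a made-up assumption, not a settled format:

```go
// Sketch of a function node signaling back-pressure: reply 503 with a JSON
// body the LB can use to back off and scale. All names here are assumptions.
package main

import (
	"encoding/json"
	"net/http"
)

// nodeStatus is a hypothetical shape for the status body.
type nodeStatus struct {
	Status     string  `json:"status"`      // e.g. "overloaded"
	Load       float64 `json:"load"`        // rough load ratio, 0..1
	RetryAfter int     `json:"retry_after"` // seconds the LB should back off
}

func handler(w http.ResponseWriter, r *http.Request) {
	overloaded := true // placeholder: a real node would check queue depth, memory, etc.
	if overloaded {
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusServiceUnavailable) // 503
		json.NewEncoder(w).Encode(nodeStatus{Status: "overloaded", Load: 0.95, RetryAfter: 2})
		return
	}
	w.Write([]byte("ok"))
}

func main() {
	http.HandleFunc("/", handler)
	http.ListenAndServe(":8080", nil)
}
```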

cc @rdallman

@rdallman (Contributor, Author)

In gitlab by @treeder on May 18, 2017, 13:09

changed milestone to %1

@rdallman (Contributor, Author)

In gitlab by @treeder on May 30, 2017, 14:55

added ~6 label

@rdallman (Contributor, Author)

In gitlab by @carimura on May 31, 2017, 08:36

removed ~6 label

@rdallman (Contributor, Author)

In gitlab by @rdallman on May 31, 2017, 16:41

gonna not edit the body, to leave those ideas lying around, but there were some deviations taken. have a working version so long as there's only 1 LB; mostly just need to fix that and then beef things up a bit (posting PR soon-ish). going to add a checklist of things to do as a second brain:

  • integrate the LB with the data store, to store a list of fn nodes (can share the db with fn; right now it's just in RAM). each LB polls the list of nodes every ~1s so that multiple LBs all have the same list of nodes [and the same hash function, so routing is the same]. this doesn't (shouldn't?) have to be perfect; a 1s window should be plenty. possibly covered by swapping to more robust lb/proxy code. a rough polling sketch follows this list.
  • move the routing logic into a robust LB that handles connection draining, re-routing failed in-flight requests, and circuit breaking. tbd; if we don't need many more features then it could be better to just implement what we actually want (better maintainability, possible initial robustness hit).
  • tinker with load shedding ratios, and add the ability to drop a request altogether (504 it) if wait times are really high. since we aren't limiting concurrency now we can probably get pretty aggressive; still need to account for hot vs. not, though, and don't want to look it up in the lb (when can we make them all hot? ;))
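For the first item, a minimal sketch of the ~1s polling loop, assuming a hypothetical listNodes query against the shared db (illustrative only, not the actual PR code):

```go
// Each LB polls the shared data store about once per second so every LB
// converges on the same node list (and, with the same hash function, the
// same routing). A stale list for one tick is acceptable here.
package main

import (
	"log"
	"sync"
	"time"
)

// nodeList is the LB's in-memory view of fn nodes, refreshed by polling.
type nodeList struct {
	mu    sync.RWMutex
	nodes []string
}

func (n *nodeList) set(nodes []string) {
	n.mu.Lock()
	defer n.mu.Unlock()
	n.nodes = nodes
}

func (n *nodeList) get() []string {
	n.mu.RLock()
	defer n.mu.RUnlock()
	return append([]string(nil), n.nodes...)
}

// listNodes stands in for a query against the shared db; hypothetical.
func listNodes() ([]string, error) {
	return []string{"10.0.0.1:8080", "10.0.0.2:8080"}, nil
}

func poll(n *nodeList) {
	for range time.Tick(time.Second) { // ~1s window should be plenty
		nodes, err := listNodes()
		if err != nil {
			log.Printf("node poll failed, keeping stale list: %v", err)
			continue
		}
		n.set(nodes)
	}
}

func main() {
	var n nodeList
	go poll(&n)
	select {} // the real LB would serve traffic, routing via n.get()
}
```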

@rdallman (Contributor, Author)

In gitlab by @treeder on Jun 13, 2017, 10:14

closed
