
[Question] Anti affinity rules for function pods without need to create separate Profile for each function #783

angel-ivanov opened this issue Apr 8, 2021 · 6 comments


@angel-ivanov

angel-ivanov commented Apr 8, 2021

Expected Behaviour

Current Behaviour

Are you a GitHub Sponsor (Yes/No?)

Check at: https://github.com/sponsors/openfaas

  • Yes
  • No

List All Possible Solutions and Workarounds

Which Solution Do You Recommend?

Steps to Reproduce (for bugs)

Context

Your Environment

  • FaaS-CLI version ( Full output from: faas-cli version ):

  • Docker version docker version (e.g. Docker 17.0.05 ):

  • What version and distribution of Kubernetes are you using? kubectl version

  • Operating System and version (e.g. Linux, Windows, MacOS):

  • Link to your project or a code example to reproduce issue:

  • What network driver are you using and what CIDR? i.e. Weave net / Flannel

First of all, hello everyone!

I am looking for a way to spread function pods evenly across all available k8s nodes. I have been reading up and tinkering with what is available so far, but I don't see a way to implement this without creating a separate Profile for each function.

Looking at the OpenFaaS docs, I understood that I need to create a Profile which contains the anti-affinity rule and link the function to that profile with an annotation; OpenFaaS will then put the anti-affinity rule from the profile into the function's deployment spec.

According to the Profiles docs, I think this is done so that a single profile can be reused by multiple functions, based on each function's requirements. In my case, if I have a 3-node k8s cluster and I want the function to scale up to 3 pods, then I want each of the function's pods to run on a different node.

Every function has a label with a unique ID, and if my tool were creating the k8s deployment, I could simply put this label and its ID into the pod anti-affinity rule. However, I cannot set this directly on the function; the only way is through a Profile. It seems that I currently cannot create a single Profile with such an anti-affinity rule and attach it to every function, because every function has a unique ID and anti-affinity rules in Kubernetes require specifying both the key and the value of the label.
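For illustration, the kind of per-function rule I mean would look roughly like this (a sketch only; app: my-function-1 is just a stand-in for whatever unique label identifies a single function). Because the value names one specific function, a Profile containing it could not be shared:

```yaml
# Sketch only: app=my-function-1 is a placeholder for the unique label of one
# function. The value is specific to a single function, which is why this rule
# cannot live in one shared Profile.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - my-function-1
      topologyKey: kubernetes.io/hostname
```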

I was wondering whether anybody in the community has had the same problem and found a solution that does not require creating a separate profile for each function.

This is my first time opening an issue to ask a question like this, and since it is not an actual bug, I've skipped the issue template; I hope that's not a problem.

Regards,
Angel!

@LucasRoesler
Member

The first thing to note is that (anti-)affinity supports the operators In, NotIn, Exists and DoesNotExist (see these k8s docs), which means that you do not need to specify a unique value here.

Second, every function has a label faas_function with the function name, which might be useful for the "profile per function" workflow that you mention. Offhand this is probably the easiest and clearest way to achieve what you want. If you really truly need this capability, then that is probably your best bet right now.
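Something like this untested sketch could serve as a single shared Profile, assuming the Profile CRD accepts a standard affinity block as described in the Profiles docs; with Exists, no label value is needed:

```yaml
# Sketch only: one Profile shared by all functions.
kind: Profile
apiVersion: openfaas.com/v1
metadata:
  name: spread-functions
  namespace: openfaas
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: faas_function
              operator: Exists   # any function pod counts, no value required
          topologyKey: kubernetes.io/hostname
```

Keep in mind that Exists matches every function pod, so this spreads functions away from each other as well as away from their own replicas. You would attach it to a function with the com.openfaas.profile: spread-functions annotation.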

Last, I would be very curious to know what the actual use case is. This feels like something that you really shouldn't need to care about. I don't mean to discount that there are use cases that might really need this, but if the goal is to reserve resources for each instance (e.g. CPU or GPU), then resource requests/limits are a better way to express that. A concrete example of what you are trying to do and why it is needed would go a long way toward guiding you to a better solution.

I could see reasons to keep "stacks" of functions co-located (for example to reduce the latency of requests between them, e.g. with a cache). In that case, using affinity with the In operator and a unique label in your stack.yaml is probably a good approach. A similar approach would also allow you to avoid placing a group of functions on a node that already has another group of functions scheduled on it; see these k8s docs.
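As a sketch of the co-location variant (the stack label and its value are hypothetical, set via the labels section of your stack.yaml):

```yaml
# Sketch only: co-locate functions that share a hypothetical "stack" label.
affinity:
  podAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: stack
            operator: In
            values:
            - orders          # hypothetical label shared by the stack's functions
        topologyKey: kubernetes.io/hostname
```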

@angel-ivanov
Author

Hi Lucas, thanks for your answer. I've checked the docs and saw the options to use In, NotIn, Exists and DoesNotExist, but they still don't help me achieve what I want.

The actual use case: if I have a 3-node k8s cluster, I want the pods spread evenly so that if one node goes down, function invocations can continue without disruption. If for some reason all 3 pods of the function live on the same node and that node goes down, there is downtime even though k8s will bring the pods up on one of the remaining nodes. I think the use case is pretty simple, and I was looking for a way to avoid having yet another entity to manage.

@LucasRoesler
Member

I like your use case/example.

I wonder if we could add support for this in Profiles: https://kubernetes.io/docs/concepts/workloads/pods/pod-topology-spread-constraints/
Depending on how your cluster is deployed, you might also be able to use cluster-level default constraints: https://kubernetes.io/docs/concepts/workloads/pods/pod-topology-spread-constraints/#cluster-level-default-constraints

Then it should be possible to have a single Profile for all of your functions and achieve an even spread across nodes.
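For reference, the constraint itself is small. A sketch of what a function's pod spec would need to carry (my-function is a placeholder name; the labelSelector is the per-function part):

```yaml
# Sketch only: spread this function's replicas across nodes.
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: kubernetes.io/hostname
  whenUnsatisfiable: ScheduleAnyway   # use DoNotSchedule for a hard requirement
  labelSelector:
    matchLabels:
      faas_function: my-function      # placeholder; selects this function's replicas
```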

Those features require k8s 1.19. Alternatively, the ServiceSpreadingPriority scheduling policy should be available in all clusters and should also do exactly what you want: https://kubernetes.io/docs/reference/scheduling/policies/#:~:text=ServiceSpreadingPriority
I haven't used it, but I think it requires configuring a KubeSchedulerConfiguration object, see https://kubernetes.io/docs/reference/scheduling/config/
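A sketch of the cluster-level default, adapted from the docs linked above (the apiVersion depends on the cluster release, so double check it against your version):

```yaml
# Sketch only: default topology spread for all pods via the scheduler config.
apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: default-scheduler
  pluginConfig:
  - name: PodTopologySpread
    args:
      defaultConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway
```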

@alexellis
Member

alexellis commented Apr 12, 2021

Welcome to the community. I see Lucas has been able to give you some suggestions.

I was wondering whether anybody in the community has had the same problem and found a solution that does not require creating a separate profile for each function.

I've repopulated the issue template that you deleted. This is a required part of community participation, and we would like you to fill it out with all of the details we ask for.

How many replicas and nodes do you expect to have on average?

What work have you done to determine that the spread of functions across your nodes is uneven? How did the function replicas spread when you scaled from 1 to 3 replicas in your 3-node cluster?

Related to this, how often have you observed node failures in production?

I'm curious which workarounds you have already considered. A couple that came to mind:

  • Using asynchronous invocations, so that if a function is unavailable, the request can be deferred or run later
  • Using a container based back-end for your cluster like AWS Fargate or GKE Autopilot, where each Pod is actually running on its own virtual node
  • What would happen if a function was unavailable for a short period of time whilst being rescheduled?
  • Could you set a scheduling policy for your entire cluster (without changing OpenFaaS) which set the Scheduling Policy suggested by Lucas?

More importantly, you should apply whatever strategy you choose not just to the functions (which we create via faas-netes), but to any components you deploy, including all of the OpenFaaS core services in the helm chart. If you can find a way to update your default scheduler, that may be a quick win that gives you the result you're looking for.

@aslanpour

aslanpour commented Jun 14, 2021

Hi everyone,

Sharing a similar problem!
In my use case, I am considering a K3s cluster of Raspberry Pi 3s as worker nodes (and a Pi 4 as master) running serverless functions that represent IoT applications, say in smart agriculture or a smart city. Our nodes are also powered by batteries. Given the nodes' resource and energy limitations, our goal is to dynamically manage their resource and energy usage. The functions themselves and the traffic are the two major factors.
Functions: we see this as a scheduling problem that can be handled with affinity, tolerations, etc., as supported in an OpenFaaS Profile.
Traffic: we feel traffic splitting/shifting using service meshes with OpenFaaS can help, as in the Linkerd workshop. The Istio-related docs don't cover traffic shifting.

However, the challenge is that the above solutions only work when we are dealing with different functions or different versions of a function, not with different replicas of a function. We need to treat the replicas (Pods) of a function (Deployment) differently, so that we can schedule the X-th replica on a particular node, or route Y% of the traffic to particular replicas based on our custom metrics.

I feel such challenges come from the stateless Deployment model that serverless platforms follow. For instance, traffic splitting across the replicas of a function could probably be achieved if functions were deployed as a StatefulSet, so that individual replicas are addressable through EndpointSlices, whereas with a stateless Deployment object the EndpointSlice won't work as desired (I'm not sure about this part).

Going through the documentation, I find it much more feasible to benefit from topology-based designs, where we can schedule the X-th replica of a function in a certain zone, region, etc., or give different weights to each zone or region so that the replicas scheduled there receive Y% of the traffic (invocations). To implement such policies, as Lucas said, PodTopologySpreadConstraints can help with the scheduling problem, and Topology Aware Hints (with EndpointSlices) can help with traffic management.
Support for such features in OpenFaaS would help our project with dynamic, resource-efficient scheduling and routing. Alternatively, how about a tutorial/workshop on a general Kubernetes-based (not OpenFaaS-specific) way to use such features, so that you do not have to keep updating the OpenFaaS Profile to support new ones? I will also need fields such as priorityClassName or schedulerName for certain purposes.

Any suggestion is appreciated.
Thanks,
Mohammad

@alexellis
Member

Coming back to this later on, have you considered simply creating one profile per function?

Why did you conclude that this would not work? Do you have any data?

For Pod Topology Spread Constraints, we've had another request for this, but only for the core components deployed via the helm chart: #856
