
[BUG] load_balancer algorithm weaknesses #3297

Closed
spacetourist opened this issue Feb 2, 2024 · 7 comments · May be fixed by #3351

@spacetourist (Contributor)

OpenSIPS version you are running

version: opensips 3.2.11 (x86_64/linux)
flags: STATS: On, DISABLE_NAGLE, USE_MCAST, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, HP_MALLOC, DBG_MALLOC, FAST_LOCK-ADAPTIVE_WAIT
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535
poll method support: poll, epoll, sigio_rt, select.
git revision: 14e4858f4
main.c compiled on 00:00:00 May 27 2021 with gcc 11

Describe the bug

The load_balancer algorithm is not correctly accounting for existing calls on the destinations.

When selecting a destination, the algorithm uses one of the following branches; in my case I'm using relative, so the CPU score is included:

if( flags & LB_FLAGS_RELATIVE ) {
	if( dst->rmap[l].max_load )
		av = 100 - (100 * lb_dlg_binds.get_profile_size(res[k]->profile,
			&dst->profile_id) / dst->rmap[l].max_load);
} else {
	av = dst->rmap[l].max_load -
		lb_dlg_binds.get_profile_size(res[k]->profile, &dst->profile_id);
}

The max_load value is populated from the FreeSWITCH HEARTBEAT data using this code:

if (psz < dst->fs_sock->stats.max_sess) {
	dst->rmap[ri].max_load =
		(dst->fs_sock->stats.id_cpu / (float)100) *
		(dst->fs_sock->stats.max_sess -
		 (dst->fs_sock->stats.sess - psz));
} else {
	dst->rmap[ri].max_load =
		(dst->fs_sock->stats.id_cpu / (float)100) *
		dst->fs_sock->stats.max_sess;
}

This means that the max_load score is the maximum number of sessions configured on the server, minus the sessions that already exist there, with the dialogs this OpenSIPS instance has allocated to it (its profile size) added back in, all scaled by the CPU availability score.

This calculation means the dialog profile counts are included twice: once in the max_load calculation and again in the destination selection calculation. Most of those sessions end up double counted, because the Session-Count value in the heartbeats already includes these dialogs.

Furthermore, when no calls have been allocated to a destination by this OpenSIPS instance, all destinations get the same score of 100, since the calculation is inevitably 100 - (100 * 0 / max_load). In my environment this means an instance with 100 available channels is just as likely to be allocated the call as an instance with 1000 channels available.

Expected behavior

My goal is to get proportional load balancing working with this module, such that incoming calls are spread evenly over all the available FreeSWITCH instances of varying sizes.

In my opinion a better calculation would account for existing sessions on the destinations, as well as any allocated since the last heartbeat. Something like:

100 - ( 100 * ( FS_Session_Count + Profile_dialogs_since_last_heartbeat ) / FS_Max_Sessions ) - CPU_load

This would need the max sessions and current session data to be added to the lb_resource_map struct to make them available to the balancer. I'm not clear on how easy it would be to count the dialogs added to the profile since the last heartbeat was processed for that instance; perhaps that's too complex.

Modifying the max_load calculation in favour of a system utilisation figure would probably be enough of an improvement for my situation:

dst->rmap[ri].utilisation = (dst->fs_sock->stats.id_cpu / (float)100) *
	(100 * (dst->fs_sock->stats.max_sess - dst->fs_sock->stats.sess) /
	 dst->fs_sock->stats.max_sess);

The new utilisation value equals the percentage of free channels, scaled down by the CPU utilisation. This removes the dialog profiles from the calculation and allows the destinations to be scored proportionally. Since the capacity figures are reduced to a percentage, this might also work well with the random destination flag, resolving to an eventually even distribution.

--

Please note this report is a work in progress as I gather information on the module and is meant as a discussion point rather than a call for a solution at this time!

@spacetourist (Contributor, Author)

The more I look into this, the more I think the existing solution will actually work well enough once more callers pass through the load balancer. I'm testing with only a few calls, and I had expected the module to select the least loaded FreeSWITCH instance for the first call. Now that I understand how the calculations work, that behaviour only emerges once a reasonable number of load-balanced sessions are active; at that point the all-scores-equal-100 issue wouldn't be a factor and the load would eventually balance out.

My main concern is the sheer volume of calls I'm having to balance: my system has extreme peaks in call traffic, and at those peaks the calls must be balanced as effectively as possible. I'm sure some improvements are possible, and I'll continue to review over the next week to see if I can identify changes that account for the load of the backend servers more accurately. Perhaps the simplest fix is to account for total channel availability when the load balancer returns several instances reporting 100% availability: a sort by max_load descending before picking the first instance might be enough.


Any updates here? No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days.

@spacetourist (Contributor, Author)

Apologies for the radio silence on this issue. I have finally found some time to analyse the algorithm more closely and consider how it would operate in my environment, where I am dealing with untracked sessions on the FreeSWITCH media servers. To recap: I have several OpenSIPS instances feeding into the same bank of media servers, and I do not wish to partition the backend servers; the aim is to use them all as a pool of channels available to all OpenSIPS instances.

My issue with the existing algorithm options is that they weight the profile size associated with each instance too heavily. This makes sense, and would work well if all inbound calls were processed by a single instance, as sessions added to and removed from the profiles between heartbeats would be tracked in real time and produce a reasonably even distribution of calls to the media servers. Unfortunately, when these profile counts are all zero the calculation breaks down, and initial calls are allocated essentially at random until several sessions are being tracked on each backend instance.

To illustrate this, here is a table showing the existing relative algorithm for a sample set of five instances:

[image: table of relative-algorithm scores for five sample instances]

For this calculation, 100 calls have been distributed to each FS instance via the local OpenSIPS, but the actual system channel load ranges between 1100 and 700; we'd want the instance at 700 to receive more sessions until things balance. This just about works, but the spread of scores is very tight, so instances 3-5 will share the additional calls until the next heartbeat comes in. If in that period 50 calls are allocated to each of those instances, they are still the least loaded, yet they now get none of the calls because the profile value is so heavily weighted:

[image: table of recalculated scores after 50 additional calls on instances 3-5]

At this point the calls would be going to the busiest instances.

With the objective of achieving even load distribution across the backend instances, I propose an alternative algorithm option which is simpler and pays most attention to max-session utilisation. The basic calculation would be:

100 - ( 100 * current_sessions / max_sessions )

This would provide a capacity-based score ranking the least loaded servers (from a channel perspective) as the best (highest scoring) targets. It makes sense to incorporate the CPU score in the same way as is done currently:

( 100 - ( 100 * current_sessions / max_sessions ) ) * CPU Idle factor

This method works well on a call-by-call basis, but without profile counting it quickly goes out of sync between heartbeats. In my deployment the heartbeats arrive from all backend media servers every second, yet the environment may receive upwards of 600 cps, which would quickly cause some wild imbalances: the same target would be chosen for every call until the next heartbeat arrives.

To mitigate that issue I would want to count the sessions distributed locally to each instance between heartbeats. This is obviously imperfect, as there is no awareness of the distribution choices made by the other OpenSIPS instances; however, with heartbeats arriving from many backend instances spread across the individual seconds, the overall result should be a reasonably balanced distribution that mitigates the existing profile-counting weaknesses.

I'd like to offer up a PR which implements this. Before proceeding, it would be great to hear back from @liviuchircu or @bogdan-iancu in case you really dislike the idea of having this in the released module. Here is what I would implement:

  • new flag (XOR with relative) which indicates that we want this mode of operation
  • modify lb_update_max_loads() to capture the following values to new properties of lb_resource_map:
    • max_sessions
    • current_sessions
    • cpu_idle
  • modify get_dst_load() to score based on ( 100 - ( 100 * current_sessions / max_sessions ) ) * CPU Idle factor
  • modify lb_route() to increment dst->rmap[l].current_sessions every time an instance is selected as the target

This runtime option should achieve the following:

  • Calls routed according to percentage of channels free, reduced by CPU usage
  • Busy servers will incorporate local distributions into each routing decision through incrementing the session counter locally
  • Heartbeats will reset any local modifications to the current sessions resulting in an eventually even distribution as call peaks diminish

Hopefully that all makes sense. I look forward to hearing your thoughts; if this sounds viable please let me know and I'll get to work on a PR for review. Otherwise, let me know your concerns and I'll have a rethink.

@github-actions github-actions bot removed the stale label Mar 12, 2024

Any updates here? No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days.

@spacetourist (Contributor, Author)

Hi @liviuchircu

I've finally found some time and gone ahead and created a prototype PR #3351

Initial testing looks to work as I anticipated; I'll find some more time next week to put load onto it and see how it performs compared to the existing strategies (possibly an excuse to try out SIPssert!).

Please let me know if there are any issues with progress so far, thanks and happy Easter 🥚

@stale stale bot removed the stale label Mar 28, 2024

Any updates here? No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days.

@github-actions github-actions bot added the stale label Apr 15, 2024

Marking as closed due to lack of progress for more than 30 days. If this issue is still relevant, please re-open it with additional details.
