Significantly reduce file descriptors consumption #3085

sobomax
Contributor

@sobomax sobomax commented May 8, 2023

Summary

Significantly reduce file descriptor consumption (by 30-50%): from about 8,000 to 5,800 descriptors with the test config below.

Details

Reduce the number of sockets created by closing off one half of the socketpair in the child and in the parent after forking. This results in a roughly 30% decrease in the total number of sockets allocated, from 8k to about 5.8k on my test config with 10 workers and 10 sockets.

The usual pattern for a pipe / socketpair shared between parent and child in POSIX / Linux is to split the two descriptors returned by the system call: the parent uses the first descriptor and closes the second, while the child uses the second and closes the first (or vice versa). OpenSIPS breaks this rule by keeping both descriptors open in both parent and child, effectively wasting descriptor space. This could lead to hitting a system-wide limit, or to excessive system load due to the enormous size of the descriptor table.
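For reference, a minimal sketch of that conventional split, not taken from the OpenSIPS sources. The function name split_socketpair_demo() and the "ready" message are made up for illustration; each side keeps exactly one descriptor of the pair.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Illustrative payload; not an OpenSIPS message. */
static const char GREETING[] = "ready";

/* Returns 0 if the parent received the child's greeting, -1 otherwise. */
int split_socketpair_demo(void)
{
    int sv[2];
    char buf[16] = {0};
    int status = 0;

    assert(sizeof(GREETING) < sizeof(buf));

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: keep sv[1], drop the parent's end. */
        close(sv[0]);
        ssize_t n = write(sv[1], GREETING, sizeof(GREETING));
        close(sv[1]);
        _exit(n == (ssize_t)sizeof(GREETING) ? 0 : 1);
    }

    /* Parent: keep sv[0], drop the child's end. */
    close(sv[1]);
    ssize_t n = read(sv[0], buf, sizeof(buf) - 1);
    close(sv[0]);

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (n != (ssize_t)sizeof(GREETING) || strcmp(buf, GREETING) != 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

After the two close() calls each process holds one descriptor for the channel instead of two, which is where the saving comes from once it is multiplied by the number of processes.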

Solution

pt[N].ipc_pipe[2] is replaced with a single int, initially set to the parent's end of the pipe. In the child, after fork(), we dup2() the second fd onto the parent's end. This closes the master's end (in the child) and ensures that the FD numbers for IPC are the same in both parent and child, so pt[N].ipc_pipe does not need to differ between them (which helps with debugging, for example). The child's original pipe fd is then closed in the child. In the master we simply close the child's end once the fork is done.
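A rough standalone sketch of this sequence, assuming a plain pipe(). The variable ipc_fd is a hypothetical stand-in for pt[N].ipc_pipe and the function name is invented; this is not the patch itself.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for pt[N].ipc_pipe: one FD number for both sides. */
int ipc_fd = -1;

/* Returns 0 if one byte travelled parent -> child over the same FD number. */
int single_fd_ipc_demo(void)
{
    int p[2];               /* p[0]: read end, p[1]: write end */
    int status = 0;

    if (pipe(p) < 0)
        return -1;
    ipc_fd = p[1];          /* initially the parent's (write) end */
    assert(p[0] != ipc_fd); /* so the dup2() below is not a no-op */

    pid_t pid = fork();
    if (pid < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: move the read end onto the parent's FD number.
         * dup2() implicitly closes the inherited write end first. */
        if (dup2(p[0], ipc_fd) < 0)
            _exit(2);
        close(p[0]);        /* drop the original read-end number */

        char c = 0;
        ssize_t n = read(ipc_fd, &c, 1);
        _exit((n == 1 && c == 'x') ? 0 : 1);
    }

    /* Parent (master): close the child's end, write on ipc_fd. */
    close(p[0]);
    ssize_t n = write(ipc_fd, "x", 1);
    close(ipc_fd);

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (n == 1 && WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Both processes refer to the channel by the same integer, so a value stored once in shared memory stays meaningful on either side; only the direction differs.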

Compatibility

  1. All modules need to be recompiled!
  2. The child won't be able to schedule IPC calls to itself like the one below (drouting.c):
        /* if child 1, send a job for itself to run the data loading after
         * the init sequance is done */
        if ( (rank==1) && ipc_send_rpc( process_no, rpc_dr_reload_data, NULL)<0) {
                LM_CRIT("failed to RPC the data loading\n");
                return -1;
        }

This fails because the child only has access to the read end of the pipe:

139423 <... getpid resumed>)            = 139423
139423 dup2(26, 27 <unfinished ...>
139423 <... dup2 resumed>)              = 27
139423 close(26 <unfinished ...>
139423 <... close resumed>)             = 0
139423 epoll_ctl(10, EPOLL_CTL_ADD, 27, {events=EPOLLIN, data={u32=465969072, u64=140205378379696}}) = 0
139423 write(27, "\4\0\0\0\0\0\0\0A\212&\31\204\177\0\0\0\0\0\0\0\0\0\0", 24 <unfinished ...>
139423 <... write resumed>)             = -1 EBADF (Bad file descriptor)
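The failure in the trace can be reproduced in a single process. A small sketch, assuming Linux pipe semantics (the read end is not open for writing); the function name is made up:

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Repeats the dup2()/close()/write() sequence from the trace and returns
 * the errno of the write (0 if it unexpectedly succeeds, -1 on setup error). */
int write_errno_after_dup2(void)
{
    int p[2];               /* p[0]: read end, p[1]: write end */

    if (pipe(p) < 0)
        return -1;

    int ipc_fd = p[1];
    assert(p[0] != ipc_fd);

    /* As in the child: the read end takes over the write end's number. */
    if (dup2(p[0], ipc_fd) < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    close(p[0]);

    /* ipc_fd now refers to the read end only, so writing must fail. */
    errno = 0;
    ssize_t n = write(ipc_fd, "x", 1);
    int err = (n < 0) ? errno : 0;

    close(ipc_fd);
    return err;
}
```

On Linux the returned value is EBADF, matching the last line of the strace output.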

The same send-an-RPC-to-yourself pattern seems to have been copied and pasted into other places too:

$ find ./ -name \*.c | xargs grep ipc_send_rpc | grep -w process_no
./modules/mi_script/mi_script.c:        if (ipc_send_rpc(job->process_no, mi_script_async_resume_job, job) < 0) {
./modules/qrouting/qrouting.c:  if (rank == 1 && ipc_send_rpc(process_no, rpc_qr_reload, NULL) < 0) {
./modules/userblacklist/userblacklist.c:        if ( (rank==1) && ipc_send_rpc( process_no, rpc_reload_sources, NULL)<0) {
./modules/drouting/drouting.c:  if ( (rank==1) && ipc_send_rpc( process_no, rpc_dr_reload_data, NULL)<0) {
./modules/dialplan/dialplan.c:  if (ipc_send_rpc( process_no, dp_rpc_data_load, NULL)<0) {
./modules/sql_cacher/sql_cacher.c:      if ((rank == 1) && ipc_send_rpc(process_no, cache_init_load, NULL) < 0) {
./modules/usrloc/ul_mod.c:              if (ipc_send_rpc( process_no, ul_rpc_data_load, NULL)<0) {
./modules/cgrates/cgrates.c:                    if (ipc_send_rpc(process_no, cgrc_conn_rpc, c) < 0)
./modules/cgrates/cgrates_engine.c:             if (ipc_send_rpc(process_no, cgrc_reconn_rpc, c) < 0)

The solution might be to create a special API that lets a module request a wake-up call at module creation time. Or perhaps a ping-pong RPC API, where the module just calls the main proc and gets it to return the call ASAP?

OpenSIPS Configs Tested

opensips1.cfg

mpath="modules/"

loadmodule "proto_udp.so"
loadmodule "proto_tcp.so"
loadmodule "dialog/dialog.so"

loadmodule "sipmsgops/sipmsgops.so"
loadmodule "uac/uac.so"
loadmodule "uac_auth/uac_auth.so"
loadmodule "signaling/signaling.so"
loadmodule "auth/auth.so"

loadmodule "sl/sl.so"
loadmodule "tm/tm.so"
loadmodule "rr/rr.so"
loadmodule "maxfwd/maxfwd.so"
loadmodule "rtpproxy/rtpproxy.so"
loadmodule "textops/textops.so"

modparam("uac_auth", "credential", "mightyuser:VoIPTests.NET:s3cr3tpAssw0Rd")
modparam("auth", "username_spec", "$var(username)")
modparam("auth", "password_spec", "$var(password)")
modparam("auth", "calculate_ha1", 1)

modparam("rtpproxy", "rtpproxy_sock", "cunix:/tmp/p.sock")
modparam("rtpproxy", "rtpproxy_disable_tout", 1)

listen=udp:192.168.23.1:5060
listen=udp:[::1]:5060
listen=tcp:127.0.0.1:12340
listen=tcp:127.0.0.1:12341
listen=tcp:127.0.0.1:12342
listen=tcp:127.0.0.1:12343
listen=tcp:127.0.0.1:12344
listen=tcp:127.0.0.1:12345
listen=tcp:127.0.0.1:12346
listen=tcp:127.0.0.1:12347
listen=tcp:127.0.0.1:12348
listen=tcp:127.0.0.1:12349
udp_workers=10

route {
    xlog("OpenSIPS received a request $rm ($ci) from $si\n");
    ## initial sanity checks -- messages with
    ## max_forwards==0, or excessively long requests
    if (!mf_process_maxfwd_header(10)) {
        sl_send_reply(483, "Too Many Hops");
        exit;
    };

    ## shield us against retransmits
    if (!t_newtran()) {
        sl_reply_error();
        exit;
    };

    if (is_method("INVITE")) {
        if (!has_totag()) {
            $var(username)="mightyuser";
            $var(password)="s3cr3tpAssw0Rd";
            if (!pv_www_authorize("VoIPTests.NET")) {
                $var(challenge_using) = "md5,md5-sess,sha-256,sha-256-sess,sha-512-256,sha-512-256-sess";
                www_challenge("VoIPTests.NET", "auth-int,auth", $var(challenge_using));
                exit;
            }
            consume_credentials();
        }
        t_reply(100, "Trying");
        if (rtpproxy_offer("r")) {
            t_on_reply("1");
            if (!has_totag()) {

                t_on_failure("invite");
                create_dialog();
            } else {
                t_on_failure("re_invite");
            }
        };
    };

    if (is_method("BYE")) {
        xlog("    calling rtpproxy_unforce()\n");
        rtpproxy_unforce();
    };

    record_route();

    if (loose_route()) {
        t_relay();
        exit;
    };

    if ($sp == 5061) {
        $rp = 5062;
    } else {
        $rp = 5061;
    };
    t_relay();
    ##rtpproxy_stream2uac("ringback", "10");
    exit;
}

onreply_route[1]
{
    xlog("OpenSIPS received a reply $rs/$rm ($ci) from $si\n");
    if ($rs =~ "(183)|2[0-9][0-9]") {
        xlog("  calling search()\n");
        if(!search("^Content-Length:[ ]*0")) {
            xlog("    calling rtpproxy_answer()\n");
            rtpproxy_answer("r");
            ##rtpproxy_stream2uas("ringback", "10");
        };
    };
}

failure_route[invite]
{
    xlog("OpenSIPS handling $rm ($ci) failure in from $si in failure_route[invite]\n");
    if (t_check_status("40[17]")) {
        $var(accept_algs) = "md5-sess,sha-256-sess";
        if (uac_auth($var(accept_algs))) {
            xlog("    uac_auth() SUCCESS\n");
            t_on_failure("uac_auth_fail");
            t_relay();
            exit;
        } else {
            xlog("    uac_auth() FAILURE\n");
        }
    }
    xlog("    calling rtpproxy_unforce()\n");
    rtpproxy_unforce();
}

failure_route[re_invite]
{
    xlog("OpenSIPS handling $rm ($ci) failure in from $si in failure_route[re_invite]\n");
    if (t_was_cancelled()) {
        exit;
    }

    if (t_check_status("40[17]")) {
        $var(accept_algs) = "md5-sess,sha-256-sess";
        if (uac_auth($var(accept_algs))) {
            xlog("    uac_auth() SUCCESS\n");
            t_on_failure("uac_auth_fail");
            t_relay();
            exit;
        } else {
            xlog("    uac_auth() FAILURE\n");
        }
    }
}

failure_route[uac_auth_fail]
{
    xlog("OpenSIPS handling $rm ($ci) failure in from $si in failure_route[uac_auth_fail]\n");
    xlog("    calling rtpproxy_unforce()\n");
    rtpproxy_unforce();
}

opensips2.cfg

debug_mode=yes
log_level=3
xlog_level=3
log_stderror=yes
log_facility=LOG_LOCAL0
udp_workers=4

#dns_try_ipv6=yes
socket=udp:0.0.0.0:5060   # CUSTOMIZE ME
socket=tcp:0.0.0.0:5060
#socket=bin:0.0.0.0:5160

mpath="modules/"

loadmodule "proto_tcp.so"
loadmodule "proto_udp.so"
loadmodule "signaling.so"
loadmodule "sl/sl.so"

loadmodule "db_mysql/db_mysql.so"

loadmodule "tm/tm.so"

modparam("tm", "fr_timeout", 5)
modparam("tm", "fr_inv_timeout", 30)
modparam("tm", "restart_fr_on_each_reply", 0)
modparam("tm", "onreply_avp_mode", 1)

loadmodule "sipmsgops/sipmsgops.so"
loadmodule "httpd/httpd.so"
modparam("httpd", "ip", "192.168.23.158")
modparam("httpd", "port", 8899)
loadmodule "mi_http/mi_http.so"
loadmodule "mi_html/mi_html.so"

loadmodule "drouting/drouting.so"
modparam("drouting", "db_url", "mysql://root:whocares?@127.0.0.1/opensips")



route{
        xlog("new request");
}

@sobomax sobomax requested a review from bogdan-iancu May 8, 2023 22:39
@sobomax sobomax marked this pull request as draft May 11, 2023 18:29
@bogdan-iancu
Member

Hi @sobomax, thanks for tackling this issue. I agree that things may be improved here, and that some file descriptors may be closed in various processes. BUT, I'm not convinced (I have yet to think more deeply about the code here) that the implementation is correct. In the pt table, you now keep only one FD for either writing or reading - I guess the idea is that the child proc only reads while the others just write. But the pt table is in shm memory, so I'm not sure how pt[_proc_no].ipc_pipe can hold only one fd to be used for both write and read by all procs.

@sobomax
Contributor Author

sobomax commented May 12, 2023

@bogdan-iancu thanks for the review! Please re-read the detailed description above. The answer to your question is that the child proc does dup2(read_end_fd, write_end_fd), which makes the "read end" FD number the same as the "write end" one (and closes the original "write end" in the child in the process). That's how we get away with using the same FD on both the read and the write side. :)

@sobomax
Contributor Author

sobomax commented Nov 9, 2023

Just for posterity, an update here. The problem with the patch right now turned out to be some rare and somewhat hacky use of the IPC API by a few modules, which use it to get code executed by the child process after initialization has completed. This basically breaks the pattern: the code running in the child tries to make an IPC call to itself. That becomes impossible, since with this patch the child process no longer has access to the write end of the IPC pipe. :-(

static int child_init(int _rank)
{
[...]
        /* _rank==1 is used even when fork is disabled */
        if (_rank==1 && rr_persist == RRP_LOAD_FROM_SQL) {
                /* if cache is used, populate domains from DB */
                if (ipc_send_rpc( process_no, ul_rpc_data_load, NULL)<0) {
                        LM_ERR("failed to fire RPC for data load\n");
                        return -1;
                }
        }
}

This has the effect of having the ul_rpc_data_load() call dispatched after the reactor starts. I am considering a few options to get this resolved. The simplest one, perhaps, would be to detect this condition in ipc_send_rpc() and just put the call into the incoming call queue to be dispatched without doing any real I/O on the socket.
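A very rough sketch of what that self-dispatch option might look like. Everything here is hypothetical: the names (send_rpc_sketch, run_self_jobs, self_q, my_process_no), the fixed-size queue, and the callback signature are assumptions for illustration, not the OpenSIPS IPC API.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed callback shape; the real IPC handler type may differ. */
typedef void (*rpc_fn)(int sender, void *param);

struct self_job { rpc_fn fn; void *param; };

#define SELF_Q_MAX 16
struct self_job self_q[SELF_Q_MAX]; /* per-process queue, no I/O involved */
size_t self_q_len = 0;
int my_process_no = 3;              /* hypothetical stand-in for process_no */

/* Stand-in for the normal path that writes the job to the target's pipe. */
static int send_over_pipe(int dst, rpc_fn fn, void *param)
{
    (void)dst; (void)fn; (void)param;
    return 0;
}

/* Sample job used for demonstration: bumps a counter. */
void count_job(int sender, void *param)
{
    (void)sender;
    (*(int *)param)++;
}

/* Sender side: a job addressed to ourselves is queued locally. */
int send_rpc_sketch(int dst, rpc_fn fn, void *param)
{
    assert(fn != NULL);
    if (dst == my_process_no) {
        if (self_q_len == SELF_Q_MAX)
            return -1;              /* queue full */
        self_q[self_q_len].fn = fn;
        self_q[self_q_len].param = param;
        self_q_len++;
        return 0;
    }
    return send_over_pipe(dst, fn, param);
}

/* Worker side: would be called once the reactor loop is up.
 * Returns the number of jobs executed. */
size_t run_self_jobs(void)
{
    size_t ran = 0;
    for (size_t i = 0; i < self_q_len; i++) {
        self_q[i].fn(my_process_no, self_q[i].param);
        ran++;
    }
    self_q_len = 0;
    return ran;
}
```

In this sketch a call such as send_rpc_sketch(my_process_no, count_job, &counter) made during child_init() runs nothing immediately; the job executes on the first run_self_jobs() call. How the real reactor would be told to drain such a queue without a wake-up on the FD is left open here.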
