Squashed commit of the following:

commit 42a1f96809d0dfb72e1abaad3923761eba4c6fe2
Merge: dc1317b fca6e10
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Fri Aug 8 11:53:16 2014 -0700

    Merge branch 'dev'

commit fca6e10a83eb592135fd47bc73600c7a955ca2b5
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Thu Aug 7 15:43:00 2014 -0700

    Release 1.0.19-1 hotfix

commit dc1317b5668200bf0947dcac21a4d95959d333b3
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Mon Aug 4 10:01:31 2014 -0700

    indexer: Include errno.h directly

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 064c9cb1bddbab9e6d54ba301facfae7e1992455
Author: Ilya Nelkenbaum <ilyan@mellanox.com>
Date:   Mon Jul 28 15:48:09 2014 +0300

    rsocket: Segmentation fault fix in case of multiple connections

    When more than 16 rsocket connections are established,
    the "svc->rss" buffer is reallocated with more memory.
    Index 0 is reserved for the service's communication
    socket, and this is not taken into account when data
    is copied from the old buffer location to the new one.
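
    A minimal sketch of the copy described above, not the actual rsocket
    code: grow_slots and its parameters are hypothetical names. Because
    slot 0 is reserved, an array tracking cnt entries holds cnt + 1 live
    elements, and all of them have to be copied.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: grow an array whose slot 0 is reserved and whose
 * slots 1..cnt hold tracked entries. Returns the new array (the old one
 * is freed), or NULL on allocation failure (the old array is untouched). */
int *grow_slots(int *old, size_t cnt, size_t new_size)
{
	int *grown;

	assert(new_size > cnt);	/* room for slot 0 plus cnt entries */
	grown = malloc(new_size * sizeof(*grown));
	if (!grown)
		return NULL;

	/* Copy cnt + 1 elements: the reserved slot 0 and entries 1..cnt. */
	memcpy(grown, old, (cnt + 1) * sizeof(*grown));
	free(old);
	return grown;
}
```

    Copying only cnt elements would drop the entry stored at index cnt.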

    Signed-off-by: Ilya Nelkenbaum <ilyan@mellanox.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit a7287adaea52d21cd2d50f1621f8eda37c4c3c90
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Tue Jul 22 23:24:53 2014 -0700

    udpong: Fix client_recv error check

    We only want to report an error if it's not EAGAIN.  The if
    statement is reversed.  Correct it.
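
    The intended check can be sketched as follows. The helper name
    recv_failed is hypothetical and is not taken from udpong; it only
    illustrates the direction of the comparison.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical helper: decide whether the result of a non-blocking
 * receive should be reported as an error. EAGAIN only means that no
 * data is available yet, so it is not reported. */
bool recv_failed(ssize_t ret, int err)
{
	return ret < 0 && err != EAGAIN;
}
```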

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 806de778b1fe665dee2f62c7bf7211ab9bd2d53f
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Wed Jul 16 15:49:16 2014 -0700

    Release 1.0.19

commit 8f53f2a5d3cb5d6c30fe5695b48268ea1bbe2ff0
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Wed Jul 16 13:44:56 2014 -0700

    riostream: Only verify last data transfer

    Data verification will fail when running the bandwidth
    tests or the transfer count is > 1.  The issue is that
    subsequent writes by the initiator side will overwrite
    the data in the target buffer before the receiver can
    verify that it is correct.

    To fix this, only verify that the data in the buffer
    is correct after the last transfer has completed.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit c4f8e22a6d078fa914cd4102d65fa854587e1248
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Mon Jul 7 08:40:44 2014 -0700

    Revert "Revert "rsocket: Change keepalive to 0-byte RDMA write""

    This reverts commit a34703c53259845dd20450a87eb6747030e23e8b.

    0-byte RDMA writes appear to be working correctly with
    HCAs from 2 different vendors.  The original problem that
    was reported turned out to be a user error.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit fa85dc408e28afd67b81c3a590fd874ef6fdc63a
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Thu Jul 3 13:45:52 2014 -0700

    rsocket: Update correct rsocket keepalive time

    When the keepalive time of an rsocket is updated, the
    updated information is forwarded to the keepalive service
    thread.  However, the thread updates the time for the
    wrong service as shown:

    tcp_svc_timeouts[svc->cnt] = rs_get_time() + msg.rs->keepalive_time;

    The index into tcp_svc_timeouts should correspond to the
    rsocket being updated, not the last one in the list.
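
    A sketch of the corrected lookup, assuming parallel arrays in which
    entry i and timeout i describe the same rsocket. The struct and the
    function name are hypothetical stand-ins, not the rsocket service
    definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the service's parallel arrays: entries[i]
 * and timeouts[i] describe the same rsocket; slot 0 is reserved. */
struct svc_sketch {
	int cnt;		/* tracked entries live in slots 1..cnt */
	const void *entries[8];
	uint64_t timeouts[8];
};

/* Update the timeout of the slot that holds rs, not of slot cnt.
 * Returns the slot index, or -1 if rs is not tracked. */
int update_keepalive(struct svc_sketch *svc, const void *rs,
		     uint64_t now, uint64_t keepalive_time)
{
	for (int i = 1; i <= svc->cnt; i++) {
		if (svc->entries[i] == rs) {
			svc->timeouts[i] = now + keepalive_time;
			return i;
		}
	}
	return -1;
}
```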

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 1695abfa9f6bf429a5aa07117310c4ad87d4b3ae
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Thu Jul 3 13:55:39 2014 -0700

    rsocket: Fix removing rsocket from service thread

    When removing an rsocket from a service thread, we replace
    the removed service with the one at the end of the service list.
    This keeps the array tightly packed.  However, rs_svc_rm_rs
    decrements the rsocket count before doing the swap.  The result
    is that the entry at the end of the list gets dropped off.
    Defer decrementing the count until the swap has been made.

    In this case, the cnt value is a valid index into the array,
    because we start at index 1.  Index 0 is used internally by
    the service thread.
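
    The ordering matters because the count doubles as the index of the
    last entry. A minimal sketch with a plain int array (remove_slot is
    a hypothetical name, not the rs_svc_rm_rs code):

```c
#include <assert.h>

/* Hypothetical sketch: slots[1..*cnt] are in use and slot 0 is reserved.
 * Remove slot i by moving the last entry into it, then shrink the list. */
void remove_slot(int *slots, int *cnt, int i)
{
	assert(i >= 1 && i <= *cnt);
	slots[i] = slots[*cnt];	/* *cnt is still the index of the last entry */
	(*cnt)--;		/* decrement only after the move */
}
```

    Decrementing first would copy from index cnt - 1 and lose the entry
    stored at index cnt.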

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 9085562c22189850e1f16b9a9955f11e79caac06
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Wed Jul 2 15:37:10 2014 -0700

    rsocket: Fix crash resulting from keepalive timeout

    The following crash was reported by Hal Rosenstock,
    <hal@mellanox.com>, with keepalive enabled.  The crash
    occurs in the keepalive thread attempting to send a
    keepalive message.

    report:
    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 0x7fffecf08700 (LWP 6013)]
    rs_post_write (rs=<value optimized out>, sgl=0x0, nsge=0, wr_data=3758096385,
        flags=0, addr=0, rkey=0) at src/rsocket.c:1660
    1660            return rdma_seterrno(ibv_post_send(rs->cm_id->qp, &wr, &bad));
    Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6.x86_64
    (gdb)
    (gdb) p/x rs
    $1 = value has been optimized out

    So I added in the following to debug:
    1660    if (rs == NULL)
    1661    abort();
    1662    if (rs->cm_id == NULL)
    1663    abort();
    1664    if (rs->cm_id->qp == NULL)
    1665    abort();
    1666            return rdma_seterrno(ibv_post_send(rs->cm_id->qp, &wr, &bad));
    1667    }

    And saw in gdb:

    Program received signal SIGABRT, Aborted.
    [Switching to Thread 0x7fffecf08700 (LWP 8096)]
    0x00000030d50328a5 in raise () from /lib64/libc.so.6
    Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6.x86_64
    (gdb)
    (gdb) bt
    #0  0x00000030d50328a5 in raise () from /lib64/libc.so.6
    #1  0x00000030d5034085 in abort () from /lib64/libc.so.6
    #2  0x00007ffff057fe23 in rs_post_write (rs=<value optimized out>, sgl=0x1fa0,
        nsge=6, wr_data=4294967295, flags=0, addr=0, rkey=0) at src/rsocket.c:1665
    #3  0x00007ffff058193d in tcp_svc_send_keepalive (arg=0x7ffff0789f20)
        at src/rsocket.c:4245
    #4  tcp_svc_run (arg=0x7ffff0789f20) at src/rsocket.c:4279
    #5  0x00000030d5807851 in start_thread () from /lib64/libpthread.so.0
    #6  0x00000030d50e890d in clone () from /lib64/libc.so.6
    (gdb) fr 2
    #2  0x00007ffff057fe23 in rs_post_write (rs=<value optimized out>, sgl=0x1fa0,
        nsge=6, wr_data=4294967295, flags=0, addr=0, rkey=0) at src/rsocket.c:1665
    1665    abort();

    So qp is NULL somehow...
    :end report

    There is an issue if an rsocket is closed without going through
    the rshutdown.

    int rshutdown(int socket, int how)
    {
    	...
    	if (rs->opts & RS_OPT_SVC_ACTIVE)
    		rs_notify_svc(&tcp_svc, rs, RS_SVC_REM_KEEPALIVE);

    We remove the rsocket from the keepalive thread in rshutdown.

    int rclose(int socket)
    {
    	...
    		if (rs->state & rs_connected)
    			rshutdown(socket, SHUT_RDWR);
    	...
    	rs_free(rs);

    rclose will call rshutdown only if we're connected.  However, if the
    keepalive failed, the socket will be in an error state.  So,
    no call to rshutdown, which will leave the freed rsocket on
    the keepalive thread's list.

    The fix is to have rclose remove an rsocket from being processed
    by a service thread if it is still active.
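
    The shape of the fix can be sketched with stand-in types. Everything
    below is hypothetical (the struct, the flag, and the function names
    only mirror the roles of rs->opts, RS_OPT_SVC_ACTIVE, rshutdown and
    rclose), so treat it as an illustration rather than the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_SVC_ACTIVE 0x1u	/* hypothetical, stands in for RS_OPT_SVC_ACTIVE */

struct rs_sketch {
	unsigned opts;
	bool connected;
};

/* Stand-in for asking the service thread to drop this rsocket. */
void svc_remove_sketch(struct rs_sketch *rs)
{
	rs->opts &= ~SKETCH_SVC_ACTIVE;
}

void shutdown_sketch(struct rs_sketch *rs)
{
	if (rs->opts & SKETCH_SVC_ACTIVE)
		svc_remove_sketch(rs);
	rs->connected = false;
}

/* Close path: shutdown is skipped for a socket in an error state, so
 * the service-thread removal is repeated here before the free. */
void close_sketch(struct rs_sketch *rs)
{
	if (rs->connected)
		shutdown_sketch(rs);
	if (rs->opts & SKETCH_SVC_ACTIVE)
		svc_remove_sketch(rs);
	/* the real code would free rs at this point */
}
```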

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 310f630ac87f1deee1534ab405d5b771b801c25d
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Tue Jul 1 22:52:40 2014 -0700

    example/rdma_xclient/server: Update XRC support in sample programs

    Update rdma_xclient and rdma_xserver sample programs to test
    XRC data transfers.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 5662340a12429f8882be36d8787924be91a1cb74
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Tue Jul 1 22:56:43 2014 -0700

    rdmacm: Update addrinfo with XRC support

    Remove internal defines, and use libibverbs exported values
    instead.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 995eb0c90c1a0967179fe3f523861e15300d3dfa
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Tue Jul 1 17:47:22 2014 -0700

    rdmacm: Add support for XRC QPs

    Export a new extended create QP call.  Add support for XRC
    QPs.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 05eabc5335b95ab9d0d6a6132092fac6e1af1cc5
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Tue Jul 1 17:14:13 2014 -0700

    rdmacm: Add support for allocating XRC SRQs

    Add extended SRQ creation call, to support allocating
    XRC SRQs.  Use the rdma_cm_id qp type field to
    determine which type of SRQ should be allocated.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 89a782a52a48db38d917084233006fb91cbd0694
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Tue Jul 1 16:46:34 2014 -0700

    rdmacm: Add functionality to allocate an XRCD

    XRC QPs and SRQs are associated by an XRC domain.  Provide a
    call to allocate an XRCD, similar to how the rdmacm allocates
    a PD for the user.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit f916b9b6bfbcd86b5326d84c0dfa106ddc9c907c
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Tue Jul 1 16:17:30 2014 -0700

    build: Add build support for XRC

    Modify autotools to check for and require a libibverbs
    version that includes XRC and extension support.

    Remove any code used to support older versions of
    libibverbs.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 0cd1e9b0e7a2d438a0f1004e6c6ff1b6785c4038
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Tue Jul 1 13:30:42 2014 -0700

    librdmacm: Use SRQ in rdma_create_qp

    If an application has allocated an SRQ on an rdma_cm_id, use
    it when creating a QP.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 3e1fc1cfad65c83a05c8550d8e359c8b9223d859
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Wed Jun 25 12:56:18 2014 -0700

    librdmacm: Remove NULL checks after calling alloca

    alloca doesn't return a NULL pointer on failure.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit a34703c53259845dd20450a87eb6747030e23e8b
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Fri Jun 20 17:44:26 2014 -0700

    Revert "rsocket: Change keepalive to 0-byte RDMA write"

    This reverts commit 0f2c76e81ecf1470cf152600c08c421e7e82b00e.

    Testing has shown that this does not always result in the
    keep-alive message working correctly, such that a broken
    connection is reported as having failed.  The reason for this
    behavior is unknown, but revert the patch until the issue has
    been resolved.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 7b1eb6407f1f7a953673ab23a2d75f8a3cd8dbb9
Author: Hal Rosenstock <hal@dev.mellanox.co.il>
Date:   Thu Jun 19 13:08:02 2014 -0400

    librdmacm: In ucma_convert_path, fix selector values

    The intent is for the selectors to mean "equal to" (exactly) rather
    than "less than".  The selector value for "exactly" is 2 rather than 1.

    Signed-off-by: Hal Rosenstock <hal@mellanox.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 7f0fbf984a5140efb76f93fef1f35202c617249d
Author: Hal Rosenstock <hal@dev.mellanox.co.il>
Date:   Thu Jun 19 11:54:11 2014 -0400

    rsocket: Add support for RDMA_ROUTE option in rgetsockopt

    Create as many ibv_path_data structs from the RDMA route
    ibv_sa_path_rec struct for the rsocket based on how
    many fit into the supplied buffer.
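
    The "as many as fit" rule amounts to an integer division on the
    buffer length. The sketch below uses a hypothetical fixed-size
    record (path_sketch) in place of ibv_path_data, and copy_paths is
    an illustrative name, not the rgetsockopt code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size record standing in for one path entry. */
struct path_sketch {
	unsigned char raw[64];
};

/* Copy as many whole records as fit in buf. On entry *len is the buffer
 * size in bytes; on return it is the number of bytes written. Returns
 * the number of records copied. */
size_t copy_paths(void *buf, size_t *len,
		  const struct path_sketch *paths, size_t num_paths)
{
	size_t fit = *len / sizeof(*paths);
	size_t n = fit < num_paths ? fit : num_paths;

	memcpy(buf, paths, n * sizeof(*paths));
	*len = n * sizeof(*paths);
	return n;
}
```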

    Signed-off-by: Hal Rosenstock <hal@mellanox.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 106899eccc5fa61dd5e69c90bc0651ccd57e725f
Merge: 6c7d6d3 0f2c76e
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Wed Jun 18 11:56:42 2014 -0700

    Merge branch 'dev'

commit 0f2c76e81ecf1470cf152600c08c421e7e82b00e
Author: Susan K. Coulter <markus@cj-fe1.lanl.gov>
Date:   Mon Jun 16 10:28:08 2014 -0700

    rsocket: Change keepalive to 0-byte RDMA write

    Signed-off-by: Susan K. Coulter <markus@cj-fe1.lanl.gov>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 6c7d6d3038524c275ecfb7468b4455fe2cc39a19
Author: Doug Ledford <dledford@redhat.com>
Date:   Wed Jun 18 10:45:23 2014 -0700

    rdma_server: handle IBV_SEND_INLINE correctly

    Not all RDMA devices support IBV_SEND_INLINE.  At least some of those
    that don't will ignore the flag passed to rdma_post_send and attempt to
    send the command by using an sge entry instead.  Because we don't
    register the send memory, this fails.  The proper way to deal with the
    fact that IBV_SEND_INLINE is not guaranteed is to check the returned
    value in our cap struct to see if we have support for inline data, and
    if not, fall back to non-inline sends and to register the send memory
    region.
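
    The decision described above can be sketched without libibverbs. In
    the library the relevant pieces are the max_inline_data field of
    struct ibv_qp_cap and the IBV_SEND_INLINE send flag; the names below
    (send_plan, plan_send, SKETCH_SEND_INLINE) are hypothetical
    stand-ins used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SEND_INLINE 0x1	/* hypothetical stand-in for IBV_SEND_INLINE */

struct send_plan {
	int flags;	/* send flags to use when posting */
	bool need_mr;	/* true if the send buffer must be registered */
};

/* Choose inline sends only when the QP reports enough inline capacity
 * for the message; otherwise fall back to a registered buffer. */
struct send_plan plan_send(uint32_t max_inline_data, size_t msg_len)
{
	struct send_plan plan = { 0, true };

	if (max_inline_data >= msg_len) {
		plan.flags = SKETCH_SEND_INLINE;
		plan.need_mr = false;
	}
	return plan;
}
```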

    Signed-off-by: Doug Ledford <dledford@redhat.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 9fe390a793203a13b0507472848e1e7da8c75bed
Author: Doug Ledford <dledford@redhat.com>
Date:   Wed Jun 18 10:44:49 2014 -0700

    rdma_client: handle IBV_SEND_INLINE correctly

    Not all RDMA devices support IBV_SEND_INLINE.  At least some of those
    that don't will ignore the flag passed to rdma_post_send and attempt to
    send the command by using an sge entry instead.  Because we don't
    register the send memory, this fails.  The proper way to deal with the
    fact that IBV_SEND_INLINE is not guaranteed is to check the returned
    value in our cap struct to see if we have support for inline data, and
    if not, fall back to non-inline sends and to register the send memory
    region.

    Signed-off-by: Doug Ledford <dledford@redhat.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 2c2e44e144f17c2cef4af052ec91a680c9a81fb9
Author: Doug Ledford <dledford@redhat.com>
Date:   Wed Jun 18 10:44:28 2014 -0700

    rdma_server: use perror, unwind allocs on failure

    Our main test function prints out errno directly, which is hard to read
    as it's not decoded at all.  Instead, use perror() to make failures more
    readable.  Also redo the failure flow so that we can do a simple unwind
    at the end of the function and just jump to the right unwind spot on
    error.
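
    A generic sketch of this error-handling style, using malloc in place
    of the RDMA resources the example actually allocates. The function
    name and labels are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical setup routine: acquire two resources and release them
 * in reverse order, jumping to the matching unwind label on failure. */
int run_sketch(size_t a_len, size_t b_len)
{
	char *a, *b;
	int ret = -1;

	a = malloc(a_len);
	if (!a) {
		perror("malloc a");	/* prints the decoded errno text */
		goto out;
	}

	b = malloc(b_len);
	if (!b) {
		perror("malloc b");
		goto free_a;
	}

	ret = 0;	/* both resources acquired; real work would go here */

	free(b);
free_a:
	free(a);
out:
	return ret;
}
```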

    Signed-off-by: Doug Ledford <dledford@redhat.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 1bc834aeca99a4dd0c5bea733e2735f148b4418c
Author: Doug Ledford <dledford@redhat.com>
Date:   Wed Jun 18 10:44:13 2014 -0700

    rdma_client: use perror, unwind allocs on failure

    Our main test function prints out errno directly, which is hard to read
    as it's not decoded at all.  Instead, use perror() to make failures more
    readable.  Also redo the failure flow so that we can do a simple unwind
    at the end of the function and just jump to the right unwind spot on
    error.

    Signed-off-by: Doug Ledford <dledford@redhat.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 05fc15b44805a23a4e8562d1953074243950dfbe
Author: Doug Ledford <dledford@redhat.com>
Date:   Wed Jun 18 10:43:04 2014 -0700

    cmtime: rework program to be multithreaded

    When using very large numbers of connections (10,000 was in use here),
    we ran into a problem where when we resolved a performance problem in
    the kernel cma.c code, we suddenly developed a new problem.  That new
    problem turned out to be the fact that with the underlying kernel issue
    resolved, 10,000 connect requests would flood the server side of the
    test and the cmtime application would respond as quickly as possible.
    However, the client side would not bother to check any of the returns
    until after having sent all 10,000 connect requests.  When the kernel
    had a serializing performance problem, this was OK.  When it was fixed,
    this caused a general slowdown in connect operations due to overruns in
    the event processing.  This patch causes the client side to fire off
    threads that will handle responses to connect requests as they come in
    instead of allowing them to backlog uncontrollably.  Times for a 10,000
    connect run changed from this:

    [root@rdma-dev-01 ~]# more
    3.12.0-rc1.cached_gids+optimized_connect+trimmed_cache+.output
    ib1:
    step              total ms     max ms     min us  us / conn
    create id    :       46.64       0.10       1.00       4.66
    bind addr    :       89.61       0.04       7.00       8.96
    resolve addr :       50.63      26.18   23976.00       5.06
    resolve route:      565.44     538.77   26736.00      56.54
    create qp    :     4028.31       5.70     326.00     402.83
    connect      :    50077.42   49990.49   90734.00    5007.74
    disconnect   :     5277.25    4850.35  380017.00     527.72
    destroy      :       42.15       0.04       2.00       4.21

    ib0:
    step              total ms     max ms     min us  us / conn
    create id    :       34.82       0.04       1.00       3.48
    bind addr    :       25.94       0.02       1.00       2.59
    resolve addr :       48.18      25.01   22779.00       4.82
    resolve route:      501.28     476.26   25071.00      50.13
    create qp    :     3274.12       6.05     257.00     327.41
    connect      :    55549.64   55490.32   62150.00    5554.96
    disconnect   :     5263.64    4851.18  375628.00     526.36
    destroy      :       47.20       0.07       2.00       4.72

    to this:

    [root@rdma-dev-01 ~]# more
    3.12.0-rc1.cached_gids+optimized_connect+trimmed_cache+-fixed-cmtime.output
    ib1:
    step              total ms     max ms     min us  us / conn
    create id    :       34.45       0.08       1.00       3.44
    bind addr    :       88.41       0.04       7.00       8.84
    resolve addr :       33.59       4.65     612.00       3.36
    resolve route:      618.68       0.61      97.00      61.87
    create qp    :     4024.03       6.30     341.00     402.40
    connect      :     6983.35    6886.33    8509.00     698.33
    disconnect   :     5066.47     230.34     831.00     506.65
    destroy      :       37.02       0.03       2.00       3.70

    ib0:
    step              total ms     max ms     min us  us / conn
    create id    :       42.61       0.14       1.00       4.26
    bind addr    :       27.05       0.03       2.00       2.70
    resolve addr :       40.65      10.73     869.00       4.06
    resolve route:      626.75       0.60     103.00      62.68
    create qp    :     3334.50       6.48     273.00     333.45
    connect      :     6310.29    6251.59   13298.00     631.03
    disconnect   :     5111.12     365.87     867.00     511.11
    destroy      :       36.57       0.02       2.00       3.66

    with this patch.

    Signed-off-by: Doug Ledford <dledford@redhat.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 6551bab0b75b1f2499d97c2384cd3ac723da625f
Author: Hal Rosenstock <hal@mellanox.com>
Date:   Wed Jun 18 09:55:06 2014 -0700

    rsocket: Use malloc instead of calloc

    There is no need to clear the allocated memory, as the allocation
    is immediately followed by a memcpy that covers all of it.
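
    A minimal illustration of the pattern (dup_bytes is a hypothetical
    helper, not a function from rsocket.c):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Duplicate len bytes. malloc is sufficient here because memcpy
 * overwrites every byte that was allocated. */
void *dup_bytes(const void *src, size_t len)
{
	void *copy = malloc(len);

	if (copy)
		memcpy(copy, src, len);
	return copy;
}
```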

    Signed-off-by: Hal Rosenstock <hal@mellanox.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 2a0944dc5e0e64290b8dfca332e6d5645c25b12e
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Tue May 27 11:43:05 2014 -0700

    librdmacm: Update rdma_accept man page

    Document NULL conn_param parameter for rdma_accept.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 386b97e807917a8ca7f6d12d66e34dc9441f7502
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Thu May 22 16:13:08 2014 -0700

    indexer: Free index_map resources when cleared

    Free memory allocated for index map entries when they are no
    longer in use.  To handle this, count the number of entries
    stored by the index map item arrays and release the arrays when
    no items are being tracked.
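
    The counting scheme can be sketched as below. The layout (a few
    small, lazily allocated arrays) and all names are assumptions made
    for illustration; they do not reproduce the indexer's real sizes or
    API.

```c
#include <assert.h>
#include <stdlib.h>

#define CHUNK_ENTRIES 4	/* hypothetical, kept small for illustration */
#define CHUNKS 4

struct idx_sketch {
	void **array[CHUNKS];	/* lazily allocated item arrays */
	int count[CHUNKS];	/* live items stored in each array */
};

int idx_set(struct idx_sketch *m, int index, void *item)
{
	int c = index / CHUNK_ENTRIES;
	int s = index % CHUNK_ENTRIES;

	assert(item != NULL && index >= 0 && index < CHUNKS * CHUNK_ENTRIES);
	if (!m->array[c]) {
		m->array[c] = calloc(CHUNK_ENTRIES, sizeof(void *));
		if (!m->array[c])
			return -1;
	}
	if (!m->array[c][s])
		m->count[c]++;
	m->array[c][s] = item;
	return 0;
}

void *idx_clear(struct idx_sketch *m, int index)
{
	int c = index / CHUNK_ENTRIES;
	int s = index % CHUNK_ENTRIES;
	void *item;

	if (!m->array[c] || !m->array[c][s])
		return NULL;
	item = m->array[c][s];
	m->array[c][s] = NULL;
	if (--m->count[c] == 0) {	/* last item gone: release the array */
		free(m->array[c]);
		m->array[c] = NULL;
	}
	return item;
}
```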

    This reduces valgrind noise.

    Problem reported by: Hannes Weisbach <hannes_weisbach@gmx.net>

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 397b1a79f077c2fd1ae35be15bc3a7d8918800f1
Author: Patrick MacArthur <pmacarth@iol.unh.edu>
Date:   Tue Apr 29 21:30:08 2014 -0700

    rstream: fix "-T resolve" detection

    Signed-off-by: Patrick MacArthur <pmacarth@iol.unh.edu>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit b3758e215f0abbea0d48996ef9b95f01530a4210
Author: shamir rabinovitch <shamir.rabinovitch@oracle.com>
Date:   Tue Apr 29 19:57:36 2014 -0700

    librdmacm: Fix verbs leak due to reentrancy issue

    Any call to ucma_init_device must be done under lock.

    Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 1291d9c7b52e829057458dad0e0ddd5aa9821a2a
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Wed Apr 16 22:01:51 2014 -0700

    rsocket: Relax requirement for minimal inline data

    Inline data support is optional.  Allow rsockets to work
    with devices that do not support inline data, provided
    that they do support RDMA writes with immediate data.
    This allows rsockets to work over Intel TrueScale HCA.

    Patch derived from work by: Amir Hanania

    Signed-off-by: Amir Hanania <amir.hanania@intel.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit dfb5886db5975d209be6b31656c95b0d9c608195
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Wed Apr 16 22:33:38 2014 -0700

    rsocket: Modify when control messages are available

    Rsockets currently tracks how many control messages (i.e.
    entries in the send queue) that are available using a
    single ctrl_avail counter.  Seems simple enough.

    However, control messages currently require the use of
    inline data.  In order to support control messages that
    do not use inline data, we need to associate each
    control message with a specific data buffer.  This will
    become easier to manage if we modify how we track when
    control messages are available.

    We replace the single ctrl_avail counter with two new
    counters.  The new counters conceptually treat control
    messages as if each message had its own sequence number.
    The sequence number will then be able to correspond to
    a specific data buffer in a follow up patch.

    ctrl_seqno will be used to indicate the current control
    message being sent.  ctrl_max_seqno will track the
    highest control message that may be sent.
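
    One way to read the two counters is sketched below. The field names
    come from the description above; the helper functions, the initial
    values, and the choice of a 16-bit type are assumptions made for
    illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct ctrl_sketch {
	uint16_t ctrl_seqno;		/* next control message to send */
	uint16_t ctrl_max_seqno;	/* highest control message that may be sent */
};

/* Assumed starting point: no message sent, `slots` sends allowed. */
void ctrl_init(struct ctrl_sketch *c, uint16_t slots)
{
	c->ctrl_seqno = 0;
	c->ctrl_max_seqno = slots;
}

bool ctrl_avail(const struct ctrl_sketch *c)
{
	return c->ctrl_seqno != c->ctrl_max_seqno;
}

/* Sending consumes one sequence number and returns the one used. */
uint16_t ctrl_send(struct ctrl_sketch *c)
{
	assert(ctrl_avail(c));
	return c->ctrl_seqno++;
}

/* A completed send allows one more control message to go out. */
void ctrl_complete(struct ctrl_sketch *c)
{
	c->ctrl_max_seqno++;
}
```

    In such a scheme the returned sequence number, taken modulo the
    number of slots, could select a per-message data buffer.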

    A side effect of this change is that we will be able to
    see how many control messages have been sent.  This also
    separates the updating of the control count on the
    sending side, versus the receiving side.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 5ac6f3eab852606575f9affa515ec77b978a001c
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Thu Apr 17 08:37:47 2014 -0700

    rsocket: Dedicate a fixed number of SQEs for control messages

    The number of SQEs allocated for control messages is set
    to 1 of 2 constant values (either 4 or 2).  A default
    value is used unless the size of the SQ is below a certain
    threshold (16 entries).  This results in additional code
    complexity, and it is highly unlikely that the SQ would
    ever be allocated smaller than 16 entries.

    Simplify the code to use a single constant value for the
    number of SQEs allocated for control messages.  This will
    also help in subsequent patches that will need to deal
    with HCAs that do not support inline data.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit d62a52590741da993c5ac3c39c82601c273175d9
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Wed Apr 16 21:42:06 2014 -0700

    rsocket: Check max inline data after creating QP

    The ipath provider will ignore the max_inline_size
    specified as input into ibv_create_qp and instead
    return the size that it supports (which is 0) on
    output.

    Update the actual inline size returned from create QP,
    and check that it meets the minimum requirement for
    rsockets.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 8ce5823e02b6a38fd5ed7e11a1bb586847dbcb03
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Tue Apr 29 20:11:35 2014 -0700

    librdmacm: Make ucma_init_all static

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 23ffef06cf462c4c5ac4ec5880b96c8719b64774
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Wed Apr 9 12:19:25 2014 -0700

    librdmacm: Support lazy initialization

    librdmacm currently opens a device context per configured HCA. This is
    usually done in rdma_create_event_channel() or first time whenever
    ucma_init() is called. If a process is only going to use one of the
    configured HCAs/RDMA IPs then the remaining device contexts are not
    used/required. Opening a device context on each device a priori limits the
    maximum number of processes that can be supported on a node to the maximum
    number of open context supported per HCA regardless of number of HCAs present
    in the system.

    Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 984b1e3c189db9d156ea429c1726bd8739893247
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Thu Mar 6 13:42:31 2014 -0800

    rsocket: Fix sbuf_bytes_avail counter 'overrun' with iwarp

    Reported-by: Jonas Pfefferle1 <JPF@zurich.ibm.com>

    "The problem is that on the client side sbuf_bytes_avail overflows
    in rs_poll_cq.  And from what I debugged so far there are 2
    completions for every send and this is because I use iWarp hardware
    which does not support write with immediate so there is one completion
    for the write and one for the send (both go into the default case
    and add the length to sbuf_bytes_avail)."

    To avoid the issue, we flag send message operations that are used
    in place of immediate data.  Other send message operations are
    not affected.  The completion code can then check whether the
    completion is for a send message which was paired with an RDMA
    write transaction and adjust the behavior accordingly.  Additionally,
    such send messages only carry the opcode in their WR_ID, with the
    data portion zeroed.  This avoids adding the length value twice.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit a2340d891eaa3f8a766a627bb4402ea85bcec6cb
Author: Hal Rosenstock <hal@mellanox.com>
Date:   Wed Mar 5 12:51:54 2014 -0800

    riostream: Add AF_IB support

    Allow the user to specify GID addresses (AF_IB) with riostream

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 8e760a4486f776df4f6728326dc7e8aed4a18971
Author: Hal Rosenstock <hal@mellanox.com>
Date:   Tue Mar 4 17:06:47 2014 -0800

    rsocket: Return EBADF on bad rsocket fd

    Eliminates potential seg faults when passed an invalid rsocket.
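
    The guard can be sketched as a lookup that fails cleanly. The table
    and function names are hypothetical and only stand in for the
    rsocket index lookup.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_TABLE_SIZE 16

/* Hypothetical descriptor table: a NULL entry means "not an rsocket". */
void *sketch_table[SKETCH_TABLE_SIZE];

void *sketch_lookup(int fd)
{
	if (fd < 0 || fd >= SKETCH_TABLE_SIZE)
		return NULL;
	return sketch_table[fd];
}

/* Any call taking a descriptor checks the lookup result before
 * dereferencing it, and reports EBADF instead of crashing. */
int sketch_op(int fd)
{
	void *obj = sketch_lookup(fd);

	if (!obj) {
		errno = EBADF;
		return -1;
	}
	return 0;
}
```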

    Signed-off-by: Hal Rosenstock <hal@mellanox.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 3c19c968a240a2c50809373f9aa90bdf3454f6b1
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Tue Mar 4 16:59:20 2014 -0800

    man/rsocket: Enhance riomap documentation

    Document that the user must set IOMAPSIZE in order to
    use the riomap call.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 176e6e961d17c51ae1f2dad5a2f50546e3a2ecf4
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Mon Jan 27 12:10:55 2014 -0800

    librdmacm 1.0.18

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit b4603c864860e5e35379458cd1c0a42bb983af59
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Mon Jan 27 11:30:34 2014 -0800

    udaddy: Remove support for port space IB

    UD support for the IB port space requires that the application
    use rdma_create_ep, rather than rdma_create_id.  However, using
    rdma_create_ep results in address and route resolution being
    performed synchronously as part of the rdma_create_ep call.
    Since udaddy is an example, we want to show how it can be used
    with asynchronous events.  So, rather than update udaddy to
    use rdma_create_ep in order to support the IB port space, it
    would be better to remove that support.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit df7ecde0da9df4af5d8bc3e1ca472e2e5ec9095b
Author: Susan K. Coulter <markus@cj-fe2.lanl.gov>
Date:   Fri Jan 17 14:31:42 2014 -0800

    rsocket: Add keepalive logic

    Actually send and receive keepalive messages if keepalive is
    enabled on an rsocket.

    Signed-off-by: Susan K. Coulter <markus@cj-fe2.lanl.gov>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit da2b7c1cde16df7936273b1ebd38e7c25856c843
Author: Or Gerlitz <ogerlitz@mellanox.com>
Date:   Tue Dec 3 16:51:07 2013 -0800

    librdmacm: Add directives on binding to IPv6 any address to man pages

    Explain how to bind to IPv6 any address in the man pages for the examples

    Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit ea5851998c11b8211170179a6d924d4935fec0a1
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Tue Nov 26 13:16:19 2013 -0800

    librdmacm: Check 'init' under mutex

    ucma_ib_init() does a quick check that access to ibacm has
    been initialized.  This check is done outside of the
    acm_lock mutex.  We need to check init again inside of
    holding the mutex to ensure that we don't run the
    initialization code twice.
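
    The pattern described is a check, lock, re-check sequence. A sketch
    with hypothetical names (init_once_sketch, init_runs), where a
    counter stands in for the real initialization work:

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;
static int init_done;
int init_runs;	/* counts how often the body ran, for illustration */

void init_once_sketch(void)
{
	/* Quick check without the lock. Strictly conforming C11 code
	 * would use an atomic load for this unlocked read. */
	if (init_done)
		return;

	pthread_mutex_lock(&init_lock);
	if (!init_done) {	/* re-check: another thread may have won */
		init_runs++;	/* stand-in for the real initialization */
		init_done = 1;
	}
	pthread_mutex_unlock(&init_lock);
}
```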

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit b70a390d8bd8a679571f06ab82e42d68a99bc7d2
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Mon Nov 18 13:12:04 2013 -0800

    rping: Fix server reporting error on exit

    Commit e57196c71ddd850e14f3e66355f02786e4914f72
    ("rping: added checks to the return values functions")
    resulted in the rping server always reporting that
    it failed.  Fix this by only failing in the case of
    an unexpected termination, and not as the result of
    the client completing.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit c38d43aa2d5dc39dd98f813749dfa496875ad2e1
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Mon Nov 11 10:24:54 2013 -0800

    Retrieve SGID after calling rdma_bind_addr

    A change was made to rdma_bind_addr when AF_IB is enabled
    to only retrieve the resulting bound address.  Previously,
    rdma_bind_addr would retrieve the corresponding SGID as
    well.  This breaks some apps which were checking the
    SGID after binding to an IP address.  Revert to the
    previous behavior of also retrieving the SGID after
    calling rdma_bind_addr.

    Tested-by: Christoph Lameter <cl@linux.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit faafeac08920a37994da19d72fd7ba1e64281f83
Author: Guy Shapiro <guysh@mellanox.com>
Date:   Tue Nov 5 19:52:20 2013 +0200

    librdmacm: Some fixes to man pages

    Fix the man pages of rdma_destroy_ep & rdma_destroy_qp to the correct return value (void).

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 41900ddd3b09ed0625a721b014692b8c5c6f7246
Author: Hal Rosenstock <hal@dev.mellanox.co.il>
Date:   Mon Nov 4 07:56:08 2013 -0500

    [librdmacm] Makefile.am: Add missing riostream man page to man_MANS

    Signed-off-by: Hal Rosenstock <hal@mellanox.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 86520b86ffb45d3caf6e5bd94271f99deef0a5f9
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Fri Aug 16 15:15:12 2013 -0700

    rsockets: Handle race between rshutdown and rpoll

    Multi-threaded applications which call rpoll and rshutdown
    simultaneously can hang.  Ceph developers reported an issue
    with the rsocket implementation.  Ceph calls rpoll in
    one thread, and while that thread is blocked in rpoll,
    a second thread may call rshutdown on the socket.  In
    normal sockets, this results in the poll call unblocking
    (since a call to read on the socket will no longer block).
    However, rsockets does not free the thread blocked on the
    rpoll call.

    To fix this, we add some additional state checking to
    protect against threads calling rpoll and rshutdown
    simultaneously.  We also have the rshutdown call
    transition the QP into an error state.  This causes all
    posted receives to complete as flushed, which results
    in unblocking the thread in rpoll (to process the flushed
    receives).

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
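
Note on the commit above: the plain-socket behavior it relies on (poll reporting a socket as ready once shutdown has been called) can be observed without RDMA hardware. The following is a minimal sketch using only POSIX calls on a local socketpair. `poll_after_shutdown` is a hypothetical helper name, and the sketch is single-threaded, so it shows the readiness condition that would wake a blocked thread rather than the race itself.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Calls shutdown() on one end of a local socketpair, then polls that
 * same end. Returns the revents reported by poll(), or -1 on error or
 * timeout. timeout_ms bounds the wait so the sketch cannot hang.
 */
int poll_after_shutdown(int timeout_ms)
{
    int sv[2];
    struct pollfd pfd;
    int ret;

    assert(timeout_ms >= 0);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    /* With plain sockets, this is the call that unblocks a poller. */
    if (shutdown(sv[0], SHUT_RDWR) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    pfd.fd = sv[0];
    pfd.events = POLLIN;
    pfd.revents = 0;
    ret = poll(&pfd, 1, timeout_ms);

    close(sv[0]);
    close(sv[1]);
    return ret == 1 ? pfd.revents : -1;
}
```

A positive return value with POLLIN set means a read on the socket would no longer block, which is the condition the commit message describes for normal sockets.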

commit 6152fb2ea9f4e331c63c00810ee4b920e6f1af2d
Author: Hal Rosenstock <hal@dev.mellanox.co.il>
Date:   Wed Sep 11 15:37:11 2013 -0400

    [librdmacm] man/rstream.1: Update man page to be consistent with rstream -h

    Signed-off-by: Hal Rosenstock <hal@mellanox.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 77cab40df7f29bdc718a4a6da74c6145bf81468a
Author: Hal Rosenstock <hal@dev.mellanox.co.il>
Date:   Wed Sep 11 14:44:32 2013 -0400

    [librdmacm] rstream.c: Indicate when specified address family is unknown

    Signed-off-by: Hal Rosenstock <hal@mellanox.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 05ea9d16da8808e464750fa976ba3d6151df0a54
Author: Hal Rosenstock <hal@dev.mellanox.co.il>
Date:   Wed Sep 11 14:44:28 2013 -0400

    [librdmacm] man/rdma_create_id.3: Add RDMA_PS_IB port space description

    Signed-off-by: Hal Rosenstock <hal@mellanox.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit a53376c3c7887c52cf5b311b0b96cfa405a49d31
Author: Yann Droneaud <ydroneaud@opteya.com>
Date:   Tue Aug 27 11:37:54 2013 -0700

    examples: Add cmtime to .gitignore

    Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 78dd0371cdad6bf27e98903ba66cebc01f52f6d5
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Thu Aug 22 15:29:15 2013 -0700

    rsocket: Update rsocket man page

    Update fork support and RDMA_ROUTE socket option.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 5a5ec3458c67b1b431a18a0acbc950ef4e31f87f
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Thu Aug 22 12:00:54 2013 -0700

    cmtime: Add retry support for address and route resolution

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit b031fead061eb0d2874be8f259c84e21433e4505
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Thu Aug 22 11:54:56 2013 -0700

    cmtime: Allow user to specify timeout values

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit afd49dcc2bb13052075e07a7593f6593b43606ce
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Thu Aug 22 11:30:33 2013 -0700

    cmtime: Add ability to time rdma_bind_addr calls

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 2949a92960546b75c647bcf14fec1f4369fd17fa
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Mon Aug 5 10:57:43 2013 -0700

    cmtime: Add example program that times rdma cm calls

    cmtime is a new sample program that measures how long it
    takes for each step in the connection process to complete.
    It can be used to analyze the performance of the various
    CM steps.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 8fd079abb8b2835908017f74ac70781d84e1e163
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Fri Jul 26 09:52:55 2013 -0700

    rstream: Use rsocket option to set route directly

    If we're using GID addressing, rdma_getaddrinfo can return
    routing data directly.  Add an option for the user to
    indicate that rdma_getaddrinfo should be called in place of
    getaddrinfo.  And if routing data is available, call
    rsetsockopt to set the route.

    This helps test rsockets when ibacm and AF_IB support are
    available.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 21c703e5a594283cf119ce1286831df5d1483b34
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Fri Aug 2 14:18:06 2013 -0700

    rsocket: Return 0 on success for SOL_RDMA options

    The processing of SOL_RDMA does not set the return value in
    the case of successfully handled options.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit e33755decd339712fc57fbe25bed704d24e8621a
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Mon Jun 10 12:33:20 2013 -0700

    rsockets: Add ability to set the IB route directly

    Add an RDMA specific rsocket option that allows the user
    to program the RDMA route directly.  This is useful
    for apps that have path record data available, e.g. from
    ibacm.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit f77079d79becf4476cb75ea5c816aae70724116e
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Sat Jul 20 19:22:55 2013 -0700

    examples: Add support for native IB addressing to samples

    Allow the user to specify GID addresses (AF_IB) into
    udaddy and rstream.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit ca353a3f985135504c429f82bf5a342ec26d11d4
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Thu Jul 18 13:26:15 2013 -0700

    rsockets: Support native IB addressing on connected rsockets

    Update rsockets to support AF_IB addresses on connected rsockets.
    Support for datagram rsockets is more difficult as a result of
    using real UDP sockets for QP resolution, so that support is
    deferred.  For connected sockets, we need to update internal
    checks to handle AF_IB.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit a8becf33bbbb363cb2e0f2b45456bc82b345c453
Author: Bart Van Assche <bvanassche@acm.org>
Date:   Sun Jul 28 11:20:54 2013 +0200

    [4/4] Declare 'server_port' as an unsigned variable

    Change the data type of the 'server_port' variable from signed to
    unsigned such that the cast in the fscanf() call can be removed.

    Signed-off-by: Bart Van Assche <bvanassche@acm.org>

commit eee05e6604a60b007249f97613d3bb513c07c20d
Author: Bart Van Assche <bvanassche@acm.org>
Date:   Sun Jul 28 11:19:48 2013 +0200

    [3/4] rsocket: Remove the unused variable 'ret'

    The variable 'ret' is assigned a value but that value is never used.
    This triggers the following compiler warning:

    src/rsocket.c:3720:9: warning: variable 'ret' set but not used [-Wunused-but-set-variable]

    Hence remove this variable.

    Signed-off-by: Bart Van Assche <bvanassche@acm.org>

commit 9e758e0655242bb02aea5ec28fe4eeac2ec655f7
Author: Bart Van Assche <bvanassche@acm.org>
Date:   Sun Jul 28 11:19:15 2013 +0200

    [2/4] cma: Remove the unused variable 'id_priv'

    The variable 'id_priv' is assigned a value but is never used.
    This triggers the following compiler warning:

    src/cma.c:1178:25: warning: variable 'id_priv' set but not used [-Wunused-but-set-variable]

    Hence remove this variable.

    Signed-off-by: Bart Van Assche <bvanassche@acm.org>

commit 2a31c855fc95d04370db56de5b35d8271e577f6f
Author: Bart Van Assche <bvanassche@acm.org>
Date:   Sun Jul 28 11:18:36 2013 +0200

    [1/4] acm: Remove the unused variable 'pri_path'

    The variable 'pri_path' is assigned a value but is never used.
    This triggers the following compiler warning:

    src/acm.c:301:26: warning: variable 'pri_path' set but not used [-Wunused-but-set-variable]

    Hence remove this variable.

    Signed-off-by: Bart Van Assche <bvanassche@acm.org>

commit c8be3cfde6902e490fadd6a51206c1bcba3e3aa2
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Mon Jun 10 10:57:56 2013 -0700

    init: Remove USE_IB_ACM configuration option

    When the librdmacm is configured, it sets the USE_IB_ACM option
    if infiniband/acm.h is found.  We can remove this option with
    very little overhead, which would allow a user to install
    ACM after installing the librdmacm, and the librdmacm would be
    able to make use of ACM.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 6efb57780ca142ea4e3b0feebef554849047f79f
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Mon Jun 10 11:07:12 2013 -0700

    acm: Define needed ACM protocol messages

    The librdmacm needs message definitions used to communicate
    with the ibacm.  It currently pulls these from infiniband/acm.h,
    which is installed by ibacm.  This creates an install order
    dependency on ibacm.  However, work on the scalable SA has
    the ibacm using the librdmacm (via rsockets) for communication
    between the different SSA components.

    To resolve this issue, have the librdmacm define the message
    structures that it needs to communicate with ibacm.  The
    librdmacm already defines some ACM messages through configuration
    checks.  We just expand that capability, which isolates the librdmacm
    package from the ibacm package.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit c8173d50d1a8c2bbfb0c4459e05d3941175676b2
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Wed Aug 29 15:02:54 2012 -0700

    cmatose: Allow user to specify address format

    Provide an option for the user to indicate the type of
    addresses used as input.  Support hostname, IPv4, IPv6,
    and GIDs.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 704f54358a1f74229cd9e982b530ca8327c7658e
Author: Yann Droneaud <ydroneaud@opteya.com>
Date:   Tue Jul 16 16:03:42 2013 -0700

    Remove executable mode bit on text files

    Source code and man page should not be executable.

    Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 3eb1704b2e11413077933d6d3a963d81d508bdf8
Author: Yann Droneaud <ydroneaud@opteya.com>
Date:   Tue Jul 16 23:59:52 2013 +0200

    Open files with "close on exec" flag

    Files opened by librdmacm are not supposed to be inherited across
    exec*(); most of the files are of no use to another program, and
    others cannot be used without the associated memory mapping.

    This patch changes fopen(), open(), and socket() to always set the
    close-on-exec flag.

    This patch also adds checks to configure to guess whether fopen() supports
    the "e" flag. If O_CLOEXEC and SOCK_CLOEXEC are supported, fopen() should
    support "e". If not supported, it's discarded according to POSIX. Many
    operating systems have support for fopen("e").

    You might find more information about close on exec in the following articles:

    - "Excuse me son, but your code is leaking !!!" by Dan Walsh
      http://danwalsh.livejournal.com/53603.html

    - "Secure File Descriptor Handling" by Ulrich Drepper
      http://udrepper.livejournal.com/20407.html

    Note: this patch won't set close on exec flag on file descriptors
    created by the kernel for completion channel and such.
    This is addressed by another kernel patch.

    Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
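
Note on the commit above: the three ways of requesting close-on-exec that it mentions can be sketched as follows. This is an illustration, not the librdmacm code. The helper names are hypothetical, and the sketch assumes a Linux/glibc system where O_CLOEXEC, SOCK_CLOEXEC, and the fopen() "e" mode flag are available.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if fd has FD_CLOEXEC set, 0 if not, -1 on error. */
int has_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    return flags < 0 ? -1 : (flags & FD_CLOEXEC) != 0;
}

/* open() with the flag set atomically at creation time. */
int open_cloexec(const char *path)
{
    assert(path != NULL);
    return open(path, O_RDONLY | O_CLOEXEC);
}

/* fopen() with the "e" mode flag (assumed to be supported). */
FILE *fopen_cloexec(const char *path)
{
    assert(path != NULL);
    return fopen(path, "re");
}

/* socket() with SOCK_CLOEXEC or'ed into the socket type. */
int socket_cloexec(int domain, int type)
{
    return socket(domain, type | SOCK_CLOEXEC, 0);
}
```

Calling has_cloexec() on the returned descriptors (fileno() for the FILE pointer) is one way to check that the flag was actually applied.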

commit d53cd79c3bde6186bda6822a04708b9d2666f8ae
Author: Yann Droneaud <ydroneaud@opteya.com>
Date:   Tue Jul 16 23:59:50 2013 +0200

    Add .gitignore rules

    Add the list of files/patterns to be excluded from git status output.
    Additionally, it will prevent such files/patterns from being added and committed.

    Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit e9ef6c2e2d8141dd5c32472918b8c087f745524b
Author: Yann Droneaud <ydroneaud@opteya.com>
Date:   Tue Jul 16 23:59:49 2013 +0200

    configure: Use automake's option "subdir-objects"

    Following advice in "Autotool Mythbuster" [1], option subdir-objects
    can be used to have Makefiles create object files in the same
    directory as their source files.

    It reduces clobbering in the build directory.

    [1] "Autotool Mythbuster", by Diego Elio "Flameeyes" Pettenò
    http://www.flameeyes.eu/autotools-mythbuster/automake/nonrecursive.html

    Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 3edfff79d98f72b754278c854f871c4a22a7ce3c
Author: Yann Droneaud <ydroneaud@opteya.com>
Date:   Tue Jul 16 23:59:48 2013 +0200

    configure: Apply updates proposed by autoupdate

    'autoupdate' is a tool to help developers update configure.ac.

    This patch applies a few fixes as suggested by autoupdate.

    Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit f49ac33aaab147e5b126a75565f57e596600f372
Author: Jeff Squyres <jsquyres@cisco.com>
Date:   Tue Jul 16 23:59:47 2013 +0200

    autogen.sh: Use autoreconf in autogen.sh

    The old sequence of Autotools commands listed in autogen.sh is no
    longer correct.  Instead, just use the single "autoreconf" command,
    which will invoke all the Right Autotools commands in the correct
    order.

    Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
    Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 9d2f1b068e6fcd62853fe013c7cc4316dcb3fc4b
Author: Bart Van Assche <bvanassche@acm.org>
Date:   Tue Jul 16 23:59:46 2013 +0200

    Makefile.am: Fix an automake warning

    Fix the following automake warning message:

        Makefile.am:1: `INCLUDES' is the old name for `AM_CPPFLAGS' (or `*_CPPFLAGS')

    A quote from the automake manual:

        INCLUDES
            This does the same job as AM_CPPFLAGS (or any per-target _CPPFLAGS variable
            if it is used). It is an older name for the same functionality. This
            variable is deprecated; we suggest using AM_CPPFLAGS and per-target
            _CPPFLAGS instead.

    Signed-off-by: Bart Van Assche <bvanassche@acm.org>
    Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 715965b7231cd97d302e24c9e8ac89b2a57a57ab
Author: Bart Van Assche <bvanassche@acm.org>
Date:   Tue Jul 16 23:59:45 2013 +0200

    Add "foreign" option to AM_INIT_AUTOMAKE

    Switch to the modern form of the AM_INIT_AUTOMAKE macro and tell
    automake that the librdmacm package does not follow the GNU
    standards. This change makes it possible to use 'autoreconf' for the
    librdmacm package.

    Signed-off-by: Bart Van Assche <bvanassche@acm.org>
    Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit ef095323918acac8fdc5386ebb7877fb5d34e5e3
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Thu May 2 13:47:51 2013 -0700

    lib: Rename configure.in to configure.ac

    Update to latest autotools naming.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit faae8c5db396985a40dc56ad6f82f89a16b8e9f1
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Thu Apr 11 10:05:29 2013 -0700

    rsocket: Add support for iWarp

    iWarp does not support RDMA writes with immediate data.
    Instead of sending messages using immediate data, allow
    the rsocket protocol to exchange messages using sends.

    The rsocket protocol remains the same.  RDMA writes are
    used for data transfers, with send messages used to transfer
    rsocket protocol messages.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 0d6ca1300d88377ae7f9162457e64c541a4630eb
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Fri Apr 12 14:41:52 2013 -0700

    rsocket: Merge usage of wr_id between stream and datagram svcs

    The rsocket data streaming and datagram services use different
    formats for the wr_id.  Although some differences are needed,
    we can make them more similar.  This will be useful when the
    wr_id is used for iwarp support, plus eliminates use of wr_id
    bits that aren't actually needed.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit e57928b701ded6c5417b5ac0c153a239bf947612
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Tue Mar 5 17:18:11 2013 -0800

    librdmacm: Release 1.0.17

commit 24590bc96d8871d80124d68d182c915d7efcc9e6
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Tue Feb 19 20:03:58 2013 -0800

    librdmacm/rsocket: Fix resetting O_NONBLOCK after calling shutdown

    Shutdown switches an rsocket from nonblocking to blocking to
    ensure that all data has been sent.  After completing all
    transfers, it should switch back to nonblocking; this handles
    partial shutdown situations, where only half the connection
    is shut down.  However, the code uses the value of '1' to
    set the nonblocking flag, rather than O_NONBLOCK.  Fix this.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
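
Note on the commit above: the difference between passing 1 and passing O_NONBLOCK can be reproduced with plain fcntl() on a pipe. This is a sketch under that assumption; it does not call rfcntl or rshutdown, and the helper names are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Set or clear O_NONBLOCK on fd, preserving the other status flags. */
int set_nonblocking(int fd, int enable)
{
    int flags;

    assert(fd >= 0);
    flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    if (enable)
        flags |= O_NONBLOCK;
    else
        flags &= ~O_NONBLOCK;
    return fcntl(fd, F_SETFL, flags) < 0 ? -1 : 0;
}

/* Returns 1 if O_NONBLOCK is set on fd, 0 if not, -1 on error. */
int is_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    return flags < 0 ? -1 : (flags & O_NONBLOCK) != 0;
}

/*
 * Models the sequence from the commit message on a pipe: the caller's
 * descriptor is nonblocking, it is switched to blocking for the final
 * transfers, and then the original mode is restored. Returns the final
 * is_nonblocking() state, or -1 on error.
 */
int restore_after_blocking_phase(int use_correct_flag)
{
    int p[2], state;

    if (pipe(p) < 0)
        return -1;
    set_nonblocking(p[0], 1);       /* caller's original mode */
    set_nonblocking(p[0], 0);       /* block until transfers finish */
    if (use_correct_flag)
        set_nonblocking(p[0], 1);   /* restore using O_NONBLOCK */
    else
        fcntl(p[0], F_SETFL, 1);    /* the bug: 1 is not O_NONBLOCK */
    state = is_nonblocking(p[0]);
    close(p[0]);
    close(p[1]);
    return state;
}
```

With use_correct_flag set to 0, the descriptor is left in blocking mode on Linux, because the value 1 does not include the O_NONBLOCK bit.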

commit be2a2a44663282cda1a60e05c3b85275c732acc6
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Mon Feb 4 16:52:18 2013 -0800

    librdmacm/rstream: Reduce default transfer count

    1 million ping-pong transfers take over 3 seconds to
    complete, and I'm impatient.  Reduce the default number of
    transfers for small messages to speed up running
    performance tests, especially when running over slower
    connections, like TCP sockets or over a WAN.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 69fadb50636d98de57c9069b83adf6d2c5c77fc6
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Fri Feb 1 17:17:34 2013 -0800

    librdmacm: Work-around kernel bug returning uid = 0

    Older kernels have a bug where they can report an event with the
    uid set to 0.  The librdmacm crashes when casting the uid to
    an rdma_cm_id and dereferencing the NULL pointer.

    There are a limited number of events where this can occur and
    in most cases it's safe to simply discard the event.  (This is
    what the kernel does anyway.)  However, it's possible for us
    to process an RDMA_CM_EVENT_ESTABLISHED event with the uid
    set to 0.  (See kernel commit 418edaaba96e58112b15c82b4907084e2a9caf42.)

    Although it's rare for this to occur, it does in fact happen
    in practice.  To work around the kernel bug, when the uid of an
    established event is set to 0, we first try to locate the correct
    user space id based on related data before discarding the event.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 75e5b5b17d8a478b4fad5d9ee700edb943b050ba
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Mon Jan 28 14:56:25 2013 -0800

    librdmacm: Define ucma_ib_init when IB_ACM is disabled

    ucma_ib_init is only defined if IB_ACM is enabled, which is
    determined by looking for the infiniband/acm.h header file.
    Define ucma_ib_init when IB_ACM is disabled.

    Problem reported by Suresh Shelvapille <suri@baymicrosystems.com>

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 1f6088f85af3c60ba4d57de1d8f1098e06761237
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Mon Jan 21 15:28:39 2013 -0800

    rsockets: Update rsocket man page

    Update man page to include recently added rsocket options
    and undocumented configuration file.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 56e1a7cd4904fbfde59adbdfedd5374e5bde2e87
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Wed Jan 9 14:54:47 2013 -0800

    rsockets: Add support for existing UDP apps

    Support for existing UDP applications is done via the rspreload
    library.  However, when the preload library is loaded, socket
    calls used by rsockets get intercepted and converted into
    rsocket calls.

    The preload library was able to handle this for TCP rsockets
    by using a per thread variable and checking for recursive calls
    coming from rsockets back into the preload library.  The preload
    library would direct such calls to the real socket calls.

    The problem is more complex for UDP rsockets, which can invoke
    socket calls from an internal rsocket thread.  The result is that
    the preload library intercepts socket calls that originate from
    the rsocket library which are not recursive.

    Although this is really a problem with the preload library,
    the simplest solution is for rsockets to fully initialize the
    library when allocating the first rsocket, versus deferring
    initialization until required.  The preload library can then
    detect the recursive calls.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
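
Note on the commit above: the per-thread recursion check it describes for the preload library can be modeled in a few lines. This is a sketch only. All names are hypothetical stand-ins (no real symbols are interposed), and `__thread` is a GCC/Clang extension.

```c
#include <assert.h>

/* Per-thread flag: nonzero while this thread is inside the library. */
static __thread int in_library;

/* Counters so the call path taken can be inspected afterwards. */
int real_calls;
int library_calls;

int intercepted_socket(void);

/* Stand-in for the real socket() call. */
static int real_socket_model(void)
{
    real_calls++;
    return 100;
}

/* Stand-in for the library, which itself needs a real socket. */
static int library_socket_model(void)
{
    assert(in_library);
    library_calls++;
    return intercepted_socket();    /* re-enters the interceptor */
}

/* Stand-in for the preloaded wrapper around socket(). */
int intercepted_socket(void)
{
    int ret;

    if (in_library)                 /* recursive call from the library */
        return real_socket_model();

    in_library = 1;
    ret = library_socket_model();
    in_library = 0;
    return ret;
}
```

Because the flag is per thread, a call made from a separate internal thread would find it clear and be treated as an application call, which is the complication the commit message describes for UDP rsockets.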

commit 6047e1991e95b96b1992f39a466457e584c01226
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Wed Dec 5 15:58:03 2012 -0800

    examples/udpong: Add test program for rsocket datagrams

    Add a sample test program to test datagram rsockets.  Move
    common routines used by udpong and other test programs into
    a common source file.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit e6e93ed4231976eeab707b31e283be0a7acff6db
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Fri Nov 9 10:26:38 2012 -0800

    rsocket: Add datagram support

    Add datagram support through the rsocket API.

    Datagram support is handled through an entirely different protocol and
    internal implementation than streaming sockets.  Unlike connected rsockets,
    datagram rsockets are not necessarily bound to a network (IP) address.
    A datagram socket may use any number of network (IP) addresses, including
    those which map to different RDMA devices.  As a result, a single datagram
    rsocket must support using multiple RDMA devices and ports, and a datagram
    rsocket references a single UDP socket, plus zero or more UD QPs.

    Rsockets uses headers inserted before user data sent over UDP sockets to
    resolve remote UD QP numbers.  When a user first attempts to send a datagram
    to a remote address (IP and UDP port), rsockets will take the following steps:

    1. Store the destination address into a lookup table.
    2. Resolve which local network address should be used when sending
       to the specified destination.
    3. Allocate a UD QP on the RDMA device associated with the local address.
    4. Send the user's datagram to the remote UDP socket.

    A header is inserted before the user's datagram.  The header specifies the
    UD QP number associated with the local network address (IP and UDP port) of
    the send.

    A service thread is used to process messages received on the UDP socket.  This
    thread updates the rsocket lookup tables with the remote QPN and path record
    data.  The service thread forwards data received on the UDP socket to an
    rsocket QP.  After the remote QPN and path records have been resolved, datagram
    communication between two nodes is done over the UD QP.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
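
Note on the commit above: the idea of inserting a header carrying the sender's UD QP number before the user's datagram can be sketched as below. The header layout and the names `dgram_hdr_model`, `dgram_pack`, and `dgram_unpack` are assumptions made for illustration; the commit message does not give the actual rsockets wire format, which may contain other fields.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header: only the QP number, in network byte order. */
struct dgram_hdr_model {
    uint32_t qpn;
};

/*
 * Writes header + payload into out. Returns the total length, or 0 if
 * out is too small.
 */
size_t dgram_pack(void *out, size_t out_len, uint32_t qpn,
                  const void *data, size_t data_len)
{
    struct dgram_hdr_model hdr;

    assert(out != NULL && (data != NULL || data_len == 0));
    if (out_len < sizeof(hdr) + data_len)
        return 0;
    hdr.qpn = htonl(qpn);
    memcpy(out, &hdr, sizeof(hdr));
    if (data_len)
        memcpy((char *)out + sizeof(hdr), data, data_len);
    return sizeof(hdr) + data_len;
}

/*
 * Reads the header from a received datagram. Returns the payload
 * length and sets *qpn and *payload, or returns -1 if the datagram is
 * shorter than the header.
 */
long dgram_unpack(const void *in, size_t in_len, uint32_t *qpn,
                  const void **payload)
{
    struct dgram_hdr_model hdr;

    assert(in != NULL && qpn != NULL && payload != NULL);
    if (in_len < sizeof(hdr))
        return -1;
    memcpy(&hdr, in, sizeof(hdr));
    *qpn = ntohl(hdr.qpn);
    *payload = (const char *)in + sizeof(hdr);
    return (long)(in_len - sizeof(hdr));
}
```

In this model the receiver learns the remote QP number from the first datagram that arrives over UDP, which corresponds to the lookup-table update performed by the service thread in the description above.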

commit c6bfc1c5b15e6207188a97e8a5df0405cfd2587f
Author: Or Gerlitz <ogerlitz@mellanox.com>
Date:   Sun Dec 2 12:04:23 2012 +0000

    [librdmacm] Fixed build problem due to missing macro

    rsocket.c failed to compile because of a missing definition for the
    container_of macro; fix it. Reported-by: Eyal Salamon <esalomon@mellanox.com>

    Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit ab0d488c1e3ba7658f61a4d8da022b5afc17737f
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Mon Nov 5 11:53:03 2012 -0800

    rsocket: Remove fscanf build warnings

    Cast fscanf return values to (void) to indicate that we don't
    care if the call fails.  In the case of a failure, we simply
    fall back to using default values.

    Problem reported by Or Gerlitz <ogerlitz@mellanox.com>

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 7d92d0106f50e0371256e74863963a0e2e99a5c8
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Wed Oct 24 10:23:52 2012 -0700

    riostream: Add example program for using iomap routines.

    riostream is based on rstream, but uses the new riomap, riounmap,
    and riowrite calls instead.  It runs a series of latency and
    bandwidth tests using remote iomapped memory.

    riostream is limited to using zero copy transfers at the
    receiving side only at this time.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit bb9fcba81acdfe34ea5df3bb23a45e0a486207da
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Sun Oct 21 14:16:03 2012 -0700

    rsocket: Add APIs for direct data placement

    We introduce rsocket extensions for supporting direct
    data placement (also known as zero copy).  Direct data
    placement avoids data copies into network buffers when
    sending or receiving data.  This patch implements zero
    copies on the receive side, but adds some basic framework for
    supporting it on the sending side.

    Integrating zero copy support into the existing socket APIs
    is difficult to achieve when the sockets are set as
    nonblocking.  Any such implementation is likely to be unusable
    in practice.  The problem stems from the fact that socket
    operations are synchronous in nature.  Support for asynchronous
    operations is limited to connection establishment.

    Therefore we introduce new calls to handle direct data placement.
    The use of the new calls is optional and does not affect the
    use of the existing calls.  An attempt is made to have the new
    routines integrate naturally with the existing APIs.  The new
    functions are: riomap, riounmap, and riowrite.  The basic operation
    can be described as follows:

    1. App A calls riomap to register a data buffer with the local
       RDMA device.  Riomap returns an off_t offset value that
       corresponds to the registered data buffer.  The app may
       select the offset value.
    2. Rsockets will transmit an internal message to the remote
       peer with information about the registration.  This exchange
       is hidden from the applications.
    3. App A sends a notification message to app B indicating that
       the remote iomapped buffer is now available to receive data.
    4. App B calls riowrite to transmit data directly into the
       riomapped data buffer.
    5. App B sends a notification message to app A indicating that
       data is available in the mapped buffer.
    6. After all transfers are complete, app A calls riounmap to
       deregister its data buffer.

    Riomap and riounmap are functionally equivalent to RDMA
    memory registration and deregistration routines.  They are loosely
    based on the mmap and munmap APIs.

    off_t riomap(int socket, void *buf, size_t len,
    	     int prot, int flags, off_t offset)

    Riomap registers an application buffer with the RDMA hardware
    associated with an rsocket.  The buffer is registered either for
    local only access (PROT_NONE) or for remote write access (PROT_WRITE).
    When registered for remote access, the buffer is mapped to a given
    offset.  The offset is either provided by the user, or if the user
    selects -1 for the offset, rsockets selects one.  The remote peer may
    access an iomapped buffer directly by specifying the correct offset.
    The mapping is not guaranteed to be available until after the remote
    peer receives a data transfer initiated after riomap has completed.

    int riounmap(int socket, void *buf, size_t len)

    Riounmap removes the mapping between a buffer and an rsocket.

    size_t riowrite(int socket, const void *buf, size_t count,
    		off_t offset, int flags)

    Riowrite allows an application to transfer data over an rsocket
    directly into a remotely iomapped buffer.  The remote buffer is specified
    through an offset parameter, which corresponds to a remote iomapped buffer.
    From the sender's perspective, riowrite behaves similarly to rwrite.  From
    a receiver's view, riowrite transfers are silently redirected into a
    predetermined data buffer.  Data is received automatically, and the receiver
    is not informed of the transfer.  However, riowrite data is still considered
    part of the data stream, such that riowrite data will be written before a
    subsequent transfer is received.  A message sent immediately after
    initiating a riowrite may be used to notify the receiver of the riowrite.

    It should be noted that the current implementation is primarily focused
    on being functional for evaluation purposes.  Some checks have been
    deferred for subsequent patches, and performance is currently limited
    by linear lookups.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
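
Note on the commit above: the offset-based addressing it describes can be modeled inside a single process. The sketch below is a local model under stated assumptions, not the rsockets implementation: there is no RDMA registration and no remote peer, the `iomap_model_*` names are hypothetical, and offsets must be given explicitly (automatic selection with -1 is left out). The linear table scan mirrors the limitation mentioned in the commit message.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>

#define IOMAP_MODEL_MAX 8

struct iomap_entry {
    void *buf;
    size_t len;
    off_t offset;
    int used;
};

struct iomap_table {
    struct iomap_entry e[IOMAP_MODEL_MAX];
};

/* Map buf at the given offset. Returns the offset, or -1 if full. */
off_t iomap_model_add(struct iomap_table *t, void *buf, size_t len,
                      off_t offset)
{
    int i;

    assert(t != NULL && buf != NULL && offset >= 0);
    for (i = 0; i < IOMAP_MODEL_MAX; i++) {
        if (!t->e[i].used) {
            t->e[i].buf = buf;
            t->e[i].len = len;
            t->e[i].offset = offset;
            t->e[i].used = 1;
            return offset;
        }
    }
    return -1;
}

/* Remove the mapping for buf/len. Returns 0, or -1 if not found. */
int iomap_model_remove(struct iomap_table *t, void *buf, size_t len)
{
    int i;

    assert(t != NULL);
    for (i = 0; i < IOMAP_MODEL_MAX; i++) {
        if (t->e[i].used && t->e[i].buf == buf && t->e[i].len == len) {
            t->e[i].used = 0;
            return 0;
        }
    }
    return -1;
}

/*
 * Copy count bytes into whichever mapping fully contains
 * [offset, offset + count). Returns count, or -1 if none does.
 */
ssize_t iomap_model_write(struct iomap_table *t, const void *src,
                          size_t count, off_t offset)
{
    int i;

    assert(t != NULL && src != NULL);
    for (i = 0; i < IOMAP_MODEL_MAX; i++) {
        struct iomap_entry *e = &t->e[i];
        size_t rel;

        if (!e->used || offset < e->offset)
            continue;
        rel = (size_t)(offset - e->offset);
        if (rel <= e->len && count <= e->len - rel) {
            memcpy((char *)e->buf + rel, src, count);
            return (ssize_t)count;
        }
    }
    return -1;
}
```

In the model, the add, write, and remove calls stand in for steps 1, 4, and 6 of the list above; the notification messages in steps 3 and 5 have no counterpart because both roles live in one process.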

commit d2e96e99bf1fc3d14e33c741502cb689c810a27b
Author: Roland Dreier <roland@purestorage.com>
Date:   Tue Oct 16 19:44:39 2012 +0000

    rdma_xserver/client: Fix man page formatting

    Putting 'r' at the beginning of a line in the nroff source for man pages
    is confusing to nroff because lines that start with a single quote
    character ' or a dot character . are treated as control lines, which is
    not what's intended here.  Some of the man page text ends up left out of
    the formatted output.

    Fix this by just wrapping the text slightly differently in the source
    (which doesn't matter since nroff reflows the text anyway).  Also add a
    missing ".TP" so that the -p and -c options are not run together in the
    formatted output.

    Signed-off-by: Roland Dreier <roland@purestorage.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 507cc241e8b212c3cf3ed0ffb04e37095bbf8bb3
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Mon Oct 8 10:33:21 2012 -0700

    librdmacm: Disable ACM support if ibacm.port is not found

    The librdmacm will try to connect to port 6125 if ibacm.port is
    not found.  The problem is that some other service or application
    could be using that port and respond with garbage.  Rather
    than falling back to a hard coded port number, if ibacm.port
    is not found, simply disable ACM support.

    This has the effect of removing support for older versions
    of ibacm, unless the port file is created manually.

    Patch created based on feedback from Doug Ledford and Florian
    Weimer from RedHat.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
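
Note on the commit above: the "no port file, no fallback" policy can be sketched as a small reader that reports failure instead of guessing a port. `read_port_file` is a hypothetical helper; the file's location and exact format are not given in the commit message, so the path is a parameter and a single decimal number is assumed.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Reads a TCP port number from the file at path. Returns 0 and sets
 * *port on success. Returns -1 if the file is missing or malformed,
 * in which case the caller disables the feature rather than falling
 * back to a hard-coded port.
 */
int read_port_file(const char *path, unsigned short *port)
{
    FILE *f;
    unsigned int value = 0;
    int ok;

    assert(path != NULL && port != NULL);
    f = fopen(path, "r");
    if (!f)
        return -1;
    ok = fscanf(f, "%u", &value) == 1 && value > 0 && value <= 65535;
    fclose(f);
    if (!ok)
        return -1;
    *port = (unsigned short)value;
    return 0;
}
```

A caller would skip ACM setup entirely when this returns -1, which matches the behavior change described above.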

commit e57196c71ddd850e14f3e66355f02786e4914f72
Author: Dotan Barak <dotanb@dev.mellanox.co.il>
Date:   Tue Oct 9 12:27:52 2012 +0000

    [5/5,librdmacm] rping: added checks to the return values of functions

    This makes rping exit with a nonzero return value in case of an
    error.

    Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
    Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 6c56dc404c999daa16a039f59b0160ab983acc98
Author: Dotan Barak <dotanb@dev.mellanox.co.il>
Date:   Tue Oct 9 12:27:51 2012 +0000

    [4/5,librdmacm] rstream: added missing return if accept() failed

    Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
    Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 41d6547bede80581b384b49bb35eac4fe089d08c
Author: Dotan Barak <dotanb@dev.mellanox.co.il>
Date:   Tue Oct 9 12:27:50 2012 +0000

    [3/5,librdmacm] rstream: initialize return value in server_connect()

    If use_async == 0 and rs_accept() passes (i.e. non negative value), then
    the return value from the function was uninitialized.

    Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
    Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 1f1a03dae14cbb25a43b1b56aa5ae689776edc11
Author: Dotan Barak <dotanb@dev.mellanox.co.il>
Date:   Tue Oct 9 12:27:49 2012 +0000

    [2/5,librdmacm] rsocket: added missing break

    Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
    Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit eddbe8f0abc3d0f69755f0e510df2a7f21412c0b
Author: Dotan Barak <dotanb@dev.mellanox.co.il>
Date:   Tue Oct 9 12:27:48 2012 +0000

    [1/5,librdmacm] rsocket: add missing va_end() after calling va_start()

    Not doing so may lead to a resource leak.

    Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
    Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
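
Note on the commit above: the va_start()/va_end() pairing it restores looks like this in a variadic, fcntl-style wrapper. `fd_control` is a hypothetical function written for illustration; it is not the rsockets code, and it forwards to plain fcntl().

```c
#include <assert.h>
#include <fcntl.h>
#include <stdarg.h>
#include <unistd.h>

/*
 * fcntl-style wrapper supporting F_GETFL and F_SETFL. The optional
 * third argument is only read for F_SETFL. Every return path passes
 * through va_end(), matching the single va_start().
 */
int fd_control(int fd, int cmd, ...)
{
    va_list args;
    int param;
    int ret;

    assert(fd >= 0);
    va_start(args, cmd);
    switch (cmd) {
    case F_GETFL:
        ret = fcntl(fd, F_GETFL);
        break;
    case F_SETFL:
        param = va_arg(args, int);
        ret = fcntl(fd, F_SETFL, param);
        break;
    default:
        ret = -1;               /* unsupported command in this sketch */
        break;
    }
    va_end(args);               /* required after va_start() */
    return ret;
}
```

Using a single exit point after the switch keeps the pairing easy to audit, since an early return inside a case would skip va_end().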

commit 8a92d0c3c8ce5f513dff974912143f6b0283f8e3
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Thu Oct 4 12:01:50 2012 -0700

    ucmatose: Remove connect parameter passed into rdma_accept

    Pass in NULL for conn_param into rdma_accept to indicate
    that the passive side will use the values specified by the
    active side.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 714af39b2bc2cc54dd2391a0df2c7e54856bc9c7
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Thu Oct 4 11:49:59 2012 -0700

    ucmatose: Fix number of connections to disconnect

    When ucmatose aborts because of issues trying to connect
    to the server, it moves to disconnecting all connections.
    However, not all connections may have been established.
    The result is that ucmatose will hang in disconnect_events.
    Fix this by setting the number of times that we need to
    disconnect to the number of times that we successfully
    connect.

    This problem is based on a report by Doug Ledford
    <dledford@redhat.com>

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 860b1a8784f1846be759eec46770cc723991479c
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Wed Oct 3 15:05:20 2012 -0700

    rping: Reduce retry_count to fit in 3 bits

    retry_count is a 3-bit value on IB; reduce it from
    10 to 7.

    A value of 10 prevents rping from working over the Intel
    IB HCA.  Problem reported by Doug Ledford <dledford@redhat.com>

    The retry_count is also not set when calling rdma_accept.
    Rather than passing different values into rdma_accept than
    what was specified by the remote side, use the values given
    in the connection request.

    Signed-off-by: …
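
As a rough illustration of the 3-bit limit described in the rping commit above: the largest value a 3-bit field can hold is 7, and 10 (binary 1010) needs four bits. The sketch below clamps a requested value to that range. The macro and function names are hypothetical. The commit itself changes rping's constant from 10 to 7 and does not add a clamp.

```c
#include <stdint.h>

/* Hypothetical names, used only for this illustration. */
#define IB_RETRY_COUNT_BITS	3
#define IB_RETRY_COUNT_MAX	((1u << IB_RETRY_COUNT_BITS) - 1)	/* 7 */

/* Limit a requested retry count to what a 3-bit field can carry. */
static inline uint8_t clamp_retry_count(unsigned int requested)
{
	if (requested > IB_RETRY_COUNT_MAX)
		return (uint8_t)IB_RETRY_COUNT_MAX;
	return (uint8_t)requested;
}
```

With this helper, `clamp_retry_count(10)` yields 7, matching the value the commit settles on.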
shefty committed Aug 9, 2014
1 parent 485ded5 commit d3488da
Showing 20 changed files with 725 additions and 678 deletions.
10 changes: 3 additions & 7 deletions librdmacm/configure.ac
@@ -1,7 +1,7 @@
dnl Process this file with autoconf to produce a configure script.

AC_PREREQ([2.63])
AC_INIT([librdmacm],[1.0.18],[linux-rdma@vger.kernel.org])
AC_INIT([librdmacm],[1.0.19-1],[linux-rdma@vger.kernel.org])
AC_CONFIG_SRCDIR([src/cma.c])
AC_CONFIG_AUX_DIR(config)
AC_CONFIG_MACRO_DIR(config)
@@ -40,14 +40,10 @@ dnl Checks for libraries
AC_CHECK_LIB(pthread, pthread_mutex_init, [],
AC_MSG_ERROR([pthread_mutex_init() not found. librdmacm requires libpthread.]))
if test "$disable_libcheck" != "yes"; then
AC_CHECK_LIB(ibverbs, ibv_get_device_list, [],
AC_MSG_ERROR([ibv_get_device_list() not found. librdmacm requires libibverbs.]))
AC_CHECK_LIB(ibverbs, ibv_cmd_open_xrcd, [],
AC_MSG_ERROR([ibv_cmd_open_xrcd() not found. librdmacm requires libibverbs 1.1.8 or later.]))
fi

AC_CHECK_MEMBER(struct ibv_path_record.service_id, [],
AC_DEFINE(DEFINE_PATH_RECORD, 1, [adding path record definition]),
[#include <infiniband/sa.h>])

dnl Check for gcc atomic intrinsics
AC_MSG_CHECKING(compiler support for atomics)
AC_LINK_IFELSE([AC_LANG_PROGRAM([[int i = 0;]],
227 changes: 189 additions & 38 deletions librdmacm/examples/cmtime.c
@@ -84,10 +84,27 @@ struct node {
int retries;
};

struct list_head {
struct list_head *prev;
struct list_head *next;
struct rdma_cm_id *id;
};

struct work_list {
pthread_mutex_t lock;
pthread_cond_t cond;
struct list_head list;
};

#define INIT_LIST(x) ((x)->prev = (x)->next = (x))

static struct work_list req_work;
static struct work_list disc_work;
static struct node *nodes;
static struct timeval times[STEP_CNT][2];
static int connections = 100;
static int left[STEP_CNT];
static volatile int started[STEP_CNT];
static volatile int completed[STEP_CNT];
static struct ibv_qp_init_attr init_qp_attr;
static struct rdma_conn_param conn_param;

@@ -96,6 +113,59 @@ static struct rdma_conn_param conn_param;
#define start_time(s) gettimeofday(&times[s][0], NULL)
#define end_time(s) gettimeofday(&times[s][1], NULL)

static inline void __list_delete(struct list_head *list)
{
struct list_head *prev, *next;
prev = list->prev;
next = list->next;
prev->next = next;
next->prev = prev;
INIT_LIST(list);
}

static inline int __list_empty(struct work_list *list)
{
return list->list.next == &list->list;
}

static inline int list_empty(struct work_list *work_list)
{
pthread_mutex_lock(&work_list->lock);
return work_list->list.next == &work_list->list;
pthread_mutex_unlock(&work_list->lock);
}

static inline struct list_head *__list_remove_head(struct work_list *work_list)
{
struct list_head *list_item;

list_item = work_list->list.next;
__list_delete(list_item);
return list_item;
}

static inline struct list_head *list_remove_head(struct work_list *work_list)
{
struct list_head *list_item;
pthread_mutex_lock(&work_list->lock);
list_item = __list_remove_head(work_list);
pthread_mutex_unlock(&work_list->lock);
return list_item;
}

static inline void list_add_tail(struct work_list *work_list, struct list_head *req)
{
int empty;
pthread_mutex_lock(&work_list->lock);
empty = __list_empty(work_list);
req->prev = work_list->list.prev;
req->next = &work_list->list;
req->prev->next = work_list->list.prev = req;
pthread_mutex_unlock(&work_list->lock);
if (empty)
pthread_cond_signal(&work_list->cond);
}

static int zero_time(struct timeval *t)
{
return !(t->tv_sec || t->tv_usec);
@@ -140,28 +210,28 @@ static void show_perf(void)
static void addr_handler(struct node *n)
{
end_perf(n, STEP_RESOLVE_ADDR);
left[STEP_RESOLVE_ADDR]--;
completed[STEP_RESOLVE_ADDR]++;
}

static void route_handler(struct node *n)
{
end_perf(n, STEP_RESOLVE_ROUTE);
left[STEP_RESOLVE_ROUTE]--;
completed[STEP_RESOLVE_ROUTE]++;
}

static void conn_handler(struct node *n)
{
end_perf(n, STEP_CONNECT);
left[STEP_CONNECT]--;
completed[STEP_CONNECT]++;
}

static void disc_handler(struct node *n)
{
end_perf(n, STEP_DISCONNECT);
left[STEP_DISCONNECT]--;
completed[STEP_DISCONNECT]++;
}

static int req_handler(struct rdma_cm_id *id)
static void __req_handler(struct rdma_cm_id *id)
{
int ret;

@@ -176,17 +246,50 @@ static int req_handler(struct rdma_cm_id *id)
perror("failure accepting");
goto err;
}
return 0;
return;

err:
printf("failing connection request\n");
rdma_reject(id, NULL, 0);
return ret;
rdma_destroy_id(id);
return;
}

static void *req_handler_thread(void *arg)
{
struct list_head *work;
do {
pthread_mutex_lock(&req_work.lock);
if (__list_empty(&req_work))
pthread_cond_wait(&req_work.cond, &req_work.lock);
work = __list_remove_head(&req_work);
pthread_mutex_unlock(&req_work.lock);
__req_handler(work->id);
free(work);
} while (1);
return NULL;
}

static void *disc_handler_thread(void *arg)
{
struct list_head *work;
do {
pthread_mutex_lock(&disc_work.lock);
if (__list_empty(&disc_work))
pthread_cond_wait(&disc_work.cond, &disc_work.lock);
work = __list_remove_head(&disc_work);
pthread_mutex_unlock(&disc_work.lock);
rdma_disconnect(work->id);
rdma_destroy_id(work->id);
free(work);
} while (1);
return NULL;
}

static void cma_handler(struct rdma_cm_id *id, struct rdma_cm_event *event)
{
struct node *n = id->context;
struct list_head *request;

switch (event->event) {
case RDMA_CM_EVENT_ADDR_RESOLVED:
@@ -196,10 +299,15 @@ static void cma_handler(struct rdma_cm_id *id, struct rdma_cm_event *event)
route_handler(n);
break;
case RDMA_CM_EVENT_CONNECT_REQUEST:
if (req_handler(id)) {
rdma_ack_cm_event(event);
request = malloc(sizeof *request);
if (!request) {
perror("out of memory accepting connect request");
rdma_reject(id, NULL, 0);
rdma_destroy_id(id);
return;
} else {
INIT_LIST(request);
request->id = id;
list_add_tail(&req_work, request);
}
break;
case RDMA_CM_EVENT_ESTABLISHED:
@@ -235,12 +343,18 @@ static void cma_handler(struct rdma_cm_id *id, struct rdma_cm_event *event)
break;
case RDMA_CM_EVENT_DISCONNECTED:
if (!n) {
rdma_disconnect(id);
rdma_ack_cm_event(event);
rdma_destroy_id(id);
return;
}
disc_handler(n);
request = malloc(sizeof *request);
if (!request) {
perror("out of memory queueing disconnect request, handling synchronously");
rdma_disconnect(id);
rdma_destroy_id(id);
} else {
INIT_LIST(request);
request->id = id;
list_add_tail(&disc_work, request);
}
} else
disc_handler(n);
break;
case RDMA_CM_EVENT_DEVICE_REMOVAL:
/* Cleanup will occur after test completes. */
@@ -296,29 +410,67 @@ static void cleanup_nodes(void)
end_time(STEP_DESTROY);
}

static int process_events(int *left)
static void *process_events(void *arg)
{
struct rdma_cm_event *event;
int ret = 0;

while ((!left || *left) && !ret) {
while (!ret) {
ret = rdma_get_cm_event(channel, &event);
if (!ret) {
cma_handler(event->id, event);
} else {
perror("failure in rdma_get_cm_event in connect events");
perror("failure in rdma_get_cm_event in process_server_events");
ret = errno;
}
}

return ret;
return NULL;
}

static int run_server(void)
{
pthread_t req_thread, disc_thread;
struct rdma_cm_id *listen_id;
int ret;

INIT_LIST(&req_work.list);
INIT_LIST(&disc_work.list);
ret = pthread_mutex_init(&req_work.lock, NULL);
if (ret) {
perror("initializing mutex for req work");
return ret;
}

ret = pthread_mutex_init(&disc_work.lock, NULL);
if (ret) {
perror("initializing mutex for disc work");
return ret;
}

ret = pthread_cond_init(&req_work.cond, NULL);
if (ret) {
perror("initializing cond for req work");
return ret;
}

ret = pthread_cond_init(&disc_work.cond, NULL);
if (ret) {
perror("initializing cond for disc work");
return ret;
}

ret = pthread_create(&req_thread, NULL, req_handler_thread, NULL);
if (ret) {
perror("failed to create req handler thread");
return ret;
}

ret = pthread_create(&disc_thread, NULL, disc_handler_thread, NULL);
if (ret) {
perror("failed to create disconnect handler thread");
return ret;
}

ret = rdma_create_id(channel, &listen_id, NULL, hints.ai_port_space);
if (ret) {
perror("listen request failed");
@@ -351,6 +503,7 @@ static int run_server(void)

static int run_client(void)
{
pthread_t event_thread;
int i, ret;

ret = get_rdma_addr(src_addr, dst_addr, port, &hints, &rai);
@@ -365,6 +518,12 @@ static int run_client(void)
conn_param.private_data = rai->ai_connect;
conn_param.private_data_len = rai->ai_connect_len;

ret = pthread_create(&event_thread, NULL, process_events, NULL);
if (ret) {
perror("failure creating event thread");
return ret;
}

if (src_addr) {
printf("binding source address\n");
start_time(STEP_BIND);
@@ -395,11 +554,9 @@ static int run_client(void)
nodes[i].error = 1;
continue;
}
left[STEP_RESOLVE_ADDR]++;
started[STEP_RESOLVE_ADDR]++;
}
ret = process_events(&left[STEP_RESOLVE_ADDR]);
if (ret)
return ret;
while (started[STEP_RESOLVE_ADDR] != completed[STEP_RESOLVE_ADDR]) sched_yield();
end_time(STEP_RESOLVE_ADDR);

printf("resolving route\n");
@@ -415,11 +572,9 @@ static int run_client(void)
nodes[i].error = 1;
continue;
}
left[STEP_RESOLVE_ROUTE]++;
started[STEP_RESOLVE_ROUTE]++;
}
ret = process_events(&left[STEP_RESOLVE_ROUTE]);
if (ret)
return ret;
while (started[STEP_RESOLVE_ROUTE] != completed[STEP_RESOLVE_ROUTE]) sched_yield();
end_time(STEP_RESOLVE_ROUTE);

printf("creating qp\n");
@@ -450,11 +605,9 @@ static int run_client(void)
nodes[i].error = 1;
continue;
}
left[STEP_CONNECT]++;
started[STEP_CONNECT]++;
}
ret = process_events(&left[STEP_CONNECT]);
if (ret)
return ret;
while (started[STEP_CONNECT] != completed[STEP_CONNECT]) sched_yield();
end_time(STEP_CONNECT);

printf("disconnecting\n");
@@ -464,11 +617,9 @@ static int run_client(void)
continue;
start_perf(&nodes[i], STEP_DISCONNECT);
rdma_disconnect(nodes[i].id);
left[STEP_DISCONNECT]++;
started[STEP_DISCONNECT]++;
}
ret = process_events(&left[STEP_DISCONNECT]);
if (ret)
return ret;
while (started[STEP_DISCONNECT] != completed[STEP_DISCONNECT]) sched_yield();
end_time(STEP_DISCONNECT);

return ret;
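
The cmtime.c hunks above hand connect and disconnect requests to worker threads through a mutex-protected list and a condition variable. The sketch below shows the same producer/consumer pattern in isolation. It is a simplified stand-in under stated assumptions, not the code from the commit: all names are hypothetical, it uses a singly linked list, and it carries an `int` payload where cmtime.c carries an `rdma_cm_id` pointer. Two details differ from the diff on purpose. The consumer re-checks the predicate in a `while` loop, because POSIX allows `pthread_cond_wait` to return spuriously (the diff uses `if`). Every path also releases the mutex before returning, whereas `list_empty()` in the diff returns before reaching its unlock call.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical types for illustration; cmtime.c uses struct work_list. */
struct work_item {
	struct work_item *next;
	int value;		/* stand-in for the rdma_cm_id pointer */
};

struct work_queue {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	struct work_item *head;
	struct work_item **tail;	/* points at the last next pointer */
};

int work_queue_init(struct work_queue *q)
{
	int ret;

	q->head = NULL;
	q->tail = &q->head;
	ret = pthread_mutex_init(&q->lock, NULL);
	if (ret)
		return ret;	/* pthread calls return the error code */
	ret = pthread_cond_init(&q->cond, NULL);
	if (ret)
		pthread_mutex_destroy(&q->lock);
	return ret;
}

/* Producer side: append an item and wake one waiting consumer. */
int work_queue_push(struct work_queue *q, int value)
{
	struct work_item *item = malloc(sizeof *item);

	if (!item)
		return -1;
	item->next = NULL;
	item->value = value;

	pthread_mutex_lock(&q->lock);
	*q->tail = item;
	q->tail = &item->next;
	pthread_cond_signal(&q->cond);
	pthread_mutex_unlock(&q->lock);
	return 0;
}

/* Consumer side: block until an item is available, then remove it. */
int work_queue_pop(struct work_queue *q)
{
	struct work_item *item;
	int value;

	pthread_mutex_lock(&q->lock);
	while (!q->head)	/* loop guards against spurious wakeups */
		pthread_cond_wait(&q->cond, &q->lock);
	item = q->head;
	q->head = item->next;
	if (!q->head)
		q->tail = &q->head;
	pthread_mutex_unlock(&q->lock);

	value = item->value;
	free(item);
	return value;
}
```

A worker thread would call `work_queue_pop()` in a loop, as `req_handler_thread()` does with its list, while the event thread calls `work_queue_push()` for each incoming request.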