Commit
commit 42a1f96809d0dfb72e1abaad3923761eba4c6fe2
Merge: dc1317b fca6e10
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Fri Aug 8 11:53:16 2014 -0700

    Merge branch 'dev'

commit fca6e10a83eb592135fd47bc73600c7a955ca2b5
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Thu Aug 7 15:43:00 2014 -0700

    Release 1.0.19-1 hotfix

commit dc1317b5668200bf0947dcac21a4d95959d333b3
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Mon Aug 4 10:01:31 2014 -0700

    indexer: Include errno.h directly

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 064c9cb1bddbab9e6d54ba301facfae7e1992455
Author: Ilya Nelkenbaum <ilyan@mellanox.com>
Date:   Mon Jul 28 15:48:09 2014 +0300

    rsocket: Segmentation fault fix in case of multiple connections

    When more than 16 rsocket connections are established, the "svc->rss"
    buffer is reallocated with more memory. Index 0 is reserved for the
    service's communication socket, and this is not taken into account
    when data is copied from the old buffer location to the new one.

    Signed-off-by: Ilya Nelkenbaum <ilyan@mellanox.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit a7287adaea52d21cd2d50f1621f8eda37c4c3c90
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Tue Jul 22 23:24:53 2014 -0700

    udpong: Fix client_recv error check

    We only want to report an error if it's not EAGAIN. The if statement
    is reversed. Correct it.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 806de778b1fe665dee2f62c7bf7211ab9bd2d53f
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Wed Jul 16 15:49:16 2014 -0700

    Release 1.0.19

commit 8f53f2a5d3cb5d6c30fe5695b48268ea1bbe2ff0
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Wed Jul 16 13:44:56 2014 -0700

    riostream: Only verify last data transfer

    Data verification will fail when running the bandwidth tests or when
    the transfer count is > 1. The issue is that subsequent writes by the
    initiator side will overwrite the data in the target buffer before the
    receiver can verify that it is correct.
    To fix this, only verify that the data in the buffer is correct after
    the last transfer has completed.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit c4f8e22a6d078fa914cd4102d65fa854587e1248
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Mon Jul 7 08:40:44 2014 -0700

    Revert "Revert "rsocket: Change keepalive to 0-byte RDMA write""

    This reverts commit a34703c53259845dd20450a87eb6747030e23e8b.

    0-byte RDMA writes appear to be working correctly with HCAs from 2
    different vendors. The original problem that was reported turned out
    to be a user error.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit fa85dc408e28afd67b81c3a590fd874ef6fdc63a
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Thu Jul 3 13:45:52 2014 -0700

    rsocket: Update correct rsocket keepalive time

    When the keepalive time of an rsocket is updated, the updated
    information is forwarded to the keepalive service thread. However,
    the thread updates the time for the wrong service as shown:

        tcp_svc_timeouts[svc->cnt] = rs_get_time() + msg.rs->keepalive_time;

    The index into tcp_svc_timeouts should correspond to the rsocket
    being updated, not the last one in the list.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 1695abfa9f6bf429a5aa07117310c4ad87d4b3ae
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Thu Jul 3 13:55:39 2014 -0700

    rsocket: Fix removing rsocket from service thread

    When removing an rsocket from a service thread, we replace the
    removed service with the one at the end of the service list. This
    keeps the array tightly packed. However, rs_svc_rm_rs decrements the
    rsocket count before doing the swap. The result is that the entry at
    the end of the list gets dropped off.

    Defer decrementing the count until the swap has been made. In this
    case, the cnt value is a valid index into the array, because we start
    at index 1. Index 0 is used internally by the service thread.
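
Note: the removal order described above can be illustrated with a minimal sketch. The names `svc_list`, `SVC_MAX`, and `svc_list_remove` are hypothetical stand-ins, not the librdmacm code; the only properties taken from the commit message are the 1-based array, the reserved index 0, and decrementing the count after the last entry has been copied into the vacated slot.

```c
#include <stddef.h>

/* Hypothetical stand-in for a service thread's entry list. */
#define SVC_MAX 8

struct svc_list {
    int cnt;                   /* entries in use, stored at [1..cnt] */
    int entries[SVC_MAX + 1];  /* index 0 is reserved, as in the commit */
};

/*
 * Remove 'value' by overwriting it with the last entry, then shrink
 * the list. Returns 0 on success, -1 if the value is not present.
 */
static int svc_list_remove(struct svc_list *svc, int value)
{
    for (int i = 1; i <= svc->cnt; i++) {
        if (svc->entries[i] == value) {
            /* cnt still indexes the last valid entry at this point */
            svc->entries[i] = svc->entries[svc->cnt];
            /* decrement only after the copy, so nothing is dropped */
            svc->cnt--;
            return 0;
        }
    }
    return -1;
}
```

Decrementing before the copy would read `entries[cnt - 1]` instead, which is the kind of off-by-one the commit describes.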

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 9085562c22189850e1f16b9a9955f11e79caac06
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Wed Jul 2 15:37:10 2014 -0700

    rsocket: Fix crash resulting from keepalive timeout

    The following crash was reported by Hal Rosenstock,
    <hal@mellanox.com>, with keepalive enabled. The crash occurs in the
    keepalive thread attempting to send a keepalive message.

    report:

    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 0x7fffecf08700 (LWP 6013)]
    rs_post_write (rs=<value optimized out>, sgl=0x0, nsge=0, wr_data=3758096385, flags=0, addr=0, rkey=0) at src/rsocket.c:1660
    1660        return rdma_seterrno(ibv_post_send(rs->cm_id->qp, &wr, &bad));
    Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6.x86_64
    (gdb)
    (gdb) p/x rs
    $1 = value has been optimized out

    So I added in the following to debug:

    1660        if (rs == NULL)
    1661            abort();
    1662        if (rs->cm_id == NULL)
    1663            abort();
    1664        if (rs->cm_id->qp == NULL)
    1665            abort();
    1666        return rdma_seterrno(ibv_post_send(rs->cm_id->qp, &wr, &bad));
    1667    }

    And saw in gdb:

    Program received signal SIGABRT, Aborted.
    [Switching to Thread 0x7fffecf08700 (LWP 8096)]
    0x00000030d50328a5 in raise () from /lib64/libc.so.6
    Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6.x86_64
    (gdb)
    (gdb) bt
    #0  0x00000030d50328a5 in raise () from /lib64/libc.so.6
    #1  0x00000030d5034085 in abort () from /lib64/libc.so.6
    #2  0x00007ffff057fe23 in rs_post_write (rs=<value optimized out>, sgl=0x1fa0, nsge=6, wr_data=4294967295, flags=0, addr=0, rkey=0) at src/rsocket.c:1665
    #3  0x00007ffff058193d in tcp_svc_send_keepalive (arg=0x7ffff0789f20) at src/rsocket.c:4245
    #4  tcp_svc_run (arg=0x7ffff0789f20) at src/rsocket.c:4279
    #5  0x00000030d5807851 in start_thread () from /lib64/libpthread.so.0
    #6  0x00000030d50e890d in clone () from /lib64/libc.so.6
    (gdb) fr 2
    #2  0x00007ffff057fe23 in rs_post_write (rs=<value optimized out>, sgl=0x1fa0, nsge=6, wr_data=4294967295, flags=0, addr=0, rkey=0) at src/rsocket.c:1665
    1665            abort();

    So qp is NULL somehow...

    :end report

    There is an issue if an rsocket is closed without going through
    rshutdown.

    int rshutdown(int socket, int how)
    {
        ...
        if (rs->opts & RS_OPT_SVC_ACTIVE)
            rs_notify_svc(&tcp_svc, rs, RS_SVC_REM_KEEPALIVE);

    We remove the rsocket from the keepalive thread in rshutdown.

    int rclose(int socket)
    {
        ...
        if (rs->state & rs_connected)
            rshutdown(socket, SHUT_RDWR);
        ...
        rs_free(rs);

    rclose will call rshutdown only if we're connected. However, if the
    keepalive failed, the socket will be in an error state. So, there is
    no call to rshutdown, which leaves the freed rsocket on the keepalive
    thread's list.

    The fix is to have rclose remove an rsocket from being processed by a
    service thread if it is still active.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 310f630ac87f1deee1534ab405d5b771b801c25d
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Tue Jul 1 22:52:40 2014 -0700

    example/rdma_xclient/server: Update XRC support in sample programs

    Update rdma_xclient and rdma_xserver sample programs to test XRC data
    transfers.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 5662340a12429f8882be36d8787924be91a1cb74
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Tue Jul 1 22:56:43 2014 -0700

    rdmacm: Update addrinfo with XRC support

    Remove internal defines, and use libibverbs exported values instead.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 995eb0c90c1a0967179fe3f523861e15300d3dfa
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Tue Jul 1 17:47:22 2014 -0700

    rdmacm: Add support for XRC QPs

    Export a new extended create QP call. Add support for XRC QPs.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 05eabc5335b95ab9d0d6a6132092fac6e1af1cc5
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Tue Jul 1 17:14:13 2014 -0700

    rdmacm: Add support for allocating XRC SRQs

    Add extended SRQ creation call, to support allocating XRC SRQs. Use
    the rdma_cm_id qp type field to determine which type of SRQ should be
    allocated.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 89a782a52a48db38d917084233006fb91cbd0694
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Tue Jul 1 16:46:34 2014 -0700

    rdmacm: Add functionality to allocate an XRCD

    XRC QPs and SRQs are associated by an XRC domain. Provide a call to
    allocate an XRCD, similar to how the rdmacm allocates a PD for the
    user.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit f916b9b6bfbcd86b5326d84c0dfa106ddc9c907c
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Tue Jul 1 16:17:30 2014 -0700

    build: Add build support for XRC

    Modify autotools to check for and require a libibverbs version that
    includes XRC and extension support. Remove any code used to support
    older versions of libibverbs.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 0cd1e9b0e7a2d438a0f1004e6c6ff1b6785c4038
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Tue Jul 1 13:30:42 2014 -0700

    librdmacm: Use SRQ in rdma_create_qp

    If an application has allocated an SRQ on an rdma_cm_id, use it when
    creating a QP.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 3e1fc1cfad65c83a05c8550d8e359c8b9223d859
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Wed Jun 25 12:56:18 2014 -0700

    librdmacm: Remove NULL checks after calling alloca

    alloca doesn't return a NULL pointer on failure.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit a34703c53259845dd20450a87eb6747030e23e8b
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Fri Jun 20 17:44:26 2014 -0700

    Revert "rsocket: Change keepalive to 0-byte RDMA write"

    This reverts commit 0f2c76e81ecf1470cf152600c08c421e7e82b00e.

    Testing has shown that this does not always result in the keep-alive
    message working correctly, such that a broken connection is reported
    as having failed. The reason for this behavior is unknown, but revert
    the patch until the issue has been resolved.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 7b1eb6407f1f7a953673ab23a2d75f8a3cd8dbb9
Author: Hal Rosenstock <hal@dev.mellanox.co.il>
Date:   Thu Jun 19 13:08:02 2014 -0400

    librdmacm: In ucma_convert_path, fix selector values

    Intent is for the selectors to be equal to (exactly) rather than less
    than. Selector for exactly is value of 2 rather than 1.

    Signed-off-by: Hal Rosenstock <hal@mellanox.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 7f0fbf984a5140efb76f93fef1f35202c617249d
Author: Hal Rosenstock <hal@dev.mellanox.co.il>
Date:   Thu Jun 19 11:54:11 2014 -0400

    rsocket: Add support for RDMA_ROUTE option in rgetsockopt

    Create as many ibv_path_data structs from the RDMA route
    ibv_sa_path_rec struct for the rsocket based on how many fit into the
    supplied buffer.

    Signed-off-by: Hal Rosenstock <hal@mellanox.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 106899eccc5fa61dd5e69c90bc0651ccd57e725f
Merge: 6c7d6d3 0f2c76e
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Wed Jun 18 11:56:42 2014 -0700

    Merge branch 'dev'

commit 0f2c76e81ecf1470cf152600c08c421e7e82b00e
Author: Susan K. Coulter <markus@cj-fe1.lanl.gov>
Date:   Mon Jun 16 10:28:08 2014 -0700

    rsocket: Change keepalive to 0-byte RDMA write

    Signed-off-by: Susan K. Coulter <markus@cj-fe1.lanl.gov>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 6c7d6d3038524c275ecfb7468b4455fe2cc39a19
Author: Doug Ledford <dledford@redhat.com>
Date:   Wed Jun 18 10:45:23 2014 -0700

    rdma_server: handle IBV_SEND_INLINE correctly

    Not all RDMA devices support IBV_SEND_INLINE. At least some of those
    that don't will ignore the flag passed to rdma_post_send and attempt
    to send the command by using an sge entry instead. Because we don't
    register the send memory, this fails. The proper way to deal with the
    fact that IBV_SEND_INLINE is not guaranteed is to check the returned
    value in our cap struct to see if we have support for inline data,
    and if not, fall back to non-inline sends and to register the send
    memory region.

    Signed-off-by: Doug Ledford <dledford@redhat.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 9fe390a793203a13b0507472848e1e7da8c75bed
Author: Doug Ledford <dledford@redhat.com>
Date:   Wed Jun 18 10:44:49 2014 -0700

    rdma_client: handle IBV_SEND_INLINE correctly

    Not all RDMA devices support IBV_SEND_INLINE. At least some of those
    that don't will ignore the flag passed to rdma_post_send and attempt
    to send the command by using an sge entry instead. Because we don't
    register the send memory, this fails. The proper way to deal with the
    fact that IBV_SEND_INLINE is not guaranteed is to check the returned
    value in our cap struct to see if we have support for inline data,
    and if not, fall back to non-inline sends and to register the send
    memory region.
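
Note: the fallback decision described in the two commits above might be sketched roughly as follows. This is not the sample programs' code. `SKETCH_SEND_INLINE`, `struct send_plan`, and `plan_send` are hypothetical names, and the sketch assumes the caller passes in the inline limit that QP creation reported back (the `max_inline_data` capability) instead of querying a real device.

```c
#include <stddef.h>

/* Hypothetical flag standing in for IBV_SEND_INLINE. */
#define SKETCH_SEND_INLINE 0x1

struct send_plan {
    int flags;         /* send flags to use when posting */
    int need_send_mr;  /* nonzero if the send buffer must be registered */
};

/*
 * Decide how to post a message of msg_len bytes, given the inline
 * limit the device actually granted. With too little inline space,
 * fall back to a registered buffer and a plain (non-inline) send.
 */
static struct send_plan plan_send(size_t max_inline_data, size_t msg_len)
{
    struct send_plan plan = { 0, 1 };

    if (max_inline_data >= msg_len) {
        plan.flags = SKETCH_SEND_INLINE;
        plan.need_send_mr = 0;
    }
    return plan;
}
```

The point of the sketch is that the decision is driven by the value the device reports after QP creation, not by the value the application asked for.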

    Signed-off-by: Doug Ledford <dledford@redhat.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 2c2e44e144f17c2cef4af052ec91a680c9a81fb9
Author: Doug Ledford <dledford@redhat.com>
Date:   Wed Jun 18 10:44:28 2014 -0700

    rdma_server: use perror, unwind allocs on failure

    Our main test function prints out errno directly, which is hard to
    read as it's not decoded at all. Instead, use perror() to make
    failures more readable. Also redo the failure flow so that we can do
    a simple unwind at the end of the function and just jump to the right
    unwind spot on error.

    Signed-off-by: Doug Ledford <dledford@redhat.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 1bc834aeca99a4dd0c5bea733e2735f148b4418c
Author: Doug Ledford <dledford@redhat.com>
Date:   Wed Jun 18 10:44:13 2014 -0700

    rdma_client: use perror, unwind allocs on failure

    Our main test function prints out errno directly, which is hard to
    read as it's not decoded at all. Instead, use perror() to make
    failures more readable. Also redo the failure flow so that we can do
    a simple unwind at the end of the function and just jump to the right
    unwind spot on error.

    Signed-off-by: Doug Ledford <dledford@redhat.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 05fc15b44805a23a4e8562d1953074243950dfbe
Author: Doug Ledford <dledford@redhat.com>
Date:   Wed Jun 18 10:43:04 2014 -0700

    cmtime: rework program to be multithread

    When using very large numbers of connections (10,000 was in use
    here), we ran into a problem where when we resolved a performance
    problem in the kernel cma.c code, we suddenly developed a new
    problem. That new problem turned out to be the fact that with the
    underlying kernel issue resolved, 10,000 connect requests would flood
    the server side of the test and the cmtime application would respond
    as quickly as possible. However, the client side would not bother to
    check any of the returns until after having sent all 10,000 connect
    requests.
    When the kernel had a serializing performance problem, this was OK.
    When it was fixed, this caused a general slowdown in connect
    operations due to overruns in the event processing. This patch causes
    the client side to fire off threads that will handle responses to
    connect requests as they come in instead of allowing them to backlog
    uncontrollably.

    Times for a 10,000 connect run changed from this:

    [root@rdma-dev-01 ~]# more 3.12.0-rc1.cached_gids+optimized_connect+trimmed_cache+.output
    ib1:
    step             total ms     max ms     min us  us / conn
    create id    :      46.64       0.10       1.00       4.66
    bind addr    :      89.61       0.04       7.00       8.96
    resolve addr :      50.63      26.18   23976.00       5.06
    resolve route:     565.44     538.77   26736.00      56.54
    create qp    :    4028.31       5.70     326.00     402.83
    connect      :   50077.42   49990.49   90734.00    5007.74
    disconnect   :    5277.25    4850.35  380017.00     527.72
    destroy      :      42.15       0.04       2.00       4.21
    ib0:
    step             total ms     max ms     min us  us / conn
    create id    :      34.82       0.04       1.00       3.48
    bind addr    :      25.94       0.02       1.00       2.59
    resolve addr :      48.18      25.01   22779.00       4.82
    resolve route:     501.28     476.26   25071.00      50.13
    create qp    :    3274.12       6.05     257.00     327.41
    connect      :   55549.64   55490.32   62150.00    5554.96
    disconnect   :    5263.64    4851.18  375628.00     526.36
    destroy      :      47.20       0.07       2.00       4.72

    to this:

    [root@rdma-dev-01 ~]# more 3.12.0-rc1.cached_gids+optimized_connect+trimmed_cache+-fixed-cmtime.output
    ib1:
    step             total ms     max ms     min us  us / conn
    create id    :      34.45       0.08       1.00       3.44
    bind addr    :      88.41       0.04       7.00       8.84
    resolve addr :      33.59       4.65     612.00       3.36
    resolve route:     618.68       0.61      97.00      61.87
    create qp    :    4024.03       6.30     341.00     402.40
    connect      :    6983.35    6886.33    8509.00     698.33
    disconnect   :    5066.47     230.34     831.00     506.65
    destroy      :      37.02       0.03       2.00       3.70
    ib0:
    step             total ms     max ms     min us  us / conn
    create id    :      42.61       0.14       1.00       4.26
    bind addr    :      27.05       0.03       2.00       2.70
    resolve addr :      40.65      10.73     869.00       4.06
    resolve route:     626.75       0.60     103.00      62.68
    create qp    :    3334.50       6.48     273.00     333.45
    connect      :    6310.29    6251.59   13298.00     631.03
    disconnect   :    5111.12     365.87     867.00     511.11
    destroy      :      36.57       0.02       2.00       3.66

    with this patch.

    Signed-off-by: Doug Ledford <dledford@redhat.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 6551bab0b75b1f2499d97c2384cd3ac723da625f
Author: Hal Rosenstock <hal@mellanox.com>
Date:   Wed Jun 18 09:55:06 2014 -0700

    rsocket: Use malloc instead of calloc

    No need to clear allocated memory as immediately followed by memcpy
    which covers the allocated memory.

    Signed-off-by: Hal Rosenstock <hal@mellanox.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 2a0944dc5e0e64290b8dfca332e6d5645c25b12e
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Tue May 27 11:43:05 2014 -0700

    librdmacm: Update rdma_accept man page

    Document NULL conn_param parameter for rdma_accept.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 386b97e807917a8ca7f6d12d66e34dc9441f7502
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Thu May 22 16:13:08 2014 -0700

    indexer: Free index_map resources when cleared

    Free memory allocated for index map entries when they are no longer
    in use. To handle this, count the number of entries stored by the
    index map item arrays and release the arrays when no items are being
    tracked. This reduces valgrind noise.

    Problem reported by: Hannes Weisbach <hannes_weisbach@gmx.net>

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 397b1a79f077c2fd1ae35be15bc3a7d8918800f1
Author: Patrick MacArthur <pmacarth@iol.unh.edu>
Date:   Tue Apr 29 21:30:08 2014 -0700

    rstream: fix "-T resolve" detection

    Signed-off-by: Patrick MacArthur <pmacarth@iol.unh.edu>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit b3758e215f0abbea0d48996ef9b95f01530a4210
Author: shamir rabinovitch <shamir.rabinovitch@oracle.com>
Date:   Tue Apr 29 19:57:36 2014 -0700

    librdmacm: Fix verbs leak due to reentrancy issue

    Any call to ucma_init_device must be done under lock.
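
Note: a minimal sketch of why per-device initialization needs to run under a lock, assuming a pthread mutex. `struct device`, `dev_lock`, and `device_init` are hypothetical names chosen for illustration and do not reproduce ucma_init_device.

```c
#include <pthread.h>

/* Hypothetical lock guarding device initialization. */
static pthread_mutex_t dev_lock = PTHREAD_MUTEX_INITIALIZER;

struct device {
    int opened;      /* set once resources have been acquired */
    int open_count;  /* how many times resources were acquired */
};

/*
 * Initialize a device at most once. Both the check and the update of
 * 'opened' happen with the lock held, so two callers cannot both see
 * opened == 0 and acquire (and then leak) a second set of resources.
 */
static void device_init(struct device *dev)
{
    pthread_mutex_lock(&dev_lock);
    if (!dev->opened) {
        dev->open_count++;  /* stands in for opening verbs resources */
        dev->opened = 1;
    }
    pthread_mutex_unlock(&dev_lock);
}
```

Calling `device_init` repeatedly, from one thread or several, leaves `open_count` at 1.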

    Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 1291d9c7b52e829057458dad0e0ddd5aa9821a2a
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Wed Apr 16 22:01:51 2014 -0700

    rsocket: Relax requirement for minimal inline data

    Inline data support is optional. Allow rsockets to work with devices
    that do not support inline data, provided that they do support RDMA
    writes with immediate data. This allows rsockets to work over Intel
    TrueScale HCA.

    Patch derived from work by: Amir Hanania

    Signed-off-by: Amir Hanania <amir.hanania@intel.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit dfb5886db5975d209be6b31656c95b0d9c608195
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Wed Apr 16 22:33:38 2014 -0700

    rsocket: Modify when control messages are available

    Rsockets currently tracks how many control messages (i.e. entries in
    the send queue) are available using a single ctrl_avail counter.
    Seems simple enough.

    However, control messages currently require the use of inline data.
    In order to support control messages that do not use inline data, we
    need to associate each control message with a specific data buffer.
    This will become easier to manage if we modify how we track when
    control messages are available.

    We replace the single ctrl_avail counter with two new counters. The
    new counters conceptually treat control messages as if each message
    had its own sequence number. The sequence number will then be able to
    correspond to a specific data buffer in a follow up patch.

    ctrl_seqno will be used to indicate the current control message being
    sent. ctrl_max_seqno will track the highest control message that may
    be sent.

    A side effect of this change is that we will be able to see how many
    control messages have been sent. This also separates the updating of
    the control count on the sending side, versus the receiving side.
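
Note: the two-counter scheme described above could look roughly like the sketch below. Only the counter names `ctrl_seqno` and `ctrl_max_seqno` come from the commit message. `CTRL_SLOTS`, `struct ctrl_state`, and the helper functions are hypothetical. The sketch also assumes that `ctrl_max_seqno` is kept one past the last sequence number that may be sent, and that a sequence number maps to a buffer slot by modulo.

```c
#include <stdint.h>

#define CTRL_SLOTS 4  /* hypothetical number of control send entries */

struct ctrl_state {
    uint32_t ctrl_seqno;      /* sequence number of the next control message */
    uint32_t ctrl_max_seqno;  /* sending stops when ctrl_seqno reaches this */
};

static void ctrl_init(struct ctrl_state *c)
{
    c->ctrl_seqno = 0;
    c->ctrl_max_seqno = CTRL_SLOTS;
}

/* A control message may be sent while the two counters differ. */
static int ctrl_avail(const struct ctrl_state *c)
{
    return c->ctrl_seqno != c->ctrl_max_seqno;
}

/* Sender side: consume a sequence number, return its buffer slot or -1. */
static int ctrl_send(struct ctrl_state *c)
{
    if (!ctrl_avail(c))
        return -1;
    return (int)(c->ctrl_seqno++ % CTRL_SLOTS);
}

/* Completion side: one more control message may now be sent. */
static void ctrl_complete(struct ctrl_state *c)
{
    c->ctrl_max_seqno++;
}
```

Because each side only ever increments its own counter, `ctrl_seqno` doubles as a running total of control messages sent, which matches the side effect the commit mentions.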

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 5ac6f3eab852606575f9affa515ec77b978a001c
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Thu Apr 17 08:37:47 2014 -0700

    rsocket: Dedicate a fixed number of SQEs for control messages

    The number of SQEs allocated for control messages is set to 1 of 2
    constant values (either 4 or 2). A default value is used unless the
    size of the SQ is below a certain threshold (16 entries). This
    results in additional code complexity, and it is highly unlikely that
    the SQ would ever be allocated smaller than 16 entries.

    Simplify the code to use a single constant value for the number of
    SQEs allocated for control messages. This will also help in
    subsequent patches that will need to deal with HCAs that do not
    support inline data.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit d62a52590741da993c5ac3c39c82601c273175d9
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Wed Apr 16 21:42:06 2014 -0700

    rsocket: Check max inline data after creating QP

    The ipath provider will ignore the max_inline_size specified as input
    into ibv_create_qp and instead return the size that it supports
    (which is 0) on output. Update the actual inline size returned from
    create QP, and check that it meets the minimum requirement for
    rsockets.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 8ce5823e02b6a38fd5ed7e11a1bb586847dbcb03
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Tue Apr 29 20:11:35 2014 -0700

    librdmacm: Make ucma_init_all static

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 23ffef06cf462c4c5ac4ec5880b96c8719b64774
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Wed Apr 9 12:19:25 2014 -0700

    librdmacm: Support lazy initialization

    librdmacm currently opens a device context per configured HCA. This
    is usually done in rdma_create_event_channel() or first time whenever
    ucma_init() is called. If a process is only going to use one of the
    configured HCAs/RDMA IPs then the remaining device contexts are not
    used/required.
    Opening a device context on each device a priori limits the maximum
    number of processes that can be supported on a node to the maximum
    number of open contexts supported per HCA, regardless of the number
    of HCAs present in the system.

    Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 984b1e3c189db9d156ea429c1726bd8739893247
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Thu Mar 6 13:42:31 2014 -0800

    rsocket: Fix sbuf_bytes_avail counter 'overrun' with iwarp

    Reported-by: Jonas Pfefferle1 <JPF@zurich.ibm.com>

    "The problem is that on the client side sbuf_bytes_avail overflows in
    rs_poll_cq. And from what I debugged so far there are 2 completions
    for every send and this is because I use iWarp hardware which does
    not support write with immediate so there is one completion for the
    write and one for the send (both go into the default case and add the
    length to sbuf_bytes_avail)."

    To avoid the issue, we flag send message operations that are used in
    place of immediate data. Other send message operations are not
    affected. The completion code can then check whether the completion
    is for a send message which was paired with an RDMA write transaction
    and adjust the behavior accordingly.

    Additionally, such send messages only carry the opcode in their
    WR_ID, with the data portion zeroed. This avoids adding the length
    value twice.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit a2340d891eaa3f8a766a627bb4402ea85bcec6cb
Author: Hal Rosenstock <hal@mellanox.com>
Date:   Wed Mar 5 12:51:54 2014 -0800

    riostream: Add AF_IB support

    Allow the user to specify GID addresses (AF_IB) with riostream

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 8e760a4486f776df4f6728326dc7e8aed4a18971
Author: Hal Rosenstock <hal@mellanox.com>
Date:   Tue Mar 4 17:06:47 2014 -0800

    rsocket: Return EBADF on bad rsocket fd

    Eliminates potential seg faults when passed an invalid rsocket.
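
Note: a minimal sketch of the EBADF pattern, under the assumption that descriptors index a small table. `fd_table`, `fd_lookup`, and `sock_op` are hypothetical names, not the rsocket index map or API. The idea is only that a failed lookup sets `errno` and returns -1 instead of dereferencing a NULL entry.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_FDS 16  /* hypothetical table size */

struct sock_entry {
    int in_use;
};

static struct sock_entry fd_table[MAX_FDS];

/* Return the entry for fd, or NULL if fd is out of range or unused. */
static struct sock_entry *fd_lookup(int fd)
{
    if (fd < 0 || fd >= MAX_FDS || !fd_table[fd].in_use)
        return NULL;
    return &fd_table[fd];
}

/* Any per-socket call checks the lookup result before using it. */
static int sock_op(int fd)
{
    struct sock_entry *entry = fd_lookup(fd);

    if (!entry) {
        errno = EBADF;
        return -1;
    }
    return 0;
}
```

This mirrors the convention of the regular socket calls, where an invalid descriptor yields -1 with `errno` set to `EBADF`.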

    Signed-off-by: Hal Rosenstock <hal@mellanox.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 3c19c968a240a2c50809373f9aa90bdf3454f6b1
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Tue Mar 4 16:59:20 2014 -0800

    man/rsocket: Enhance riomap documentation

    Document that the user must set IOMAPSIZE in order to use the riomap
    call.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 176e6e961d17c51ae1f2dad5a2f50546e3a2ecf4
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Mon Jan 27 12:10:55 2014 -0800

    librdmacm 1.0.18

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit b4603c864860e5e35379458cd1c0a42bb983af59
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Mon Jan 27 11:30:34 2014 -0800

    udaddy: Remove support for port space IB

    UD support for the IB port space requires that the application use
    rdma_create_ep, rather than rdma_create_id. However, using
    rdma_create_ep results in address and route resolution being
    performed synchronously as part of the rdma_create_ep call. Since
    udaddy is an example, we want to show how it can be used with
    asynchronous events. So, rather than update udaddy to use
    rdma_create_ep in order to support the IB port space, it would be
    better to remove that support.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit df7ecde0da9df4af5d8bc3e1ca472e2e5ec9095b
Author: Susan K. Coulter <markus@cj-fe2.lanl.gov>
Date:   Fri Jan 17 14:31:42 2014 -0800

    rsocket: Add keepalive logic

    Actually send and receive keepalive messages if keepalive is enabled
    on an rsocket.

    Signed-off-by: Susan K. Coulter <markus@cj-fe2.lanl.gov>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit da2b7c1cde16df7936273b1ebd38e7c25856c843
Author: Or Gerlitz <ogerlitz@mellanox.com>
Date:   Tue Dec 3 16:51:07 2013 -0800

    librdmacm: Add directives on binding to IPv6 any address to man pages

    Explain how to bind to IPv6 any address in the man pages for the
    examples

    Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit ea5851998c11b8211170179a6d924d4935fec0a1
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Tue Nov 26 13:16:19 2013 -0800

    librdmacm: Check 'init' under mutex

    ucma_ib_init() does a quick check that access to ibacm has been
    initialized. This check is done outside of the acm_lock mutex. We
    need to check init again while holding the mutex to ensure that we
    don't run the initialization code twice.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit b70a390d8bd8a679571f06ab82e42d68a99bc7d2
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Mon Nov 18 13:12:04 2013 -0800

    rping: Fix server reporting error on exit

    Commit e57196c71ddd850e14f3e66355f02786e4914f72
      rping: added checks to the return values functions

    resulted in the rping server always reporting that it failed. Fix
    this by only failing in the case of an unexpected termination, and
    not the result of the client completing.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit c38d43aa2d5dc39dd98f813749dfa496875ad2e1
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Mon Nov 11 10:24:54 2013 -0800

    Retrieve SGID after calling rdma_bind_addr

    A change was made to rdma_bind_addr when AF_IB is enabled to only
    retrieve the resulting bound address. Previously, rdma_bind_addr
    would retrieve the corresponding SGID as well. This breaks some apps
    which were checking the SGID after binding to an IP address.

    Revert to the previous behavior of also retrieving the SGID after
    calling rdma_bind_addr.

    Tested-by: Christoph Lameter <cl@linux.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit faafeac08920a37994da19d72fd7ba1e64281f83
Author: Guy Shapiro <guysh@mellanox.com>
Date:   Tue Nov 5 19:52:20 2013 +0200

    librdmacm: Some fixes to man pages

    Fix the man pages of rdma_destroy_ep & rdma_destroy_qp to the correct
    return value (void).

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 41900ddd3b09ed0625a721b014692b8c5c6f7246
Author: Hal Rosenstock <hal@dev.mellanox.co.il>
Date:   Mon Nov 4 07:56:08 2013 -0500

    [librdmacm] Makefile.am: Add missing riostream man page to man_MANS

    Signed-off-by: Hal Rosenstock <hal@mellanox.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 86520b86ffb45d3caf6e5bd94271f99deef0a5f9
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Fri Aug 16 15:15:12 2013 -0700

    rsockets: Handle race between rshutdown and rpoll

    Multi-threaded applications which call rpoll and rshutdown
    simultaneously can hang.

    Ceph developers reported an issue with the rsocket implementation.
    Ceph calls rpoll in one thread, and while that thread is blocked in
    rpoll, a second thread may call rshutdown on the socket. In normal
    sockets, this results in the poll call unblocking (since a call to
    read on the socket will no longer block). However, rsockets does not
    free the thread blocked on the rpoll call.

    To fix this, we add some additional state checking to protect against
    threads calling rpoll and rshutdown simultaneously. We also have the
    rshutdown call transition the QP into an error state. This causes all
    posted receives to complete as flushed, which results in unblocking
    the thread in rpoll (to process the flushed receives).

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 6152fb2ea9f4e331c63c00810ee4b920e6f1af2d
Author: Hal Rosenstock <hal@dev.mellanox.co.il>
Date:   Wed Sep 11 15:37:11 2013 -0400

    [librdmacm] man/rstream.1: Update man page to be consistent with rstream -h

    Signed-off-by: Hal Rosenstock <hal@mellanox.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 77cab40df7f29bdc718a4a6da74c6145bf81468a
Author: Hal Rosenstock <hal@dev.mellanox.co.il>
Date:   Wed Sep 11 14:44:32 2013 -0400

    [librdmacm] rstream.c: Indicate when specified address family is unknown

    Signed-off-by: Hal Rosenstock <hal@mellanox.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 05ea9d16da8808e464750fa976ba3d6151df0a54
Author: Hal Rosenstock <hal@dev.mellanox.co.il>
Date:   Wed Sep 11 14:44:28 2013 -0400

    [librdmacm] man/rdma_create_id.3: Add RDMA_PS_IB port space description

    Signed-off-by: Hal Rosenstock <hal@mellanox.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit a53376c3c7887c52cf5b311b0b96cfa405a49d31
Author: Yan Droneaud <ydroneaud@opteya.com>
Date:   Tue Aug 27 11:37:54 2013 -0700

    examples: Add cmtime to .gitignore

    Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 78dd0371cdad6bf27e98903ba66cebc01f52f6d5
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Thu Aug 22 15:29:15 2013 -0700

    rsocket: Update rsocket man page

    Update fork support and RDMA_ROUTE socket option.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 5a5ec3458c67b1b431a18a0acbc950ef4e31f87f
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Thu Aug 22 12:00:54 2013 -0700

    cmtime: Add retry support for address and route resolution

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit b031fead061eb0d2874be8f259c84e21433e4505
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Thu Aug 22 11:54:56 2013 -0700

    cmtime: Allow user to specify timeout values

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit afd49dcc2bb13052075e07a7593f6593b43606ce
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Thu Aug 22 11:30:33 2013 -0700

    cmtime: Add ability to time rdma_bind_addr calls

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 2949a92960546b75c647bcf14fec1f4369fd17fa
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Mon Aug 5 10:57:43 2013 -0700

    cmtime: Add example program that times rdma cm calls

    cmtime is a new sample program that measures how long it takes for
    each step in the connection process to complete. It can be used to
    analyze the performance of the various CM steps.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 8fd079abb8b2835908017f74ac70781d84e1e163
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Fri Jul 26 09:52:55 2013 -0700

    rstream: Use rsocket option to set route directly

    If we're using GID addressing, rdma_getaddrinfo can return routing
    data directly. Add an option for the user to indicate that
    rdma_getaddrinfo should be called in place of getaddrinfo. And if
    routing data is available, call rsetsockopt to set the route. This
    helps test rsockets when ibacm and AF_IB support are available.

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>

commit 21c703e5a594283cf119ce1286831df5d1483b34
Author: Sean Hefty <sean.hefty@intel.com>
Date:   Fri Aug 2 14:18:06 2013 -0700

    rsocket: Return 0 on success for SOL_RDMA options

    The processing of SOL_RDMA does not set the return value in the case
    of successfully handled options.
Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit e33755decd339712fc57fbe25bed704d24e8621a Author: Sean Hefty <sean.hefty@intel.com> Date: Mon Jun 10 12:33:20 2013 -0700 rsockets: Add ability to set the IB route directly Add an RDMA specific rsocket option that allows the user to program the RDMA route directly. This is useful for apps that have path record data available, e.g. from ibacm. Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit f77079d79becf4476cb75ea5c816aae70724116e Author: Sean Hefty <sean.hefty@intel.com> Date: Sat Jul 20 19:22:55 2013 -0700 examples: Add support for native IB addressing to samples Allow the user to specify GID addresses (AF_IB) into udaddy and rstream. Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit ca353a3f985135504c429f82bf5a342ec26d11d4 Author: Sean Hefty <sean.hefty@intel.com> Date: Thu Jul 18 13:26:15 2013 -0700 rsockets: Support native IB addressing on connected rsockets Update rsockets to support AF_IB addresses on connected rsockets. Support for datagram rsockets is more difficult as a result of using real UDP sockets for QP resolution, so that support is deferred. For connected sockets, we need to update internal checks to handle AF_IB. Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit a8becf33bbbb363cb2e0f2b45456bc82b345c453 Author: Bart Van Assche <bvanassche@acm.org> Date: Sun Jul 28 11:20:54 2013 +0200 [4/4] Declare 'server_port' as an unsigned variable Change the data type of the 'server_port' variable from signed to unsigned such that the cast in the fscanf() call can be removed. Signed-off-by: Bart Van Assche <bvanassche@acm.org> commit eee05e6604a60b007249f97613d3bb513c07c20d Author: Bart Van Assche <bvanassche@acm.org> Date: Sun Jul 28 11:19:48 2013 +0200 [3/4] rsocket: Remove the unused variable 'ret' The variable 'ret' is assigned a value but that value is never used. 
This triggers the following compiler warning: src/rsocket.c:3720:9: warning: variable 'ret' set but not used [-Wunused-but-set-variable] Hence remove this variable. Signed-off-by: Bart Van Assche <bvanassche@acm.org> commit 9e758e0655242bb02aea5ec28fe4eeac2ec655f7 Author: Bart Van Assche <bvanassche@acm.org> Date: Sun Jul 28 11:19:15 2013 +0200 [2/4] cma: Remove the unused variable 'id_priv' The variable 'id_priv' is assigned a value but is never used. This triggers the following compiler warning: src/cma.c:1178:25: warning: variable 'id_priv' set but not used [-Wunused-but-set-variable] Hence remove this variable. Signed-off-by: Bart Van Assche <bvanassche@acm.org> commit 2a31c855fc95d04370db56de5b35d8271e577f6f Author: Bart Van Assche <bvanassche@acm.org> Date: Sun Jul 28 11:18:36 2013 +0200 [1/4] acm: Remove the unused variable 'pri_path' The variable 'pri_path' is assigned a value but is never used. This triggers the following compiler warning: src/acm.c:301:26: warning: variable 'pri_path' set but not used [-Wunused-but-set-variable] Hence remove this variable. Signed-off-by: Bart Van Assche <bvanassche@acm.org> commit c8be3cfde6902e490fadd6a51206c1bcba3e3aa2 Author: Sean Hefty <sean.hefty@intel.com> Date: Mon Jun 10 10:57:56 2013 -0700 init: Remove USE_IB_ACM configuration option When the librdmacm is configured, it sets the USE_IB_ACM option if infiniband/acm.h is found. We can remove this option with very little overhead, which would allow a user to install ACM after installing the librdmacm, and the librdmacm would be able to make use of ACM. Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit 6efb57780ca142ea4e3b0feebef554849047f79f Author: Sean Hefty <sean.hefty@intel.com> Date: Mon Jun 10 11:07:12 2013 -0700 acm: Define needed ACM protocol messages The librdmacm needs message definitions used to communicate with the ibacm. It currently pulls these from infiniband/acm.h, which is installed by ibacm. This creates an install order dependency on ibacm. 
However, work on the scalable SA has the ibacm using the librdmacm (via rsockets) for communication between the different SSA components. To resolve this issue, have the librdmacm define the message structures that it needs to communicate with ibacm. The librdmacm already defines some ACM messages through configuration checks. We just expand that capability, which isolates the librdmacm package from the ibacm package. Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit c8173d50d1a8c2bbfb0c4459e05d3941175676b2 Author: Sean Hefty <sean.hefty@intel.com> Date: Wed Aug 29 15:02:54 2012 -0700 cmatose: Allow user to specify address format Provide an option for the user to indicate the type of addresses used as input. Support hostname, IPv4, IPv6, and GIDs. Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit 704f54358a1f74229cd9e982b530ca8327c7658e Author: Yann Droneaud <ydroneaud@opteya.com> Date: Tue Jul 16 16:03:42 2013 -0700 Remove executable mode bit on text files Source code and man page should not be executable. Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit 3eb1704b2e11413077933d6d3a963d81d508bdf8 Author: Yann Droneaud <ydroneaud@opteya.com> Date: Tue Jul 16 23:59:52 2013 +0200 Open files with "close on exec" flag Files opened by librdmacm are not supposed to be inherited across exec*(), most of the files are of no use for another program, and others cannot be used without the associated memory mapping. This patch changes fopen(), open(), and socket() to always set the close on exec flag. This patch also adds checks to configure to guess if fopen() supports the "e" flag. If O_CLOEXEC and SOCK_CLOEXEC are supported, fopen() should support "e". If not supported, it's discarded according to POSIX. Many operating systems have support for fopen("e"). You might find more information about close on exec in the following articles: - "Excuse me son, but your code is leaking !!!" 
by Dan Walsh http://danwalsh.livejournal.com/53603.html - "Secure File Descriptor Handling" by Ulrich Drepper http://udrepper.livejournal.com/20407.html Note: this patch won't set close on exec flag on file descriptors created by the kernel for completion channel and such. This is addressed by another kernel patch. Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit d53cd79c3bde6186bda6822a04708b9d2666f8ae Author: Yann Droneaud <ydroneaud@opteya.com> Date: Tue Jul 16 23:59:50 2013 +0200 Add .gitignore rules Add the list of files/patterns to be excluded from git status output. Additionally it will prevent such files/patterns from being added and committed. Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit e9ef6c2e2d8141dd5c32472918b8c087f745524b Author: Yann Droneaud <ydroneaud@opteya.com> Date: Tue Jul 16 23:59:49 2013 +0200 configure: Use automake's option "subdir-objects" Following advice in "Autotool Mythbuster" [1], option subdir-objects can be used to have Makefiles create object files in the same directory as their source files. It reduces clobbering in the build directory. [1] "Autotool Mythbuster", by Diego Elio "Flameeyes" Pettenò http://www.flameeyes.eu/autotools-mythbuster/automake/nonrecursive.html Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit 3edfff79d98f72b754278c854f871c4a22a7ce3c Author: Yann Droneaud <ydroneaud@opteya.com> Date: Tue Jul 16 23:59:48 2013 +0200 configure: Apply updates proposed by autoupdate 'autoupdate' is a tool that helps developers update configure.ac. This patch applies a few fixes as suggested by autoupdate. 
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit f49ac33aaab147e5b126a75565f57e596600f372 Author: Jeff Squyres <jsquyres@cisco.com> Date: Tue Jul 16 23:59:47 2013 +0200 autogen.sh: Use autoreconf in autogen.sh The old sequence of Autotools commands listed in autogen.sh is no longer correct. Instead, just use the single "autoreconf" command, which will invoke all the Right Autotools commands in the correct order. Signed-off-by: Jeff Squyres <jsquyres@cisco.com> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit 9d2f1b068e6fcd62853fe013c7cc4316dcb3fc4b Author: Bart Van Assche <bvanassche@acm.org> Date: Tue Jul 16 23:59:46 2013 +0200 Makefile.am: Fix an automake warning Fix the following automake warning message: Makefile.am:1: `INCLUDES' is the old name for `AM_CPPFLAGS' (or `*_CPPFLAGS') A quote from the automake manual: INCLUDES This does the same job as AM_CPPFLAGS (or any per-target _CPPFLAGS variable if it is used). It is an older name for the same functionality. This variable is deprecated; we suggest using AM_CPPFLAGS and per-target _CPPFLAGS instead. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit 715965b7231cd97d302e24c9e8ac89b2a57a57ab Author: Bart Van Assche <bvanassche@acm.org> Date: Tue Jul 16 23:59:45 2013 +0200 Add "foreign" option to AM_INIT_AUTOMAKE Switch to the modern form of the AM_INIT_AUTOMAKE macro and tell automake that the librdmacm package does not follow the GNU standards. This change makes it possible to use 'autoreconf' for the librdmacm package. 
Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit ef095323918acac8fdc5386ebb7877fb5d34e5e3 Author: Sean Hefty <sean.hefty@intel.com> Date: Thu May 2 13:47:51 2013 -0700 lib: Rename configure.in to configure.ac Update to latest autotools naming. Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit faae8c5db396985a40dc56ad6f82f89a16b8e9f1 Author: Sean Hefty <sean.hefty@intel.com> Date: Thu Apr 11 10:05:29 2013 -0700 rsocket: Add support for iWarp iWarp does not support RDMA writes with immediate data. Instead of sending messages using immediate data, allow the rsocket protocol to exchange messages using sends. The rsocket protocol remains the same. RDMA writes are used for data transfers, with send messages used to transfer rsocket protocol messages. Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit 0d6ca1300d88377ae7f9162457e64c541a4630eb Author: Sean Hefty <sean.hefty@intel.com> Date: Fri Apr 12 14:41:52 2013 -0700 rsocket: Merge usage of wr_id between stream and datagram svcs The rsocket data streaming and datagram services use different formats for the wr_id. Although some differences are needed, we can make them more similar. This will be useful when the wr_id is used for iwarp support, plus eliminates use of wr_id bits that aren't actually needed. Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit e57928b701ded6c5417b5ac0c153a239bf947612 Author: Sean Hefty <sean.hefty@intel.com> Date: Tue Mar 5 17:18:11 2013 -0800 librdmacm: Release 1.0.17 commit 24590bc96d8871d80124d68d182c915d7efcc9e6 Author: Sean Hefty <sean.hefty@intel.com> Date: Tue Feb 19 20:03:58 2013 -0800 librdmacm/rsocket: Fix resetting O_NONBLOCK after calling shutdown Shutdown switches an rsocket from nonblocking to blocking to ensure that all data has been sent. 
After completing all transfers, it should switch back to nonblocking; this handles partial shutdown situations, where only half the connection is shut down. However, the code uses the value of '1' to set the nonblocking flag, rather than O_NONBLOCK. Fix this. Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit be2a2a44663282cda1a60e05c3b85275c732acc6 Author: Sean Hefty <sean.hefty@intel.com> Date: Mon Feb 4 16:52:18 2013 -0800 librdmacm/rstream: Reduce default transfer count 1 million ping-pong transfers take over 3 seconds to complete, and I'm impatient. Reduce the default number of transfers for small messages to speed up running performance tests, especially when running over slower connections, like TCP sockets or over a WAN. Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit 69fadb50636d98de57c9069b83adf6d2c5c77fc6 Author: Sean Hefty <sean.hefty@intel.com> Date: Fri Feb 1 17:17:34 2013 -0800 librdmacm: Work-around kernel bug returning uid = 0 Older kernels have a bug where they can report an event with the uid set to 0. The librdmacm crashes when casting the uid to an rdma_cm_id and dereferencing the NULL pointer. There are a limited number of events where this can occur and in most cases it's safe to simply discard the event. (This is what the kernel does anyway.) However, it's possible for us to process an RDMA_CM_EVENT_ESTABLISHED event with the uid set to 0. (See kernel commit 418edaaba96e58112b15c82b4907084e2a9caf42.) Although it's rare for this to occur, it does in fact happen in practice. To work around the kernel bug, when the uid of an established event is set to 0, we first try to locate the correct user space id based on related data before discarding the event. 
Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit 75e5b5b17d8a478b4fad5d9ee700edb943b050ba Author: Sean Hefty <sean.hefty@intel.com> Date: Mon Jan 28 14:56:25 2013 -0800 librdmacm: Define ucma_ib_init when IB_ACM is disabled ucma_ib_init is only defined if IB_ACM is enabled, which is determined by looking for the infiniband/acm.h header file. Define ucma_ib_init when IB_ACM is disabled. Problem reported by Suresh Shelvapille <suri@baymicrosystems.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit 1f6088f85af3c60ba4d57de1d8f1098e06761237 Author: Sean Hefty <sean.hefty@intel.com> Date: Mon Jan 21 15:28:39 2013 -0800 rsockets: Update rsocket man page Update man page to include recently added rsocket options and undocumented configuration file. Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit 56e1a7cd4904fbfde59adbdfedd5374e5bde2e87 Author: Sean Hefty <sean.hefty@intel.com> Date: Wed Jan 9 14:54:47 2013 -0800 rsockets: Add support for existing UDP apps Support for existing UDP applications is done via the rspreload library. However, when the preload library is loaded, socket calls used by rsockets get intercepted and converted into rsocket calls. The preload library was able to handle this for TCP rsockets by using a per thread variable and checking for recursive calls coming from rsockets back into the preload library. The preload library would direct such calls to the real socket calls. The problem is more complex for UDP rsockets, which can invoke socket calls from an internal rsocket thread. The result is that the preload library intercepts socket calls that originate from the rsocket library which are not recursive. Although this is really a problem with the preload library, the simplest solution is for rsockets to fully initialize the library when allocating the first rsocket, versus deferring initialization until required. The preload library can then detect the recursive calls. 
Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit 6047e1991e95b96b1992f39a466457e584c01226 Author: Sean Hefty <sean.hefty@intel.com> Date: Wed Dec 5 15:58:03 2012 -0800 examples/udpong: Add test program for rsocket datagrams Add a sample test program to test datagram rsockets. Move common routines used by udpong and other test programs into a common source file. Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit e6e93ed4231976eeab707b31e283be0a7acff6db Author: Sean Hefty <sean.hefty@intel.com> Date: Fri Nov 9 10:26:38 2012 -0800 rsocket: Add datagram support Add datagram support through the rsocket API. Datagram support is handled through an entirely different protocol and internal implementation than streaming sockets. Unlike connected rsockets, datagram rsockets are not necessarily bound to a network (IP) address. A datagram socket may use any number of network (IP) addresses, including those which map to different RDMA devices. As a result, a single datagram rsocket must support using multiple RDMA devices and ports, and a datagram rsocket references a single UDP socket, plus zero or more UD QPs. Rsockets uses headers inserted before user data sent over UDP sockets to resolve remote UD QP numbers. When a user first attempts to send a datagram to a remote address (IP and UDP port), rsockets will take the following steps: 1. Store the destination address into a lookup table. 2. Resolve which local network address should be used when sending to the specified destination. 3. Allocate a UD QP on the RDMA device associated with the local address. 4. Send the user's datagram to the remote UDP socket. A header is inserted before the user's datagram. The header specifies the UD QP number associated with the local network address (IP and UDP port) of the send. A service thread is used to process messages received on the UDP socket. This thread updates the rsocket lookup tables with the remote QPN and path record data. 
The service thread forwards data received on the UDP socket to an rsocket QP. After the remote QPN and path records have been resolved, datagram communication between two nodes is done over the UD QP. Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit c6bfc1c5b15e6207188a97e8a5df0405cfd2587f Author: Or Gerlitz <ogerlitz@mellanox.com> Date: Sun Dec 2 12:04:23 2012 +0000 [librdmacm] Fixed build problem due to missing macro rsocket.c failed to compile because of a missing definition for the container_of macro; fix it. Reported-by: Eyal Salamon <esalomon@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit ab0d488c1e3ba7658f61a4d8da022b5afc17737f Author: Sean Hefty <sean.hefty@intel.com> Date: Mon Nov 5 11:53:03 2012 -0800 rsocket: Remove fscanf build warnings Cast fscanf return values to (void) to indicate that we don't care if the call fails. In the case of a failure, we simply fall back to using default values. Problem reported by Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit 7d92d0106f50e0371256e74863963a0e2e99a5c8 Author: Sean Hefty <sean.hefty@intel.com> Date: Wed Oct 24 10:23:52 2012 -0700 riostream: Add example program for using iomap routines. riostream is based on rstream, but uses the new riomap, riounmap, and riowrite calls instead. It runs a series of latency and bandwidth tests using remote iomapped memory. riostream is limited to using zero copy transfers at the receiving side only at this time. Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit bb9fcba81acdfe34ea5df3bb23a45e0a486207da Author: Sean Hefty <sean.hefty@intel.com> Date: Sun Oct 21 14:16:03 2012 -0700 rsocket: Add APIs for direct data placement We introduce rsocket extensions for supporting direct data placement (also known as zero copy). Direct data placement avoids data copies into network buffers when sending or receiving data. 
This patch implements zero copies on the receive side, but adds some basic framework for supporting it on the sending side. Integrating zero copy support into the existing socket APIs is difficult to achieve when the sockets are set as nonblocking. Any such implementation is likely to be unusable in practice. The problem stems from the fact that socket operations are synchronous in nature. Support for asynchronous operations is limited to connection establishment. Therefore we introduce new calls to handle direct data placement. The use of the new calls is optional and does not affect the use of the existing calls. An attempt is made to have the new routines integrate naturally with the existing APIs. The new functions are: riomap, riounmap, and riowrite. The basic operation can be described as follows: 1. App A calls riomap to register a data buffer with the local RDMA device. Riomap returns an off_t offset value that corresponds to the registered data buffer. The app may select the offset value. 2. Rsockets will transmit an internal message to the remote peer with information about the registration. This exchange is hidden from the applications. 3. App A sends a notification message to app B indicating that the remote iomapped buffer is now available to receive data. 4. App B calls riowrite to transmit data directly into the riomapped data buffer. 5. App B sends a notification message to app A indicating that data is available in the mapped buffer. 6. After all transfers are complete, app A calls riounmap to deregister its data buffer. Riomap and riounmap are functionally equivalent to RDMA memory registration and deregistration routines. They are loosely based on the mmap and munmap APIs. off_t riomap(int socket, void *buf, size_t len, int prot, int flags, off_t offset) Riomap registers an application buffer with the RDMA hardware associated with an rsocket. 
The buffer is registered either for local only access (PROT_NONE) or for remote write access (PROT_WRITE). When registered for remote access, the buffer is mapped to a given offset. The offset is either provided by the user, or if the user selects -1 for the offset, rsockets selects one. The remote peer may access an iomapped buffer directly by specifying the correct offset. The mapping is not guaranteed to be available until after the remote peer receives a data transfer initiated after riomap has completed. int riounmap(int socket, void *buf, size_t len) Riounmap removes the mapping between a buffer and an rsocket. size_t riowrite(int socket, const void *buf, size_t count, off_t offset, int flags) Riowrite allows an application to transfer data over an rsocket directly into a remotely iomapped buffer. The remote buffer is specified through an offset parameter, which corresponds to a remote iomapped buffer. From the sender's perspective, riowrite behaves similar to rwrite. From a receiver's view, riowrite transfers are silently redirected into a pre- determined data buffer. Data is received automatically, and the receiver is not informed of the transfer. However, iowrite data is still considered part of the data stream, such that iowrite data will be written before a subsequent transfer is received. A message sent immediately after initiating an iowrite may be used to notify the receiver of the iowrite. It should be noted that the current implementation primarily focused on being functional for evaluation purposes. Some checks have been deferred for subsequent patches, and performance is currently limited by linear lookups. 
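The offset-based addressing described above (a buffer mapped at an offset, writes redirected into it, and lookups done linearly) can be modeled in a few lines of plain C. This is a toy, single-process sketch and not the librdmacm implementation: every `demo_` name is hypothetical, there is no RDMA registration or network transfer, and details such as overlap checks or rsockets choosing the offset when the caller passes -1 are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>   /* off_t */

#define DEMO_MAX_MAPS 4

/* One mapped buffer: a local address range exposed at a given offset. */
struct demo_iomap {
    void  *buf;
    size_t len;
    off_t  offset;
    int    used;
};

struct demo_peer {
    struct demo_iomap maps[DEMO_MAX_MAPS];
};

/* Record buf at the caller-chosen offset; returns the offset, or -1 if full. */
static off_t demo_map(struct demo_peer *p, void *buf, size_t len, off_t offset)
{
    assert(p && buf && len > 0);
    for (int i = 0; i < DEMO_MAX_MAPS; i++) {
        if (!p->maps[i].used) {
            p->maps[i] = (struct demo_iomap){ buf, len, offset, 1 };
            return offset;
        }
    }
    return -1;
}

/* Linear lookup of the mapping covering [offset, offset + count), then copy.
 * Returns the number of bytes written, or 0 if no mapping covers the range. */
static size_t demo_write(struct demo_peer *p, const void *src, size_t count,
                         off_t offset)
{
    for (int i = 0; i < DEMO_MAX_MAPS; i++) {
        struct demo_iomap *m = &p->maps[i];

        if (!m->used || offset < m->offset)
            continue;
        size_t delta = (size_t)(offset - m->offset);
        if (delta <= m->len && count <= m->len - delta) {
            memcpy((char *)m->buf + delta, src, count);
            return count;
        }
    }
    return 0;
}

/* Drop the mapping for buf/len; returns 0 on success, -1 if not found. */
static int demo_unmap(struct demo_peer *p, void *buf, size_t len)
{
    for (int i = 0; i < DEMO_MAX_MAPS; i++) {
        if (p->maps[i].used && p->maps[i].buf == buf && p->maps[i].len == len) {
            p->maps[i].used = 0;
            return 0;
        }
    }
    return -1;
}
```

In this model the "receiver" never sees a call when data lands in its buffer, which mirrors why the commit message recommends a follow-up message to notify the receiver of the write.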
Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit d2e96e99bf1fc3d14e33c741502cb689c810a27b Author: Roland Dreier <roland@purestorage.com> Date: Tue Oct 16 19:44:39 2012 +0000 rdma_xserver/client: Fix man page formatting Putting 'r' at the beginning of a line in the nroff source for man pages is confusing to nroff because lines that start with a single quote character ' or a dot character . are treated as control lines, which is not what's intended here. Some of the man page text ends up left out of the formatted output. Fix this by just wrapping the text slightly differently in the source (which doesn't matter since nroff reflows the text anyway). Also add a missing ".TP" so that the -p and -c options are not run together in the formatted output. Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit 507cc241e8b212c3cf3ed0ffb04e37095bbf8bb3 Author: Sean Hefty <sean.hefty@intel.com> Date: Mon Oct 8 10:33:21 2012 -0700 librdmacm: Disable ACM support if ibacm.port is not found The librdmacm will try to connect to port 6125 if ibacm.port is not found. The problem is that some other service or application could be using that port and respond with garbage. Rather than falling back to a hard coded port number, if ibacm.port is not found, simply disable ACM support. This has the effect of removing support for older versions of ibacm, unless the port file is created manually. Patch created based on feedback from Doug Ledford and Florian Weimer from RedHat. Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit e57196c71ddd850e14f3e66355f02786e4914f72 Author: Dotan Barak <dotanb@dev.mellanox.co.il> Date: Tue Oct 9 12:27:52 2012 +0000 [5/5,librdmacm] rping: added checks to the return values of functions This makes rping exit with a nonzero return value in case of an error. 
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit 6c56dc404c999daa16a039f59b0160ab983acc98 Author: Dotan Barak <dotanb@dev.mellanox.co.il> Date: Tue Oct 9 12:27:51 2012 +0000 [4/5,librdmacm] rstream: added missing return if accept() failed Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit 41d6547bede80581b384b49bb35eac4fe089d08c Author: Dotan Barak <dotanb@dev.mellanox.co.il> Date: Tue Oct 9 12:27:50 2012 +0000 [3/5,librdmacm] rstream: initialize return value in server_connect() If use_async == 0 and rs_accept() passes (i.e. non negative value), then the return value from the function was uninitialized. Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit 1f1a03dae14cbb25a43b1b56aa5ae689776edc11 Author: Dotan Barak <dotanb@dev.mellanox.co.il> Date: Tue Oct 9 12:27:49 2012 +0000 [2/5,librdmacm] rsocket: added missing break Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit eddbe8f0abc3d0f69755f0e510df2a7f21412c0b Author: Dotan Barak <dotanb@dev.mellanox.co.il> Date: Tue Oct 9 12:27:48 2012 +0000 [1/5,librdmacm] rsocket: add missing va_end() after calling va_start() Not doing so may lead to a resource leak. 
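The rule behind the va_end() fix above is that every va_start() in a function must be matched by a va_end() before that function returns, on every path. A minimal sketch of a variadic, fcntl-style function that keeps the pairing by routing all cases through a single exit; the `demo_` names and command values are hypothetical and are not taken from rsocket.c.

```c
#include <stdarg.h>

/* Hypothetical commands, for illustration only. */
enum { DEMO_GETFL = 1, DEMO_SETFL = 2 };

static int demo_flags;

/* Variadic control call: the optional argument is only read for DEMO_SETFL. */
static int demo_fcntl(int cmd, ...)
{
    va_list args;
    int ret = 0;

    va_start(args, cmd);
    switch (cmd) {
    case DEMO_GETFL:
        ret = demo_flags;
        break;
    case DEMO_SETFL:
        demo_flags = va_arg(args, int);
        break;
    default:
        ret = -1;           /* unknown command; still falls through to va_end */
        break;
    }
    va_end(args);           /* reached on every path, including errors */
    return ret;
}
```

Returning directly from inside the switch would skip va_end(), which is the kind of omission the commit describes.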
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit 8a92d0c3c8ce5f513dff974912143f6b0283f8e3 Author: Sean Hefty <sean.hefty@intel.com> Date: Thu Oct 4 12:01:50 2012 -0700 ucmatose: Remove connect parameter passed into rdma_accept Pass in NULL for conn_param into rdma_accept to indicate that the passive side will use the values specified by the active side. Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit 714af39b2bc2cc54dd2391a0df2c7e54856bc9c7 Author: Sean Hefty <sean.hefty@intel.com> Date: Thu Oct 4 11:49:59 2012 -0700 ucmatose: Fix number of connections to disconnect When ucmatose aborts because of issues trying to connect to the server, it moves to disconnecting all connections. However, not all connections may have been established. The result is that ucmatose will hang in disconnect_events. Fix this by setting the number of times that we need to disconnect to the number of times that we successfully connect. This problem is based on a report by Doug Ledford <dledford@redhat.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> commit 860b1a8784f1846be759eec46770cc723991479c Author: Sean Hefty <sean.hefty@intel.com> Date: Wed Oct 3 15:05:20 2012 -0700 rping: Reduce retry_count to fit in 3-bits retry_count is a 3 bit value on IB, reduce it from 10 to 7. A value of 10 prevents rping from working over the Intel IB HCA. Problem reported by Doug Ledford <dledford@redhat.com> The retry_count is also not set when calling rdma_accept. Rather than passing different values into rdma_accept than what was specified by the remote side, use the values given in the connection request. Signed-off-by: …
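The retry_count limit in the rping commit above comes down to field width: a 3-bit field holds values 0 through 7, so 10 cannot be represented. A small sketch, with hypothetical `demo_` helpers, that contrasts clamping to the largest valid value with the bit truncation that would happen if the value were simply masked into 3 bits. The truncation helper only illustrates the arithmetic; it is not a claim about how any particular HCA or driver treats an out-of-range value.

```c
#include <stdint.h>

#define DEMO_RETRY_BITS 3
#define DEMO_RETRY_MAX  ((1u << DEMO_RETRY_BITS) - 1)   /* 7 */

/* Clamp a requested retry count to the largest value a 3-bit field holds. */
static uint8_t demo_clamp_retry(unsigned int requested)
{
    return (uint8_t)(requested > DEMO_RETRY_MAX ? DEMO_RETRY_MAX : requested);
}

/* Keep only the low 3 bits, as a plain mask into the field would. */
static uint8_t demo_truncate_retry(unsigned int requested)
{
    return (uint8_t)(requested & DEMO_RETRY_MAX);
}
```

For a requested value of 10 (binary 1010), clamping yields 7 while masking yields 2, so the two approaches disagree as soon as the input exceeds the field.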