This repository has been archived by the owner on Sep 5, 2023. It is now read-only.

After fork, child process will crash in close invoking #751

Closed
qianlong-ql opened this issue Jan 21, 2021 · 24 comments

@qianlong-ql

qianlong-ql commented Jan 21, 2021

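#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
#include <infiniband/verbs.h>
#include <librpma.h>

/* USAGE_STR and malloc_aligned() are assumed to be defined elsewhere in server.c (not shown). */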

int main (int argc, char *argv[]) {
    if (argc < 3) {
        fprintf(stderr, USAGE_STR, argv[0]);
        exit(-1);
    }
    ibv_fork_init(); //to support fork

    char *addr = argv[1];
    char *port = argv[2];
    int ret;

    struct rpma_peer *peer = NULL;
    void *dst_ptr;
    struct rpma_mr_local *dst_mr;
    struct ibv_context *dev = NULL;

    ret = rpma_utils_get_ibv_context(addr, RPMA_UTIL_IBV_CONTEXT_LOCAL, &dev);
    if (ret)
        return ret;

    rpma_peer_new(dev, &peer);

    size_t dst_size = 320;
    dst_ptr = malloc_aligned(dst_size);
    rpma_mr_reg(peer, dst_ptr, dst_size, RPMA_MR_USAGE_READ_DST,
                      &dst_mr);

    int pid = fork();
    if (pid == 0) {
        FILE *fp = fopen("tmpfile", "w");
        fclose(fp);  //crash here
        _exit(0);
    } else if (pid < 0) {
        _exit(0);
    } else {
        int status;
        waitpid(pid, &status, 0);
    }

    return 0;
}

The code above calls ibv_fork_init() to support fork(). After the fork, the child process crashes in fclose(). The crash does not happen if either ibv_fork_init() or rpma_mr_reg() is not called. I also found that it is safe if ibv_fork_init() and rpma_mr_reg() are called in different threads.

The backtrace looks like this:
#0 0x00007f9f0d2dd81d in _int_free () from /lib64/libc.so.6
#1 0x00007f9f0d2ca047 in fclose@@GLIBC_2.2.5 () from /lib64/libc.so.6
#2 0x0000000000402198 in main (argc=-1, argv=0x800076)
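
One possible explanation, which nothing in this report confirms: with ibv_fork_init() enabled, libibverbs marks the pages backing a registered memory region with madvise(MADV_DONTFORK), so those pages are not mapped in the child after fork(). If a small buffer such as the 320-byte dst_ptr shares a page with other heap data, the child would fault as soon as the allocator touches that page, for example inside fclose(). The sketch below reproduces only that mechanism with plain madvise() and no RDMA libraries. child_signal_after_fork() is a hypothetical helper name, and Linux is assumed.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of librpma or libibverbs): map one anonymous
 * page, optionally mark it MADV_DONTFORK, fork, and let the child read it.
 * Returns the signal that killed the child, 0 if the child exited on its own,
 * or -1 on a setup error.
 */
static int child_signal_after_fork(int mark_dontfork)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *map = mmap(NULL, page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (map == MAP_FAILED)
        return -1;

    volatile char *buf = map;
    buf[0] = 42;

    if (mark_dontfork && madvise(map, page, MADV_DONTFORK) != 0) {
        munmap(map, page);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        munmap(map, page);
        return -1;
    }
    if (pid == 0) {
        /* With MADV_DONTFORK the page is not mapped here, so this read faults. */
        _exit(buf[0] == 42 ? 0 : 1);
    }

    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    munmap(map, page);
    if (waited < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On Linux, child_signal_after_fork(0) is expected to return 0 and child_signal_after_fork(1) to return SIGSEGV. Whether this is what happens in the report above is an assumption; it would have to be checked against the actual address of dst_ptr and the heap layout.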

@grom72
Contributor

grom72 commented Jan 26, 2021

Hi @qianlong-ql,
Thank you for reporting this issue.
It looks like the problem is in some library beneath librpma, as we cannot reproduce it in our environment.
Could you provide your environment details: OS, kernel, and libibverbs versions?

@grom72
Contributor

grom72 commented Jan 26, 2021

Hi @qianlong-ql
Could you also check how your application behaves when the libibverbs API is used directly:

	struct ibv_pd *pd = ibv_alloc_pd(dev);
	if (pd == NULL) {
		fprintf(stderr, "ibv_alloc_pd returned an error\n");
		return -1;
	}
	struct ibv_mr *ibv_mr_ptr = ibv_reg_mr(pd, dst_ptr, dst_size,
			IBV_ACCESS_LOCAL_WRITE);
	if (ibv_mr_ptr == NULL) {
		fprintf(stderr, "ibv_reg_mr returned an error\n");
		return -1;
	}

instead of rpma_peer_new() and rpma_mr_reg()?
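
When trying the direct libibverbs variant, it may also be worth ruling out a registered buffer that shares its page with unrelated heap data. The sketch below allocates a buffer that starts on a page boundary and spans a whole number of pages. alloc_whole_pages() is a hypothetical helper, not part of librpma or libibverbs, and its behaviour may differ from the malloc_aligned() used in the report, whose definition is not shown.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: allocate a zeroed buffer that starts on a page
 * boundary and whose length is rounded up to a whole number of pages.
 * The rounded length is stored in *alloc_size. Returns NULL on failure.
 */
static void *alloc_whole_pages(size_t size, size_t *alloc_size)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || size == 0 || alloc_size == NULL)
        return NULL;

    size_t psz = (size_t)page;
    if (size > SIZE_MAX - (psz - 1))
        return NULL; /* rounding up would overflow */
    size_t rounded = (size + psz - 1) / psz * psz;

    void *ptr = NULL;
    if (posix_memalign(&ptr, psz, rounded) != 0)
        return NULL;

    memset(ptr, 0, rounded);
    *alloc_size = rounded;
    return ptr;
}
```

The returned pointer and the rounded length could then be passed to ibv_reg_mr() in place of dst_ptr and dst_size, and the buffer released with free() after deregistration. If the crash disappears with such a buffer, that would point at page sharing rather than at librpma itself.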

@qianlong-ql
Author

I tried replacing rpma_peer_new() and rpma_mr_reg() with ibv_alloc_pd() and ibv_reg_mr(); it also crashes in fclose().
My current environment:
nic-drivers-mellanox-rdma-2.0.1fib6fix-1.noarch
nic-libs-mellanox-rdma-2.0.1-2.x86_64
os : Linux iz8vb4s0jlsfk3k00ro17az 3.10.0-693.5.2.el7.ecs.2.x86_64 #1 SMP Fri Jul 13 12:21:44 CST 2018 x86_64 x86_64 x86_64 GNU/Linux

I tried updating the RPMs to nic-libs-mellanox-rdma-3.0.2-1.x86_64 and nic-drivers-mellanox-rdma-3.0.2-10.noarch, but that did not help.

@grom72
Contributor

grom72 commented Jan 26, 2021

Do we know that ibv_fork_init() and rpma_mr_reg() both return 0?

@qianlong-ql
Author

Do we know that ibv_fork_init() and rpma_mr_reg() both return 0?

Yes, both return 0.

@grom72
Contributor

grom72 commented Jan 26, 2021

Can you run this code under valgrind-memcheck?

@qianlong-ql
Author

It does not crash when run under valgrind. Here is the output; I don't see any useful information in it.

valgrind --leak-check=full --trace-children=yes --undef-value-errors=no --track-fds=yes --tool=memcheck ./server 200.1.15.2 1
==112158== Memcheck, a memory error detector
==112158== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==112158== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==112158== Command: ./server 200.1.15.2 1
==112158== 
==112221== 
==112221== FILE DESCRIPTORS: 4 open at exit.
==112221== Open file descriptor 4: /dev/infiniband/uverbs0
==112221==    at 0x5346A30: __open_nocancel (in /usr/lib64/libc-2.17.so)
==112221==    by 0x504ACE6: ibv_open_device@@IBVERBS_1.1 (device.c:604)
==112221==    by 0x5627846: ucma_open_device (cma.c:296)
==112221==    by 0x5627846: ucma_init_device.part.2 (cma.c:314)
==112221==    by 0x5627A78: ucma_init_device (cma.c:459)
==112221==    by 0x5627A78: ucma_get_device (cma.c:454)
==112221==    by 0x5627B67: ucma_query_addr (cma.c:697)
==112221==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==112221==    by 0x5628175: rdma_bind_addr (cma.c:884)
==112221==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==112221==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==112221==    by 0x402140: main (server.c:405)
==112221== 
==112221== Open file descriptor 2: /dev/pts/0
==112221==    <inherited from parent>
==112221== 
==112221== Open file descriptor 1: /dev/pts/0
==112221==    <inherited from parent>
==112221== 
==112221== Open file descriptor 0: /dev/pts/0
==112221==    <inherited from parent>
==112221== 
==112221== 
==112221== HEAP SUMMARY:
==112221==     in use at exit: 224,951 bytes in 117 blocks
==112221==   total heap usage: 217 allocs, 100 frees, 373,872 bytes allocated
==112221== 
==112221== 48 bytes in 1 blocks are possibly lost in loss record 18 of 44
==112221==    at 0x4C2C089: calloc (vg_replace_malloc.c:762)
==112221==    by 0x504A676: __ibv_exp_use_priv_env (device.c:548)
==112221==    by 0x504B06B: ibv_exp_use_priv_env (verbs_exp.h:3196)
==112221==    by 0x504B06B: ibv_open_device@@IBVERBS_1.1 (device.c:694)
==112221==    by 0x5627846: ucma_open_device (cma.c:296)
==112221==    by 0x5627846: ucma_init_device.part.2 (cma.c:314)
==112221==    by 0x5627A78: ucma_init_device (cma.c:459)
==112221==    by 0x5627A78: ucma_get_device (cma.c:454)
==112221==    by 0x5627B67: ucma_query_addr (cma.c:697)
==112221==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==112221==    by 0x5628175: rdma_bind_addr (cma.c:884)
==112221==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==112221==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==112221==    by 0x402140: main (server.c:405)
==112221== 
==112221== 77 bytes in 1 blocks are definitely lost in loss record 24 of 44
==112221==    at 0x4C2C291: realloc (vg_replace_malloc.c:836)
==112221==    by 0x52CEC7A: vasprintf (in /usr/lib64/libc-2.17.so)
==112221==    by 0x52AA4F6: asprintf (in /usr/lib64/libc-2.17.so)
==112221==    by 0x504BBB9: load_driver (init.c:254)
==112221==    by 0x504BBB9: load_drivers (init.c:311)
==112221==    by 0x504C571: ibverbs_get_device_list (init.c:625)
==112221==    by 0x5049F2F: count_devices (device.c:97)
==112221==    by 0x584CE6F: pthread_once (in /usr/lib64/libpthread-2.17.so)
==112221==    by 0x504A82B: ibv_get_device_list@@IBVERBS_1.1 (device.c:122)
==112221==    by 0x562A035: ucma_init (cma.c:249)
==112221==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==112221==    by 0x4E3C40B: rpma_info_new (info.c:53)
==112221==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==112221== 
==112221== 204 bytes in 24 blocks are possibly lost in loss record 31 of 44
==112221==    at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==112221==    by 0x52E3809: strdup (in /usr/lib64/libc-2.17.so)
==112221==    by 0x504A3FA: vsetenv (device.c:427)
==112221==    by 0x504A729: clone_env (device.c:477)
==112221==    by 0x504A729: __ibv_exp_use_priv_env (device.c:559)
==112221==    by 0x504B06B: ibv_exp_use_priv_env (verbs_exp.h:3196)
==112221==    by 0x504B06B: ibv_open_device@@IBVERBS_1.1 (device.c:694)
==112221==    by 0x5627846: ucma_open_device (cma.c:296)
==112221==    by 0x5627846: ucma_init_device.part.2 (cma.c:314)
==112221==    by 0x5627A78: ucma_init_device (cma.c:459)
==112221==    by 0x5627A78: ucma_get_device (cma.c:454)
==112221==    by 0x5627B67: ucma_query_addr (cma.c:697)
==112221==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==112221==    by 0x5628175: rdma_bind_addr (cma.c:884)
==112221==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==112221==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==112221==    by 0x402140: main (server.c:405)
==112221== 
==112221== 576 bytes in 24 blocks are possibly lost in loss record 36 of 44
==112221==    at 0x4C2C089: calloc (vg_replace_malloc.c:762)
==112221==    by 0x504A3E6: vsetenv (device.c:423)
==112221==    by 0x504A729: clone_env (device.c:477)
==112221==    by 0x504A729: __ibv_exp_use_priv_env (device.c:559)
==112221==    by 0x504B06B: ibv_exp_use_priv_env (verbs_exp.h:3196)
==112221==    by 0x504B06B: ibv_open_device@@IBVERBS_1.1 (device.c:694)
==112221==    by 0x5627846: ucma_open_device (cma.c:296)
==112221==    by 0x5627846: ucma_init_device.part.2 (cma.c:314)
==112221==    by 0x5627A78: ucma_init_device (cma.c:459)
==112221==    by 0x5627A78: ucma_get_device (cma.c:454)
==112221==    by 0x5627B67: ucma_query_addr (cma.c:697)
==112221==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==112221==    by 0x5628175: rdma_bind_addr (cma.c:884)
==112221==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==112221==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==112221==    by 0x402140: main (server.c:405)
==112221== 
==112221== 1,935 bytes in 24 blocks are possibly lost in loss record 40 of 44
==112221==    at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==112221==    by 0x52E3809: strdup (in /usr/lib64/libc-2.17.so)
==112221==    by 0x504A40F: vsetenv (device.c:431)
==112221==    by 0x504A729: clone_env (device.c:477)
==112221==    by 0x504A729: __ibv_exp_use_priv_env (device.c:559)
==112221==    by 0x504B06B: ibv_exp_use_priv_env (verbs_exp.h:3196)
==112221==    by 0x504B06B: ibv_open_device@@IBVERBS_1.1 (device.c:694)
==112221==    by 0x5627846: ucma_open_device (cma.c:296)
==112221==    by 0x5627846: ucma_init_device.part.2 (cma.c:314)
==112221==    by 0x5627A78: ucma_init_device (cma.c:459)
==112221==    by 0x5627A78: ucma_get_device (cma.c:454)
==112221==    by 0x5627B67: ucma_query_addr (cma.c:697)
==112221==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==112221==    by 0x5628175: rdma_bind_addr (cma.c:884)
==112221==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==112221==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==112221==    by 0x402140: main (server.c:405)
==112221== 
==112221== 2,816 bytes in 1 blocks are possibly lost in loss record 42 of 44
==112221==    at 0x4C2C089: calloc (vg_replace_malloc.c:762)
==112221==    by 0x608EE22: mlx5_alloc_context (mlx5.c:984)
==112221==    by 0x504AD81: ibv_open_device@@IBVERBS_1.1 (device.c:631)
==112221==    by 0x5627846: ucma_open_device (cma.c:296)
==112221==    by 0x5627846: ucma_init_device.part.2 (cma.c:314)
==112221==    by 0x5627A78: ucma_init_device (cma.c:459)
==112221==    by 0x5627A78: ucma_get_device (cma.c:454)
==112221==    by 0x5627B67: ucma_query_addr (cma.c:697)
==112221==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==112221==    by 0x5628175: rdma_bind_addr (cma.c:884)
==112221==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==112221==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==112221==    by 0x402140: main (server.c:405)
==112221== 
==112221== 202,520 bytes in 1 blocks are possibly lost in loss record 44 of 44
==112221==    at 0x4C2C089: calloc (vg_replace_malloc.c:762)
==112221==    by 0x504AD1D: ibv_open_device@@IBVERBS_1.1 (device.c:616)
==112221==    by 0x5627846: ucma_open_device (cma.c:296)
==112221==    by 0x5627846: ucma_init_device.part.2 (cma.c:314)
==112221==    by 0x5627A78: ucma_init_device (cma.c:459)
==112221==    by 0x5627A78: ucma_get_device (cma.c:454)
==112221==    by 0x5627B67: ucma_query_addr (cma.c:697)
==112221==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==112221==    by 0x5628175: rdma_bind_addr (cma.c:884)
==112221==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==112221==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==112221==    by 0x402140: main (server.c:405)
==112221== 
==112221== LEAK SUMMARY:
==112221==    definitely lost: 77 bytes in 1 blocks
==112221==    indirectly lost: 0 bytes in 0 blocks
==112221==      possibly lost: 208,099 bytes in 75 blocks
==112221==    still reachable: 16,775 bytes in 41 blocks
==112221==         suppressed: 0 bytes in 0 blocks
==112221== Reachable blocks (those to which a pointer was found) are not shown.
==112221== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==112221== 
==112221== For lists of detected and suppressed errors, rerun with: -s
==112221== ERROR SUMMARY: 7 errors from 7 contexts (suppressed: 0 from 0)
==112158== 
==112158== FILE DESCRIPTORS: 3 open at exit.
==112158== Open file descriptor 2: /dev/pts/0
==112158==    <inherited from parent>
==112158== 
==112158== Open file descriptor 1: /dev/pts/0
==112158==    <inherited from parent>
==112158== 
==112158== Open file descriptor 0: /dev/pts/0
==112158==    <inherited from parent>
==112158== 
==112158== 
==112158== HEAP SUMMARY:
==112158==     in use at exit: 14,635 bytes in 26 blocks
==112158==   total heap usage: 216 allocs, 190 frees, 373,304 bytes allocated
==112158== 
==112158== 77 bytes in 1 blocks are definitely lost in loss record 12 of 22
==112158==    at 0x4C2C291: realloc (vg_replace_malloc.c:836)
==112158==    by 0x52CEC7A: vasprintf (in /usr/lib64/libc-2.17.so)
==112158==    by 0x52AA4F6: asprintf (in /usr/lib64/libc-2.17.so)
==112158==    by 0x504BBB9: load_driver (init.c:254)
==112158==    by 0x504BBB9: load_drivers (init.c:311)
==112158==    by 0x504C571: ibverbs_get_device_list (init.c:625)
==112158==    by 0x5049F2F: count_devices (device.c:97)
==112158==    by 0x584CE6F: pthread_once (in /usr/lib64/libpthread-2.17.so)
==112158==    by 0x504A82B: ibv_get_device_list@@IBVERBS_1.1 (device.c:122)
==112158==    by 0x562A035: ucma_init (cma.c:249)
==112158==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==112158==    by 0x4E3C40B: rpma_info_new (info.c:53)
==112158==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==112158== 
==112158== 160 (16 direct, 144 indirect) bytes in 1 blocks are definitely lost in loss record 15 of 22
==112158==    at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==112158==    by 0x4E3DE11: rpma_peer_new (peer.c:193)
==112158==    by 0x402164: main (server.c:419)
==112158== 
==112158== 160 (16 direct, 144 indirect) bytes in 1 blocks are definitely lost in loss record 16 of 22
==112158==    at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==112158==    by 0x4E3D46D: rpma_mr_reg (mr.c:277)
==112158==    by 0x40219C: main (server.c:423)
==112158== 
==112158== LEAK SUMMARY:
==112158==    definitely lost: 109 bytes in 3 blocks
==112158==    indirectly lost: 288 bytes in 2 blocks
==112158==      possibly lost: 0 bytes in 0 blocks
==112158==    still reachable: 14,238 bytes in 21 blocks
==112158==         suppressed: 0 bytes in 0 blocks
==112158== Reachable blocks (those to which a pointer was found) are not shown.
==112158== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==112158== 
==112158== For lists of detected and suppressed errors, rerun with: -s
==112158== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 0 from 0)

@grom72
Contributor

grom72 commented Jan 27, 2021

Let's try to get more information with the following options:
--undef-value-errors=yes
--malloc-fill=
--free-fill=
Additionally, please try a regular malloc() of more than 1 GiB followed by free(), instead of fopen() and fclose().
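
One caveat for a test allocation above 1 GiB: a size written as a product of plain int literals, such as 5 * 1024 * 1024 * 1024, is evaluated in int and overflows before malloc() ever sees it. The sketch below computes the size in size_t instead. gib_to_bytes() is a hypothetical helper introduced only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert a number of GiB to bytes using size_t
 * arithmetic. Returns 0 if the result does not fit in size_t.
 */
static size_t gib_to_bytes(size_t gib)
{
    const size_t one_gib = (size_t)1 << 30;

    if (gib > SIZE_MAX / one_gib)
        return 0;
    return gib * one_gib;
}
```

A call such as malloc(gib_to_bytes(5)) then requests the intended 5 GiB on a 64-bit system, and the result should still be checked against NULL before it is passed to free() or used.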

@qianlong-ql
Author

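I replaced fopen() and fclose() as shown below:

        /* cast so the size is computed in size_t; as int, 5 * 1024 * 1024 * 1024 overflows */
        char *f = malloc((size_t)5 * 1024 * 1024 * 1024);
        free(f);

and the backtrace becomes: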

#0  0x00007f38317b4638 in _int_malloc () from /lib64/libc.so.6
#1  0x00007f38317b784c in malloc () from /lib64/libc.so.6
#2  0x0000000000402228 in main (argc=3, argv=0x7fff351976e8)

Running valgrind with the additional parameters gives this output:

 valgrind --leak-check=full --trace-children=yes --undef-value-errors=no --track-fds=yes --tool=memcheck --undef-value-errors=yes --malloc-fill= --free-fill= ./server 200.1.15.2 1
==218347== Memcheck, a memory error detector
==218347== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==218347== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==218347== Command: ./server 200.1.15.2 1
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E7BF: is_overlap (vg_replace_strmem.c:131)
==218347==    by 0x4C2E7BF: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B31: ucma_query_addr (cma.c:693)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x562A1E3: ucma_set_af_ib_support (cma.c:223)
==218347==    by 0x562A1E3: ucma_init (cma.c:270)
==218347==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==218347==    by 0x4E3C40B: rpma_info_new (info.c:53)
==218347==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E7DB: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B31: ucma_query_addr (cma.c:693)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x562A1E3: ucma_set_af_ib_support (cma.c:223)
==218347==    by 0x562A1E3: ucma_init (cma.c:270)
==218347==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==218347==    by 0x4E3C40B: rpma_info_new (info.c:53)
==218347==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E828: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B31: ucma_query_addr (cma.c:693)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x562A1E3: ucma_set_af_ib_support (cma.c:223)
==218347==    by 0x562A1E3: ucma_init (cma.c:270)
==218347==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==218347==    by 0x4E3C40B: rpma_info_new (info.c:53)
==218347==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E876: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B31: ucma_query_addr (cma.c:693)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x562A1E3: ucma_set_af_ib_support (cma.c:223)
==218347==    by 0x562A1E3: ucma_init (cma.c:270)
==218347==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==218347==    by 0x4E3C40B: rpma_info_new (info.c:53)
==218347==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E8C5: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B31: ucma_query_addr (cma.c:693)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x562A1E3: ucma_set_af_ib_support (cma.c:223)
==218347==    by 0x562A1E3: ucma_init (cma.c:270)
==218347==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==218347==    by 0x4E3C40B: rpma_info_new (info.c:53)
==218347==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E8D3: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B31: ucma_query_addr (cma.c:693)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x562A1E3: ucma_set_af_ib_support (cma.c:223)
==218347==    by 0x562A1E3: ucma_init (cma.c:270)
==218347==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==218347==    by 0x4E3C40B: rpma_info_new (info.c:53)
==218347==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Use of uninitialised value of size 8
==218347==    at 0x4C2E8F0: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B31: ucma_query_addr (cma.c:693)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x562A1E3: ucma_set_af_ib_support (cma.c:223)
==218347==    by 0x562A1E3: ucma_init (cma.c:270)
==218347==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==218347==    by 0x4E3C40B: rpma_info_new (info.c:53)
==218347==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Use of uninitialised value of size 8
==218347==    at 0x4C2E8F3: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B31: ucma_query_addr (cma.c:693)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x562A1E3: ucma_set_af_ib_support (cma.c:223)
==218347==    by 0x562A1E3: ucma_init (cma.c:270)
==218347==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==218347==    by 0x4E3C40B: rpma_info_new (info.c:53)
==218347==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E8FE: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B31: ucma_query_addr (cma.c:693)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x562A1E3: ucma_set_af_ib_support (cma.c:223)
==218347==    by 0x562A1E3: ucma_init (cma.c:270)
==218347==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==218347==    by 0x4E3C40B: rpma_info_new (info.c:53)
==218347==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E90B: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B31: ucma_query_addr (cma.c:693)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x562A1E3: ucma_set_af_ib_support (cma.c:223)
==218347==    by 0x562A1E3: ucma_init (cma.c:270)
==218347==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==218347==    by 0x4E3C40B: rpma_info_new (info.c:53)
==218347==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E7BF: is_overlap (vg_replace_strmem.c:131)
==218347==    by 0x4C2E7BF: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B4B: ucma_query_addr (cma.c:694)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x562A1E3: ucma_set_af_ib_support (cma.c:223)
==218347==    by 0x562A1E3: ucma_init (cma.c:270)
==218347==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==218347==    by 0x4E3C40B: rpma_info_new (info.c:53)
==218347==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E7DB: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B4B: ucma_query_addr (cma.c:694)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x562A1E3: ucma_set_af_ib_support (cma.c:223)
==218347==    by 0x562A1E3: ucma_init (cma.c:270)
==218347==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==218347==    by 0x4E3C40B: rpma_info_new (info.c:53)
==218347==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E828: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B4B: ucma_query_addr (cma.c:694)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x562A1E3: ucma_set_af_ib_support (cma.c:223)
==218347==    by 0x562A1E3: ucma_init (cma.c:270)
==218347==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==218347==    by 0x4E3C40B: rpma_info_new (info.c:53)
==218347==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E876: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B4B: ucma_query_addr (cma.c:694)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x562A1E3: ucma_set_af_ib_support (cma.c:223)
==218347==    by 0x562A1E3: ucma_init (cma.c:270)
==218347==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==218347==    by 0x4E3C40B: rpma_info_new (info.c:53)
==218347==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E8C5: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B4B: ucma_query_addr (cma.c:694)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x562A1E3: ucma_set_af_ib_support (cma.c:223)
==218347==    by 0x562A1E3: ucma_init (cma.c:270)
==218347==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==218347==    by 0x4E3C40B: rpma_info_new (info.c:53)
==218347==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E8D3: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B4B: ucma_query_addr (cma.c:694)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x562A1E3: ucma_set_af_ib_support (cma.c:223)
==218347==    by 0x562A1E3: ucma_init (cma.c:270)
==218347==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==218347==    by 0x4E3C40B: rpma_info_new (info.c:53)
==218347==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Use of uninitialised value of size 8
==218347==    at 0x4C2E8F0: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B4B: ucma_query_addr (cma.c:694)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x562A1E3: ucma_set_af_ib_support (cma.c:223)
==218347==    by 0x562A1E3: ucma_init (cma.c:270)
==218347==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==218347==    by 0x4E3C40B: rpma_info_new (info.c:53)
==218347==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Use of uninitialised value of size 8
==218347==    at 0x4C2E8F3: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B4B: ucma_query_addr (cma.c:694)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x562A1E3: ucma_set_af_ib_support (cma.c:223)
==218347==    by 0x562A1E3: ucma_init (cma.c:270)
==218347==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==218347==    by 0x4E3C40B: rpma_info_new (info.c:53)
==218347==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E8FE: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B4B: ucma_query_addr (cma.c:694)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x562A1E3: ucma_set_af_ib_support (cma.c:223)
==218347==    by 0x562A1E3: ucma_init (cma.c:270)
==218347==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==218347==    by 0x4E3C40B: rpma_info_new (info.c:53)
==218347==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E90B: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B4B: ucma_query_addr (cma.c:694)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x562A1E3: ucma_set_af_ib_support (cma.c:223)
==218347==    by 0x562A1E3: ucma_init (cma.c:270)
==218347==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==218347==    by 0x4E3C40B: rpma_info_new (info.c:53)
==218347==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x5627B5E: ucma_query_addr (cma.c:696)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x562A1E3: ucma_set_af_ib_support (cma.c:223)
==218347==    by 0x562A1E3: ucma_init (cma.c:270)
==218347==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==218347==    by 0x4E3C40B: rpma_info_new (info.c:53)
==218347==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E7BF: is_overlap (vg_replace_strmem.c:131)
==218347==    by 0x4C2E7BF: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B31: ucma_query_addr (cma.c:693)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E7DB: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B31: ucma_query_addr (cma.c:693)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E828: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B31: ucma_query_addr (cma.c:693)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E876: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B31: ucma_query_addr (cma.c:693)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E8D3: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B31: ucma_query_addr (cma.c:693)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E8FE: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B31: ucma_query_addr (cma.c:693)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E90B: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B31: ucma_query_addr (cma.c:693)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E7BF: is_overlap (vg_replace_strmem.c:131)
==218347==    by 0x4C2E7BF: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B4B: ucma_query_addr (cma.c:694)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E7DB: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B4B: ucma_query_addr (cma.c:694)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E828: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B4B: ucma_query_addr (cma.c:694)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E876: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B4B: ucma_query_addr (cma.c:694)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E8D3: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B4B: ucma_query_addr (cma.c:694)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E8FE: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B4B: ucma_query_addr (cma.c:694)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C2E90B: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==218347==    by 0x5627B4B: ucma_query_addr (cma.c:694)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x5627B5E: ucma_query_addr (cma.c:696)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x56279BC: ucma_get_device (cma.c:447)
==218347==    by 0x5627B67: ucma_query_addr (cma.c:697)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x52A1AFE: vfprintf (in /usr/lib64/libc-2.17.so)
==218347==    by 0x52CEF48: vsnprintf (in /usr/lib64/libc-2.17.so)
==218347==    by 0x52AA3D1: snprintf (in /usr/lib64/libc-2.17.so)
==218347==    by 0x60A2142: __mlx5_query_device (verbs.c:67)
==218347==    by 0x60A2142: mlx5_query_device (verbs.c:83)
==218347==    by 0x562786B: ucma_init_device.part.2 (cma.c:318)
==218347==    by 0x5627A78: ucma_init_device (cma.c:459)
==218347==    by 0x5627A78: ucma_get_device (cma.c:454)
==218347==    by 0x5627B67: ucma_query_addr (cma.c:697)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Use of uninitialised value of size 8
==218347==    at 0x529F1CB: _itoa_word (in /usr/lib64/libc-2.17.so)
==218347==    by 0x52A3450: vfprintf (in /usr/lib64/libc-2.17.so)
==218347==    by 0x52CEF48: vsnprintf (in /usr/lib64/libc-2.17.so)
==218347==    by 0x52AA3D1: snprintf (in /usr/lib64/libc-2.17.so)
==218347==    by 0x60A2142: __mlx5_query_device (verbs.c:67)
==218347==    by 0x60A2142: mlx5_query_device (verbs.c:83)
==218347==    by 0x562786B: ucma_init_device.part.2 (cma.c:318)
==218347==    by 0x5627A78: ucma_init_device (cma.c:459)
==218347==    by 0x5627A78: ucma_get_device (cma.c:454)
==218347==    by 0x5627B67: ucma_query_addr (cma.c:697)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x529F1D5: _itoa_word (in /usr/lib64/libc-2.17.so)
==218347==    by 0x52A3450: vfprintf (in /usr/lib64/libc-2.17.so)
==218347==    by 0x52CEF48: vsnprintf (in /usr/lib64/libc-2.17.so)
==218347==    by 0x52AA3D1: snprintf (in /usr/lib64/libc-2.17.so)
==218347==    by 0x60A2142: __mlx5_query_device (verbs.c:67)
==218347==    by 0x60A2142: mlx5_query_device (verbs.c:83)
==218347==    by 0x562786B: ucma_init_device.part.2 (cma.c:318)
==218347==    by 0x5627A78: ucma_init_device (cma.c:459)
==218347==    by 0x5627A78: ucma_get_device (cma.c:454)
==218347==    by 0x5627B67: ucma_query_addr (cma.c:697)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x52A349F: vfprintf (in /usr/lib64/libc-2.17.so)
==218347==    by 0x52CEF48: vsnprintf (in /usr/lib64/libc-2.17.so)
==218347==    by 0x52AA3D1: snprintf (in /usr/lib64/libc-2.17.so)
==218347==    by 0x60A2142: __mlx5_query_device (verbs.c:67)
==218347==    by 0x60A2142: mlx5_query_device (verbs.c:83)
==218347==    by 0x562786B: ucma_init_device.part.2 (cma.c:318)
==218347==    by 0x5627A78: ucma_init_device (cma.c:459)
==218347==    by 0x5627A78: ucma_get_device (cma.c:454)
==218347==    by 0x5627B67: ucma_query_addr (cma.c:697)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x52A1BCB: vfprintf (in /usr/lib64/libc-2.17.so)
==218347==    by 0x52CEF48: vsnprintf (in /usr/lib64/libc-2.17.so)
==218347==    by 0x52AA3D1: snprintf (in /usr/lib64/libc-2.17.so)
==218347==    by 0x60A2142: __mlx5_query_device (verbs.c:67)
==218347==    by 0x60A2142: mlx5_query_device (verbs.c:83)
==218347==    by 0x562786B: ucma_init_device.part.2 (cma.c:318)
==218347==    by 0x5627A78: ucma_init_device (cma.c:459)
==218347==    by 0x5627A78: ucma_get_device (cma.c:454)
==218347==    by 0x5627B67: ucma_query_addr (cma.c:697)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x52A1C4E: vfprintf (in /usr/lib64/libc-2.17.so)
==218347==    by 0x52CEF48: vsnprintf (in /usr/lib64/libc-2.17.so)
==218347==    by 0x52AA3D1: snprintf (in /usr/lib64/libc-2.17.so)
==218347==    by 0x60A2142: __mlx5_query_device (verbs.c:67)
==218347==    by 0x60A2142: mlx5_query_device (verbs.c:83)
==218347==    by 0x562786B: ucma_init_device.part.2 (cma.c:318)
==218347==    by 0x5627A78: ucma_init_device (cma.c:459)
==218347==    by 0x5627A78: ucma_get_device (cma.c:454)
==218347==    by 0x5627B67: ucma_query_addr (cma.c:697)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x4C29F09: malloc (vg_replace_malloc.c:309)
==218347==    by 0x5627885: ucma_init_device.part.2 (cma.c:324)
==218347==    by 0x5627A78: ucma_init_device (cma.c:459)
==218347==    by 0x5627A78: ucma_get_device (cma.c:454)
==218347==    by 0x5627B67: ucma_query_addr (cma.c:697)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x5627898: ucma_init_device.part.2 (cma.c:330)
==218347==    by 0x5627A78: ucma_init_device (cma.c:459)
==218347==    by 0x5627A78: ucma_get_device (cma.c:454)
==218347==    by 0x5627B67: ucma_query_addr (cma.c:697)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Conditional jump or move depends on uninitialised value(s)
==218347==    at 0x56278C5: ucma_init_device.part.2 (cma.c:330)
==218347==    by 0x5627A78: ucma_init_device (cma.c:459)
==218347==    by 0x5627A78: ucma_get_device (cma.c:454)
==218347==    by 0x5627B67: ucma_query_addr (cma.c:697)
==218347==    by 0x5628175: rdma_bind_addr2 (cma.c:867)
==218347==    by 0x5628175: rdma_bind_addr (cma.c:884)
==218347==    by 0x4E3C6B1: rpma_info_bind_addr (info.c:136)
==218347==    by 0x4E3E5B4: rpma_utils_get_ibv_context (rpma.c:53)
==218347==    by 0x402180: main (server.c:405)
==218347== 
==218347== Syscall param write(buf) points to uninitialised byte(s)
==218347==    at 0x5346CD0: __write_nocancel (in /usr/lib64/libc-2.17.so)
==218347==    by 0x5047004: ibv_cmd_dealloc_pd (cmd.c:200)
==218347==    by 0x60A25BD: mlx5_free_pd (verbs.c:283)
==218347==    by 0x5627F98: ucma_put_device (cma.c:477)
==218347==    by 0x5627F98: ucma_free_id (cma.c:522)
==218347==    by 0x5629F74: rdma_destroy_id (cma.c:654)
==218347==    by 0x4E3E5FC: rpma_utils_get_ibv_context (rpma.c:67)
==218347==    by 0x402180: main (server.c:405)
==218347==  Address 0x1fff0002a8 is on thread 1's stack
==218347==  in frame #1, created by ibv_cmd_dealloc_pd (cmd.c:194)
==218347== 
==218347== Syscall param write(buf) points to uninitialised byte(s)
==218347==    at 0x5346CD0: __write_nocancel (in /usr/lib64/libc-2.17.so)
==218347==    by 0x50471FC: ibv_cmd_reg_mr (cmd.c:267)
==218347==    by 0x60A2B3A: mlx5_reg_mr (verbs.c:468)
==218347==    by 0x504E14B: __ibv_common_reg_mr (verbs.c:295)
==218347==    by 0x504E253: ibv_reg_mr@@IBVERBS_1.1 (verbs.c:338)
==218347==    by 0x4E3DB11: rpma_peer_mr_reg (peer.c:113)
==218347==    by 0x4E3D4A6: rpma_mr_reg (mr.c:282)
==218347==    by 0x4021DC: main (server.c:423)
==218347==  Address 0x1fff000258 is on thread 1's stack
==218347==  in frame #2, created by mlx5_reg_mr (verbs.c:458)
==218347== 
--218442-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) - exiting
--218442-- si_code=1;  Faulting address: 0x5CBFD30;  sp: 0x1002ba9e30

valgrind: the 'impossible' happened:
   Killed by fatal signal

host stacktrace:
==218442==    at 0x58055D12: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==218442==    by 0x5800EA1E: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==218442==    by 0x580A64C7: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==218442==    by 0x580FF23A: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable (lwpid 218442)
==218442==    at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==218442==    by 0x402227: main (server.c:434)
client stack range: [0x1FFEFFD000 0x1FFF000FFF] client SP: 0x1FFF000360
valgrind stack range: [0x1002AAA000 0x1002BA9FFF] top usage: 9104 of 1048576


Note: see also the FAQ in the source distribution.
It contains workarounds to several common problems.
In particular, if Valgrind aborted or crashed after
identifying problems in your program, there's a good chance
that fixing those problems will prevent Valgrind aborting or
crashing, especially if it happened in m_mallocfree.c.

If that doesn't help, please report this bug to: www.valgrind.org

In the bug report, send all the above text, the valgrind
version, and what OS and version you are using.  Thanks.

==218347== 
==218347== FILE DESCRIPTORS: 3 open at exit.
==218347== Open file descriptor 2: /dev/pts/0
==218347==    <inherited from parent>
==218347== 
==218347== Open file descriptor 1: /dev/pts/0
==218347==    <inherited from parent>
==218347== 
==218347== Open file descriptor 0: /dev/pts/0
==218347==    <inherited from parent>
==218347== 
==218347== 
==218347== HEAP SUMMARY:
==218347==     in use at exit: 14,635 bytes in 26 blocks
==218347==   total heap usage: 216 allocs, 190 frees, 373,354 bytes allocated
==218347== 
==218347== 77 bytes in 1 blocks are definitely lost in loss record 12 of 22
==218347==    at 0x4C2C291: realloc (vg_replace_malloc.c:836)
==218347==    by 0x52CEC7A: vasprintf (in /usr/lib64/libc-2.17.so)
==218347==    by 0x52AA4F6: asprintf (in /usr/lib64/libc-2.17.so)
==218347==    by 0x504BBB9: load_driver (init.c:254)
==218347==    by 0x504BBB9: load_drivers (init.c:311)
==218347==    by 0x504C571: ibverbs_get_device_list (init.c:625)
==218347==    by 0x5049F2F: count_devices (device.c:97)
==218347==    by 0x584CE6F: pthread_once (in /usr/lib64/libpthread-2.17.so)
==218347==    by 0x504A82B: ibv_get_device_list@@IBVERBS_1.1 (device.c:122)
==218347==    by 0x562A035: ucma_init (cma.c:249)
==218347==    by 0x562CE74: rdma_getaddrinfo (addrinfo.c:247)
==218347==    by 0x4E3C40B: rpma_info_new (info.c:53)
==218347==    by 0x4E3E4FB: rpma_utils_get_ibv_context (rpma.c:40)
==218347== 
==218347== 160 (16 direct, 144 indirect) bytes in 1 blocks are definitely lost in loss record 15 of 22
==218347==    at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==218347==    by 0x4E3DE11: rpma_peer_new (peer.c:193)
==218347==    by 0x4021A4: main (server.c:419)
==218347== 
==218347== 160 (16 direct, 144 indirect) bytes in 1 blocks are definitely lost in loss record 16 of 22
==218347==    at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==218347==    by 0x4E3D46D: rpma_mr_reg (mr.c:277)
==218347==    by 0x4021DC: main (server.c:423)
==218347== 
==218347== LEAK SUMMARY:
==218347==    definitely lost: 109 bytes in 3 blocks
==218347==    indirectly lost: 288 bytes in 2 blocks
==218347==      possibly lost: 0 bytes in 0 blocks
==218347==    still reachable: 14,238 bytes in 21 blocks
==218347==         suppressed: 0 bytes in 0 blocks
==218347== Reachable blocks (those to which a pointer was found) are not shown.
==218347== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==218347== 
==218347== Use --track-origins=yes to see where uninitialised values come from
==218347== For lists of detected and suppressed errors, rerun with: -s
==218347== ERROR SUMMARY: 79 errors from 51 contexts (suppressed: 0 from 0)

@pbalcer
Member

pbalcer commented Jan 28, 2021

Hm, what is malloc_aligned() in your example?
Can you try this instead: dst_ptr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);

@ldorau
Member

ldorau commented Jan 28, 2021

@qianlong-ql Hi, which OS version does it occur on? I cannot reproduce it (using the mmap() call suggested above by @pbalcer).

@qianlong-ql
Author

I cloned an environment that reproduces this problem and sent the address and password to Tomasz Gromadzki by email.

@ldorau
Member

ldorau commented Feb 2, 2021

OK, thanks, I have tested it. Could you download and save the source rpm nic-libs-mellanox-rdma-3.0.2-1.src.rpm in the same directory where the binary rpm is on this machine?

@qianlong-ql
Author

nic-libs-mellanox-rdma-3.0.2-1.x86_64.rpm does not match the driver on this machine. I reverted nic-libs-mellanox-rdma to version 2.0.1-2 and put the main source in the directory /root/rpm_packet/nic-libs-mellanox-rdma-2.0.1

@ldorau
Member

ldorau commented Feb 3, 2021

Thanks!

@ldorau
Member

ldorau commented Feb 3, 2021

Hi @qianlong-ql
This looks like an issue with glibc's malloc(). It is definitely not an issue in librpma.

  1. The simplest (but not the best!) workaround is to set pagesize to 2MB instead of 4KB inside malloc_aligned(); it does not crash then.
  2. The best solution is to use mmap() with MAP_SHARED instead of malloc_aligned(), as @pbalcer suggested above:
    dst_ptr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    because you should not use private memory for RDMA if you want to fork the process (memory returned by malloc() and posix_memalign() is private, as with MAP_PRIVATE).
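
The mmap() suggestion can be wrapped in a small helper. The following is only a minimal sketch: the names rdma_buf_alloc(), rdma_buf_free() and round_up_to_page() are hypothetical and are not part of librpma or libibverbs. It assumes Linux with glibc and rounds the requested size up to whole pages, so the buffer is a separate mapping that lies outside the malloc() heap.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round len up to a whole number of pages. */
static size_t round_up_to_page(size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    return (len + page - 1) / page * page;
}

/*
 * Hypothetical helper (not a librpma API): returns a page-aligned,
 * zero-filled, shared anonymous mapping of at least len bytes,
 * or NULL on failure.
 */
static void *rdma_buf_alloc(size_t len)
{
    if (len == 0)
        return NULL;

    void *p = mmap(NULL, round_up_to_page(len), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Releases a buffer obtained from rdma_buf_alloc() with the same len. */
static int rdma_buf_free(void *p, size_t len)
{
    return munmap(p, round_up_to_page(len));
}
```

In the reproducer from this issue, dst_ptr = rdma_buf_alloc(dst_size); would take the place of the malloc_aligned() call, and the resulting pointer would be passed to rpma_mr_reg() as before.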

@qianlong-ql
Author

Thanks, I got the key point: private memory should not be used for RDMA if fork() is used. But why is it safe when pagesize is set to 2MB?

@ldorau
Member

ldorau commented Feb 3, 2021

It is not safe when pagesize is set to 2MB. It just does not crash, but I cannot guarantee that other things will work correctly.
The only safe way is to use mmap() with MAP_SHARED instead of malloc_aligned():
dst_ptr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);

ldorau added a commit to ldorau/rpma that referenced this issue Feb 10, 2021
The memory region passed to rpma_mr_reg() (ibv_reg_mr())
must not be allocated from the heap (using malloc()
or posix_memalign()); it should be mapped using mmap().

Rationale:

If ibv_fork_init() was called, the memory region
passed to ibv_reg_mr() is marked by this function
with the flag “do not copy on fork”,
so after fork() is called, the child process
does not receive this range of virtual addresses.
If this memory region was allocated from the heap,
the child process receives a corrupted heap
with a “hole” of inaccessible addresses inside.
The memory allocator knows nothing about this “hole”,
and if it tries to access (read or write)
that range of virtual addresses, it causes a segfault.

Ref: pmem#751
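
The “hole” described in this commit message can be reproduced without any RDMA hardware. The sketch below is only an illustration, under the assumption that the “do not copy on fork” flag corresponds to madvise() with MADV_DONTFORK, which is the mechanism the ibv_fork_init() man page names. The function child_can_read_page() is hypothetical. It marks a separately mapped page (not heap memory, so the parent's allocator is unaffected), forks, and reports whether the child could read the page.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS and MADV_DONTFORK on glibc */
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo: returns 1 if a forked child can read the page,
 * 0 if the child is killed by SIGSEGV, and -1 on any other outcome.
 */
static int child_can_read_page(int mark_dontfork)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    char *page = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;
    page[0] = 42;

    /* Exclude the range from the address space of future children. */
    if (mark_dontfork && madvise(page, len, MADV_DONTFORK) != 0) {
        munmap(page, len);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        munmap(page, len);
        return -1;
    }
    if (pid == 0) {
        /* Child: this read faults if the range was not inherited. */
        volatile char *p = page;
        _exit(p[0] == 42 ? 0 : 1);
    }

    int status;
    int result = -1;
    if (waitpid(pid, &status, 0) == pid) {
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            result = 1;
        else if (WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV)
            result = 0;
    }
    munmap(page, len);
    return result;
}
```

When the marked range sits inside the malloc() heap instead, the same fault is hit by the allocator itself in the child, which would be consistent with the fclose() backtrace reported at the top of this issue.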
@ldorau
Member

ldorau commented Feb 19, 2021

@qianlong-ql The fix #866 has been merged. Please let us know whether it fixes this issue.

@ldorau
Member

ldorau commented Mar 2, 2021

@qianlong-ql ping

@qianlong-ql
Author

@ldorau I have been on vacation and I will test and let you know as soon as possible after my vacation

@ldorau
Member

ldorau commented Mar 3, 2021

> @ldorau I have been on vacation and I will test and let you know as soon as possible after my vacation

OK

@qianlong-ql
Author

@ldorau The issue has been fixed

@ldorau
Member

ldorau commented Mar 12, 2021

> @ldorau The issue has been fixed

@qianlong-ql Thanks for the confirmation! Closing...

@ldorau ldorau closed this as completed Mar 12, 2021