DAOS Container Creation Fails with Transport Layer Mercury Error #128048
sonigitkhushi asked this question in Programming Help
## Transport layer mercury error
Hello everyone,
I'm hitting an error when trying to create a DAOS container, and I'm hoping the community can help me resolve it. The command and the error output are as follows:
[root@client1 ~]# daos cont create test --label mycont
external ERR # [15688.017525] mercury->msg: [error] /builddir/build/BUILD/mercury-2.1.0rc4/src/na/na_ofi.c:3047
na_ofi_msg_send(): fi_tsend() failed, rc: -2 (No such file or directory)
external ERR # [15688.018207] mercury->hg: [error] /builddir/build/BUILD/mercury-2.1.0rc4/src/mercury_core.c:2727
hg_core_forward_na(): Could not post send for input buffer (NA_NOENTRY)
hg ERR src/cart/crt_hg.c:1104 crt_hg_req_send_cb(0x30248c0) [opc=0x1020004 (DAOS) rpcid=0x72d8b86300000000 rank:tag=0:0] RPC failed; rc: DER_HG(-1020): 'Transport layer mercury error'
mgmt ERR src/mgmt/cli_mgmt.c:882 dc_mgmt_pool_find() test: failed to get PS replicas from 1 servers, DER_HG(-1020): 'Transport layer mercury error'
pool ERR src/pool/cli.c:198 dc_pool_choose_svc_rank() 00000000:test: dc_mgmt_pool_find() failed, DER_HG(-1020): 'Transport layer mercury error'
pool ERR src/pool/cli.c:503 dc_pool_connect_internal() 00000000:test: cannot find pool service: DER_HG(-1020): 'Transport layer mercury error'
ERROR: daos: DER_HG(-1020): Transport layer mercury error
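Read top to bottom, the trace starts with a libfabric send failure (`fi_tsend()` returning -2) inside Mercury and is then reported by each DAOS layer above it, ending in the pool lookup for `test`. As a reading aid, here is a minimal sketch that condenses the DAOS-level `ERR` lines into one line per layer. `summarize_errors` is a hypothetical helper written for this post, not part of DAOS, and it assumes the log layout shown above (`component ERR src/file.c:line function() ...`).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a DAOS tool): condense DAOS-level ERR lines from stdin.
# Assumed line layout, taken from the trace above:
#   mgmt ERR src/mgmt/cli_mgmt.c:882 dc_mgmt_pool_find() test: failed ...
#   $1 = component, $2 = "ERR", $3 = file:line, $4 = function(...)
summarize_errors() {
    awk '$2 == "ERR" && $3 ~ /^src\// {
        fn = $4
        sub(/\(.*/, "", fn)            # drop "(...)" after the function name
        print $1 ": " fn " (" $3 ")"
    }'
}

# Usage: pipe the saved client output through the helper, e.g.
#   daos cont create test --label mycont 2>&1 | summarize_errors
```

The Mercury lines (`external ERR # ...`) are skipped on purpose, since they do not follow the `src/...` layout; they still need to be read by hand because they carry the underlying `fi_tsend()` return code.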
**System Details:**
**Network Configuration:**
1. DAOS Server IP config:
[root@server1 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:5e:c1:f9 brd ff:ff:ff:ff:ff:ff
inet 192.168.232.141/24 brd 192.168.232.255 scope global noprefixroute dynamic ens33
valid_lft 1437sec preferred_lft 1437sec
inet6 fe80::52df:220a:54d:624c/64 scope link noprefixroute
valid_lft forever preferred_lft forever
2. DAOS Client IP config:
[root@client1 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:d1:5d:7f brd ff:ff:ff:ff:ff:ff
inet 192.168.232.140/24 brd 192.168.232.255 scope global noprefixroute dynamic ens33
valid_lft 1433sec preferred_lft 1433sec
inet6 fe80::34ae:4ee9:cdba:2bc/64 scope link noprefixroute
valid_lft forever preferred_lft forever
3: ens36: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:d1:5d:89 brd ff:ff:ff:ff:ff:ff
inet 192.168.142.131/24 brd 192.168.142.255 scope global noprefixroute dynamic ens36
valid_lft 1373sec preferred_lft 1373sec
inet6 fe80::8ad0:43ed:c41d:3b16/64 scope link noprefixroute
valid_lft forever preferred_lft forever
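One thing that stands out in the two listings is that the client has two interfaces on different /24 networks, while the server has only `ens33`. The sketch below is plain shell arithmetic on the addresses copied from the output above; it makes no DAOS calls. It shows which client interface shares the server's subnet. `ipv4_network` is a hypothetical helper, and whether the interface choice is actually related to the error is an assumption that would need to be verified.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the network address for an IPv4 address and prefix length.
ipv4_network() {
    local ip=$1 prefix=$2 a b c d
    IFS=. read -r a b c d <<< "$ip"
    local addr=$(( (a << 24) | (b << 16) | (c << 8) | d ))
    local mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    local net=$(( addr & mask ))
    echo "$(( (net >> 24) & 255 )).$(( (net >> 16) & 255 )).$(( (net >> 8) & 255 )).$(( net & 255 ))"
}

# Server address from the server's `ip addr` output (ens33).
server_net=$(ipv4_network 192.168.232.141 24)

# Client interfaces and addresses from the client's `ip addr` output.
for entry in ens33=192.168.232.140 ens36=192.168.142.131; do
    iface=${entry%%=*}
    addr=${entry#*=}
    if [ "$(ipv4_network "$addr" 24)" = "$server_net" ]; then
        echo "$iface ($addr) is on the server's subnet $server_net/24"
    else
        echo "$iface ($addr) is NOT on the server's subnet $server_net/24"
    fi
done
```

With these addresses, only the client's `ens33` is on 192.168.232.0/24. If the client turned out to be using `ens36`, pinning the interface (for example through the `OFI_INTERFACE` environment variable or the agent's `fabric_ifaces` section, depending on what the installed DAOS version supports) might be worth trying; check the documentation for your release before relying on either.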
**Configuration Files:**
1. DAOS Server configuration file:
DAOS server configuration file.
name: daos_server
access_points: ['192.168.232.141']
port: 10001
transport_config:
allow_insecure: true
client_cert_dir: /etc/daos/certs/clients
ca_cert: /etc/daos/certs/daosCA.crt
cert: /etc/daos/certs/server.crt
key: /etc/daos/certs/server.key
provider: ofi+sockets
socket_dir: /var/run/daos_server
nr_hugepages: 4096
control_log_mask: DEBUG
control_log_file: /tmp/daos_server.log
helper_log_file: /tmp/daos_admin.log
engines:
targets: 8
nr_xs_helpers: 0
fabric_iface: ens33
fabric_iface_port: 31316
log_mask: INFO
log_file: /tmp/daos_engine_0.log
env_vars:
- CRT_TIMEOUT=30
storage:
-
scm_mount: /mnt/daos0
scm_class: ram
scm_size: 4
-
bdev_class: file
bdev_size: 4
bdev_list: ["0000:03:00.0"]
2. DAOS Agent configuration file:
name: daos_server
access_points: ['192.168.232.141']
port: 10001
transport_config:
allow_insecure: true
ca_cert: /etc/daos/certs/daosCA.crt
cert: /etc/daos/certs/agent.crt
key: /etc/daos/certs/agent.key
log_file: /tmp/daos_agent.log
fabric_ifaces:
numa_node: 0
devices:
3. DAOS Control configuration file:
#name: daos_server
#port: 10001
#hostlist: ['192.168.232.141']
#transport_config:
allow_insecure: true
ca_cert: /etc/daos/certs/daosCA.crt
cert: /etc/daos/certs/admin.crt
key: /etc/daos/certs/admin.key
Seeking Your Expertise:
I'm eager to hear your thoughts and suggestions. Any help to get my DAOS container up and running would be greatly appreciated!
Thank You!