This repository has been archived by the owner on May 3, 2024. It is now read-only.

motr unit test dix-client-ut error #1055

Closed
yanqingfu opened this issue Sep 20, 2021 — with Board Genius Sync · 20 comments

Comments

Contributor

[root@ssc-vm-g3-rhev4-1169 cortx-motr]# scripts/m0 run-ut -t dix-client-ut
----- run_ut -t dix-client-ut -----
START Iteration: 1 out of 1
dix-client-ut
imask 0.00 sec 248 B
imask-apply 0.00 sec 272 B
imask-empty 0.00 sec 0 B
imask-infini 0.00 sec 608 B
imask-short 0.00 sec 80 B
imask-invalid 0.00 sec 56 B
pdclust-map 0.00 sec 4 KiB
meta-val-encdec 0.00 sec 720 B
meta-val-encdec-n 0.00 sec 6 KiB
layout-encdec 0.00 sec 1 KiB
meta-create motr[02887]: f630 FATAL [lib/assert.c:50:m0_panic] panic: Unit-test assertion failed: rc == 0 at dix_client_init() (dix/ut/client_ut.c:1053) [git: 2.0.0-307-38-g112f986] /var/motr/m0ut/m0trace.2887
Motr panic: Unit-test assertion failed: rc == 0 at dix_client_init() dix/ut/client_ut.c:1053 (errno: 4) (last failed: none) [git: 2.0.0-307-38-g112f986] pid: 2887 /var/motr/m0ut/m0trace.2887
/var/cortx/cortx-motr/motr/.libs/libmotr.so.2(m0_arch_backtrace+0x20)[0x7f3abcbfdd00]
/var/cortx/cortx-motr/motr/.libs/libmotr.so.2(m0_arch_panic+0xe6)[0x7f3abcbfdeb6]
/var/cortx/cortx-motr/motr/.libs/libmotr.so.2(+0x37a084)[0x7f3abcbec084]
/var/cortx/cortx-motr/ut/.libs/libmotr-ut.so.0(+0x286032)[0x7f3abe315032]
/var/cortx/cortx-motr/ut/.libs/libmotr-ut.so.0(+0x178189)[0x7f3abe207189]
/var/cortx/cortx-motr/ut/.libs/libmotr-ut.so.0(+0x178af9)[0x7f3abe207af9]
/var/cortx/cortx-motr/ut/.libs/libmotr-ut.so.0(m0_ut_run+0x263)[0x7f3abe314953]
/var/cortx/cortx-motr/ut/.libs/lt-m0ut(main+0x11c9)[0x404709]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f3abacc4555]
/var/cortx/cortx-motr/ut/.libs/lt-m0ut[0x404881]
/var/cortx/cortx-motr/utils/m0run: line 416: 2887 Aborted (core dumped) $(srcdir_path_of $binary) "$@"

@hessio
Contributor

hessio commented Sep 20, 2021

Hi @andriytk, do you know what is causing this issue?

@andriytk
Contributor

No. Which commit ID? Does it reproduce on the latest one? Does it reproduce 100% of the time?

@yanqingfu
Contributor Author

yanqingfu commented Sep 20, 2021

Tested on the main branch. The error persists after rebooting the VM, and it also reproduces on a second VM.
CentOS Linux release 7.9.2009 (Core), 3.10.0-1160.el7.x86_64

@r-wambui
Contributor

I have the same issue: some unit tests fail on both VirtualBox and VMware VMs.

1- unit-test cas-client
Screenshot (3)

@andriytk
Contributor

cc @madhavemuri, @truptiatseagate

@johnbent
Contributor

@JugalPatil, when you committed #1044, was this working for you?

@truptiatseagate

@nkommuri, would you please help with this issue by having someone from Motr4 work on it during sprint 53, which starts tomorrow? CC: @zohebkhann

@nkommuri

Sure Trupti, we will start working on this.

@cortx-admin

Mehul Manharlal Joshi commented in Jira Server:

This looks like a duplicate of the following: https://jts.seagate.com/browse/EOS-24883

@cortx-admin

Nagakishore Kommuri commented in Jira Server:

Yes [~744289], this issue is exactly the same as https://jts.seagate.com/browse/EOS-24883

 

Here is the backtrace...

(gdb) bt
#0 0x00007fbe52be6387 in raise () from /lib64/libc.so.6
#1 0x00007fbe52be7a78 in abort () from /lib64/libc.so.6
#2 0x00007fbe54b0bf95 in m0_arch_panic (c=c@entry=0x7ffde9a19490, ap=ap@entry=0x7ffde9a19368) at lib/user_space/uassert.c:131
#3 0x00007fbe54afa0d4 in m0_panic (ctx=ctx@entry=0x7ffde9a19490) at lib/assert.c:52
#4 0x00007fbe56222be2 in m0_ut_assertimpl (c=<optimized out>, str_c=str_c@entry=0x7fbe5622c56e "rc == 0",
file=file@entry=0x7fbe5623a1e3 "dix/ut/client_ut.c", lno=lno@entry=1053, func=func@entry=0x7fbe5623b6c0 <func.22796> "dix_client_init")
at ut/ut.c:636
#5 0x00007fbe56115039 in dix_client_init (cctx=0x7fbe568f67e0 <dix_ut_cctx>, xprt=<optimized out>, dbname=<optimized out>,
srv_ep_addr=0x7fbe562339fa "0@lo:12345:34:1", cl_ep_addr=0x7fbe56233557 "0@lo:12345:34:2") at dix/ut/client_ut.c:1053
#6 dixc_ut_init (sctx=0x7fbe565a79c0 <dix_ut_sctx>, cctx=0x7fbe568f67e0 <dix_ut_cctx>) at dix/ut/client_ut.c:1135
#7 ut_service_init () at dix/ut/client_ut.c:1176
#8 0x00007fbe561159a9 in dix_meta_create () at dix/ut/client_ut.c:1223
#9 0x00007fbe56222503 in run_test (max_name_len=43, test=0x6c67e8 <dix_client_ut+4808>) at ut/ut.c:390
#10 run_suite (max_name_len=43, suite=0x6c5520 <dix_client_ut>) at ut/ut.c:459
#11 tests_run_all (m=0x7fbe6089e5a0 <ut.12960>) at ut/ut.c:513
#12 m0_ut_run () at ut/ut.c:539
#13 0x00000000004046e9 in main (argc=3, argv=0x7ffde9a19bb8) at ut/m0ut.c:525

 

Hit the following assert...

rc = m0_rpc_client_start(cl_rpc_ctx);
M0_UT_ASSERT(rc == 0);

 

m0_rpc_client_start() returned -110 (-ETIMEDOUT). Here is the corresponding stack trace, showing where the -110 was propagated from:

 

#0 m0_rpc_conn_timedwait (conn=conn@entry=0x7fffede286a0 <dix_ut_cctx+16064>, states=states@entry=12, timeout=18446744073709551615)
at rpc/conn.c:703
#1 0x00007fffec0ae9fa in m0_rpc_conn_establish_sync (conn=conn@entry=0x7fffede286a0 <dix_ut_cctx+16064>,
abs_timeout=abs_timeout@entry=18446744073709551615) at rpc/conn.c:847
#2 0x00007fffec0b04ff in m0_rpc_conn_create (conn=conn@entry=0x7fffede286a0 <dix_ut_cctx+16064>, svc_fid=svc_fid@entry=0x0, ep=0x30c08c0,
rpc_machine=rpc_machine@entry=0x7fffede27ea8 <dix_ut_cctx+14024>, max_rpcs_in_flight=max_rpcs_in_flight@entry=10,
abs_timeout=abs_timeout@entry=18446744073709551615) at rpc/conn.c:829
#3 0x00007fffec0bfbf2 in m0_rpc_client_connect (conn=conn@entry=0x7fffede286a0 <dix_ut_cctx+16064>,
session=session@entry=0x7fffede28a50 <dix_ut_cctx+17008>, rpc_mach=rpc_mach@entry=0x7fffede27ea8 <dix_ut_cctx+14024>,
remote_addr=<optimized out>, svc_fid=svc_fid@entry=0x0, max_rpcs_in_flight=10, abs_timeout=18446744073709551615) at rpc/rpclib.c:120
#4 0x00007fffec0bff12 in m0_rpc_client_start (cctx=cctx@entry=0x7fffede24cd8 <dix_ut_cctx+1272>) at rpc/rpclib.c:199
#5 0x00007fffed643012 in dix_client_init (cctx=0x7fffede247e0 <dix_ut_cctx>, xprt=<optimized out>, dbname=<optimized out>,
srv_ep_addr=0x7fffed7619fa "0@lo:12345:34:1", cl_ep_addr=0x7fffed761557 "0@lo:12345:34:2") at dix/ut/client_ut.c:1052
#6 dixc_ut_init (sctx=0x7fffedad59c0 <dix_ut_sctx>, cctx=0x7fffede247e0 <dix_ut_cctx>) at dix/ut/client_ut.c:1135
#7 ut_service_init () at dix/ut/client_ut.c:1176
#8 0x00007fffed6439a9 in dix_meta_create () at dix/ut/client_ut.c:1223
#9 0x00007fffed750503 in run_test (max_name_len=43, test=0x6c67e8 <dix_client_ut+4808>) at ut/ut.c:390
#10 run_suite (max_name_len=43, suite=0x6c5520 <dix_client_ut>) at ut/ut.c:459
#11 tests_run_all (m=0x7ffff7dcc5a0 <ut.12960>) at ut/ut.c:513
#12 m0_ut_run () at ut/ut.c:539
#13 0x00000000004046e9 in main (argc=3, argv=0x7fffffffe578) at ut/m0ut.c:525

 

 

m0_rpc_conn_timedwait() returned -110 from the following statement, taking the value from conn->c_sm.sm_rc:


return M0_RC(rc ?: conn->c_sm.sm_rc);


(gdb) n
718 return M0_RC(rc ?: conn->c_sm.sm_rc);
(gdb) p conn->c_sm.sm_rc
$6 = -110
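
The return statement above relies on the GNU C `a ?: b` extension, which yields `a` when it is nonzero and `b` otherwise. A minimal sketch of that behaviour, and of decoding the resulting code, is given below. The helper names `first_nonzero()` and `rc_to_str()` are hypothetical and are not part of Motr; on Linux, errno 110 is ETIMEDOUT.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical stand-in for `rc ?: conn->c_sm.sm_rc`:
 * prefer the wait result, fall back to the state machine's error code.
 * Written with the portable ternary instead of the GNU `?:` shorthand.
 */
static int first_nonzero(int rc, int sm_rc)
{
	return rc != 0 ? rc : sm_rc;
}

/*
 * Turn an errno-style return code (negative, as in the traces above,
 * or positive) into a human-readable message.
 */
static const char *rc_to_str(int rc)
{
	return strerror(rc < 0 ? -rc : rc);
}
```

With `rc == 0` and `sm_rc == -110`, `first_nonzero()` yields -110, matching the value gdb printed, and `rc_to_str(-110)` gives the platform's text for a connection timeout.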

 

 

 

 

@nkommuri

Root cause: Reordering the initialization sequence of the transports
caused the issue. "sock" was placed before "lnet", so dix_client_ut
uses the very first transport (sock), while the server selects its
transport based on the endpoint format (lnet). The client and server
therefore use different transports, the client cannot connect to the
server, and the connection attempt fails with ETIMEDOUT.
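
The mismatch can be pictured with the sketch below. It is only an illustration under stated assumptions: `struct xprt`, `pick_first()`, and `pick_by_endpoint()` are hypothetical names rather than Motr's real transport API, and treating any endpoint that contains "@" as an LNet endpoint is a simplification made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical transport descriptor; not Motr's real transport type. */
struct xprt {
	const char *name;
};

/* Client-side behaviour described above: take whichever transport
 * was registered first. */
static const struct xprt *pick_first(const struct xprt *list, size_t n)
{
	return n > 0 ? &list[0] : NULL;
}

/*
 * Server-side behaviour described above: choose by endpoint format.
 * Assumption for illustration only: an endpoint containing '@'
 * (such as "0@lo:12345:34:1") is an LNet endpoint, anything else is sock.
 */
static const struct xprt *pick_by_endpoint(const struct xprt *list, size_t n,
					   const char *ep)
{
	const char *want;
	size_t      i;

	assert(ep != NULL);
	want = strchr(ep, '@') != NULL ? "lnet" : "sock";
	for (i = 0; i < n; i++) {
		if (strcmp(list[i].name, want) == 0)
			return &list[i];
	}
	return NULL;
}
```

With the list ordered `{"sock", "lnet"}`, `pick_first()` returns sock while `pick_by_endpoint()` returns lnet for "0@lo:12345:34:1", so the two sides disagree. With lnet registered first, both functions return the same entry.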

@johnbent
Contributor

Awesome thanks @nkommuri ! Will a PR be showing up soon to fix this?

@nkommuri

Hi John, PR Raised.
#1069

@johnbent
Contributor

Awesome!  Thanks much.

@yanqingfu
Contributor Author

Hi John, PR Raised.
#1069

works

[root@ssc-vm-g3-rhev4-1169 cortx-motr]# scripts/m0 run-ut -t dix-client-ut
----- run_ut -t dix-client-ut -----
START Iteration: 1 out of 1
dix-client-ut
imask 0.00 sec 248 B
imask-apply 0.00 sec 272 B
imask-empty 0.00 sec 0 B
imask-infini 0.00 sec 608 B
imask-short 0.00 sec 80 B
imask-invalid 0.00 sec 56 B
pdclust-map 0.00 sec 4 KiB
meta-val-encdec 0.00 sec 720 B
meta-val-encdec-n 0.00 sec 6 KiB
layout-encdec 0.00 sec 1 KiB
meta-create 3.09 sec 115 MiB
create 3.12 sec 110 MiB
create-crow 2.02 sec 104 MiB
create-dgmode 8.10 sec 117 MiB
delete 4.05 sec 122 MiB
delete-crow 4.42 sec 151 MiB
delete-dgmode 4.15 sec 126 MiB
list 3.14 sec 111 MiB
put 0.99 sec 109 MiB
put-overwrite 6.24 sec 120 MiB
put-crow 2.05 sec 109 MiB
put-dgmode 2.42 sec 131 MiB
get 3.20 sec 112 MiB
get-resend 3.04 sec 109 MiB
get-dgmode 11.82 sec 663 MiB
next 7.19 sec 219 MiB
next-crow 2.07 sec 106 MiB
next-dgmode 3.01 sec 110 MiB
del 3.11 sec 118 MiB
del-dgmode 3.83 sec 151 MiB
null-value 4.09 sec 112 MiB
cctgs-lookup 6.10 sec 122 MiB
local-failures 3.07 sec 109 MiB
next-merge 0.00 sec 25 KiB
server-is-down 3.03 sec 104 MiB
[ time: 97.45 sec, mem: 3 GiB, leaked: 12 MiB ]

Time: 97.46 sec, Mem: 3 GiB, Leaked: 12 MiB, Asserts: 23918
Unit tests status: SUCCESS
END Iteration: 1 out of 1

utime 39.839140 stime 7.341012 maxrss 426628 nvcsw 183588 nivcsw 259
minflt 90232 majflt 52 inblock 584 oublock 6879464
rchar 3220978 wchar 78484 syscr 2189 syscw 268
read_bytes 299008 write_bytes 3522363392 cancelled_write_bytes 409600

@cortx-admin

Nagakishore Kommuri commented in Jira Server:

Duplicate of EOS-24883

@stale

stale bot commented Oct 1, 2021

This issue/pull request has been marked as needs attention as it has been left pending without new activity for 4 days. Tagging @nkommuri @mehjoshi @huanghua78 for appropriate assignment. Sorry for the delay & Thank you for contributing to CORTX. We will get back to you as soon as possible.

@cortx-admin

Nagakishore Kommuri commented in Jira Server:

Fixed through [EOS-24883] ./scripts/m0 run-ut -t cas-client FAILED without -ETIMEOUT - JIRA NSS (seagate.com)

 

@cortx-admin

Nagakishore Kommuri commented in Jira Server:

Verified
