This repository has been archived by the owner on May 3, 2024. It is now read-only.

Do rms_resource_free() in rso_prepare_to_stop() #1976

Merged
rkothiya merged 2 commits into Seagate:main from rmsvc_prepare_to_stop
Jul 15, 2022

@huanghua78 huanghua78 commented Jul 12, 2022

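  • In commit 3af2fc7, reverse session cleanup is done in prepare_to_stop().
    So rms_resources_free() must also be called in rso_prepare_to_stop();
    otherwise rms_stop() tries to send revoke requests over a connection that has already been cleaned up.
    This patch fixes the rm-ut:rmvsv UT failure: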

Thread 29 (Thread 0x7f77553f4700 (LWP 1763408)):
0 0x00007f77adb9ada8 in nanosleep () from /lib64/libc.so.6
1 0x00007f77adb9acae in sleep () from /lib64/libc.so.6
2 0x00007f77afc34af2 in m0_debugger_invoke () at lib/user_space/uassert.c:144
3 0x00007f77afc34c04 in m0_arch_panic (c=c@entry=0x7f77b00fa700 <signal_panic>, ap=ap@entry=0x7f77553f2538) at lib/user_space/uassert.c:129
4 0x00007f77afc22ba7 in m0_panic (ctx=ctx@entry=0x7f77b00fa700 <signal_panic>) at lib/assert.c:52
5 0x00007f77afc34c48 in sigsegv (sig=11) at lib/user_space/ucookie.c:52
6 <signal handler called>
7 m0_rpc_conn_is_snd (conn=0x5f5f5f5f5f5f5f5f) at rpc/conn.c:282
8 0x00007f77afcbe0a1 in m0_rpc_post (item=item@entry=0x7f773d820130) at /home/520428/work/cortx-motr/rpc/rpc_internal.h:97
9 0x00007f77b136c498 in m0_rm_outgoing_send (outgoing=outgoing@entry=0x7f773d820050) at /home/520428/work/cortx-motr/rm/rm_fops.c:310
10 0x00007f77b136c7dd in outgoing_queue (other=0x7f77a419ce50, in=0x7f773f0e63d0, outreq=0x7f773d820050, owner=0x7f77a41ca0d0, otype=M0_ROT_REVOKE)
at /home/520428/work/cortx-motr/rm/rm_fops.c:351
11 m0_rm_request_out (otype=otype@entry=M0_ROT_REVOKE, in=in@entry=0x7f773f0e63d0, loan=loan@entry=0x7f77a419d000, credit=credit@entry=0x7f77553f3750,
other=other@entry=0x7f77a419ce50) at /home/520428/work/cortx-motr/rm/rm_fops.c:392
12 0x00007f77afca65df in revoke_send (credit=0x7f77a419d000, loan=0x7f77a419d000, in=0x7f773f0e63d0) at rm/rm.c:2633
13 incoming_check_with (rest=0x7f77553f36f0, in=0x7f773f0e63d0) at rm/rm.c:2437
14 incoming_check (in=0x7f773f0e63d0) at rm/rm.c:2060
15 owner_balance (o=o@entry=0x7f77a41ca0d0) at rm/rm.c:1995
16 0x00007f77afca6cdc in owner_liquidate (o=0x7f77a41ca0d0) at rm/rm.c:864
17 owner_finalisation_check (owner=0x7f77a41ca0d0) at rm/rm.c:551
18 owner_balance (o=o@entry=0x7f77a41ca0d0) at rm/rm.c:2002
19 0x00007f77afca6eb4 in owner_windup_locked (owner=owner@entry=0x7f77a41ca0d0) at rm/rm.c:927
20 0x00007f77afca6f38 in m0_rm_owner_windup (owner=owner@entry=0x7f77a41ca0d0) at rm/rm.c:936
21 0x00007f77afcaa309 in rms_resources_free (rtype=rtype@entry=0x7f773f0ed370) at rm/rm_service.c:183
22 0x00007f77afcaa3ee in rms_stop (service=0x7f773f0ed030) at rm/rm_service.c:207
23 0x00007f77afc9da6e in m0_reqh_service_stop (service=service@entry=0x7f773f0ed030) at reqh/reqh_service.c:419
24 0x00007f77b12f2fab in cs_service_fini (service=0x7f773f0ed030) at /home/520428/work/cortx-motr/motr/setup.c:1190
25 0x00007f77b12f31d0 in reqh_context_services_fini (cctx=0x7f77b183e1a8 <sctx+40>, rctx=0x7f77b183e4b8 <sctx+824>) at /home/520428/work/cortx-motr/motr/setup.c:1216
26 cs_level_leave (module=<optimized out>) at /home/520428/work/cortx-motr/motr/setup.c:2849
27 0x00007f77afc757d0 in m0_module_fini (module=module@entry=0x7f77b184cc00 <sctx+60032>, level=level@entry=-1) at module/module.c:154
28 0x00007f77b12f6fcd in m0_cs_fini (cctx=cctx@entry=0x7f77b183e1a8 <sctx+40>) at /home/520428/work/cortx-motr/motr/setup.c:3056
29 0x00007f77afcc1009 in m0_rpc_server_stop (sctx=sctx@entry=0x7f77b183e180 <sctx>) at rpc/rpclib.c:89
30 0x00007f77b136f4fa in rm_service_stop (sctx=0x7f77b183e180 <sctx>) at rm/ut/rm_service.c:84
31 rm_svc_server (tid=<optimized out>) at rm/ut/rm_service.c:84
32 0x00007f77afc29f7a in m0_thread_trampoline (arg=arg@entry=0x7f77b5e92008 <rm_ctxs+8>) at lib/thread.c:117
33 0x00007f77afc357bd in uthread_trampoline (arg=0x7f77b5e92008 <rm_ctxs+8>) at lib/user_space/uthread.c:98
34 0x00007f77af2c915a in start_thread () from /lib64/libpthread.so.0
35 0x00007f77adbcedd3 in clone () from /lib64/libc.so.6

  • Fix the ha-state-ut:ha-poolversion-get UT failure.
    The actual pver is now returned when nothing better (cleaner) can be found.
    This new mechanism was introduced in 2f43b29.
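
The ordering problem behind the first bullet can be illustrated with a small C sketch. Everything in it is a hypothetical stand-in rather than Motr code: `struct session`, `struct owner`, `owner_windup()`, `session_cleanup()` and the two `service_stop_*` functions are invented for illustration. The model assumes that winding up an owner needs a live reverse session in order to send a revoke request, which is what the `revoke_send` and `m0_rpc_post` frames in the trace suggest.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are not the Motr types. */
struct session {
	bool alive;            /* false once the reverse session is cleaned up */
};

struct owner {
	struct session *rev;   /* reverse session used to send revoke requests */
	bool            wound_up;
};

/* Winding up an owner sends a revoke, which needs a live session. */
static int owner_windup(struct owner *o)
{
	assert(o != NULL);
	if (o->wound_up)
		return 0;              /* already released, nothing to send */
	if (o->rev == NULL || !o->rev->alive)
		return -ECONNRESET;    /* the real UT crashed at this point */
	o->wound_up = true;
	return 0;
}

static void session_cleanup(struct session *s)
{
	s->alive = false;
}

/* Old order: sessions go away in prepare_to_stop, owners only in stop. */
static int service_stop_old_order(struct owner *o)
{
	session_cleanup(o->rev);   /* prepare_to_stop */
	return owner_windup(o);    /* stop: too late, the session is gone */
}

/* New order: release owners in prepare_to_stop, before session cleanup. */
static int service_stop_new_order(struct owner *o)
{
	int rc = owner_windup(o);  /* prepare_to_stop: resources first */

	session_cleanup(o->rev);   /* prepare_to_stop: then sessions */
	return rc;                 /* stop has nothing left to wind up */
}
```

In this model the old order fails because the owner is wound up after its session is gone, while the new order releases the owner while the session is still usable.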

Problem Statement

  • Problem statement

Design

  • For Bug, Describe the fix here.
  • For Feature, Post the link for design

Coding

Checklist for Author

  • Coding conventions are followed and code is consistent

Testing

Checklist for Author

  • Unit and System Tests are added
  • Test Cases cover Happy Path, Non-Happy Path and Scalability
  • Testing was performed with RPM

Impact Analysis

Checklist for Author/Reviewer/GateKeeper

  • Interface changes (if any) are documented
  • Side effects on other features (deployment/upgrade)
  • Dependencies on other component(s)

Review Checklist

Checklist for Author

  • JIRA number/GitHub Issue added to PR
  • PR is self reviewed
  • Jira and state/status is updated and JIRA is updated with PR link
  • Check if the description is clear and explained

Documentation

Checklist for Author

  • Changes done to WIKI / Confluence page / Quick Start Guide

@cortx-admin

Jenkins CI Result : Motr#1454

Motr Test Summary

Test Result | Count | Info
❌ Failed | 2 | 📁

04motr-single-node/49motr-rpc-cancel
01motr-single-node/00userspace-tests

🏁 Skipped | 32 | 📁

01motr-single-node/28sys-kvs
01motr-single-node/35m0singlenode
01motr-single-node/04initscripts
01motr-single-node/37protocol
02motr-single-node/51kem
02motr-single-node/20rpc-session-cancel
02motr-single-node/10pver-assign
02motr-single-node/21fsync-single-node
02motr-single-node/13dgmode-io
02motr-single-node/14poolmach
02motr-single-node/11m0t1fs
02motr-single-node/26motr-user-kernel-tests
02motr-single-node/08spiel
03motr-single-node/06conf
03motr-single-node/36spare-reservation
04motr-single-node/34sns-repair-1n-1f
04motr-single-node/08spiel-sns-repair-quiesce
04motr-single-node/28sys-kvs-kernel
04motr-single-node/11m0t1fs-rconfc-fail
04motr-single-node/08spiel-sns-repair
04motr-single-node/19sns-repair-abort
04motr-single-node/22sns-repair-ios-fail
05motr-single-node/18sns-repair-quiesce
05motr-single-node/12fwait
05motr-single-node/16sns-repair-multi
05motr-single-node/07mount-fail
05motr-single-node/15sns-repair-single
05motr-single-node/23sns-abort-quiesce
05motr-single-node/17sns-repair-concurrent-io
05motr-single-node/07mount
05motr-single-node/07mount-multiple
05motr-single-node/12fsync

✔️ Passed | 41 | 📁

01motr-single-node/43m0crate
01motr-single-node/05confgen
01motr-single-node/06hagen
01motr-single-node/52motr-singlenode-sanity
01motr-single-node/01net
01motr-single-node/01kernel-tests
01motr-single-node/03console
01motr-single-node/02rpcping
02motr-single-node/07m0d-fatal
02motr-single-node/67fdmi-plugin-multi-filters
02motr-single-node/53clusterusage-alert
02motr-single-node/41motr-conf-update
03motr-single-node/61sns-repair-motr-1n-1f
03motr-single-node/72spiel-sns-motr-repair-quiesce
03motr-single-node/08spiel-multi-confd
03motr-single-node/69sns-repair-motr-quiesce
03motr-single-node/62sns-repair-motr-mf
03motr-single-node/70sns-failure-after-repair-quiesce
03motr-single-node/63sns-repair-motr-1k-1f
03motr-single-node/60sns-repair-motr-1f
03motr-single-node/66sns-repair-motr-abort-quiesce
03motr-single-node/24motr-dix-repair-lookup-insert-spiel
03motr-single-node/68sns-repair-motr-shutdown
03motr-single-node/64sns-repair-motr-ios-fail
03motr-single-node/71spiel-sns-motr-repair
03motr-single-node/24motr-dix-repair-lookup-insert-m0repair
03motr-single-node/04sss
03motr-single-node/65sns-repair-motr-abort
04motr-single-node/48motr-raid0-io
04motr-single-node/25m0kv
04motr-single-node/44motr-rm-lock-cc-io
04motr-single-node/45motr-rmw
05motr-single-node/23dix-repair-m0repair
05motr-single-node/43motr-sync-replication
05motr-single-node/42motr-utils
05motr-single-node/45motr-sns-repair-N-1
05motr-single-node/40motr-dgmode
05motr-single-node/23dix-repair-quiesce-m0repair
05motr-single-node/23spiel-dix-repair-quiesce
05motr-single-node/44motr-sns-repair
05motr-single-node/23spiel-dix-repair

Total | 75 | 🔗

CppCheck Summary

   Cppcheck: No new warnings found 👍

@huanghua78
Author

I am sorry, but why is the lnet test included in the UT?
lnet is not in the transport list by default.
@madhavemuri @upendrapatwardhan

@huanghua78
Author

> I am sorry, but why is the lnet test included in the UT? lnet is not added in the transport list as default. @madhavemuri @upendrapatwardhan

From the log, I can see that Lustre/LNet is still used to build the lnet transport.
I think this should be disabled.
If it is enabled, please make sure the UT passes.

@upendrapatwardhan
Contributor

> I am sorry, but why is the lnet test included in the UT? lnet is not added in the transport list as default. @madhavemuri @upendrapatwardhan

Please see the comments below.
#1799 (comment)

#1809 (comment)

Based on the above inputs, even though lnet is not the default transport, the lnet UT is enabled if the Lustre RPMs are installed and m0tr.ko is available.

@cortx-admin

Jenkins CI Result : Motr#1463

Motr Test Summary

Test Result | Count | Info
❌ Failed | 2 | 📁

04motr-single-node/49motr-rpc-cancel
01motr-single-node/00userspace-tests

🏁 Skipped | 32 | 📁

01motr-single-node/28sys-kvs
01motr-single-node/35m0singlenode
01motr-single-node/04initscripts
01motr-single-node/37protocol
02motr-single-node/51kem
02motr-single-node/20rpc-session-cancel
02motr-single-node/10pver-assign
02motr-single-node/21fsync-single-node
02motr-single-node/13dgmode-io
02motr-single-node/14poolmach
02motr-single-node/11m0t1fs
02motr-single-node/26motr-user-kernel-tests
02motr-single-node/08spiel
03motr-single-node/06conf
03motr-single-node/36spare-reservation
04motr-single-node/34sns-repair-1n-1f
04motr-single-node/08spiel-sns-repair-quiesce
04motr-single-node/28sys-kvs-kernel
04motr-single-node/11m0t1fs-rconfc-fail
04motr-single-node/08spiel-sns-repair
04motr-single-node/19sns-repair-abort
04motr-single-node/22sns-repair-ios-fail
05motr-single-node/18sns-repair-quiesce
05motr-single-node/12fwait
05motr-single-node/16sns-repair-multi
05motr-single-node/07mount-fail
05motr-single-node/15sns-repair-single
05motr-single-node/23sns-abort-quiesce
05motr-single-node/17sns-repair-concurrent-io
05motr-single-node/07mount
05motr-single-node/07mount-multiple
05motr-single-node/12fsync

✔️ Passed | 41 | 📁

01motr-single-node/43m0crate
01motr-single-node/05confgen
01motr-single-node/06hagen
01motr-single-node/52motr-singlenode-sanity
01motr-single-node/01net
01motr-single-node/01kernel-tests
01motr-single-node/03console
01motr-single-node/02rpcping
02motr-single-node/07m0d-fatal
02motr-single-node/67fdmi-plugin-multi-filters
02motr-single-node/53clusterusage-alert
02motr-single-node/41motr-conf-update
03motr-single-node/61sns-repair-motr-1n-1f
03motr-single-node/72spiel-sns-motr-repair-quiesce
03motr-single-node/08spiel-multi-confd
03motr-single-node/69sns-repair-motr-quiesce
03motr-single-node/62sns-repair-motr-mf
03motr-single-node/70sns-failure-after-repair-quiesce
03motr-single-node/63sns-repair-motr-1k-1f
03motr-single-node/60sns-repair-motr-1f
03motr-single-node/66sns-repair-motr-abort-quiesce
03motr-single-node/24motr-dix-repair-lookup-insert-spiel
03motr-single-node/68sns-repair-motr-shutdown
03motr-single-node/64sns-repair-motr-ios-fail
03motr-single-node/71spiel-sns-motr-repair
03motr-single-node/24motr-dix-repair-lookup-insert-m0repair
03motr-single-node/04sss
03motr-single-node/65sns-repair-motr-abort
04motr-single-node/48motr-raid0-io
04motr-single-node/25m0kv
04motr-single-node/44motr-rm-lock-cc-io
04motr-single-node/45motr-rmw
05motr-single-node/23dix-repair-m0repair
05motr-single-node/43motr-sync-replication
05motr-single-node/42motr-utils
05motr-single-node/45motr-sns-repair-N-1
05motr-single-node/40motr-dgmode
05motr-single-node/23dix-repair-quiesce-m0repair
05motr-single-node/23spiel-dix-repair-quiesce
05motr-single-node/44motr-sns-repair
05motr-single-node/23spiel-dix-repair

Total | 75 | 🔗

CppCheck Summary

   Cppcheck: No new warnings found 👍

@huanghua78
Author

huanghua78 commented Jul 14, 2022

Hello @andriytk, can you please help check this failure? https://eos-jenkins.colo.seagate.com/job/Cortx-PR-Build/job/Motr/1463//testReport/junit/01motr-single-node/00userspace-tests

I verified this UT failure locally.
It was introduced recently in 2f43b29.

@huanghua78
Author

> Hello @andriytk, can you please help check this failure? https://eos-jenkins.colo.seagate.com/job/Cortx-PR-Build/job/Motr/1463//testReport/junit/01motr-single-node/00userspace-tests
>
> I verified this UT failure locally. It was introduced recently in 2f43b29

And this is the fix:

diff --git a/ha/ut/note.c b/ha/ut/note.c
index 531a055e6..7fc57a474 100644
--- a/ha/ut/note.c
+++ b/ha/ut/note.c
@@ -493,6 +493,7 @@ static void test_poolversion_get(void)
        struct m0_conf_pver    *pver2 = NULL;
        struct m0_conf_pver    *pver3 = NULL;
        struct m0_conf_pver    *pver4 = NULL;
+       struct m0_conf_pver    *pver5 = NULL;
        struct m0_reqh          reqh;
        struct m0_confc        *confc = m0_reqh2confc(&reqh);
        int                     rc;
@@ -542,10 +543,10 @@ static void test_poolversion_get(void)
        M0_UT_ASSERT(recd_disks(pver0) == 2);

        ha_state_accept_1(&disk_76, M0_NC_TRANSIENT);
-       rc = m0_conf_pver_get(confc, &pool4_fid, &pver3);
-       M0_UT_ASSERT(rc == -ENOENT);
-       M0_UT_ASSERT(pver3 == NULL);
-
+       rc = m0_conf_pver_get(confc, &pool4_fid, &pver5);
+       M0_UT_ASSERT(rc == 0);
+       M0_UT_ASSERT(pver5 == pver0);
+       M0_UT_ASSERT(pver5->pv_kind == M0_CONF_PVER_ACTUAL);

        rc = m0_conf_pver_get(confc, &pool56_fid, &pver3);
        M0_UT_ASSERT(rc == 0);
@@ -574,6 +575,7 @@ static void test_poolversion_get(void)
        m0_confc_close(&pver2->pv_obj);
        m0_confc_close(&pver3->pv_obj);
        m0_confc_close(&pver4->pv_obj);
+       m0_confc_close(&pver5->pv_obj);
        ha_ut_conf_fini(&reqh);
        m0_reqh_fini(&reqh);
 }
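
The behaviour change that the updated assertions reflect (previously `-ENOENT`, now `rc == 0` with the actual pver) can be sketched in isolation. This is a simplified, hypothetical model, not the `m0_conf_pver_get()` implementation: `struct pver`, `pver_pick()` and the `fallback_to_actual` flag are invented names, and "clean" is assumed here to mean "has no failed devices".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model; not the Motr data structures. */
enum pver_kind { PVER_ACTUAL, PVER_VIRTUAL };

struct pver {
	enum pver_kind kind;
	unsigned       failed_devs;  /* devices currently not online */
};

/*
 * Pick a pool version from cands[0..nr). A "clean" pver (no failed devices)
 * is preferred. If none exists and fallback_to_actual is set, the first
 * actual pver is returned anyway; otherwise the lookup fails with -ENOENT.
 */
static int pver_pick(struct pver *cands, size_t nr, bool fallback_to_actual,
		     struct pver **out)
{
	size_t i;

	assert(out != NULL);
	*out = NULL;
	for (i = 0; i < nr; i++) {
		if (cands[i].failed_devs == 0) {
			*out = &cands[i];
			return 0;
		}
	}
	if (fallback_to_actual) {
		for (i = 0; i < nr; i++) {
			if (cands[i].kind == PVER_ACTUAL) {
				*out = &cands[i];
				return 0;
			}
		}
	}
	return -ENOENT;
}
```

With the fallback disabled, a pool whose pvers all have failed devices yields `-ENOENT`, matching the old test expectation; with it enabled, the actual pver is returned, matching the new one.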

@andriytk, please verify.

@andriytk
Contributor

Thanks @huanghua78! The fix for the UT looks good.

In commit 3af2fc7, we do reverse session cleanup in prepare_to_stop().
So, we must also do rms_resource_free() in rso_prepare_to_stop().
This patch fixes the rm-ut:rmvsv UT failure:

Thread 29 (Thread 0x7f77553f4700 (LWP 1763408)):
0  0x00007f77adb9ada8 in nanosleep () from /lib64/libc.so.6
1  0x00007f77adb9acae in sleep () from /lib64/libc.so.6
2  0x00007f77afc34af2 in m0_debugger_invoke () at lib/user_space/uassert.c:144
3  0x00007f77afc34c04 in m0_arch_panic (c=c@entry=0x7f77b00fa700 <signal_panic>, ap=ap@entry=0x7f77553f2538) at lib/user_space/uassert.c:129
4  0x00007f77afc22ba7 in m0_panic (ctx=ctx@entry=0x7f77b00fa700 <signal_panic>) at lib/assert.c:52
5  0x00007f77afc34c48 in sigsegv (sig=11) at lib/user_space/ucookie.c:52
6  <signal handler called>
7  m0_rpc_conn_is_snd (conn=0x5f5f5f5f5f5f5f5f) at rpc/conn.c:282
8  0x00007f77afcbe0a1 in m0_rpc_post (item=item@entry=0x7f773d820130) at /home/520428/work/cortx-motr/rpc/rpc_internal.h:97
9  0x00007f77b136c498 in m0_rm_outgoing_send (outgoing=outgoing@entry=0x7f773d820050) at /home/520428/work/cortx-motr/rm/rm_fops.c:310
10 0x00007f77b136c7dd in outgoing_queue (other=0x7f77a419ce50, in=0x7f773f0e63d0, outreq=0x7f773d820050, owner=0x7f77a41ca0d0, otype=M0_ROT_REVOKE)
   at /home/520428/work/cortx-motr/rm/rm_fops.c:351
11 m0_rm_request_out (otype=otype@entry=M0_ROT_REVOKE, in=in@entry=0x7f773f0e63d0, loan=loan@entry=0x7f77a419d000, credit=credit@entry=0x7f77553f3750,
   other=other@entry=0x7f77a419ce50) at /home/520428/work/cortx-motr/rm/rm_fops.c:392
12 0x00007f77afca65df in revoke_send (credit=0x7f77a419d000, loan=0x7f77a419d000, in=0x7f773f0e63d0) at rm/rm.c:2633
13 incoming_check_with (rest=0x7f77553f36f0, in=0x7f773f0e63d0) at rm/rm.c:2437
14 incoming_check (in=0x7f773f0e63d0) at rm/rm.c:2060
15 owner_balance (o=o@entry=0x7f77a41ca0d0) at rm/rm.c:1995
16 0x00007f77afca6cdc in owner_liquidate (o=0x7f77a41ca0d0) at rm/rm.c:864
17 owner_finalisation_check (owner=0x7f77a41ca0d0) at rm/rm.c:551
18 owner_balance (o=o@entry=0x7f77a41ca0d0) at rm/rm.c:2002
19 0x00007f77afca6eb4 in owner_windup_locked (owner=owner@entry=0x7f77a41ca0d0) at rm/rm.c:927
20 0x00007f77afca6f38 in m0_rm_owner_windup (owner=owner@entry=0x7f77a41ca0d0) at rm/rm.c:936
21 0x00007f77afcaa309 in rms_resources_free (rtype=rtype@entry=0x7f773f0ed370) at rm/rm_service.c:183
22 0x00007f77afcaa3ee in rms_stop (service=0x7f773f0ed030) at rm/rm_service.c:207
23 0x00007f77afc9da6e in m0_reqh_service_stop (service=service@entry=0x7f773f0ed030) at reqh/reqh_service.c:419
24 0x00007f77b12f2fab in cs_service_fini (service=0x7f773f0ed030) at /home/520428/work/cortx-motr/motr/setup.c:1190
25 0x00007f77b12f31d0 in reqh_context_services_fini (cctx=0x7f77b183e1a8 <sctx+40>, rctx=0x7f77b183e4b8 <sctx+824>) at /home/520428/work/cortx-motr/motr/setup.c:1216
26 cs_level_leave (module=<optimized out>) at /home/520428/work/cortx-motr/motr/setup.c:2849
27 0x00007f77afc757d0 in m0_module_fini (module=module@entry=0x7f77b184cc00 <sctx+60032>, level=level@entry=-1) at module/module.c:154
28 0x00007f77b12f6fcd in m0_cs_fini (cctx=cctx@entry=0x7f77b183e1a8 <sctx+40>) at /home/520428/work/cortx-motr/motr/setup.c:3056
29 0x00007f77afcc1009 in m0_rpc_server_stop (sctx=sctx@entry=0x7f77b183e180 <sctx>) at rpc/rpclib.c:89
30 0x00007f77b136f4fa in rm_service_stop (sctx=0x7f77b183e180 <sctx>) at rm/ut/rm_service.c:84
31 rm_svc_server (tid=<optimized out>) at rm/ut/rm_service.c:84
32 0x00007f77afc29f7a in m0_thread_trampoline (arg=arg@entry=0x7f77b5e92008 <rm_ctxs+8>) at lib/thread.c:117
33 0x00007f77afc357bd in uthread_trampoline (arg=0x7f77b5e92008 <rm_ctxs+8>) at lib/user_space/uthread.c:98
34 0x00007f77af2c915a in start_thread () from /lib64/libpthread.so.0
35 0x00007f77adbcedd3 in clone () from /lib64/libc.so.6

Signed-off-by: Hua Huang <hua.huang@seagate.com>
…found.

This new mechanism is introduced in 2f43b29 .

Signed-off-by: Hua Huang <hua.huang@seagate.com>
@huanghua78 huanghua78 force-pushed the rmsvc_prepare_to_stop branch from 9682dd9 to 8a9fe7c Compare July 14, 2022 11:12
@rkothiya rkothiya merged commit 90108b0 into Seagate:main Jul 15, 2022