CORTX-30993 - RCA: Due to recursive calls to be_op_state_change where gc callback of (Seagate#1820)

Problem: In m0_be_op_fini, when bos_tlink_fini is performed, it is expected that bo_set_link is no longer linked into the parent's m0_be_op::bo_children list.
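The invariant behind the panic can be illustrated with a minimal intrusive list. This is a hypothetical sketch, only loosely modelled on what the assertion (!m0_list_link_is_in(link)) in m0_list_link_fini checks; struct link, list_add, list_del and link_fini are illustrative names, not Motr's lib/list.c API. A link that points to itself is treated as detached, and finalising a link that is still on a list trips the assertion.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical minimal intrusive list, loosely modelled on the invariant
 * that m0_list_link_fini() checks; it is NOT Motr's lib/list.c.
 * A link that points to itself is "not in any list".
 */
struct link {
	struct link *next;
	struct link *prev;
};

static void link_init(struct link *l)
{
	l->next = l;
	l->prev = l;
}

static bool link_is_in(const struct link *l)
{
	return l->next != l;
}

/* Insert l right after head. */
static void list_add(struct link *head, struct link *l)
{
	l->next = head->next;
	l->prev = head;
	head->next->prev = l;
	head->next = l;
}

/* Unlink l and mark it detached again. */
static void list_del(struct link *l)
{
	l->prev->next = l->next;
	l->next->prev = l->prev;
	link_init(l);
}

/* Finalising a link that is still on a list is a bug: assert, mirroring
 * the "(!m0_list_link_is_in(link))" panic in the log. */
static void link_fini(struct link *l)
{
	assert(!link_is_in(l));
}
```

In the crash, gft_pd_io_op's bo_set_link was found on a list at fini time, i.e. the equivalent of calling link_fini() before list_del().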

State seen at the time of crash:
Two gft_pd_io ops in progress, with two corresponding bios in the sched queue; the crash is hit while performing the gc callback processing for the gft whose gft_pd_io is in progress and whose bio is queued behind an active I/O.

Panic:
2022-04-24 11:19:15,672 - motr[00107]: e2e0 FATAL [lib/assert.c:50:m0_panic] panic: (!m0_list_link_is_in(link)) at m0_list_link_fini() (lib/list.c:178) [git: 2.0.0-670-27-g0012fe90] /etc/cortx/log/motr/0696b1d9e4744c59a92cb2bdded112ac/trace/m0d-0x7200000000000001:0x2e/m0trace.107
2022-04-24 11:19:15,672 - Motr panic: (!m0_list_link_is_in(link)) at m0_list_link_fini() lib/list.c:178 (errno: 0) (last failed: none) [git: 2.0.0-670-27-g0012fe90] pid: 107 /etc/cortx/log/motr/0696b1d9e4744c59a92cb2bdded112ac/trace/m0d-0x7200000000000001:0x2e/m0trace.107
2022-04-24 11:19:15,706 - /lib64/libmotr.so.2(m0_arch_backtrace+0x33)[0x7f7514e79c83]
2022-04-24 11:19:15,706 - /lib64/libmotr.so.2(m0_arch_panic+0xe9)[0x7f7514e79e59]
2022-04-24 11:19:15,706 - /lib64/libmotr.so.2(m0_panic+0x13d)[0x7f7514e6890d]
2022-04-24 11:19:15,706 - /lib64/libmotr.so.2(+0x3895f6)[0x7f7514e6c5f6]
2022-04-24 11:19:15,706 - /lib64/libmotr.so.2(m0_be_op_fini+0x1f)[0x7f7514dae66f]
2022-04-24 11:19:15,706 - /lib64/libmotr.so.2(+0x2cb826)[0x7f7514dae826]
2022-04-24 11:19:15,707 - /lib64/libmotr.so.2(+0x2c4c5b)[0x7f7514da7c5b]
2022-04-24 11:19:15,707 - /lib64/libmotr.so.2(+0x2cb826)[0x7f7514dae826]
2022-04-24 11:19:15,707 - /lib64/libmotr.so.2(+0x2c300a)[0x7f7514da600a]
2022-04-24 11:19:15,707 - /lib64/libmotr.so.2(+0x2c3119)[0x7f7514da6119]
2022-04-24 11:19:15,707 - /lib64/libmotr.so.2(+0x386f7f)[0x7f7514e69f7f]
2022-04-24 11:19:15,707 - /lib64/libmotr.so.2(+0x386ffa)[0x7f7514e69ffa]
2022-04-24 11:19:15,707 - /lib64/libmotr.so.2(m0_chan_broadcast_lock+0x1d)[0x7f7514e6a08d]

Backtrace:
(gdb) bt
#0 0x00007f7512d8938f in raise () from /lib64/libc.so.6
#1 0x00007f7512d73dc5 in abort () from /lib64/libc.so.6
#2 0x00007f7514e79e63 in m0_arch_panic (c=c@entry=0x7f751531ade0 <__pctx.4611>, ap=ap@entry=0x7f74afffe390)
at lib/user_space/uassert.c:131
#3 0x00007f7514e6890d in m0_panic (ctx=ctx@entry=0x7f751531ade0 <__pctx.4611>) at lib/assert.c:52
#4 0x00007f7514e6c5f6 in m0_list_link_fini (link=<optimized out>) at lib/list.c:178
#5 0x00007f7514e70310 in m0_tlink_fini (d=d@entry=0x7f75152880a0 <bos_tl>, obj=obj@entry=0x56523e641a90) at lib/tlist.c:283
#6 0x00007f7514dae66f in bos_tlink_fini (amb=0x56523e641a90) at be/op.c:109
#7 m0_be_op_fini (op=0x56523e641a90) at be/op.c:109
#8 0x00007f7514dae826 in be_op_state_change (op=<optimized out>, state=state@entry=M0_BOS_DONE) at be/op.c:213
#9 0x00007f7514daea17 in m0_be_op_done (op=<optimized out>) at be/op.c:231
#10 0x00007f7514da7c5b in be_io_sched_cb (op=op@entry=0x56523e5f7870, param=param@entry=0x56523e5f7798) at be/io_sched.c:141
#11 0x00007f7514dae826 in be_op_state_change (op=op@entry=0x56523e5f7870, state=state@entry=M0_BOS_DONE) at be/op.c:213
#12 0x00007f7514daea17 in m0_be_op_done (op=op@entry=0x56523e5f7870) at be/op.c:231
#13 0x00007f7514da600a in be_io_finished (bio=bio@entry=0x56523e5f7798) at be/io.c:555
#14 0x00007f7514da6119 in be_io_cb (link=0x56523e61ac60) at be/io.c:587
#15 0x00007f7514e69f7f in clink_signal (clink=clink@entry=0x56523e61ac60) at lib/chan.c:135
#16 0x00007f7514e69ffa in chan_signal_nr (chan=chan@entry=0x56523e61ab58, nr=0) at lib/chan.c:154
#17 0x00007f7514e6a06c in m0_chan_broadcast (chan=chan@entry=0x56523e61ab58) at lib/chan.c:174
#18 0x00007f7514e6a08d in m0_chan_broadcast_lock (chan=chan@entry=0x56523e61ab58) at lib/chan.c:181
#19 0x00007f7514f4209a in ioq_complete (res2=<optimized out>, res=<optimized out>, qev=<optimized out>, ioq=0x56523e5de610)
at stob/ioq.c:587
#20 stob_ioq_thread (ioq=0x56523e5de610) at stob/ioq.c:669
#21 0x00007f7514e6f49e in m0_thread_trampoline (arg=arg@entry=0x56523e5de6e8) at lib/thread.c:117
#22 0x00007f7514e7ab11 in uthread_trampoline (arg=0x56523e5de6e8) at lib/user_space/uthread.c:98
#23 0x00007f751454915a in start_thread () from /lib64/libpthread.so.0
#24 0x00007f7512e4edd3 in clone () from /lib64/libc.so.6

RCA - Sequence of Events:

be_tx_group_format_seg_io_op_gc is invoked for gft_pd_io_op of tx_group_fom_1 (last_child is false at this point).
(gdb) p &((struct m0_be_group_format *)cb_gc_param)->gft_pd_io_op
$29 = (struct m0_be_op *) 0x56523e641a90

be_tx_group_format_seg_io_op_gc handling of gft_pd_io_op invokes m0_be_op_done for gft_tmp_op (gft_tmp_op has no callbacks), but now last_child is true for the parent, because done has been invoked for both of its children (gft_tmp_op and gft_pd_io_op).

m0_be_op_done handling of gft_tmp_op invokes be_op_state_change with M0_BOS_DONE for the parent (tgf_op).

During be_op_state_change processing for the main parent tgf_op, m0_sm_state_set updates the bo_sm state and unblocks tx_group_fom_1 by signalling op->bo_sm.sm_chan.
This recursive callback processing happens in the context of stob_ioq_thread (one of the M0_STOB_IOQ_NR_THREADS ioq threads).
Here, invoking the done processing of the peer child (gft_tmp_op) from within the gc processing of the other child (gft_pd_io_op) causes the parent's callback to be invoked early.

Parent Callback Processing:
6. This now unblocks tx_group_fom_1, which calls m0_be_pd_io_put in m0_be_group_format_reset, and tx_group_fom_1 moves to TGS_OPEN.
So pd_io and tx_group_fom_1 are now ready for reuse.

Problem window:
7. The problem occurs if pd_io and/or tx_group_fom_1 is reused with a new context while the remaining gc callback processing of gft_pd_io_op,
i.e.
m0_be_op_fini(&gft->gft_tmp_op);
m0_be_op_fini(op);
is still being done.
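The sequence above can be reduced to a single-threaded toy model. This is a hypothetical sketch, not Motr's m0_be_op implementation. It assumes, as the "last_child is false" remark suggests, that a child's gc callback runs before M0_BOS_DONE is propagated to the parent; it models bo_set_link membership as a boolean; and it forces the worst-case interleaving by having the woken fom (fom_wakeup_and_reuse, a made-up name) re-link gft_pd_io_op immediately. With the old gc callback, the final m0_be_op_fini(op) then finds the op linked again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical toy model of the RCA sequence; names echo Motr's, but this
 * is NOT the real m0_be_op code. "linked" stands in for bo_set_link
 * membership; the parent's gc stands in for the sm_chan signal that
 * wakes tx_group_fom_1.
 */
struct op {
	struct op *parent;
	int        nr_children;        /* children not yet done (parents) */
	bool       linked;             /* in parent's bo_children list */
	void     (*gc)(struct op *op); /* assumed to run before parent */
};

static struct op tgf_op, pd_io_op, tmp_op;
static bool      panicked;

static void op_done(struct op *op)
{
	struct op *parent = op->parent;
	bool       last_child = false;

	if (parent != NULL) {
		assert(parent->nr_children > 0);
		op->linked = false;                      /* unlink from set */
		last_child = --parent->nr_children == 0;
	}
	if (op->gc != NULL)
		op->gc(op);
	if (last_child)
		op_done(parent);
}

/* Models the !m0_list_link_is_in() check in m0_be_op_fini(). */
static void op_fini(struct op *op)
{
	if (op->linked)
		panicked = true;
}

/* Assumption: the woken fom reuses the group before gc resumes. */
static void fom_wakeup_and_reuse(struct op *op)
{
	(void)op;
	pd_io_op.linked = true;    /* re-added to a new parent's set */
}

/* Old gc callback: completes the peer child from inside gc. */
static void old_seg_io_op_gc(struct op *op)
{
	op_done(&tmp_op);          /* last child -> parent done -> reuse */
	op_fini(&tmp_op);
	op_fini(op);               /* pd_io_op is linked again: "panic" */
}

static bool run_old_sequence(void)
{
	panicked = false;
	tgf_op   = (struct op){ .nr_children = 2,
	                        .gc = fom_wakeup_and_reuse };
	pd_io_op = (struct op){ .parent = &tgf_op, .linked = true,
	                        .gc = old_seg_io_op_gc };
	tmp_op   = (struct op){ .parent = &tgf_op, .linked = true };
	op_done(&pd_io_op);        /* seg I/O completion for child 1 */
	return panicked;
}
```

In the real system the reuse happens on another thread, so the crash depends on timing; the model only makes that window deterministic.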

Solution:
Removing gft_tmp_op altogether ensures that parent callback processing is never invoked ahead of its child's callback processing.
This way tx_group_fom is always notified of seg I/O completion only after all the relevant child callback processing has completed, which
avoids the crashes seen in the gc callback processing (be_tx_group_format_seg_io_op_gc) after m0_be_op_done(&gft->gft_tmp_op);
In the proposed solution the main parent op is made active at the start, at the same place where gft_tmp_op used to be activated, in order to put the parent
into the active state; this makes gft_tmp_op redundant and avoids the out-of-order execution of child/parent callbacks.
RCA: Due to recursive calls to be_op_state_change, the gc callback of gft_op (child 1) invokes the done callback of gft_tmp_op (child 2), which in turn invokes be_op_state_change on the parent. As a result the group fom completes ahead of the child op callback processing, and the subsequent crash is observed when the group is reused before the child callback processing has finished.
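Under the same assumptions as before (a hypothetical toy model, not Motr's code; a child's gc is assumed to run before the parent is told; the woken fom reuses the group immediately), the fixed layout can be sketched as follows: the parent is activated up front, gft_tmp_op no longer exists, and gft_pd_io_op is the parent's only child. The events array records that the child is finalised ('f') before the fom is woken ('w').

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical toy model of the fixed ordering (NOT Motr's real code):
 * the parent is active from the start, gft_tmp_op is gone, and
 * gft_pd_io_op is the parent's only child.
 */
struct op {
	struct op *parent;
	int        nr_children;        /* children not yet done */
	bool       linked;             /* in parent's bo_children list */
	void     (*gc)(struct op *op); /* assumed to run before parent */
};

static struct op tgf_op, pd_io_op;
static bool      panicked;
static char      events[4];        /* 'f' = child fini, 'w' = fom woken */
static int       nr_events;

static void op_done(struct op *op)
{
	struct op *parent = op->parent;
	bool       last_child = false;

	if (parent != NULL) {
		assert(parent->nr_children > 0);
		op->linked = false;                      /* unlink from set */
		last_child = --parent->nr_children == 0;
	}
	if (op->gc != NULL)
		op->gc(op);
	if (last_child)
		op_done(parent);   /* parent notified only after child gc */
}

static void op_fini(struct op *op)
{
	if (op->linked)            /* models !m0_list_link_is_in() failing */
		panicked = true;
	events[nr_events++] = 'f';
}

/* Worst case again: the woken fom immediately reuses the group. */
static void fom_wakeup_and_reuse(struct op *op)
{
	(void)op;
	events[nr_events++] = 'w';
	pd_io_op.linked = true;
}

/* New gc callback: only finalises its own op. */
static void new_seg_io_op_gc(struct op *op)
{
	op_fini(op);
}

static bool run_new_sequence(void)
{
	panicked  = false;
	nr_events = 0;
	tgf_op    = (struct op){ .nr_children = 1,
	                         .gc = fom_wakeup_and_reuse };
	pd_io_op  = (struct op){ .parent = &tgf_op, .linked = true,
	                         .gc = new_seg_io_op_gc };
	op_done(&pd_io_op);
	return panicked;
}
```

Because the last (and only) child's gc has already finalised the op when the parent completes, reusing the group afterwards no longer races with the fini.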

Signed-off-by: Vidyadhar Pinglikar <vidyadhar.pinglikar@seagate.com>
vidyadhar-pinglikar authored and Mehul Joshi committed Jul 13, 2022
1 parent 6faff08 commit 9dfc351
Showing 2 changed files with 1 addition and 9 deletions.
9 changes: 1 addition & 8 deletions be/tx_group_format.c
@@ -540,10 +540,6 @@ static void be_tx_group_format_seg_io_finished(struct m0_be_op *op, void *param)
 
 static void be_tx_group_format_seg_io_op_gc(struct m0_be_op *op, void *param)
 {
-	struct m0_be_group_format *gft = param;
-
-	m0_be_op_done(&gft->gft_tmp_op);
-	m0_be_op_fini(&gft->gft_tmp_op);
 	m0_be_op_fini(op);
 }
 
@@ -571,11 +567,8 @@ M0_INTERNAL void m0_be_group_format_seg_place(struct m0_be_group_format *gft,
 	gft_op = &gft->gft_pd_io_op;
 	M0_SET0(gft_op);
 	m0_be_op_init(gft_op);
-	M0_SET0(&gft->gft_tmp_op);
-	m0_be_op_init(&gft->gft_tmp_op);
+	m0_be_op_active(op);
 	m0_be_op_set_add(op, gft_op);
-	m0_be_op_set_add(op, &gft->gft_tmp_op);
-	m0_be_op_active(&gft->gft_tmp_op);
 	m0_be_op_callback_set(gft_op, &be_tx_group_format_seg_io_starting,
 			      gft, M0_BOS_ACTIVE);
 	m0_be_op_callback_set(gft_op, &be_tx_group_format_seg_io_finished,
1 change: 0 additions & 1 deletion be/tx_group_format.h
@@ -123,7 +123,6 @@ struct m0_be_group_format {
 	struct m0_be_log_discard_item *gft_log_discard_item;
 	struct m0_ext gft_ext;
 	struct m0_be_op gft_pd_io_op;
-	struct m0_be_op gft_tmp_op;
 	/** is used in m0_be_group_format_prepare() */
 	struct m0_be_op gft_pd_io_get;
 	/** is used in m0_be_group_format_prepare() */
