CORTX-33564: Fixed panic in rpc_session_cancelled() due to session state
Problem:
  While a session was being cancelled, another thread could start terminating
  the same session and move its state to M0_RPC_SESSION_TERMINATING. This
  tripped the assert M0_POST(session->s_sm.sm_state == M0_RPC_SESSION_IDLE)
  in m0_rpc_session_cancel().

  #5  in m0_arch_panic (c=c@entry=0x7f7276a91b40 <__pctx.14289>, ap=ap@entry=0x7f7268c05ce0) at lib/user_space/uassert.c:131
  #6  in m0_panic (ctx=ctx@entry=0x7f7276a91b40 <__pctx.14289>) at lib/assert.c:52
  #7  in m0_rpc_session_cancel (session=session@entry=0x56283c7c13d8) at rpc/session.c:863
  #8  in m0_rpc_conn_sessions_cancel (conn=conn@entry=0x56283c7c1030) at rpc/conn.c:1333
  #9  in rpc_conn__on_service_event_cb (clink=<optimized out>) at rpc/conn.c:1364
  #10 in clink_signal (clink=clink@entry=0x56283c7c12c0) at lib/chan.c:135
  #11 in chan_signal_nr (chan=chan@entry=0x56283c6a8768, nr=1) at lib/chan.c:154
  #12 in m0_chan_broadcast (chan=chan@entry=0x56283c6a8768) at lib/chan.c:174
  #13 in ha_state_accept (ignore_same_state=1, note=0x7f7268c06060, confc=0x56283816b028) at ha/note.c:18

Solution:
  Another thread may start terminating a session while it is being cancelled,
  and only sessions in the IDLE or BUSY state may be cancelled. The state
  pre-check in m0_rpc_session_cancel() now returns early instead of hitting a
  panic/assert. The M0_POST() assert is also replaced with a proper debug log.
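
  The pattern behind the fix can be sketched in isolation. This is a minimal
  illustration, not the Motr code: the demo_* names, the pthread mutex standing
  in for the rpc machine lock, and the boolean return value are all assumptions
  made for the example. It shows the state check done while holding the lock,
  with an early return for sessions that are neither IDLE nor BUSY.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the Motr session types; not the real ones. */
enum demo_state { DEMO_IDLE, DEMO_BUSY, DEMO_TERMINATING };

struct demo_session {
	pthread_mutex_t lock;      /* plays the role of the rpc machine lock */
	enum demo_state state;
	bool            cancelled;
};

static void demo_session_init(struct demo_session *s, enum demo_state st)
{
	pthread_mutex_init(&s->lock, NULL);
	s->state = st;
	s->cancelled = false;
}

/*
 * Returns true if this call cancelled the session, false if it was skipped
 * (wrong state, or already cancelled). The state is read only while the
 * lock is held, so a concurrent terminate cannot change it mid-check.
 */
static bool demo_session_cancel(struct demo_session *s)
{
	bool done = false;

	assert(s != NULL);
	pthread_mutex_lock(&s->lock);
	if (s->state != DEMO_IDLE && s->state != DEMO_BUSY)
		goto unlock;   /* e.g. TERMINATING: skip instead of asserting */
	if (s->cancelled)
		goto unlock;   /* a second cancel is a no-op */
	s->cancelled = true;
	done = true;
unlock:
	pthread_mutex_unlock(&s->lock);
	return done;
}
```

  In this sketch a caller that loses the race against a terminating thread
  simply gets false back, which mirrors the early return taken in place of a
  panic.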

Signed-off-by: Yatin Mahajan <yatin.mahajan@seagate.com>
yatin-mahajan committed Jul 12, 2022
1 parent fc82565 commit 2b01852
Showing 1 changed file with 3 additions and 7 deletions.
10 changes: 3 additions & 7 deletions rpc/session.c
@@ -854,12 +854,13 @@ M0_INTERNAL void m0_rpc_session_cancel(struct m0_rpc_session *session)
 	M0_ENTRY("session %p", session);
 
 	M0_PRE(session->s_session_id != SESSION_ID_0);
+	m0_rpc_machine_lock(session->s_conn->c_rpc_machine);
 	if (!M0_IN(session_state(session),
 		   (M0_RPC_SESSION_BUSY, M0_RPC_SESSION_IDLE))) {
-		M0_LEAVE("session %p", session);
+		M0_LEAVE("session %p state=%d", session, session_state(session));
+		m0_rpc_machine_unlock(session->s_conn->c_rpc_machine);
 		return;
 	}
-	m0_rpc_machine_lock(session->s_conn->c_rpc_machine);
 	if (session->s_cancelled)
 		goto leave_unlock;
 	session->s_cancelled = true;
@@ -871,11 +872,6 @@ M0_INTERNAL void m0_rpc_session_cancel(struct m0_rpc_session *session)
 leave_unlock:
 	m0_rpc_machine_unlock(session->s_conn->c_rpc_machine);
 	M0_POST(pending_item_tlist_is_empty(&session->s_pending_cache));
-	if (!M0_IN(session_state(session),
-		   (M0_RPC_SESSION_BUSY, M0_RPC_SESSION_IDLE))) {
-		M0_LEAVE("session is already finalised %p", session);
-		return;
-	}
 	M0_LEAVE("session %p", session);
 }
