Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SDL crashes in EventDispatcherImpl. #2709

Closed
Nilesh-harman opened this issue Oct 22, 2018 · 9 comments
Closed

SDL crashes in EventDispatcherImpl. #2709

Nilesh-harman opened this issue Oct 22, 2018 · 9 comments
Assignees

Comments

@Nilesh-harman
Copy link

Nilesh-harman commented Oct 22, 2018

Hi,
We are observing a crash in SDL. Version used is SDL 4.2

Crash Scenario:

  1. SetMediaClockTimer request has come from Mobile App1.
  2. SetMediaClockTimer has been sent to HMI and timer is started for timeout .
  3. Timer Expired and Request is removed from RequestController list.
  4. Response has come from HMI. Now EventDispatcherImpl will try to access an object , which is under deletion process and lead to crash ..

Our analysis:

ApplicationManagerImpl create shared object of Mobile RPC Commands and give object to RequestController to Add in request Set. Hence If RequestController
remove the object from list then command object will get deleted from memory .
Each Mobile RPC Command inherit CommandRequestImpl , which inherit EventObserver . Some of the RPC's class subscribe events using subscribe_on_event of EventObserver.
EventObserver subscribe for these events with EventDispatcherImpl which internally keep RAW pointer of Mobile RPC command.

Now here we are maintaing memory using RAII (Shared Pointer) and on another place keeping raw pointer of it , which is leading to crash in various scenarios.
Example :
Alert(alert_request.cc)
SetMediaClockTimer
PerformInteraction etc ..

Backtrace:

Above Analysis is based on below gdb BackTrace of all threads, thread1 is accessing Object at the same time thread2 is deleting:

Thread 1:
##########################################################################################################################
#6  0x762782dc in application_manager::event_engine::EventDispatcherImpl::raise_event (this=0x1399e48, event=...)
    at /data/work/IConnectivit/Primes_P010/modules/platform/imp/conn/AppLink/pf/src/private/src/components/application_manager/src/event_engine/event_dispatcher_impl.cc:75
               
               
#7  0x76277694 in application_manager::event_engine::Event::raise (this=this@entry=0x601fe528, event_dispatcher=...)
    at /data/work/IConnectivit/Primes_P010/modules/platform/imp/conn/AppLink/pf/src/private/src/components/application_manager/src/event_engine/event.cc:44
               
#8  0x76489b64 in application_manager::commands::UISetMediaClockTimerResponse::Run (this=0x15f7970)
    at /data/work/IConnectivit/Primes_P010/modules/platform/imp/conn/AppLink/pf/src/private/src/components/application_manager/src/commands/hmi/ui_set_media_clock_timer_response.cc:51

#9  0x76e09fc0 in application_manager::ApplicationManagerImpl::ManageHMICommand (this=0x1399cc8, message=...)
    at /data/work/IConnectivit/Primes_P010/modules/platform/imp/conn/AppLink/pf/src/private/src/components/application_manager/src/application_manager_impl.cc:1731
               
#10 0x76e1da70 in application_manager::ApplicationManagerImpl::ProcessMessageFromHMI (this=this@entry=0x1399cc8, message=...)
    at /data/work/IConnectivit/Primes_P010/modules/platform/imp/conn/AppLink/pf/src/private/src/components/application_manager/src/application_manager_impl.cc:2172
               
#11 0x76e1de40 in application_manager::ApplicationManagerImpl::Handle (this=0x1399cc8, message=...)
##########################################################################################################################

Thread 2:
##########################################################################################################################
#6  0x76278f18 in application_manager::event_engine::EventObserver::unsubscribe_from_all_events (this=this@entry=0x5010e2c4)
    at /data/work/IConnectivit/Primes_P010/modules/platform/imp/conn/AppLink/pf/src/private/src/components/application_manager/src/event_engine/event_observer.cc:60
#7  0x76278f40 in application_manager::event_engine::EventObserver::~EventObserver (this=0x5010e2c4, __in_chrg=<optimized out>)
    at /data/work/IConnectivit/Primes_P010/modules/platform/imp/conn/AppLink/pf/src/private/src/components/application_manager/src/event_engine/event_observer.cc:47

#22 application_manager::request_controller::RequestController::TerminateRequest (this=this@entry=0x1399fa0, correlation_id=1994002700, correlation_id@entry=89, connection_key=19,
    connection_key@entry=65547, function_id=89, function_id@entry=15, force_terminate=force_terminate@entry=false)
    at /data/work/IConnectivit/Primes_P010/modules/platform/imp/conn/AppLink/pf/src/private/src/components/application_manager/src/request_controller.cc:253
#23 0x76da1370 in application_manager::request_controller::RequestController::OnMobileResponse (this=this@entry=0x1399fa0, mobile_correlation_id=mobile_correlation_id@entry=89,
    connection_key=connection_key@entry=65547, function_id=15)
    at /data/work/IConnectivit/Primes_P010/modules/platform/imp/conn/AppLink/pf/src/private/src/components/application_manager/src/request_controller.cc:274
#24 0x76e1f7ac in application_manager::ApplicationManagerImpl::SendMessageToMobile (this=0x1399cc8, message=..., final_message=<optimized out>)
---Type <return> to continue, or q <return> to quit---
    at /data/work/IConnectivit/Primes_P010/modules/platform/imp/conn/AppLink/pf/src/private/src/components/application_manager/src/application_manager_impl.cc:1426
#25 0x76342164 in application_manager::commands::SetMediaClockTimerResponse::Run (this=0x160a268)
    at /data/work/IConnectivit/Primes_P010/modules/platform/imp/conn/AppLink/pf/src/private/src/components/application_manager/src/commands/mobile/set_media_clock_timer_response.cc:49
#26 0x76e0e76c in application_manager::ApplicationManagerImpl::ManageMobileCommand (this=0x1399cc8, message=..., origin=<optimized out>)
    at /data/work/IConnectivit/Primes_P010/modules/platform/imp/conn/AppLink/pf/src/private/src/components/application_manager/src/application_manager_impl.cc:1556
######################################################################################################################################

Note:

Same root cause Issue was reported earlier also and it was solved by blocking duplicate request from mobile:
https://github.com/smartdevicelink/sdl_core/pull/2101/commits

Thanks.

@jacobkeeler
Copy link
Contributor

As you stated in the issue, this was fixed with #2101. If you need it in SDL Core 4.2.0, you will need to apply the patch yourself, as that version is no longer supported.

@Nilesh-harman
Copy link
Author

Hi @jacobkeeler

Scenario mentioned in #2101 is different from the scenario mentioned in current issue . we have mentioned one of the scenario above in issue description . We have already applied the patch #2101 , which has solved Duplicate correlation ID crash issue but we are still getting crash in other scenarios. Could you please provide your support ??

Thanks
Nilesh

@AStasiuk
Copy link

Reopened with agreement with @lauratonwe.
We got an additional more information from issue creator:

  1. Whenever there is a RPC command from mobile, a new (corresponding HMI) RPC will be formed in SDL and same will be sent to Applink layer.
  2. There will be a map in RequestController which maintains these - mobile RPC pointer and HMI RPC pointer - based on correlation ID.

In issue #2101,

  1. There are 2 mobile RPC commands with same correlation ID. So, the 2nd pointer will not get added to the map as there is an entry already in the map with same correlation ID. But, the HMI RPC pointer is communicated to EventDispatcher and corresponding RPC is communicated to Applink layer.
  2. When the addition to the map fails, its corresponding HMI RPC pointer will go out of scope and it gets deleted.
  3. When response comes from Applink layer for the 2nd HMI RPC, EventDispatcher will use the 2nd HMI RPC pointer which got deleted in step Contributing guidelines file is missing #4.
  4. This was causing the crash which got resolved in issue Fix EventDispatcher crash by rejecting duplicate correlation_ids #2101.

In current issue #2709,

  1. HMI RPC is communicated to Applink layer and HMI RPC pointer is stored in EventDispatcher and map at RequestController has the - mobile RPC pointer and HMI RPC pointer.
  2. When the timeout happens at RequestController, it will go ahead and delete the HMI RPC pointer, but EventDispatcher is still having the HMI RPC pointer.
  3. Whenever there is a response from Applink layer for the HMI RPC pointer deleted at step Loss of communication on Ubuntu 14.04.1 #8, then EventDispatcher will raise event on the HMI RPC pointer which got deleted in step Loss of communication on Ubuntu 14.04.1 #8.
  4. This will cause crash.

@AStasiuk AStasiuk reopened this Oct 30, 2018
@LitvinenkoIra LitvinenkoIra self-assigned this Oct 31, 2018
@LitvinenkoIra
Copy link
Contributor

LitvinenkoIra commented Oct 31, 2018

Hi @Nilesh-harman

Please take a look at the probable fix for this issue
29c586c
this fix is available on sdl_core master branch, and release Open SDL 5.0
Please let us know if it works for you.
For further investigation we need SDL logs, and SDL core file, so it would be nice if you could provide it.

@Nilesh-harman
Copy link
Author

Hi @LitvinenkoIra ,

Still I feel that the issue can happen with the fix 29c586c.

Reason:

void EventDispatcherImpl::raise_event(const Event& event) { …
AutoUnlock unlock_observer(observer_lock);

Lets say that the context switch happens and it executes the function EventDispatcherImpl::remove_observer_from_vector(EventObserver& observer)
then problem can still happen while raising on_event temp->on_event(event);

================================================

Thanks.

@LitvinenkoIra
Copy link
Contributor

LitvinenkoIra commented Nov 5, 2018

@Nilesh-harman
In your reason: If it executes the function EventDispatcherImpl::remove_observer_from_vector(EventObserver& observer) then we not able to execute EventDispatcherImpl::raise_event(const Event& event) function at the same time because of observer_lock_.
observer_lock_ will be unlocked before calling temp->on_event(event); but even if we go to remove_observer_from_vector function at this moment and delete observer from the observers vector - the observer will still alive and it will be deleted after execute the command in RequestController.

Let me explain how I understand the flow after 29c586c:
When the timeout happens at RequestController, it will go ahead and delete the mob command. Command is inherited from EventObserver, so in unsubscribe_from_all_events() function will be called in destructor. unsubscribe_from_all_events() -> remove_observer(..) -> remove_observer_from_vector(..) all this functions are thread safe.
So at the moment when we go to raise_event(..) function in EventDispatcherImpl : observer_lock_ will wait until the remove_observer_from_vector function finishes, so
observers for the corresponding event.id will be empty and we will not call temp->on_event(event);
And there are no reasons for the crash

@LitvinenkoIra
Copy link
Contributor

I think that the root cause of your crash was in while (!observers_.empty())
flow before 29c586c :
we go to EventDispatcherImpl::raise_event(const Event& event) function
call !observers_.empty() and it's returns true, so we will executetemp->on_event(event);
but at this moment when we call !observers_.empty() in EventDispatcherImpl::remove_observer_from_vector(EventObserver& observer) we remove this observer from vector and as result - core crash in temp = *observers_.begin();

@LitvinenkoIra
Copy link
Contributor

LitvinenkoIra commented Nov 5, 2018

@Nilesh-harman also could you please specify what branch or tag do you use.
Because I see

#6  0x762782dc in application_manager::event_engine::EventDispatcherImpl::raise_event (this=0x1399e48, event=...)
    at /data/work/IConnectivit/Primes_P010/modules/platform/imp/conn/AppLink/pf/src/private/src/components/application_manager/src/event_engine/event_dispatcher_impl.cc:75

in backtrace, but in all 4.2.* tags raise_event function ends before line 75
https://github.com/smartdevicelink/sdl_core/blob/4.2.0/src/components/application_manager/src/event_engine/event_dispatcher_impl.cc#L71

Thanks

@jacobkeeler
Copy link
Contributor

Closing due to no response. @Nilesh-harman feel free to reopen if you are still experiencing the issue on the latest release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants