Issue #2411: Workaround: Send Bye when received DownloaderResponse in wrong state #2438

Merged: 2 commits, Aug 22, 2017

abdulazizali77 (Contributor) commented Aug 21, 2017

@deruelle @FerUy @gvagenas
The current commit is admittedly a workaround. It doesn't completely fix the race condition, but it at least tries to cover one failure in UssdInterpreter.
There's some major design and cleanup work to be done in UssdCallManager, UssdInterpreter, UssdCall and Downloader, as I see it.
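
To make the idea of the workaround concrete, here is a minimal sketch of "send BYE when a downloader response arrives in the wrong state". It is not the code in this PR: the class `WrongStateGuardSketch`, its `State` and `Action` enums, and the method `onDownloaderResponse` are hypothetical names chosen for illustration, and the set of states is heavily simplified.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch: none of these names are Restcomm-Connect classes.
final class WrongStateGuardSketch {

    // Simplified interpreter states (assumed for illustration only).
    enum State { DOWNLOADING_RCML, PROCESSING_INFO_REQUEST, DISCONNECTING, FINISHED }

    // What the interpreter decides to do with an incoming downloader response.
    enum Action { PARSE_RCML, SEND_BYE, IGNORE }

    // States in which a downloader response is actually expected.
    private static final Set<State> EXPECTS_RESPONSE = EnumSet.of(State.DOWNLOADING_RCML);

    private State state;

    WrongStateGuardSketch(State initial) {
        this.state = initial;
    }

    State state() {
        return state;
    }

    Action onDownloaderResponse() {
        if (EXPECTS_RESPONSE.contains(state)) {
            return Action.PARSE_RCML; // normal path
        }
        if (state == State.DISCONNECTING || state == State.FINISHED) {
            return Action.IGNORE; // the dialog is already being torn down
        }
        // Late response in an unexpected state: terminate the dialog
        // instead of failing and leaving it open.
        state = State.DISCONNECTING;
        return Action.SEND_BYE;
    }

    public static void main(String[] args) {
        WrongStateGuardSketch late =
                new WrongStateGuardSketch(State.PROCESSING_INFO_REQUEST);
        System.out.println(late.onDownloaderResponse() + " -> " + late.state());
    }
}
```

The guard only covers the one failure path; it does not remove the underlying race between the timers and the downloader.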

FerUy commented Aug 21, 2017

Thanks for the effort @abdulazizali77

FerUy commented Aug 21, 2017

Hi @abdulazizali77, I tested your patch with the provided restcomm-connect.ussd-8.2.0-SNAPSHOT.jar and the results are unchanged from the previous tests.

First I repeated the same scenario as in the previous test, i.e.:
RVD ES Timeout = 10 secs
USSD Gateway dialogtimeout = 5 secs
rvd.xml externalServiceTimeout = not set
External Service sleep time = 20 secs

The trace shows exactly the same behaviour. As can be seen in the attached Wireshark trace, the USSD Gateway sends a SIP BYE once the dialogtimeout value (5 seconds) is exceeded, and RC does not respond to it. The ES module sends an HTTP 200 OK at 5 seconds, as if it had reached its default externalServiceTimeout (it was set much higher in ES), and Restcomm-Connect therefore sends a SIP BYE to the USSD Gateway too, which the USSD Gateway answers with a SIP 200 OK. At that point, the initial SIP BYE sent by the USSD Gateway has still not been answered by Restcomm-Connect. Oddly, the USSD Gateway then resends that SIP BYE to Restcomm-Connect, which now answers it, logically, with a SIP 487 Call leg/Transaction does not exist. So, although this eventually works in the sense that the SIP dialog gets terminated, it is terminated improperly (and not according to the configured ES timeout, which relates more to issue #2410), and nothing has changed from the previous call flow.

pr2411_ESTimeout10s_externalServiceTimeoutNOTSET_USSDdialogTimeou5s_sleep20.pcap.pcapng.zip

Then, I tested this scenario:
RVD ES Timeout = 10 secs
USSD Gateway dialogtimeout = 25 secs
rvd.xml externalServiceTimeout = 6 secs
External Service sleep time = 20 secs
This is where everything goes terribly wrong. Although RC times out at 5 seconds, no SIP BYE is sent back to the USSD Gateway. The USSD Gateway times out according to its configured dialog timeout (25 seconds after it sent the last SIP INFO) and starts sending SIP BYE, which RC never answers. As you can see in the trace, the USSD Gw keeps resending the SIP BYE until it considers RC "dead" and stops. On the USSD Gw side, both the MAP and SIP dialogs are terminated, but on RC the SIP dialog is kept forever. This is the worst-case scenario, and it is still not working correctly.

pr2411_ESTimeout10s_externalServiceTimeout6s_USSDdialogTimeou25s_sleep20.pcap.pcapng.zip
server.log.zip

FerUy commented Aug 21, 2017

See the following error in server.log for the problem described above:

15:22:11,157 ERROR [org.restcomm.connect.ussd.interpreter.UssdInterpreter] (RestComm-akka.actor.default-dispatcher-5) No transition could be found from a(n) Processing info request from client state to a(n) Disconnecting state.: org.restcomm.connect.commons.fsm.TransitionNotFoundException: No transition could be found from a(n) Processing info request from client state to a(n) Disconnecting state.
	at org.restcomm.connect.commons.fsm.FiniteStateMachine.transition(FiniteStateMachine.java:60) [restcomm-connect.commons-8.2.0.1278.jar:8.2.0.1278]
	at org.restcomm.connect.ussd.interpreter.UssdInterpreter.onReceive(UssdInterpreter.java:482) [restcomm-connect.ussd-8.2.0-SNAPSHOT.jar:8.2.0-SNAPSHOT]
	at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:159) [akka-actor_2.10-2.1.2.jar:]
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:425) [akka-actor_2.10-2.1.2.jar:]
	at akka.actor.ActorCell.invoke(ActorCell.scala:386) [akka-actor_2.10-2.1.2.jar:]
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:230) [akka-actor_2.10-2.1.2.jar:]
	at akka.dispatch.Mailbox.run(Mailbox.scala:212) [akka-actor_2.10-2.1.2.jar:]
	at akka.dispatch.ForkJoinExecutorConfigurator$MailboxExecutionTask.exec(AbstractDispatcher.scala:506) [akka-actor_2.10-2.1.2.jar:]
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:262) [scala-library-2.10.1.jar:]
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:975) [scala-library-2.10.1.jar:]
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1478) [scala-library-2.10.1.jar:]
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104) [scala-library-2.10.1.jar:]

15:22:11,163 ERROR [org.restcomm.connect.commons.faulttolerance.RestcommSupervisorStrategy] (RestComm-akka.actor.default-dispatcher-12) RestcommSupervisorStrategy, actor exception handling. Actor path akka://RestComm/user/$g/$a, exception cause org.restcomm.connect.commons.fsm.TransitionNotFoundException: No transition could be found from a(n) Processing info request from client state to a(n) Disconnecting state., default exception handling strategy Resume
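
For readers unfamiliar with this kind of error, the sketch below shows how a table-driven state machine ends up rejecting a transition that was never registered. It is an illustration only and is not the `org.restcomm.connect.commons.fsm.FiniteStateMachine` implementation: `TransitionTableSketch`, `allow` and `transition` are hypothetical names, states are plain strings, and a standard `IllegalStateException` stands in for `TransitionNotFoundException`.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a table-driven state machine; not Restcomm code.
final class TransitionTableSketch {

    // Maps each source state to the set of target states it may move to.
    private final Map<String, Set<String>> allowed = new HashMap<>();
    private String current;

    TransitionTableSketch(String initial) {
        this.current = initial;
    }

    // Registers "from -> to" as a legal transition.
    void allow(String from, String to) {
        allowed.computeIfAbsent(from, k -> new HashSet<>()).add(to);
    }

    // Moves to the target state, or fails if the pair was never registered.
    void transition(String to) {
        Set<String> targets = allowed.get(current);
        if (targets == null || !targets.contains(to)) {
            throw new IllegalStateException("No transition could be found from a(n) "
                    + current + " state to a(n) " + to + " state.");
        }
        current = to;
    }

    String current() {
        return current;
    }

    public static void main(String[] args) {
        TransitionTableSketch fsm =
                new TransitionTableSketch("Processing info request from client");
        try {
            fsm.transition("Disconnecting"); // not registered, so this throws
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        // After a failed transition the machine is still in its old state.
        System.out.println("still in: " + fsm.current());
    }
}
```

In the sketch, a rejected transition leaves the machine in its previous state. The second log line above reports the `Resume` strategy, under which an Akka actor likewise keeps its internal state and carries on with the next message.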

FerUy commented Aug 21, 2017

Hi @abdulazizali77, as mentioned in Slack, I now see it working as expected. I ran exactly the same tests as described above and everything looks good to me. The traces and RC server logs are attached below.
patch2_issue2411.zip

Moreover, the CDRs look as expected in the following image taken from the RC GUI: both have canceled status (the first due to the external service timeout, the second due to the USSD Gw timeout).

tests_cdrs

In summary, for me it's a go.
