Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTP request hangs with return_route header #1372

Closed
smithbk opened this issue Aug 23, 2021 · 14 comments · Fixed by #1853
Closed

HTTP request hangs with return_route header #1372

smithbk opened this issue Aug 23, 2021 · 14 comments · Fixed by #1853
Assignees

Comments

@smithbk
Copy link

smithbk commented Aug 23, 2021

In testing the issuance flow (AIP v1) where ACAPy is the issuer and the IBM agent is the holder, the HTTP request which the holder makes to ACAPy in order to deliver the final ACK hangs. In other words, when the IBM agent sends the HTTP reque3st containing the ACK to complete the issuance flow, the ACAPy agent never sends an HTTP response. However, when I change my IBM agent to NOT send the return_route header in the Aries ACK message, then ACAPy immediately sends the HTTP response and all is good.

@swcurran
Copy link
Contributor

@ianco can you look at this one? Or at least given an assessment of how quickly a fix is needed?

Thanks!

@TimoGlastra
Copy link
Contributor

Wouldn't this be somehow expected?

From the return route RFC:

For HTTP transports, the presence of this message decorator indicates that the receiving agent MAY hold onto the connection and use it to return messages as designated.

Because there is no message to be returned, the connection is kept alive (which is a desirable feature when you want to do e.g. long polling). Another way to solve would be to close the HTTP connection on the IBM holder side, or by not including the return routing when no response is expected.

This will have impact on how mobile agents interact with ACA-Py

@smithbk
Copy link
Author

smithbk commented Aug 24, 2021

I think only messagepickup/1.0/noop should be long polled.

The reason I say this is because otherwise it puts more of a burden on the sender to know when a return_route should be added or not. In some cases, the sender may want to send a message which may or may not terminate the thread. Only the receiver knows for certain if the thread is complete or if/when it will send another message on the thread. Also, I do not want to async send messages because if the message fails to be delivered via HTTP, I want in some cases to return that to the admin controller.

@ianco
Copy link
Contributor

ianco commented Sep 2, 2021

acapy-receive-inbound

Not sure how legible this is, but it's the basic aca-py flow for receiving and processing a message

To close an inbound return route I suggest one of 2 options:

  • add a method to the Responder class to close a return route (i.e. return 200 status) (this would have to be called from the message-specific handler)
  • add something in the Conductor's dispatch_complete method to close the return route if necessary (this could be called automatically)

I'm not sure of the implications on the Mediator, looking for some input from @TimoGlastra

@swcurran
Copy link
Contributor

swcurran commented Sep 2, 2021

For fun I made a plantuml version of the drawing above — @ianco please review:

acapy-message-handler

I’m thinking this should go in the ACA-Py docs?

@startuml

participant "Inbound\nMessage\nHandler" as oag
participant "http\nTransport" as ht
participant "Internal\nTransport\nManager" as itm
participant "Inbound\nSession" as is
participant "Message\nHandler" as mh
participant "Conductor" as con
participant "Dispatcher" as disp
participant "Responder" as resp


oag -> ht: "inbound_message_handler()"
ht->itm: "create_session()"
itm -> is: "create"
is --> itm
itm --> ht
ht --> is: "receive()"
is --> is: "parse_inbound()"
is --> is: "receive_inbound()"
is --> is: "process_inbound()"
is --> is: "inbound_inbound()"
is --> con: "inbound_msg_router()"
con --> disp: "queue_msg()"
disp --> disp: "handle_msg()"
disp --> disp: "make_msg()"
disp --> resp: "create()"
disp --> mh: "handle"
mh-->resp: "send_reply()"
mh --> disp: ""
disp --> con: ""
con --> con: "dispatch_complete()"
con --> is
is --> ht


@enduml

@ianco
Copy link
Contributor

ianco commented Sep 2, 2021

A few minor tweaks. Yes I think this should go in aca-py docs.

@startuml

participant "Inbound\nMessage\nHandler" as oag
participant "http\nTransport" as ht
participant "Internal\nTransport\nManager" as itm
participant "Inbound\nSession" as is
participant "Conductor" as con
participant "Dispatcher" as disp
participant "Responder" as resp
participant "Message\nProtocol\nHandler" as mh


oag -> ht: "inbound_message_handler()"
ht->itm: "create_session()"
itm -> is: "create"
is --> itm
itm --> ht
ht --> is: "receive()"
is --> is: "parse_inbound()"
is --> is: "receive_inbound()"
is --> is: "process_inbound()"
is --> is: "inbound_handler()"
is --> con: "inbound_message_router()"
con --> disp: "queue_message()"
disp --> disp: "handle_message()"
disp --> disp: "make_message()"
disp --> resp: "create()"
disp --> mh: "handle()"
mh-->resp: "send_reply()"
mh --> disp: ""
disp --> con: ""
con --> con: "dispatch_complete()"
con --> is
is --> ht


@enduml

@ianco
Copy link
Contributor

ianco commented Sep 2, 2021

VPF1JiCm38RlUOgg9pZi1JXCa-001oGqE4wgGcej4OqJ4XU9jyUfiwwTDcwD_w-__JljBfDEDFf1gSWFphYeaPhs1J_3YFq1No5dVOG3Fcli1qXjetCJz76DMw9um8-aCSUGQ88iNLeI9DJ3WcDxZUmkhexyZI7UUGWNKAYyXXu3zgEX81ucu5f-SZbgChRswFcu1lR0es1VzGIN-ho0jnGec32iL-6jKehtqMosdFYXQPqC

@ianco
Copy link
Contributor

ianco commented Sep 9, 2021

OK from a more detailed review of the code, it looks like aca-py should already send an HTTP 200 response even if there is no explicit response message.

This is the method that handles an HTTP request from another agent inbound_message_handler():

https://github.com/hyperledger/aries-cloudagent-python/blob/main/aries_cloudagent/transport/inbound/http.py#L71

You can see that there is logic to return a response message if one is available, however the "fall-through" logic should always return an HTTP 200 response regardless:

return web.Response(status=200)

@ianco
Copy link
Contributor

ianco commented Sep 9, 2021

... unless there is some other error that is crashing aca-py and preventing the response.

@smithbk can you send the aca-py logs for this scenario? --log-level debug

@TimoGlastra
Copy link
Contributor

Just ran into this issue myself, so adding some info.

@ianco it will never reach

return web.Response(status=200)

because a few lines above it waits for a message:

https://github.com/hyperledger/aries-cloudagent-python/blob/31659902475de16118d8322f3cbc9c74f0b85ef5/aries_cloudagent/transport/inbound/http.py#L99

This could be useful for long polling, but it also means the http request won't finish until a message should be delivered to the other agent. Currently in AFJ we have a timeout that waits x seconds before aborting the request, which leaves control to the client agent instead of the server agent.

But if we want to change this behaviour on the ACA-Py side we should change the behaviour of that wait line (maybe adding a timeout before it returns, or directly returning and allowing web socket connections to keep the socket open)

@TimoGlastra
Copy link
Contributor

Just ran into this issue again and after giving it some more thought, I now 100% agree with @smithbk. ACA-Py shouldn't keep the connection alive as this makes it impossible to know for the mobile client whether the request was successful.

@swcurran
Copy link
Contributor

@ianco -- back to you, since you looked at this long ago... :-)

@swcurran
Copy link
Contributor

swcurran commented Jun 3, 2022

@ianco -- Can you please look at this one again? I'm wondering how this would manifest in mobile scenarios. Could this cause delays in under some conditions -- e.g. intermittent delays, such as we are seeing at some times?

@TimoGlastra
Copy link
Contributor

TimoGlastra commented Jul 4, 2022

Made a PR to fix this. @ianco could you take a look at the PR? Mabye @smithbk you could verify whether this PR fixes your issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants