Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to get person credential #2226

Open
jleach opened this issue Oct 18, 2024 · 14 comments
Open

Unable to get person credential #2226

jleach opened this issue Oct 18, 2024 · 14 comments
Assignees
Labels
bug Something isn't working

Comments

@jleach
Copy link
Member

jleach commented Oct 18, 2024

User reported unable to get person credential. Issue seems to be consistent for him but intermittent for others. On backgrounding / foregrounding the app message trickle in one at a time.

There may be two different issues:

  1. Some times the wallet hangs and no messages (basic or offer) are received. Restarting the wallet fixes this and both messages arrive.
  2. Some times the offer is received but the wallet does not automatically accept the offer.

1. No Messages

  • Was able to reproduce on the Indecio public mediator and BC mediator.
  • Both messages (basic message and offer) were in notifications when I closed the "please wait..." screen.
  • Connection issues to IAS. The wallet hangs at "Connected to IAS agent, connectionId: ....". Could be a mediator issue as I've never seen this when testing with ACA-py before.
  • Tried mediator.dev.animo.id and that worked no better.
  • The BC mediator seems the most stable and works the most reliably out of the other two tested.

2. Hang

  1. This issue can be reproduced by restarting the app and get a person credential.
  2. Then I delete it, and repeat the process. This time I pause on a BCSC screen until you see the following three lines appear in the mediator logs (1.0.20 may only print one line so repetition may be irrelevant):
2024-10-18 21:10:18,830 aries_cloudagent.transport.inbound.ws ERROR Unexpected Websocket message type received: WSMsgType.CLOSED: None, None
2024-10-18 21:10:18,831 aries_cloudagent.transport.inbound.ws ERROR Unexpected Websocket message type received: WSMsgType.CLOSED: None, None
2024-10-18 21:10:18,831 aries_cloudagent.transport.inbound.ws ERROR Unexpected Websocket message type received: WSMsgType.CLOSED: None, None
  1. Complete the process.

You will see this in the mediator logs, at which point the BC Wallet app will be stalled at "Just a moment....". For both 1.0.20 and 1.0.21 it repeats 3x.

2024-10-18 21:10:25,945 aries_cloudagent.core.conductor WARNING Cannot queue message for delivery, no supported transport
2024-10-18 21:10:26,166 aries_cloudagent.core.conductor WARNING Cannot queue message for delivery, no supported transport
2024-10-18 21:10:26,529 aries_cloudagent.core.conductor WARNING Cannot queue message for delivery, no supported transport
  • This happens with or without pausing mediator message pickup;
  • Android 14
  • 1.0.21 latest
  • 1.0.20 has the same issue. It only prints the "ERROR Unexpected Websocket message type received:" once so repeating it 3x may be irrelevant.
@jleach jleach self-assigned this Oct 18, 2024
@jleach jleach added the bug Something isn't working label Oct 18, 2024
@cvarjao
Copy link
Member

cvarjao commented Oct 22, 2024

While we can do work on the server/mediator side to increase stability/availability, the wallet should not assume that a websocket connection is long lived. The connection can be interrupted at anytime by any reason, but the wallet should be resilient and try to re-connect.

@cvarjao cvarjao added this to the 1.0.21 milestone Oct 24, 2024
@jleach jleach self-assigned this Oct 30, 2024
@jleach
Copy link
Member Author

jleach commented Nov 8, 2024

Created some test mechanics here. Unable to reproduce outside of the mobile environment (on a mobile device).

@jleach
Copy link
Member Author

jleach commented Nov 8, 2024

Current Findings:

  • This always works the first time one gets a Person credential;
  • It mostly fails every time after the first success (following a fresh install).
  • I can reproduce this issue on an ACA-py and the non-ACA-py cloud mediator from Indicio. I confirmed with them it does not use ACA-py. This likely means it is a issue with Credo-ts.
  • I can reproduce on the latest 1.0.0 release of ACA-py.
  • I am unable to reproduce this using a test bench because I cannot get the WebSocket to close with the same error code (257). Rather, closing it manually results in a normal exit code (8). The only way to get the close with code 257 is when the Jest test exits. This may be why it cannot be reproduced outside of mobile.
  • This message, which can be seen at times in the mobile wallet debug Message is undeliverable to connection b535c73e-3706-42b7-98d9-aa919d2f39b6 (BC Wallet Service (Dev)) is caused by starting message pickup when the WebSocket is closed. Perhaps but good to know.
  • I was able to reproduce this using Traction and the mobile wallet. I don't think this is an issue with the WebSocket not recovering from being closed because rather a problem with Credo-ts not picking up message, and when it does not picking all of them up. If they are picked up they're not being delivered to the event system so they perculate up the the UI.

Next Steps:

  • Add debug messages to wherever Credo-ts checks for new messages to figure out why its not getting all messages, or if it is, why its not passing them to the event system.

@jleach
Copy link
Member Author

jleach commented Nov 8, 2024

There is no real substance to implicit message pickup. It is just a matter of having a long lived connection (WebSocket) and sending a trust ping. Assuming then the mediator is smart enough to figure out it has N messages queued up and it will send them all, in the proper order, to the client (wallet):

From MediationRecipientApi.ts L302

// Implicit means sending ping once and keeping connection open. This requires a long-lived transport
// such as WebSockets to work

Looking through other code there are only some mechanics to deal with opening / closing the websocket. Nothing specific to starting pickup or anything else related to message pickup.

Either all messages are not being sent or something is getting jammed up with rxjs event processing.

Also noticed this error is ok, it results from Credo not being able to find the previously deleted credential. May not be a problem but the exception may be jamming up the get person logic.

 LOG  WARN :
{
  "message": "Failed to process revocation notification message",
  "data": {}
}

Next Steps:

  • I'll try restarting the wallet with no mediator connection. If messages show up then perhaps Credo already has them queued locally and the problem is indeed with rxjs. If they do not show up, then perhaps the mediator is not delivering them all. Not sure how the cloud mediator would be impacted by this.
  • Retest to confirm this issue persists with the cloud mediator.
  • Test without deleting the person credential to see if the above mentioned exception is causing any problems

@jleach
Copy link
Member Author

jleach commented Nov 8, 2024

  • Confirmed, problem exists with cloud mediator. Two message are somewhere in the ether, backgrounding and foregrounding the app results in one message delivered. This needs to be done 2x for both messages to be delivered.
  • No change if message pickup is paused or not.
 LOG  DEBUG :
{
  "message": "Successfully connected to WebSocket wss://ws.us-east.public.mediator.indiciotech.io/ws"
}

@jleach
Copy link
Member Author

jleach commented Nov 12, 2024

Important You need an existing un-revoked Person credential in the wallet to trigger this issue. The reason is, when the app comes into the foreground a trust ping is set which will trigger a message to be delivered. The first message is usually the revocation message which does nothing to advance the workflow. If you don't have a Person credential, the first credential is the offer which allows the workflow to work as expected.

@jleach
Copy link
Member Author

jleach commented Nov 12, 2024

In an Implicit connection a trust ping is sent to start the pickup strategy. When a trust ping is sent it triggers the mediator to deliver one message.

This issue can be kludged by wrapping the trust ping logic when starting mediation in a setInterval(async () => { ... }, timeoutInMs). This causes the trust ping to be sent every timeoutInMs which will in tern cause the mediator to deliver exactly 1 message until it's queue is drained.

@jleach
Copy link
Member Author

jleach commented Nov 13, 2024

TLDR; The "Implicit" message pickup strategy is working as expected which is why this issue can be reproduced across different technologies. There are limitations of the implicit pickup mechanism and Pickup v2 is recommend or implementing similar solutions to ensure reliable message delivery in ACA-Py agent systems.

Problems with Implicit Pickup:

  • Message loss: It can lead to message loss due to a lack of acknowledgment from the recipient, causing the mediator to assume successful delivery even if it failed.
  • Unreliable delivery: The recipient has no way of knowing how many messages are queued or when all messages have been received.
  • Outdated: It's an old and poorly designed mechanism with better alternatives available (Pickup v2).

Alternatives and Solutions:

  • Pickup v2: A more robust protocol with explicit acknowledgments and status messages, addressing the shortcomings of implicit pickup.
  • Indicio's plugin: Indicio developed an ACA-Py plugin for Pickup v2, but it might need updates to work with the latest ACA-Py versions. The code is available on GitHub (link provided in the chat).
  • Persistence branch: A branch in the Indicio repository explores persisting queued messages in Redis for clustered mediators and fault tolerance, but it's not fully completed.

@jleach
Copy link
Member Author

jleach commented Nov 13, 2024

I tried other message pickup strategies, both seem to produce similar results as Implicit. Credo does not seem to want to go into any other mode by default/

mediatorPickupStrategy: MediatorPickupStrategy.PickUpV2,
maximumMessagePickup: 10,

This triggers polling, but this error message is repated in the mediator logs:

mediator-1  | 2024-11-13 18:59:32,569 aries_cloudagent.core.dispatcher ERROR Message parsing failed: 
Unrecognized message type https://didcomm.org/messagepickup/2.0/status-request, sending problem report

and

mediatorPickupStrategy: MediatorPickupStrategy.PickUpV2LiveMode,
maximumMessagePickup: 10,

If I force it to use PickUpV2LiveMode I see the error in the wallet logs:

 LOG  WARN :
{
  "message": "Error initiating message pickup with mediator 841b4ac2-e769-4758-b5db-45b8b2df1a05",
  "data": {}
}
 LOG  ERROR :
{
  "message": "Error resuming agent message pickup, TimeoutError: Timeout has occurred"
}

And the same error in the mediator logs:

mediator-1  | 2024-11-13 19:02:43,858 aries_cloudagent.core.dispatcher ERROR Message parsing failed:
Unrecognized message type https://didcomm.org/messagepickup/2.0/status-request, sending problem report

@jleach
Copy link
Member Author

jleach commented Nov 13, 2024

The status request works with the Indecio cloud mediator:

{
  "message": "Received message with type 'https://didcomm.org/messagepickup/2.0/status', recipient key z6MkgbMs9wXEhWi6UUevkJ6NSfcykotYd71AN31V1VCpjnch and sender key z6MkgSYBM63iHNeiT2VSQu7bbtXhGYCQrPJ8uEGurbfGbbgE",
  "data": {}
}

I've confirmed that MediatorPickupStrategy.Implicit on the Indecio cloud mediator reproduces the same behaviour as ACA-py. However, MediatorPickupStrategy.PickUpV2LiveMode works as expected without polling. The strategy MediatorPickupStrategy.PickUpV2 works as expected but with polling.

NOTE: There is probably an issue in Credo where it always starts / tries implicit pickup first, but message retrieval can easily be restarted in PickUpV2LiveMode.

@jeznorth
Copy link

HA Mediator work will block this issue from completion.

@bryce-mcmath bryce-mcmath self-assigned this Jan 28, 2025
@bryce-mcmath
Copy link
Contributor

The latest build of BC Wallet has fixed this issue in most scenarios, with the sole remaining scenario being:

  • after lockout but before a full exit and restart of the app

From diving into Credo, it appears the events are still occurring and received by the agent but the Credo react hooks are no longer picking up those changes and propagating them to reactive state. To summarize:

// this event will still occur even after agent restart
agent.events
  .observable(CredentialEventTypes.CredentialStateChanged)

// the reactive state from this Credo hook will not pickup changes after agent restart (it does pickup any changes before a restart)
const offers = useCredentialByState(CredentialState.OfferReceived)

@bryce-mcmath
Copy link
Contributor

bryce-mcmath commented Feb 7, 2025

There are some Android-specific issues remaining. Moving back to QA

@bryce-mcmath
Copy link
Contributor

Also worth mentioning, while Lucas and I were experimenting with mediator load tests, he noticed the version of ACA-Py that the current mediator is running is very old. It uses ACA-Py v0.10.4 which was released Oct 9, 2023. ACA-Py is now on 1.2.2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

5 participants