Upgrade ACA-Py Version #142


WadeBarnes
Member

  • Upgrade to ACA-Py 0.12.1
  • Ensure the agent includes the full webhook payload.

Signed-off-by: Wade Barnes <wade@neoterictech.ca>
@WadeBarnes WadeBarnes requested review from ianco and esune May 24, 2024 20:58
@WadeBarnes
Member Author

Marked as a draft because I haven't applied these changes yet. I'm waiting to resolve the Sovrin TestNet issues before doing this.

@WadeBarnes
Member Author

Deploy this upgrade before bcgov/von-bc-registries-agent-configurations#79. We want to test the compatibility with the older ACA-Py version on the BC Reg side first.

@WadeBarnes
Member Author

I've applied the AGENT_DEBUG_WEBHOOKS=true environment variable to all of the environments so it's in place when the updated image is deployed.

@WadeBarnes
Member Author

WadeBarnes commented May 28, 2024

Deploying the image requires a bit more coordination, since there are secure storage (wallet) upgrades that will be performed once started, and the OrgBook wallets are rather large.

Steps for each environment:

  • Update the agent's HPA, set it to min 1, max 1.
  • Disable health checks on the agent. We can't have the agent restarting in the middle of an upgrade.
  • Back up the wallet.
  • Deploy the new image.
  • Scale the agent to 1.
  • Wait for all of the upgrades to complete.
  • Test
  • Restore health checks
  • Restore HPA
  • Celebrate

Use the upgrade process in dev and test to get a sense of how long the process will take for the prod environment. Refine the process as needed.
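
The steps above can be sketched as an ordered checklist that stops at the first failure, so later steps never run against a half-upgraded wallet. This is only an illustration: `noop` is a hypothetical placeholder for whatever cluster tooling performs each action, and `run_steps` is not part of any existing script.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None]]


def noop() -> None:
    """Hypothetical placeholder; a real action would call the cluster tooling."""


# Step names are taken from the list above; the actions are assumptions.
STEPS: List[Step] = [
    ("Set the agent's HPA to min 1, max 1", noop),
    ("Disable health checks on the agent", noop),
    ("Back up the wallet", noop),
    ("Deploy the new image", noop),
    ("Scale the agent to 1", noop),
    ("Wait for all of the upgrades to complete", noop),
    ("Test", noop),
    ("Restore health checks", noop),
    ("Restore HPA", noop),
]


def run_steps(steps: List[Step]) -> List[str]:
    """Run the steps in order and return the names of those that completed.

    Execution stops at the first step that raises, leaving the remaining
    steps for the operator to handle manually.
    """
    completed: List[str] = []
    for name, action in steps:
        try:
            action()
        except Exception as exc:
            print(f"FAILED: {name}: {exc}")
            break
        completed.append(name)
        print(f"done: {name}")
    return completed


completed = run_steps(STEPS)
```

Timing each step in dev and test with a wrapper like this would give the per-step durations needed to plan the prod window.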

Notes:

  • Upgrading from:
    • artifacts.developer.gov.bc.ca/docker-remote/bcgovimages/aries-cloudagent:py36-1.16-1_0.7.1
  • Upgrading to:
    • artifacts.developer.gov.bc.ca/github-docker-remote/hyperledger/aries-cloudagent-python:py3.9-indy-1.16.0-0.12.1

@WadeBarnes
Member Author

WadeBarnes commented Nov 26, 2024

Related Indy to Askar secure storage migration test results:

DEV:

  • 12.46 GiB before migration
  • 5.29 GiB after migration
  • 81705 Credentials
  • Backup Restore: 39s
  • Migration: 47m 3.5s

TEST:

  • 86.87 GiB
  • 3658802 Credentials
  • Backup Restore: 31m 59.1s
  • Migration: Started yesterday afternoon, has not completed yet.

PROD:

  • 163.7 GiB
  • 5994241 Credentials
  • Backup Restore: 66m 25.6s
  • Migration: Failed after 524m 7.3s (8.7 hours) - ran out of disk space.
    • 2,524,200 items migrated before failure.
    • PVC Size 200GiB

@WadeBarnes
Member Author

WadeBarnes commented Nov 27, 2024

Progress on OrgBook secure storage migration tests - since yesterday:

prod:

  • Migrated ~8.3 million item records of 10,172,915. Migration is still in progress; it has not yet switched to the update (post-migration) steps.
  • PVC usage - 260.9 GiB of 300 GiB

test:

  • Updated (post migration) ~2.9 million credentials of 3,659,022.
  • PVC usage - 89.94 GiB of 200 GiB
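
The progress figures above can be summarised as percent complete and PVC utilisation. The sketch also includes a rough linear projection of final disk usage. That projection rests on assumptions not confirmed in this thread: that disk usage grows linearly with migrated items, and that the 163.7 GiB PROD wallet size reported earlier is a fair baseline for the volume at the start of the run.

```python
def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


def projected_usage_gib(baseline_gib: float, current_gib: float,
                        done: float, total: float) -> float:
    """Linearly extrapolate disk usage at completion.

    Assumes (hypothetically) that usage above the baseline scales with the
    number of items migrated so far.
    """
    growth_so_far = current_gib - baseline_gib
    return baseline_gib + growth_so_far * (total / done)


# prod: ~8.3 million of 10,172,915 items, 260.9 GiB of 300 GiB used.
prod_done = percent(8_300_000, 10_172_915)
prod_pvc = percent(260.9, 300)
prod_projected = projected_usage_gib(163.7, 260.9, 8_300_000, 10_172_915)

# test: ~2.9 million of 3,659,022 credentials, 89.94 GiB of 200 GiB used.
test_done = percent(2_900_000, 3_659_022)
test_pvc = percent(89.94, 200)

print(f"prod: {prod_done:.1f}% migrated, PVC {prod_pvc:.1f}% full, "
      f"projected {prod_projected:.0f} GiB at completion")
print(f"test: {test_done:.1f}% updated, PVC {test_pvc:.1f}% full")
```

If the projection approaches the PVC size, that is a cue to expand the volume before the run gets there rather than after it fails.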

@swcurran
Contributor

How much of your time is being taken on this, @WadeBarnes? Given that we are looking to revamp how data is fed into OrgBook (likely eliminating the DIDComm and AnonCreds processing), should we abandon this effort? It will likely be necessary to load the data from scratch into a new wallet.

@WadeBarnes
Member Author

WadeBarnes commented Nov 27, 2024

Currently just monitoring the migration process occasionally so we get metrics.

@WadeBarnes
Member Author

WadeBarnes commented Nov 27, 2024

Progress on OrgBook secure storage migration tests:

prod:

  • Still in progress

test:

  • Updated (post migration) 3,513,600 credentials of 3,659,022.
    • Failed after 2622m 20.1s (43.7 hours). The wallet container restarted, interrupting (possibly corrupting) the process.

Confirmed the upgrade/migration process does not continue where it left off. Restarting the process on the already partly upgraded and converted wallet encounters the following error:

Traceback (most recent call last):
  File "/usr/local/bin/askar-upgrade", line 8, in <module>
    sys.exit(entrypoint())
  File "/usr/local/lib/python3.10/site-packages/acapy_wallet_upgrade/__main__.py", line 207, in entrypoint
    asyncio.run(main(**vars(args)))
  File "/usr/local/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.10/site-packages/acapy_wallet_upgrade/__main__.py", line 202, in main
    await strategy_inst.run()
  File "/usr/local/lib/python3.10/site-packages/acapy_wallet_upgrade/strategies.py", line 553, in run
    await self.conn.pre_upgrade()
  File "/usr/local/lib/python3.10/site-packages/acapy_wallet_upgrade/pg_connection.py", line 59, in pre_upgrade
    raise UpgradeError("No metadata table found: not an Indy wallet database")
acapy_wallet_upgrade.error.UpgradeError: No metadata table found: not an Indy wallet database

Not going to restart this test; we know it would take somewhere over 45 hours to complete.
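
The restart failure above suggests a simple pre-flight check before any retry: the tool refuses to start when the database has no `metadata` table. A minimal sketch of that decision follows. It assumes the table names have already been listed by some other means, and the function names and example table lists are hypothetical; the only detail taken from the traceback is the `metadata` table.

```python
from typing import Iterable


def can_start_upgrade(table_names: Iterable[str]) -> bool:
    """Return True if the database still looks like an Indy wallet.

    The only signal used is the presence of a 'metadata' table, which is
    what the error message above reports as missing.
    """
    return "metadata" in {name.lower() for name in table_names}


def next_action(table_names: Iterable[str]) -> str:
    """Suggest what to do before (re)running the migration."""
    if can_start_upgrade(table_names):
        return "run askar-upgrade"
    return "restore the wallet from backup before retrying"


# Hypothetical table listings, for illustration only.
print(next_action(["metadata", "items"]))  # untouched Indy wallet
print(next_action(["items"]))              # interrupted, partly converted wallet
```

Since the process does not resume, every retry costs a full backup restore plus the full migration time.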

@WadeBarnes
Member Author

Progress on OrgBook secure storage migration tests:

prod:

  • Failed after 1801m 53.2s (~30 hours)
  • Migrated 10,171,178 item records of 10,172,915. The migration had just switched to the update (post-migration) steps when it failed.
  • Error:
Migrating items... 10171178
Opening wallet with Askar...
Updating keys... 42
Updating master secret(s)...Traceback (most recent call last):
  File "/usr/local/bin/askar-upgrade", line 8, in <module>
    sys.exit(entrypoint())
  File "/usr/local/lib/python3.10/site-packages/acapy_wallet_upgrade/__main__.py", line 207, in entrypoint
    asyncio.run(main(**vars(args)))
  File "/usr/local/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.10/site-packages/acapy_wallet_upgrade/__main__.py", line 202, in main
    await strategy_inst.run()
  File "/usr/local/lib/python3.10/site-packages/acapy_wallet_upgrade/strategies.py", line 562, in run
    await self.convert_items_to_askar(self.conn.uri, self.wallet_key)
  File "/usr/local/lib/python3.10/site-packages/acapy_wallet_upgrade/strategies.py", line 436, in convert_items_to_askar
    await self.update_master_keys(store)
  File "/usr/local/lib/python3.10/site-packages/acapy_wallet_upgrade/strategies.py", line 260, in update_master_keys
    raise Exception("Encountered multiple master secrets")
Exception: Encountered multiple master secrets

So we know we wouldn't be able to upgrade/migrate the production wallet database without further investigation and testing, and the process would likely take well over 60 hours to complete. Again, I'm not going to restart this test.
