
Consider bulk_update_mappings or similar for delivery receipts #1494

Open
terrazoon opened this issue Dec 20, 2024 · 2 comments

terrazoon (Contributor) commented Dec 20, 2024

In notifications_dao:dao_update_notifications_message_id(), we are going to do batch updates in the short term to get an immediate performance improvement.

In the long term we would like to do something even better, something like db.session.bulk_update_mappings(receipts) or a more modern alternative.
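For context, a minimal sketch of what the bulk_update_mappings route could look like (the model name, import paths, and receipt dict shape here are assumptions for illustration, not the actual implementation):

```python
# Sketch only: assumes a Notification model and a Flask-SQLAlchemy "db" session,
# and that each receipt dict already carries the primary key "id" -- which is
# exactly the piece our delivery receipts are missing today.
from app import db
from app.models import Notification

receipts = [
    {"id": "6e0c6d0a-0000-0000-0000-000000000001", "status": "delivered"},
    {"id": "6e0c6d0a-0000-0000-0000-000000000002", "status": "permanent-failure"},
]

# Each mapping must include the primary key so SQLAlchemy can build the WHERE
# clause; the statements go out as an executemany batch instead of one round
# trip per row.
db.session.bulk_update_mappings(Notification, receipts)
db.session.commit()
```

The "more modern alternative" in SQLAlchemy 2.0 is the ORM-enabled bulk UPDATE, e.g. db.session.execute(update(Notification), receipts), which has the same requirement that every dict carry the primary key.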

The problem is that the delivery receipts do not contain the ids of the notifications; id is the primary key for that table, and you need the primary key to do the super-fast bulk updates.

So we have to map message ids to notification ids somehow, and get them into the delivery receipts, without giving up all the performance gains we would get from moving to bulk updates.

One idea might be to put message-id-to-notification-id key-value pairs into Redis in one big list.
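One way that could be wired up (a rough sketch only; the key naming, TTL, and the use of one Redis key per pair instead of a single big list are all assumptions): write the mapping when the provider hands back its message id, then resolve a whole batch of receipts with one MGET before passing them to the bulk update.

```python
# Rough sketch of the Redis idea; key prefix, TTL, and client setup are assumptions.
import redis

r = redis.Redis(host="localhost", port=6379, db=0)
KEY_PREFIX = "message-id-to-notification-id"

def remember_mapping(message_id: str, notification_id: str, ttl_seconds: int = 3 * 24 * 3600):
    # Written once, at send time, when the provider returns its message id.
    r.set(f"{KEY_PREFIX}:{message_id}", notification_id, ex=ttl_seconds)

def attach_notification_ids(receipts: list[dict]) -> list[dict]:
    # One MGET resolves the whole batch, keeping the lookup off the per-row path.
    keys = [f"{KEY_PREFIX}:{rcpt['message_id']}" for rcpt in receipts]
    notification_ids = r.mget(keys)
    return [
        {**rcpt, "id": notification_id.decode()}
        for rcpt, notification_id in zip(receipts, notification_ids)
        if notification_id is not None  # skip receipts whose mapping has expired
    ]
```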

terrazoon (Contributor, Author) commented:

Okay, so assuming 8 million messages a day as a reasonable upper limit, we would need to store 32 million key-value pairs in Redis (notification_id, message_id), and that would require about 5 GB of RAM. That seems like a lot given our current Redis and cloud.gov contract, etc. I know at some point we'll scale up the contract, but it seems like Redis has finite limits anyway?
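For what it's worth, the back-of-envelope arithmetic behind that figure (the per-entry byte count is a rough assumption about Redis string keys/values, not a measurement):

```python
# Rough sizing only; the per-entry byte count is an assumption, not a measurement.
pairs = 32_000_000       # (message_id, notification_id) entries held in Redis
bytes_per_pair = 160     # ~36-byte UUID key + value + Redis per-key overhead
print(pairs * bytes_per_pair / 1024**3)  # ~4.8 GB, roughly the 5 GB quoted above
```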

@ccostino any insight here? If we are going to need 5 GB of RAM in Redis for this approach down the line (maybe in 2028), is that a show-stopper?

I feel like running an extra query to fetch the notification_ids kills most of the advantage of bulk_update_mappings, so if we can't use Redis, it's not clear the change is worth the effort.

Assuming our batch update of 1,000 rows takes 200 milliseconds (a big assumption; we should add some logging), we should be able to process 150,000 notifications a minute while leaving half of the time free for other queries. That is well beyond any near-term usage estimates. I'll add some debug messages so we can get a feel for how performant this query actually is.
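Something along these lines is what I have in mind for the timing log (the logger wiring is an assumption), plus the arithmetic behind the 150,000/minute figure:

```python
import time
from flask import current_app  # assumes the usual Flask app logger is available here

def dao_update_notifications_message_id(receipts):
    start = time.monotonic()
    # ... existing batch update of up to 1,000 rows goes here ...
    elapsed_ms = (time.monotonic() - start) * 1000
    current_app.logger.info(
        "batch delivery-receipt update: %s rows in %.0f ms", len(receipts), elapsed_ms
    )

# Throughput estimate, matching the numbers above:
# 1,000 rows / 0.2 s = 5,000 rows/s = 300,000 rows/min;
# reserving half the time for other queries leaves ~150,000 notifications/min.
```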

terrazoon self-assigned this Jan 2, 2025
terrazoon moved this from ⬇ Up-Next to 🏗 In progress (WIP: ≤ 3 per person) in Notify.gov product board Jan 2, 2025
ccostino (Contributor) commented Jan 6, 2025

🤔 If we need to run an extra query and/or need access to all of this extra memory, that could be a challenge. Actually, the memory alone will be an issue for now.

Let's see what the debug and logging messages show us and what kind of timings we're actually dealing with; then we can figure out whether or not this is worth shifting toward. Thanks!

terrazoon moved this from 🏗 In progress (WIP: ≤ 3 per person) to 👀 In review in Notify.gov product board Jan 8, 2025