feat_: hash based query for outgoing messages. #5217

Merged: 19 commits merged on Jun 11, 2024

@kaichaosun (Contributor) commented May 23, 2024

For outgoing messages, only mark a message as sent after it has been found in the store node. Messages that are not found are marked as expired and resent.

Important changes:

  • Find outgoing messages that were sent at least 5s ago and query the store node with the hashes of those messages.
  • Messages that are found trigger EventEnvelopeSent; messages that are missing trigger EventEnvelopeExpired.

Relates to #5234

@status-im-auto (Member) commented May 23, 2024

Jenkins Builds

Older builds (106):
Commit #️⃣ Finished (UTC) Duration Platform Result
✖️ a179d51 #1 2024-05-23 10:47:00 ~1 min tests 📄log
✔️ a179d51 #1 2024-05-23 10:50:00 ~4 min linux 📦zip
✔️ a179d51 #1 2024-05-23 10:50:53 ~4 min ios 📦zip
✔️ a179d51 #1 2024-05-23 10:51:54 ~6 min android 📦aar
✖️ 27167d2 #2 2024-05-23 11:30:32 ~1 min tests 📄log
✔️ 27167d2 #2 2024-05-23 11:31:59 ~2 min android 📦aar
✔️ 27167d2 #2 2024-05-23 11:32:47 ~3 min ios 📦zip
✔️ 27167d2 #2 2024-05-23 11:33:31 ~4 min linux 📦zip
✖️ e719178 #3 2024-05-24 00:50:17 ~1 min tests 📄log
✔️ e719178 #3 2024-05-24 00:52:51 ~3 min ios 📦zip
✔️ e719178 #3 2024-05-24 00:53:11 ~4 min linux 📦zip
✔️ e719178 #3 2024-05-24 00:55:07 ~5 min android 📦aar
✖️ 1d2a81a #4 2024-05-24 03:33:26 ~1 min tests 📄log
✔️ 1d2a81a #4 2024-05-24 03:34:12 ~1 min android 📦aar
✔️ 1d2a81a #4 2024-05-24 03:34:40 ~2 min linux 📦zip
✔️ 1d2a81a #4 2024-05-24 03:35:48 ~3 min ios 📦zip
✖️ 79399ce #5 2024-05-24 04:20:05 ~1 min tests 📄log
✖️ 79399ce #6 2024-05-24 04:22:37 ~1 min tests 📄log
✔️ 79399ce #5 2024-05-24 04:21:03 ~2 min linux 📦zip
✔️ 79399ce #5 2024-05-24 04:21:16 ~2 min android 📦aar
✔️ 79399ce #5 2024-05-24 04:21:45 ~3 min ios 📦zip
✔️ 6b56b45 #6 2024-05-24 04:34:39 ~2 min android 📦aar
✔️ 6b56b45 #6 2024-05-24 04:34:47 ~2 min linux 📦zip
✔️ 6b56b45 #6 2024-05-24 04:35:35 ~3 min ios 📦zip
✖️ 6b56b45 #7 2024-05-24 04:39:29 ~6 min tests 📄log
✖️ 6b56b45 #8 2024-05-24 07:27:49 ~5 min tests 📄log
✖️ 6b56b45 #9 2024-05-27 03:21:45 ~7 min tests 📄log
✔️ d305d7f #7 2024-05-29 08:39:34 ~3 min ios 📦zip
✔️ d305d7f #7 2024-05-29 08:39:48 ~4 min linux 📦zip
✔️ d305d7f #7 2024-05-29 08:40:47 ~5 min android 📦aar
✖️ d305d7f #10 2024-05-29 08:42:29 ~6 min tests 📄log
✔️ 9b924a4 #8 2024-05-31 06:34:09 ~4 min linux 📦zip
✔️ 9b924a4 #8 2024-05-31 06:34:59 ~5 min ios 📦zip
✔️ 9b924a4 #8 2024-05-31 06:36:16 ~6 min android 📦aar
✖️ 9b924a4 #11 2024-05-31 06:36:46 ~6 min tests 📄log
✔️ 3cdc7b8 #9 2024-05-31 08:26:38 ~3 min ios 📦zip
✔️ 3cdc7b8 #9 2024-05-31 08:27:19 ~4 min linux 📦zip
✔️ 3cdc7b8 #9 2024-05-31 08:28:56 ~5 min android 📦aar
✖️ 3cdc7b8 #12 2024-05-31 08:29:31 ~6 min tests 📄log
✔️ 86a0f31 #10 2024-06-03 03:46:46 ~2 min linux 📦zip
✔️ 86a0f31 #10 2024-06-03 03:46:59 ~2 min android 📦aar
✔️ 86a0f31 #10 2024-06-03 03:47:43 ~3 min ios 📦zip
✖️ 86a0f31 #13 2024-06-03 03:49:22 ~5 min tests 📄log
✔️ d78064f #11 2024-06-03 08:03:11 ~2 min linux 📦zip
✔️ d78064f #11 2024-06-03 08:04:08 ~3 min android 📦aar
✔️ d78064f #11 2024-06-03 08:04:20 ~3 min ios 📦zip
✖️ d78064f #14 2024-06-03 08:07:27 ~6 min tests 📄log
✔️ d1a2e5f #12 2024-06-04 07:06:03 ~2 min linux 📦zip
✔️ d1a2e5f #12 2024-06-04 07:06:07 ~2 min android 📦aar
✔️ d1a2e5f #12 2024-06-04 07:07:01 ~3 min ios 📦zip
✖️ d1a2e5f #15 2024-06-04 07:09:01 ~5 min tests 📄log
✖️ d1a2e5f #16 2024-06-04 07:34:56 ~4 min tests 📄log
✔️ fab0642 #13 2024-06-04 11:04:34 ~2 min linux 📦zip
✔️ fab0642 #13 2024-06-04 11:05:58 ~4 min ios 📦zip
✔️ fab0642 #13 2024-06-04 11:07:29 ~5 min android 📦aar
✖️ fab0642 #17 2024-06-04 11:10:21 ~8 min tests 📄log
✖️ fab0642 #18 2024-06-04 11:20:13 ~6 min tests 📄log
✔️ 6931b53 #14 2024-06-05 03:18:28 ~2 min linux 📦zip
✔️ 6931b53 #14 2024-06-05 03:18:41 ~2 min android 📦aar
✔️ 6931b53 #14 2024-06-05 03:19:40 ~3 min ios 📦zip
✖️ 6931b53 #19 2024-06-05 03:47:36 ~31 min tests 📄log
✖️ 6931b53 #20 2024-06-05 05:53:52 ~30 min tests 📄log
✔️ bef78cd #15 2024-06-05 06:03:10 ~2 min linux 📦zip
✔️ bef78cd #15 2024-06-05 06:03:25 ~2 min android 📦aar
✔️ bef78cd #15 2024-06-05 06:04:28 ~3 min ios 📦zip
✖️ bef78cd #21 2024-06-05 06:34:33 ~33 min tests 📄log
✔️ dc29345 #16 2024-06-05 08:20:57 ~2 min linux 📦zip
✔️ dc29345 #16 2024-06-05 08:21:04 ~2 min android 📦aar
✔️ dc29345 #16 2024-06-05 08:21:39 ~3 min ios 📦zip
✔️ dc29345 #22 2024-06-05 08:58:15 ~40 min tests 📄log
✔️ 56e61b7 #17 2024-06-05 10:22:15 ~2 min android 📦aar
✔️ 56e61b7 #17 2024-06-05 10:23:06 ~3 min linux 📦zip
✔️ 56e61b7 #17 2024-06-05 10:23:27 ~3 min ios 📦zip
✔️ 56e61b7 #23 2024-06-05 11:00:18 ~40 min tests 📄log
✖️ 969890d #24 2024-06-07 00:50:55 ~1 min tests 📄log
✔️ 969890d #18 2024-06-07 00:52:04 ~2 min android 📦aar
✔️ 969890d #18 2024-06-07 00:52:38 ~3 min ios 📦zip
✔️ 969890d #18 2024-06-07 00:54:49 ~5 min linux 📦zip
✔️ 634b97e #19 2024-06-07 01:02:03 ~2 min ios 📦zip
✔️ 634b97e #19 2024-06-07 01:02:11 ~2 min linux 📦zip
✔️ 634b97e #19 2024-06-07 01:02:16 ~2 min android 📦aar
✔️ 634b97e #25 2024-06-07 01:40:17 ~40 min tests 📄log
✔️ 8b4aa3a #20 2024-06-07 13:13:36 ~2 min linux 📦zip
✔️ 8b4aa3a #20 2024-06-07 13:13:55 ~2 min android 📦aar
✔️ 8b4aa3a #20 2024-06-07 13:14:13 ~3 min ios 📦zip
✔️ eef98ef #21 2024-06-07 13:16:12 ~2 min linux 📦zip
✔️ eef98ef #21 2024-06-07 13:16:31 ~2 min android 📦aar
✔️ eef98ef #21 2024-06-07 13:17:41 ~3 min ios 📦zip
✔️ 8b4aa3a #26 2024-06-07 13:51:43 ~40 min tests 📄log
✖️ eef98ef #27 2024-06-07 13:57:34 ~5 min tests 📄log
✔️ eef98ef #28 2024-06-09 08:50:08 ~40 min tests 📄log
✔️ 741250a #22 2024-06-11 00:35:35 ~2 min android 📦aar
✔️ 741250a #22 2024-06-11 00:35:50 ~2 min linux 📦zip
✔️ 741250a #22 2024-06-11 00:36:14 ~3 min ios 📦zip
✔️ 741250a #29 2024-06-11 01:13:48 ~40 min tests 📄log
✔️ 56d69e3 #23 2024-06-11 04:32:25 ~2 min android 📦aar
✔️ 56d69e3 #23 2024-06-11 04:32:59 ~3 min ios 📦zip
✔️ 56d69e3 #23 2024-06-11 04:33:21 ~3 min linux 📦zip
✖️ 56d69e3 #30 2024-06-11 04:35:18 ~5 min tests 📄log
✖️ 56d69e3 #31 2024-06-11 05:44:30 ~5 min tests 📄log
✔️ 17dafb5 #24 2024-06-11 06:01:35 ~2 min linux 📦zip
✔️ 17dafb5 #24 2024-06-11 06:01:52 ~2 min android 📦aar
✔️ 17dafb5 #24 2024-06-11 06:02:29 ~3 min ios 📦zip
✔️ 61e2d71 #25 2024-06-11 06:17:50 ~2 min linux 📦zip
✔️ 61e2d71 #25 2024-06-11 06:18:32 ~2 min android 📦aar
✔️ 61e2d71 #25 2024-06-11 06:18:41 ~3 min ios 📦zip
Commit #️⃣ Finished (UTC) Duration Platform Result
✖️ 17dafb5 #32 2024-06-11 06:33:07 ~34 min tests 📄log
✔️ 61e2d71 #33 2024-06-11 07:14:14 ~40 min tests 📄log

@kaichaosun kaichaosun force-pushed the message-hash-query branch from 27167d2 to e719178 Compare May 24, 2024 00:48
@kaichaosun kaichaosun changed the title feat: hash based query for outgoing messages. feat_: hash based query for outgoing messages. May 24, 2024
@kaichaosun kaichaosun force-pushed the message-hash-query branch 2 times, most recently from 79399ce to 6b56b45 Compare May 24, 2024 04:32
wakuv2/waku.go (outdated)
messageHashes[i] = pb.ToMessageHash(hash.Bytes())
}

result, err := w.node.Store().QueryByHash(ctx, messageHashes, opts...)
Contributor:

We should consider the store query limit on how many hashes can be queried at once, and probably batch these requests in parallel across multiple store nodes.

Contributor Author:

Yeah, I added a limit. Batching seems overkill since there are not many messages within a few seconds.
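
For reference, if batching were ever needed, splitting the hashes into chunks that respect a per-query limit could look like the sketch below. `Hash` and `Batches` are hypothetical names, and the limit value is whatever the store node allows; this PR only caps the number of hashes per query.

```go
package main

// Hash is a stand-in for the real message hash type (hypothetical).
type Hash string

// Batches splits hashes into consecutive chunks of at most limit items.
// It returns nil when limit is not positive.
func Batches(hashes []Hash, limit int) [][]Hash {
	if limit <= 0 {
		return nil
	}
	var out [][]Hash
	for start := 0; start < len(hashes); start += limit {
		end := start + limit
		if end > len(hashes) {
			end = len(hashes)
		}
		out = append(out, hashes[start:end])
	}
	return out
}
```

Each chunk could then be sent as its own query, possibly to different store nodes in parallel as suggested in the review comment.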

@kaichaosun kaichaosun force-pushed the message-hash-query branch 2 times, most recently from d305d7f to 9b924a4 Compare May 31, 2024 06:29
wakuv2/waku.go (outdated)
pubsubMessageIds := make([][]gethcommon.Hash, 0, len(w.sendMsgIDs))
for pubsubTopic, subMsgs := range w.sendMsgIDs {
	var queryMsgIds []gethcommon.Hash
	for msgID, sendTime := range subMsgs {
Contributor:

Just to make sure: here you are taking at most 20 random messages from the ones sent to check against the store? Maybe we should use a sorted map (or any sorted struct).

Contributor Author:

For each pubsub topic, it will choose 20 random outgoing messages to check against the store.
After a quick search, there is no sorted map built into Go, and adding extra logic for this feature seems overkill considering the frequency of outgoing messages. @cammellos

@qfrank (Contributor) left a comment

Overall LGTM. FYI, I just saw that sendDataSync invokes transport.TrackMany, so packets sent via MVDS should also be monitored; they will be retried by handleEnvelopeFailure until maxAttempts is hit. It would be cool if you could add a test. @kaichaosun

@kaichaosun kaichaosun force-pushed the message-hash-query branch from d78064f to d1a2e5f Compare June 4, 2024 07:03
@kaichaosun kaichaosun marked this pull request as ready for review June 5, 2024 09:51
@chaitanyaprem (Contributor):

Wondering if there is an overlap between this and https://github.com/status-im/status-go/pull/5281/files.

@chaitanyaprem (Contributor) left a comment

LGTM other than a few minor comments.

Would love to see the effectiveness and how much additional bandwidth is consumed during dogfooding.

@kaichaosun (Contributor Author):

This PR is for outgoing messages, #5281 is for incoming messages if I'm not mistaken. @chaitanyaprem

@qfrank (Contributor) left a comment

i.e if a message is acknowledged e2e, it should be removed
if the message is acknowledged by the other peer, we should stop resending, we do that for datasync messages already, but we want to do the same in kaichao's PR

It seems we haven't reached this goal with this PR yet? watchExpiredMessages will check the raw_messages table every second and there's still a chance it will resend the sent message? @kaichaosun

@kaichaosun (Contributor Author):

@qfrank for messages acked through MVDS, the message is deleted from the query queue, so it won't be marked as expired. I'm not sure whether there are any acks other than MVDS.

@kaichaosun (Contributor Author):

@qfrank mentioned an edge case: a raw message resend could be triggered just before the message is marked as sent, likely within a few milliseconds. This is possible because the coordination depends on the database table raw_message. It can be mitigated by watching the message sent event within the raw message resend logic (the watchExpiredMessages method), but that seems unnecessary for this kind of failover logic IMO. I'd appreciate more input or ideas.
cc @cammellos

@qfrank (Contributor) commented Jun 11, 2024

> @qfrank mentioned an edge case: a raw message resend could be triggered just before the message is marked as sent, likely within a few milliseconds. This is possible because the coordination depends on the database table raw_message. It can be mitigated by watching the message sent event within the raw message resend logic (the watchExpiredMessages method), but that seems unnecessary for this kind of failover logic IMO. I'd appreciate more input or ideas. cc @cammellos

Hi @kaichaosun, I just had a DM with @cammellos: we can deal with it at a later time. In the worst case we send a message twice, but we won't process the same message twice on the receiver side according to this, so it's just wasteful. Thank you for your PR!
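
To illustrate why a duplicate send is only wasteful, here is a generic sketch of receiver-side deduplication by message ID. It does not show status-go's actual mechanism, which is not included in this thread; `Deduper` and its methods are hypothetical, and a real implementation would bound or persist the set of seen IDs.

```go
package main

import "sync"

// Deduper remembers message IDs that were already processed.
// Hypothetical type for illustration only.
type Deduper struct {
	mu   sync.Mutex
	seen map[string]struct{}
}

// NewDeduper returns an empty Deduper.
func NewDeduper() *Deduper {
	return &Deduper{seen: make(map[string]struct{})}
}

// FirstTime records id and reports whether it had not been seen before.
// Callers process a message only when this returns true.
func (d *Deduper) FirstTime(id string) bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	if _, ok := d.seen[id]; ok {
		return false
	}
	d.seen[id] = struct{}{}
	return true
}
```

With a check like this in the receive path, a resent copy of an already delivered message costs bandwidth but is dropped before processing.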

@kaichaosun kaichaosun merged commit 47899fd into develop Jun 11, 2024
10 checks passed
@kaichaosun kaichaosun deleted the message-hash-query branch June 11, 2024 07:45
@cammellos (Contributor):

@kaichaosun has this been tested in the clients? I think at least running e2e on mobile should have been done before merging it, unless the feature is disabled, but I don't see any flag

@kaichaosun (Contributor Author):

I have tested it with:
DM between status-desktop <-> status-desktop
DM between status-mobile <-> status-desktop
These are the downstream PRs for testing: status-im/status-desktop#15130, status-im/status-mobile#20387.

Should we halt the changes for more QA? @cammellos

@cammellos (Contributor):

@kaichaosun it's probably ok, maybe next time ping QA so they can run e2e tests on the build

7 participants