chore: improve epoch monitoring #3197

darshankabariya · 2024-12-04T20:04:11Z

I discovered a bug while monitoring Grafana metrics in PR #3181. Currently, the waku_rln_proof_remining counter is reset inside the generateProof function. This approach fails in the following scenario:

If a new epoch starts but no proofs are generated (i.e., no messages are sent), the counter doesn't reset. Ideally, the counter should reset immediately when a new epoch begins, regardless of whether any messages are sent.

This PR addresses the issue by implementing an independent epoch monitoring task that ensures the counter is reset as soon as a new epoch starts.

Note: The spike should occur around 23:50, but it happens around 23:53 because the user does not generate any RLN proof (i.e., does not send any message) within that 3-minute window.

github-actions · 2024-12-04T20:09:07Z

You can find the image built from this PR at

quay.io/wakuorg/nwaku-pr:3197

Built from 67558ef

Ivansete-status

Thanks for it! I added a couple of minor enhancements:)

waku/waku_rln_relay/rln_relay.nim

gabrielmer

On one hand, I'm not a super big fan of having a procedure running in a loop every 500 ms just to reset one counter. On the other hand, it is indeed a bug and I have nothing better to propose right now lol.

So I think it's ok, but it would be cool to eventually design a solution that involves a bit less brute force. Don't want to overcomplicate things either though.

Thanks so much for catching and fixing this issue!

darshankabariya · 2024-12-09T06:45:58Z

> Thanks for it! I added a couple of minor enhancements:)

Hi @Ivansete-status thanks for your review! I've updated the PR based on your suggestions. Could you please check it again?

darshankabariya · 2024-12-09T06:49:22Z

> On one hand, I'm not a super big fan of having a procedure running in a loop every 500 ms just to reset one counter. On the other hand, it is indeed a bug and I have nothing better to propose right now lol.
>
> So I think it's ok, but it would be cool to eventually design a solution that involves a bit less brute force. Don't want to overcomplicate things either though.
>
> Thanks so much for catching and fixing this issue!

Hi @gabrielmer , thank you so much for your comment! I found a better approach and have updated the PR. Could you please take a look?

gabrielmer

It looks much better, thanks so much!

Added a couple comments to make sure everything works as intended :)

gabrielmer · 2024-12-09T11:35:57Z

waku/waku_rln_relay/rln_relay.nim

@@ -392,6 +393,31 @@ proc generateRlnValidator*(

  return validator

+proc monitorEpochs(wakuRlnRelay: WakuRLNRelay): Future[void] {.async.} =
+  var nextEpochTime = epochTime()


Doesn't epochTime() return the current UNIX epoch? If so, why does nextEpochTime equal the current time?

gabrielmer · 2024-12-09T11:39:44Z

waku/waku_rln_relay/rln_relay.nim

+      let currentEpoch = wakuRlnRelay.calcEpoch(epochTime())
+
+      if currentEpoch != lastEpoch:
+        nextEpochTime = epochTime() + float64(wakuRlnRelay.rlnEpochSizeSec)


I think that if we use the current time and add the size of the RLN epoch, there will be small inaccuracies that keep adding up, and nextEpochTime will drift away from its intended value.

Instead of taking the current time and adding rlnEpochSizeSec, we should calculate the time at which the current RLN epoch started and add rlnEpochSizeSec to it (or just calculate directly the exact time at which the next RLN epoch is supposed to start).
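A minimal sketch of that boundary calculation, assuming the epoch index is floor(t / rlnEpochSizeSec) with t in seconds, as calcEpoch does. The name nextEpochStart and its parameters are hypothetical and are not taken from the PR:

```nim
import std/math

# Hypothetical helper, not the PR's code: start time of the next RLN epoch.
# `nowSec` is UNIX time in seconds (what epochTime() returns) and
# `epochSizeSec` stands in for rlnEpochSizeSec.
proc nextEpochStart(nowSec: float64, epochSizeSec: uint64): float64 =
  let size = epochSizeSec.float64
  # Index of the epoch that contains `nowSec`: floor(t / size).
  let currentEpoch = floor(nowSec / size)
  # The next epoch starts exactly one epoch size after the current one began,
  # no matter how far into the current epoch we are.
  result = (currentEpoch + 1.0) * size

when isMainModule:
  let now = 1733745944.25 # arbitrary example timestamp
  echo "seconds until the next epoch: ", nextEpochStart(now, 600) - now
```

Because the result depends only on the clock reading and the epoch size, an error in one sleep is not carried over into the next iteration.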

waku/waku_rln_relay/rln_relay.nim

Ivansete-status

Thanks for the enhancements!
Nevertheless, I have the impression that we are overcomplicating it :)
Let me elaborate a bit more.
We want to show the generated proofs and the remaining available proofs within an RLN epoch when the node acts as an RLNaaS.

In this case, I think we have the epoch time window very well defined by the `rlnEpochSizeSec` config param (10 min in the case of The Waku Network). Then, I believe we only need a while loop that resets the waku_rln_remaining_proofs_per_epoch once every `rlnEpochSizeSec` seconds.

Then, I think we only need:

while true:
  ...
  await sleepAsync(rlnEpochSizeSec * 1000)

On the other hand, we still need to cancelAndWait the epochMonitorFuture from within the stop proc.

Ivansete-status · 2024-12-09T15:51:29Z

waku/waku_rln_relay/rln_relay.nim

+      if firstChangeDetected:
+        await sleepAsync(int((nextEpochTime - epochTime()) * 1000))
+      else:
+        await sleepAsync(500) # 1 second


Suggested change

-        await sleepAsync(500) # 1 second
+        await sleepAsync(500)

Ivansete-status · 2024-12-12T17:48:47Z

waku/waku_rln_relay/rln_relay.nim


 proc calcEpoch*(rlnPeer: WakuRLNRelay, t: float64): Epoch =
  ## gets time `t` as `float64` with subseconds resolution in the fractional part
  ## and returns its corresponding rln `Epoch` value
  let e = uint64(t / rlnPeer.rlnEpochSizeSec.float64)
  return toEpoch(e)

+proc nextEpochTime*(rlnPeer: WakuRLNRelay, t: float64): float64 =


I think we need to give information about the expected units in the parameters

Suggested change

-proc nextEpochTime*(rlnPeer: WakuRLNRelay, t: float64): float64 =
+proc nextEpochTime*(rlnPeer: WakuRLNRelay, timeMillis: float64): float64 =

Ivansete-status · 2024-12-12T17:50:17Z

waku/waku_rln_relay/rln_relay.nim

+        wakuRlnRelay.groupManager.userMessageLimit.get().float64
+      )
+      let nextEpochTime = wakuRlnRelay.nextEpochTime(epochTime())
+      await sleepAsync(int((nextEpochTime - epochTime()) * 1000))


The await should go outside the exception tracking to avoid possible infinite blocking.

Ivansete-status · 2024-12-12T17:51:08Z

waku/waku_rln_relay/rln_relay.nim

@@ -392,6 +415,17 @@ proc generateRlnValidator*(

  return validator

+proc monitorEpochs(wakuRlnRelay: WakuRLNRelay): Future[void] {.async.} =


Sorry if this was already explained somewhere else, but what are we trying to achieve? This seems overcomplicated :)

gabrielmer · 2024-12-13T14:45:19Z

> In this case, I think we have the epoch time window very well defined by the `rlnEpochSizeSec` config param (10 min in the case of The Waku Network). Then, I believe we only need a while loop that resets the waku_rln_remaining_proofs_per_epoch once every `rlnEpochSizeSec` seconds.
>
> Then, I think we only need:
>
> while true:
>   ...
>   await sleepAsync(rlnEpochSizeSec * 1000)
>
> On the other hand, we still need to cancelAndWait the epochMonitorFuture from within the stop proc.

The issue is that a sleep of 10 minutes does not take exactly 10 minutes. There is a small error, and on every iteration of the loop that error accumulates, until at some point the resets are completely out of phase with the epochs.

Maybe it won't be immediately noticeable, but on a long-running node I believe it does become an issue.
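To make the accumulation concrete, here is a small simulation on a fake clock (nothing actually sleeps). It is only a sketch: the 50 ms overshoot per wakeup is an assumed, illustrative figure rather than a measurement, and lagAfter is a hypothetical name:

```nim
import std/math

const
  epochSizeSec = 600.0 # assumed epoch size, in seconds
  overshootSec = 0.05  # assumed extra delay on every wakeup (illustrative only)

proc lagAfter(iterations: int, realign: bool): float64 =
  ## Seconds between the last simulated wakeup and the epoch boundary before it.
  var now = 0.0 # simulated clock, starting exactly on an epoch boundary
  for _ in 1 .. iterations:
    let sleepFor =
      if realign:
        # Sleep until the next boundary, recomputed from the clock each time.
        (floor(now / epochSizeSec) + 1.0) * epochSizeSec - now
      else:
        # Sleep a fixed epoch length, ignoring where the boundaries are.
        epochSizeSec
    now += sleepFor + overshootSec
  result = now - floor(now / epochSizeSec) * epochSizeSec

when isMainModule:
  echo "fixed sleep, lag after 1000 epochs:     ", lagAfter(1000, false), " s"
  echo "realigned sleep, lag after 1000 epochs: ", lagAfter(1000, true), " s"
```

Under these assumptions the fixed sleep ends up about 50 seconds late after 1000 epochs, while the realigned loop stays one overshoot (about 0.05 s) behind the boundary.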

darshankabariya · 2024-12-14T21:41:49Z

> The issue is that a sleep of 10 minutes does not take exactly 10 minutes. There is a small error, and on every iteration of the loop that error accumulates, until at some point the resets are completely out of phase with the epochs.
>
> Maybe it won't be immediately noticeable, but on a long-running node I believe it does become an issue.

Exactly!

But recently I noticed that the epoch is always in sync across the Waku network and rolls over on every 10th minute (e.g., XX:X0). It never starts at arbitrary times like XX:13 or XX:17, which means it is not tied to the node's start time. I think we can leverage this to simplify the logic, but I wanted to confirm whether I’m missing something. I observed this across several tests.

I tried to replicate it to give an idea.

@gabrielmer @Ivansete-status

darshankabariya · 2024-12-14T23:12:54Z

Hi @gabrielmer @Ivansete-status @NagyZoltanPeter @SionoiS

I recently noticed something interesting while testing this PR. As we all know, with RLN we can send up to 100 messages per epoch (10 minutes). Here’s what I observed, and I’m unsure if it’s expected behavior or a bug:

For example, if an epoch starts at 4:00, we can send 100 messages until 4:10. If I try to send more than 100 messages, they’re blocked, which makes sense. But in another scenario, if the epoch starts at 4:00 and I don’t send any messages until 4:06, then send all 100 messages between 4:06 and 4:09, when the new epoch starts at 4:10, I still can’t send any messages and get a ‘message limit exceeded’ error. Interestingly, I’m only able to send messages again at 4:16.

Is this behavior expected, or could it be a bug? If it’s intentional, I’ll need to adjust the logic to handle this scenario.

gabrielmer · 2024-12-16T09:05:23Z

> But recently I noticed that the epoch is always in sync across the Waku network and rolls over on every 10th minute (e.g., XX:X0). It never starts at arbitrary times like XX:13 or XX:17, which means it is not tied to the node's start time. I think we can leverage this to simplify the logic, but I wanted to confirm whether I’m missing something. I observed this across several tests.

Yes! My understanding is that the epoch is something global for the whole network and does not depend on one node's start time.

In this comment I explained how I found it is calculated #3197 (comment)

We can't assume that the epoch is 10 minutes though, I think we have to use the general rlnEpochSizeSec parameter to compute when the next epoch is supposed to start
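As an illustration of why the XX:X0 alignment is a property of the 600-second setting and not of epochs in general, the sketch below lists the UTC minute-of-hour of successive epoch boundaries for a given epoch size. It assumes boundaries sit at multiples of the epoch size since the UNIX epoch, which follows from calcEpoch; boundaryMinutes is a hypothetical helper, and 420 seconds is just an example value:

```nim
import std/math

# Hypothetical helper: UTC minute-of-hour of the next `count` epoch
# boundaries after `startSec`, for an epoch of `epochSizeSec` seconds.
proc boundaryMinutes(startSec: float64, epochSizeSec: uint64, count: int): seq[int] =
  let size = epochSizeSec.float64
  # First boundary strictly after `startSec`.
  var boundary = (floor(startSec / size) + 1.0) * size
  for _ in 1 .. count:
    # Seconds past the hour, converted to whole minutes.
    result.add(int((int64(boundary) mod 3600) div 60))
    boundary += size

when isMainModule:
  echo "600 s epochs: ", boundaryMinutes(0.0, 600, 6)
  echo "420 s epochs: ", boundaryMinutes(0.0, 420, 9)
```

With 600 seconds the boundaries repeat at minutes 10, 20, 30, 40, 50 and 0 because 600 divides 3600 evenly. With 420 seconds they wander across the hour, so logic hard-coded to XX:X0 would break.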

gabrielmer · 2024-12-16T09:09:51Z

> Hi @gabrielmer @Ivansete-status @NagyZoltanPeter @SionoiS
>
> I recently noticed something interesting while testing this PR. As we all know, with RLN we can send up to 100 messages per epoch (10 minutes). Here’s what I observed, and I’m unsure if it’s expected behavior or a bug:
>
> For example, if an epoch starts at 4:00, we can send 100 messages until 4:10. If I try to send more than 100 messages, they’re blocked, which makes sense. But in another scenario, if the epoch starts at 4:00 and I don’t send any messages until 4:06, then send all 100 messages between 4:06 and 4:09, when the new epoch starts at 4:10, I still can’t send any messages and get a ‘message limit exceeded’ error. Interestingly, I’m only able to send messages again at 4:16.
>
> Is this behavior expected, or could it be a bug? If it’s intentional, I’ll need to adjust the logic to handle this scenario.

Ah wow that's super interesting. I would think that it is a bug, my understanding is that the limit should be according to the global epoch and not depending on when our node starts sending messages. But I'm not sure though, @alrevuelta should know better

How are you sending the messages? via REST API? If so, I suspect it might be a bug in our REST layer. I might be wrong, but that's the first thing I would check

darshankabariya · 2024-12-16T09:43:38Z

> We can't assume that the epoch is 10 minutes though, I think we have to use the general rlnEpochSizeSec parameter to compute when the next epoch is supposed to start

Thanks! I agree: the sync times could change (currently epochs start on every 10th minute, but this could change), or rlnEpochSizeSec itself could change. In either scenario, the idea would fail. On further thought, I believe the current implementation is much more reliable.

darshankabariya · 2024-12-16T09:47:01Z

> Ah wow that's super interesting. I would think that it is a bug, my understanding is that the limit should be according to the global epoch and not depending on when our node starts sending messages. But I'm not sure though, @alrevuelta should know better
>
> How are you sending the messages? via REST API? If so, I suspect it might be a bug in our REST layer. I might be wrong, but that's the first thing I would check

I’m using the chat interface to send messages. There is a possibility of an issue with the chat interface, but let’s confirm with @alrevuelta.

github-actions bot assigned darshankabariya Dec 4, 2024

darshankabariya requested review from Ivansete-status, gabrielmer and NagyZoltanPeter and removed request for Ivansete-status December 4, 2024 20:10

Ivansete-status reviewed Dec 5, 2024

gabrielmer approved these changes Dec 5, 2024

darshankabariya force-pushed the reset_epoch branch from aa23154 to 5becb37 Compare December 9, 2024 06:43

darshankabariya requested review from gabrielmer and Ivansete-status December 9, 2024 06:46

gabrielmer reviewed Dec 9, 2024

darshankabariya requested a review from gabrielmer December 9, 2024 13:22

gabrielmer reviewed Dec 9, 2024

darshankabariya requested a review from gabrielmer December 9, 2024 17:33

gabrielmer approved these changes Dec 9, 2024

darshankabariya requested a review from gabrielmer December 10, 2024 05:52

gabrielmer reviewed Dec 10, 2024

darshankabariya closed this Dec 10, 2024

darshankabariya force-pushed the reset_epoch branch from 37b5204 to 8f2bd39 Compare December 10, 2024 12:46

darshankabariya reopened this Dec 10, 2024

darshankabariya requested review from SionoiS and anastasiyaig and removed request for anastasiyaig December 11, 2024 07:26

Ivansete-status reviewed Dec 12, 2024

darshankabariya force-pushed the reset_epoch branch from 25c4040 to f2832aa Compare December 12, 2024 21:23

darshankabariya force-pushed the reset_epoch branch from b876745 to 2aaf95e Compare December 17, 2024 07:14

darshankabariya closed this Dec 17, 2024

darshankabariya force-pushed the reset_epoch branch from 2aaf95e to 049fbea Compare December 17, 2024 11:04

chore: everything in one comment

3b374a3

darshankabariya reopened this Dec 17, 2024

chore: lint issue

da597c1

darshankabariya requested a review from Ivansete-status January 1, 2025 07:18
