Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: added size based retention policy #2098

Merged
merged 8 commits into from
Oct 10, 2023

Conversation

ABresting
Copy link
Contributor

@ABresting ABresting commented Oct 2, 2023

Description

Size retention Policy has been added. Based on the size limit provided in MB or GB. Once the size limit overflows by messages, then the 20% of the outdated messages are dropped and a vacuum of the database for defragmentation is triggered.

Changes

  • Size-related retention policy.
  • Vacuum process after the deletion of overflowed database entries in size-based retention policy.
  • Vacuum process in capacity-based retention policy.

Issue

#1885

@github-actions
Copy link

github-actions bot commented Oct 2, 2023

You can find the image built from this PR at

quay.io/wakuorg/nwaku-pr:2098

Built from 711f77d

Copy link
Collaborator

@Ivansete-status Ivansete-status left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for that PR! It is awesome!
I'm just adding a few comments that I hope you find useful.
Especially interesting to me is to only run the retention policy once, and make the limit validation before and after the retention policy execution.


## Then
# calculate the current database size
let pageSize = (waitFor driver.getPagesSize()).tryGet()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is great!
However, shall we apply this check right before the retention policy is being applied?
I'd like to see the test checking that the current db size is beyond the limit, and once we execute the retention policy, then effectively check that the size got lower the limit.

The retention policy can only be executed once.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point Ivan,
Changes made to compare the current db size with the size limit, retention policy enabled only in case of overflow.
After the retention policy is executed, again comparison is made between the current db size and the size limit.

try:
sizeQuantity = parseFloat(sizeQuantityStr)
except ValueError:
return err("invalid size retention policy argument")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great error management!
It is advisable though to also append the current exception message.
That can be achieved with the next technique:

Suggested change
return err("invalid size retention policy argument")
return err("invalid size retention policy argument: " & getCurrentExceptionMsg())

That can be applied elsewhere.

Comment on lines 47 to 51
var pageCountRes = await driver.getPagesCount()
if pageCountRes.isErr():
return err("failed to get Pages count: " & pageCountRes.error)

var pageCount: int64 = pageCountRes.value
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great indeed!
However, we can make it a bit shorter with:

Suggested change
var pageCountRes = await driver.getPagesCount()
if pageCountRes.isErr():
return err("failed to get Pages count: " & pageCountRes.error)
var pageCount: int64 = pageCountRes.value
let pageCount = await driver.getPagesCount().isOkOr:
return err("failed to get Pages count: " & $error)

This technique is preferable whenever possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since the getPagesCount method is of future return type .sOkOr is not possible.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I missed to add the parenthesis in my suggestion.
It compiles with the next:

  let pageCount = (await driver.getPagesCount()).valueOr:
    return err("failed to get Pages count: " & $error)

Comment on lines 68 to 71
var numMessagesRes = await driver.getMessagesCount()
if numMessagesRes.isErr():
return err("failed to get messages count: " & numMessagesRes.error)
var numMessages = numMessagesRes.value
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
var numMessagesRes = await driver.getMessagesCount()
if numMessagesRes.isErr():
return err("failed to get messages count: " & numMessagesRes.error)
var numMessages = numMessagesRes.value
let numMessages = await driver.getMessagesCount().isOkOr:
return err("failed to get messages count: " & $error)

Notice that I suggest to also use a let type because we don't need a mutable var at this point.

Comment on lines 82 to 84
let resVaccum = await driver.performsVacuum()
if resVaccum.isErr():
return err("vacuumming failed: " & resVaccum.error)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let resVaccum = await driver.performsVacuum()
if resVaccum.isErr():
return err("vacuumming failed: " & resVaccum.error)
await driver.performsVacuum().isOkOr:
return err("vacuumming failed: " & $error)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as before, future return type.

tests/waku_archive/test_retention_policy.nim Outdated Show resolved Hide resolved
Copy link
Contributor

@jm-clius jm-clius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Approving from approach POV (will let Ivan take a look at the detail of implementation). Some minor comments on code.

Importantly, though, this doesn't fix #1885, as it still needs to be integrated into the wakunode2 app configuration. This should be done in a separate PR before that issue can be closed (your comment in the issue description will auto close the issue once this is merged).

method getPagesSize*(driver: ArchiveDriver):
Future[ArchiveDriverResult[int64]] {.base, async.} = discard

method performsVacuum*(driver: ArchiveDriver):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
method performsVacuum*(driver: ArchiveDriver):
method performVacuum*(driver: ArchiveDriver):

Nitpick. :) (function names should be verbs in imperative form, i.e. without the s)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch!


# get page size of database
let pageSizeRes = await driver.getPagesSize()
var pageSize: int64 = int64(pageSizeRes.valueOr(0) div 1024)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
var pageSize: int64 = int64(pageSizeRes.valueOr(0) div 1024)
let pageSize: int64 = int64(pageSizeRes.valueOr(0) div 1024)

Unless you have a good reason to modify a variable later on, it should be final assigned with let. A good rule of thumb is to always use let, unless you are forced to create a variable that needs to be modifiable.

return err("failed to get Page size: " & pageSizeRes.error)

# database size in megabytes (Mb)
var totalSizeOfDB: float = float(pageSize * pageCount)/1024.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
var totalSizeOfDB: float = float(pageSize * pageCount)/1024.0
let totalSizeOfDB: float = float(pageSize * pageCount)/1024.0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(and elsewhere)

Comment on lines 86 to 87
if sizeDB >= sizeLimit:
require (waitFor retentionPolicy.execute(driver)).isOk()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great indeed! However, would it be possible to enforce a require instead?

Suggested change
if sizeDB >= sizeLimit:
require (waitFor retentionPolicy.execute(driver)).isOk()
require sizeDB >= sizeLimit
require (waitFor retentionPolicy.execute(driver)).isOk()

Copy link
Collaborator

@Ivansete-status Ivansete-status left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for it and for the patience! 💯

I just added a comment to add a further protection but for a separate PR.

@apentori - this feature is very interesting to be applied in the current wakuv2.test and wakuv2.prod fleets. wakuv2.test will receive that once this PR is merged, and wakuv2.prod will have this feature in the next release (I think during next week.)

return ok()

# keep deleting until the current db size falls within size limit
while totalSizeOfDB > p.sizeLimit:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic looks great to me! However, given the possible issue of stepping into an infinite loop, I think we should have a PR in the near future where we prevent that possible blocking issue. For example, if the measured totalSizeOfDB doesn't change in two consecutive iterations, then we should break the loop by returning the appropriate error.

Copy link
Contributor Author

@ABresting ABresting Oct 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I do not like loops in the code, what I can imagine we can do is, since we have a max size/threshold of the database, we can have a lower threshold of the database as well, then inside the loop, we introduce a notation, i.e. max number of times the loop should run so that the size of the database will reduce to the lowest,

$$ n > \frac{\log(\text{lowerThresholdInMbs}/\text{maxDbSizeOrCurrentDbSize})}{\log(\text{reducedToUponEachIteration})} $$

Here:
lowerThresholdInMbs is the lowest database size possible
maxDbSizeOrCurrentDbSize is the maximum database size possible
reducedToUponEachIteration is the fraction of size/pages the database reduced to upon each iteration

For eg., in the case of 30720 MBs(30 GB) as max size, 0.1 MBs as a lower threshold, and 0.80 (80% of the db pages retained) as a fraction of the database retained upon each iteration. We get 57 iterations. SO if we use this then there is a guarantee that the loop will not cross 57 times at max.

This way there is a good chance that the forever loop is not encountered.
Post this we can affirm a warning/error if the database size is still not less than the maximum permitted size.

WDYT? @Ivansete-status @alrevuelta

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vote to follow the simplest approach. However it is fine for this PR for now and we can revisit that in a separate PR in the near future.

Copy link
Contributor

@alrevuelta alrevuelta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comment

if totalSizeOfDB < p.sizeLimit:
return ok()

# keep deleting until the current db size falls within size limit
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see in this function a pattern that is usually solved with a do while. A "do while" can remove aroud 25 lines of code, since totalSizeOfDB, pageSize and pageCount are a bit boilerplate.

Since nim doesnt have "do while" I would suggest:

while true:
    ...(get page get count)
    var totalSizeOfDB: float = float(pageSize * pageCount)/1024.0
    if totalSizeOfDB > p.sizeLimit:
      break
    let res = await driver.deleteOldestMessagesNotWithinLimit(limit=pageDeleteWindow)
    if res.isErr():
        return err("deleting oldest messages failed: " & res.error)

So with something like this you just need 1 single call to totalSizeOfDB, pageSize and pageCount

@ABresting ABresting requested a review from alrevuelta October 9, 2023 16:13
Copy link
Contributor

@alrevuelta alrevuelta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@ABresting ABresting force-pushed the chore-add-size-retention-policy branch from 7a4ebbd to 3a71923 Compare October 10, 2023 09:04
@github-actions
Copy link

This PR may contain changes to configuration options of one of the apps.

If you are introducing a breaking change (i.e. the set of options in latest release would no longer be applicable) make sure the original option is preserved with a deprecation note for 2 following releases before it is actually removed.

Please also make sure the label release-notes is added to make sure any changes to the user interface are properly announced in changelog and release notes.

@ABresting ABresting merged commit 25d6e52 into master Oct 10, 2023
9 of 10 checks passed
@ABresting ABresting deleted the chore-add-size-retention-policy branch October 10, 2023 09:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants