
chore: added size based retention policy #2098

Merged · 8 commits · Oct 10, 2023
3 changes: 2 additions & 1 deletion docs/operators/how-to/configure-store.md
@@ -31,9 +31,10 @@ If the waku store node is enabled (the `--store` option is set to `true`) the no

There is a set of configuration options to customize the waku store protocol's message store. These are the most relevant:

* `--store-message-retention-policy`: This option controls the retention policy i.e., how long certain messages will be persisted. Two different retention policies are supported:
* `--store-message-retention-policy`: This option controls the retention policy i.e., how long certain messages will be persisted. Three different retention policies are supported:
+ The time retention policy, `time:<duration-in-seconds>` (e.g., `time:14400`)
+ The capacity retention policy, `capacity:<messages-count>` (e.g., `capacity:25000`)
+ The size retention policy, `size:<size-in-gb-mb>` (e.g., `size:25Gb`)
+ To disable the retention policy explicitly, set this option to `""` (an empty string).
* `--store-message-db-url`: The message store database url option controls the message storage engine. This option follows the [_SQLAlchemy_ database URL format](https://docs.sqlalchemy.org/en/14/core/engines.html#database-urls).

74 changes: 69 additions & 5 deletions tests/waku_archive/test_retention_policy.nim
@@ -1,7 +1,7 @@
{.used.}

import
std/sequtils,
std/[sequtils,times],
stew/results,
testutils/unittests,
chronos
@@ -12,6 +12,7 @@ import
../../../waku/waku_archive/driver/sqlite_driver,
../../../waku/waku_archive/retention_policy,
../../../waku/waku_archive/retention_policy/retention_policy_capacity,
../../../waku/waku_archive/retention_policy/retention_policy_size,
../testlib/common,
../testlib/wakucore

@@ -30,26 +31,88 @@ suite "Waku Archive - Retention policy":
## Given
let
capacity = 100
excess = 65
excess = 60

let driver = newTestArchiveDriver()

let retentionPolicy: RetentionPolicy = CapacityRetentionPolicy.init(capacity=capacity)
var putFutures = newSeq[Future[ArchiveDriverResult[void]]]()

## When
for i in 1..capacity+excess:
let msg = fakeWakuMessage(payload= @[byte i], contentTopic=DefaultContentTopic, ts=Timestamp(i))
putFutures.add(driver.put(DefaultPubsubTopic, msg, computeDigest(msg), msg.timestamp))

discard waitFor allFinished(putFutures)

require (waitFor driver.put(DefaultPubsubTopic, msg, computeDigest(msg), msg.timestamp)).isOk()
require (waitFor retentionPolicy.execute(driver)).isOk()
require (waitFor retentionPolicy.execute(driver)).isOk()

## Then
let numMessages = (waitFor driver.getMessagesCount()).tryGet()
check:
# Expected number of messages is 120 because
# (capacity = 100) + (half of the overflow window = 15) + (5 messages added after the last delete)
# the window size changes when changing `const maxStoreOverflow = 1.3` in sqlite_store
numMessages == 120
numMessages == 115

## Cleanup
(waitFor driver.close()).expect("driver to close")

test "size retention policy - windowed message deletion":
## Given
let
# in megabytes
sizeLimit:float = 0.05
excess = 325

let driver = newTestArchiveDriver()

let retentionPolicy: RetentionPolicy = SizeRetentionPolicy.init(size=sizeLimit)
var putFutures = newSeq[Future[ArchiveDriverResult[void]]]()

# variables to check the db size
var pageSize = (waitFor driver.getPagesSize()).tryGet()
var pageCount = (waitFor driver.getPagesCount()).tryGet()
var sizeDB = float(pageCount * pageSize) / (1024.0 * 1024.0)

# make sure that the db is empty before the test begins
let storedMsg = (waitFor driver.getAllMessages()).tryGet()
# if there are messages in db, empty them
if storedMsg.len > 0:
let now = getNanosecondTime(getTime().toUnixFloat())
require (waitFor driver.deleteMessagesOlderThanTimestamp(ts=now)).isOk()
require (waitFor driver.performVacuum()).isOk()

## When

# create a number of messages so that the size of the DB overshoots
for i in 1..excess:
let msg = fakeWakuMessage(payload= @[byte i], contentTopic=DefaultContentTopic, ts=Timestamp(i))
putFutures.add(driver.put(DefaultPubsubTopic, msg, computeDigest(msg), msg.timestamp))

# waitFor is used to synchronously wait for the futures to complete.
discard waitFor allFinished(putFutures)

## Then
# calculate the current database size
pageSize = (waitFor driver.getPagesSize()).tryGet()
pageCount = (waitFor driver.getPagesCount()).tryGet()
sizeDB = float(pageCount * pageSize) / (1024.0 * 1024.0)

# execute the policy, given that the current db size overflows the limit
require (sizeDB >= sizeLimit)
require (waitFor retentionPolicy.execute(driver)).isOk()

# update the current db size
pageSize = (waitFor driver.getPagesSize()).tryGet()
pageCount = (waitFor driver.getPagesCount()).tryGet()
sizeDB = float(pageCount * pageSize) / (1024.0 * 1024.0)

check:
# the database size is checked against the sizeLimit provided by the user
# after the policy has run it should be at or below the limit
sizeDB <= sizeLimit

## Cleanup
(waitFor driver.close()).expect("driver to close")
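
Both tests above, and the size retention policy they exercise, derive the database size from SQLite's page statistics (page count times page size, converted to megabytes). Below is a minimal standalone sketch of that conversion; the helper name is illustrative and not part of the PR.

# dbSizeInMb is a hypothetical helper mirroring the size computation used in
# the test above and in SizeRetentionPolicy: SQLite page count * page size, in Mb.
proc dbSizeInMb(pageCount, pageSizeBytes: int64): float =
  float(pageCount * pageSizeBytes) / (1024.0 * 1024.0)

when isMainModule:
  # e.g. a small test database: 13 pages of 4096 bytes is ~0.05 Mb,
  # which matches the sizeLimit used in the test above
  echo dbSizeInMb(pageCount = 13, pageSizeBytes = 4096)  # ~0.0508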
@@ -90,3 +153,4 @@ suite "Waku Archive - Retention policy":

## Cleanup
(waitFor driver.close()).expect("driver to close")

3 changes: 2 additions & 1 deletion waku/common/databases/db_sqlite.nim
@@ -484,4 +484,5 @@ proc performSqliteVacuum*(db: SqliteDatabase): DatabaseResult[void] =
if resVacuum.isErr():
return err("failed to execute vacuum: " & resVacuum.error)

debug "finished sqlite database vacuuming"
debug "finished sqlite database vacuuming"
ok()
10 changes: 10 additions & 0 deletions waku/waku_archive/driver.nim
@@ -45,6 +45,15 @@ method getMessages*(driver: ArchiveDriver,
method getMessagesCount*(driver: ArchiveDriver):
Future[ArchiveDriverResult[int64]] {.base, async.} = discard

method getPagesCount*(driver: ArchiveDriver):
Future[ArchiveDriverResult[int64]] {.base, async.} = discard

method getPagesSize*(driver: ArchiveDriver):
Future[ArchiveDriverResult[int64]] {.base, async.} = discard

method performVacuum*(driver: ArchiveDriver):
Future[ArchiveDriverResult[void]] {.base, async.} = discard

method getOldestMessageTimestamp*(driver: ArchiveDriver):
Future[ArchiveDriverResult[Timestamp]] {.base, async.} = discard

@@ -61,3 +70,4 @@ method deleteOldestMessagesNotWithinLimit*(driver: ArchiveDriver,

method close*(driver: ArchiveDriver):
Future[ArchiveDriverResult[void]] {.base, async.} = discard

1 change: 1 addition & 0 deletions waku/waku_archive/driver/builder.nim
@@ -104,3 +104,4 @@ proc new*(T: type ArchiveDriver,
debug "setting up in-memory waku archive driver"
let driver = QueueDriver.new() # Defaults to a capacity of 25.000 messages
return ok(driver)

14 changes: 13 additions & 1 deletion waku/waku_archive/driver/queue_driver/queue_driver.nim
@@ -280,6 +280,18 @@ method getMessagesCount*(driver: QueueDriver):
Future[ArchiveDriverResult[int64]] {.async} =
return ok(int64(driver.len()))

method getPagesCount*(driver: QueueDriver):
Future[ArchiveDriverResult[int64]] {.async} =
return ok(int64(driver.len()))

method getPagesSize*(driver: QueueDriver):
Future[ArchiveDriverResult[int64]] {.async} =
return ok(int64(driver.len()))

method performVacuum*(driver: QueueDriver):
Future[ArchiveDriverResult[void]] {.async.} =
return err("interface method not implemented")

method getOldestMessageTimestamp*(driver: QueueDriver):
Future[ArchiveDriverResult[Timestamp]] {.async.} =
return driver.first().map(proc(msg: IndexedWakuMessage): Timestamp = msg.index.receiverTime)
@@ -302,4 +314,4 @@ method deleteOldestMessagesNotWithinLimit*(driver: QueueDriver,

method close*(driver: QueueDriver):
Future[ArchiveDriverResult[void]] {.async.} =
return ok()
return ok()
13 changes: 13 additions & 0 deletions waku/waku_archive/driver/sqlite_driver/sqlite_driver.nim
@@ -109,6 +109,18 @@ method getMessagesCount*(s: SqliteDriver):
Future[ArchiveDriverResult[int64]] {.async.} =
return s.db.getMessageCount()

method getPagesCount*(s: SqliteDriver):
Future[ArchiveDriverResult[int64]] {.async.} =
return s.db.getPageCount()

method getPagesSize*(s: SqliteDriver):
Future[ArchiveDriverResult[int64]] {.async.} =
return s.db.getPageSize()

method performVacuum*(s: SqliteDriver):
Future[ArchiveDriverResult[void]] {.async.} =
return s.db.performSqliteVacuum()

method getOldestMessageTimestamp*(s: SqliteDriver):
Future[ArchiveDriverResult[Timestamp]] {.async.} =
return s.db.selectOldestReceiverTimestamp()
@@ -135,3 +147,4 @@ method close*(s: SqliteDriver):
# Close connection
s.db.close()
return ok()

37 changes: 36 additions & 1 deletion waku/waku_archive/retention_policy/builder.nim
@@ -11,7 +11,8 @@ import
import
../retention_policy,
./retention_policy_time,
./retention_policy_capacity
./retention_policy_capacity,
./retention_policy_size

proc new*(T: type RetentionPolicy,
retPolicy: string):
@@ -51,5 +52,39 @@ proc new*(T: type RetentionPolicy,
let retPolicy: RetentionPolicy = CapacityRetentionPolicy.init(retentionCapacity)
return ok(some(retPolicy))

elif policy == "size":
var retentionSize: string
retentionSize = policyArgs

# captures the size unit such as Gb or Mb
let sizeUnit = retentionSize.substr(retentionSize.len-2)
# captures the numeric part of the size string
let sizeQuantityStr = retentionSize.substr(0,retentionSize.len-3)
# holds the numeric value of the size
var sizeQuantity: float

if sizeUnit in ["gb", "Gb", "GB", "gB"]:
# parse the numeric value into a float
try:
sizeQuantity = parseFloat(sizeQuantityStr)
except ValueError:
return err("invalid size retention policy argument: " & getCurrentExceptionMsg())
# Gb data is converted into Mb for uniform processing
sizeQuantity = sizeQuantity * 1024
elif sizeUnit in ["mb", "Mb", "MB", "mB"]:
try:
sizeQuantity = parseFloat(sizeQuantityStr)
except ValueError:
return err("invalid size retention policy argument")
else:
return err ("""invalid size retention value unit: expected "Mb" or "Gb" but got """ & sizeUnit )

if sizeQuantity <= 0:
return err("invalid size retention policy argument: a non-zero value is required")

let retPolicy: RetentionPolicy = SizeRetentionPolicy.init(sizeQuantity)
return ok(some(retPolicy))

else:
return err("unknown retention policy")

waku/waku_archive/retention_policy/retention_policy_capacity.nim
@@ -75,4 +75,10 @@ method execute*(p: CapacityRetentionPolicy,
if res.isErr():
return err("deleting oldest messages failed: " & res.error)

# vacuum to defragment the pages freed by the deletion and reclaim storage space
# this shrinks the database file
let resVaccum = await driver.performVacuum()
if resVaccum.isErr():
return err("vacuumming failed: " & resVaccum.error)

return ok()
98 changes: 98 additions & 0 deletions waku/waku_archive/retention_policy/retention_policy_size.nim
@@ -0,0 +1,98 @@
when (NimMajor, NimMinor) < (1, 4):
{.push raises: [Defect].}
else:
{.push raises: [].}

import
std/times,
stew/results,
chronicles,
chronos,
os
import
../driver,
../retention_policy

logScope:
topics = "waku archive retention_policy"

# default size is 30 Gb
const DefaultRetentionSize*: float = 30_720

# retain 80% of the messages, i.e. delete the oldest ~20% from the database
const DeleteLimit = 0.80

type
# SizeRetentionPolicy implements auto-delete as follows:
# - sizeLimit is the size in megabytes (Mb) the database is allowed to grow up to
# - when the database size crosses sizeLimit, only a fraction (DeleteLimit) of the
#   newest messages is kept; the remaining outdated messages are deleted using
#   deleteOldestMessagesNotWithinLimit()
# - after deletion, the fragmented space is reclaimed via the vacuum process
SizeRetentionPolicy* = ref object of RetentionPolicy
sizeLimit: float

proc init*(T: type SizeRetentionPolicy, size=DefaultRetentionSize): T =
SizeRetentionPolicy(
sizeLimit: size
)

method execute*(p: SizeRetentionPolicy,
driver: ArchiveDriver):
Future[RetentionPolicyResult[void]] {.async.} =
## when the db size overshoots the limit, delete the oldest ~20% of messages

# the database size is computed from the page count and page size
# get the page count of the "messages" database
var pageCount = (await driver.getPagesCount()).valueOr:
return err("failed to get Pages count: " & $error)

# get page size of database
var pageSizeRes = await driver.getPagesSize()
var pageSize: int64 = int64(pageSizeRes.valueOr(0) div 1024)
Contributor:
Suggested change
var pageSize: int64 = int64(pageSizeRes.valueOr(0) div 1024)
let pageSize: int64 = int64(pageSizeRes.valueOr(0) div 1024)

Unless you have a good reason to modify a variable later on, it should be final and assigned with `let`. A good rule of thumb is to always use `let`, unless you are forced to create a variable that needs to be modifiable.


if pageSize == 0:
return err("failed to get Page size: " & pageSizeRes.error)

# database size in megabytes (Mb)
var totalSizeOfDB: float = float(pageSize * pageCount)/1024.0
Contributor:
Suggested change
var totalSizeOfDB: float = float(pageSize * pageCount)/1024.0
let totalSizeOfDB: float = float(pageSize * pageCount)/1024.0

Contributor:
(and elsewhere)


# check if the current database size crosses the db size limit
if totalSizeOfDB < p.sizeLimit:
return ok()

# keep deleting until the current db size falls within size limit
Contributor:
I see in this function a pattern that is usually solved with a "do while". A "do while" can remove around 25 lines of code, since totalSizeOfDB, pageSize and pageCount are a bit of boilerplate.

Since Nim doesn't have "do while", I would suggest:

while true:
    # ...(get page size and page count)
    var totalSizeOfDB: float = float(pageSize * pageCount)/1024.0
    if totalSizeOfDB <= p.sizeLimit:
      break
    let res = await driver.deleteOldestMessagesNotWithinLimit(limit=pageDeleteWindow)
    if res.isErr():
        return err("deleting oldest messages failed: " & res.error)

So with something like this you only need a single place that computes totalSizeOfDB, pageSize and pageCount.

while totalSizeOfDB > p.sizeLimit:
Collaborator:
This logic looks great to me! However, given the possible issue of stepping into an infinite loop, I think we should have a PR in the near future where we prevent that possible blocking issue. For example, if the measured totalSizeOfDB doesn't change in two consecutive iterations, then we should break the loop by returning the appropriate error.

Contributor Author (@ABresting, Oct 9, 2023):
Yes, I do not like loops in the code. What I can imagine we can do is this: since we have a max size/threshold for the database, we can also have a lower threshold, and then bound the number of times the loop should run so that the database shrinks to the lowest size:

$$ n > \frac{\log(\text{lowerThresholdInMbs}/\text{maxDbSizeOrCurrentDbSize})}{\log(\text{reducedToUponEachIteration})} $$

Here:
lowerThresholdInMbs is the lowest database size possible
maxDbSizeOrCurrentDbSize is the maximum database size possible
reducedToUponEachIteration is the fraction of size/pages the database is reduced to upon each iteration

For example, with 30720 MB (30 GB) as the max size, 0.1 MB as the lower threshold, and 0.80 (80% of the db pages retained) as the fraction retained upon each iteration, we get 57 iterations. So if we use this bound, there is a guarantee that the loop will not run more than 57 times.

This way there is a good chance that the forever loop is not encountered. After this we can emit a warning/error if the database size is still not below the maximum permitted size.

WDYT? @Ivansete-status @alrevuelta

Collaborator:
I vote to follow the simplest approach. However, it is fine for this PR for now and we can revisit it in a separate PR in the near future.

# to delete messages, get the total row/message count
let numMessagesRes = await driver.getMessagesCount()
if numMessagesRes.isErr():
return err("failed to get messages count: " & numMessagesRes.error)
let numMessages = numMessagesRes.value

# 80% of the total messages are to be kept, delete others
let pageDeleteWindow = int(float(numMessages) * DeleteLimit)

let res = await driver.deleteOldestMessagesNotWithinLimit(limit=pageDeleteWindow)
if res.isErr():
return err("deleting oldest messages failed: " & res.error)

# vacuum to defragment the pages freed by the deletion and reclaim storage space
# this shrinks the database file
let resVaccum = await driver.performVacuum()
if resVaccum.isErr():
return err("vacuumming failed: " & resVaccum.error)

# get the db size again for the loop condition check
pageCount = (await driver.getPagesCount()).valueOr:
return err("failed to get Pages count: " & $error)

pageSizeRes = await driver.getPagesSize()
pageSize = int64(pageSizeRes.valueOr(0) div 1024)

if pageSize == 0:
return err("failed to get Page size: " & pageSizeRes.error)

totalSizeOfDB = float(pageSize * pageCount)/1024.0

return ok()
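
To make the iteration bound from the review discussion above concrete, here is a small standalone sketch (proc and parameter names are illustrative, not part of the PR) that computes the maximum number of policy passes needed under the stated assumptions.

import std/math

# If every pass of the policy retains `retainedFraction` of the data, this is
# an upper bound on the passes needed to shrink the database from
# maxDbSizeMb down to lowerThresholdMb (n > log(lower/max) / log(fraction)).
proc maxPolicyIterations(lowerThresholdMb, maxDbSizeMb, retainedFraction: float): int =
  int(ceil(ln(lowerThresholdMb / maxDbSizeMb) / ln(retainedFraction)))

when isMainModule:
  # 30 Gb (30720 Mb) max size, 0.1 Mb lower threshold, 80% retained per pass
  echo maxPolicyIterations(0.1, 30_720.0, 0.80)  # 57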