Add retention policy with GB or MB limitation #1885

Closed
Tracked by #3072
Ivansete-status opened this issue Aug 4, 2023 · 23 comments

Comments

@Ivansete-status
Collaborator

Ivansete-status commented Aug 4, 2023

Background

We came across an issue in the HK-prod node where the SQLite database reached 39 GB.
We need a mechanism to apply a retention policy so that the database doesn't grow beyond a certain number of GB or MB.

Acceptance criteria

The Waku node with Store mounted should be able to limit the maximum space occupied by the underlying database.
This applies to both SQLite and Postgres.

The new retention policy should support the following formats:

--store-message-retention-policy=size:25GB
--store-message-retention-policy=size:25gb
--store-message-retention-policy=size:250MB
--store-message-retention-policy=size:250mb
... any of the case combinations
@Ivansete-status Ivansete-status added the enhancement New feature or request label Aug 4, 2023
@Ivansete-status Ivansete-status changed the title chore: add retention policy with GB limitation chore: add retention policy with GB or MB limitation Aug 4, 2023
@SionoiS
Contributor

SionoiS commented Aug 4, 2023

If the cleaning (vacuuming?) requires space, it should also be taken into account.

@Ivansete-status
Collaborator Author

If the cleaning (vacuuming?) requires space, it should also be taken into account.

What do you mean, sorry?

@richard-ramos
Member

richard-ramos commented Aug 4, 2023

Some useful queries for this task:

-- sqlite
SELECT page_count * page_size as size FROM pragma_page_count(), pragma_page_size();

This will return the total DB size, including free pages, so it already accounts for space that VACUUM could reclaim.


-- postgresql
SELECT pg_size_pretty( pg_database_size('dbname') );
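Possibly also useful (a hedged sketch; the messages table name is an assumption, and the pragma table-valued function form requires SQLite ≥ 3.16):

-- sqlite: bytes sitting in free pages that a VACUUM could reclaim
SELECT freelist_count * page_size AS free_bytes
FROM pragma_freelist_count(), pragma_page_size();

-- postgresql: total size (table + indexes + TOAST) of a single table
SELECT pg_size_pretty( pg_total_relation_size('messages') );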

@SionoiS
Contributor

SionoiS commented Aug 4, 2023

If the cleaning (vacuuming?) requires space, it should also be taken into account.

What do you mean, sorry?

The algo. that deletes and moves messages around requires memory and disk space.
If the node is already at max disk capacity, it could fail?

@Ivansete-status
Collaborator Author

If the cleaning (vacuuming?) requires space, it should also be taken into account.

What do you mean, sorry?

The algo. that deletes and moves messages around requires memory and disk space. If the node is already at max disk capacity, it could fail?

Ah yes! You are absolutely right! The user might need to leave some space. It would be interesting to add this recommendation to the wakunode2 parameter description.

@Ivansete-status Ivansete-status mentioned this issue Aug 7, 2023
8 tasks
@vpavlin vpavlin moved this to To Do in Waku Aug 22, 2023
@fryorcraken fryorcraken added the E:PostgreSQL See https://github.com/waku-org/pm/issues/84 for details label Sep 8, 2023
@ABresting
Contributor

Added the changes, with the test case, and tested on the machine several times.
Kindly go through it!

@ABresting
Contributor

Weekly update:
Added the new retention policy based on DB size.
Users can provide the size as <size_number><case_insensitive_gb_or_mb>, e.g. 30gb (which is also the default)
--store-message-retention-policy=size:25GB
--store-message-retention-policy=size:25gb
--store-message-retention-policy=size:250MB
--store-message-retention-policy=size:250mb
A test case was also added.
The oldest messages/rows are deleted to keep within the size limit, with a 20% size reduction upon overflow.
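As a rough illustration of that 20% reduction (a hypothetical sketch only; the messages table and the id/storedAt column names are assumptions, not the actual schema):

-- delete the oldest ~20% of rows once the size limit is exceeded
DELETE FROM messages
WHERE id IN (
    SELECT id FROM messages
    ORDER BY storedAt ASC
    LIMIT (SELECT COUNT(*) / 5 FROM messages)
);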

ABresting added a commit that referenced this issue Sep 29, 2023
@ABresting ABresting self-assigned this Sep 29, 2023
ABresting added a commit that referenced this issue Sep 30, 2023
* chore: add retention policy with GB or MB limitation #1885

* chore: add retention policy with GB or MB limitation

* chore: updated code post review- retention policy

* ci: extract discordNotify to separate file

Signed-off-by: Jakub Sokołowski <jakub@status.im>

* ci: push images to new wakuorg/nwaku repo

Signed-off-by: Jakub Sokołowski <jakub@status.im>

* ci: enforce default Docker image tags strictly

Signed-off-by: Jakub Sokołowski <jakub@status.im>

* ci: push GIT_REF if it looks like a version

Signed-off-by: Jakub Sokołowski <jakub@status.im>

* fix: update wakuv2 fleet DNS discovery enrtree

https://github.com/status-im/infra-misc/issues/171

* chore: resolving DNS IP and publishing it when no extIp is provided (#2030)

* feat(coverage): Add simple coverage (#2067)

* Add test aggregator to all directories.
* Implement coverage script.

* fix(ci): fix name of discord notify method

Also use absolute path to load Groovy script.

Signed-off-by: Jakub Sokołowski <jakub@status.im>

* chore(networkmonitor): refactor setConnectedPeersMetrics, make it partially concurrent, add version (#2080)

* chore(networkmonitor): refactor setConnectedPeersMetrics, make it partially concurrent, add version

* add more metrics, refactor how most metrics are calculated

* rework metrics table fillup

* reset connErr to make sure we honour successful reconnection

* chore(cbindings): Adding cpp example that integrates the 'libwaku' (#2079)

* Adding cpp example that integrates the `libwaku`

---------

Co-authored-by: NagyZoltanPeter <113987313+NagyZoltanPeter@users.noreply.github.com>

* fix(ci): update the dependency list in pre-release WF (#2088)

* chore: adding NetConfig test suite (#2091)

---------

Signed-off-by: Jakub Sokołowski <jakub@status.im>
Co-authored-by: Jakub Sokołowski <jakub@status.im>
Co-authored-by: Anton Iakimov <yakimant@gmail.com>
Co-authored-by: gabrielmer <101006718+gabrielmer@users.noreply.github.com>
Co-authored-by: Álex Cabeza Romero <alex93cabeza@gmail.com>
Co-authored-by: Vaclav Pavlin <vaclav@status.im>
Co-authored-by: Ivan Folgueira Bande <128452529+Ivansete-status@users.noreply.github.com>
Co-authored-by: NagyZoltanPeter <113987313+NagyZoltanPeter@users.noreply.github.com>
@ABresting
Contributor

Weekly update:
In review: the database bug around deleting a limited number of messages/rows
Upcoming/working: updated retention policy + test + missing test on timestamp-based retention policy

Undergoing: MUID concept on message level

@ABresting
Contributor

Closing this issue, as it seems feature complete. Also made changes related to infra-level testing.

@github-project-automation github-project-automation bot moved this from To Do to Done in Waku Oct 13, 2023
@Ivansete-status
Collaborator Author

Hey @ABresting! Although the implementation so far is awesome, I think we cannot close this issue yet because we need to double-check that it works correctly on Postgres.

@chair28980 chair28980 added the E:2.1: Production testing of existing protocols See https://github.com/waku-org/pm/issues/49 for details label Nov 7, 2023
@chair28980
Contributor

Can be completely de-prioritized per nwaku PM meeting 2023-11-07.

@fryorcraken
Collaborator

Does not seem to work for PostgreSQL:

nwaku_1              | DBG 2023-11-28 14:11:03.624+00:00 executing message retention policy         topics="waku node" tid=1 file=waku_node.nim:753
nwaku_1              | ERR 2023-11-28 14:11:03.662+00:00 exception in getInt, parseInt              tid=1 file=postgres_driver.nim:424 error="invalid integer: "
export EXTRA_ARGS="--store-message-retention-policy=size:30GB"

Justification of the de-prioritization?
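One possible source of that parseInt failure (just a guess, not confirmed here): pg_size_pretty() returns human-readable text such as '39 GB', whereas pg_database_size() returns a bigint, so only the latter can be parsed as an integer.

-- postgresql
SELECT pg_size_pretty( pg_database_size('dbname') );  -- text, e.g. '39 GB'
SELECT pg_database_size('dbname');                    -- bigint number of bytes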

@Ivansete-status
Collaborator Author

The current implementation is correct from the nwaku point of view, but we need an additional action to effectively reduce the size occupied by the database.

Due to how Postgres works, the database size is not reduced when rows are deleted; the rows are only marked as "deletable". Therefore, we have these options:

  1. Perform VACUUM. This doesn't give space back to the OS. Instead, the freed space is reserved to be used again by Postgres.
  2. Perform VACUUM FULL. This frees space stored by the database and brings it back to the OS but has a very big drawback, which is that it blocks the whole database for writing and reading.
  3. Use pg_repack. This should be done manually and the Postgres database needs to have this extension installed.
  4. ...

@fryorcraken @ABresting @apentori @jakubgs @yakimant - what is the best approach from your point of view to maintain the Postgres database size under a certain level?
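For reference, options 1 and 2 as plain SQL (a sketch; the messages table name is an assumption, and pg_repack is an external tool/extension so it is not shown):

-- option 1: reclaims dead rows for reuse by Postgres, but does not shrink the file on disk
VACUUM messages;

-- option 2: rewrites the table and returns space to the OS, but takes an exclusive lock
-- (blocks both reads and writes for the duration)
VACUUM FULL messages;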

@fryorcraken
Collaborator

When using (1) I guess the trick is to ensure it is performed before the database reaches maximum size, right? So we may need to operate with some buffer where we delete and VACUUM before the max size is reached.
E.g. delete 20% of messages and run VACUUM when the database reaches 95% of the max size.

The 20% needs to be chosen so that the VACUUM works efficiently and also so that we don't end up VACUUMing every 10 s.
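A rough sketch of that flow on Postgres (hypothetical names; the wakunode database and messages table are assumptions), to be run periodically by the retention policy:

-- 1. measure the current size and compare it against 95% of the configured limit
SELECT pg_database_size('wakunode');

-- 2. if over the threshold, delete the oldest ~20% of rows
--    (as in the deletion sketch earlier in this thread)

-- 3. reclaim the dead tuples so Postgres can reuse the space
VACUUM messages;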

@ABresting
Contributor

When using (1) I guess the trick is to ensure it is performed before the database reaches maximum size, right? So we may need to operate with some buffer where we delete and VACUUM before the max size is reached. E.g. delete 20% of messages and run VACUUM when the database reaches 95% of the max size.

The 20% needs to be chosen so that the VACUUM works efficiently and also so that we don't end up VACUUMing every 10 s.

I believe that initially it was designed to work like this: when the retention limit is reached, remove 20% of the rows.

As you suggest, we can now assume that if the user has given a size limit of X, we can calculate the effective retention limit from it (95% of the input) just to be on the safer side. BUT the issue before was with SQLite; it may have been resolved with this PR.

@Ivansete-status
Collaborator Author

Hey there!

Imo, approach 1 (VACUUM) would be the best option, but we need to adapt the code because right now it would apply the database reduction every time the retention policy is executed. Let me elaborate with an example:

  1. Scenario:
    a. size limit == 1000 bytes
    b. database-size measured == 2000 bytes (the Postgres driver gives this info)
    c. the database has 100 rows
  2. The retention policy is applied because 1.b > 1.a, and then 20% of the rows are deleted
  3. Scenario:
    a. size limit == 1000 bytes
    b. database-size measured == 2000 bytes (the Postgres driver still measures 2000 bytes because the VACUUM operation doesn't return the space to the OS)
    c. the database has 80 rows
  4. This will repeat, and the database will eventually become empty.

( cc @fryorcraken @ABresting )

@yakimant
Member

@Ivansete-status, I need to read more to understand PG vacuuming.
But from your description, I like approach 1.

@yakimant
Member

BTW, did you have a look at autovacuum?
https://www.postgresql.org/docs/current/routine-vacuuming.html#AUTOVACUUM
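For context, a sketch of per-table autovacuum tuning (the messages table name and the thresholds are only illustrative assumptions, not recommendations from this thread):

-- make autovacuum consider the table after ~5% of its rows are dead (default is 20%)
ALTER TABLE messages SET (autovacuum_vacuum_scale_factor = 0.05);
ALTER TABLE messages SET (autovacuum_vacuum_threshold = 1000);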

@Ivansete-status
Collaborator Author

BTW, did you have a look at autovacuum? https://www.postgresql.org/docs/current/routine-vacuuming.html#AUTOVACUUM

Thanks for the comments and suggestion!
At first I was kind of reluctant to have an automatic process, but it is something worth considering too. Apparently it doesn't cause too much read/write blocking: https://stackoverflow.com/questions/73655953/is-postgres-autovacuum-process-blocking-reading-and-writing-the-table#:~:text=It%20will%20take%20light%2Dweight,block%20ordinary%20reads%20and%20writes.

@chair28980 chair28980 moved this from Done to In Progress in Waku Jan 2, 2024
@jakubgs
Contributor

jakubgs commented Jan 3, 2024

I don't really see the point of an automatic VACUUM FULL if that space is going to get filled again shortly.

We don't care if filesystem space is used, we care that we don't hit the filesystem size limit and crash and burn.

@yakimant
Member

Best article on SQLite vacuuming I've seen:
https://blogs.gnome.org/jnelson/2015/01/06/sqlite-vacuum-and-auto_vacuum/
Posting for history.

@chair28980 chair28980 added Epic E:nwaku PostgreSQL Maintenance and removed enhancement New feature or request E:2.1: Production testing of existing protocols See https://github.com/waku-org/pm/issues/49 for details E:PostgreSQL See https://github.com/waku-org/pm/issues/84 for details labels Feb 4, 2024
@chair28980 chair28980 changed the title chore: add retention policy with GB or MB limitation [Epic] add retention policy with GB or MB limitation Feb 4, 2024
@chair28980 chair28980 changed the title [Epic] add retention policy with GB or MB limitation Add retention policy with GB or MB limitation Feb 5, 2024
@fryorcraken
Collaborator

Note we are rescoping this under waku-org/pm#119 to match the new process.

@Ivansete-status
Collaborator Author

I'm closing this issue as per the work done in #2506.

@github-project-automation github-project-automation bot moved this from In Progress to Done in Waku Mar 8, 2024