feat: postgres vacuum enabled with test case #2313
Conversation
This PR may contain changes to the database schema of one of the drivers. If you are introducing any changes to the schema, make sure the upgrade from the latest release to this change passes without any errors/issues.
You can find the image built from this PR at: built from 6164530
LGTM
Thanks for this!
Nevertheless, the behaviour isn't 100% valid, because the database size is not changed at all.
I think we need to properly configure autovacuum in the database (as suggested by @yakimant) and simplify this logic. I'd remove the while.
On the other hand, I've tested locally with the following settings in order to configure a quite intense vacuum service, and the database size gets reduced properly:
autovacuum = on # Enable autovacuum subprocess? 'on'
# requires track_counts to also be on.
# Aggressive settings for frequent autovacuum operations
autovacuum_vacuum_scale_factor = 0.01 # Trigger vacuum at 1% of dead tuples
autovacuum_vacuum_threshold = 50 # Minimum number of updated or deleted tuples before vacuum
autovacuum_analyze_scale_factor = 0.01 # Trigger analyze at 1% of changed tuples
autovacuum_analyze_threshold = 50 # Minimum number of inserted, updated, or deleted tuples before analyze
autovacuum_freeze_max_age = 200000000 # Maximum age before forced vacuum freeze
# Optionally adjust vacuum cost delay to control vacuuming speed
autovacuum_vacuum_cost_delay = 10 # Vacuuming cost delay in milliseconds
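The effect of these settings can be reasoned about with the trigger formula from the PostgreSQL docs: autovacuum considers a table once dead tuples exceed `threshold + scale_factor * reltuples`. A small sketch of that formula (the table sizes below are hypothetical, just to compare the aggressive settings above against the defaults):

```python
# PostgreSQL autovacuum trigger condition (from the official docs):
#   dead_tuples > autovacuum_vacuum_threshold
#                 + autovacuum_vacuum_scale_factor * reltuples
def autovacuum_triggers(dead_tuples: int, reltuples: int,
                        threshold: int = 50,
                        scale_factor: float = 0.01) -> bool:
    """True if autovacuum would consider this table for vacuuming."""
    return dead_tuples > threshold + scale_factor * reltuples

# With the aggressive settings above, a 1M-row table qualifies once
# ~10,050 dead tuples accumulate; with the stock defaults
# (threshold=50, scale_factor=0.2) it takes ~200,050.
print(autovacuum_triggers(10_051, 1_000_000))                    # aggressive
print(autovacuum_triggers(10_051, 1_000_000, scale_factor=0.2))  # defaults
```

This is why lowering `autovacuum_vacuum_scale_factor` matters far more than the threshold for large tables.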
@Ivansete-status, just to mention that I've never used it myself. Thanks for trying it! It would be great to know how well it works for us.
I've read a bit about it. It looks like it should help, but I don't think this will actually make the space available to the filesystem.
Thanks @ABresting !
It looks nice. I've added some comments. Ping me when done and I'll double-check again.
Regarding the disk space consumption, I've noticed that VACUUM doesn't work well in a normal scenario. In other words, it only worked well when I forced VACUUM every two seconds.
I've also tried autovacuum and it didn't work well either; I couldn't manage to reduce the database size.
IMO, the only working solution is to use the pg_repack tool, even though at first I was quite reluctant to use it (it requires installing a non-standard extension and having the pg_repack utility installed in the system as well).
As you properly mentioned, this PR is not enough and we need to perform additional actions to keep the database size bounded.
# sleep to give it some time to complete vacuuming
await sleepAsync(350)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sleep shouldn't be needed. If we need to perform that sleep to make tests work properly, I suggest applying that sleep in the tests directly.
That applies to the other two retention policies :)
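Moving the wait into the tests also allows replacing the fixed sleep with a bounded poll, which is both faster and less flaky. A minimal sketch of that pattern (the `condition` callable is hypothetical; in the real tests it would check the database size):

```python
import time

def wait_until(condition, timeout: float = 5.0, interval: float = 0.05) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    Keeps fixed sleeps out of production code: the test owns the waiting,
    and returns as soon as the condition holds instead of always sleeping
    the full duration.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return condition()  # one last check at the deadline

# Hypothetical usage in a retention-policy test:
#   assert wait_until(lambda: db_size_bytes() < size_before_vacuum)
```

The equivalent in the Nim test suite would be a small `waitFor`-style loop around `sleepAsync`, but the shape of the idea is the same.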
# NOTE: Using SQLite vacuuming is done manually, we delete a percentage of rows
# if vacumming is done automatically then we aim to check DB size periodially for efficient
# retention policy implementation.
# to shread/delete messsges, get the total row/message count
tiny typo:
- # to shread/delete messsges, get the total row/message count
+ # to shread/delete messages, get the total row/message count
Do you mind reviewing all the comments within this execute proc? There is another tiny typo in "periodially", and some lines seem outdated.
let dbEngine = driver.getDbType()
if dbEngine == "sqlite":
  return ok()
That doesn't seem correct, as it prevents applying the "size" retention policy in SQLite.
This logic might be better suited to run from within driver.performVacuum().
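Pushing the engine check down into the driver keeps the retention policy engine-agnostic. A rough sketch of that dispatch (class and method names are hypothetical, mirroring the getDbType/performVacuum idea from the PR):

```python
class SqliteDriver:
    def get_db_type(self) -> str:
        return "sqlite"

    def perform_vacuum(self) -> str:
        # SQLite's VACUUM blocks reads/writes, so this driver declines
        # automatic vacuuming; operators run it manually instead.
        return "skipped"

class PostgresDriver:
    def get_db_type(self) -> str:
        return "postgres"

    def perform_vacuum(self) -> str:
        # Postgres VACUUM does not block normal reads/writes, so it is
        # safe to run as part of the retention policy.
        return "vacuumed"

def apply_size_retention(driver) -> str:
    # The policy no longer needs `if dbEngine == "sqlite": return ok()`;
    # each driver decides what "vacuum" means for its own engine.
    return driver.perform_vacuum()
```

The size-based deletion itself still runs for every engine; only the vacuum step becomes driver-specific.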
Thanks for the nice review, Ivan!
I am well aware of the situation you are mentioning. That's a Postgres async issue: when you delete a large number of rows, they take time to delete, and vacuuming on top of that takes time too. The good part is that all those operations will eventually take place, if not instantly, since Postgres operations are best effort. There is not a lot we can do there. I'm open to hearing about the pg_repack tool's trade-offs in our case, since I have a hunch that it comes with some baggage: performance degradation and data corruption, to name some issues I could find.
@Ivansete-status, VACUUM and autovacuum are not expected to reduce disk usage; they make the freed space reusable internally. But that's probably fine in many cases.
Thanks for the comment @yakimant ! You are absolutely right. In order to have the "size" retention policy working correctly in Postgres, we need to take additional action to reduce the disk space occupied by the database. If not, every time the retention policy is applied, it will drop rows until the table gets empty. If we cannot work on such an external tool (i.e. …), we will need another approach.
Thanks for the comment Ivan 💯 About the disk space: I believe that from a user perspective, even if the cleaned space is not reclaimed by the filesystem but is still reused by the DB to insert new messages, it is a win anyway. I do not think we have run infra tests in a scenario where the retention policy works this aggressively (with the risk of emptying the DB). I think table insertions were not the major issue; it was the ever-growing log size. BTW, running/checking the retention policy after some amount of time is how it currently works in the app code, I believe.
Thanks for the comment @ABresting ! You are right that the database won't get empty in the happy scenario where the node receives messages continuously. However, we cannot always guarantee that, and there will be periods of inactivity. We cannot deliver something that won't work well in 100% of the cases. The only solution I see, if we want to support the "size" retention policy for Postgres, is to start using the pg_repack tool.
Thanks so much for the PR @ABresting !
Description
With this change, the Waku Store protocol now supports the size retention policy on PostgreSQL. Outdated messages are deleted and the DB enters a non-blocking VACUUM state. Effectively, it reduces the size of the DB on disk while allowing parallel read/write operations on the database.
Changes
Apart from the vacuum functionality in the PostgreSQL database, I have extended the ArchiveDriver so that one can query which type of database driver is currently in use, i.e. SQLite, Postgres, or in-memory (Queue driver). This was required to cleanly disable SQLite-based vacuuming.
Retention policy test cases have also been updated to use Postgres instead of SQLite. We make this choice since the SQLite VACUUM process blocks read/write operations, so it was decided to run VACUUM manually on Waku nodes/clients using SQLite as the store archive.
Changed the size-based retention policy test case so that it verifies that, after performing the vacuum, the DB size is reduced.
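The delete-then-vacuum flow described above can be reproduced end-to-end with Python's stdlib sqlite3 (the schema here is hypothetical, not Waku's actual one). It also illustrates the point debated in the review: deleting rows alone leaves the file size essentially unchanged, and only VACUUM returns the space to the filesystem.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "store.db")
db = sqlite3.connect(path)
db.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, payload BLOB)")
db.executemany("INSERT INTO messages (payload) VALUES (?)",
               [(os.urandom(1024),) for _ in range(2000)])
db.commit()
size_full = os.path.getsize(path)

# Retention policy step 1: drop the oldest messages.
db.execute("DELETE FROM messages WHERE id <= 1900")
db.commit()
size_after_delete = os.path.getsize(path)  # freed pages stay allocated

# Step 2: VACUUM rewrites the file, returning space to the filesystem.
# (In SQLite this blocks other reads/writes, which is why the PR leaves
# SQLite vacuuming as a manual operation.)
db.execute("VACUUM")
size_after_vacuum = os.path.getsize(path)
db.close()

print(size_after_delete >= size_full * 0.9)  # deletion alone barely shrinks it
print(size_after_vacuum < size_full * 0.5)   # vacuum reclaims the space
```

The same two-step shape (bulk DELETE, then an engine-appropriate vacuum) is what the retention policy performs against Postgres, where VACUUM is non-blocking.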
Issue
closes #1885