feat: postgres vacuum enabled with test case #2313

Closed
wants to merge 4 commits

Conversation

ABresting
Contributor

@ABresting ABresting commented Dec 21, 2023

Description

With this change, the Waku Store protocol now supports retention policies on PostgreSQL. Outdated messages are deleted and the DB then runs a non-blocking VACUUM. Effectively, this reduces the size of the DB on disk while allowing parallel read/write operations on the database.
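For illustration, here is a minimal standalone sketch of the Postgres-side step (not the actual nwaku driver code): it assumes Nim's db_postgres module, the local test credentials used later in this thread, and the store's messages table.

# Hedged sketch only: the connection details and module choice are assumptions,
# not part of this PR.
import db_postgres

let db = open("127.0.0.1", "postgres", "test123", "postgres")
try:
  # After the retention policy has deleted outdated rows, a plain (non-FULL)
  # VACUUM marks the freed space as reusable without blocking concurrent
  # reads/writes; VACUUM FULL would shrink the files but takes an exclusive lock.
  db.exec(sql"VACUUM messages")
finally:
  db.close()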

Changes

Apart from the vacuum functionality for the PostgreSQL database, I have extended the ArchiveDriver so that one can determine which type of database driver is currently in use, i.e. SQLite, Postgres, or in-memory (Queue driver). This was required to disable SQLite-based vacuuming cleanly.
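For illustration only, a self-contained sketch of that idea with hypothetical types (not the actual nwaku ArchiveDriver API):

# Hypothetical types for demonstration; the real drivers live in waku/waku_archive.
type
  ArchiveDriver = ref object of RootObj
  SqliteDriver = ref object of ArchiveDriver
  PostgresDriver = ref object of ArchiveDriver
  QueueDriver = ref object of ArchiveDriver   # in-memory driver

method getDbType(driver: ArchiveDriver): string {.base.} = "unknown"
method getDbType(driver: SqliteDriver): string = "sqlite"
method getDbType(driver: PostgresDriver): string = "postgres"
method getDbType(driver: QueueDriver): string = "queue"

when isMainModule:
  # Callers can now branch on the backend, e.g. to skip automatic vacuuming on SQLite.
  for d in @[ArchiveDriver(SqliteDriver()), ArchiveDriver(PostgresDriver()),
             ArchiveDriver(QueueDriver())]:
    echo d.getDbType()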

The retention policy test cases have also been updated to use Postgres instead of SQLite. We make this choice because SQLite's VACUUM blocks read/write operations, so vacuuming is left as a manual step on Waku nodes/clients running SQLite as the store archive.

The size-based retention policy test case has been changed so that it verifies that the DB size is reduced after the vacuum is performed.

  • Postgres vacuum function created
  • SQLite-focused retention policy test cases replaced with Postgres ones
  • Vacuum call added after each type of retention policy
  • ArchiveDriver extended to expose the database type

Issue

closes #1885


This PR may contain changes to the database schema of one of the drivers.

If you are introducing any changes to the schema, make sure the upgrade from the latest release to this change passes without any errors/issues.

Please make sure the release-notes label is added so that upgrade instructions properly highlight this change.

@ABresting added the release-notes label (Issue/PR needs to be evaluated for inclusion in release notes highlights or upgrade instructions) on Dec 21, 2023

github-actions bot commented Dec 21, 2023

You can find the image built from this PR at

quay.io/wakuorg/nwaku-pr:2313

Built from 6164530

Contributor

@AlejandroCabeza AlejandroCabeza left a comment


LGTM

Collaborator

@Ivansete-status Ivansete-status left a comment


Thanks for this!
Nevertheless, the behaviour isn't 100% valid because the database size is not changed at all.

I think we need to properly configure "autovacuum" in the database (suggested by @yakimant) and simplify this logic. I'd remove the while loop.

On the other hand, I've tested locally with the following settings, which configure a fairly aggressive autovacuum, and the database size does get reduced properly:

autovacuum = on			# Enable autovacuum subprocess?  'on'
					# requires track_counts to also be on.

# Aggressive settings for frequent autovacuum operations
autovacuum_vacuum_scale_factor = 0.01   # Trigger vacuum at 1% of dead tuples
autovacuum_vacuum_threshold = 50        # Minimum number of updated or deleted tuples before vacuum
autovacuum_analyze_scale_factor = 0.01   # Trigger analyze at 1% of changed tuples
autovacuum_analyze_threshold = 50        # Minimum number of inserted, updated, or deleted tuples before analyze
autovacuum_freeze_max_age = 200000000   # Maximum age before forced vacuum freeze

# Optionally adjust vacuum cost delay to control vacuuming speed
autovacuum_vacuum_cost_delay = 10       # Vacuuming cost delay in milliseconds

tests/waku_archive/test_retention_policy.nim (outdated review thread, resolved)
tests/waku_archive/test_retention_policy.nim (outdated review thread, resolved)
waku/waku_archive/driver/queue_driver/queue_driver.nim (outdated review thread, resolved)
@yakimant
Member

@Ivansete-status, just to mention that I've never used autovacuum before and don't know the pros/cons or how applicable it is to our case. I've just seen this option mentioned somewhere.

Thanks for trying it! It would be great to know how well it works for us.

@ABresting
Contributor Author

@Ivansete-status, just to mention that I've never used autovacuum before and don't know the pros/cons or how applicable it is to our case. I've just seen this option mentioned somewhere.

Thanks for trying it! It would be great to know how well it works for us.

Basically, the autovacuum process in Postgres ensures that the space held by deleted tuples/rows becomes reusable and the database space is used efficiently. It is driven by thresholds that trigger a timely, non-blocking plain vacuum automatically. autovacuum is good to have, provided its parameters are set carefully.
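For reference, autovacuum activity on the messages table can be observed via pg_stat_user_tables; a minimal sketch (assuming Nim's db_postgres module and the local test credentials mentioned elsewhere in this thread):

# Hedged sketch only: credentials and table name are assumptions taken from this thread.
import db_postgres

let db = open("127.0.0.1", "postgres", "test123", "postgres")
try:
  # Dead-tuple count and last autovacuum run: useful when tuning the thresholds above.
  let row = db.getRow(sql("""SELECT n_live_tup, n_dead_tup, last_autovacuum
                             FROM pg_stat_user_tables
                             WHERE relname = 'messages'"""))
  echo "live: ", row[0], "  dead: ", row[1], "  last autovacuum: ", row[2]
finally:
  db.close()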

@yakimant
Member

I've read a bit about vacuum, autovacuum, vacuum full and pg_repack.

Looks like autovacuum is a must for most systems that do update and/or delete operations. It makes the emptied space available for new writes.

But I don't think this actually releases the space back to the filesystem.
We probably don't need to take care of that unless there is a sudden jump in DB activity.
In that case, pg_repack is suggested by the community rather than VACUUM FULL.
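For reference, this point can be measured directly; a minimal sketch under the same assumptions as the snippets above (Nim's db_postgres module, local test credentials, messages table). Plain VACUUM/autovacuum keeps this number roughly constant, while VACUUM FULL or pg_repack actually shrinks it.

# Hedged sketch only: prints the table's total on-disk size, which a plain VACUUM
# does not reduce.
import db_postgres

let db = open("127.0.0.1", "postgres", "test123", "postgres")
try:
  echo db.getValue(sql"SELECT pg_size_pretty(pg_total_relation_size('messages'))")
finally:
  db.close()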

Collaborator

@Ivansete-status Ivansete-status left a comment


Thanks @ABresting!
It looks nice. I've added some comments. Ping me when done and I'll double-check again.

Regarding disk space consumption, I've noticed that VACUUM doesn't work well in a normal scenario. In other words, it only worked well when I forced a VACUUM every two seconds.

I've also tried the autovacuum and it didn't work well either. I couldn't manage to reduce the database size.

IMO, the only working solution is to use the pg_repack tool, even though at first I was quite reluctant to use it (it requires installing a non-standard extension and having the pg_repack utility installed on the system as well).

As you properly mentioned, this PR is not enough and we need to perform additional actions to keep the database size bounded.

Comment on lines +79 to +80
# sleep to give it some time to complete vacuuming
await sleepAsync(350)
Collaborator


This sleep shouldn't be needed. If we need to perform that sleep to make tests work properly, I suggest applying that sleep in the tests directly.
That applies to the other two retention policies :)

# NOTE: Using SQLite vacuuming is done manually, we delete a percentage of rows
# if vacumming is done automatically then we aim to check DB size periodially for efficient
# retention policy implementation.
# to shread/delete messsges, get the total row/message count
Collaborator


tiny typo:

Suggested change
# to shread/delete messsges, get the total row/message count
# to shread/delete messages, get the total row/message count

Do you mind reviewing all the comments within this execute proc? There is another tiny typo in "periodially" and some lines seem outdated.

Comment on lines +46 to +48
let dbEngine = driver.getDbType()
if dbEngine == "sqlite":
return ok()
Collaborator


That doesn't seem correct, as it prevents applying the "size" retention policy to SQLite.
This logic might be better suited to run from within driver.performVacuum()
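For illustration, a self-contained sketch of that suggestion using hypothetical types (not the actual nwaku API): each driver decides what vacuuming means, so the retention policy never has to inspect the backend type.

# Hypothetical types for demonstration; string return values stand in for the real Result type.
type
  ArchiveDriver = ref object of RootObj
  SqliteDriver = ref object of ArchiveDriver
  PostgresDriver = ref object of ArchiveDriver

method performVacuum(driver: ArchiveDriver): string {.base.} =
  "err: vacuum not supported by this driver"

method performVacuum(driver: SqliteDriver): string =
  # SQLite's VACUUM blocks reads/writes, so it is left to the operator: a no-op here.
  "ok: skipped (manual VACUUM on SQLite)"

method performVacuum(driver: PostgresDriver): string =
  # A real implementation would issue a plain `VACUUM messages` via the connection pool.
  "ok: vacuumed"

when isMainModule:
  echo SqliteDriver().performVacuum()
  echo PostgresDriver().performVacuum()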

@ABresting
Contributor Author

ABresting commented Dec 30, 2023 via email

@yakimant
Member

yakimant commented Jan 3, 2024

@Ivansete-status vacuum and autovacuum will not make disk usage lower.
They only make the space from cleaned-up data available for new records.

But that is probably fine in many cases.

@Ivansete-status
Collaborator

@Ivansete-status vacuum and autovacuum will not make disk usage lower. They only make the space from cleaned-up data available for new records.

But that is probably fine in many cases.

Thanks for the comment, @yakimant! You are absolutely right.

In order to have the "size" retention policy work correctly in Postgres, we need to take additional action to reduce the disk space occupied by the database. Otherwise, every time the retention policy is applied it will keep dropping rows until the table ends up empty.

If we cannot rely on such an external tool (i.e. pg_repack), then we shouldn't support the "size" retention policy when Postgres is used (cc @ABresting).

@ABresting
Contributor Author

@Ivansete-status vacuum and autovacuum will not make disk usage lower. They only make the space from cleaned-up data available for new records.
But that is probably fine in many cases.

Thanks for the comment, @yakimant! You are absolutely right.

In order to have the "size" retention policy work correctly in Postgres, we need to take additional action to reduce the disk space occupied by the database. Otherwise, every time the retention policy is applied it will keep dropping rows until the table ends up empty.

Thanks for the comment, Ivan 💯
Well, IMO, if we devise the retention policy so that rows are dropped once and then again only after x amount of time (and only if the size is above the threshold), we should not be worried about the table getting emptied. It should not work aggressively but rather as a best-effort policy. You may find that unreliable, but in the absence of an out-of-the-box tool/solution, the current retention policy should work. If there is no harm, then why not?

About disk space: I believe that from a user's perspective, if the cleaned space is not reclaimed by the filesystem but can still be reused by the DB for new messages, then it is a win-win anyway. I do not think we have run infra tests in a setup where the retention policy does not work aggressively (with the risk of emptying the DB). I think table insertions were not the major issue; it was the ever-growing log size?

BTW, running/checking the retention policy every x amount of time is how it currently works in the app code, I believe.

@Ivansete-status
Collaborator

Thanks for the comment @ABresting!

You are right that the database won't get empty in the happy scenario where the node gets messages continuously. However, we cannot always guarantee that, and there will be periods of inactivity. We cannot deliver something that won't work well in 100% of the cases.

The only solution I see, if we want to support the "size" retention policy for Postgres, is that we start using the pg_repack utility. For that, we need to use a postgres docker image with that extension installed, e.g. hartmutcouk/pg-repack-docker:1.4.8, and on the other hand, we will need to have an external app (another docker service) that invokes the pg_repack regularly (PGPASSWORD=test123 ~/utils/pg_repack-1.4.8/bin/pg_repack -U postgres -h 127.0.0.1 -p 5432 -d postgres --table messages) so that the database size gets bounded properly.

@Ivansete-status
Collaborator

Thanks so much for the PR @ABresting!
In the end, we've applied a different approach and we can keep the database size under control by using partitions. See #2506
Excuse me, I'm taking the liberty of closing this PR based on the above.
Big hug!

Labels
release-notes: Issue/PR needs to be evaluated for inclusion in release notes highlights or upgrade instructions
Development

Successfully merging this pull request may close these issues.

Add retention policy with GB or MB limitation
4 participants