
chore: vacuum in time based retention policy #2165

Closed
wants to merge 1 commit

Conversation

ABresting
Contributor

Description

Enables vacuuming of the DB in the time-based retention policy.

Changes

  • vacuuming of the DB in the time-based retention policy
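For context, the behaviour this change is after can be sketched with Python's stdlib sqlite3 module. This is an illustrative sketch only: the table name `message` and timestamp column `storedAt` are assumptions, not the actual nwaku schema. The point is that the DELETE alone does not shrink the file; the trailing VACUUM does.

```python
import sqlite3
import time

def apply_time_retention(db_path: str, max_age_seconds: float) -> None:
    """Time-based retention sketch: drop expired rows, then VACUUM.

    NOTE: table and column names here are hypothetical, not nwaku's schema.
    """
    cutoff = time.time() - max_age_seconds
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("DELETE FROM message WHERE storedAt < ?", (cutoff,))
        conn.commit()
        # DELETE only marks pages as free inside the file; the file keeps
        # its size. VACUUM rewrites the database and returns the freed
        # space to the filesystem.
        conn.execute("VACUUM")
    finally:
        conn.close()
```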

Issue

#2164

Collaborator

@Ivansete-status left a comment


LGTM! Thanks for that @ABresting !
Would it be possible to test that this actually reduces the sqlite file size?
Cheers

@github-actions

You can find the image built from this PR at

quay.io/wakuorg/nwaku-pr:2165

Built from 9d78ab5

Contributor

@SionoiS left a comment


Approved BUT like Ivan said. Are we sure it works?

@ABresting
Contributor Author

Approved BUT like Ivan said. Are we sure it works?

Now that's actually a good first issue material!

Contributor

@jm-clius left a comment


Approving, although I think we have to monitor how long this vacuuming can stall nodes with large DBs (e.g. Status with 30 days of history). Or do we already have data on how long a vacuum can take?

@richard-ramos
Member

Would be good to test this PR with a copy of a database like status.test or status.prod. I'm worried that this will end up negatively impacting the node. VACUUM seems more like an activity that infra should execute periodically, by taking the node offline, than something that could be executed as part of the node's normal operation.

@richard-ramos
Member

The go-waku prod fleet is currently connected to status.prod and keeps 7d of history. VACUUM took 11 mins to execute (because of this, waku-org/go-waku#838 was introduced, to indicate to the user that something is happening).
cc: @yakimant

@jm-clius
Contributor

I agree with @richard-ramos here. I approved previously, but now that we have some hard numbers on how long vacuuming takes, I don't think this is a good idea. @ABresting @Ivansete-status I'm not sure that periodic vacuuming should be part of the retention policy at all - this seems to be something specifically required by SQLite and not by any other type of backend. In this case a better approach is indeed to have infra perform the vacuum "out of band" from time to time so as not to stall production nodes. In our case, this is only necessary until the migration to postgresql, which should be imminent.

@Ivansete-status
Collaborator

Good point indeed!
Thanks for bringing this @richard-ramos !

How long does it take if you perform VACUUM regularly - for example daily, or with two consecutive VACUUM calls? I have the impression that it took 11 minutes because the vacuum operation needed to handle a lot of data.

I believe the VACUUM concept is widely used in other backends. In Postgres, the "standard" VACUUM operation doesn't necessarily block other operations (see this), and we are performing it asynchronously.

Finally, if we decide that the VACUUM operation should be performed manually, then we will need to adapt the other retention policies (capacity and size) as well.

CC: @NagyZoltanPeter @apentori

@richard-ramos
Member

richard-ramos commented Oct 31, 2023

@Ivansete-status: How long does it take if you perform VACUUM regularly? For example, daily, or two consecutive VACUUM calls?

@yakimant executed VACUUM twice on a node with a small DB (1.8 GB of data) and this is the result:

$ du -sh node/data/store.db
1.8G    node/data/store.db
$ time sudo sqlite3 node/data/store.db 'VACUUM;'

real    2m0.485s
user    0m0.005s
sys     0m0.005s
$ du -sh node/data/store.db
1.8G    node/data/store.db
$ time sudo sqlite3 node/data/store.db 'VACUUM;'

real    1m51.302s
user    0m0.003s
sys     0m0.008s

2m for the first run, 1m51s for the second - not much difference between the two attempts.

Some items to take into account when deciding whether to proceed with automated vacuums:

  • VACUUM is an expensive operation, since it requires a lock on the DB.
  • A previous attempt with a much larger DB (though still far smaller than status.prod's) took 11 minutes.
  • status.prod DBs are 91-93 GB.

I'd suggest downloading a copy of one of those databases and running VACUUM locally first to see the impact.

I believe that the VACUUM concept is widely used in other backends. In Postgres, the "standard" VACUUM operation doesn't necessarily block other operations (see this.) And we are performing it asynchronously.

Ah! I'm only focusing on SQLite here. I haven't yet read up on how PostgreSQL handles VACUUM.
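The SQLite behaviour discussed above - deleted space staying trapped in the file until a VACUUM rewrites it - can be observed directly via the freelist. A minimal sketch with Python's sqlite3 module, using a hypothetical scratch database (not nwaku's schema):

```python
import os
import sqlite3
import tempfile

# With the default auto_vacuum=NONE, pages freed by DELETE go onto an
# internal freelist (PRAGMA freelist_count) and the file keeps its size.
# Only VACUUM, which rewrites the database, returns the space to the OS.
path = os.path.join(tempfile.mkdtemp(), "scratch.db")
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE t (payload BLOB)")
conn.executemany("INSERT INTO t VALUES (?)",
                 [(b"x" * 4096,) for _ in range(500)])
conn.commit()

conn.execute("DELETE FROM t")
conn.commit()
free_before = conn.execute("PRAGMA freelist_count").fetchone()[0]
size_before = os.path.getsize(path)

conn.execute("VACUUM")  # holds an exclusive lock while it runs
free_after = conn.execute("PRAGMA freelist_count").fetchone()[0]
size_after = os.path.getsize(path)
conn.close()

print(f"free pages: {free_before} -> {free_after}")
print(f"file size:  {size_before} -> {size_after} bytes")
```

On a 90 GB file that rewrite is exactly the multi-minute, node-stalling operation measured above.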

@Ivansete-status
Collaborator

Thanks for checking @richard-ramos , @yakimant !
Wow, it is super slow!

In this case, even though this PR is approved, we can't merge it @ABresting. I think we need to close this PR and submit a different one where we remove the VACUUM calls from the "capacity" and "size" retention policies. We cannot perform VACUUM when working with SQLite. There shouldn't be issues in Postgres because everything is async there.

On the other hand, we need to create a separate cron script to perform the vacuum task separately under certain conditions. WDYT @richard-ramos, @yakimant?

@yakimant
Member

yakimant commented Nov 8, 2023

I have similar feelings; we need to address SQLite vacuuming differently.
Either with a cron job or https://sqlite.org/pragma.html#pragma_auto_vacuum, but not as part of the waku code.

@richard-ramos
Member

I'm actually thinking of removing the vacuum functionality from go-waku, because it seems to me that having it as part of the node is too problematic.

@ABresting
Contributor Author

Closing this PR: given SQLite's behaviour, it is better not to use automatic vacuuming.
