Archive old posts to reduce disk usage #5016
Comments
I am actually more concerned about the amount of writes that votes do to SSD storage. It really chews through NVMe drives, and consumer-grade SSDs with a low TBW rating are gone in about a year of Lemmy usage. |
If we do something like this, I'd like to add an We could update that |
Speaking from Lemmy.World perspective, we do not want to drop old votes. |
Wouldn't this affect people who sort by Disabling comments would be bad though I think. It doesn't really save disk space, does it? I always thought being able to comment on and continue with old posts was a big strength of Lemmy over Reddit. Because Lemmy has the |
It's possible to clear out old votes in a way that doesn't lock the old content. We'd just need to make sure that it never recalculates the scores from scratch for those items. |
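A minimal sketch of that idea, using an in-memory SQLite database so it runs standalone. The schema is a simplified assumption: only post_aggregates is named in this thread, while the post and post_like tables, their columns, and the helper names are hypothetical stand-ins. Old vote rows are deleted, the stored aggregate is left alone, and later votes adjust it incrementally instead of recomputing it from the vote rows.

```python
import sqlite3


def prune_old_votes(conn, cutoff):
    """Delete vote rows for posts published before `cutoff` (ISO date string).

    post_aggregates is deliberately not touched, so displayed scores stay the same.
    """
    cur = conn.execute(
        "DELETE FROM post_like WHERE post_id IN "
        "(SELECT id FROM post WHERE published < ?)",
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount


def add_vote(conn, post_id, person_id, score):
    """Record a new vote and bump the aggregate incrementally.

    The score is never recomputed with SUM() over post_like, because the
    pruned rows would be missing from that sum.
    """
    conn.execute(
        "INSERT INTO post_like (post_id, person_id, score) VALUES (?, ?, ?)",
        (post_id, person_id, score),
    )
    conn.execute(
        "UPDATE post_aggregates SET score = score + ? WHERE post_id = ?",
        (score, post_id),
    )
    conn.commit()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE post (id INTEGER PRIMARY KEY, published TEXT NOT NULL);
        CREATE TABLE post_like (post_id INTEGER, person_id INTEGER, score INTEGER);
        CREATE TABLE post_aggregates (post_id INTEGER PRIMARY KEY, score INTEGER);
        INSERT INTO post VALUES (1, '2023-01-01'), (2, '2024-09-01');
        INSERT INTO post_like VALUES (1, 10, 1), (1, 11, 1), (2, 10, -1);
        INSERT INTO post_aggregates VALUES (1, 2), (2, -1);
        """
    )
    deleted = prune_old_votes(conn, "2024-03-01")  # removes both votes on post 1
    add_vote(conn, 1, 12, 1)  # the old post still accepts a new vote
    remaining = conn.execute("SELECT COUNT(*) FROM post_like").fetchone()[0]
    old_score = conn.execute(
        "SELECT score FROM post_aggregates WHERE post_id = 1"
    ).fetchone()[0]
    return deleted, remaining, old_score
```

One gap this sketch ignores: once a person's vote row has been pruned, a repeat or changed vote from that same person can no longer be detected, so it would be counted as a new vote.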
With storage prices being very cheap, I don't see a reason to archive votes. When Lemmy starts getting indexed more, I believe Lemmy will rise. If you archive votes, you will essentially confuse people that don't use Lemmy as to why a post has x comments but no votes. Furthermore, just like in YouTube, upvotes and downvotes are an indicator as to how a post is perceived by the community. If a post has bad advice/content/etc., downvotes will (hopefully) dominate and let other users know about it. If at all, an optional auto deletion of old posts with little to no activity can be used instead (e.g. posts with 0-2 comments and/or 0-5 votes that are older than 2+ years) |
While storage prices are cheap, votes are still one of the parts using the most storage in the db, as they are two of the tables with the most rows. |
I'm against this. It severely limits the ability for people to engage with older content: it confuses people and prevents them from adding to the discussion later if new information comes up. I always hated the idea of "archiving" posts so people could no longer interact. You know how many times I've gotten useful information from replying to very old posts, or given people useful information because they replied to one of my older posts? So yeah, I'm against this. Maybe if we can find a way to reduce the vote data without denying the ability for future votes to be added, that would be good, but I'm against locking old posts and saying they're archived, the Reddit way. |
By default it should be off. The majority of instance admins aren't going to touch the config, and that would cause headaches for other people, and possibly for them when this starts happening. If you're going to have it at all, make it off by default. Still opposed to the Reddit way though of locking all further engagement of a post, that feels wrong, because as I said earlier, people do benefit from engagement with older posts. |
It still kind of boggles me that 25% of our DB is just votes. People sometimes post pages of markdown. Other possibilities for saving space, that wouldn't be archiving:
|
published is needed to deal with activities received out of order, at least for some time. it might make a difference if it's nullable and could be purged for older content, but there's also an argument to keep the dates, as people looking at liked content will usually prefer to have that sorted by when they liked it, not by when it was published, similar to #4446. while this isn't exposed in lemmy 0.19.5, i think #5034 makes this available in the next release? |
Local votes could keep their published timestamp but remote votes don't need it? Although it's kind of wasteful to have another table or something, but it would probably help small instances a bit |
remote votes still need it to properly deal with activities received out of order. when someone downvotes and then upvotes something and this gets transmitted out of order, letting the receiving instance see the upvote first and then the downvote, there wouldn't be a way for the receiving instance to know to ignore the stale downvote otherwise. |
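A rough sketch of the ordering problem described here, not Lemmy's actual federation code. The dictionary shape and the function names are hypothetical. The receiving side keeps whichever vote has the later published timestamp, so a stale activity that arrives late is ignored.

```python
from datetime import datetime, timezone


def apply_vote(stored, incoming):
    """Return the vote to keep for one (person, post) pair.

    An incoming vote replaces the stored one only if it was published later;
    older or duplicate activities are dropped.
    """
    if stored is not None and incoming["published"] <= stored["published"]:
        return stored
    return incoming


def replay(activities):
    """Apply activities in arrival order and return the surviving vote."""
    stored = None
    for activity in activities:
        stored = apply_vote(stored, activity)
    return stored


# The user downvoted at 12:00 and changed it to an upvote at 12:05.
downvote = {"score": -1, "published": datetime(2024, 9, 1, 12, 0, tzinfo=timezone.utc)}
upvote = {"score": 1, "published": datetime(2024, 9, 1, 12, 5, tzinfo=timezone.utc)}

in_order = replay([downvote, upvote])
out_of_order = replay([upvote, downvote])  # upvote arrives first
```

Both arrival orders end with the upvote stored. Without the timestamp, the out-of-order case would end on the downvote.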
It should be possible to send vote timestamps via federation, but not store them in the db. |
Another point, since I do daily DB backups for lemmy.ml. A So although postgres might take up a lot more space in operation (due to indexes and other things), the actual size of the data compressed isn't that large. |
Closing this as db size doesn't seem to be a real concern for admins. |
Requirements
Is your proposal related to a problem?
Votes make up the largest part of Lemmy's database size. In case of lemmy.ml, the database is 40.4 GB, with 15 GB of those being votes. Of those votes, 63% are more than 6 months old, and this proportion will only go up with time.
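As a rough check of what these figures imply, the arithmetic can be written out as below. It assumes that storage scales with the number of vote rows, so that 63% of the votes account for 63% of the 15 GB; the function name is only for illustration.

```python
def old_vote_share(db_gb, votes_gb, old_fraction):
    """Return (GB held by old votes, their share of the whole database)."""
    old_votes_gb = votes_gb * old_fraction
    return old_votes_gb, old_votes_gb / db_gb


old_gb, share = old_vote_share(db_gb=40.4, votes_gb=15.0, old_fraction=0.63)
print(f"{old_gb:.2f} GB, {share:.1%} of the database")
```

The inputs are rounded, so the result is only approximate: a little under 9.5 GB, or a bit less than a quarter of the database.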
Describe the solution you'd like.
Really there is no reason to keep all these old votes around, because old posts are ignored by ranking algorithms. So we could save about 9.5 GB or 24% of disk space on lemmy.ml by deleting votes for posts older than 6 months. Votes displayed to users will still be correct as they are stored separately in the post_aggregates table. We only need to ensure that ranking algorithms never recalculate scores for archived posts. Additionally it makes sense to lock commenting and other actions on posts after the same interval.
Describe alternatives you've considered.
Keep the current behaviour, but it will lead to very large database sizes in a few years.
Additional context
No response