Rewriting Records Via Scrub #15335
Comments
This feature would permanently inflate the allocated space requirements of all datasets that have snapshots. The only benefit over employing send/recv (which could keep the snapshot chain and, with careful ordering, could also preserve clone hierarchy) would be that it happens online, compared to the minimal offline time needed for a final incremental send.
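For reference, a rough sketch of the send/recv approach being described here; pool and dataset names are hypothetical, and this is only an illustration of the general workflow, not a recommended procedure:

```sh
# Initial replication keeps the whole snapshot chain (-R).
zfs snapshot -r tank/data@migrate1
zfs send -R tank/data@migrate1 | zfs receive tank/data_new

# Short offline window: stop writers, send the final incremental, switch over.
zfs snapshot -r tank/data@migrate2
zfs send -R -i @migrate1 tank/data@migrate2 | zfs receive -F tank/data_new
```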
@GregorKopka: Maybe I am missing something, but isn't there the added advantage of using whatever space is available to migrate the blocks little by little, compared to send/recv requiring the availability of enough free space to (temporarily) double the entire dataset?
This is exactly what I meant, not sure why @GregorKopka ignored that part? I specifically mention the rewrite option for scrub requiring a freespace limit, below which it will stop rewriting records for the dataset/pool. This allows the rewriting to occur incrementally over time as more freespace becomes available, i.e. as older snapshots are destroyed. And as you say, it allows this to be done without requiring a minimum of double the space to be available, either on the same or another pool, to handle the send/receive. We don't all just have a handy second, unused storage array lying around, and it's a big overkill operation to perform when only a fraction of records may actually require rewriting.
You two seem to blissfully ignore that snapshots, clones, bookmarks and checkpoints exist and are used by people using ZFS. How should they be treated, in regard to this feature?
How about instead of insulting us, you could try reading what we said? The freespace limit is intended specifically to address this; it means that if the "new" records are consuming too much space (because snapshots won't release the "old" records) then the rewriting will stop until a later scrub. So long as snapshots are being destroyed when they're no longer needed, this allows multiple rounds of scrubbing (which is something you should be doing periodically anyway) to eventually rewrite everything that needs rewriting over time. If a pool never discards snapshots the rewriting will never be able to complete, but it's up to the documentation to make that clear.

For example, if your dataset/pool has 150 GB of freespace and you set the limit to 100 GB, then you can do up to 50 GB of rewrites. Following that, if you free up 25 GB of space as a result of destroying old or intermediate snapshots, then the next scrub can complete 25 GB of rewrites, and so on until eventually it's finished (no more rewrites left to find). If block pointer rewriting is ever implemented, this caveat can then be removed as it will no longer be a problem.
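To make the arithmetic explicit, a tiny sketch using the numbers above; the freespace limit itself is a hypothetical option, not something ZFS exposes today:

```sh
# bash arithmetic only; illustrates the rewrite budget available to one scrub.
free_gb=150    # free space in the dataset/pool right now
limit_gb=100   # proposed limit: never rewrite below this much free space

budget_gb=$(( free_gb > limit_gb ? free_gb - limit_gb : 0 ))
echo "rewrite budget this scrub: ${budget_gb} GB"   # prints 50 GB
```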
Let's look at the simplest possible backup scheme with […]
Logical conclusion #1: All existing data in the dataset is always held by a snapshot, […]
I disagree with your third conclusion. In all of my setups, there is an additional step that expires snapshots on the backup/receiver side, which means data deleted on the source side does eventually free previously used space. As far as I can tell, the proposed feature will allow most if not all data (depending on the use case / data turnover) to eventually be rewritten without the need for block pointer rewrites.
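A minimal sketch of one such replication cycle, with expiration handled independently on the receiving side; host, dataset and snapshot names are hypothetical:

```sh
# Take a new snapshot on the source and send it incrementally to the backup.
zfs snapshot tank/data@2024-01-02
zfs send -i tank/data@2024-01-01 tank/data@2024-01-02 | ssh backup zfs receive backup/data

# The source only needs the latest snapshot as the base for the next increment.
zfs destroy tank/data@2024-01-01

# The receiver expires old snapshots on its own schedule, eventually freeing
# space for data that was deleted on the source long ago.
ssh backup zfs destroy backup/data@2023-06-01
```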
As long as an incremental backup routine is active, there will always be a snapshot on both the source and destination. Data newly written on the source will be distributed to the special vdevs anyway: the feature is a NOP.
The problem isn't newly written data. So this "argument" is a NOP.
The problem isn't data being received (which is essentially also newly written data) but data that is already present. So another "argument" that is a NOP.
You're assuming that all snapshots are retained indefinitely? So another "argument" that is a NOP! Even on a backup target (which isn't really what this feature is for) you don't want to retain every snapshot, because you'd need to continuously upgrade storage even if the size of current data isn't growing (additions are matched by deletions), while if it is growing you'd increase the rate at which capacity needs to be upgraded even further. But on the sending side of most setups you're not going to need such a high level of snapshot retention, and you're certainly not going to want to keep them indefinitely unless you can throw a continuous stream of new hardware at it. Most setups in reality are going to keep local snapshots for a shorter time for the purpose of potential emergency rollbacks, and more snapshots on a backup for recovery in the event the main pool fails entirely somehow.

If you don't want local snapshots you also have the option of using bookmarks (or using these separately for backups), as they don't tie up old records (that's the whole point of them; they enable you to discard the snapshot they are created from and still send to another target).

Sure, there will be a point at which both "old" and "new" versions of records exist thanks to snapshots/clones/whatever, but again that's exactly what the freespace limit is intended to address. Once a record has been rewritten, newer snapshots will no longer reference the old version at all, so as older snapshots are destroyed the old version of the record eventually disappears entirely, just like any other data that is no longer referenced, such as files that were deleted. But we've now been over this at least three times, I'm sick of explaining it, and I'm starting to think the problem here is ideological rather than practical.
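As a minimal sketch of the bookmark workflow mentioned above (dataset and snapshot names are hypothetical):

```sh
# Turn the last sent snapshot into a bookmark, then drop the snapshot so it
# no longer pins old records on the source.
zfs bookmark tank/data@base tank/data#base
zfs destroy tank/data@base

# The bookmark can still serve as the incremental source for the next send.
zfs snapshot tank/data@next
zfs send -i tank/data#base tank/data@next | ssh backup zfs receive backup/data
```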
There are quite a few professions where the ability to prove that certain data was created at a certain point in time is important, as is the ability to restore data from points back in time (30 years of professional IT taught me that users take between days and even years to notice that they accidentally deleted something that they need now). Having a snapshot regime that thins out frequent snapshots to a reasonable daily/weekly/monthly/quarterly/yearly retention is by no means unusual; tools like https://github.com/zfsonlinux/zfs-auto-snapshot or https://github.com/jimsalterjrs/sanoid (which aim at doing exactly that) are quite popular for a reason. Please don't ass-u-me that your way of using technology is the norm for everyone, or even close to a relevant fraction of users.
This feature doesn't prevent anyone from keeping snapshots indefinitely if they want to or need to, but you're acting like this use case is a blocker to the feature when it's not at all. This is a bad-faith straw-man argument and a distraction that has gone on more than long enough. Keeping snapshots indefinitely means you need to be aware of the extra storage cost of doing so, and any documentation around this feature (which it's going to require, since it's a new option) merely needs to make clear that the data is currently copied (so will appear twice for as long as snapshots hold onto the old version), if that's how it needs to be implemented. That's just a caveat of using this entirely optional feature, which users with indefinite snapshots are perfectly free to simply not use until block pointer rewriting is implemented (if ever), as they need to balance their priorities between storage used and updating properties for older records, exactly as they already do with the "overkill" methods (full send/receive or copying affected files).
So you can practically say from the start: if ZFS development does not implement this feature, ZFS will eventually disappear into obscurity in the professional sector because it is not competitive with the other CoW file systems. This does not mean that it will no longer be used in the semi-professional sector, but rather that it will become increasingly less important in storage management and long-term data storage. In the same way, proper storage tiering could finally be implemented. The solution using special devices works, but it is not a proper solution, especially since you can no longer remove the special devices from the pool. A function could also be implemented that scans older data and then compresses and deduplicates it to a higher degree, thus saving more storage space. This would mean that ZFS would finally be competitive again on the storage market.
Describe the feature you would like to see added to OpenZFS
I would like to see the ability to fully "rewrite" records added to ZFS as an alternative to the elusive "block pointer rewrite" feature that so often blocks other features from being implemented.
The idea is simple; given one or more records, ZFS will write out the same data as "new" records before atomically retiring the old ones, no block pointer rewrite necessary. This would behave exactly the same as if the user copied a file and renamed it into place, but more granular and with guaranteed atomicity.
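For illustration only, this is roughly the file-level workaround that exists today (the file name is hypothetical; note that on recent OpenZFS releases `cp` may clone blocks instead of copying them, in which case nothing would actually be rewritten):

```sh
# Copy-and-rename rewrites the file's records with the dataset's current
# settings; the old records stay allocated only while snapshots hold them.
cp -a bigfile bigfile.rewrite
mv bigfile.rewrite bigfile
```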
To allow it to immediately be put to use, this feature could be accompanied by a "rewrite records" option on `zfs scrub` which, when enabled, tells scrub to rewrite any record it encounters that does not match certain properties of its dataset (e.g. a different compression algorithm). The option would require a freespace size (in bytes or as a percentage), and if rewriting the record would take the dataset below this amount of freespace, scrub will skip the rewrite but continue scrubbing as normal. This will allow datasets to be progressively rewritten via multiple scrubs over time until any rewriting is complete, without risking consumption of all remaining freespace. The amount of records rewritten or skipped would be reported to `zpool status`.
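Purely as an illustration, the proposed option might look something like the following; the flags are hypothetical and nothing like them exists in current OpenZFS (where scrubs are started via `zpool scrub`):

```sh
# Hypothetical syntax only: neither flag exists in current OpenZFS.
# Rewrite mismatching records, but never let free space drop below 100 GB.
zpool scrub --rewrite --rewrite-min-free=100G tank

# zpool status would then report how many records were rewritten and how
# many were skipped because of the freespace limit.
zpool status tank
```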
How will this feature improve OpenZFS?
This will enable records to be rewritten to take advantage of new properties, special devices etc. without the need to perform a full `zfs send/receive` cycle to recreate the entire dataset (and requiring downtime for the switchover). If we ever do gain a proper block pointer rewrite capability that can account for snapshots etc., it can then simply be used to optimise this feature, applying the benefit to anything that was implemented using this "full rewrite" method.

Implementing this feature could allow issues such as #9762 to be closed, since the scrubbing should meet its requirements, while issues such as #15226 could be implemented by adding support for their cases to the rewriting scrub (i.e. check whether a record should be on the special device, and rewrite it so that it is).
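For comparison, a rough sketch of the workaround this proposal is meant to avoid (dataset names are hypothetical): changing a property today only affects newly written records, so applying it to existing data means recreating the whole dataset and switching over during a downtime window.

```sh
# Today: the new property only applies to records written after the change.
zfs set compression=zstd tank/data

# Applying it to existing records means recreating the dataset in full...
zfs snapshot tank/data@migrate
zfs send tank/data@migrate | zfs receive -o compression=zstd tank/data_new

# ...then stopping writers, sending a final incremental, and renaming the
# datasets into place, which is the downtime a rewriting scrub would avoid.
```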