Fast Dedup: FDT-log feature #15895

allanjude · 2024-02-14T14:20:41Z

Motivation and Context

Dedup tables have a huge performance overhead in part because they require an update to the table on disk for every write, on every transaction. Adding a write-only journal allows updates to be batched up and deferred, reducing the immediate cost.

Description

To address this, the dedup log was added. If the fast_dedup feature is enabled, at the end of each txg, modified entries will be copied to an in-memory "log" object (ddt_log_t), and appended to an on-disk log. If the same block is requested again, the in-memory object will be checked first, and if it's there, the entry is inflated back onto the live tree without going to storage. The on-disk log is only read at pool import time, to reload the in-memory log.
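The lookup order described above can be sketched roughly as follows. This is a minimal illustration only: `entry_t`, `mem_log_t` and `dedup_lookup()` are hypothetical stand-ins, the log is modelled as a small array rather than a tree, and the backing array stands in for the on-disk ZAP.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not the real ddt_log_t or DDT entry. */
#define	LOG_SLOTS	8

typedef struct {
	uint64_t key;		/* stands in for the block's checksum key */
	uint64_t refcnt;	/* reference count carried by the entry */
	int used;		/* slot holds a valid entry */
} entry_t;

typedef struct {
	entry_t slots[LOG_SLOTS];	/* the real log is a tree, not an array */
} mem_log_t;

typedef enum { FROM_LOG, FROM_STORE, NOT_FOUND } source_t;

/* Check the in-memory log first; fall back to the backing store. */
static source_t
dedup_lookup(const mem_log_t *log, const entry_t *store, size_t nstore,
    uint64_t key, entry_t *out)
{
	for (size_t i = 0; i < LOG_SLOTS; i++) {
		if (log->slots[i].used && log->slots[i].key == key) {
			*out = log->slots[i];	/* no storage read needed */
			return (FROM_LOG);
		}
	}
	for (size_t i = 0; i < nstore; i++) {
		if (store[i].used && store[i].key == key) {
			*out = store[i];	/* stands in for a ZAP lookup */
			return (FROM_STORE);
		}
	}
	return (NOT_FOUND);
}
```

The point of the ordering is that a hit in the log costs no IO at all; only a miss falls through to the slower store.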

Each txg, some amount of the in-memory log will be flushed out to a DDT storage object (i.e. ZAP) as normal. OpenZFS will try hard to flush enough to keep up with the rate of change on dedup entries, but not so much that it would impact overall throughput or use too much memory. See the zfs_dedup_log_* tuneables in zfs(4) for more details.
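One way to picture "flush enough to keep up, but not too much" is a per-txg target derived from a smoothed ingest rate and clamped to a floor and a ceiling. The sketch below is an assumption for illustration, not the calibration used in this PR; `flush_state_t`, `flush_target()` and the 1/4 smoothing weight are all hypothetical.

```c
#include <stdint.h>

/* Hypothetical flush pacing state; names are illustrative only. */
typedef struct {
	uint64_t ingest_avg;	/* smoothed count of entries logged per txg */
	uint64_t flush_min;	/* floor: always make some progress */
	uint64_t flush_max;	/* ceiling: do not starve other IO */
} flush_state_t;

/*
 * Decide how many log entries to flush this txg, given how many were
 * ingested during it and how many are waiting in the log.
 */
static uint64_t
flush_target(flush_state_t *fs, uint64_t ingested, uint64_t backlog)
{
	/* Moving average with weight 1/4 on the newest sample. */
	fs->ingest_avg = (fs->ingest_avg * 3 + ingested) / 4;

	uint64_t want = fs->ingest_avg;
	if (want < fs->flush_min)
		want = fs->flush_min;
	if (want > fs->flush_max)
		want = fs->flush_max;
	if (want > backlog)
		want = backlog;		/* cannot flush more than exists */
	return (want);
}
```

Smoothing keeps a single bursty txg from causing a large flush, while the floor ensures the log still drains when ingest stops.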

How Has This Been Tested?

TBD.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Performance enhancement (non-breaking change which improves efficiency)
Code cleanup (non-breaking change which makes code smaller or more readable)
Breaking change (fix or feature that would cause existing functionality to change)
Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
Documentation (a change to man pages or other documentation)

Checklist:

My code follows the OpenZFS code style requirements.
I have updated the documentation accordingly.
I have read the contributing document.
I have added tests to cover my changes.
I have run the ZFS Test Suite with this change applied.
All commit messages are properly formatted and contain Signed-off-by.

adamdmoss

Broadly-speaking I like the approach and implementation, although:

I'm not qualified to vouch for its correctness, LGTM but I'm sure there are 1001 subtleties in the journalling that are beyond me
The changes still need to be taken to completion (ifdefs/xxx)
I'm concerned about how much this has been exercised, especially when separated from the other DDT optimization PRs

robn · 2024-05-15T03:47:21Z

[Fast dedup stack rebased to master 3c941d1]

robn · 2024-06-07T01:38:06Z

@amotin last push switches from storing count of blocks to count of bytes, as discussed. Note that it still operates on whole blocks internally, it just changes what we store, so we could do something different next time.

robn · 2024-06-07T01:53:36Z

@adamdmoss

The changes still need to be taken to completion (ifdefs/xxx)

I'm concerned about how much this has been exercised, especially when separated from the other DDT optimization PRs

Both reasonable things to flag! This PR is near the top of the larger "fast dedup" stack of work, and is waiting for some stuff lower down in the stack to be finalised before this one is finalised. To that extent, yes, it is a draft. There's also some specific workarounds, see the commit list. Those will be resolved before the end.

However, we believe the structure is largely correct, even if some of the fine details are not. Review is still helpful here, because if there's an error in the fundamentals then the polishing won't matter.

amotin · 2024-06-10T17:54:35Z

last push switches from storing count of blocks to count of bytes, as discussed. Note that it still operates on whole blocks internally, it just changes what we store, so we could do something different next time.

@robn The on-disk log format is still based on blocks, since you do not allow records to cross block boundaries, so we won't be able to change the algorithm without introducing a new DDT version. The benefit I would see from using byte offsets is the ability to continue writing an incomplete log block from previous TXGs, but as far as I can see your current code always rounds the ddl_size up to the next block boundary in ddt_log_commit(), while in ddt_log_begin() you always set dlu_offset to zero. If we tried to change that later, we would become backward-incompatible, so it would be nice to allow it now, or at least make the format more flexible.

robn · 2024-06-14T01:40:35Z

[Fast dedup stack rebased to master c98295e]

robn · 2024-06-18T11:49:22Z

Alright, so we now have large blocks, and it looks good I think.

To clarify some of the above comments, by design, we never write across a block boundary, and we never write to the same block more than once. Not writing across block boundaries just makes the code easier (and not a lot of waste; records are small). Not writing a block more than once is to ensure the log is always write-only. A big part of the pain of dedup historically is needing to read in order to write, and while appending to an object is very different to updating a complex object like a ZAP, we set it as a design goal so we'd never have to deal with it again in the future.
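The two rules in the paragraph above (no record crosses a block boundary, no block is written twice) can be sketched as a simple offset cursor. This is a hypothetical model, not the code in ddt_log.c: `log_cursor_t`, `log_reserve()` and `log_commit()` are illustrative names, and the round-up at commit follows the behaviour described earlier in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical append cursor for a block-structured, write-only log. */
typedef struct {
	uint64_t block_size;	/* size of one log block in bytes */
	uint64_t offset;	/* next byte to be written */
} log_cursor_t;

/*
 * Reserve space for one record and return the offset to write it at.
 * If the record will not fit in the rest of the current block, the tail
 * is left unused and the record starts at the next block.
 */
static uint64_t
log_reserve(log_cursor_t *c, uint64_t reclen)
{
	assert(reclen > 0 && reclen <= c->block_size);
	uint64_t room = c->block_size - (c->offset % c->block_size);
	if (reclen > room)
		c->offset += room;	/* skip to the next block boundary */
	uint64_t at = c->offset;
	c->offset += reclen;
	return (at);
}

/*
 * End of txg: round up to a block boundary so the next txg begins in a
 * fresh block and never rewrites one that has already gone to disk.
 */
static void
log_commit(log_cursor_t *c)
{
	uint64_t rem = c->offset % c->block_size;
	if (rem != 0)
		c->offset += c->block_size - rem;
}
```

The cost is some unused space at the end of blocks, which stays small as long as records are small relative to the block size.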

I don't think there's any issue with changing the format in the future. For one thing, there's version and flag bits, so we can spot an old format and deal with it. For the other, the log can be disabled and flushed out without too much effort, so a format change can be done by emptying the log and then recreating it with the new thing enabled.

TODO from above:

byteswaps
ignore entries before checkpoint during load
scrub start txg

amotin · 2024-06-18T14:24:34Z

Not writing a block more than once is to ensure the log is always write-only.

My thinking was about a stream of very short transactions, which may produce a long chain of very small blocks that takes longer to read on import due to the required head seeks and overhead. But I guess in that case the log should not grow too long, so it may not be a huge deal, though I haven't looked closely at the log flushing math yet. I don't insist on this.

robn · 2024-06-20T06:53:56Z

Alright, big push today:

on-disk formats should now be endian-safe: all u64-based, and use the BF64_* macros to break it down. Untested on a big-endian machine (who has those?) but think this will be right.
when loading, just skip entries before the checkpoint, rather than loading them and then destroying them later. Cleaner and faster.
fix that "before checkpoint" assert
finalise object bs/ibs setup
fix scan cutoff txg, clean up the code a bit there
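As a rough illustration of the "all u64-based" point in the list above: if every on-disk field is packed into 64-bit words with shift-and-mask helpers, a reader on a machine of the other endianness only has to byteswap whole words before decoding. The helpers below are simplified local stand-ins written for this sketch (loosely in the spirit of the BF64_* macros, but not copied from them), and the header layout is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Extract a len-bit field starting at bit low (simplified stand-in). */
static inline uint64_t
bf64_get(uint64_t x, unsigned low, unsigned len)
{
	assert(len > 0 && len < 64 && low + len <= 64);
	return ((x >> low) & ((UINT64_C(1) << len) - 1));
}

/* Return x with the len-bit field at bit low replaced by val. */
static inline uint64_t
bf64_set(uint64_t x, unsigned low, unsigned len, uint64_t val)
{
	assert(len > 0 && len < 64 && low + len <= 64);
	uint64_t fieldmask = (UINT64_C(1) << len) - 1;
	assert((val & ~fieldmask) == 0);	/* value must fit the field */
	return ((x & ~(fieldmask << low)) | (val << low));
}

/* Portable 64-bit byteswap, applied per word when endianness differs. */
static inline uint64_t
swap_u64(uint64_t x)
{
	x = ((x & 0x00ff00ff00ff00ffULL) << 8) |
	    ((x >> 8) & 0x00ff00ff00ff00ffULL);
	x = ((x & 0x0000ffff0000ffffULL) << 16) |
	    ((x >> 16) & 0x0000ffff0000ffffULL);
	return ((x << 32) | (x >> 32));
}

/* Hypothetical record header layout: type in bits 0-7, length in 8-23. */
#define	HDR_TYPE_LOW	0
#define	HDR_TYPE_LEN	8
#define	HDR_LEN_LOW	8
#define	HDR_LEN_LEN	16
```

Because fields are defined by bit position within a word rather than by struct member layout, decoding after the swap yields the same values on either kind of host.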

(If it helps to see the new changes, robn/fdt-rel-log-wip has the pre-squash commits).

Home stretch now. Pretty sure the last thing missing is some tests just to ensure any of this works - it obviously does by the numbers, but let's actually prove that now and forever.

The upcoming dedup features break the long held assumption that all blocks on disk with a 'D' dedup bit will always be present in the DDT, or will have the same set of DVA allocations on disk as in the DDT. If the DDT is no longer a complete picture of all the dedup blocks that will be and should be on disk, then it does us no good to walk and prime it up front, since it won't necessarily match up with every block we'll see anyway. Instead, we rework things here to be more like the BRT checks. When we see a dedup'd block, we look it up in the DDT, consume a refcount, and for the second-or-later instances, count them as duplicates. The DDT and BRT are moved ahead of the space accounting. This will become important for the "flat" feature, which may need to count a modified version of the block. Co-authored-by: Allan Jude <allan@klarasystems.com> Co-authored-by: Don Brady <don.brady@klarasystems.com> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Sponsored-by: Klara, Inc. Sponsored-by: iXsystems, Inc.

The "flat phys" feature will use only a single phys slot for all entries, which means the old "single", "double" etc naming now makes no sense, and more importantly, means that choosing the right slot for a given block pointer will depend on how many slots are in use for a given DDT. This removes the old names, and adds accessor macros to decouple specific phys array indexes from any particular meaning. (These macros look strange in isolation, mainly in the way they take the ddt_t* as an arg but don't use it. This is mostly a separate commit to introduce the concept to the reader before the "flat phys" commit extends it). Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Sponsored-by: Klara, Inc. Sponsored-by: iXsystems, Inc.

The idea here is that sometimes you need the contents of an entry with no intent to modify it, and/or from a place where it's difficult to get hold of its originating ddt_t to know how to interpret it. A lightweight entry contains everything you might need to "read" an entry - its key, type and phys contents - but none of the extras for modifying it or using it in a larger context. It also has the full complement of phys slots, so it can represent any kind of dedup entry without having to know the specific configuration of the table it came from. Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Sponsored-by: Klara, Inc. Sponsored-by: iXsystems, Inc.

This slims down the in-memory entry to as small as it can be. The IO-related parts are made into a separate entry, since they're relatively rarely needed. The variable allocation for dde_phys is to support the upcoming flat format. Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Sponsored-by: Klara, Inc. Sponsored-by: iXsystems, Inc.

Traditional dedup keeps a separate ddt_phys_t "type" for each possible count of DVAs (that is, copies=) parameter. Each of these is tracked independently of the others, and has its own set of DVAs. This leads to an (admittedly rare) situation where you can create as many as six copies of the data, by changing the copies= parameter between copying. This is a waste of storage on disk, and also a waste of space in the stored DDT entries, since there never needs to be more than three DVAs to handle all possible values of copies=. This commit adds a new FDT feature, DDT_FLAG_FLAT. When active, only the first ddt_phys_t is used. Each time a block is written with the dedup bit set, this single phys is checked to see if it has enough DVAs to fulfill the request. If it does, the block is filled with the saved DVAs as normal. If not, an adjusted write is issued to create as many extra copies as are needed to fulfill the request, which are then saved into the entry too. Because a single phys is no longer all-or-nothing, but can be transitioning from fewer to more DVAs, the write path now has to keep a copy of the previous "known good" DVA set so we can revert to it in case an error occurs. zio_ddt_write() has been restructured and heavily commented to make it much easier to see what's happening. Backwards compatibility is maintained simply by allocating four ddt_phys_t when the DDT_FLAG_FLAT flag is not set, and updating the phys selection macros to check the flag. In the old arrangement, each number of copies gets a whole phys, so it will always have either zero or all necessary DVAs filled, with no in-between, so the old behaviour naturally falls out of the new code. Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Co-authored-by: Don Brady <don.brady@klarasystems.com> Sponsored-by: Klara, Inc. Sponsored-by: iXsystems, Inc.
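The arithmetic behind the flat layout can be sketched as below. This is a toy model under stated assumptions, not zio_ddt_write(): `flat_phys_t`, `flat_write()` and `traditional_max_dvas()` are hypothetical, DVAs are reduced to a count, and the `write_ok` flag stands in for the outcome of the extra-copies write.

```c
#include <assert.h>

#define	MAX_DVAS	3	/* a block pointer holds at most three DVAs */

/* Hypothetical flat phys: just the number of DVAs currently recorded. */
typedef struct {
	int ndvas;
} flat_phys_t;

/* Traditional layout: one phys per copies= value, each fully populated. */
static int
traditional_max_dvas(void)
{
	int total = 0;
	for (int copies = 1; copies <= MAX_DVAS; copies++)
		total += copies;
	return (total);
}

/*
 * Flat layout: write only the copies that are missing. Returns how many
 * extra copies were written, 0 if the entry already had enough, or -1 if
 * the write failed and the previous "known good" state was restored.
 */
static int
flat_write(flat_phys_t *p, int want, int write_ok)
{
	assert(want >= 1 && want <= MAX_DVAS);
	assert(p->ndvas >= 0 && p->ndvas <= MAX_DVAS);

	int prev = p->ndvas;		/* remember the known good set */
	if (want <= prev)
		return (0);		/* reuse the saved DVAs */

	p->ndvas = want;		/* tentatively extend the entry */
	if (!write_ok) {
		p->ndvas = prev;	/* revert on error */
		return (-1);
	}
	return (want - prev);
}
```

In this model the traditional layout can accumulate 1 + 2 + 3 = 6 DVAs for one block, while the flat entry never holds more than three.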

Both the API and the code were kinda mangled and I was really struggling to follow it. The worst offender was the old ddt_stat_add(); after fixing it up the rest of the changes are mostly knock-on effects and targets of opportunity. Note that the old ddt_stat_add() was safe against overflows - it could produce crazy numbers, but the compiler wouldn't do anything stupid. The assertions in ddt_stat_sub() go a lot of the way to protecting against this; getting in a position where overflows are a problem is definitely a programming error. Also expanding ddt_stat_add() and ddt_histogram_empty() produces less efficient assembly. I'm not bothered about this right now though; these should not be hot functions, and if they are we'll optimise them later. If we have to go back to the old form, we'll comment it like crazy. Finally, I've removed the assertion that the bucket will never be negative, as it will soon be possible to have entries with zero refcounts: an entry for a block that is no longer on the pool, but is on the log waiting to be synced out. It might be better to have a separate bucket for these, since they're still using real space on disk, but ultimately these stats are driving UI, and for now I've chosen to keep them matching how they've looked in the past, as well as matching the operator's mental model - pool usage is managed elsewhere. Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Sponsored-by: Klara, Inc. Sponsored-by: iXsystems, Inc.

This yields substantial performance improvements when we only write out some small % of entries at a time, as it will cause entries that will go into "nearby" ZAP leaf nodes to be grouped closer together in the AVL, and so touch fewer blocks. Without this, the distribution is an even spread, so we touch a lot more ZAP leaf nodes for any given number of entries. Co-authored-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Sponsored-by: Klara, Inc. Sponsored-by: iXsystems, Inc.
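The effect described above can be illustrated with a toy model. It assumes, for the sake of the sketch, that a ZAP leaf is selected by the leading bits of a 64-bit hash and that this hash follows the first word of the entry's checksum; `cksum_t`, `key_compare()`, `leaf_prefix()` and `count_leaf_switches()` are hypothetical names, not the functions changed in this commit.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical 256-bit checksum key, as four 64-bit words. */
typedef struct {
	uint64_t word[4];
} cksum_t;

/* Order keys by their first word first, so in-order walks follow it. */
static int
key_compare(const void *a, const void *b)
{
	const cksum_t *x = a, *y = b;
	for (int i = 0; i < 4; i++) {
		if (x->word[i] != y->word[i])
			return (x->word[i] < y->word[i] ? -1 : 1);
	}
	return (0);
}

/* Assumed leaf selection: the top `bits` bits (1..63) of the first word. */
static uint64_t
leaf_prefix(const cksum_t *k, unsigned bits)
{
	return (k->word[0] >> (64 - bits));
}

/* Count how often consecutive keys land in a different leaf. */
static size_t
count_leaf_switches(const cksum_t *keys, size_t n, unsigned bits)
{
	size_t switches = 0;
	for (size_t i = 1; i < n; i++) {
		if (leaf_prefix(&keys[i], bits) !=
		    leaf_prefix(&keys[i - 1], bits))
			switches++;
	}
	return (switches);
}
```

Sorting a batch with `qsort(keys, n, sizeof (cksum_t), key_compare)` before writing it out reduces the number of leaf switches in this model, which is the grouping the commit message describes.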

All objects stored in the MOS get copies=3. For a large dedup table, this requires significant extra IO and disk space, when it's not really necessary - the dedup table itself isn't needed to read or write data, only to keep data usage down. Losing the dedup table does not render the pool unusable, it just messes up the accounting somewhat. This adds a dmu_ddt_copies tuneable. When set to 0, the existing behaviour is used. When set higher, dedup table blocks (ZAP and log) will have this many copies rather than the usual 3, while indirect blocks will have one more again. This is a tuneable for now mostly for testing. Losing a dedup table can cause blocks to be leaked, and we currently have no facilities to repair that. Co-authored-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Sponsored-by: Klara, Inc. Sponsored-by: iXsystems, Inc.
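Read literally, the rule in this commit message amounts to a small selection function. The sketch below is one interpretation of the text, not the code from the commit: `ddt_block_copies()` is a hypothetical name, and the cap at three reflects the limit of three DVAs per block pointer.

```c
#include <assert.h>

#define	MAX_COPIES	3	/* a block pointer holds at most three DVAs */

/*
 * Pick the number of copies for a dedup table block.
 * tunable == 0 keeps the existing behaviour (three copies, as for other
 * MOS objects); otherwise data blocks get `tunable` copies and indirect
 * blocks get one more, never exceeding MAX_COPIES.
 */
static int
ddt_block_copies(int tunable, int is_indirect)
{
	assert(tunable >= 0 && tunable <= MAX_COPIES);
	if (tunable == 0)
		return (MAX_COPIES);

	int copies = tunable + (is_indirect ? 1 : 0);
	return (copies > MAX_COPIES ? MAX_COPIES : copies);
}
```

With the tuneable at 1, for example, this gives one copy of each ZAP or log data block and two copies of each indirect block.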

Adds a log/journal to dedup. At the end of the txg, instead of writing the entry directly to the ZAP, it is added to an in-memory tree and appended to an on-disk object. The on-disk object is only read at import, to reload the in-memory tree. Lookups first go to the log tree before going to the ZAP, so recently-used entries will remain close by in memory. This vastly reduces overhead from dedup IO, as it will not have to do so many read/update/write cycles on ZAP leaf nodes. A flushing facility is added at end of txg, to push logged entries out to the ZAP. There are actually two separate "logs" (in-memory tree and on-disk object), one active (receiving updated entries) and one flushing (writing out to disk). These are swapped (i.e. flushing begins) based on memory used by the in-memory log trees and time since we last flushed something. The flushing facility monitors the number of entries coming in and being flushed out, and calibrates itself to try to flush enough each txg to keep up with the ingest rate without competing too much with other IO. Multiple tuneables are provided to control the flushing facility. All the histograms and stats are updated to accommodate the log as a separate entry store. zdb gains knowledge of how to count them and dump them. Documentation included! Co-authored-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Sponsored-by: Klara, Inc. Sponsored-by: iXsystems, Inc.

The dedup log does not have a stable cursor, so it's not possible to persist our current scan location within it across pool reloads. Because of this, when walking (scanning), we can't treat it like just another source of dedup entries. Instead, when a scan is wanted, we switch to an aggressive flushing mode, pushing out entries older than the scan start txg as fast as we can, before starting the scan proper. Entries after the scan start txg will be handled via other methods; the DDT ZAPs and logs will be written as normal, and blocks not seen yet will be offered to the scan machinery as normal. Co-authored-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Sponsored-by: Klara, Inc. Sponsored-by: iXsystems, Inc.
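The cutoff rule can be pictured with a small model in which each logged entry is reduced to the txg it was logged in. This is only a sketch of the idea, with hypothetical names (`scan_may_start()`, `flush_older_than()`); the real flushing operates on whole logs and is paced across txgs.

```c
#include <stddef.h>
#include <stdint.h>

/* The scan proper may begin once no logged entry predates its start txg. */
static int
scan_may_start(const uint64_t *txgs, size_t n, uint64_t scan_txg)
{
	for (size_t i = 0; i < n; i++) {
		if (txgs[i] < scan_txg)
			return (0);
	}
	return (1);
}

/*
 * Aggressive flush: push out every entry older than the scan start txg.
 * Compacts the array in place, returns the number of entries left in the
 * log and stores the number flushed in *flushed.
 */
static size_t
flush_older_than(uint64_t *txgs, size_t n, uint64_t scan_txg,
    size_t *flushed)
{
	size_t keep = 0;
	*flushed = 0;
	for (size_t i = 0; i < n; i++) {
		if (txgs[i] < scan_txg)
			(*flushed)++;		/* would go out to the ZAP */
		else
			txgs[keep++] = txgs[i];	/* stays in the log */
	}
	return (keep);
}
```

Entries at or after the scan start txg stay in the log in this model, matching the note that they are handled through the normal write path.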

Adds per-DDT stats counting lookups and where they were serviced from (either log or backing zap), number of log entries in memory, and flow rates. Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Sponsored-by: Klara, Inc. Sponsored-by: iXsystems, Inc.

Signed-off-by: Allan Jude <allan@klarasystems.com>

robn · 2024-08-17T13:07:23Z

Thanks all!

behlendorf added the Status: Code Review Needed Ready for review and testing label Feb 15, 2024

robn force-pushed the fdt-rel-log branch from 94f3e97 to d29f710 Compare February 15, 2024 19:58

adamdmoss reviewed May 14, 2024

robn force-pushed the fdt-rel-log branch from d29f710 to 25db08d Compare May 15, 2024 03:39

robn force-pushed the fdt-rel-log branch 2 times, most recently from 8073d64 to 8301065 Compare May 21, 2024 05:00

robn force-pushed the fdt-rel-log branch from 8301065 to 6a9bed1 Compare June 7, 2024 01:16

amotin reviewed Jun 10, 2024

amotin reviewed Jun 11, 2024

robn force-pushed the fdt-rel-log branch from 6a9bed1 to 85922b8 Compare June 14, 2024 01:35

robn force-pushed the fdt-rel-log branch 5 times, most recently from 6fb36a9 to 0f84746 Compare June 18, 2024 11:02

github-advanced-security bot found potential problems Jun 18, 2024

robn force-pushed the fdt-rel-log branch from 0f84746 to df52660 Compare June 18, 2024 11:38

robn force-pushed the fdt-rel-log branch from df52660 to a1746a6 Compare June 20, 2024 06:39

amotin reviewed Jun 20, 2024

robn force-pushed the fdt-rel-log branch 2 times, most recently from 1a289e1 to b56fc1f Compare June 21, 2024 05:36

robn and others added 8 commits August 16, 2024 09:59

robn force-pushed the fdt-rel-log branch from c3930e2 to b65f86a Compare August 16, 2024 00:00

robn and others added 4 commits August 16, 2024 15:31

ddt: lookup and log stats

26d5069

Adds per-DDT stats counting lookups and where they were serviced from (either log or backing zap), number of log entries in memory, and flow rates. Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Sponsored-by: Klara, Inc. Sponsored-by: iXsystems, Inc.

Man page updates for dmu_ddt_copies

bc877fe

Signed-off-by: Allan Jude <allan@klarasystems.com>

robn force-pushed the fdt-rel-log branch from b65f86a to bc877fe Compare August 16, 2024 05:31

behlendorf closed this in 27e9cb5 Aug 16, 2024

behlendorf pushed a commit that referenced this pull request Aug 16, 2024

Man page updates for dmu_ddt_copies

a60e15d

Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Allan Jude <allan@klarasystems.com> Closes #15895
