raidz expansion >100% done #16803

Open
VictorDrijkoningen opened this issue Nov 22, 2024 · 3 comments
Labels
Status: Understood (The root cause of the issue is known) · Type: Defect (Incorrect behavior, e.g. crash, hang)

Comments


VictorDrijkoningen commented Nov 22, 2024

System information

Type Version/Name
Distribution Name TrueNAS SCALE
Distribution Version Electric Eel 24.10.0.2
Kernel Version 6.6.44-production+truenas
Architecture x86 (Ryzen CPU)
OpenZFS Version zfs-2.2.99-1 / zfs-kmod-2.2.99-1

Describe the problem you're observing

Near the end of a raidz expansion, zpool status reports the copy as more than 100% done.

Describe how to reproduce the problem

Have a drive fail in the middle of an expansion on a fairly full (>80%) pool, then run zpool status when the expansion is almost done. (This is what happened here, but I'm not sure whether the drive failure has anything to do with it.)

Include any warning/errors/backtraces from the system logs

See the lines below reporting 100.25% and 100.98% done.

  pool: pool1
 state: ONLINE
  scan: resilvered 1.36T in 10:24:49 with 0 errors on Thu Nov 21 02:51:56 2024
expand: expansion of raidz2-0 in progress since Wed Nov 13 14:35:36 2024
        12.8T / 12.8T copied at 16.6M/s, 100.25% done, (copy is slow, no estimated time)
config:

        NAME                                      STATE     READ WRITE CKSUM
        pool1                                     ONLINE       0     0     0
          raidz2-0                                ONLINE       0     0     0
            7b808470-cc0c-4d90-abfd-2388d707669c  ONLINE       0     0     0
            b0102604-e309-4777-9f50-19df82a410eb  ONLINE       0     0     0
            158f185c-68d5-4f02-8f44-99f9a8787bab  ONLINE       0     0     0
            4e971f0a-9a7b-493c-8276-81f180d10a8d  ONLINE       0     0     0
            79154809-98f9-4100-9a7c-e04aa24cef70  ONLINE       0     0     0

errors: No known data errors

user@machine~ % sudo zpool status pool1
  pool: pool1
 state: ONLINE
  scan: resilvered 1.36T in 10:24:49 with 0 errors on Thu Nov 21 02:51:56 2024
expand: expansion of raidz2-0 in progress since Wed Nov 13 14:35:36 2024
        12.9T / 12.8T copied at 16.7M/s, 100.98% done, (copy is slow, no estimated time)
config:

        NAME                                      STATE     READ WRITE CKSUM
        pool1                                     ONLINE       0     0     0
          raidz2-0                                ONLINE       0     0     0
            7b808470-cc0c-4d90-abfd-2388d707669c  ONLINE       0     0     0
            b0102604-e309-4777-9f50-19df82a410eb  ONLINE       0     0     0
            158f185c-68d5-4f02-8f44-99f9a8787bab  ONLINE       0     0     0
            4e971f0a-9a7b-493c-8276-81f180d10a8d  ONLINE       0     0     0
            79154809-98f9-4100-9a7c-e04aa24cef70  ONLINE       0     0     0

errors: No known data errors
VictorDrijkoningen added the Type: Defect label Nov 22, 2024
snajpa mentioned this issue Nov 29, 2024

amotin commented Nov 30, 2024

I'll take a look at that code tomorrow, but my first guess is that, unlike scrub, removal, etc., expansion allows vdev writes during the process (just into different metaslabs than the ones currently being processed), so unless this is handled explicitly in the code, it might well be correct behavior as long as the expansion eventually completes.

amotin commented Nov 30, 2024

As I expected, it can be reproduced by deleting something big from a part of the vdev that has already been expanded. zpool status always reports the vdev's currently allocated size as the total amount of data that needs expansion, so on delete the allocated size decreases while the copied size does not, since that copying was already done. It could be solved with more complicated accounting if somebody cares, but I personally don't. It is a purely cosmetic issue:

expand: expansion of raidz1-0 in progress since Sat Nov 30 22:25:54 2024
        235G / 264G copied at 495M/s, 89.06% done, 00:00:59 to go
expand: expansion of raidz1-0 in progress since Sat Nov 30 22:25:54 2024
        259G / 198G copied at 488M/s, 130.89% done, (copy is slow, no estimated time)

amotin added the Status: Understood label Nov 30, 2024
behlendorf (Contributor) commented

A similar thing can also happen with resilvering when the used capacity changes significantly. I agree, Alexander: this is purely a cosmetic issue and doesn't seem worth the added complexity.
