raidz expansion >100% done #16803

Open
VictorDrijkoningen opened this issue Nov 22, 2024 · 3 comments
Labels
Status: Understood (The root cause of the issue is known) · Type: Defect (Incorrect behavior, e.g. crash, hang)

Comments


VictorDrijkoningen commented Nov 22, 2024

System information

Type Version/Name
Distribution Name TrueNAS SCALE
Distribution Version Electric Eel 24.10.0.2
Kernel Version 6.6.44-production+truenas
Architecture x86 (Ryzen CPU)
OpenZFS Version zfs-2.2.99-1 / zfs-kmod-2.2.99-1

Describe the problem you're observing

Near the end of a raidz expansion, zpool status reports the copy as more than 100% done.

Describe how to reproduce the problem

Have a drive fail in the middle of an expansion on a fairly full (>80%) pool, then run zpool status when the expansion is almost done. (This is what happened here, but I'm not sure whether the drive failure has anything to do with it.)

Include any warning/errors/backtraces from the system logs

See the lines below reporting 100.25% and 100.98% done.

  pool: pool1
 state: ONLINE
  scan: resilvered 1.36T in 10:24:49 with 0 errors on Thu Nov 21 02:51:56 2024
expand: expansion of raidz2-0 in progress since Wed Nov 13 14:35:36 2024
        12.8T / 12.8T copied at 16.6M/s, 100.25% done, (copy is slow, no estimated time)
config:

        NAME                                      STATE     READ WRITE CKSUM
        pool1                                     ONLINE       0     0     0
          raidz2-0                                ONLINE       0     0     0
            7b808470-cc0c-4d90-abfd-2388d707669c  ONLINE       0     0     0
            b0102604-e309-4777-9f50-19df82a410eb  ONLINE       0     0     0
            158f185c-68d5-4f02-8f44-99f9a8787bab  ONLINE       0     0     0
            4e971f0a-9a7b-493c-8276-81f180d10a8d  ONLINE       0     0     0
            79154809-98f9-4100-9a7c-e04aa24cef70  ONLINE       0     0     0

errors: No known data errors

user@machine~ % sudo zpool status pool1
  pool: pool1
 state: ONLINE
  scan: resilvered 1.36T in 10:24:49 with 0 errors on Thu Nov 21 02:51:56 2024
expand: expansion of raidz2-0 in progress since Wed Nov 13 14:35:36 2024
        12.9T / 12.8T copied at 16.7M/s, 100.98% done, (copy is slow, no estimated time)
config:

        NAME                                      STATE     READ WRITE CKSUM
        pool1                                     ONLINE       0     0     0
          raidz2-0                                ONLINE       0     0     0
            7b808470-cc0c-4d90-abfd-2388d707669c  ONLINE       0     0     0
            b0102604-e309-4777-9f50-19df82a410eb  ONLINE       0     0     0
            158f185c-68d5-4f02-8f44-99f9a8787bab  ONLINE       0     0     0
            4e971f0a-9a7b-493c-8276-81f180d10a8d  ONLINE       0     0     0
            79154809-98f9-4100-9a7c-e04aa24cef70  ONLINE       0     0     0

errors: No known data errors
VictorDrijkoningen added the Type: Defect label Nov 22, 2024
snajpa mentioned this issue Nov 29, 2024

amotin commented Nov 30, 2024

I'll take a look at that code tomorrow, but my first guess is that, unlike scrub, removal, etc., expansion allows vdev writes during the process (just into different metaslabs than the ones currently being processed), so unless this is handled explicitly in the code, it might well be correct behavior as long as the expansion eventually completes.

amotin commented Nov 30, 2024

As I expected, it can be reproduced by deleting something big from a part of the vdev that has already been expanded. zpool status always reports the vdev's currently allocated size as the total amount of data that needs expansion, so on delete the allocated size decreases while the copied size does not, since that copying was already done. It could be solved with more complicated accounting if somebody cares, but I personally don't. It is a purely cosmetic issue:

expand: expansion of raidz1-0 in progress since Sat Nov 30 22:25:54 2024
        235G / 264G copied at 495M/s, 89.06% done, 00:00:59 to go
expand: expansion of raidz1-0 in progress since Sat Nov 30 22:25:54 2024
        259G / 198G copied at 488M/s, 130.89% done, (copy is slow, no estimated time)

amotin added the Status: Understood label Nov 30, 2024
behlendorf (Contributor) commented

A similar thing can also happen with resilvering when the used capacity changes significantly. I agree, Alexander: this is purely a cosmetic issue and doesn't seem worth the added complexity.
