Scrub "issued" value skyrockets #8800

Closed
DeHackEd opened this issue May 23, 2019 · 5 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@DeHackEd
Contributor

(I swear there should be an issue for this already, but I couldn't find it)

System information

Type Version/Name
Distribution Name CentOS
Distribution Version 7.6.1810
Linux Kernel 3.10.0-957.12.2.el7.x86_64
Architecture x86_64
ZFS Version 0.8.0-rc5_2_g9dc41a7
SPL Version (included)

Describe the problem you're observing

  pool: whoopass4
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
	still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
	the pool may no longer be accessible by software that does not support
	the features. See zpool-features(5) for details.
  scan: scrub in progress since Thu May 23 10:05:14 2019
	460G scanned at 1.31G/s, 1.11P issued at 3.23T/s, 32.7T total
	0B repaired, 3474.55% done, no estimated completion time
config:

	NAME            STATE     READ WRITE CKSUM
	whoopass4       ONLINE       0     0     0
	  md100         ONLINE       0     0     0
	  md101         ONLINE       0     0     0
	cache
	  centos-l2arc  ONLINE       0     0     0

errors: No known data errors

The pool size of 32.7T is correct, but 1.11P for the amount issued by the scan is clearly incorrect. The 3474.55% completion figure is similarly wrong.
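As a back-of-envelope check (illustrative Python, not ZFS source), the "% done" figure is presumably derived as issued / total. Plugging the corrupted 1.11 PiB "issued" value against the 32.7 TiB pool reproduces the absurd percentage in the status output:

```python
# Illustrative sanity check of the bogus "% done" figure. Assumes the
# percentage is simply issued / total, which matches the numbers shown.
TIB = 1024 ** 4
PIB = 1024 ** 5

issued = 1.11 * PIB   # runaway counter from the scrub status
total = 32.7 * TIB    # actual allocated pool size

pct = issued / total * 100
print(f"{pct:.0f}% done")  # ~3476%, in line with the 3474.55% reported
```

The small gap from the reported 3474.55% is just rounding in the values `zpool status` displays.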

Describe how to reproduce the problem

I just started a scrub and it happened as soon as I checked the status.

Other pool info:

NAME       PROPERTY                       VALUE                          SOURCE
whoopass4  size                           54.4T                          -
whoopass4  capacity                       60%                            -
whoopass4  altroot                        -                              default
whoopass4  health                         ONLINE                         -
whoopass4  guid                           8364009432977399989            -
whoopass4  version                        -                              default
whoopass4  bootfs                         -                              default
whoopass4  delegation                     on                             default
whoopass4  autoreplace                    off                            default
whoopass4  cachefile                      none                           local
whoopass4  failmode                       wait                           default
whoopass4  listsnapshots                  off                            default
whoopass4  autoexpand                     off                            default
whoopass4  dedupditto                     0                              default
whoopass4  dedupratio                     1.00x                          -
whoopass4  free                           21.7T                          -
whoopass4  allocated                      32.7T                          -
whoopass4  readonly                       off                            -
whoopass4  ashift                         0                              default
whoopass4  comment                        -                              default
whoopass4  expandsize                     -                              -
whoopass4  freeing                        0                              -
whoopass4  fragmentation                  38%                            -
whoopass4  leaked                         0                              -
whoopass4  multihost                      off                            default
whoopass4  checkpoint                     -                              -
whoopass4  load_guid                      9063026787472695142            -
whoopass4  autotrim                       off                            default
whoopass4  feature@async_destroy          enabled                        local
whoopass4  feature@empty_bpobj            active                         local
whoopass4  feature@lz4_compress           active                         local
whoopass4  feature@multi_vdev_crash_dump  enabled                        local
whoopass4  feature@spacemap_histogram     active                         local
whoopass4  feature@enabled_txg            active                         local
whoopass4  feature@hole_birth             active                         local
whoopass4  feature@extensible_dataset     active                         local
whoopass4  feature@embedded_data          active                         local
whoopass4  feature@bookmarks              enabled                        local
whoopass4  feature@filesystem_limits      enabled                        local
whoopass4  feature@large_blocks           active                         local
whoopass4  feature@large_dnode            enabled                        local
whoopass4  feature@sha512                 enabled                        local
whoopass4  feature@skein                  enabled                        local
whoopass4  feature@edonr                  enabled                        local
whoopass4  feature@userobj_accounting     active                         local
whoopass4  feature@encryption             enabled                        local
whoopass4  feature@project_quota          active                         local
whoopass4  feature@device_removal         enabled                        local
whoopass4  feature@obsolete_counts        enabled                        local
whoopass4  feature@zpool_checkpoint       disabled                       local
whoopass4  feature@spacemap_v2            disabled                       local
whoopass4  feature@allocation_classes     disabled                       local
whoopass4  feature@resilver_defer         disabled                       local
whoopass4  feature@bookmark_v2            enabled                        local

Pool previously had encrypted datasets which were all destroyed for errata 4, but not restored yet.

The md100 and md101 devices are mdadm RAID-6 of 12 disks each, though of mismatched sizes. I'm accepting the performance hit. The arrays appear healthy.

Include any warning/errors/backtraces from the system logs

Import was slow enough to set off the "blocked for 120s" alarms, but for these drives and the enclosures they are in this isn't a complete surprise.

@loli10K
Contributor

loli10K commented May 23, 2019

Supposedly fixed by #8766

@DeHackEd
Contributor Author

Supposedly, but this is running rc5 and it still does it.

@DeHackEd
Contributor Author

 pool: whoopass4
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
	still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
	the pool may no longer be accessible by software that does not support
	the features. See zpool-features(5) for details.
  scan: scrub in progress since Thu May 23 10:05:14 2019
	29.8T scanned at 274M/s, 53.6P issued at 493G/s, 32.7T total
	0B repaired, 168004.70% done, no estimated completion time
config:

	NAME            STATE     READ WRITE CKSUM
	whoopass4       ONLINE       0     0     0
	  md100         ONLINE       0     0     0
	  md101         ONLINE       0     0     0
	cache
	  centos-l2arc  ONLINE       0     0     0

errors: No known data errors

(Same scrub)

@behlendorf behlendorf added the Type: Defect Incorrect behavior (e.g. crash, hang) label May 24, 2019
@behlendorf
Contributor

behlendorf commented May 24, 2019

@DeHackEd was that with the proposed fix in #8766 applied?


[edit] You'll probably need to restart the scrub to verify the proposed fix since the wildly incorrect "issued" value is stored on disk.

@DeHackEd
Contributor Author

Okay, applied, rebuilt, and looking much more sane.

  pool: whoopass4
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
	still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
	the pool may no longer be accessible by software that does not support
	the features. See zpool-features(5) for details.
  scan: scrub in progress since Fri May 24 20:31:48 2019
	1.36T scanned at 1.30G/s, 104G issued at 100M/s, 32.5T total
	0B repaired, 0.31% done, 3 days 22:14:19 to go
config:

	NAME            STATE     READ WRITE CKSUM
	whoopass4       ONLINE       0     0     0
	  md100         ONLINE       0     0     0
	  md101         ONLINE       0     0     0
	cache
	  centos-l2arc  ONLINE       0     0     0

errors: No known data errors

(I don't know why, but I thought this was an older issue and the fix had already been merged)

behlendorf pushed a commit that referenced this issue Jun 7, 2019
Currently, count_block() does not correctly account for the
possibility that the bp that is passed to it could be embedded.
These blocks shouldn't be counted since the work of scanning
these blocks is already handled when the containing block is
scanned. This patch resolves the issue by returning
early in this case.

Reviewed by: Allan Jude <allanjude@freebsd.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Authored-by: Bill Sommerfeld <sommerfeld@alum.mit.edu>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #8800 
Closes #8766
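The early-return fix described in the commit can be sketched as follows (illustrative Python, not the actual C from the ZFS scan code; the class and field names here are hypothetical stand-ins for the on-disk scan state and block pointer):

```python
from dataclasses import dataclass

@dataclass
class BlockPointer:
    """Hypothetical stand-in for a ZFS block pointer (bp)."""
    asize: int               # allocated size on disk, in bytes
    is_embedded: bool = False  # payload stored inside the parent bp

@dataclass
class ScanState:
    """Hypothetical stand-in for the persistent scan bookkeeping."""
    issued_bytes: int = 0

def count_block(scan: ScanState, bp: BlockPointer) -> None:
    # The fix: skip embedded block pointers. Their data lives inside
    # the containing block, which was already counted when scanned;
    # counting them again inflates the "issued" total without bound.
    if bp.is_embedded:
        return
    scan.issued_bytes += bp.asize

scan = ScanState()
count_block(scan, BlockPointer(asize=131072))                   # normal bp
count_block(scan, BlockPointer(asize=512, is_embedded=True))    # skipped
print(scan.issued_bytes)  # 131072: the embedded bp adds nothing
```

Without the early return, every embedded block pointer would add bytes that were already accounted for, which is why "issued" grew past the pool's total size.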
allanjude pushed a commit to allanjude/zfs that referenced this issue Jun 7, 2019
allanjude pushed a commit to allanjude/zfs that referenced this issue Jun 15, 2019

3 participants