
ZFS intermittently locks up during the tar / untar cycle #2752

Closed
alitvak69 opened this issue Oct 1, 2014 · 32 comments
Labels
Type: Performance Performance improvement or performance problem

Comments

@alitvak69

I have recently upgraded to ZoL release 0.6.3 on SL 6.5. We also added a new Dell PV MD1200 with 12 x 2 TB SAS drives connected to a Dell PERC H800 LSI MegaRAID-based controller.

I created 12 RAID0 virtual disks and then assigned them to a raidz2 vdev in a single pool, u3.
I originally had another pool, u2, on a separate Dell H700 controller with similar disks, but inside the server chassis.
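(Roughly, the u3 pool would have been created with something like the line below; only sdc, sdd, sdm and sdn appear explicitly in the output further down, the device names in between are my assumption.)

zpool create u3 raidz2 sdc sdd sde sdf sdg sdh sdi sdj sdk sdl sdm sdn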

NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
u2    8.92T  6.80T  2.13T    76%  1.00x  ONLINE  -
u3    18.1T  14.0G  18.1T     0%  1.00x  ONLINE  -

pool: u2
 state: ONLINE
  scan: scrub repaired 0 in 12h20m with 0 errors on Sat Feb 15 19:46:48 2014
config:

    NAME        STATE     READ WRITE CKSUM
    u2          ONLINE       0     0     0
      sda7      ONLINE       0     0     0
      sdb1      ONLINE       0     0     0

errors: No known data errors

  pool: u3
 state: ONLINE
  scan: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    u3          ONLINE       0     0     0
      sdc       ONLINE       0     0     0
      sdd       ONLINE       0     0     0
      .....
      sdn       ONLINE       0     0     0

errors: No known data errors

I tried various kernels: 2.6.32-431.29.2, 3.10.55, 3.16.3 and the issue is still the same.

On some runs of tar / untar copy between volumes:
(example)
tar cf - /u2/logs/old -P | pv -s $(du -sb /u2/logs/old | awk '{print $1}') | tar xf - -C /u3

I see that throughput drops to 0 bytes and stays there for an extended period of time, more than 30 seconds.
At that time dstat shows some of the block devices on u3 100% busy while others have 0% utilization.
zpool iostat -v shows 0 read or write activity on u3. On u2, which is the older volume, I see some activity, but it is not related to the current operation. There is no load increase, and when I ran strace it showed both tar cf and tar xf waiting.
I cannot kill the operation unless I kill all underlying shells (no reaction to an abort otherwise).

When I run dd against any block device of zpool u3 (example below), the operation resumes immediately.

dd if=/dev/sdm of=/dev/zero bs=4k count=10

I see no disk errors, and neither the controller log nor the system logs show anything consistently happening during these lockups. I ran a long SMART self-test against all of the disks on the MD1200 and they showed no errors (the disks are healthy).
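For completeness, the long self-test was run per disk roughly like this (a sketch; sdc stands in for each drive in turn):

smartctl -t long /dev/sdc
smartctl -H -l selftest /dev/sdc    # afterwards: overall health plus the self-test log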

At this point I am not sure how to diagnose the issue. Can any of you help?

Thanks,

@alitvak69
Author

I guess I posted the wrong question, but in the hope that someone actually reads it, here is additional information:

2014-10-05T00:56:38.790439-07:00 logserver1-chi kernel: INFO: task tar:491 blocked for more than 120 seconds.
2014-10-05T00:56:38.790451-07:00 logserver1-chi kernel:      Tainted: P           O  3.16.3-1.el6.elrepo.x86_64 #1
2014-10-05T00:56:38.790454-07:00 logserver1-chi kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2014-10-05T00:56:38.790456-07:00 logserver1-chi kernel: tar             D 0000000000000004     0   491  28473 0x00000000
2014-10-05T00:56:38.790457-07:00 logserver1-chi kernel: ffff8803a9957b78 0000000000000082 ffff88032ccaba70 ffff8803a9954010
2014-10-05T00:56:38.790458-07:00 logserver1-chi kernel: 0000000000014400 0000000000014400 ffff880421d12150 ffff8804253a2fb0
2014-10-05T00:56:38.790459-07:00 logserver1-chi kernel: ffff8803a9957b78 ffff88042fc54400 ffff880421d12150 0000000000000002
2014-10-05T00:56:38.790461-07:00 logserver1-chi kernel: Call Trace:
2014-10-05T00:56:38.790462-07:00 logserver1-chi kernel: [<ffffffff81648219>] schedule+0x29/0x70
2014-10-05T00:56:38.790463-07:00 logserver1-chi kernel: [<ffffffff816482ec>] io_schedule+0x8c/0xd0
2014-10-05T00:56:38.790464-07:00 logserver1-chi kernel: [<ffffffffa01ca48c>] cv_wait_common+0xac/0x1b0 [spl]
2014-10-05T00:56:38.790466-07:00 logserver1-chi kernel: [<ffffffff810b6230>] ? bit_waitqueue+0xe0/0xe0
2014-10-05T00:56:38.790467-07:00 logserver1-chi kernel: [<ffffffffa01ca5a8>] __cv_wait_io+0x18/0x20 [spl]
2014-10-05T00:56:38.790468-07:00 logserver1-chi kernel: [<ffffffffa03496cb>] zio_wait+0xfb/0x1b0 [zfs]
2014-10-05T00:56:38.790469-07:00 logserver1-chi kernel: [<ffffffffa02b8297>] dmu_buf_hold_array_by_dnode+0x2d7/0x540 [zfs]
2014-10-05T00:56:38.790470-07:00 logserver1-chi kernel: [<ffffffffa02b8785>] dmu_buf_hold_array+0x65/0x90 [zfs]
2014-10-05T00:56:38.790471-07:00 logserver1-chi kernel: [<ffffffffa02b87f1>] dmu_read_uio+0x41/0xd0 [zfs]
2014-10-05T00:56:38.790472-07:00 logserver1-chi kernel: [<ffffffffa033181b>] ? zfs_range_lock+0xdb/0x110 [zfs]
2014-10-05T00:56:38.790473-07:00 logserver1-chi kernel: [<ffffffffa033c42a>] zfs_read+0x15a/0x350 [zfs]
2014-10-05T00:56:38.790474-07:00 logserver1-chi kernel: [<ffffffffa0351672>] zpl_read_common+0x52/0x80 [zfs]
2014-10-05T00:56:38.790475-07:00 logserver1-chi kernel: [<ffffffffa0351708>] zpl_read+0x68/0xa0 [zfs]
2014-10-05T00:56:38.790477-07:00 logserver1-chi kernel: [<ffffffff811cdae3>] vfs_read+0xa3/0x110
2014-10-05T00:56:38.790478-07:00 logserver1-chi kernel: [<ffffffff811ea673>] ? __fdget+0x13/0x20
2014-10-05T00:56:38.790479-07:00 logserver1-chi kernel: [<ffffffff811ce096>] SyS_read+0x56/0xd0
2014-10-05T00:56:38.790480-07:00 logserver1-chi kernel: [<ffffffff8164bae9>] system_call_fastpath+0x16/0x1b

And then

2014-10-05T00:57:39.333427-07:00 logserver1-chi kernel: sd 1:2:4:0: [sdg] megasas: RESET cmd=28 retries=0
2014-10-05T00:57:39.333438-07:00 logserver1-chi kernel: megasas: [ 0]waiting for 3 commands to complete
2014-10-05T00:57:40.334431-07:00 logserver1-chi kernel: megaraid_sas: no pending cmds after reset
2014-10-05T00:57:40.334442-07:00 logserver1-chi kernel: megasas: reset successful 
2014-10-05T00:57:40.334444-07:00 logserver1-chi kernel: sd 1:2:9:0: [sdl] megasas: RESET cmd=28 retries=0
2014-10-05T00:57:40.334446-07:00 logserver1-chi kernel: megaraid_sas: no pending cmds after reset

So it looks like the drive was reset by the kernel after 60 seconds.

Still, I am not sure whether it is a ZFS bug or whether something else is in play.

@alitvak69
Author

Building from git does not seem to make much of a difference.

@behlendorf
Contributor

@alitvak69 based on the stacks you've provided this sounds like an issue with your hardware. They show that ZFS is waiting for a physical I/O to the disk to either complete, fail, or time out. This is all managed by the lower-level block layer in Linux, the device driver, and the actual hardware. Until one of those three things occurs there's not much ZFS can do aside from wait.

There is a proposed patch in #2387 which would allow us to time out more quickly, but that would just result in the offending drive being marked as failed. That may not be what you want, depending on the situation.
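As an aside, the wait is ultimately bounded by the per-device SCSI command timeout in the block layer; it can be inspected or shortened via sysfs (sdg here is purely an example device):

cat /sys/block/sdg/device/timeout         # typically 30 (sometimes 60) seconds
echo 30 > /sys/block/sdg/device/timeout   # shorten the timeout for sdg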

@behlendorf behlendorf added the Type: Performance Performance improvement or performance problem label Oct 6, 2014
@kernelOfTruth
Contributor

Do you have the option of adding an L2ARC device?

Even on a mirrored zpool on two WD30EFRX drives, git is really, really slow for me when working on the kernel.

After adding the L2ARC it is rather fast and comparable to e.g. ext4 or btrfs.
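Adding one is a one-liner, roughly like this (the device path is just a placeholder):

zpool add u3 cache /dev/disk/by-id/<your-ssd>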

@alitvak69
Author

Thank you for your response. I ran a long SMART test on all of the disks (~5 hour test). All devices passed, i.e. showed "Health OK" and 0 uncorrected errors.
I don't see any errors in the controller log that I can jump on. We will be re-checking all connections and re-seating drives, controllers, and cables today. All firmware is already upgraded. After that I will be opening a case with Dell. As soon as I hear from them I will post my report here.

On Mon, Oct 6, 2014 at 12:48 PM, kpande notifications@github.com wrote:

L2ARC shouldn't be necessary here, this seems like a hardware fault. ZFS
without L2ARC on a single vdev with two 2TB SATA (WD Blacks) in
ashift=12 is already comparable to ext4 (faster, actually)



@behlendorf
Contributor

@alitvak69 it's been noted by others that ZFS is capable of driving the hardware more aggressively than other filesystems. This can cause issues with borderline hardware.

@alitvak69
Author

I don't have space for an L2ARC device. I wanted to add two SSDs, but both the server chassis and the MD1200 are fully populated. As for ashift=12, I wonder if it makes sense: all my 2 TB drives are 512-byte sector, not 4k.


@alitvak69
Author

I went through a series of sessions with Dell, but it doesn't seem to be a faulty-hardware issue.
I wonder if the thread below describes a similar issue some people have on FreeBSD. From what I gather, it seems to be an interrupt-related issue, and a patch was applied to compensate for it.

If you go through the thread you will see an explanation of what that patch does. It seems a similar thing happens with Linux:

http://lists.freebsd.org/pipermail/freebsd-scsi/2011-March/004832.html

@behlendorf
Contributor

@alitvak69 That's entirely possible. Due to various hardware quirks the SCSI subsystem contains many of these kinds of workarounds. Since the fix for this would need to happen below the ZFS layer, my suggestion would be to try a newer kernel with an updated driver and see if it works better.
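To see which megaraid_sas driver a given kernel ships, something along these lines works:

uname -r
modinfo megaraid_sas | grep -i ^version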

@umask

umask commented Dec 6, 2014

I have the same issue on my hardware.

It looks like this:

[25108.622770] sd 0:2:2:0: [sdc] megasas: RESET -7968430 cmd=28 retries=0
[25108.623345] megasas: [ 0]waiting for 1 commands to complete
[25109.623501] megaraid_sas: no pending cmds after reset
[25109.624075] megasas: reset successful 
[25295.577599] sd 0:2:0:0: [sda] megasas: RESET -7971675 cmd=28 retries=0
[25295.578179] megasas: [ 0]waiting for 5 commands to complete
[25296.579283] megaraid_sas: no pending cmds after reset
[25296.579856] megasas: reset successful 
[25296.580418] sd 0:2:1:0: [sdb] megasas: RESET -7971674 cmd=28 retries=0
[25296.580954] megaraid_sas: no pending cmds after reset
[25296.581512] megasas: reset successful 
[25296.582064] sd 0:2:2:0: [sdc] megasas: RESET -7971673 cmd=28 retries=0
[25296.582605] megaraid_sas: no pending cmds after reset
[25296.583158] megasas: reset successful 
[25296.583712] sd 0:2:6:0: [sdg] megasas: RESET -7971677 cmd=28 retries=0
[25296.584279] megaraid_sas: no pending cmds after reset
[25296.584823] megasas: reset successful 
[25296.585383] sd 0:2:7:0: [sdh] megasas: RESET -7971676 cmd=28 retries=0
[25296.585955] megaraid_sas: no pending cmds after reset
[25296.586511] megasas: reset successful 
[25504.514055] sd 0:2:0:0: [sda] megasas: RESET -7973935 cmd=28 retries=0
[25504.514644] megasas: [ 0]waiting for 4 commands to complete
[25505.515760] megaraid_sas: no pending cmds after reset
[25505.516353] megasas: reset successful 
[25505.516968] sd 0:2:1:0: [sdb] megasas: RESET -7973934 cmd=28 retries=0
[25505.517550] megaraid_sas: no pending cmds after reset
[25505.518164] megasas: reset successful 
[25505.518807] sd 0:2:2:0: [sdc] megasas: RESET -7973933 cmd=28 retries=0
[25505.519435] megaraid_sas: no pending cmds after reset
[25505.520072] megasas: reset successful 
[25505.520717] sd 0:2:3:0: [sdd] megasas: RESET -7973932 cmd=28 retries=0
[25505.521371] megaraid_sas: no pending cmds after reset
[25505.522021] megasas: reset successful 
[25688.454889] sd 0:2:0:0: [sda] megasas: RESET -7975464 cmd=28 retries=0
[25688.455536] megasas: [ 0]waiting for 5 commands to complete
[25689.456461] megaraid_sas: no pending cmds after reset
[25689.457146] megasas: reset successful 
[25689.457835] sd 0:2:1:0: [sdb] megasas: RESET -7975463 cmd=28 retries=0
[25689.458527] megaraid_sas: no pending cmds after reset
[25689.459212] megasas: reset successful 
[25689.459900] sd 0:2:2:0: [sdc] megasas: RESET -7975462 cmd=28 retries=0
[25689.460589] megaraid_sas: no pending cmds after reset
[25689.461277] megasas: reset successful 
[25689.461966] sd 0:2:6:0: [sdg] megasas: RESET -7975466 cmd=28 retries=0
[25689.462656] megaraid_sas: no pending cmds after reset
[25689.463350] megasas: reset successful 
[25689.464065] sd 0:2:7:0: [sdh] megasas: RESET -7975465 cmd=28 retries=0
[25689.464777] megaraid_sas: no pending cmds after reset
[25689.465480] megasas: reset successful 
[25871.401889] sd 0:2:0:0: [sda] megasas: RESET -7975673 cmd=28 retries=0
[25871.402622] megasas: [ 0]waiting for 3 commands to complete
[25872.403432] megaraid_sas: no pending cmds after reset
[25872.404172] megasas: reset successful 
[25872.404926] sd 0:2:1:0: [sdb] megasas: RESET -7975672 cmd=28 retries=0
[25872.405629] megaraid_sas: no pending cmds after reset
[25872.406341] megasas: reset successful 
[25872.407084] sd 0:2:6:0: [sdg] megasas: RESET -7975671 cmd=28 retries=0
[25872.407819] megaraid_sas: no pending cmds after reset
[25872.408529] megasas: reset successful 
[26057.347971] sd 0:2:0:0: [sda] megasas: RESET -7979183 cmd=28 retries=0
[26057.348711] megasas: [ 0]waiting for 4 commands to complete
[26058.349559] megaraid_sas: no pending cmds after reset
[26058.350301] megasas: reset successful 
[26058.351055] sd 0:2:1:0: [sdb] megasas: RESET -7979182 cmd=28 retries=0
[26058.351778] megaraid_sas: no pending cmds after reset
[26058.352497] megasas: reset successful 
[26058.353259] sd 0:2:4:0: [sde] megasas: RESET -7979180 cmd=28 retries=0
[26058.353984] megaraid_sas: no pending cmds after reset
[26058.354689] megasas: reset successful 
[26058.355453] sd 0:2:7:0: [sdh] megasas: RESET -7979184 cmd=28 retries=0
[26058.356178] megaraid_sas: no pending cmds after reset
[26058.356889] megasas: reset successful 
[26244.293725] sd 0:2:0:0: [sda] megasas: RESET -7982414 cmd=28 retries=0
[26244.294463] megasas: [ 0]waiting for 6 commands to complete
[26245.295427] megaraid_sas: no pending cmds after reset
[26245.296166] megasas: reset successful 
[26245.296932] sd 0:2:1:0: [sdb] megasas: RESET -7982413 cmd=28 retries=0
[26245.297675] megaraid_sas: no pending cmds after reset
[26245.298390] megasas: reset successful 
[26245.299105] sd 0:2:4:0: [sde] megasas: RESET -7982418 cmd=28 retries=0
[26245.299827] megaraid_sas: no pending cmds after reset
[26245.300536] megasas: reset successful 
[26245.301247] sd 0:2:5:0: [sdf] megasas: RESET -7982417 cmd=28 retries=0
[26245.301963] megaraid_sas: no pending cmds after reset
[26245.302678] megasas: reset successful 
[26245.303390] sd 0:2:6:0: [sdg] megasas: RESET -7982416 cmd=28 retries=0
[26245.304103] megaraid_sas: no pending cmds after reset
[26245.304816] megasas: reset successful 
[26245.305540] sd 0:2:7:0: [sdh] megasas: RESET -7982415 cmd=28 retries=0
[26245.306257] megaraid_sas: no pending cmds after reset
[26245.306974] megasas: reset successful 
[26428.240473] sd 0:2:0:0: [sda] megasas: RESET -7983897 cmd=28 retries=0
[26428.241223] megasas: [ 0]waiting for 4 commands to complete
[26429.242220] megaraid_sas: no pending cmds after reset
[26429.242959] megasas: reset successful 
[26429.243733] sd 0:2:5:0: [sdf] megasas: RESET -7983900 cmd=28 retries=0
[26429.244463] megaraid_sas: no pending cmds after reset
[26429.245174] megasas: reset successful 
[26429.245885] sd 0:2:6:0: [sdg] megasas: RESET -7983899 cmd=28 retries=0
[26429.246602] megaraid_sas: no pending cmds after reset
[26429.247316] megasas: reset successful 
[26429.248030] sd 0:2:7:0: [sdh] megasas: RESET -7983898 cmd=28 retries=0
[26429.248745] megaraid_sas: no pending cmds after reset
[26429.249460] megasas: reset successful 
[26611.187571] sd 0:2:1:0: [sdb] megasas: RESET -7984456 cmd=28 retries=0
[26611.188308] megasas: [ 0]waiting for 2 commands to complete
[26612.189112] megaraid_sas: no pending cmds after reset
[26612.189858] megasas: reset successful 
[26612.190617] sd 0:2:2:0: [sdc] megasas: RESET -7984455 cmd=28 retries=0
[26612.191339] megaraid_sas: no pending cmds after reset
[26612.192066] megasas: reset successful 
[26794.137447] sd 0:2:7:0: [sdh] megasas: RESET -7984860 cmd=28 retries=0
[26794.138188] megasas: [ 0]waiting for 1 commands to complete
[26795.139149] megaraid_sas: no pending cmds after reset
[26795.139889] megasas: reset successful 
[26979.081055] sd 0:2:0:0: [sda] megasas: RESET -7987152 cmd=28 retries=0
[26979.081799] megasas: [ 0]waiting for 5 commands to complete
[26980.082593] megaraid_sas: no pending cmds after reset
[26980.083336] megasas: reset successful 
[26980.084108] sd 0:2:4:0: [sde] megasas: RESET -7987156 cmd=28 retries=0
[26980.084845] megaraid_sas: no pending cmds after reset
[26980.085543] megasas: reset successful 
[26980.086280] sd 0:2:5:0: [sdf] megasas: RESET -7987155 cmd=28 retries=0
[26980.086998] megaraid_sas: no pending cmds after reset
[26980.087710] megasas: reset successful 
[26980.088420] sd 0:2:6:0: [sdg] megasas: RESET -7987154 cmd=28 retries=0
[26980.089140] megaraid_sas: no pending cmds after reset
[26980.089850] megasas: reset successful 
[26980.090620] sd 0:2:7:0: [sdh] megasas: RESET -7987153 cmd=28 retries=0
[26980.091344] megaraid_sas: no pending cmds after reset
[26980.092055] megasas: reset successful 
[27301.987490] sd 0:2:0:0: [sda] megasas: RESET -7990133 cmd=28 retries=0
[27301.988238] megasas: [ 0]waiting for 3 commands to complete
[27302.989129] megaraid_sas: no pending cmds after reset
[27302.989866] megasas: reset successful 
[27302.990626] sd 0:2:1:0: [sdb] megasas: RESET -7990132 cmd=28 retries=0
[27302.991357] megaraid_sas: no pending cmds after reset
[27302.992067] megasas: reset successful 
[27302.992845] sd 0:2:5:0: [sdf] megasas: RESET -7990134 cmd=28 retries=0
[27302.993564] megaraid_sas: no pending cmds after reset
[27302.994282] megasas: reset successful 
[27484.934588] sd 0:2:0:0: [sda] megasas: RESET -7990452 cmd=28 retries=0
[27484.935330] megasas: [ 0]waiting for 5 commands to complete
[27485.936176] megaraid_sas: no pending cmds after reset
[27485.936914] megasas: reset successful 
[27485.937682] sd 0:2:1:0: [sdb] megasas: RESET -7990451 cmd=28 retries=0
[27485.938402] megaraid_sas: no pending cmds after reset
[27485.939107] megasas: reset successful 
[27485.939898] sd 0:2:2:0: [sdc] megasas: RESET -7990450 cmd=28 retries=0
[27485.940619] megaraid_sas: no pending cmds after reset
[27485.941339] megasas: reset successful 
[27485.942110] sd 0:2:6:0: [sdg] megasas: RESET -7990454 cmd=28 retries=0
[27485.942838] megaraid_sas: no pending cmds after reset
[27485.943556] megasas: reset successful 
[27485.944311] sd 0:2:7:0: [sdh] megasas: RESET -7990453 cmd=28 retries=0
[27485.945028] megaraid_sas: no pending cmds after reset
[27485.945739] megasas: reset successful 

The problem was reproduced by zfs send data/101_srv_backup | pv | cat > /dev/null (with this command I was trying to measure zfs read speed and to check whether I would hit the kernel panic from my bug #2945). After 700GB of data read I found my system in a state where 4 disks showed 100% util (iostat -xmt 5) and there was no read/write throughput or I/O at all. pv also showed zero speed when the disk 'freeze' happened.

I'm running for i in /dev/sd[a-h]; do ( dd if=${i} of=/dev/null bs=16M & ); done now and have had no freezes so far (so maybe the problem will still happen...).

In my case I have 8 disks, no SMART errors, no zpool errors or any other problems (except a kernel oops in the zfs module).

@alitvak69, did you solve your issue? How?

Thank you.

ADDITIONAL DETAILS.

# /opt/MegaRAID/MegaCli/MegaCli64 -AdpAllInfo -aALL

Adapter #0

==============================================================================
                    Versions
                ================
Product Name    : LSI MegaRAID SAS 9260-8i
Serial No       : SV24410013
FW Package Build: 12.15.0-0205

...

                Image Versions in Flash:
                ================
FW Version         : 2.130.403-3835
BIOS Version       : 3.30.02.2_4.16.08.00_0x06060A05
Preboot CLI Version: 04.04-020:#%00009
WebBIOS Version    : 6.0-54-e_50-Rel
NVDATA Version     : 2.09.03-0051
Boot Block Version : 2.02.00.00-0000
BOOT Version       : 09.250.01.219
...
# /opt/MegaRAID/MegaCli/MegaCli64 -LDGetProp -Cache -LALL -aALL

Adapter 0-VD 0(target id: 0): Cache Policy:WriteBack, ReadAhead, Cached, No Write Cache if bad BBU
Adapter 0-VD 1(target id: 1): Cache Policy:WriteBack, ReadAhead, Cached, No Write Cache if bad BBU
Adapter 0-VD 2(target id: 2): Cache Policy:WriteBack, ReadAhead, Cached, No Write Cache if bad BBU
Adapter 0-VD 3(target id: 3): Cache Policy:WriteBack, ReadAhead, Cached, No Write Cache if bad BBU
Adapter 0-VD 4(target id: 4): Cache Policy:WriteBack, ReadAhead, Cached, No Write Cache if bad BBU
Adapter 0-VD 5(target id: 5): Cache Policy:WriteBack, ReadAhead, Cached, No Write Cache if bad BBU
Adapter 0-VD 6(target id: 6): Cache Policy:WriteBack, ReadAhead, Cached, No Write Cache if bad BBU
Adapter 0-VD 7(target id: 7): Cache Policy:WriteBack, ReadAhead, Cached, No Write Cache if bad BBU

Exit Code: 0x00
# /opt/MegaRAID/MegaCli/MegaCli64 -LDGetProp -DskCache -LAll -aALL

Adapter 0-VD 0(target id: 0): Disk Write Cache : Enabled
Adapter 0-VD 1(target id: 1): Disk Write Cache : Enabled
Adapter 0-VD 2(target id: 2): Disk Write Cache : Enabled
Adapter 0-VD 3(target id: 3): Disk Write Cache : Enabled
Adapter 0-VD 4(target id: 4): Disk Write Cache : Enabled
Adapter 0-VD 5(target id: 5): Disk Write Cache : Enabled
Adapter 0-VD 6(target id: 6): Disk Write Cache : Enabled
Adapter 0-VD 7(target id: 7): Disk Write Cache : Enabled

Exit Code: 0x00
# /opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -aALL

BBU status for Adapter: 0

BatteryType: iBBU
Voltage: 3959 mV
Current: 0 mA
Temperature: 32 C
Battery State     : Operational

BBU Firmware Status:

  Charging Status              : None
  Voltage                                 : OK
  Temperature                             : OK
  Learn Cycle Requested                   : No
  Learn Cycle Active                      : No
  Learn Cycle Status                      : OK
  Learn Cycle Timeout                     : No
  I2c Errors Detected                     : No
  Battery Pack Missing                    : No
  Battery Replacement required            : No
  Remaining Capacity Low                  : No
  Periodic Learn Required                 : No
  Transparent Learn                       : No
  No space to cache offload               : No
  Pack is about to fail & should be replaced : No
  Cache Offload premium feature required  : No
  Module microcode update required        : No


GasGuageStatus:
  Fully Discharged        : No
  Fully Charged           : No
  Discharging             : Yes
  Initialized             : Yes
  Remaining Time Alarm    : No
  Discharge Terminated    : No
  Over Temperature        : No
  Charging Terminated     : No
  Over Charged            : No
  Relative State of Charge: 67 %
  Charger System State: 49169
  Charger System Ctrl: 0
  Charging current: 512 mA
  Absolute state of charge: 67 %
  Max Error: 2 %

  Battery backup charge time : 48 hours +

BBU Capacity Info for Adapter: 0

  Relative State of Charge: 67 %
  Absolute State of charge: 67 %
  Remaining Capacity: 819 mAh
  Full Charge Capacity: 1222 mAh
  Run time to empty: Battery is not being discharged.  
  Average time to empty: Battery is not being discharged.  
  Estimated Time to full recharge: Battery is not being charged.  
  Cycle Count: 47
Max Error = 2 %
Remaining Capacity Alarm = 120 mAh
Remining Time Alarm = 10 Min

BBU Design Info for Adapter: 0

  Date of Manufacture: 12/26, 2012
  Design Capacity: 1215 mAh
  Design Voltage: 3700 mV
  Specification Info: 33
  Serial Number: 703
  Pack Stat Configuration: 0x64a0
  Manufacture Name: LS1191002A
  Device Name: 3150302
  Device Chemistry: LION
  Battery FRU: N/A
  Transparent Learn = 0
  App Data = 0

BBU Properties for Adapter: 0

  Auto Learn Period: 30 Days
  Next Learn time: Dec 6 2014 15:42:26  
  Learn Delay Interval:0 Hours
  Auto-Learn Mode: Enabled

Exit Code: 0x00

Before I tried zfs, this server ran ext3/4 on a MegaRAID virtual drive (RAID6), on an outdated Debian, without any hardware freeze problems.

@umask

umask commented Dec 7, 2014

for i in /dev/sd[a-h]; do ( dd if=${i} of=/dev/null bs=16M & ); done completed without any message in dmesg.

@umask

umask commented Dec 7, 2014

I see strange behaviour now:

  1. dd read all the disks' content completely and everything was GOOD: no issues, no warnings in the logs.
  2. After (1) I ran cat /dev/sda | pv | cat > /dev/null - disk read speed was OK, no freezes; at the start I had 140MB/s, then the speed dropped to 100MB/s until the finish.
  3. After (2) I ran zfs send data/101_srv_backup | pv | cat > /dev/null and pv showed only 70.6MB of data read, then the speed fell to 0 and nothing was read anymore. While zfs send ... was running I noticed that iostat showed 100% disk util for 6 members of the raidz2 and 0% util for the other 2 members, with no data reads/writes or I/O ops shown by iostat. I checked /proc/interrupts and found that no interrupts had increased for megasas for several minutes. I then pressed CTRL+C in the terminal with zfs send ..., but it didn't help and control was not returned (the standard situation for a process in D-state, uninterruptible sleep). Next I issued for i in /dev/sd[a-h]; do ( dd if=${i} of=/dev/null bs=1M count=3 & ); done in another terminal and zfs send ... magically handled the CTRL+C. I ran zfs send ... again and everything is OK (I suspect the freeze will happen again, but that it can be unfrozen by reading a few megabytes with dd).

Is this a bug in the megaraid_sas kernel module?

Does anybody know how to produce read operations with dd the same way ZoL does?

@umask

umask commented Dec 7, 2014

Excellent. zfs send ... froze again, and I continuously get messages like

[107014.905718] sd 0:2:6:0: [sdg] megasas: RESET -157374834 cmd=28 retries=0
[107014.906489] megasas: [ 0]waiting for 1 commands to complete
[107015.907506] megaraid_sas: no pending cmds after reset
[107015.908264] megasas: reset successful 

in the logs. iostat shows 100% util for 3 of the 8 disks (and, yes, no reads/writes at all), interrupts for megasas are also not growing... and dd does not help this time.

@thepax

thepax commented Dec 7, 2014

I had the same issue. The zpool didn't import at all. Even 'zdb -e' got stuck on pread64s randomly.
Resolved by disabling ReadAhead on all devices.

@umask

umask commented Dec 7, 2014

@thepax, did you disable ReadAhead using MegaCli or using something else?

@thepax

thepax commented Dec 7, 2014

Sure: MegaCli -ldsetprop nora -lall -a0

@umask

umask commented Dec 7, 2014

Looks like NORA (NoReadAhead) helped me:

# /opt/MegaRAID/MegaCli/MegaCli64 -LDSetProp NORA -Lall -aALL

Set Read Policy to NoReadAhead on Adapter 0, VD 0 (target id: 0) success
Set Read Policy to NoReadAhead on Adapter 0, VD 1 (target id: 1) success
Set Read Policy to NoReadAhead on Adapter 0, VD 2 (target id: 2) success
Set Read Policy to NoReadAhead on Adapter 0, VD 3 (target id: 3) success
Set Read Policy to NoReadAhead on Adapter 0, VD 4 (target id: 4) success
Set Read Policy to NoReadAhead on Adapter 0, VD 5 (target id: 5) success
Set Read Policy to NoReadAhead on Adapter 0, VD 6 (target id: 6) success
Set Read Policy to NoReadAhead on Adapter 0, VD 7 (target id: 7) success

Exit Code: 0x00
# /opt/MegaRAID/MegaCli/MegaCli64 -LDGetProp -Cache -LALL -aALL

Adapter 0-VD 0(target id: 0): Cache Policy:WriteBack, ReadAheadNone, Cached, No Write Cache if bad BBU
Adapter 0-VD 1(target id: 1): Cache Policy:WriteBack, ReadAheadNone, Cached, No Write Cache if bad BBU
Adapter 0-VD 2(target id: 2): Cache Policy:WriteBack, ReadAheadNone, Cached, No Write Cache if bad BBU
Adapter 0-VD 3(target id: 3): Cache Policy:WriteBack, ReadAheadNone, Cached, No Write Cache if bad BBU
Adapter 0-VD 4(target id: 4): Cache Policy:WriteBack, ReadAheadNone, Cached, No Write Cache if bad BBU
Adapter 0-VD 5(target id: 5): Cache Policy:WriteBack, ReadAheadNone, Cached, No Write Cache if bad BBU
Adapter 0-VD 6(target id: 6): Cache Policy:WriteBack, ReadAheadNone, Cached, No Write Cache if bad BBU
Adapter 0-VD 7(target id: 7): Cache Policy:WriteBack, ReadAheadNone, Cached, No Write Cache if bad BBU

Exit Code: 0x00

Everything is OK with reads now!

@umask

umask commented Dec 7, 2014

@thepax, thank you!

@alitvak69
Author

I have not made the NORA change. I went through several changes, including a controller swap, firmware upgrades, zfs updates, and rebuilding the zpools.
At some point I could no longer reproduce the problem.


@umask

umask commented Dec 7, 2014

I read about ZFS on MegaRAID here: https://calomel.org/zfs_raid_speed_capacity.html ('How do you get ZFS to work through the LSI MegaRAID controller?'). Then I noticed that the scrub speed on my zfs pool was too slow (10-12MB/sec; it is still that low - the changes didn't help) and decided to do some "tuning" (enable read-ahead and cached disk access), roughly as sketched below. MegaCli probably creates VDs with the ReadAhead option enabled by default.
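The "tuning" was along these lines, using the same -LDSetProp properties shown earlier (a sketch, not an exact transcript of what I ran):

/opt/MegaRAID/MegaCli/MegaCli64 -LDSetProp RA -LAll -aALL           # re-enable controller read-ahead
/opt/MegaRAID/MegaCli/MegaCli64 -LDSetProp Cached -LAll -aALL       # cached (rather than direct) I/O
/opt/MegaRAID/MegaCli/MegaCli64 -LDSetProp EnDskCache -LAll -aALL   # enable the drives' write cache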

@alitvak69, what scrub speed do you get on your zfs/zpool?

@thepax

thepax commented Dec 8, 2014

Well, it looks like pigeon superstition. In order to experiment with the issue I turned ReadAhead back on, and the issue doesn't return. So now I believe that any MegaRAID reconfiguration fixes it. Probably MegaCli -adpcacheflush -a0 does the same magic.

Anyway, this is definitely not a ZFS issue.

@umask

umask commented Dec 8, 2014

@thepax, keep us informed about the results of your experiments. Thank you 👍

I agree that the problem is not in ZFS (it's somewhere in the megaraid firmware or the kernel module/driver). Nonetheless, dd didn't reproduce the same trouble, while ZFS produces some disk-access pattern that does. It would be good if we could reproduce this issue using dd and report the problem to LSI.

@alitvak69
Author

My original issue was not scrub speed, although scrubs are slow regardless of read-ahead settings on my 12 TB RAID. I hit it during tar | pv | untar sessions.

@thepax

thepax commented Dec 10, 2014

@umask The issue with ZFS is that it uses many I/O threads to work with the hard drives; dd uses a single I/O thread.
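A rough way to get closer to that with dd (a sketch only, not ZoL's actual I/O pattern; it just issues many concurrent direct random reads per disk, and the offsets assume drives of at least ~1.3TB, so adjust the modulus otherwise):

for d in /dev/sd[a-h]; do
  for i in 1 2 3 4 5 6 7 8; do            # 8 concurrent readers per disk
    ( for n in $(seq 1 1000); do
        # one 128K direct read at a pseudo-random offset (skip is in units of bs)
        dd if=$d of=/dev/null bs=128k count=1 iflag=direct \
           skip=$((RANDOM * RANDOM % 10000000)) 2>/dev/null
      done ) &
  done
done
wait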

My current LD settings (the first two are SSDs):

Adapter 0-VD 0(target id: 0): Cache Policy:WriteThrough, ReadAheadNone, Direct, No Write Cache if bad BBU
Adapter 0-VD 1(target id: 1): Cache Policy:WriteThrough, ReadAheadNone, Direct, No Write Cache if bad BBU
Adapter 0-VD 2(target id: 2): Cache Policy:WriteBack, ReadAdaptive, Cached, No Write Cache if bad BBU
Adapter 0-VD 3(target id: 3): Cache Policy:WriteBack, ReadAdaptive, Cached, No Write Cache if bad BBU
Adapter 0-VD 4(target id: 4): Cache Policy:WriteBack, ReadAdaptive, Cached, No Write Cache if bad BBU
Adapter 0-VD 5(target id: 5): Cache Policy:WriteBack, ReadAdaptive, Cached, No Write Cache if bad BBU
Adapter 0-VD 6(target id: 6): Cache Policy:WriteBack, ReadAdaptive, Cached, No Write Cache if bad BBU
Adapter 0-VD 7(target id: 7): Cache Policy:WriteBack, ReadAdaptive, Cached, No Write Cache if bad BBU
Adapter 0-VD 8(target id: 8): Cache Policy:WriteBack, ReadAdaptive, Cached, No Write Cache if bad BBU
Adapter 0-VD 9(target id: 9): Cache Policy:WriteBack, ReadAdaptive, Cached, No Write Cache if bad BBU
Adapter 0-VD 10(target id: 10): Cache Policy:WriteBack, ReadAdaptive, Cached, No Write Cache if bad BBU
Adapter 0-VD 11(target id: 11): Cache Policy:WriteBack, ReadAdaptive, Cached, No Write Cache if bad BBU
Adapter 0-VD 12(target id: 12): Cache Policy:WriteBack, ReadAdaptive, Cached, No Write Cache if bad BBU
Adapter 0-VD 13(target id: 13): Cache Policy:WriteBack, ReadAdaptive, Cached, No Write Cache if bad BBU
Adapter 0-VD 14(target id: 14): Cache Policy:WriteBack, ReadAdaptive, Cached, No Write Cache if bad BBU
Adapter 0-VD 15(target id: 15): Cache Policy:WriteBack, ReadAdaptive, Cached, No Write Cache if bad BBU
Adapter 0-VD 16(target id: 16): Cache Policy:WriteBack, ReadAdaptive, Cached, No Write Cache if bad BBU
Adapter 0-VD 17(target id: 17): Cache Policy:WriteBack, ReadAdaptive, Cached, No Write Cache if bad BBU
Adapter 0-VD 18(target id: 18): Cache Policy:WriteBack, ReadAdaptive, Cached, No Write Cache if bad BBU
Adapter 0-VD 19(target id: 19): Cache Policy:WriteBack, ReadAdaptive, Cached, No Write Cache if bad BBU
Adapter 0-VD 20(target id: 20): Cache Policy:WriteBack, ReadAdaptive, Cached, No Write Cache if bad BBU
Adapter 0-VD 21(target id: 21): Cache Policy:WriteBack, ReadAdaptive, Cached, No Write Cache if bad BBU
Adapter 0-VD 22(target id: 22): Cache Policy:WriteBack, ReadAdaptive, Cached, No Write Cache if bad BBU
Adapter 0-VD 23(target id: 23): Cache Policy:WriteBack, ReadAdaptive, Cached, No Write Cache if bad BBU

Scrub speed was about 260MB/s

Everything works fine. After disabling ReadAhead and enabling it back I can't reproduce the issue.

@umask

umask commented Dec 10, 2014

@thepax, as far as I know, Adaptive ReadAhead is a deprecated feature in MegaRAID. Just FYI.

I suspect that 260MB/s scrub speed on 22 disks is very slow.

@umask

umask commented Dec 12, 2014

Are you sure that after switching ReadAhead off and back on again, and then power-cycling the server, the controller will not fall back into the problem state?

@behlendorf
Contributor

Where does this issue stand? Is it something which has been clearly resolved as not a ZFS issue?

@umask

umask commented Dec 19, 2014

@behlendorf Yes, the problem is not related to ZFS.

@alitvak69
Author

Not a ZFS issue IMO. Perhaps it should be closed.


@behlendorf
Contributor

OK, thanks. Closed.
