px-dev hangs hosts, prevents restarts #45

Closed
svensp opened this issue Jul 27, 2017 · 4 comments


svensp commented Jul 27, 2017

Setup:
3 Hosts running Portworx.
docker version: 17.03.1-ce, 17.03.2-ce
Portworx Version: portworx/px-dev:1.2.8
Each has 1 volume on a volume group, which is RAID1 on 2 SSD disks.
A second volume, placed directly on a 3TB HDD, was supposed to be added.

Process:
Following https://docs.portworx.com/maintain/scale-up.html:
sudo /opt/pwx/bin/pxctl service maintenance --enter
-- Wait until portworx is back up in maintenance mode --
sudo /opt/pwx/bin/pxctl service drive add /dev/sdc
sudo /opt/pwx/bin/pxctl service maintenance --exit
-- Wait for portworx to return to normal mode --
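
The two "wait" steps above can be scripted instead of watched by hand. The following is only a sketch: it assumes that `pxctl status` prints a line containing "maintenance" while in maintenance mode and "operational" once back to normal, and both patterns should be checked against real output first. The helper names `wait_for_status` and `add_drive` are made up for illustration, and the script does not answer any confirmation prompt that pxctl may show.

```shell
# Command used to reach pxctl; override PXCTL if sudo is not needed.
PXCTL="${PXCTL:-sudo /opt/pwx/bin/pxctl}"

# Poll `pxctl status` until its output matches a pattern (assumed wording).
# Arguments: pattern, attempts (default 60), delay in seconds (default 5).
wait_for_status() (
    pattern="$1" attempts="${2:-60}" delay="${3:-5}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if $PXCTL status 2>/dev/null | grep -qi -- "$pattern"; then
            exit 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    exit 1   # gave up: the expected state was never reported
)

# Run the scale-up steps in order and stop at the first failure.
add_drive() {
    $PXCTL service maintenance --enter &&
        wait_for_status "maintenance" &&
        $PXCTL service drive add "$1" &&
        $PXCTL service maintenance --exit &&
        wait_for_status "operational"
}
```

With this in place the procedure would be `add_drive /dev/sdc`. The function returns non-zero when a state is not reached in time, which is what happened here during the final wait.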

Portworx stayed in the initializing state for about 3 minutes, until all 3 hosts died simultaneously with the following logs:
Host which was supposed to receive more storage:

[268973.752296] pxd_control_open: pxd-control-0 open OK
[268973.752331] pxd_process_init_reply: pxd-control-0:5048845 init OK
[268973.775532] blk_update_request: I/O error, dev pxd/pxd226604244376265636, sector 0
[268973.776890] blk_update_request: I/O error, dev pxd/pxd226604244376265636, sector 0
[268973.776951] blk_update_request: I/O error, dev pxd/pxd226604244376265636, sector 21241864
[268973.776996] Buffer I/O error on dev pxd/pxd226604244376265636, logical block 2655233, lost sync page write
[268973.777048] JBD2: Error -5 detected when updating journal superblock for pxd!pxd226604244376265636-8.
[268973.777096] Aborting journal on device pxd!pxd226604244376265636-8.
[268973.777134] blk_update_request: I/O error, dev pxd/pxd226604244376265636, sector 21241864
[268973.777179] Buffer I/O error on dev pxd/pxd226604244376265636, logical block 2655233, lost sync page write
[268973.777229] JBD2: Error -5 detected when updating journal superblock for pxd!pxd226604244376265636-8.
[268973.777301] blk_update_request: I/O error, dev pxd/pxd226604244376265636, sector 0
[268973.777345] Buffer I/O error on dev pxd/pxd226604244376265636, logical block 0, lost sync page write
[268973.777395] EXT4-fs error (device pxd!pxd226604244376265636): ext4_put_super:842: Couldn't clean up the journal
[268973.777444] EXT4-fs (pxd!pxd226604244376265636): Remounting filesystem read-only
[268973.777488] EXT4-fs (pxd!pxd226604244376265636): previous I/O error to superblock detected
[268973.777542] blk_update_request: I/O error, dev pxd/pxd226604244376265636, sector 0
[268973.777586] Buffer I/O error on dev pxd/pxd226604244376265636, logical block 0, lost sync page write
[269032.308667] docker0: port 33(vethr2da89069fb) entered disabled state
[269032.324781] device vethr2da89069fb left promiscuous mode
[269032.324811] docker0: port 33(vethr2da89069fb) entered disabled state
[269297.629050] INFO: task jbd2/pxd!pxd640:19027 blocked for more than 300 seconds.
[269297.629098]       Tainted: G           OE   4.9.0-0.bpo.2-amd64 #1
[269297.629124] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[269297.629168] jbd2/pxd!pxd640 D    0 19027      2 0x00000000
[269297.629197]  ffff8866a1f77800 0000000000000000 ffff8874384b50c0 ffff886f885b5140
[269297.629247]  ffff88743f218700 ffff94830a79fb20 ffffffffa61f784d ffffffffa5ef957c
[269297.629296]  ffff886cb7ee9c00 0000000000000034 0000000055e34000 ffff886f885b5140
[269297.629345] Call Trace:
[269297.629371]  [<ffffffffa61f784d>] ? __schedule+0x23d/0x6d0
[269297.629399]  [<ffffffffa5ef957c>] ? blk_rq_init+0xbc/0xd0
[269297.629425]  [<ffffffffa61f8580>] ? bit_wait_timeout+0x90/0x90
[269297.629451]  [<ffffffffa61f7d12>] ? schedule+0x32/0x80
[269297.629477]  [<ffffffffa61fb249>] ? schedule_timeout+0x249/0x300
[269297.629505]  [<ffffffffa5efe01f>] ? blk_peek_request+0x5f/0x290
[269297.629532]  [<ffffffffa61f8580>] ? bit_wait_timeout+0x90/0x90
[269297.629559]  [<ffffffffa61f7594>] ? io_schedule_timeout+0xb4/0x130
[269297.629587]  [<ffffffffa5cbb4f7>] ? prepare_to_wait+0x57/0x80
[269297.629613]  [<ffffffffa61f8597>] ? bit_wait_io+0x17/0x60
[269297.629639]  [<ffffffffa61f808c>] ? __wait_on_bit+0x5c/0x90
[269297.629665]  [<ffffffffa61f8580>] ? bit_wait_timeout+0x90/0x90
[269297.629692]  [<ffffffffa61f81ee>] ? out_of_line_wait_on_bit+0x7e/0xa0
[269297.629720]  [<ffffffffa5cbb820>] ? autoremove_wake_function+0x40/0x40
[269297.629753]  [<ffffffffc04e5d18>] ? jbd2_journal_commit_transaction+0xd48/0x17e0 [jbd2]
[269297.629798]  [<ffffffffa5cafd67>] ? put_prev_entity+0x47/0x840
[269297.629826]  [<ffffffffa5c2476b>] ? __switch_to+0x2bb/0x700
[269297.629855]  [<ffffffffa5ce65cd>] ? try_to_del_timer_sync+0x4d/0x80
[269297.629884]  [<ffffffffc04ea9ed>] ? kjournald2+0xdd/0x280 [jbd2]
[269297.629911]  [<ffffffffa5cbb7e0>] ? wake_up_atomic_t+0x30/0x30
[269297.629939]  [<ffffffffc04ea910>] ? commit_timeout+0x10/0x10 [jbd2]
[269297.629968]  [<ffffffffa5c97520>] ? kthread+0xe0/0x100
[269297.629993]  [<ffffffffa5c2476b>] ? __switch_to+0x2bb/0x700
[269297.630019]  [<ffffffffa5c97440>] ? kthread_park+0x60/0x60
[269297.630046]  [<ffffffffa61fc835>] ? ret_from_fork+0x25/0x30
[269297.630072] NMI backtrace for cpu 7
[269297.630095] CPU: 7 PID: 81 Comm: khungtaskd Tainted: G           OE   4.9.0-0.bpo.2-amd64 #1 Debian 4.9.18-1~bpo8+1
[269297.630143] Hardware name: Supermicro X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 3.2.T201502251406 02/25/2015
[269297.630194]  0000000000000000 ffffffffa5f29dd5 0000000000000000 0000000000000007
[269297.630243]  ffffffffa5f2e300 0000000000000007 ffffffffa5c4cae0 ffff886f885b5140
[269297.630292]  ffffffffa5f2e40a ffff886f885b5140 00000000003ff18f ffffffffa5d27a90
[269297.630341] Call Trace:
[269297.630363]  [<ffffffffa5f29dd5>] ? dump_stack+0x5c/0x77
[269297.630389]  [<ffffffffa5f2e300>] ? nmi_cpu_backtrace+0x90/0xa0
[269297.630417]  [<ffffffffa5c4cae0>] ? irq_force_complete_move+0x140/0x140
[269297.630445]  [<ffffffffa5f2e40a>] ? nmi_trigger_cpumask_backtrace+0xfa/0x130
[269297.630474]  [<ffffffffa5d27a90>] ? watchdog+0x2b0/0x330
[269297.630499]  [<ffffffffa5d277e0>] ? reset_hung_task_detector+0x10/0x10
[269297.630528]  [<ffffffffa5c97520>] ? kthread+0xe0/0x100
[269297.630553]  [<ffffffffa5c2476b>] ? __switch_to+0x2bb/0x700
[269297.630579]  [<ffffffffa5c97440>] ? kthread_park+0x60/0x60
[269297.630606]  [<ffffffffa61fc835>] ? ret_from_fork+0x25/0x30
[269297.630632] Sending NMI from CPU 7 to CPUs 0-6,8-11:
[269297.630666] NMI backtrace for cpu 8 skipped: idling at pc 0xffffffffa61fc02e
[269297.630697] NMI backtrace for cpu 2 skipped: idling at pc 0xffffffffa61fc02e
[269297.630727] NMI backtrace for cpu 10 skipped: idling at pc 0xffffffffa61fc02e
[269297.630772] NMI backtrace for cpu 11 skipped: idling at pc 0xffffffffa61fc02e
[269297.630817] NMI backtrace for cpu 1 skipped: idling at pc 0xffffffffa61fc02e
[269297.630847] NMI backtrace for cpu 4
[269297.630871] CPU: 4 PID: 5280 Comm: confd Tainted: G           OE   4.9.0-0.bpo.2-amd64 #1 Debian 4.9.18-1~bpo8+1
[269297.630919] Hardware name: Supermicro X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 3.2.T201502251406 02/25/2015
[269297.630971] task: ffff886f9e4c4000 task.stack: ffff948307674000
[269297.630998] RIP: 0033:[<000000000045f98b>] c [<000000000045f98b>] 0x45f98b
[269297.631026] RSP: 002b:000000c82004e9e8  EFLAGS: 00000283
[269297.631052] RAX: 000000c820a2f630 RBX: 000000000045f6a0 RCX: 0000000000000047
[269297.631095] RDX: 000000c82004ea00 RSI: 000000c8218854b8 RDI: 000000c822363518
[269297.631138] RBP: 0000000000bb5d00 R08: 000000c8209069c0 R09: 000000c8209069c0
[269297.631181] R10: 000000c822363518 R11: 0000000000000000 R12: 0000000000000005
[269297.631224] R13: 0000000000a282b8 R14: 0000000000000004 R15: 0000000000000008
[269297.631267] FS:  000000c82003a868(0000) GS:ffff88743f300000(0000) knlGS:0000000000000000
[269297.631311] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[269297.631338] CR2: 000000c4282fa0d0 CR3: 0000000b97b4c000 CR4: 00000000001406e0
[269297.631382] NMI backtrace for cpu 5
[269297.631405] CPU: 5 PID: 13343 Comm: confd Tainted: G           OE   4.9.0-0.bpo.2-amd64 #1 Debian 4.9.18-1~bpo8+1
[269297.634145] Hardware name: Supermicro X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 3.2.T201502251406 02/25/2015
[269297.634196] task: ffff886f86685080 task.stack: ffff948324b4c000
[269297.634222] RIP: 0033:[<000000000041338b>] c [<000000000041338b>] 0x41338b
[269297.634250] RSP: 002b:000000c821bd9820  EFLAGS: 00000293
[269297.634276] RAX: 0000000000080000 RBX: 00007fe7706614b0 RCX: 000000c820001200
[269297.634318] RDX: 000000c821ddf2c0 RSI: 0000000000000007 RDI: 000000c821ddf262
[269297.634360] RBP: 0000000000140dfc R08: 0000000000140dfc R09: 000000c821ddf262
[269297.634402] R10: 0000000000000002 R11: 000000c821bd9ce0 R12: 0000000000000032
[269297.634444] R13: 0000000000a28614 R14: 000000000000000a R15: 0000000000000008
[269297.634486] FS:  000000c8205b2068(0000) GS:ffff88743f340000(0000) knlGS:0000000000000000
[269297.634530] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[269297.634556] CR2: 000000c427dd0d00 CR3: 0000000e172a4000 CR4: 00000000001406e0
[269297.634600] NMI backtrace for cpu 6 skipped: idling at pc 0xffffffffa61fc02e
[269297.634630] NMI backtrace for cpu 0 skipped: idling at pc 0xffffffffa61fc02e
[269297.634661] NMI backtrace for cpu 3 skipped: idling at pc 0xffffffffa61fc02e
[269297.634691] NMI backtrace for cpu 9 skipped: idling at pc 0xffffffffa61fc02e
[269297.634739] Kernel panic - not syncing: hung_task: blocked tasks
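
The panic is preceded by a hung-task report (`INFO: task ... blocked for more than 300 seconds`). To compare hosts quickly, those reports can be pulled out of a saved kernel log. This is a minimal sketch; `hung_tasks` is a hypothetical helper name, and it only assumes the message format seen in the logs in this issue.

```shell
# Read kernel log lines on stdin and print "task pid seconds"
# for every hung-task report.
hung_tasks() {
    sed -n 's/.*INFO: task \(.*\):\([0-9][0-9]*\) blocked for more than \([0-9][0-9]*\) seconds.*/\1 \2 \3/p'
}
```

For example, `dmesg | hung_tasks` on a live host, or `hung_tasks < saved.log` on a captured log (the file name is a placeholder).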

Other hosts:

[268342.182875] INFO: task px-ns:22180 blocked for more than 300 seconds.
[268342.182917]       Tainted: G           OE   4.9.0-0.bpo.2-amd64 #1
[268342.182954] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[268342.183015] px-ns           D    0 22180  22038 0x00000000
[268342.183054]  ffff89dbd8445800 ffff89ddf3fa5c00 ffff89dc61fe9080 ffff89dea134a000
[268342.184314]  ffff89deee498700 ffff9fbc87d37d40 ffffffffb43f784d ffff89db4fdac080
[268342.184404]  00000000fffffffb 00000000b40fe01f ffff89db588da800 ffff89dea134a000
[268342.184467] Call Trace:
[268342.184500]  [<ffffffffb43f784d>] ? __schedule+0x23d/0x6d0
[268342.184538]  [<ffffffffb4036e80>] ? do_fsync+0x60/0x60
[268342.184573]  [<ffffffffb43f7d12>] ? schedule+0x32/0x80
[268342.184621]  [<ffffffffb43fb249>] ? schedule_timeout+0x249/0x300
[268342.184660]  [<ffffffffb40f95ef>] ? __blk_run_queue+0x2f/0x40
[268342.184698]  [<ffffffffb40fe92a>] ? blk_queue_bio+0x39a/0x3b0
[268342.184746]  [<ffffffffb4036e80>] ? do_fsync+0x60/0x60
[268342.184794]  [<ffffffffb43f7594>] ? io_schedule_timeout+0xb4/0x130
[268342.184831]  [<ffffffffb43f912a>] ? wait_for_completion_io+0xfa/0x130
[268342.184872]  [<ffffffffb3ea2b70>] ? wake_up_q+0x60/0x60
[268342.184908]  [<ffffffffb40f37dc>] ? submit_bio_wait+0x5c/0x80
[268342.184944]  [<ffffffffb4100233>] ? blkdev_issue_flush+0x63/0x90
[268342.184998]  [<ffffffffc066cdfa>] ? ext4_sync_fs+0x14a/0x1c0 [ext4]
[268342.185050]  [<ffffffffb4007007>] ? iterate_supers+0xb7/0x110
[268342.185088]  [<ffffffffb4036f52>] ? sys_sync+0x62/0xb0
[268342.185124]  [<ffffffffb43fc5bb>] ? system_call_fast_compare_end+0xc/0x9b
[268342.185166] NMI backtrace for cpu 4
[268342.185198] CPU: 4 PID: 57 Comm: khungtaskd Tainted: G           OE   4.9.0-0.bpo.2-amd64 #1 Debian 4.9.18-1~bpo8+1
[268342.185268] Hardware name: FUJITSU  /D3401-H2, BIOS V5.0.0.12 R1.5.0 for D3401-H2x                     02/27/2017
[268342.185334]  0000000000000000 ffffffffb4129dd5 0000000000000000 0000000000000004
[268342.185399]  ffffffffb412e300 0000000000000004 ffffffffb3e4cae0 ffff89dc61b8b0c0
[268342.185465]  ffffffffb412e40a ffff89dea134a000 00000000003ffcda ffffffffb3f27a90
[268342.185528] Call Trace:
[268342.185558]  [<ffffffffb4129dd5>] ? dump_stack+0x5c/0x77
[268342.185593]  [<ffffffffb412e300>] ? nmi_cpu_backtrace+0x90/0xa0
[268342.185633]  [<ffffffffb3e4cae0>] ? irq_force_complete_move+0x140/0x140
[268342.185673]  [<ffffffffb412e40a>] ? nmi_trigger_cpumask_backtrace+0xfa/0x130
[268342.185713]  [<ffffffffb3f27a90>] ? watchdog+0x2b0/0x330
[268342.185749]  [<ffffffffb3f277e0>] ? reset_hung_task_detector+0x10/0x10
[268342.185789]  [<ffffffffb3e97520>] ? kthread+0xe0/0x100
[268342.185826]  [<ffffffffb3e2476b>] ? __switch_to+0x2bb/0x700
[268342.185862]  [<ffffffffb3e97440>] ? kthread_park+0x60/0x60
[268342.185902]  [<ffffffffb43fc835>] ? ret_from_fork+0x25/0x30
[268342.185939] Sending NMI from CPU 4 to CPUs 0-3,5-7:
[268342.185988] NMI backtrace for cpu 0 skipped: idling at pc 0xffffffffb43fc02e
[268342.186030] NMI backtrace for cpu 7 skipped: idling at pc 0xffffffffb43fc02e
[268342.186074] NMI backtrace for cpu 3 skipped: idling at pc 0xffffffffb43fc02e
[268342.186116] NMI backtrace for cpu 2 skipped: idling at pc 0xffffffffb43fc02e
[268342.186159] NMI backtrace for cpu 6 skipped: idling at pc 0xffffffffb43fc02e
[268342.186201] NMI backtrace for cpu 1 skipped: idling at pc 0xffffffffb43fc02e
[268342.186244] NMI backtrace for cpu 5
[268342.186277] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G           OE   4.9.0-0.bpo.2-amd64 #1 Debian 4.9.18-1~bpo8+1
[268342.186344] Hardware name: FUJITSU  /D3401-H2, BIOS V5.0.0.12 R1.5.0 for D3401-H2x                     02/27/2017
[268342.186412] task: ffff89dea73eb000 task.stack: ffff9fbc862f0000
[268342.186450] RIP: 0010:[<ffffffffb432ac4c>] c [<ffffffffb432ac4c>] netlink_has_listeners+0xc/0x60
[268342.186512] RSP: 0018:ffff89deee543ca0  EFLAGS: 00000202
[268342.186548] RAX: 0000000000000006 RBX: ffff89d8b29b0040 RCX: 0000000000000001
[268342.186606] RDX: ffff89dabaa42980 RSI: 0000000000000001 RDI: ffff89dab19d0000
[268342.186663] RBP: ffff89d8b29b00c8 R08: 0000000000000002 R09: 0000000000000000
[268342.186722] R10: 0000000000000000 R11: 0000000000000000 R12: ffff89cf9c9a440e
[268342.186781] R13: ffff89cf9c9a43fa R14: ffff89d8b29b0040 R15: ffff89d8b29b0040
[268342.186839] FS:  0000000000000000(0000) GS:ffff89deee540000(0000) knlGS:0000000000000000
[268342.186899] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[268342.186934] CR2: 00007f41cce8b3a0 CR3: 0000000d54e9b000 CR4: 00000000003406e0
[268342.186992] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[268342.187050] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[268342.187108] Stack:
[268342.187134]  ffffffffb42df023c ffff89d464abf100c ffffffffb435ddecc fcf7eefb2401a738c
[268342.187195]  ffff89d31c1a8000c ffff89d464abf100c ffff89d464abf100c ffff89dbfd99a000c
[268342.187255]  ffffffff00000000c ffff89deee543d58c ffff89deee543d28c fcf7eefb2401a738c
[268342.187315] Call Trace:
[268342.187342]  <IRQ> d [<ffffffffb42df023>] ? __sk_free+0x73/0xa0
[268342.187378]  [<ffffffffb435ddec>] ? tcp_v4_rcv+0x8bc/0x9e0
[268342.187413]  [<ffffffffb43378bb>] ? ip_local_deliver_finish+0x8b/0x1c0
[268342.187451]  [<ffffffffb4337b8b>] ? ip_local_deliver+0x6b/0xf0
[268342.187486]  [<ffffffffb435d502>] ? tcp_v4_early_demux+0x112/0x140
[268342.187522]  [<ffffffffb4337830>] ? ip_rcv_finish+0x3e0/0x3e0
[268342.187558]  [<ffffffffb4337e91>] ? ip_rcv+0x281/0x3b0
[268342.187591]  [<ffffffffb4337450>] ? inet_del_offload+0x40/0x40
[268342.187627]  [<ffffffffb42f6cce>] ? __netif_receive_skb_core+0x2be/0xa40
[268342.187665]  [<ffffffffb42f8532>] ? process_backlog+0x92/0x140
[268342.187701]  [<ffffffffb42f7ca5>] ? net_rx_action+0x245/0x380
[268342.187736]  [<ffffffffb43ff0e6>] ? __do_softirq+0x106/0x292
[268342.187771]  [<ffffffffb3e7dbb8>] ? irq_exit+0x98/0xa0
[268342.187805]  [<ffffffffb43fee2f>] ? do_IRQ+0x4f/0xd0
[268342.187839]  [<ffffffffb43fcf42>] ? common_interrupt+0x82/0x82
[268342.187874]  <EOI> d [<ffffffffb42c23c3>] ? cpuidle_enter_state+0x113/0x260
[268342.187915]  [<ffffffffb3ebc00e>] ? cpu_startup_entry+0x17e/0x260
[268342.187952]  [<ffffffffb3e4845d>] ? start_secondary+0x14d/0x190
[268342.187989] Code: c48 cc7 cc1 c00 c93 c69 cb4 c48 cc7 cc7 cc9 ca7 c85 cb4 ce8 c4c cd8 cd4 cff c48 c83 cf8 c01 c19 cc0 c83 ce0 cf4 cc3 c66 c90 c0f c1f c44 c00 c00 cf6 c87 cc4 c02 c00 c00 c01 c<74> c45 c0f cb6 c87 c49 c01 c00 c00 c48 c89 cc2 c48 cc1 ce0 c08 c48 cc1 ce2 c04 c48 c

This in itself is already a problem, but the following error made it even worse:

Jul 27 15:18:51 matthaes-web02 dockerd[1005]: time="2017-07-27T15:18:51.017058927+02:00" level=warning msg="Unable to locate plugin: pxd, retrying in 1s"
Jul 27 15:18:52 matthaes-web02 dockerd[1005]: time="2017-07-27T15:18:52.017399004+02:00" level=warning msg="Unable to locate plugin: pxd, retrying in 2s"
Jul 27 15:18:54 matthaes-web02 dockerd[1005]: time="2017-07-27T15:18:54.017747120+02:00" level=warning msg="Unable to locate plugin: pxd, retrying in 4s"

Docker does not start because the Portworx plugin (pxd) is not available, but Portworx only starts after Docker is up.
This is a deadlock, and I was only able to resolve it by removing /var/lib/docker on the hosts.
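
To confirm which plugin dockerd is stuck waiting for, the retry warnings can be reduced to a list of plugin names. This is a rough sketch that assumes only the message format shown above; `missing_plugins` is a hypothetical helper name.

```shell
# Read dockerd log lines on stdin and print each plugin name
# that dockerd reported it could not locate, once.
missing_plugins() {
    sed -n 's/.*Unable to locate plugin: \([^,"]*\), retrying in .*/\1/p' | sort -u
}
```

Pipe in whatever holds the dockerd log on the host, for example `journalctl -u docker | missing_plugins` if dockerd logs to the systemd journal under a unit named `docker` (an assumption about the setup).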


svensp commented Jul 28, 2017

The restart block seems to be an issue with the Rancher network driver that was already resolved earlier in Docker's internal network driver. I'll move that problem there.

moby/libnetwork#813
rancher/rancher#7043


svensp commented Jul 28, 2017

Update:
The initial error seems to be btrfs related.
Kernel Version: Debian Backport linux-image-4.9.0-0.bpo.2-amd64 / 4.9.18-1~bpo8+1
Currently testing whether switching to linux-image-4.9.0-0.bpo.3-amd64 / 4.9.30-2+deb9u2~bpo8+1 helps.

[77903.023164] kernel BUG at /home/zumbi/linux-4.9.18/fs/btrfs/ctree.c:3172!
[77903.023193] invalid opcode: 0000 [#1] SMP
[77903.023216] Modules linked in: dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag fuse px(OE) cfg80211 rfkill seqiv nf_conntrack_netlink iptable_raw nfnetlink xt_mark xfrm6_mode_tunnel xfrm4_mode_tunnel esp4 drbg ansi_cprng xt_nat xt_tcpudp veth ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat br_netfilter bridge stp llc overlay nls_utf8 cifs ip_vs nf_conntrack sha256_ssse3 libcrc32c cmac crc32c_generic cpufreq_conservative cpufreq_powersave cpufreq_userspace md4 hmac ecb des_generic arc4 dns_resolver fscache intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ttm ghash_clmulni_intel intel_cstate
[77903.023603]  drm_kms_helper iTCO_wdt intel_uncore iTCO_vendor_support drm joydev evdev intel_rapl_perf serio_raw lpc_ich i2c_i801 ioatdma i2c_smbus mfd_core ipmi_si ipmi_msghandler shpchp tpm_tis tpm_tis_core tpm wmi button autofs4 ext4 crc16 jbd2 fscrypto mbcache btrfs xor hid_generic usbhid hid raid6_pq dm_mod raid1 md_mod sg sd_mod crc32c_intel ahci libahci aesni_intel libata aes_x86_64 glue_helper ehci_pci lrw gf128mul ehci_hcd ablk_helper cryptd psmouse scsi_mod igb usbcore i2c_algo_bit dca ptp usb_common pps_core
[77903.023882] CPU: 10 PID: 15218 Comm: px-storage Tainted: G           OE   4.9.0-0.bpo.2-amd64 #1 Debian 4.9.18-1~bpo8+1
[77903.023932] Hardware name: Supermicro X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 3.2.T201502251406 02/25/2015
[77903.023983] task: ffff92e6adecd000 task.stack: ffffa858c9550000
[77903.024011] RIP: 0010:[<ffffffffc0551ad2>]  [<ffffffffc0551ad2>] btrfs_set_item_key_safe+0x182/0x190 [btrfs]
[77903.024075] RSP: 0018:ffffa858c9553c48  EFLAGS: 00010246
[77903.024100] RAX: 0000000000000000 RBX: ffff92e14f8a0380 RCX: 0000000e3d65e000
[77903.024129] RDX: 0000000000000000 RSI: ffffa858c9553d66 RDI: ffffa858c9553c5f
[77903.024157] RBP: ffffa858c9553c4e R08: 0000000000000000 R09: ffff92e22f790b90
[77903.024185] R10: 0000000000001000 R11: 0000000000000000 R12: ffff92e934926000
[77903.024213] R13: 0000000000000048 R14: ffff92e22f790af0 R15: ffffa858c9553d66
[77903.024242] FS:  00007efcf3fff700(0000) GS:ffff92e93f480000(0000) knlGS:0000000000000000
[77903.024285] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[77903.024311] CR2: 000000c4215ae000 CR3: 0000000f1d6ee000 CR4: 00000000001406e0
[77903.024339] Stack:
[77903.024359]  010192e14f8a0380 006c000000000000 010000000e3d65e0 6c00000000000001
[77903.024408]  0000000e3d65e000 0000000098be65dd ffff92e14f8a0380 ffff92e22f790af0
[77903.024456]  00000000000030e3 0000000000000001 0000000e3d742000 0000000000000000
[77903.024505] Call Trace:
[77903.024539]  [<ffffffffc0591f17>] ? __btrfs_drop_extents+0xb57/0xe40 [btrfs]
[77903.024578]  [<ffffffffc0592e45>] ? btrfs_fallocate+0xc45/0x1110 [btrfs]
[77903.024608]  [<ffffffffb3000054>] ? vfs_fallocate+0x154/0x220
[77903.024635]  [<ffffffffb3000e0e>] ? SyS_fallocate+0x3e/0x60
[77903.024663]  [<ffffffffb33fc5bb>] ? system_call_fast_compare_end+0xc/0x9b
[77903.024691] Code: 7c 24 17 4c 89 fe 48 89 44 24 20 0f b6 44 24 0e 88 44 24 1f 48 8b 44 24 06 48 89 44 24 17 e8 f6 f2 ff ff 85 c0 0f 8f 3a ff ff ff <0f> 0b 0f 0b e8 05 5c 92 f2 0f 1f 44 00 00 0f 1f 44 00 00 41 55


svensp commented Jul 28, 2017

The BTRFS kernel BUG still happens with linux-image-4.9.0-0.bpo.3-amd64 / 4.9.30-2+deb9u2~bpo8+1. I am now checking the device with badblocks to make sure there is no hardware problem.

@venkatpx venkatpx self-assigned this Aug 4, 2017
prabirpaul commented

@svensp this is a kernel bug that was addressed by https://patchwork.kernel.org/patch/9431679/. To get around this issue, you would need to upgrade to kernel v4.10 or higher.
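
A host can be checked against that v4.10 threshold with a small version comparison. This is only a sketch: `kernel_at_least` is a hypothetical helper, it expects a release string of the form `MAJOR.MINOR...` as printed by `uname -r`, and the version number is just a rough indicator, since a distribution kernel may carry backported fixes.

```shell
# Succeed if the release string is at least WANT_MAJOR.WANT_MINOR.
# Arguments: release string, wanted major, wanted minor.
kernel_at_least() (
    release="$1" want_major="$2" want_minor="$3"
    major="${release%%.*}"        # text before the first dot
    rest="${release#*.}"          # text after the first dot
    minor="${rest%%[!0-9]*}"      # leading digits of the remainder
    [ "$major" -gt "$want_major" ] ||
        { [ "$major" -eq "$want_major" ] && [ "${minor:-0}" -ge "$want_minor" ]; }
)
```

Usage would be `kernel_at_least "$(uname -r)" 4 10 || echo "kernel older than 4.10"`. Both kernels tried in this issue (4.9.0-0.bpo.2-amd64 and 4.9.0-0.bpo.3-amd64) fall below the threshold.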
