PANIC at arc.c:3760:arc_adapt() #3904

Closed
ghost opened this issue Oct 9, 2015 · 8 comments

@ghost

ghost commented Oct 9, 2015

Hi!

After upgrading to version 0.6.5.2, I got this after just two days of uptime:

[89514.388761] VERIFY3((arc_stats.arcstat_c.value.ui64) >= 2ULL << 24) failed (33347628 >= 33554432)
[89514.388861] PANIC at arc.c:3760:arc_adapt()

We're running Debian 7.9 with kernel version 3.16. The error had seemingly appeared multiple times before I noticed it, while the system was handling some very heavy read workloads and probably some simultaneous random writes.

Full dmesg output:

[89514.388761] VERIFY3((arc_stats.arcstat_c.value.ui64) >= 2ULL << 24) failed (33347628 >= 33554432)
[89514.388861] PANIC at arc.c:3760:arc_adapt()
[89514.388910] Showing stack for process 30702
[89514.388919] CPU: 11 PID: 30702 Comm: qemu-system-x86 Tainted: P O 3.16.0-0.bpo.4-amd64 #1 Debian 3.16.7-ckt11-1+deb8u3bpo70+1
[89514.388923] Hardware name: Supermicro H8DGT /H8DGT , BIOS 3.0 08/31/2012
[89514.388927] 0000000000000000 ffffffffa129d0f0 ffffffff815457d4 ffffffffa12a6120
[89514.388934] ffffffffa17a90d2 0000001000000001 ffffffffa12a61f0 0000001e0000001b
[89514.388939] 0000000000000030 ffff880b722f3968 ffff880b722f3908 ffff880feb570000
[89514.388945] Call Trace:
[89514.388965] [] ? dump_stack+0x41/0x51
[89514.388978] [] ? spl_panic+0xc2/0x100 [spl]
[89514.389013] [] ? zio_vdev_io_start+0xb3/0x300 [zfs]
[89514.389031] [] ? buf_cons+0x8b/0xe0 [zfs]
[89514.389039] [] ? spl_kmem_cache_alloc+0x133/0x710 [spl]
[89514.389059] [] ? dbuf_cons+0x98/0x100 [zfs]
[89514.389066] [] ? spl_kmem_cache_alloc+0x133/0x710 [spl]
[89514.389084] [] ? arc_get_data_buf.isra.23+0x3ef/0x430 [zfs]
[89514.389102] [] ? arc_buf_alloc+0x133/0x1a0 [zfs]
[89514.389120] [] ? arc_read+0x183/0xa70 [zfs]
[89514.389139] [] ? dbuf_create+0x356/0x4e0 [zfs]
[89514.389159] [] ? dmu_buf_rele+0x20/0x20 [zfs]
[89514.389166] [] ? mutex_lock+0xe/0x2a
[89514.389186] [] ? dbuf_read+0x2d7/0x8c0 [zfs]
[89514.389209] [] ? dmu_buf_hold_array_by_dnode+0x123/0x470 [zfs]
[89514.389233] [] ? dmu_read_uio_dnode+0x40/0xd0 [zfs]
[89514.389256] [] ? dmu_read_uio_dbuf+0x3f/0x60 [zfs]
[89514.389281] [] ? zfs_read+0x140/0x3f0 [zfs]
[89514.389303] [] ? zpl_read_common_iovec+0x85/0xd0 [zfs]
[89514.389325] [] ? zpl_iter_read+0xb7/0x100 [zfs]
[89514.389335] [] ? do_iter_readv_writev+0x5b/0x90
[89514.389358] [] ? zpl_read_common_iovec+0xd0/0xd0 [zfs]
[89514.389363] [] ? do_readv_writev+0xdd/0x300
[89514.389384] [] ? zpl_read_common_iovec+0xd0/0xd0 [zfs]
[89514.389391] [] ? __wake_up_common+0x57/0x90
[89514.389399] [] ? fsnotify+0x1cc/0x260
[89514.389405] [] ? eventfd_write+0x19f/0x210
[89514.389411] [] ? SyS_preadv+0xcb/0xd0
[89514.389418] [] ? system_call_fast_compare_end+0x10/0x15
[89515.128519] VERIFY3((arc_stats.arcstat_c.value.ui64) >= 2ULL << 24) failed (33347628 >= 33554432)
[89515.128617] PANIC at arc.c:3760:arc_adapt()
[89515.128667] Showing stack for process 17109
[89515.128674] CPU: 10 PID: 17109 Comm: qemu-system-x86 Tainted: P O 3.16.0-0.bpo.4-amd64 #1 Debian 3.16.7-ckt11-1+deb8u3bpo70+1
[89515.128678] Hardware name: Supermicro H8DGT /H8DGT , BIOS 3.0 08/31/2012
[89515.128682] 0000000000000000 ffffffffa129d0f0 ffffffff815457d4 ffffffffa12a6120
[89515.128689] ffffffffa17a90d2 ffff880e7de09f00 ffffffffa12a61f0 0000000000000000
[89515.128694] 0000000000000030 ffff88156c0737d8 ffff88156c073778 0000000000000014
[89515.128700] Call Trace:
[89515.128720] [] ? dump_stack+0x41/0x51
[89515.128733] [] ? spl_panic+0xc2/0x100 [spl]
[89515.128771] [] ? __vdev_disk_physio+0x3d5/0x430 [zfs]
[89515.128789] [] ? buf_cons+0x8b/0xe0 [zfs]
[89515.128798] [] ? spl_kmem_cache_alloc+0x133/0x710 [spl]
[89515.128816] [] ? arc_get_data_buf.isra.23+0x3ef/0x430 [zfs]
[89515.128835] [] ? arc_buf_alloc+0x133/0x1a0 [zfs]
[89515.128853] [] ? arc_read+0x183/0xa70 [zfs]
[89515.128873] [] ? dmu_buf_rele+0x20/0x20 [zfs]
[89515.128893] [] ? dbuf_read+0x2d7/0x8c0 [zfs]
[89515.128900] [] ? mutex_lock+0xe/0x2a
[89515.128920] [] ? __dbuf_hold_impl+0x434/0x4c0 [zfs]
[89515.128940] [] ? dbuf_hold_impl+0x7b/0xc0 [zfs]
[89515.128960] [] ? dbuf_hold_level+0x1c/0x30 [zfs]
[89515.128986] [] ? dmu_tx_check_ioerr+0x4b/0x110 [zfs]
[89515.129011] [] ? dmu_tx_count_write+0x43f/0x730 [zfs]
[89515.129030] [] ? dbuf_read+0x6bb/0x8c0 [zfs]
[89515.129049] [] ? dbuf_rele_and_unlock+0x240/0x3d0 [zfs]
[89515.129068] [] ? dbuf_hold_impl+0x8f/0xc0 [zfs]
[89515.129074] [] ? mutex_lock+0xe/0x2a
[89515.129082] [] ? kmem_cache_alloc_node_trace+0x1e3/0x1f0
[89515.129090] [] ? spl_kmem_zalloc+0x9f/0x190 [spl]
[89515.129116] [] ? dmu_tx_hold_write+0x41/0x80 [zfs]
[89515.129141] [] ? zfs_write+0x397/0xb10 [zfs]
[89515.129147] [] ? __remove_hrtimer+0x68/0xc0
[89515.129155] [] ? update_rmtp+0x70/0x70
[89515.129177] [] ? zpl_write_common_iovec+0xb0/0x110 [zfs]
[89515.129200] [] ? zpl_write+0x89/0xc0 [zfs]
[89515.129207] [] ? vfs_write+0xc5/0x1f0
[89515.129212] [] ? SyS_pwrite64+0x9b/0xb0
[89515.129219] [] ? system_call_fast_compare_end+0x10/0x15
[89515.822585] VERIFY3((arc_stats.arcstat_c.value.ui64) >= 2ULL << 24) failed (33347628 >= 33554432)
[89515.822677] PANIC at arc.c:3760:arc_adapt()
[89515.822722] Showing stack for process 2878
[89515.822727] CPU: 12 PID: 2878 Comm: txg_sync Tainted: P O 3.16.0-0.bpo.4-amd64 #1 Debian 3.16.7-ckt11-1+deb8u3bpo70+1
[89515.822730] Hardware name: Supermicro H8DGT /H8DGT , BIOS 3.0 08/31/2012
[89515.822733] 0000000000000000 ffffffffa129d0f0 ffffffff815457d4 ffffffffa12a6120
[89515.822738] ffffffffa17a90d2 0000000000000000 ffffffffa12a61f0 0000000000000000
[89515.822744] 0000000000000030 ffff8807fee0b778 ffff8807fee0b718 0000000000000202
[89515.822749] Call Trace:
[89515.822764] [] ? dump_stack+0x41/0x51
[89515.822772] [] ? spl_panic+0xc2/0x100 [spl]
[89515.822791] [] ? buf_cons+0x8b/0xe0 [zfs]
[89515.822795] [] ? spl_kmem_cache_alloc+0x133/0x710 [spl]
[89515.822808] [] ? arc_get_data_buf.isra.23+0x3ef/0x430 [zfs]
[89515.822820] [] ? arc_buf_alloc+0x133/0x1a0 [zfs]
[89515.822832] [] ? arc_read+0x183/0xa70 [zfs]
[89515.822845] [] ? dmu_buf_rele+0x20/0x20 [zfs]
[89515.822859] [] ? dbuf_read+0x2d7/0x8c0 [zfs]
[89515.822871] [] ? dbuf_sync_indirect+0x18a/0x1a0 [zfs]
[89515.822881] [] ? dbuf_sync_list+0xca/0xf0 [zfs]
[89515.822892] [] ? dbuf_sync_indirect+0xf3/0x1a0 [zfs]
[89515.822902] [] ? dbuf_sync_list+0xca/0xf0 [zfs]
[89515.822913] [] ? dbuf_sync_indirect+0xf3/0x1a0 [zfs]
[89515.822924] [] ? dbuf_sync_list+0xca/0xf0 [zfs]
[89515.822938] [] ? dnode_sync+0x2d0/0x940 [zfs]
[89515.822949] [] ? dbuf_sync_list+0xca/0xf0 [zfs]
[89515.822963] [] ? dnode_sync+0x2d0/0x940 [zfs]
[89515.822976] [] ? dmu_objset_sync_dnodes+0xb6/0xe0 [zfs]
[89515.822989] [] ? dmu_objset_sync+0x196/0x2f0 [zfs]
[89515.822998] [] ? arc_cksum_compute.isra.20+0xe0/0xe0 [zfs]
[89515.823007] [] ? arc_evictable_memory+0x80/0x80 [zfs]
[89515.823016] [] ? l2arc_feed_thread+0xcb0/0xcb0 [zfs]
[89515.823032] [] ? dsl_dataset_sync+0x57/0xb0 [zfs]
[89515.823049] [] ? dsl_pool_sync+0x94/0x440 [zfs]
[89515.823068] [] ? spa_add+0x640/0x640 [zfs]
[89515.823086] [] ? spa_sync+0x35c/0xb30 [zfs]
[89515.823091] [] ? autoremove_wake_function+0xe/0x30
[89515.823094] [] ? __wake_up+0x48/0x70
[89515.823111] [] ? txg_sync_thread+0x3ca/0x650 [zfs]
[89515.823129] [] ? txg_fini+0x2c0/0x2c0 [zfs]
[89515.823134] [] ? thread_generic_wrapper+0x7a/0x90 [spl]
[89515.823138] [] ? __thread_create+0x160/0x160 [spl]
[89515.823143] [] ? kthread+0xc1/0xe0
[89515.823147] [] ? flush_kthread_worker+0xb0/0xb0
[89515.823151] [] ? ret_from_fork+0x58/0x90
[89515.823154] [] ? flush_kthread_worker+0xb0/0xb0
[89523.178124] VERIFY3((arc_stats.arcstat_c.value.ui64) >= 2ULL << 24) failed (33347628 >= 33554432)
[89523.178217] PANIC at arc.c:3760:arc_adapt()
[89523.178263] Showing stack for process 21712
[89523.178270] CPU: 14 PID: 21712 Comm: python Tainted: P O 3.16.0-0.bpo.4-amd64 #1 Debian 3.16.7-ckt11-1+deb8u3bpo70+1
[89523.178274] Hardware name: Supermicro H8DGT /H8DGT , BIOS 3.0 08/31/2012
[89523.178278] 0000000000000000 ffffffffa129d0f0 ffffffff815457d4 ffffffffa12a6120
[89523.178285] ffffffffa17a90d2 0000000100000001 ffffffffa12a61f0 0000000100000001
[89523.178290] 0000000000000030 ffff880b55e779e8 ffff880b55e77988 ffff88081fdd2ec0
[89523.178297] Call Trace:
[89523.178316] [] ? dump_stack+0x41/0x51
[89523.178329] [] ? spl_panic+0xc2/0x100 [spl]
[89523.178360] [] ? buf_cons+0x8b/0xe0 [zfs]
[89523.178368] [] ? spl_kmem_cache_alloc+0x133/0x710 [spl]
[89523.178387] [] ? arc_get_data_buf.isra.23+0x3ef/0x430 [zfs]
[89523.178406] [] ? arc_read+0x43f/0xa70 [zfs]
[89523.178426] [] ? dmu_buf_rele+0x20/0x20 [zfs]
[89523.178446] [] ? dbuf_read+0x2d7/0x8c0 [zfs]
[89523.178469] [] ? dmu_buf_hold+0x70/0x90 [zfs]
[89523.178500] [] ? zap_lockdir+0x75/0x8b0 [zfs]
[89523.178530] [] ? zap_cursor_retrieve+0x1e4/0x2f0 [zfs]
[89523.178563] [] ? sa_lookup+0x86/0xb0 [zfs]
[89523.178571] [] ? filldir+0x9e/0x110
[89523.178596] [] ? zfs_readdir+0x131/0x450 [zfs]
[89523.178604] [] ? handle_mm_fault+0x8f0/0x1140
[89523.178612] [] ? cache_grow+0x15e/0x240
[89523.178620] [] ? __do_page_fault+0x29a/0x540
[89523.178642] [] ? zpl_iterate+0x61/0xa0 [zfs]
[89523.178648] [] ? iterate_dir+0xc7/0x150
[89523.178654] [] ? SyS_getdents+0x98/0x120
[89523.178658] [] ? filldir64+0x110/0x110
[89523.178666] [] ? system_call_fast_compare_end+0x10/0x15
[89710.765193] VERIFY3((arc_stats.arcstat_c.value.ui64) >= 2ULL << 24) failed (33347628 >= 33554432)
[89710.765288] PANIC at arc.c:3760:arc_adapt()
[89710.765334] Showing stack for process 4503
[89710.765341] CPU: 19 PID: 4503 Comm: qemu-system-x86 Tainted: P O 3.16.0-0.bpo.4-amd64 #1 Debian 3.16.7-ckt11-1+deb8u3bpo70+1
[89710.765345] Hardware name: Supermicro H8DGT /H8DGT , BIOS 3.0 08/31/2012
[89710.765349] 0000000000000000 ffffffffa129d0f0 ffffffff815457d4 ffffffffa12a6120
[89710.765356] ffffffffa17a90d2 ffff8807cf40b8f8 ffffffffa12a61f0 00000000ffffffff
[89710.765361] ffff880700000030 ffff8807cf40b9b8 ffff8807cf40b958 ffff88070000001a
[89710.765367] Call Trace:
[89710.765387] [] ? dump_stack+0x41/0x51
[89710.765400] [] ? spl_panic+0xc2/0x100 [spl]
[89710.765440] [] ? buf_cons+0x8b/0xe0 [zfs]
[89710.765448] [] ? spl_kmem_cache_alloc+0x133/0x710 [spl]
[89710.765467] [] ? arc_get_data_buf.isra.23+0x3ef/0x430 [zfs]
[89710.765486] [] ? arc_buf_alloc+0x133/0x1a0 [zfs]
[89710.765504] [] ? arc_read+0x183/0xa70 [zfs]
[89710.765524] [] ? dmu_buf_rele+0x20/0x20 [zfs]
[89710.765544] [] ? dbuf_read+0x2d7/0x8c0 [zfs]
[89710.765551] [] ? mutex_lock+0xe/0x2a
[89710.765571] [] ? __dbuf_hold_impl+0x434/0x4c0 [zfs]
[89710.765590] [] ? __dbuf_hold_impl+0x18e/0x4c0 [zfs]
[89710.765610] [] ? dbuf_hold_impl+0x7b/0xc0 [zfs]
[89710.765630] [] ? dbuf_hold+0x1d/0x30 [zfs]
[89710.765652] [] ? dmu_buf_hold_array_by_dnode+0x102/0x470 [zfs]
[89710.765675] [] ? dmu_read_uio_dnode+0x40/0xd0 [zfs]
[89710.765699] [] ? dmu_read_uio_dbuf+0x3f/0x60 [zfs]
[89710.765725] [] ? zfs_read+0x140/0x3f0 [zfs]
[89710.765747] [] ? zpl_read_common_iovec+0x85/0xd0 [zfs]
[89710.765769] [] ? zpl_read+0x89/0xc0 [zfs]
[89710.765777] [] ? vfs_read+0xab/0x180
[89710.765781] [] ? SyS_pread64+0x9b/0xb0
[89710.765788] [] ? system_call_fast_compare_end+0x10/0x15
[89725.471373] INFO: task txg_sync:2878 blocked for more than 120 seconds.
[89725.471436] Tainted: P O 3.16.0-0.bpo.4-amd64 #1
[89725.471487] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[89725.471571] txg_sync D ffff88101fd12ec0 0 2878 2 0x00000000
[89725.471579] ffff88081c62eb20 0000000000000046 000000000000000a ffff8800dba7ece0
[89725.471585] 0000000000012ec0 ffff8807fee0bfd8 0000000000012ec0 ffff8800dba7ece0
[89725.471591] ffff880feb570000 ffffffffa12a6120 ffffffffa129d0f0 0000000000000eb0
[89725.471597] Call Trace:
[89725.471623] [] ? spl_panic+0xf5/0x100 [spl]
[89725.471660] [] ? buf_cons+0x8b/0xe0 [zfs]
[89725.471668] [] ? spl_kmem_cache_alloc+0x133/0x710 [spl]
[89725.471687] [] ? arc_get_data_buf.isra.23+0x3ef/0x430 [zfs]
[89725.471705] [] ? arc_buf_alloc+0x133/0x1a0 [zfs]
[89725.471724] [] ? arc_read+0x183/0xa70 [zfs]
[89725.471744] [] ? dmu_buf_rele+0x20/0x20 [zfs]
[89725.471764] [] ? dbuf_read+0x2d7/0x8c0 [zfs]
[89725.471785] [] ? dbuf_sync_indirect+0x18a/0x1a0 [zfs]
[89725.471805] [] ? dbuf_sync_list+0xca/0xf0 [zfs]
[89725.471825] [] ? dbuf_sync_indirect+0xf3/0x1a0 [zfs]
[89725.471844] [] ? dbuf_sync_list+0xca/0xf0 [zfs]
[89725.471864] [] ? dbuf_sync_indirect+0xf3/0x1a0 [zfs]
[89725.471884] [] ? dbuf_sync_list+0xca/0xf0 [zfs]
[89725.471910] [] ? dnode_sync+0x2d0/0x940 [zfs]
[89725.471930] [] ? dbuf_sync_list+0xca/0xf0 [zfs]
[89725.471957] [] ? dnode_sync+0x2d0/0x940 [zfs]
[89725.471981] [] ? dmu_objset_sync_dnodes+0xb6/0xe0 [zfs]
[89725.472005] [] ? dmu_objset_sync+0x196/0x2f0 [zfs]
[89725.472022] [] ? arc_cksum_compute.isra.20+0xe0/0xe0 [zfs]
[89725.472039] [] ? arc_evictable_memory+0x80/0x80 [zfs]
[89725.472056] [] ? l2arc_feed_thread+0xcb0/0xcb0 [zfs]
[89725.472085] [] ? dsl_dataset_sync+0x57/0xb0 [zfs]
[89725.472116] [] ? dsl_pool_sync+0x94/0x440 [zfs]
[89725.472150] [] ? spa_add+0x640/0x640 [zfs]
[89725.472184] [] ? spa_sync+0x35c/0xb30 [zfs]
[89725.472194] [] ? autoremove_wake_function+0xe/0x30
[89725.472200] [] ? __wake_up+0x48/0x70
[89725.472232] [] ? txg_sync_thread+0x3ca/0x650 [zfs]
[89725.472266] [] ? txg_fini+0x2c0/0x2c0 [zfs]
[89725.472275] [] ? thread_generic_wrapper+0x7a/0x90 [spl]
[89725.472283] [] ? __thread_create+0x160/0x160 [spl]
[89725.472292] [] ? kthread+0xc1/0xe0
[89725.472299] [] ? flush_kthread_worker+0xb0/0xb0
[89725.472306] [] ? ret_from_fork+0x58/0x90
[89725.472312] [] ? flush_kthread_worker+0xb0/0xb0
[89725.472335] INFO: task python:21712 blocked for more than 120 seconds.
[89725.472389] Tainted: P O 3.16.0-0.bpo.4-amd64 #1
[89725.472439] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[89725.472550] python D ffff88101fd92ec0 0 21712 1 0x00000000
[89725.472556] ffff88081c6414b0 0000000000000086 000000000000000a ffff880a177d09a0
[89725.472561] 0000000000012ec0 ffff880b55e77fd8 0000000000012ec0 ffff880a177d09a0
[89725.472566] ffff880feb570000 ffffffffa12a6120 ffffffffa129d0f0 0000000000000eb0
[89725.472572] Call Trace:
[89725.472583] [] ? spl_panic+0xf5/0x100 [spl]
[89725.472603] [] ? buf_cons+0x8b/0xe0 [zfs]
[89725.472612] [] ? spl_kmem_cache_alloc+0x133/0x710 [spl]
[89725.472630] [] ? arc_get_data_buf.isra.23+0x3ef/0x430 [zfs]
[89725.472648] [] ? arc_read+0x43f/0xa70 [zfs]
[89725.472668] [] ? dmu_buf_rele+0x20/0x20 [zfs]
[89725.472688] [] ? dbuf_read+0x2d7/0x8c0 [zfs]
[89725.472712] [] ? dmu_buf_hold+0x70/0x90 [zfs]
[89725.472742] [] ? zap_lockdir+0x75/0x8b0 [zfs]
[89725.472773] [] ? zap_cursor_retrieve+0x1e4/0x2f0 [zfs]
[89725.472806] [] ? sa_lookup+0x86/0xb0 [zfs]
[89725.472813] [] ? filldir+0x9e/0x110
[89725.472839] [] ? zfs_readdir+0x131/0x450 [zfs]
[89725.472847] [] ? handle_mm_fault+0x8f0/0x1140
[89725.472854] [] ? cache_grow+0x15e/0x240
[89725.472862] [] ? __do_page_fault+0x29a/0x540
[89725.472886] [] ? zpl_iterate+0x61/0xa0 [zfs]
[89725.472891] [] ? iterate_dir+0xc7/0x150
[89725.472897] [] ? SyS_getdents+0x98/0x120
[89725.472902] [] ? filldir64+0x110/0x110
[89725.472909] [] ? system_call_fast_compare_end+0x10/0x15
[89725.472946] INFO: task qemu-system-x86:30702 blocked for more than 120 seconds.
[89725.473028] Tainted: P O 3.16.0-0.bpo.4-amd64 #1
[89725.473079] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[89725.473161] qemu-system-x86 D ffff88101fcd2ec0 0 30702 1 0x00000000
[89725.473166] ffff88081c62f470 0000000000000082 000000000000000a ffff880ffea842d0
[89725.473171] 0000000000012ec0 ffff880b722f3fd8 0000000000012ec0 ffff880ffea842d0
[89725.473176] ffff880feb570000 ffffffffa12a6120 ffffffffa129d0f0 0000000000000eb0
[89725.473181] Call Trace:
[89725.473192] [] ? spl_panic+0xf5/0x100 [spl]
[89725.473215] [] ? zio_vdev_io_start+0xb3/0x300 [zfs]
[89725.473234] [] ? buf_cons+0x8b/0xe0 [zfs]
[89725.473242] [] ? spl_kmem_cache_alloc+0x133/0x710 [spl]
[89725.473261] [] ? dbuf_cons+0x98/0x100 [zfs]
[89725.473268] [] ? spl_kmem_cache_alloc+0x133/0x710 [spl]
[89725.473286] [] ? arc_get_data_buf.isra.23+0x3ef/0x430 [zfs]
[89725.473304] [] ? arc_buf_alloc+0x133/0x1a0 [zfs]
[89725.473322] [] ? arc_read+0x183/0xa70 [zfs]
[89725.473340] [] ? dbuf_create+0x356/0x4e0 [zfs]
[89725.473360] [] ? dmu_buf_rele+0x20/0x20 [zfs]
[89725.473365] [] ? mutex_lock+0xe/0x2a
[89725.473385] [] ? dbuf_read+0x2d7/0x8c0 [zfs]
[89725.473409] [] ? dmu_buf_hold_array_by_dnode+0x123/0x470 [zfs]
[89725.473432] [] ? dmu_read_uio_dnode+0x40/0xd0 [zfs]
[89725.473455] [] ? dmu_read_uio_dbuf+0x3f/0x60 [zfs]
[89725.473479] [] ? zfs_read+0x140/0x3f0 [zfs]
[89725.473501] [] ? zpl_read_common_iovec+0x85/0xd0 [zfs]
[89725.473524] [] ? zpl_iter_read+0xb7/0x100 [zfs]
[89725.473533] [] ? do_iter_readv_writev+0x5b/0x90
[89725.473555] [] ? zpl_read_common_iovec+0xd0/0xd0 [zfs]
[89725.473561] [] ? do_readv_writev+0xdd/0x300
[89725.473582] [] ? zpl_read_common_iovec+0xd0/0xd0 [zfs]
[89725.473588] [] ? __wake_up_common+0x57/0x90
[89725.473596] [] ? fsnotify+0x1cc/0x260
[89725.473602] [] ? eventfd_write+0x19f/0x210
[89725.473608] [] ? SyS_preadv+0xcb/0xd0
[89725.473614] [] ? system_call_fast_compare_end+0x10/0x15
[89725.473620] INFO: task qemu-system-x86:17238 blocked for more than 120 seconds.
[89725.473701] Tainted: P O 3.16.0-0.bpo.4-amd64 #1
[89725.473773] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[89725.473855] qemu-system-x86 D ffff88201ed52ec0 0 17238 1 0x00000000
[89725.473860] ffff88081c67f5f0 0000000000000086 ffff88101cbdfb18 ffff8800b03ba1d0
[89725.473865] 0000000000012ec0 ffff8803e8933fd8 0000000000012ec0 ffff8800b03ba1d0
[89725.473870] 0000000000000202 ffff88101cbdfb68 ffff88101cbdfa20 ffff88101cbdfb70
[89725.473875] Call Trace:
[89725.473885] [] ? cv_wait_common+0xf5/0x130 [spl]
[89725.473890] [] ? __wake_up_sync+0x20/0x20
[89725.473922] [] ? txg_wait_open+0xbb/0x100 [zfs]
[89725.473947] [] ? dmu_tx_wait+0x392/0x3a0 [zfs]
[89725.473973] [] ? dmu_tx_assign+0x96/0x540 [zfs]
[89725.473998] [] ? zfs_write+0x3b1/0xb10 [zfs]
[89725.474004] [] ? __remove_hrtimer+0x68/0xc0
[89725.474012] [] ? update_rmtp+0x70/0x70
[89725.474034] [] ? zpl_write_common_iovec+0xb0/0x110 [zfs]
[89725.474057] [] ? zpl_write+0x89/0xc0 [zfs]
[89725.474063] [] ? vfs_write+0xc5/0x1f0
[89725.474067] [] ? SyS_pwrite64+0x9b/0xb0
[89725.474073] [] ? system_call_fast_compare_end+0x10/0x15
[89725.474078] INFO: task qemu-system-x86:17253 blocked for more than 120 seconds.
[89725.474160] Tainted: P O 3.16.0-0.bpo.4-amd64 #1
[89725.474210] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[89725.474292] qemu-system-x86 D ffff88181fd12ec0 0 17253 1 0x00000000
[89725.474297] ffff88081c65b530 0000000000000086 ffff88101cbdfb18 ffff881ffd402010
[89725.474303] 0000000000012ec0 ffff8819ea2dffd8 0000000000012ec0 ffff881ffd402010
[89725.474308] 0000000000000202 ffff88101cbdfb68 ffff88101cbdfa20 ffff88101cbdfb70
[89725.474313] Call Trace:
[89725.474322] [] ? cv_wait_common+0xf5/0x130 [spl]
[89725.474327] [] ? __wake_up_sync+0x20/0x20
[89725.474359] [] ? txg_wait_open+0xbb/0x100 [zfs]
[89725.474385] [] ? dmu_tx_wait+0x392/0x3a0 [zfs]
[89725.474410] [] ? dmu_tx_assign+0x96/0x540 [zfs]
[89725.474435] [] ? zfs_write+0x3b1/0xb10 [zfs]
[89725.474440] [] ? __remove_hrtimer+0x68/0xc0
[89725.474448] [] ? update_rmtp+0x70/0x70
[89725.474470] [] ? zpl_write_common_iovec+0xb0/0x110 [zfs]
[89725.474492] [] ? zpl_write+0x89/0xc0 [zfs]
[89725.474497] [] ? vfs_write+0xc5/0x1f0
[89725.474502] [] ? SyS_pwrite64+0x9b/0xb0
[89725.474508] [] ? system_call_fast_compare_end+0x10/0x15
[89725.474514] INFO: task qemu-system-x86:11716 blocked for more than 120 seconds.
[89725.474595] Tainted: P O 3.16.0-0.bpo.4-amd64 #1
[89725.474645] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[89725.474727] qemu-system-x86 D ffff88181fd52ec0 0 11716 1 0x00000000
[89725.474732] ffff88081c65abe0 0000000000000086 ffff88101cbdfb18 ffff8807fe1eeba0
[89725.474737] 0000000000012ec0 ffff8802d238bfd8 0000000000012ec0 ffff8807fe1eeba0
[89725.474742] 0000000000000202 ffff88101cbdfb68 ffff88101cbdfa20 ffff88101cbdfb70
[89725.474747] Call Trace:
[89725.474756] [] ? cv_wait_common+0xf5/0x130 [spl]
[89725.474762] [] ? __wake_up_sync+0x20/0x20
[89725.474793] [] ? txg_wait_open+0xbb/0x100 [zfs]
[89725.474818] [] ? dmu_tx_wait+0x392/0x3a0 [zfs]
[89725.474844] [] ? dmu_tx_assign+0x96/0x540 [zfs]
[89725.474869] [] ? zfs_write+0x3b1/0xb10 [zfs]
[89725.474878] [] ? native_sched_clock+0x2d/0x80
[89725.474884] [] ? sched_clock+0x5/0x10
[89725.474889] [] ? __schedule+0x2de/0x770
[89725.474894] [] ? __remove_hrtimer+0x68/0xc0
[89725.474916] [] ? zpl_write_common_iovec+0xb0/0x110 [zfs]
[89725.474938] [] ? zpl_iter_write+0xb7/0x100 [zfs]
[89725.474945] [] ? do_iter_readv_writev+0x5b/0x90
[89725.474968] [] ? zpl_write_common_iovec+0x110/0x110 [zfs]
[89725.474972] [] ? do_readv_writev+0xdd/0x300
[89725.474994] [] ? zpl_write_common_iovec+0x110/0x110 [zfs]
[89725.474999] [] ? __wake_up_common+0x57/0x90
[89725.475020] [] ? zpl_read+0xc0/0xc0 [zfs]
[89725.475026] [] ? fsnotify+0x1cc/0x260
[89725.475031] [] ? eventfd_write+0x19f/0x210
[89725.475038] [] ? SyS_pwritev+0xcb/0xd0
[89725.475044] [] ? system_call_fast_compare_end+0x10/0x15
[89725.475050] INFO: task qemu-system-x86:15748 blocked for more than 120 seconds.
[89725.475189] Tainted: P O 3.16.0-0.bpo.4-amd64 #1
[89725.475243] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[89725.475331] qemu-system-x86 D ffff88181fc12ec0 0 15748 1 0x00000000
[89725.475366] ffff88081c640210 0000000000000086 ffff88101cbdfb18 ffff880b8d61e3d0
[89725.475376] 0000000000012ec0 ffff880f5718bfd8 0000000000012ec0 ffff880b8d61e3d0
[89725.475382] 0000000000000202 ffff88101cbdfb68 ffff88101cbdfa20 ffff88101cbdfb70
[89725.475387] Call Trace:
[89725.475397] [] ? cv_wait_common+0xf5/0x130 [spl]
[89725.475403] [] ? __wake_up_sync+0x20/0x20
[89725.475434] [] ? txg_wait_open+0xbb/0x100 [zfs]
[89725.475459] [] ? dmu_tx_wait+0x392/0x3a0 [zfs]
[89725.475485] [] ? dmu_tx_assign+0x96/0x540 [zfs]
[89725.475509] [] ? zfs_write+0x3b1/0xb10 [zfs]
[89725.475529] [] ? dbuf_dirty+0x456/0x9a0 [zfs]
[89725.475534] [] ? mutex_lock+0xe/0x2a
[89725.475542] [] ? set_next_entity+0x3a/0x80
[89725.475548] [] ? __schedule+0x2de/0x770
[89725.475553] [] ? __remove_hrtimer+0x68/0xc0
[89725.475575] [] ? zpl_write_common_iovec+0xb0/0x110 [zfs]
[89725.475597] [] ? zpl_iter_write+0xb7/0x100 [zfs]
[89725.475605] [] ? do_iter_readv_writev+0x5b/0x90
[89725.475627] [] ? zpl_write_common_iovec+0x110/0x110 [zfs]
[89725.475631] [] ? do_readv_writev+0xdd/0x300
[89725.475653] [] ? zpl_write_common_iovec+0x110/0x110 [zfs]
[89725.475658] [] ? __wake_up_common+0x57/0x90
[89725.475679] [] ? zpl_read+0xc0/0xc0 [zfs]
[89725.475685] [] ? fsnotify+0x1cc/0x260
[89725.475690] [] ? eventfd_write+0x19f/0x210
[89725.475709] [] ? SyS_pwritev+0xcb/0xd0
[89725.475736] [] ? system_call_fast_compare_end+0x10/0x15
[89725.475755] INFO: task qemu-system-x86:17105 blocked for more than 120 seconds.
[89725.475841] Tainted: P O 3.16.0-0.bpo.4-amd64 #1
[89725.475893] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[89725.475975] qemu-system-x86 D ffff88081fc92ec0 0 17105 1 0x00000000
[89725.475981] ffff88081c6033b0 0000000000000082 ffff88101cbdfb18 ffff8817f69d33b0
[89725.475986] 0000000000012ec0 ffff8811740abfd8 0000000000012ec0 ffff8817f69d33b0
[89725.475991] 0000000000000202 ffff88101cbdfb68 ffff88101cbdfa20 ffff88101cbdfb70
[89725.475996] Call Trace:
[89725.476005] [] ? cv_wait_common+0xf5/0x130 [spl]
[89725.476011] [] ? __wake_up_sync+0x20/0x20
[89725.476042] [] ? txg_wait_open+0xbb/0x100 [zfs]
[89725.476068] [] ? dmu_tx_wait+0x392/0x3a0 [zfs]
[89725.476094] [] ? dmu_tx_assign+0x96/0x540 [zfs]
[89725.476141] [] ? zfs_write+0x3b1/0xb10 [zfs]
[89725.476168] [] ? __remove_hrtimer+0x68/0xc0
[89725.476197] [] ? update_rmtp+0x70/0x70
[89725.476241] [] ? zpl_write_common_iovec+0xb0/0x110 [zfs]
[89725.476285] [] ? zpl_write+0x89/0xc0 [zfs]
[89725.476304] [] ? vfs_write+0xc5/0x1f0
[89725.476309] [] ? SyS_pwrite64+0x9b/0xb0
[89725.476315] [] ? system_call_fast_compare_end+0x10/0x15
[89725.476320] INFO: task qemu-system-x86:17109 blocked for more than 120 seconds.
[89725.476401] Tainted: P O 3.16.0-0.bpo.4-amd64 #1
[89725.476452] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[89725.476533] qemu-system-x86 D ffff88101fc92ec0 0 17109 1 0x00000000
[89725.476538] ffff88081c616190 0000000000000082 000000000000000a ffff8817fa8dab60
[89725.476543] 0000000000012ec0 ffff88156c073fd8 0000000000012ec0 ffff8817fa8dab60
[89725.476548] ffff880feb570000 ffffffffa12a6120 ffffffffa129d0f0 0000000000000eb0
[89725.476553] Call Trace:
[89725.476564] [] ? spl_panic+0xf5/0x100 [spl]
[89725.476595] [] ? __vdev_disk_physio+0x3d5/0x430 [zfs]
[89725.476635] [] ? buf_cons+0x8b/0xe0 [zfs]
[89725.476664] [] ? spl_kmem_cache_alloc+0x133/0x710 [spl]
[89725.476703] [] ? arc_get_data_buf.isra.23+0x3ef/0x430 [zfs]
[89725.476742] [] ? arc_buf_alloc+0x133/0x1a0 [zfs]
[89725.476781] [] ? arc_read+0x183/0xa70 [zfs]
[89725.476820] [] ? dmu_buf_rele+0x20/0x20 [zfs]
[89725.476840] [

@ryao
Contributor

ryao commented Oct 9, 2015

Thanks for your report. This is caused by 121b3ca. @dweeezil appears to have missed a case where arc_c can drop below 2ULL << SPA_MAXBLOCKSHIFT. As long as you are not setting recordsize above 128K, you can probably safely delete the VERIFY statement and recompile. The actual fix will not be quite as simple though. I do not expect anyone to do it this week.
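
For context, a quick sanity check of the numbers in the failed VERIFY, as a standalone sketch: SPA_MAXBLOCKSHIFT is 24 in this build (a 16 MiB maximum block size), so the floor being enforced on arc_c works out to 2 * 16 MiB = 32 MiB, and the arc_c value from the panic is roughly 200 KiB below it.

/*
 * Sanity check of the values printed by the VERIFY: the floor is
 * 2 * maxblocksize = 33554432 bytes (32 MiB); the observed arc_c
 * of 33347628 bytes falls just short of it.
 */
#include <stdio.h>
#include <stdint.h>

#define SPA_MAXBLOCKSHIFT	24	/* 1 << 24 = 16 MiB maximum block size */

int
main(void)
{
	uint64_t arc_floor = 2ULL << SPA_MAXBLOCKSHIFT;	/* 33554432 */
	uint64_t observed = 33347628ULL;		/* arc_c from the log */

	printf("floor    = %llu bytes (%.1f MiB)\n",
	    (unsigned long long)arc_floor, arc_floor / (1024.0 * 1024.0));
	printf("observed = %llu bytes, %llu bytes below the floor\n",
	    (unsigned long long)observed,
	    (unsigned long long)(arc_floor - observed));
	return (0);
}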

@dweeezil
Contributor

dweeezil commented Oct 9, 2015

I think we may need to modify arc_shrink() to not let arc_c fall too low. There are some other ASSERTs related to this which would also likely be triggered in a debug build were this to happen.

@dweeezil
Contributor

dweeezil commented Oct 9, 2015

As an aside, I can't imagine that a system on which arc_c fell so low would have very pleasant performance.

@ghost
Author

ghost commented Oct 13, 2015

Thanks for the information. I found a bug in our configuration: I had set the maximum ARC size via module parameters, but for some unknown reason the minimum size had been left commented out, which (I guess) allowed the ARC size to be reduced below 32 MB under high memory pressure from the virtual machines.
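
For reference, the two module parameters involved are zfs_arc_max and zfs_arc_min; pinning both (for example in /etc/modprobe.d/zfs.conf) keeps the minimum from being left unset. The values below are purely illustrative:

# /etc/modprobe.d/zfs.conf -- example values only (8 GiB max, 1 GiB min)
options zfs zfs_arc_max=8589934592 zfs_arc_min=1073741824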

@ghost ghost closed this as completed Oct 13, 2015
@behlendorf
Contributor

@tienju thanks for following up, I'm going to reopen this issue. We shouldn't allow the system to panic like this regardless of how you've tuned the system. As a short-term fix for a point release, let's just change this from a VERIFY to an ASSERT and impose a hard floor here. @dweeezil @ryao are you OK with this?

diff --git a/module/zfs/arc.c b/module/zfs/arc.c
index b759e64..fa1434e 100644
--- a/module/zfs/arc.c
+++ b/module/zfs/arc.c
@@ -3757,7 +3757,8 @@ arc_adapt(int bytes, arc_state_t *state)
         * If we're within (2 * maxblocksize) bytes of the target
         * cache size, increment the target cache size
         */
-       VERIFY3U(arc_c, >=, 2ULL << SPA_MAXBLOCKSHIFT);
+       ASSERT3U(arc_c, >=, 2ULL << SPA_MAXBLOCKSHIFT);
+       arc_c = MAX(arc_c, 2ULL << SPA_MAXBLOCKSHIFT);
        if (arc_size >= arc_c - (2ULL << SPA_MAXBLOCKSHIFT)) {
                atomic_add_64(&arc_c, (int64_t)bytes);
                if (arc_c > arc_c_max)

@behlendorf behlendorf reopened this Oct 13, 2015
@behlendorf
Contributor

Oh, and I also wanted to comment that while arc_c is updated atomically, there's no locking around some of this logic. So it may not be that a case was overlooked so much as that two updates ran concurrently and pushed arc_c past the limit.

@behlendorf behlendorf added this to the 0.6.5.3 milestone Oct 13, 2015
behlendorf added a commit to behlendorf/zfs that referenced this issue Oct 13, 2015
Strictly enforce keeping 'arc_c >= arc_c_min'.  The ASSERTs are
left in place to catch this in a debug build but logic has been
added to gracefully handle in a production build.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#3904
@dweeezil
Contributor

@behlendorf Your explanation of concurrent updates, likely from the kernel-called shrinker and the ARC reclaim thread, makes sense. The 935434e commit (already committed, I see) seems reasonable for the time being. However, since arc_shrink() is being called from multiple threads and, therefore, arc_c is being modified from multiple threads, there seem to be a bunch of other related issues involving checking and then setting arc_c (TOCTOU bugs, I suppose). We'll probably want to add an update helper (maybe a macro similar to ARCSTAT_MAX()) which uses atomic_cas_64() to update it more safely.
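
A minimal user-space sketch of the kind of helper being suggested (not the actual patch; arc_c_shrink_floor() is a hypothetical name, and arc_c and atomic_cas_64() below are stand-ins for the kernel symbols): lower arc_c through a compare-and-swap loop so that a concurrent update forces a retry instead of racing past the check.

#include <stdio.h>
#include <stdint.h>

static volatile uint64_t arc_c = 64ULL << 20;	/* stand-in for the ARC target size */

/* user-space stand-in for the SPL/illumos primitive of the same name */
static uint64_t
atomic_cas_64(volatile uint64_t *target, uint64_t cmp, uint64_t newval)
{
	return (__sync_val_compare_and_swap(target, cmp, newval));
}

/* Shrink arc_c by 'to_free', but never below 'floor'. */
static void
arc_c_shrink_floor(uint64_t to_free, uint64_t floor)
{
	uint64_t c, newc;

	do {
		c = arc_c;			/* snapshot the current target */
		if (c <= floor)
			return;			/* already at (or below) the floor */
		newc = (c - floor > to_free) ? c - to_free : floor;
		/* retry if another thread changed arc_c since the snapshot */
	} while (atomic_cas_64(&arc_c, c, newc) != c);
}

int
main(void)
{
	arc_c_shrink_floor(48ULL << 20, 32ULL << 20);	/* ask for 48 MiB back */
	printf("arc_c = %llu MiB\n", (unsigned long long)(arc_c >> 20));	/* clamps at 32 */
	return (0);
}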

@behlendorf
Contributor

Yes, I jumped the gun a bit. Let's call it a placeholder until we can address this concurrent update properly. I think a helper macro could be a nice way to go.

@behlendorf behlendorf modified the milestones: 0.7.0, 0.6.5.3 Oct 15, 2015
dweeezil added a commit to dweeezil/zfs that referenced this issue Jan 5, 2016
Since arc_c can be updated concurrently from multiple threads, there's a
race condition between any check of its value and the subsequent updating
of it.  This patch updates its value using atomic_cas_64() and re-tries
the test if it has been changed by another thread.

Fixes: openzfs#3904
Fixes: openzfs#4161
dweeezil added a commit to dweeezil/zfs that referenced this issue Jan 5, 2016
Since arc_c can be updated concurrently from multiple threads, there's a
race condition between any check of its value and the subsequent updating
of it.  This patch updates its value using atomic_cas_64() and re-tries
the test if it has been changed by another thread.

Reverts: 935434e
Fixes: openzfs#3904
Fixes: openzfs#4161
dweeezil pushed a commit to dweeezil/zfs that referenced this issue Jan 5, 2016
Since arc_c can be updated concurrently from multiple threads, there's a
race condition between any check of its value and the subsequent updating
of it.  This patch updates it under a mutex.

Reverts: 935434e
Fixes: openzfs#3904
Fixes: openzfs#4161
dweeezil pushed a commit to dweeezil/zfs that referenced this issue Jan 10, 2016
The arc_c value can be updated concurrently by multiple threads including
the arc reclaim thread, kswapd and also user processes.  This patch
updates it under a mutex to close the race between the checking of its
value and subsequent updates.

Reverts: 935434e
Fixes: openzfs#3904
Fixes: openzfs#4161
dweeezil pushed a commit to dweeezil/zfs that referenced this issue Jan 10, 2016
The arc_c value can be updated concurrently by multiple threads including
the arc reclaim thread, kswapd, kthreadd and user processes.  This patch
updates it under a mutex to close the race between the checking of its
value and subsequent updates.

Reverts: 935434e
Fixes: openzfs#3904
Fixes: openzfs#4161
dweeezil pushed a commit to dweeezil/zfs that referenced this issue Jan 11, 2016
The arc_c value can be updated concurrently by multiple threads including
the arc reclaim thread, kswapd, kthreadd and user processes.  This patch
updates it under a mutex to close the race between the checking of its
value and subsequent updates.

Reverts: 935434e
Fixes: openzfs#3904
Fixes: openzfs#4161
dweeezil pushed a commit to dweeezil/zfs that referenced this issue Jan 13, 2016
The arc_c value can be updated concurrently by multiple threads including
the arc reclaim thread, kswapd, kthreadd and user processes.  This patch
updates it under a mutex to close the race between the checking of its
value and subsequent updates.

Reverts: 935434e
Fixes: openzfs#3904
Fixes: openzfs#4161
kernelOfTruth pushed a commit to kernelOfTruth/zfs that referenced this issue Jan 16, 2016
The arc_c value can be updated concurrently by multiple threads including
the arc reclaim thread, kswapd, kthreadd and user processes.  This patch
updates it under a mutex to close the race between the checking of its
value and subsequent updates.

Reverts: 935434e
Fixes: openzfs#3904
Fixes: openzfs#4161
kernelOfTruth pushed a commit to kernelOfTruth/zfs that referenced this issue Jan 16, 2016
The arc_c value can be updated concurrently by multiple threads including
the arc reclaim thread, kswapd, kthreadd and user processes.  This patch
updates it under a mutex to close the race between the checking of its
value and subsequent updates.

Reverts: 935434e
Fixes: openzfs#3904
Fixes: openzfs#4161
dweeezil pushed a commit to dweeezil/zfs that referenced this issue Jan 16, 2016
The arc_c value can be updated concurrently by multiple threads including
the arc reclaim thread, kswapd, kthreadd and user processes.  This patch
updates it under a mutex to close the race between the checking of its
value and subsequent updates.

Also, since tunables can cause arc_c_min and arc_c_max to change at any
time, they are also updated under the mutex.

Reverts: 935434e
Fixes: openzfs#3904
Fixes: openzfs#4161
kernelOfTruth pushed a commit to kernelOfTruth/zfs that referenced this issue Jan 16, 2016
The arc_c value can be updated concurrently by multiple threads including
the arc reclaim thread, kswapd, kthreadd and user processes.  This patch
updates it under a mutex to close the race between the checking of its
value and subsequent updates.

Reverts: 935434e
Fixes: openzfs#3904
Fixes: openzfs#4161
dweeezil pushed a commit to dweeezil/zfs that referenced this issue Jan 20, 2016
Since arc_c can be updated concurrently from multiple threads, there's a
race condition between any check of its value and the subsequent updating
of it.  This patch updates it under a mutex.

Reverts: 935434e
Fixes: openzfs#3904
Fixes: openzfs#4161
dweeezil pushed a commit to dweeezil/zfs that referenced this issue Jan 21, 2016
Since arc_c can be updated concurrently from multiple threads, there's a
race condition between any check of its value and the subsequent updating
of it.  This patch updates it under a mutex.

Reverts: 935434e
Fixes: openzfs#3904
Fixes: openzfs#4161
dweeezil added a commit to dweeezil/zfs that referenced this issue Jan 22, 2016
Adjusting arc_c directly is racy because it can happen in the context
of multiple threads.  It should always be >= 2 * maxblocksize.  Set it
to a known valid value rather than adjusting it directly.

Reverts: 935434e
Fixes: openzfs#3904
Fixes: openzfs#4161
dweeezil added a commit to dweeezil/zfs that referenced this issue Jan 22, 2016
Adjusting arc_c directly is racy because it can happen in the context
of multiple threads.  It should always be >= 2 * maxblocksize.  Set it
to a known valid value rather than adjusting it directly.

Reverts: 935434e
Fixes: openzfs#3904
Fixes: openzfs#4161
kernelOfTruth pushed a commit to kernelOfTruth/zfs that referenced this issue Jan 22, 2016
The arc_c value can be updated concurrently by multiple threads including
the arc reclaim thread, kswapd, kthreadd and user processes.  This patch
updates it under a mutex to close the race between the checking of its
value and subsequent updates.

Reverts: 935434e
Fixes: openzfs#3904
Fixes: openzfs#4161
kernelOfTruth pushed a commit to kernelOfTruth/zfs that referenced this issue Jan 22, 2016
Adjusting arc_c directly is racy because it can happen in the context
of multiple threads.  It should always be >= 2 * maxblocksize.  Set it
to a known valid value rather than adjusting it directly.

Reverts: 935434e
Fixes: openzfs#3904
Fixes: openzfs#4161
kernelOfTruth pushed a commit to kernelOfTruth/zfs that referenced this issue Jan 23, 2016
Adjusting arc_c directly is racy because it can happen in the context
of multiple threads.  It should always be >= 2 * maxblocksize.  Set it
to a known valid value rather than adjusting it directly.

Reverts: 935434e
Fixes: openzfs#3904
Fixes: openzfs#4161
kernelOfTruth pushed a commit to kernelOfTruth/zfs that referenced this issue Jan 24, 2016
Adjusting arc_c directly is racy because it can happen in the context
of multiple threads.  It should always be >= 2 * maxblocksize.  Set it
to a known valid value rather than adjusting it directly.

Reverts: 935434e
Fixes: openzfs#3904
Fixes: openzfs#4161
behlendorf pushed a commit that referenced this issue Jan 29, 2016
Adjusting arc_c directly is racy because it can happen in the context
of multiple threads.  It should always be >= 2 * maxblocksize.  Set it
to a known valid value rather than adjusting it directly.

In addition refactor arc_shrink() to a simpler structure, protect against
underflow in the calculation of the new arc_c value.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reverts: 935434e
Closes: #3904
Closes: #4161
goulvenriou pushed a commit to Alyseo/zfs that referenced this issue Feb 3, 2016
Adjusting arc_c directly is racy because it can happen in the context
of multiple threads.  It should always be >= 2 * maxblocksize.  Set it
to a known valid value rather than adjusting it directly.

In addition refactor arc_shrink() to a simpler structure, protect against
underflow in the calculation of the new arc_c value.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reverts: 935434e
Closes: openzfs#3904
Closes: openzfs#4161
Flefebvre pushed a commit to Flefebvre/zfs that referenced this issue Feb 3, 2016
Adjusting arc_c directly is racy because it can happen in the context
of multiple threads.  It should always be >= 2 * maxblocksize.  Set it
to a known valid value rather than adjusting it directly.

In addition refactor arc_shrink() to a simpler structure, protect against
underflow in the calculation of the new arc_c value.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reverts: 935434e
Closes: openzfs#3904
Closes: openzfs#4161
Flefebvre pushed a commit to Flefebvre/zfs that referenced this issue Feb 3, 2016
Adjusting arc_c directly is racy because it can happen in the context
of multiple threads.  It should always be >= 2 * maxblocksize.  Set it
to a known valid value rather than adjusting it directly.

In addition refactor arc_shrink() to a simpler structure, protect against
underflow in the calculation of the new arc_c value.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reverts: 935434e
Closes: openzfs#3904
Closes: openzfs#4161
Flefebvre pushed a commit to Flefebvre/zfs that referenced this issue Feb 3, 2016
Adjusting arc_c directly is racy because it can happen in the context
of multiple threads.  It should always be >= 2 * maxblocksize.  Set it
to a known valid value rather than adjusting it directly.

In addition refactor arc_shrink() to a simpler structure, protect against
underflow in the calculation of the new arc_c value.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reverts: 935434e
Closes: openzfs#3904
Closes: openzfs#4161
tuxoko pushed a commit to tuxoko/zfs that referenced this issue Feb 8, 2016
Adjusting arc_c directly is racy because it can happen in the context
of multiple threads.  It should always be >= 2 * maxblocksize.  Set it
to a known valid value rather than adjusting it directly.

In addition refactor arc_shrink() to a simpler structure, protect against
underflow in the calculation of the new arc_c value.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reverts: 935434e
Closes: openzfs#3904
Closes: openzfs#4161
@behlendorf behlendorf modified the milestones: 0.6.5.5, 0.7.0 Mar 23, 2016