
(kmem.h:91:sanitize_flags()) FATAL allocation for task txg_sync (5947) which used GFP flags 0x79123a54 with PF_NOFS set #1268

Closed
nedbass opened this issue Feb 6, 2013 · 6 comments
@nedbass (Contributor) commented Feb 6, 2013

Ran into this trying to upgrade a pool from version=15.

[root@fedora-18-amd64 ~]# zpool create -f -o version=15 tank /tmp/a
[root@fedora-18-amd64 ~]# zpool upgrade tank
This system supports ZFS pool feature flags.


Message from syslogd@fedora-18-amd64 at Feb  6 00:47:50 ...
 kernel:[25810.183422] SPLError: 5947:0:(kmem.h:91:sanitize_flags()) FATAL allocation for task txg_sync (5947) which used GFP flags 0x79123a54 with PF_NOFS set

Message from syslogd@fedora-18-amd64 at Feb  6 00:47:50 ...
 kernel:[25810.183692] SPLError: 5947:0:(kmem.h:91:sanitize_flags()) SPL PANIC
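For context, PF_NOFS is an SPL-private process flag (the SPL repurposes an unused kernel PF_ bit) that a thread such as txg_sync sets to declare it must never re-enter filesystem code. Below is a minimal sketch of the kind of check sanitize_flags() in kmem.h performs; assumed, simplified logic, not the verbatim SPL source:

/*
 * Sketch only. If a PF_NOFS thread allocates with __GFP_FS set,
 * direct reclaim could enter writeback and call back into ZFS,
 * deadlocking txg_sync against itself.
 */
static inline gfp_t
sanitize_flags_sketch(struct task_struct *p, gfp_t flags)
{
	if (unlikely((p->flags & PF_NOFS) && (flags & __GFP_FS))) {
		/*
		 * Debug SPL builds treat this as fatal (the SPL PANIC
		 * above); non-debug builds log "Fixing allocation for
		 * task ..." and strip the offending bit instead.
		 */
		return (flags & ~__GFP_FS);
	}
	return (flags);
}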
@behlendorf (Contributor)

Did you happen to grab the stack from the console? It will include the offending function, whose flags need to be fixed. If not, I'll just reproduce it in a VM.

@nedbass (Contributor, Author) commented Feb 6, 2013

Yeah, it was late, so I just did a quick cut-and-paste from my shell window.

[25810.183422] SPLError: 5947:0:(kmem.h:91:sanitize_flags()) FATAL allocation for task txg_sync (5947) which used GFP flags 0x79123a54 with PF_NOFS set
[25810.183692] SPLError: 5947:0:(kmem.h:91:sanitize_flags()) SPL PANIC
[25810.183871] SPL: Showing stack for process 5947
[25810.183873] Pid: 5947, comm: txg_sync Tainted: PF          O 3.7.5-201.fc18.x86_64 #1
[25810.183875] Call Trace:
[25810.183889]  [<ffffffffa01714d7>] spl_debug_dumpstack+0x27/0x40 [spl]
[25810.183893]  [<ffffffffa0172aa1>] spl_debug_bug+0x81/0xe0 [spl]
[25810.183899]  [<ffffffffa018939f>] sanitize_flags.part.6+0x66/0xcc7 [spl]
[25810.183903]  [<ffffffffa0176a87>] kmem_alloc_debug+0x307/0x420 [spl]
[25810.183928]  [<ffffffffa02be899>] dsl_dir_open_spa+0x69/0x840 [zfs]
[25810.183940]  [<ffffffffa0274c16>] ? dbuf_create_bonus+0x96/0x160 [zfs]
[25810.183954]  [<ffffffffa028df76>] dmu_objset_find_spa+0x56/0x6a0 [zfs]
[25810.183971]  [<ffffffffa02bf3a0>] ? dsl_dir_set_quota+0x140/0x140 [zfs]
[25810.183992]  [<ffffffffa031932d>] ? zap_add+0xed/0x1a0 [zfs]
[25810.184014]  [<ffffffffa02c2247>] dsl_pool_upgrade_dir_clones+0xe7/0x2e0 [zfs]
[25810.184065]  [<ffffffffa02f89ae>] ? txg_list_remove+0x6e/0xe0 [zfs]
[25810.184086]  [<ffffffffa02e1eff>] spa_sync+0x7df/0xd10 [zfs]
[25810.184100]  [<ffffffff81044b91>] ? pvclock_clocksource_read+0x61/0xf0
[25810.184118]  [<ffffffff810aea0c>] ? ktime_get_ts+0x4c/0xf0
[25810.184129]  [<ffffffffa02f8011>] txg_sync_thread+0x311/0x640 [zfs]
[25810.184149]  [<ffffffffa02f7d00>] ? txg_fini+0x3b0/0x3b0 [zfs]
[25810.184154]  [<ffffffffa017b8c1>] thread_generic_wrapper+0x81/0xe0 [spl]
[25810.184169]  [<ffffffffa017b840>] ? __thread_create+0x3a0/0x3a0 [spl]
[25810.184176]  [<ffffffff81081dd0>] kthread+0xc0/0xd0
[25810.184189]  [<ffffffff81010000>] ? ftrace_raw_event_xen_mmu_flush_tlb_others+0x50/0xe0
[25810.184189]  [<ffffffff81081d10>] ? kthread_create_on_node+0x120/0x120
[25810.184198]  [<ffffffff8163e0ec>] ret_from_fork+0x7c/0xb0
[25810.184198]  [<ffffffff81081d10>] ? kthread_create_on_node+0x120/0x120
[25810.184913] SPL: Dumping log to /tmp/spl-log.1360140470.5947

nedbass added a commit to nedbass/zfs that referenced this issue Feb 6, 2013
Two more locations where KM_SLEEP was used in a call which must use
KM_PUSHPAGE were found while using the zpool upgrade command.
See commit b8d06fc for additional details.

Also make a small correction to the comment block above
dsl_dir_open_spa().

Closes openzfs#1268
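For anyone hitting the same class of bug, the fix pattern is small. A hypothetical call site (illustrative, not the exact diff from the commit above) showing the KM_SLEEP to KM_PUSHPAGE change:

/*
 * KM_SLEEP maps to GFP flags that include __GFP_FS, which trips the
 * PF_NOFS check when the allocation happens in txg_sync context.
 * See commit b8d06fc for the rationale.
 */

/* Before: can recurse into the filesystem and deadlock the sync path */
dd = kmem_zalloc(sizeof (dsl_dir_t), KM_SLEEP);

/* After: KM_PUSHPAGE never sets __GFP_FS, so it is safe here */
dd = kmem_zalloc(sizeof (dsl_dir_t), KM_PUSHPAGE);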
@zrav commented Sep 15, 2013

Running 0.6.2 on Ubuntu 12.04 LTS with the 3.8.0-30 kernel:
[40557.683345] SPL: Fixing allocation for task txg_sync (616) which used GFP flags 0xfb9e9ba4 with PF_NOFS set
[40557.683351] SPL: Showing stack for process 616
[40557.683355] Pid: 616, comm: txg_sync Tainted: PF O 3.8.0-30-generic #44~precise1-Ubuntu
[40557.683358] Call Trace:
[40557.683381]  [<ffffffffa00724d7>] spl_debug_dumpstack+0x27/0x40 [spl]
[40557.683393]  [<ffffffffa00853e4>] sanitize_flags.part.10+0x68/0xc84 [spl]
[40557.683400]  [<ffffffffa00774d3>] kmem_alloc_debug+0x303/0x3b0 [spl]
[40557.683404]  [<ffffffff8135bbe2>] ? put_dec+0x72/0x90
[40557.683407]  [<ffffffff8135cb05>] ? number.isra.2+0x355/0x390
[40557.683414]  [<ffffffffa007725e>] ? kmem_alloc_debug+0x8e/0x3b0 [spl]
[40557.683421]  [<ffffffffa007b370>] task_alloc+0x1a0/0x350 [spl]
[40557.683427]  [<ffffffffa007725e>] ? kmem_alloc_debug+0x8e/0x3b0 [spl]
[40557.683463]  [<ffffffffa0184f20>] ? spa_add+0x4f0/0x4f0 [zfs]
[40557.683470]  [<ffffffffa007bbeb>] taskq_dispatch_delay+0x19b/0x2b0 [spl]
[40557.683478]  [<ffffffffa007d1ab>] ? taskq_cancel_id+0xeb/0x1e0 [spl]
[40557.683481]  [<ffffffff810ae88c>] ? ktime_get_ts+0x4c/0xe0
[40557.683507]  [<ffffffffa0177a82>] spa_sync+0x1f2/0xae0 [zfs]
[40557.683510]  [<ffffffff810ae88c>] ? ktime_get_ts+0x4c/0xe0
[40557.683537]  [<ffffffffa0188cef>] txg_sync_thread+0x2df/0x540 [zfs]
[40557.683564]  [<ffffffffa0188a10>] ? txg_init+0x250/0x250 [zfs]
[40557.683572]  [<ffffffffa007ac08>] thread_generic_wrapper+0x78/0x90 [spl]
[40557.683579]  [<ffffffffa007ab90>] ? __thread_create+0x310/0x310 [spl]
[40557.683582]  [<ffffffff8107f1b0>] kthread+0xc0/0xd0
[40557.683585]  [<ffffffff8107f0f0>] ? flush_kthread_worker+0xb0/0xb0
[40557.683589]  [<ffffffff816fcaec>] ret_from_fork+0x7c/0xb0
[40557.683591]  [<ffffffff8107f0f0>] ? flush_kthread_worker+0xb0/0xb0
The call trace appears a few dozen more times in the syslog, 5 seconds apart. The pool was under load from about a dozen concurrent rsync jobs.
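This trace points at a different call site than the dsl_dir fix above: spa_sync() re-arms the spa deadman through taskq_dispatch_delay(), and the SPL's task_alloc() allocates the task structure using the dispatch flags. A hedged sketch of the shape of that call, where delay_ticks is a hypothetical variable and the exact arguments differ from the real 0.6.2 source:

/*
 * Illustrative only. Dispatching with TQ_SLEEP makes task_alloc()
 * perform a KM_SLEEP-style allocation (GFP flags including __GFP_FS)
 * from the txg_sync thread, which is exactly what the PF_NOFS check
 * reports above; TQ_PUSHPAGE avoids __GFP_FS in this context.
 */
spa->spa_deadman_tqid = taskq_dispatch_delay(system_taskq, spa_deadman,
    spa, TQ_PUSHPAGE, ddi_get_lbolt() + delay_ticks);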

@zrav commented Sep 16, 2013

Negative, I'm running vanilla ZoL 0.6.2 stable, which I assume does little to explain the issue...

On 16.09.2013 03:22, Tim Chase wrote:

@zrav No doubt this is yet more fallout from the restructured sync task. I presume you're running code that contains 13fe019? I looked at this briefly this morning but didn't have time to investigate any further, other than to tell that it has something to do with the management of the spa deadman.



@dweeezil (Contributor)

@zrav Yes, I must have been asleep when I sent my original reply, which is why I withdrew it. Sorry for the noise, but I've been busy chasing a few other allocation issues that fell out of the sync task restructuring. I will, however, take a fresh look at your stack trace in light of the 0.6.2 code.

@behlendorf (Contributor)

@zrav I've filed your issue as #1729

unya pushed a commit to unya/zfs that referenced this issue Dec 13, 2013
Two more locations where KM_SLEEP was used in a call which must
use KM_PUSHPAGE were found while using the zpool upgrade command.
See commit b8d06fc for additional details.

Also make a small correction to the comment block above
dsl_dir_open_spa().

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#1268