
(kmem.h:91:sanitize_flags()) FATAL allocation for task txg_sync (5947) which used GFP flags 0x79123a54 with PF_NOFS set #1268

Closed
nedbass opened this issue Feb 6, 2013 · 6 comments
@nedbass (Contributor) commented Feb 6, 2013

Ran into this trying to upgrade a pool from version=15.

[root@fedora-18-amd64 ~]# zpool create -f -o version=15 tank /tmp/a
[root@fedora-18-amd64 ~]# zpool upgrade tank
This system supports ZFS pool feature flags.


Message from syslogd@fedora-18-amd64 at Feb  6 00:47:50 ...
 kernel:[25810.183422] SPLError: 5947:0:(kmem.h:91:sanitize_flags()) FATAL allocation for task txg_sync (5947) which used GFP flags 0x79123a54 with PF_NOFS set

Message from syslogd@fedora-18-amd64 at Feb  6 00:47:50 ...
 kernel:[25810.183692] SPLError: 5947:0:(kmem.h:91:sanitize_flags()) SPL PANIC
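For context, PF_NOFS is an SPL-private process flag (the SPL repurposes an unused kernel PF_ bit) that a thread such as txg_sync sets to declare it must never re-enter filesystem code. Below is a minimal sketch of the kind of check sanitize_flags() in kmem.h performs; assumed, simplified logic, not the verbatim SPL source:

/*
 * Sketch only. If a PF_NOFS thread allocates with __GFP_FS set,
 * direct reclaim could enter writeback and call back into ZFS,
 * deadlocking txg_sync against itself.
 */
static inline gfp_t
sanitize_flags_sketch(struct task_struct *p, gfp_t flags)
{
	if (unlikely((p->flags & PF_NOFS) && (flags & __GFP_FS))) {
		/*
		 * Debug SPL builds treat this as fatal (the SPL PANIC
		 * above); non-debug builds log "Fixing allocation for
		 * task ..." and strip the offending bit instead.
		 */
		return (flags & ~__GFP_FS);
	}
	return (flags);
}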
@behlendorf (Contributor)

Did you happen to grab the stack from the console? It will include the offending function, whose flags need to be fixed. If not, I'll just reproduce it in a VM.

@nedbass (Contributor, Author) commented Feb 6, 2013

Yeah, it was late, so I just did a quick cut-and-paste from my shell window.

[25810.183422] SPLError: 5947:0:(kmem.h:91:sanitize_flags()) FATAL allocation for task txg_sync (5947) which used GFP flags 0x79123a54 with PF_NOFS set
[25810.183692] SPLError: 5947:0:(kmem.h:91:sanitize_flags()) SPL PANIC
[25810.183871] SPL: Showing stack for process 5947
[25810.183873] Pid: 5947, comm: txg_sync Tainted: PF          O 3.7.5-201.fc18.x86_64 #1
[25810.183875] Call Trace:
[25810.183889]  [<ffffffffa01714d7>] spl_debug_dumpstack+0x27/0x40 [spl]
[25810.183893]  [<ffffffffa0172aa1>] spl_debug_bug+0x81/0xe0 [spl]
[25810.183899]  [<ffffffffa018939f>] sanitize_flags.part.6+0x66/0xcc7 [spl]
[25810.183903]  [<ffffffffa0176a87>] kmem_alloc_debug+0x307/0x420 [spl]
[25810.183928]  [<ffffffffa02be899>] dsl_dir_open_spa+0x69/0x840 [zfs]
[25810.183940]  [<ffffffffa0274c16>] ? dbuf_create_bonus+0x96/0x160 [zfs]
[25810.183954]  [<ffffffffa028df76>] dmu_objset_find_spa+0x56/0x6a0 [zfs]
[25810.183971]  [<ffffffffa02bf3a0>] ? dsl_dir_set_quota+0x140/0x140 [zfs]
[25810.183992]  [<ffffffffa031932d>] ? zap_add+0xed/0x1a0 [zfs]
[25810.184014]  [<ffffffffa02c2247>] dsl_pool_upgrade_dir_clones+0xe7/0x2e0 [zfs]
[25810.184065]  [<ffffffffa02f89ae>] ? txg_list_remove+0x6e/0xe0 [zfs]
[25810.184086]  [<ffffffffa02e1eff>] spa_sync+0x7df/0xd10 [zfs]
[25810.184100]  [<ffffffff81044b91>] ? pvclock_clocksource_read+0x61/0xf0
[25810.184118]  [<ffffffff810aea0c>] ? ktime_get_ts+0x4c/0xf0
[25810.184129]  [<ffffffffa02f8011>] txg_sync_thread+0x311/0x640 [zfs]
[25810.184149]  [<ffffffffa02f7d00>] ? txg_fini+0x3b0/0x3b0 [zfs]
[25810.184154]  [<ffffffffa017b8c1>] thread_generic_wrapper+0x81/0xe0 [spl]
[25810.184169]  [<ffffffffa017b840>] ? __thread_create+0x3a0/0x3a0 [spl]
[25810.184176]  [<ffffffff81081dd0>] kthread+0xc0/0xd0
[25810.184189]  [<ffffffff81010000>] ? ftrace_raw_event_xen_mmu_flush_tlb_others+0x50/0xe0
[25810.184189]  [<ffffffff81081d10>] ? kthread_create_on_node+0x120/0x120
[25810.184198]  [<ffffffff8163e0ec>] ret_from_fork+0x7c/0xb0
[25810.184198]  [<ffffffff81081d10>] ? kthread_create_on_node+0x120/0x120
[25810.184913] SPL: Dumping log to /tmp/spl-log.1360140470.5947

nedbass added a commit to nedbass/zfs that referenced this issue Feb 6, 2013
Two more locations where KM_SLEEP was used in a call which must use
KM_PUSHPAGE were found while using the zpool upgrade command.
See commit b8d06fc for additional details.

Also make a small correction to the comment block above
dsl_dir_open_spa().

Closes openzfs#1268
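For anyone hitting the same class of bug, the fix pattern is small. A hypothetical call site (illustrative, not the exact diff from the commit above) showing the KM_SLEEP to KM_PUSHPAGE change:

/*
 * KM_SLEEP maps to GFP flags that include __GFP_FS, which trips the
 * PF_NOFS check when the allocation happens in txg_sync context.
 * See commit b8d06fc for the rationale.
 */

/* Before: can recurse into the filesystem and deadlock the sync path */
dd = kmem_zalloc(sizeof (dsl_dir_t), KM_SLEEP);

/* After: KM_PUSHPAGE never sets __GFP_FS, so it is safe here */
dd = kmem_zalloc(sizeof (dsl_dir_t), KM_PUSHPAGE);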
@zrav commented Sep 15, 2013

Running 0.6.2 on Ubuntu 12.04 LTS with the 3.8.0-30 kernel:
[40557.683345] SPL: Fixing allocation for task txg_sync (616) which used GFP flags 0xfb9e9ba4 with PF_NOFS set
[40557.683351] SPL: Showing stack for process 616
[40557.683355] Pid: 616, comm: txg_sync Tainted: PF O 3.8.0-30-generic #44~precise1-Ubuntu
[40557.683358] Call Trace:
[40557.683381]  [<ffffffffa00724d7>] spl_debug_dumpstack+0x27/0x40 [spl]
[40557.683393]  [<ffffffffa00853e4>] sanitize_flags.part.10+0x68/0xc84 [spl]
[40557.683400]  [<ffffffffa00774d3>] kmem_alloc_debug+0x303/0x3b0 [spl]
[40557.683404]  [<ffffffff8135bbe2>] ? put_dec+0x72/0x90
[40557.683407]  [<ffffffff8135cb05>] ? number.isra.2+0x355/0x390
[40557.683414]  [<ffffffffa007725e>] ? kmem_alloc_debug+0x8e/0x3b0 [spl]
[40557.683421]  [<ffffffffa007b370>] task_alloc+0x1a0/0x350 [spl]
[40557.683427]  [<ffffffffa007725e>] ? kmem_alloc_debug+0x8e/0x3b0 [spl]
[40557.683463]  [<ffffffffa0184f20>] ? spa_add+0x4f0/0x4f0 [zfs]
[40557.683470]  [<ffffffffa007bbeb>] taskq_dispatch_delay+0x19b/0x2b0 [spl]
[40557.683478]  [<ffffffffa007d1ab>] ? taskq_cancel_id+0xeb/0x1e0 [spl]
[40557.683481]  [<ffffffff810ae88c>] ? ktime_get_ts+0x4c/0xe0
[40557.683507]  [<ffffffffa0177a82>] spa_sync+0x1f2/0xae0 [zfs]
[40557.683510]  [<ffffffff810ae88c>] ? ktime_get_ts+0x4c/0xe0
[40557.683537]  [<ffffffffa0188cef>] txg_sync_thread+0x2df/0x540 [zfs]
[40557.683564]  [<ffffffffa0188a10>] ? txg_init+0x250/0x250 [zfs]
[40557.683572]  [<ffffffffa007ac08>] thread_generic_wrapper+0x78/0x90 [spl]
[40557.683579]  [<ffffffffa007ab90>] ? __thread_create+0x310/0x310 [spl]
[40557.683582]  [<ffffffff8107f1b0>] kthread+0xc0/0xd0
[40557.683585]  [<ffffffff8107f0f0>] ? flush_kthread_worker+0xb0/0xb0
[40557.683589]  [<ffffffff816fcaec>] ret_from_fork+0x7c/0xb0
[40557.683591]  [<ffffffff8107f0f0>] ? flush_kthread_worker+0xb0/0xb0
The call trace appears a few dozen more times in the syslog, 5 seconds apart. The pool was under load from about a dozen concurrent rsync jobs.
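This trace points at a different call site than the dsl_dir fix above: spa_sync() re-arms the spa deadman through taskq_dispatch_delay(), and the SPL's task_alloc() allocates the task structure using the dispatch flags. A hedged sketch of the shape of that call, where delay_ticks is a hypothetical variable and the exact arguments differ from the real 0.6.2 source:

/*
 * Illustrative only. Dispatching with TQ_SLEEP makes task_alloc()
 * perform a KM_SLEEP-style allocation (GFP flags including __GFP_FS)
 * from the txg_sync thread, which is exactly what the PF_NOFS check
 * reports above; TQ_PUSHPAGE avoids __GFP_FS in this context.
 */
spa->spa_deadman_tqid = taskq_dispatch_delay(system_taskq, spa_deadman,
    spa, TQ_PUSHPAGE, ddi_get_lbolt() + delay_ticks);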

@zrav commented Sep 16, 2013

Negative, I'm running vanilla ZoL 0.6.2 stable, which I assume does little to explain the issue...

On 16.09.2013 03:22, Tim Chase wrote:

@zrav No doubt this is yet more fallout from the restructured sync task. I presume you're running code that contains 13fe019? I looked at this briefly this morning but didn't have time to investigate any further, other than to tell that it has something to do with the management of the spa deadman.



@dweeezil (Contributor)

@zrav Yes, I must have been asleep when I sent my original reply, which is why I withdrew it. Sorry for the noise, but I've been busy chasing a few other allocation issues that fell out of the sync task restructuring. I will, however, take a fresh look at your stack trace in light of the 0.6.2 code.

@behlendorf (Contributor)

@zrav I've filed your issue as #1729

unya pushed a commit to unya/zfs that referenced this issue Dec 13, 2013
Two more locations where KM_SLEEP was used in a call which must
use KM_PUSHPAGE were found while using the zpool upgrade command.
See commit b8d06fc for additional details.

Also make a small correction to the comment block above
dsl_dir_open_spa().

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#1268