kernel BUG at mm/filemap.c:2047! #5

cmock · 2012-08-12T22:15:43Z

There's a reproducible kernel bug in 3.2.23-rpi1+ -- whenever cricket (http://cricket.sourceforge.net/) tries to create an RRD file, this occurs; it does not happen on other file creations...

[  589.838325] BUG: sleeping function called from invalid context at kernel/mute
x.c:85
[  589.838355] in_atomic(): 1, irqs_disabled(): 0, pid: 2330, name: collector
[  589.838439] [<c0012100>] (unwind_backtrace+0x0/0xe0) from [<c032c240>] (mutex
_lock+0x18/0x40)
[  589.838486] [<c032c240>] (mutex_lock+0x18/0x40) from [<c0090714>] (generic_fi
le_aio_write+0x44/0xc8)
[  589.838539] [<c0090714>] (generic_file_aio_write+0x44/0xc8) from [<c0128dfc>]
 (ext4_file_write+0x230/0x294)
[  589.838594] [<c0128dfc>] (ext4_file_write+0x230/0x294) from [<c00c5e74>] (do_
sync_write+0x9c/0xd8)
[  589.838641] [<c00c5e74>] (do_sync_write+0x9c/0xd8) from [<c00c6728>] (vfs_wri
te+0xa8/0x180)
[  589.838671] [<c00c6728>] (vfs_write+0xa8/0x180) from [<c00c6a18>] (sys_write+
0x34/0x68)
[  589.838709] [<c00c6a18>] (sys_write+0x34/0x68) from [<c000d580>] (ret_fast_sy
scall+0x0/0x30)
[  589.838824] ------------[ cut here ]------------
[  589.846909] kernel BUG at mm/filemap.c:2047!
[  589.854554] Internal error: Oops - undefined instruction: 0 [#1]
[  589.863946] Modules linked in: tun nls_utf8 nls_cp437 vfat fat spidev smsc95x
x usbnet leds_gpio spi_bcm2708 i2c_bcm2708
[  589.881948] CPU: 0    Not tainted  (3.2.23-rpi1+ #3)
[  589.890476] PC is at iov_iter_copy_from_user_atomic+0x24/0xb4
[  589.899789] LR is at 0x0
[  589.905814] pc : [<c008e878>]    lr : [<00000000>]    psr: 40000013
[  589.905826] sp : cdb41d38  ip : cdb41d78  fp : 00000000
[  589.924437] r10: 00000000  r9 : cdb40000  r8 : 00001000
[  589.933187] r7 : cdb40000  r6 : ce652ea0  r5 : 00000000  r4 : 00001000
[  589.943248] r3 : 00001000  r2 : 00000000  r1 : cdb40000  r0 : c0687a20
[  589.953231] Flags: nZcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[  589.963801] Control: 00c5387d  Table: 0e948008  DAC: 00000015
[  589.973045] Process collector (pid: 2330, stack limit = 0xcdb40268)
[  589.982858] Stack: (0xcdb41d38 to 0xcdb42000)
[  589.990740] 1d20:                                                       00001000 00000000
[  590.005879] 1d40: 00000000 c008ea28 00001000 00000000 cdb41d70 cdb41d74 00000000 cdb13e40
[  590.021207] 1d60: c0344410 00001000 00000000 00000000 c0687a20 00000000 cdb41ec0 00000001
[  590.036709] 1d80: 00000000 00001000 ce652de0 00000000 00000000 ce652de0 ce652e44 00000000
[  590.052632] 1da0: cdb13e40 cdb13e40 00001000 c0090660 00000000 00000000 cdb41f00 00001000
[  590.068918] 1dc0: 00000000 c000d580 00000000 cdb41ec8 cdb41ec0 cdb41f00 ce652ea0 00000001
[  590.085584] 1de0: 00001000 ce652e44 cdb41ec8 00000001 cdb13e40 c00121b4 be8b38d4 00000000
[  590.102480] 1e00: 00000000 00000000 ce652e44 cdb41ec8 00000001 cdb13e40 cdb41ec0 c0090730
[  590.119572] 1e20: 00000001 00000000 91827364 cdb41e2c cdb41e2c cdb41e34 cdb41e34 00000000
[  590.136744] 1e40: 00000000 00000000 00000000 00000000 cdb13e40 ce652de0 cdb41ec8 cdb41ec0
[  590.154185] 1e60: 00000001 c0128dfc 00000000 00000000 cdb41ef8 cdb13e40 00000042 c00d3230
[  590.171889] 1e80: ce64e5d8 cdad1b40 c050d5c0 2cdd052d 00000008 fffffdee 00000000 cdb41f88
[  590.189984] 1ea0: cdb13e40 00001000 cdb40000 00000000 be8b38d4 c00c5e74 00000
[  590.189984] 1ea0: cdb13e40 00001000 cdb40000 00000000 be8b38d4 c00c5e74 00000000 00000000
[  590.208384] 1ec0: be8b2818 00001000 ce80e1c0 ce623238 00000000 00000001 ffffffff cdb13e40
[  590.226938] 1ee0: 00000000 00000000 00000000 00000000 cdb02c00 00003be0 00000000 00000000
[  590.245733] 1f00: 00000000 00000000 00000000 00000000 00001000 ce64f330 00001000 00000000
[  590.264786] 1f20: 00000000 00000001 00000000 7ff80000 00000001 cdb13e40 cdb41f88 00001000
[  590.283864] 1f40: cdb13e40 be8b2818 cdb41f88 cdb13e40 be8b2818 cdb41f88 be8b2818 c00c6728
[  590.302927] 1f60: cdb13e40 be8b2818 00000000 00000000 cdb13e40 be8b2818 00001000 c00c6a18
[  590.321992] 1f80: 00000004 00000000 00000000 00000000 01796e78 01763838 015defec 00000004
[  590.341069] 1fa0: c000d704 c000d580 01796e78 01763838 00000004 be8b2818 00001000 be8b2818
[  590.360124] 1fc0: 01796e78 01763838 015defec 00000004 00000003 00000000 00000001 be8b38d4
[  590.379187] 1fe0: 402ae000 be8b2800 404a7c14 4036b54c 60000010 00000004 00000000 00000000
[  590.398279] [<c008e878>] (iov_iter_copy_from_user_atomic+0x24/0xb4) from [<c008ea28>] (generic_file_buffered_write+0x120/0x264)
[  590.420789] [<c008ea28>] (generic_file_buffered_write+0x120/0x264) from [<c0090660>] (__generic_file_aio_write+0x358/0x3c8)
[  590.442962] [<c0090660>] (__generic_file_aio_write+0x358/0x3c8) from [<c0090730>] (generic_file_aio_write+0x60/0xc8)
[  590.464513] [<c0090730>] (generic_file_aio_write+0x60/0xc8) from [<c0128dfc>] (ext4_file_write+0x230/0x294)
[  590.485288] [<c0128dfc>] (ext4_file_write+0x230/0x294) from [<c00c5e74>] (do_sync_write+0x9c/0xd8)
[  590.505262] [<c00c5e74>] (do_sync_write+0x9c/0xd8) from [<c00c6728>] (vfs_write+0xa8/0x180)
[  590.524586] [<c00c6728>] (vfs_write+0xa8/0x180) from [<c00c6a18>] (sys_write+0x34/0x68)
[  590.538200] [<c00c6a18>] (sys_write+0x34/0x68) from [<c000d580>] (ret_fast_syscall+0x0/0x30)
[  590.557486] Code: e1a04003 e591e004 e3de5101 1a000000 (e7f001f2) 
[  590.569294] ---[ end trace a5ea58d4db817366 ]---

The text was updated successfully, but these errors were encountered:

bootc · 2012-08-13T07:25:32Z

Interesting. Give me a couple of days to build and test 3.2.27 and let's see if it still occurs in that one.

bootc · 2012-08-13T07:29:09Z

One more thing - does cricket reliably trigger this bug? If I install it on my RPi should I be able to replicate this easily?

bootc · 2012-08-13T07:31:09Z

And lastly, I notice cricket hasn't been updated since 2004. You may want to give Munin or Cacti a look if you want something newer - not that this is in any way a fix to your problem.

cmock · 2012-08-13T13:20:14Z

I have a simple and reproducible test case now:

rrdtool create /tmp/test.rrd --start 1344258765 --step 300 DS:ds0:GAUGE:1800:0:U DS:ds1:GAUGE:1800:0:U DS:ds2:GAUGE:1800:0:U DS:ds3:GAUGE:1800:0:U DS:ds4:DERIVE:1800:0:U DS:ds5:DERIVE:1800:0:U DS:ds6:DERIVE:1800:0:U RRA:AVERAGE:0.5:1:600 RRA:AVERAGE:0.5:6:600 RRA:AVERAGE:0.5:24:600 RRA:MAX:0.5:24:600 RRA:AVERAGE:0.5:288:600 RRA:MAX:0.5:288:600

You need to apt-get install rrdtool, obviously, and it can be run as an ordinary user.

Note that I'm using raspian, so this is armhf -- no idea if that is relevant.

Package versions:

ii  rrdtool        1.4.7-1        time-series data storage and display system 
ii  libc6:armhf    2.13-33        Embedded GNU C Library: Shared libraries
ii  libcairo2:armh 1.12.2-2       The Cairo 2D vector graphics library
ii  libdbi1        0.8.4-6        DB Independent Abstraction Layer for C -- sh
ii  libglib2.0-0:a 2.32.3-1       GLib library of C routines
ii  libpango1.0-0: 1.30.0-1       Layout and rendering of internationalized te
ii  libpng12-0:arm 1.2.49-1       PNG library - runtime
ii  librrd4        1.4.7-1        time-series data storage and display system 
ii  libxml2:armhf  2.8.0+dfsg1-5  GNOME XML library

commit 3cf003c upstream. Jian found that when he ran fsx on a 32 bit arch with a large wsize the process and one of the bdi writeback kthreads would sometimes deadlock with a stack trace like this: crash> bt PID: 2789 TASK: f02edaa0 CPU: 3 COMMAND: "fsx" #0 [eed63cbc] schedule at c083c5b3 #1 [eed63d80] kmap_high at c0500ec8 #2 [eed63db] cifs_async_writev at f7fabcd7 [cifs] #3 [eed63df0] cifs_writepages at f7fb7f5c [cifs] #4 [eed63e50] do_writepages at c04f3e32 #5 [eed63e54] __filemap_fdatawrite_range at c04e152a #6 [eed63ea4] filemap_fdatawrite at c04e1b3e #7 [eed63eb4] cifs_file_aio_write at f7fa111a [cifs] torvalds#8 [eed63ecc] do_sync_write at c052d202 torvalds#9 [eed63f74] vfs_write at c052d4ee torvalds#10 [eed63f94] sys_write at c052df4c torvalds#11 [eed63fb0] ia32_sysenter_target at c0409a98 EAX: 00000004 EBX: 00000003 ECX: abd73b73 EDX: 012a65c6 DS: 007b ESI: 012a65c6 ES: 007b EDI: 00000000 SS: 007b ESP: bf8db178 EBP: bf8db1f8 GS: 0033 CS: 0073 EIP: 40000424 ERR: 00000004 EFLAGS: 00000246 Each task would kmap part of its address array before getting stuck, but not enough to actually issue the write. This patch fixes this by serializing the marshal_iov operations for async reads and writes. The idea here is to ensure that cifs aggressively tries to populate a request before attempting to fulfill another one. As soon as all of the pages are kmapped for a request, then we can unlock and allow another one to proceed. There's no need to do this serialization on non-CONFIG_HIGHMEM arches however, so optimize all of this out when CONFIG_HIGHMEM isn't set. Reported-by: Jian Li <jiali@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

…d reasons commit 5cf02d0 upstream. We've had some reports of a deadlock where rpciod ends up with a stack trace like this: PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14" #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9 #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs] #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8 #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs] #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs] #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670 #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271 torvalds#8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638 torvalds#9 [ffff8810343bf818] shrink_zone at ffffffff8112788f torvalds#10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e torvalds#11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f torvalds#12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad torvalds#13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942 torvalds#14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a torvalds#15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9 torvalds#16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b torvalds#17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808 torvalds#18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c torvalds#19 [ffff8810343bfce8] inet_create at ffffffff81483ba6 torvalds#20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7 torvalds#21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc] torvalds#22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc] torvalds#23 [ffff8810343bfe38] worker_thread at ffffffff810887d0 torvalds#24 [ffff8810343bfee8] kthread at ffffffff8108dd96 torvalds#25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca rpciod is trying to allocate memory for a new socket to talk to the server. The VM ends up calling ->releasepage to get more memory, and it tries to do a blocking commit. That commit can't succeed however without a connected socket, so we deadlock. Fix this by setting PF_FSTRANS on the workqueue task prior to doing the socket allocation, and having nfs_release_page check for that flag when deciding whether to do a commit call. Also, set PF_FSTRANS unconditionally in rpc_async_schedule since that function can also do allocations sometimes. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kernel BUG at mm/filemap.c:2047! #5

kernel BUG at mm/filemap.c:2047! #5

cmock commented Aug 12, 2012

bootc commented Aug 13, 2012

bootc commented Aug 13, 2012

bootc commented Aug 13, 2012

cmock commented Aug 13, 2012

kernel BUG at mm/filemap.c:2047! #5

kernel BUG at mm/filemap.c:2047! #5

Comments

cmock commented Aug 12, 2012

bootc commented Aug 13, 2012

bootc commented Aug 13, 2012

bootc commented Aug 13, 2012

cmock commented Aug 13, 2012