Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Partialy convert lirc_rpi over to gpiolib, only the tx side #1

Merged
merged 4 commits into from
Aug 27, 2012
Merged

Partialy convert lirc_rpi over to gpiolib, only the tx side #1

merged 4 commits into from
Aug 27, 2012

Conversation

cleverca22
Copy link

TX was completely non-functional when i started

TX appears valid when using a scope, but hasnt been tested with actual hardware yet, RX unchanged and untested

based the patch on my own attempts to do SPDIF from the pi, turns out 6mhz is too fast to maintain while allowing scheduling
decided to do it after reading raspberrypi#38

TX was completely non-functional when i started

TX appears valid when using a scope, but hasnt been tested with actual hardware yet, RX unchanged and untested
@cleverca22
Copy link
Author

RX confirmed working, but still using raw GPIO

@ar0n
Copy link
Owner

ar0n commented Aug 16, 2012

Hi!

I will test it today and look into it.

Thanks!

@cleverca22
Copy link
Author

the TX carrier wave is showing as about 40khz on my pi, and it doesn't seem to be able to control anything so far, got any idea how i can debug it?

edit:
i put the scope on the rx pin and arranged it so it would rx everything it tx'd
now i can easily compare the signal after removing the 38khz carrier, and there is a long gap at the start, like a pre-amble, that lirc isn't sending
edit2:
on a closer look, only the remotes i recorded on the pi with irrecord have that missing, pre-made remote definitions look better (but the wires wont reach to actually test those)
seems to be a problem with irrecord not capturing the data properly
edit3:
i modified the remote definition to add a header enter and that got the signal to match
and changed the current limiting resistor to give more power, but SEND_ONCE still didnt work
needed to add min_repeat 2 to the remote def

@datscharf
Copy link

Hi cleverca22, I read it as you are only converting tx-side of the driver to gpiolib, will the patch be accepted into the kernel then?

You wrote that tx was not working when you started the port, I can confirm that it actually works with my setup. Only thing was that the IR was left 'on' after transmission, this was solved by Xxyzzy in http://www.raspberrypi.org/phpBB3/viewtopic.php?p=130248#p130248.

If you need any testing please let me know.

@cleverca22
Copy link
Author

i think the reason tx wasn't working because irrecord left min_repeat out of the file
i can convert most of the receive over also, but i don't think libgpio has the irq stuff at all
i'll add that fix to turn tx led off when its done

…30248#p130248

after reading the code, i can see how it would happen
if the code ends with a pulse and no space with hardware carrier or if the softcarrier ends on a high due to its lenght
while testing to see if it sanely handled errors, i found that it leaves the io region mapped, causing it to lock itself out
fixed
@cleverca22
Copy link
Author

if we want it to be fully libGPIO, i think the libgpio code needs to be modified to allow setting irq modes, which is a bit more major of a change then adding an isolated .c file

@datscharf
Copy link

actually I just tested the fix from Zxyzzy, and it seems that it does the exact opposite of its intention, if I replace the GPIO_CLEAR_PIN with GPIO_SET_PIN it works. Maybe the pin is inverted?

@cleverca22
Copy link
Author

is your led connected to gnd or vcc?

GPIO_CLEAR_PIN should make the pin go down to gnd while GPIO_SET_PIN should drive it to 3.3v, so the end result depends on what the led is connected to on the other side

@cleverca22
Copy link
Author

i stumbled upon BCM_GPIO_USE_IRQ while reading the source today, let me see what all it can do
http://kernel.org/doc/htmldocs/genericirq.html

ar0n added a commit that referenced this pull request Aug 27, 2012
Partialy convert lirc_rpi over to gpiolib, only the tx side
@ar0n ar0n merged commit 5d6beac into ar0n:rpi-patches Aug 27, 2012
@ar0n
Copy link
Owner

ar0n commented Aug 27, 2012

Actually setting up interrupt handling with the new gpio lib will work, thanks to this commit: 2f3523e. When I started working on this only a placeholder comment was placed in the IRQ enable function by Broadcom.

ar0n pushed a commit that referenced this pull request Nov 5, 2012
commit afbca95 upstream.

The libertas scan thread expects priv->scan_req to be non-NULL.  In theory,
it should always be set.  In practice, we've seen the following oops:

[ 8363.067444] Unable to handle kernel NULL pointer dereference at virtual address 00000004
[ 8363.067490] pgd = c0004000
[ 8363.078393] [00000004] *pgd=00000000
[ 8363.086711] Internal error: Oops: 17 [#1] PREEMPT
[ 8363.091375] Modules linked in: fuse libertas_sdio libertas psmouse mousedev ov7670 mmp_camera joydev videobuf2_core videobuf2_dma_sg videobuf2_memops [last unloaded: scsi_wait_scan]
[ 8363.107490] CPU: 0    Not tainted  (3.0.0-gf7ccc69 raspberrypi#671)
[ 8363.112799] PC is at lbs_scan_worker+0x108/0x5a4 [libertas]
[ 8363.118326] LR is at 0x0
[ 8363.120836] pc : [<bf03a854>]    lr : [<00000000>]    psr: 60000113
[ 8363.120845] sp : ee66bf48  ip : 00000000  fp : 00000000
[ 8363.120845] r10: ee2c2088  r9 : c04e2efc  r8 : eef97005
[ 8363.132231] r7 : eee0716f  r6 : ee2c02c0  r5 : ee2c2088  r4 : eee07160
[ 8363.137419] r3 : 00000000  r2 : a0000113  r1 : 00000001  r0 : eee07160
[ 8363.143896] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
[ 8363.157630] Control: 10c5387d  Table: 2e754019  DAC: 00000015
[ 8363.163334] Process kworker/u:1 (pid: 25, stack limit = 0xee66a2f8)

While I've not found a smoking gun, there are two places that raised red flags
for me.  The first is in _internal_start_scan, when we queue up a scan; we
first queue the worker, and then set priv->scan_req.  There's theoretically
a 50mS delay which should be plenty, but doing things that way just seems
racy (and not in the good way).

The second is in the scan worker thread itself.  Depending on the state of
priv->scan_channel, we cancel pending scan runs and then requeue a run in
300mS.  We then send the scan command down to the hardware, sleep, and if
we get scan results for all the desired channels, we set priv->scan_req to
NULL.  However, it that's happened in less than 300mS, what happens with
the pending scan run?

This patch addresses both of those concerns.  With the patch applied, we
have not seen the oops in the past two weeks.

Signed-off-by: Andres Salomon <dilinger@queued.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
ar0n pushed a commit that referenced this pull request Nov 5, 2012
commit 1b41c83 upstream.

When running the Point Grey "flycap" program for their USB 3.0 camera
(which was running as a USB 2.0 device for some reason), I trigger this
oops whenever I try to open a video stream:

Dec 15 16:48:34 puck kernel: [ 1798.715559] BUG: unable to handle kernel NULL pointer dereference at           (null)
Dec 15 16:48:34 puck kernel: [ 1798.719153] IP: [<ffffffff8147841e>] free_async+0x1e/0x70
Dec 15 16:48:34 puck kernel: [ 1798.720991] PGD 6f833067 PUD 6fc56067 PMD 0
Dec 15 16:48:34 puck kernel: [ 1798.722815] Oops: 0002 [#1] SMP
Dec 15 16:48:34 puck kernel: [ 1798.724627] CPU 0
Dec 15 16:48:34 puck kernel: [ 1798.724636] Modules linked in: ecryptfs encrypted_keys sha1_generic trusted binfmt_misc sha256_generic aesni_intel cryptd aes_x86_64 aes_generic parport_pc dm_crypt ppdev joydev snd_hda_codec_hdmi snd_hda_codec_conexant arc4 iwlwifi snd_hda_intel snd_hda_codec snd_hwdep snd_pcm thinkpad_acpi mac80211 snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer btusb uvcvideo snd_seq_device bluetooth videodev psmouse snd v4l2_compat_ioctl32 serio_raw tpm_tis cfg80211 tpm tpm_bios nvram soundcore snd_page_alloc lp parport i915 xhci_hcd ahci libahci drm_kms_helper drm sdhci_pci sdhci e1000e i2c_algo_bit video
Dec 15 16:48:34 puck kernel: [ 1798.734212]
Dec 15 16:48:34 puck kernel: [ 1798.736162] Pid: 2713, comm: FlyCap2 Not tainted 3.2.0-rc5+ raspberrypi#28 LENOVO 4286CTO/4286CTO
Dec 15 16:48:34 puck kernel: [ 1798.738148] RIP: 0010:[<ffffffff8147841e>]  [<ffffffff8147841e>] free_async+0x1e/0x70
Dec 15 16:48:34 puck kernel: [ 1798.740134] RSP: 0018:ffff88005715fd78  EFLAGS: 00010296
Dec 15 16:48:34 puck kernel: [ 1798.742118] RAX: 00000000fffffff4 RBX: ffff88006fe8f900 RCX: 0000000000004118
Dec 15 16:48:34 puck kernel: [ 1798.744116] RDX: 0000000001000000 RSI: 0000000000016390 RDI: 0000000000000000
Dec 15 16:48:34 puck kernel: [ 1798.746087] RBP: ffff88005715fd88 R08: 0000000000000000 R09: ffffffff8146f22e
Dec 15 16:48:34 puck kernel: [ 1798.748018] R10: ffff88006e520ac0 R11: 0000000000000001 R12: ffff88005715fe28
Dec 15 16:48:34 puck kernel: [ 1798.749916] R13: ffff88005d31df00 R14: ffff88006fe8f900 R15: 00007f688c995cb8
Dec 15 16:48:34 puck kernel: [ 1798.751785] FS:  00007f68a366da40(0000) GS:ffff880100200000(0000) knlGS:0000000000000000
Dec 15 16:48:34 puck kernel: [ 1798.753659] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 15 16:48:34 puck kernel: [ 1798.755509] CR2: 0000000000000000 CR3: 00000000706bb000 CR4: 00000000000406f0
Dec 15 16:48:34 puck kernel: [ 1798.757334] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Dec 15 16:48:34 puck kernel: [ 1798.759124] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Dec 15 16:48:34 puck kernel: [ 1798.760871] Process FlyCap2 (pid: 2713, threadinfo ffff88005715e000, task ffff88006c675b80)
Dec 15 16:48:34 puck kernel: [ 1798.762605] Stack:
Dec 15 16:48:34 puck kernel: [ 1798.764297]  ffff88005715fe28 0000000000000000 ffff88005715fe08 ffffffff81479058
Dec 15 16:48:34 puck kernel: [ 1798.766020]  0000000000000000 ffffea0000004000 ffff880000004118 0000000000000000
Dec 15 16:48:34 puck kernel: [ 1798.767750]  ffff880000000001 ffff88006e520ac0 fffffff46fd81180 0000000000000000
Dec 15 16:48:34 puck kernel: [ 1798.769472] Call Trace:
Dec 15 16:48:34 puck kernel: [ 1798.771147]  [<ffffffff81479058>] proc_do_submiturb+0x778/0xa00
Dec 15 16:48:34 puck kernel: [ 1798.772798]  [<ffffffff8147a5fd>] usbdev_do_ioctl+0x24d/0x1200
Dec 15 16:48:34 puck kernel: [ 1798.774410]  [<ffffffff8147b5de>] usbdev_ioctl+0xe/0x20
Dec 15 16:48:34 puck kernel: [ 1798.775975]  [<ffffffff81189259>] do_vfs_ioctl+0x99/0x600
Dec 15 16:48:34 puck kernel: [ 1798.777534]  [<ffffffff81189851>] sys_ioctl+0x91/0xa0
Dec 15 16:48:34 puck kernel: [ 1798.779088]  [<ffffffff816247c2>] system_call_fastpath+0x16/0x1b
ec 15 16:48:34 puck kernel: [ 1798.780634] Code: 51 ff ff ff e9 29 ff ff ff 0f 1f 40 00 55 48 89 e5 53 48 83 ec 08 66 66 66 66 90 48 89 fb 48 8b 7f 18 e8 a6 ea c0 ff 4
8 8b 7b 20 <f0> ff 0f 0f 94 c0 84 c0 74 05 e8 d3 99 c1 ff 48 8b 43 40 48 8b
Dec 15 16:48:34 puck kernel: [ 1798.783970] RIP  [<ffffffff8147841e>] free_async+0x1e/0x70
Dec 15 16:48:34 puck kernel: [ 1798.785630]  RSP <ffff88005715fd78>
Dec 15 16:48:34 puck kernel: [ 1798.787274] CR2: 0000000000000000
Dec 15 16:48:34 puck kernel: [ 1798.794728] ---[ end trace 52894d3355f88d19 ]---

markup_oops.pl says the oops is in put_cred:

 ffffffff81478401:      48 89 e5                mov    %rsp,%rbp
 ffffffff81478404:      53                      push   %rbx
 ffffffff81478405:      48 83 ec 08             sub    $0x8,%rsp
 ffffffff81478409:      e8 f2 c0 1a 00          callq  ffffffff81624500 <mcount>
 ffffffff8147840e:      48 89 fb                mov    %rdi,%rbx   |  %ebx => ffff88006fe8f900
        put_pid(as->pid);
 ffffffff81478411:      48 8b 7f 18             mov    0x18(%rdi),%rdi
 ffffffff81478415:      e8 a6 ea c0 ff          callq  ffffffff81086ec0 <put_pid>
        put_cred(as->cred);
 ffffffff8147841a:      48 8b 7b 20             mov    0x20(%rbx),%rdi |  %edi => 0  %ebx = ffff88006fe8f900
  */
 static inline int atomic_dec_and_test(atomic_t *v)
 {
        unsigned char c;

        asm volatile(LOCK_PREFIX "decl %0; sete %1"
*ffffffff8147841e:      f0 ff 0f                lock decl (%rdi)   |  %edi = 0 <--- faulting instruction
 ffffffff81478421:      0f 94 c0                sete   %al
 static inline void put_cred(const struct cred *_cred)
 {
        struct cred *cred = (struct cred *) _cred;

        validate_creds(cred);
        if (atomic_dec_and_test(&(cred)->usage))
 ffffffff81478424:      84 c0                   test   %al,%al
 ffffffff81478426:      74 05                   je     ffffffff8147842d <free_async+0x2d>
                __put_cred(cred);
 ffffffff81478428:      e8 d3 99 c1 ff          callq  ffffffff81091e00 <__put_cred>
        kfree(as->urb->transfer_buffer);
 ffffffff8147842d:      48 8b 43 40             mov    0x40(%rbx),%rax
 ffffffff81478431:      48 8b 78 68             mov    0x68(%rax),%rdi
 ffffffff81478435:      e8 a6 e1 ce ff          callq  ffffffff811665e0 <kfree>
        kfree(as->urb->setup_packet);
 ffffffff8147843a:      48 8b 43 40             mov    0x40(%rbx),%rax
 ffffffff8147843e:      48 8b b8 90 00 00 00    mov    0x90(%rax),%rdi
 ffffffff81478445:      e8 96 e1 ce ff          callq  ffffffff811665e0 <kfree>
        usb_free_urb(as->urb);
 ffffffff8147844a:      48 8b 7b 40             mov    0x40(%rbx),%rdi
 ffffffff8147844e:      e8 0d 6b ff ff          callq  ffffffff8146ef60 <usb_free_urb>

This bug seems to have been introduced by commit
d178bc3 "user namespace: usb: make usb
urbs user namespace aware (v2)"

I'm not sure if this is right fix, but it does stop the oops.

Unfortunately, the Point Grey software still refuses to work, but it's a
closed source app, so I can't fix it.

Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
ar0n pushed a commit that referenced this pull request Nov 5, 2012
…bles the feature to fix an oops

commit 1a3a026 upstream.

Echo vendor and product number of a non usb-storage device to
usb-storage driver's new_id, then plug in the device to host and you
will find following oops msg, the root cause is usb_stor_probe1()
refers invalid id entry if giving a dynamic id, so just disable the
feature.

[ 3105.018012] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 3105.018062] CPU 0
[ 3105.018075] Modules linked in: usb_storage usb_libusual bluetooth
dm_crypt binfmt_misc snd_hda_codec_analog snd_hda_intel snd_hda_codec
snd_hwdep hp_wmi ppdev sparse_keymap snd_pcm snd_seq_midi snd_rawmidi
snd_seq_midi_event snd_seq snd_timer snd_seq_device psmouse snd
serio_raw tpm_infineon soundcore i915 snd_page_alloc tpm_tis
parport_pc tpm tpm_bios drm_kms_helper drm i2c_algo_bit video lp
parport usbhid hid sg sr_mod sd_mod ehci_hcd uhci_hcd usbcore e1000e
usb_common floppy
[ 3105.018408]
[ 3105.018419] Pid: 189, comm: khubd Tainted: G          I  3.2.0-rc7+
raspberrypi#29 Hewlett-Packard HP Compaq dc7800p Convertible Minitower/0AACh
[ 3105.018481] RIP: 0010:[<ffffffffa045830d>]  [<ffffffffa045830d>]
usb_stor_probe1+0x2fd/0xc20 [usb_storage]
[ 3105.018536] RSP: 0018:ffff880056a3d830  EFLAGS: 00010286
[ 3105.018562] RAX: ffff880065f4e648 RBX: ffff88006bb28000 RCX: 0000000000000000
[ 3105.018597] RDX: ffff88006f23c7b0 RSI: 0000000000000001 RDI: 0000000000000206
[ 3105.018632] RBP: ffff880056a3d900 R08: 0000000000000000 R09: ffff880067365000
[ 3105.018665] R10: 00000000000002ac R11: 0000000000000010 R12: ffff6000b41a7340
[ 3105.018698] R13: ffff880065f4ef60 R14: ffff88006bb28b88 R15: ffff88006f23d270
[ 3105.018733] FS:  0000000000000000(0000) GS:ffff88007a200000(0000)
knlGS:0000000000000000
[ 3105.018773] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 3105.018801] CR2: 00007fc99c8c4650 CR3: 0000000001e05000 CR4: 00000000000006f0
[ 3105.018835] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3105.018870] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 3105.018906] Process khubd (pid: 189, threadinfo ffff880056a3c000,
task ffff88005677a400)
[ 3105.018945] Stack:
[ 3105.018959]  0000000000000000 0000000000000000 ffff880056a3d8d0
0000000000000002
[ 3105.019011]  0000000000000000 ffff880056a3d918 ffff880000000000
0000000000000002
[ 3105.019058]  ffff880056a3d8d0 0000000000000012 ffff880056a3d8d0
0000000000000006
[ 3105.019105] Call Trace:
[ 3105.019128]  [<ffffffffa0458cd4>] storage_probe+0xa4/0xe0 [usb_storage]
[ 3105.019173]  [<ffffffffa0097822>] usb_probe_interface+0x172/0x330 [usbcore]
[ 3105.019211]  [<ffffffff815fda67>] driver_probe_device+0x257/0x3b0
[ 3105.019243]  [<ffffffff815fdd43>] __device_attach+0x73/0x90
[ 3105.019272]  [<ffffffff815fdcd0>] ? __driver_attach+0x110/0x110
[ 3105.019303]  [<ffffffff815fb93c>] bus_for_each_drv+0x9c/0xf0
[ 3105.019334]  [<ffffffff815fd6c7>] device_attach+0xf7/0x120
[ 3105.019364]  [<ffffffff815fc905>] bus_probe_device+0x45/0x80
[ 3105.019396]  [<ffffffff815f98a6>] device_add+0x876/0x990
[ 3105.019434]  [<ffffffffa0094e42>] usb_set_configuration+0x822/0x9e0 [usbcore]
[ 3105.019479]  [<ffffffffa00a3492>] generic_probe+0x62/0xf0 [usbcore]
[ 3105.019518]  [<ffffffffa0097a46>] usb_probe_device+0x66/0xb0 [usbcore]
[ 3105.019555]  [<ffffffff815fda67>] driver_probe_device+0x257/0x3b0
[ 3105.019589]  [<ffffffff815fdd43>] __device_attach+0x73/0x90
[ 3105.019617]  [<ffffffff815fdcd0>] ? __driver_attach+0x110/0x110
[ 3105.019648]  [<ffffffff815fb93c>] bus_for_each_drv+0x9c/0xf0
[ 3105.019680]  [<ffffffff815fd6c7>] device_attach+0xf7/0x120
[ 3105.019709]  [<ffffffff815fc905>] bus_probe_device+0x45/0x80
[ 3105.021040] usb usb6: usb auto-resume
[ 3105.021045] usb usb6: wakeup_rh
[ 3105.024849]  [<ffffffff815f98a6>] device_add+0x876/0x990
[ 3105.025086]  [<ffffffffa0088987>] usb_new_device+0x1e7/0x2b0 [usbcore]
[ 3105.025086]  [<ffffffffa008a4d7>] hub_thread+0xb27/0x1ec0 [usbcore]
[ 3105.025086]  [<ffffffff810d5200>] ? wake_up_bit+0x50/0x50
[ 3105.025086]  [<ffffffffa00899b0>] ? usb_remote_wakeup+0xa0/0xa0 [usbcore]
[ 3105.025086]  [<ffffffff810d49b8>] kthread+0xd8/0xf0
[ 3105.025086]  [<ffffffff81939884>] kernel_thread_helper+0x4/0x10
[ 3105.025086]  [<ffffffff8192a8c0>] ? _raw_spin_unlock_irq+0x50/0x80
[ 3105.025086]  [<ffffffff8192b1b4>] ? retint_restore_args+0x13/0x13
[ 3105.025086]  [<ffffffff810d48e0>] ? __init_kthread_worker+0x80/0x80
[ 3105.025086]  [<ffffffff81939880>] ? gs_change+0x13/0x13
[ 3105.025086] Code: 00 48 83 05 cd ad 00 00 01 48 83 05 cd ad 00 00
01 4c 8b ab 30 0c 00 00 48 8b 50 08 48 83 c0 30 48 89 45 a0 4c 89 a3
40 0c 00 00 <41> 0f b6 44 24 10 48 89 55 a8 3c ff 0f 84 b8 04 00 00 48
83 05
[ 3105.025086] RIP  [<ffffffffa045830d>] usb_stor_probe1+0x2fd/0xc20
[usb_storage]
[ 3105.025086]  RSP <ffff880056a3d830>
[ 3105.060037] hub 6-0:1.0: hub_resume
[ 3105.062616] usb usb5: usb auto-resume
[ 3105.064317] ehci_hcd 0000:00:1d.7: resume root hub
[ 3105.094809] ---[ end trace a7919e7f17c0a727 ]---
[ 3105.130069] hub 5-0:1.0: hub_resume
[ 3105.132131] usb usb4: usb auto-resume
[ 3105.132136] usb usb4: wakeup_rh
[ 3105.180059] hub 4-0:1.0: hub_resume
[ 3106.290052] usb usb6: suspend_rh (auto-stop)
[ 3106.290077] usb usb4: suspend_rh (auto-stop)

Signed-off-by: Huajun Li <huajun.li.lee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
ar0n pushed a commit that referenced this pull request Nov 5, 2012
commit 342ff28 upstream.

Some error paths in mtd_blkdevs were fixed in the following commit:

    commit 94735ec
    mtd: mtd_blkdevs: fix error path in blktrans_open

But on these error paths, the block device's `dev->open' count is
already incremented before we check for errors. This meant that, while
the error path was handled correctly on the first time through
blktrans_open(), the device is erroneously considered already open on
the second time through.

This problem can be seen, for instance, when a UBI volume is
simultaneously mounted as a UBIFS partition and read through its
corresponding gluebi mtdblockX device. This results in blktrans_open()
passing its error checks (with `dev->open > 0') without actually having
a handle on the device. Here's a summarized log of the actions and
results with nandsim:

    # modprobe nandsim
    # modprobe mtdblock
    # modprobe gluebi
    # modprobe ubifs
    # ubiattach /dev/ubi_ctrl -m 0
    ...
    # ubimkvol /dev/ubi0 -N test -s 16MiB
    ...
    # mount -t ubifs ubi0:test /mnt
    # ls /dev/mtdblock*
    /dev/mtdblock0  /dev/mtdblock1
    # cat /dev/mtdblock1 > /dev/null
    cat: can't open '/dev/mtdblock4': Device or resource busy
    # cat /dev/mtdblock1 > /dev/null

    CPU 0 Unable to handle kernel paging request at virtual address
    fffffff0, epc == 8031536c, ra == 8031f280
    Oops[#1]:
    ...
    Call Trace:
    [<8031536c>] ubi_leb_read+0x14/0x164
    [<8031f280>] gluebi_read+0xf0/0x148
    [<802edba8>] mtdblock_readsect+0x64/0x198
    [<802ecfe4>] mtd_blktrans_thread+0x330/0x3f4
    [<8005be98>] kthread+0x88/0x90
    [<8000bc04>] kernel_thread_helper+0x10/0x18

Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
ar0n pushed a commit that referenced this pull request Nov 5, 2012
commit b2ea70a upstream.

expkey_parse() oopses when handling a 0 length export. This is easily
triggerable from usermode by writing 0 bytes into
'/proc/[proc id]/net/rpc/nfsd.fh/channel'.

Below is the log:

[ 1402.286893] BUG: unable to handle kernel paging request at ffff880077c49fff
[ 1402.287632] IP: [<ffffffff812b4b99>] expkey_parse+0x28/0x2e1
[ 1402.287632] PGD 2206063 PUD 1fdfd067 PMD 1ffbc067 PTE 8000000077c49160
[ 1402.287632] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 1402.287632] CPU 1
[ 1402.287632] Pid: 20198, comm: trinity Not tainted 3.2.0-rc2-sasha-00058-gc65cd37 raspberrypi#6
[ 1402.287632] RIP: 0010:[<ffffffff812b4b99>]  [<ffffffff812b4b99>] expkey_parse+0x28/0x2e1
[ 1402.287632] RSP: 0018:ffff880077f0fd68  EFLAGS: 00010292
[ 1402.287632] RAX: ffff880077c49fff RBX: 00000000ffffffea RCX: 0000000001043400
[ 1402.287632] RDX: 0000000000000000 RSI: ffff880077c4a000 RDI: ffffffff82283de0
[ 1402.287632] RBP: ffff880077f0fe18 R08: 0000000000000001 R09: ffff880000000000
[ 1402.287632] R10: 0000000000000000 R11: 0000000000000001 R12: ffff880077c4a000
[ 1402.287632] R13: ffffffff82283de0 R14: 0000000001043400 R15: ffffffff82283de0
[ 1402.287632] FS:  00007f25fec3f700(0000) GS:ffff88007d400000(0000) knlGS:0000000000000000
[ 1402.287632] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1402.287632] CR2: ffff880077c49fff CR3: 0000000077e1d000 CR4: 00000000000406e0
[ 1402.287632] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1402.287632] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1402.287632] Process trinity (pid: 20198, threadinfo ffff880077f0e000, task ffff880077db17b0)
[ 1402.287632] Stack:
[ 1402.287632]  ffff880077db17b0 ffff880077c4a000 ffff880077f0fdb8 ffffffff810b411e
[ 1402.287632]  ffff880000000000 ffff880077db17b0 ffff880077c4a000 ffffffff82283de0
[ 1402.287632]  0000000001043400 ffffffff82283de0 ffff880077f0fde8 ffffffff81111f63
[ 1402.287632] Call Trace:
[ 1402.287632]  [<ffffffff810b411e>] ? lock_release+0x1af/0x1bc
[ 1402.287632]  [<ffffffff81111f63>] ? might_fault+0x97/0x9e
[ 1402.287632]  [<ffffffff81111f1a>] ? might_fault+0x4e/0x9e
[ 1402.287632]  [<ffffffff81a8bcf2>] cache_do_downcall+0x3e/0x4f
[ 1402.287632]  [<ffffffff81a8c950>] cache_write.clone.16+0xbb/0x130
[ 1402.287632]  [<ffffffff81a8c9df>] ? cache_write_pipefs+0x1a/0x1a
[ 1402.287632]  [<ffffffff81a8c9f8>] cache_write_procfs+0x19/0x1b
[ 1402.287632]  [<ffffffff8118dc54>] proc_reg_write+0x8e/0xad
[ 1402.287632]  [<ffffffff8113fe81>] vfs_write+0xaa/0xfd
[ 1402.287632]  [<ffffffff8114142d>] ? fget_light+0x35/0x9e
[ 1402.287632]  [<ffffffff8113ff8b>] sys_write+0x48/0x6f
[ 1402.287632]  [<ffffffff81bbdb92>] system_call_fastpath+0x16/0x1b
[ 1402.287632] Code: c0 c9 c3 55 48 63 d2 48 89 e5 48 8d 44 32 ff 41 57 41 56 41 55 41 54 53 bb ea ff ff ff 48 81 ec 88 00 00 00 48 89 b5 58 ff ff ff
[ 1402.287632]  38 0a 0f 85 89 02 00 00 c6 00 00 48 8b 3d 44 4a e5 01 48 85
[ 1402.287632] RIP  [<ffffffff812b4b99>] expkey_parse+0x28/0x2e1
[ 1402.287632]  RSP <ffff880077f0fd68>
[ 1402.287632] CR2: ffff880077c49fff
[ 1402.287632] ---[ end trace 368ef53ff773a5e3 ]---

Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: linux-nfs@vger.kernel.org
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
ar0n pushed a commit that referenced this pull request Nov 5, 2012
commit 687875f upstream.

Fix the following NULL ptr dereference caused by

  cat /sys/devices/system/memory/memory0/removable

Pid: 13979, comm: sed Not tainted 3.0.13-0.5-default #1 IBM BladeCenter LS21 -[7971PAM]-/Server Blade
RIP: __count_immobile_pages+0x4/0x100
Process sed (pid: 13979, threadinfo ffff880221c36000, task ffff88022e788480)
Call Trace:
  is_pageblock_removable_nolock+0x34/0x40
  is_mem_section_removable+0x74/0xf0
  show_mem_removable+0x41/0x70
  sysfs_read_file+0xfe/0x1c0
  vfs_read+0xc7/0x130
  sys_read+0x53/0xa0
  system_call_fastpath+0x16/0x1b

We are crashing because we are trying to dereference NULL zone which
came from pfn=0 (struct page ffffea0000000000). According to the boot
log this page is marked reserved:
e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)

and early_node_map confirms that:
early_node_map[3] active PFN ranges
    1: 0x00000010 -> 0x0000009c
    1: 0x00000100 -> 0x000bffa3
    1: 0x00100000 -> 0x00240000

The problem is that memory_present works in PAGE_SECTION_MASK aligned
blocks so the reserved range sneaks into the the section as well.  This
also means that free_area_init_node will not take care of those reserved
pages and they stay uninitialized.

When we try to read the removable status we walk through all available
sections and hope that the zone is valid for all pages in the section.
But this is not true in this case as the zone and nid are not initialized.

We have only one node in this particular case and it is marked as node=1
(rather than 0) and that made the problem visible because page_to_nid will
return 0 and there are no zones on the node.

Let's check that the zone is valid and that the given pfn falls into its
boundaries and mark the section not removable.  This might cause some
false positives, probably, but we do not have any sane way to find out
whether the page is reserved by the platform or it is just not used for
whatever other reasons.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
ar0n pushed a commit that referenced this pull request Nov 5, 2012
…S block during isolation for migration

commit 0bf380b upstream.

When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 raspberrypi#4 [d72d3c00] do_page_fault at c05c72ec
 raspberrypi#5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 raspberrypi#6 [d72d3cb4] isolate_migratepages at c030b15a
 raspberrypi#7 [d72d3d14] zone_watermark_ok at c02d26cb
 raspberrypi#8 [d72d3d2c] compact_zone at c030b8de
 raspberrypi#9 [d72d3d68] compact_zone_order at c030bba1
raspberrypi#10 [d72d3db4] try_to_compact_pages at c030bc84
raspberrypi#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
raspberrypi#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
raspberrypi#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
raspberrypi#14 [d72d3eb8] alloc_pages_vma at c030a845
raspberrypi#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
raspberrypi#16 [d72d3f00] handle_mm_fault at c02f36c6
raspberrypi#17 [d72d3f30] do_page_fault at c05c70ed
raspberrypi#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff00  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned.  Lets say we have a case
like this

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ar0n pushed a commit that referenced this pull request Nov 5, 2012
commit 525895b upstream.

Due to a race it was possible for a fence to be destroyed while another
thread was trying to synchronise with it.  If this happened in the fallback
non-semaphore path, it lead to the following oops due to fence->channel
being NULL.

BUG: unable to handle kernel NULL pointer dereference at   (null)
IP: [<fa9632ce>] nouveau_fence_update+0xe/0xe0 [nouveau]
*pde = a649c067
SMP
Modules linked in: fuse nouveau(O) ttm(O) drm_kms_helper(O) drm(O) mxm_wmi video wmi netconsole configfs lockd bnep bluetooth rfkill ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ip6table_filter ip6_tables snd_hda_codec_realtek snd_hda_intel snd_hda_cobinfmt_misc uinput ata_generic pata_acpi pata_aet2c_algo_bit i2c_core [last unloaded: wmi]

Pid: 2255, comm: gnome-shell Tainted: G           O 3.2.0-0.rc5.git0.1.fc17.i686 #1 System manufacturer System Product Name/M2A-VM
EIP: 0060:[<fa9632ce>] EFLAGS: 00010296 CPU: 1
EIP is at nouveau_fence_update+0xe/0xe0 [nouveau]
EAX: 00000000 EBX: ddfc6dd0 ECX: dd111580 EDX: 00000000
ESI: 00003e80 EDI: dd111580 EBP: dd121d00 ESP: dd121ce8
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process gnome-shell (pid: 2255, ti=dd120000 task=dd111580 task.ti=dd120000)
Stack:
 7dc86c76 00000000 00003e80 ddfc6dd0 00003e80 dd111580 dd121d0c fa96371f
 00000000 dd121d3c fa963773 dd111580 01000246 000ec53d 00000000 ddfc6dd0
 00001f40 00000000 ddfc6dd0 00000010 dc7df840 dd121d6c fa9639a0 00000000
Call Trace:
 [<fa96371f>] __nouveau_fence_signalled+0x1f/0x30 [nouveau]
 [<fa963773>] __nouveau_fence_wait+0x43/0xd0 [nouveau]
 [<fa9639a0>] nouveau_fence_sync+0x1a0/0x1c0 [nouveau]
 [<fa964046>] validate_list+0x176/0x300 [nouveau]
 [<f7d9c9c0>] ? ttm_bo_mem_put+0x30/0x30 [ttm]
 [<fa964b8a>] nouveau_gem_ioctl_pushbuf+0x48a/0xfd0 [nouveau]
 [<c0406481>] ? die+0x31/0x80
 [<f7c93d98>] drm_ioctl+0x388/0x490 [drm]
 [<c0406481>] ? die+0x31/0x80
 [<fa964700>] ? nouveau_gem_ioctl_new+0x150/0x150 [nouveau]
 [<c0635c7b>] ? file_has_perm+0xcb/0xe0
 [<f7c93a10>] ? drm_copy_field+0x80/0x80 [drm]
 [<c0564f56>] do_vfs_ioctl+0x86/0x5b0
 [<c0406481>] ? die+0x31/0x80
 [<c0635f22>] ? selinux_file_ioctl+0x62/0x130
 [<c0554f30>] ? fget_light+0x30/0x340
 [<c05654ef>] sys_ioctl+0x6f/0x80
 [<c099e3a4>] syscall_call+0x7/0xb
 [<c0406481>] ? die+0x31/0x80
 [<c0406481>] ? die+0x31/0x80

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ar0n pushed a commit that referenced this pull request Nov 5, 2012
…ation

commit dc90860 upstream.

When isolating pages for migration, migration starts at the start of a
zone while the free scanner starts at the end of the zone.  Migration
avoids entering a new zone by never going beyond the free scanned.

Unfortunately, in very rare cases nodes can overlap.  When this happens,
migration isolates pages without the LRU lock held, corrupting lists
which will trigger errors in reclaim or during page free such as in the
following oops

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
  IP: [<ffffffff810f795c>] free_pcppages_bulk+0xcc/0x450
  PGD 1dda554067 PUD 1e1cb58067 PMD 0
  Oops: 0000 [#1] SMP
  CPU 37
  Pid: 17088, comm: memcg_process_s Tainted: G            X
  RIP: free_pcppages_bulk+0xcc/0x450
  Process memcg_process_s (pid: 17088, threadinfo ffff881c2926e000, task ffff881c2926c0c0)
  Call Trace:
    free_hot_cold_page+0x17e/0x1f0
    __pagevec_free+0x90/0xb0
    release_pages+0x22a/0x260
    pagevec_lru_move_fn+0xf3/0x110
    putback_lru_page+0x66/0xe0
    unmap_and_move+0x156/0x180
    migrate_pages+0x9e/0x1b0
    compact_zone+0x1f3/0x2f0
    compact_zone_order+0xa2/0xe0
    try_to_compact_pages+0xdf/0x110
    __alloc_pages_direct_compact+0xee/0x1c0
    __alloc_pages_slowpath+0x370/0x830
    __alloc_pages_nodemask+0x1b1/0x1c0
    alloc_pages_vma+0x9b/0x160
    do_huge_pmd_anonymous_page+0x160/0x270
    do_page_fault+0x207/0x4c0
    page_fault+0x25/0x30

The "X" in the taint flag means that external modules were loaded but but
is unrelated to the bug triggering.  The real problem was because the PFN
layout looks like this

  Zone PFN ranges:
    DMA      0x00000010 -> 0x00001000
    DMA32    0x00001000 -> 0x00100000
    Normal   0x00100000 -> 0x01e80000
  Movable zone start PFN for each node
  early_node_map[14] active PFN ranges
      0: 0x00000010 -> 0x0000009b
      0: 0x00000100 -> 0x0007a1ec
      0: 0x0007a354 -> 0x0007a379
      0: 0x0007f7ff -> 0x0007f800
      0: 0x00100000 -> 0x00680000
      1: 0x00680000 -> 0x00e80000
      0: 0x00e80000 -> 0x01080000
      1: 0x01080000 -> 0x01280000
      0: 0x01280000 -> 0x01480000
      1: 0x01480000 -> 0x01680000
      0: 0x01680000 -> 0x01880000
      1: 0x01880000 -> 0x01a80000
      0: 0x01a80000 -> 0x01c80000
      1: 0x01c80000 -> 0x01e80000

The fix is straight-forward.  isolate_migratepages() has to make a
similar check to isolate_freepage to ensure that it never isolates pages
from a zone it does not hold the LRU lock for.

This was discovered in a 3.0-based kernel but it affects 3.1.x, 3.2.x
and current mainline.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ar0n pushed a commit that referenced this pull request Nov 5, 2012
commit 07445f6 upstream.

all works need to be initialized before ieee80211_register_hw
to prevent mac80211 call backs such as drv_start, drv_config
getting started. otherwise we would queue/cancel works before
initializing them and it leads to kernel panic.
this issue can be recreated with the following script
in Chrome laptops with AR928X cards, with background scan
running (or) Network manager is running

while true
do
sudo modprobe -v ath9k
sleep 3
sudo modprobe -r ath9k
sleep 3
done

	 EIP: [<81040a47>] __cancel_work_timer+0xb8/0xe1 SS:ESP 0068:f6be9d70
	 ---[ end trace 4f86d6139a9900ef ]---
	 Registered led device: ath9k-phy0
	 ieee80211 phy0: Atheros AR9280 Rev:2 mem=0xf88a0000,
	 irq=16
	 Kernel panic - not syncing: Fatal exception
	 Pid: 456, comm: wpa_supplicant Tainted: G      D
	 3.0.13 #1
	Call Trace:
	 [<81379e21>] panic+0x53/0x14a
	 [<81004a30>] oops_end+0x73/0x81
	 [<81004b53>] die+0x4c/0x55
	 [<81002710>] do_trap+0x7c/0x83
	 [<81002855>] ? do_bounds+0x58/0x58
	 [<810028cc>] do_invalid_op+0x77/0x81
	 [<81040a47>] ? __cancel_work_timer+0xb8/0xe1
	 [<810489ec>] ? sched_clock_cpu+0x81/0x11f
	 [<8103f809>] ? wait_on_work+0xe2/0xf7
	 [<8137f807>] error_code+0x67/0x6c
	 [<810300d8>] ? wait_consider_task+0x4ba/0x84c
	 [<81040a47>] ? __cancel_work_timer+0xb8/0xe1
	 [<810380c9>] ? try_to_del_timer_sync+0x5f/0x67
	 [<81040a91>] cancel_work_sync+0xf/0x11
	 [<f88d7b7c>] ath_set_channel+0x62/0x25c [ath9k]
	 [<f88d67d1>] ? ath9k_tx_last_beacon+0x26a/0x85c [ath9k]
	 [<f88d8899>] ath_radio_disable+0x3f1/0x68e [ath9k]
	 [<f90d0edb>] ieee80211_hw_config+0x111/0x116 [mac80211]
	 [<f90dd95c>] __ieee80211_recalc_idle+0x919/0xa37 [mac80211]
	 [<f90dda76>] __ieee80211_recalc_idle+0xa33/0xa37 [mac80211]
	 [<812dbed8>] __dev_open+0x82/0xab

Cc: Gary Morain <gmorain@google.com>
Cc: Paul Stewart <pstew@google.com>
Cc: Vasanthakumar Thiagarajan <vthiagar@qca.qualcomm.com>
Tested-by: Mohammed Shafi Shajakhan <mohammed@qca.qualcomm.com>
Signed-off-by: Rajkumar Manoharan <rmanohar@qca.qualcomm.com>
Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ar0n pushed a commit that referenced this pull request Nov 5, 2012
commit f9c2a0d upstream.

Current PIO mode makes a kernel crash with CONFIG_HIGHMEM.
Highmem pages have a NULL from sg_virt(sg).
This patch fixes the following problem.

Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = c0004000
[00000000] *pgd=00000000
Internal error: Oops: 817 [#1] PREEMPT SMP
Modules linked in:
CPU: 0    Not tainted  (3.0.15-01423-gdbf465f raspberrypi#589)
PC is at dw_mci_pull_data32+0x4c/0x9c
LR is at dw_mci_read_data_pio+0x54/0x1f0
pc : [<c0358824>]    lr : [<c035988c>]    psr: 20000193
sp : c0619d48  ip : c0619d70  fp : c0619d6c
r10: 00000000  r9 : 00000002  r8 : 00001000
r7 : 00000200  r6 : 00000000  r5 : e1dd3100  r4 : 00000000
r3 : 65622023  r2 : 0000007f  r1 : eeb96000  r0 : e1dd3100
Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment
xkernel
Control: 10c5387d  Table: 61e2004a  DAC: 00000015
Process swapper (pid: 0, stack limit = 0xc06182f0)
Stack: (0xc0619d48 to 0xc061a000)
9d40:                   e1dd3100 e1a4f000 00000000 e1dd3100 e1a4f000 00000200
9d60: c0619da4 c0619d70 c035988c c03587e4 c0619d9c e18158f4 e1dd3100 e1dd3100
9d80: 00000020 00000000 00000000 00000020 c06e8a84 00000000 c0619e04 c0619da8
9da0: c0359b24 c0359844 e18158f4 e1dd3164 e1dd3168 e1dd3150 3d02fc79 e1dd3154
9dc0: e1dd3178 00000000 00000020 00000000 e1dd3150 00000000 c10dd7e8 e1a84900
9de0: c061e7cc 00000000 00000000 0000008d c06e8a84 c061e780 c0619e4c c0619e08
9e00: c00c4738 c0359a34 3d02fc79 00000000 c0619e4c c05a1698 c05a1670 c05a165c
9e20: c04de8b0 c061e780 c061e7cc e1a84900 ffffed68 0000008d c0618000 00000000
9e40: c0619e6c c0619e50 c00c48b4 c00c46c8 c061e780 c00423ac c061e7cc ffffed68
9e60: c0619e8c c0619e70 c00c7358 c00c487c 0000008d ffffee38 c0618000 ffffed68
9e80: c0619ea4 c0619e90 c00c4258 c00c72b0 c00423ac ffffee38 c0619ecc c0619ea8
9ea0: c004241c c00c4234 ffffffff f8810000 0000006d 00000002 00000001 7fffffff
9ec0: c0619f44 c0619ed0 c0048bc0 c00423c4 220ae7a9 00000000 386f0d30 0005d3a4
9ee0: c00423ac c10dd0b8 c06f2cd8 c0618000 c0594778 c003a674 7fffffff c0619f44
9f00: 386f0d30 c0619f18 c00a6f94 c005be3c 80000013 ffffffff 386f0d30 0005d3a4
9f20: 386f0d30 0005d2d1 c10dd0a8 c10dd0b8 c06f2cd8 c0618000 c0619f74 c0619f48
9f40: c0345858 c005be00 c00a2440 c0618000 c0618000 c00410d8 c06c1944 c00410fc
9f60: c0594778 c003a674 c0619f9c c0619f78 c004a7e8 c03457b4 c0618000 c06c18f8
9f80: 00000000 c0039c70 c06c18d4 c003a674 c0619fb4 c0619fa0 c04ceafc c004a714
9fa0: c06287b4 c06c18f8 c0619ff4 c0619fb8 c0008b68 c04cea68 c0008578 00000000
9fc0: 00000000 c003a674 00000000 10c5387d c0628658 c003aa78 c062f1c4 4000406a
9fe0: 413fc090 00000000 00000000 c0619ff8 40008044 c0008858 00000000 00000000
Backtrace:
[<c03587d8>] (dw_mci_pull_data32+0x0/0x9c) from [<c035988c>] (dw_mci_read_data_pio+0x54/0x1f0)
 r6:00000200 r5:e1a4f000 r4:e1dd3100
 [<c0359838>] (dw_mci_read_data_pio+0x0/0x1f0) from [<c0359b24>] (dw_mci_interrupt+0xfc/0x4a4)
[<c0359a28>] (dw_mci_interrupt+0x0/0x4a4) from [<c00c4738>] (handle_irq_event_percpu+0x7c/0x1b4)
[<c00c46bc>] (handle_irq_event_percpu+0x0/0x1b4) from [<c00c48b4>] (handle_irq_event+0x44/0x64)
[<c00c4870>] (handle_irq_event+0x0/0x64) from [<c00c7358>] (handle_fasteoi_irq+0xb4/0x124)
 r7:ffffed68 r6:c061e7cc r5:c00423ac r4:c061e780
 [<c00c72a4>] (handle_fasteoi_irq+0x0/0x124) from [<c00c4258>] (generic_handle_irq+0x30/0x38)
 r7:ffffed68 r6:c0618000 r5:ffffee38 r4:0000008d
 [<c00c4228>] (generic_handle_irq+0x0/0x38) from [<c004241c>] (asm_do_IRQ+0x64/0xe0)
 r5:ffffee38 r4:c00423ac
 [<c00423b8>] (asm_do_IRQ+0x0/0xe0) from [<c0048bc0>] (__irq_svc+0x80/0x14c)
Exception stack(0xc0619ed0 to 0xc0619f18)

Signed-off-by: Seungwon Jeon <tgih.jun@samsung.com>
Acked-by: Will Newton <will.newton@imgtec.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ar0n pushed a commit that referenced this pull request Nov 5, 2012
commit b57e6b5 upstream.

read_lock(&tpt_trig->trig.leddev_list_lock) is accessed via the path
ieee80211_open (->) ieee80211_do_open (->) ieee80211_mod_tpt_led_trig
(->) ieee80211_start_tpt_led_trig (->) tpt_trig_timer before initializing
it.
the intilization of this read/write lock happens via the path
ieee80211_led_init (->) led_trigger_register, but we are doing
'ieee80211_led_init'  after 'ieeee80211_if_add' where we
register netdev_ops.
so we access leddev_list_lock before initializing it and causes the
following bug in chrome laptops with AR928X cards with the following
script

while true
do
sudo modprobe -v ath9k
sleep 3
sudo modprobe -r ath9k
sleep 3
done

	BUG: rwlock bad magic on CPU#1, wpa_supplicant/358, f5b9eccc
	Pid: 358, comm: wpa_supplicant Not tainted 3.0.13 #1
	Call Trace:

	[<8137b9df>] rwlock_bug+0x3d/0x47
	[<81179830>] do_raw_read_lock+0x19/0x29
	[<8137f063>] _raw_read_lock+0xd/0xf
	[<f9081957>] tpt_trig_timer+0xc3/0x145 [mac80211]
	[<f9081f3a>] ieee80211_mod_tpt_led_trig+0x152/0x174 [mac80211]
	[<f9076a3f>] ieee80211_do_open+0x11e/0x42e [mac80211]
	[<f9075390>] ? ieee80211_check_concurrent_iface+0x26/0x13c [mac80211]
	[<f9076d97>] ieee80211_open+0x48/0x4c [mac80211]
	[<812dbed8>] __dev_open+0x82/0xab
	[<812dc0c9>] __dev_change_flags+0x9c/0x113
	[<812dc1ae>] dev_change_flags+0x18/0x44
	[<8132144f>] devinet_ioctl+0x243/0x51a
	[<81321ba9>] inet_ioctl+0x93/0xac
	[<812cc951>] sock_ioctl+0x1c6/0x1ea
	[<812cc78b>] ? might_fault+0x20/0x20
	[<810b1ebb>] do_vfs_ioctl+0x46e/0x4a2
	[<810a6ebb>] ? fget_light+0x2f/0x70
	[<812ce549>] ? sys_recvmsg+0x3e/0x48
	[<810b1f35>] sys_ioctl+0x46/0x69
	[<8137fa77>] sysenter_do_call+0x12/0x2

Cc: Gary Morain <gmorain@google.com>
Cc: Paul Stewart <pstew@google.com>
Cc: Abhijit Pradhan <abhijit@qca.qualcomm.com>
Cc: Vasanthakumar Thiagarajan <vthiagar@qca.qualcomm.com>
Cc: Rajkumar Manoharan <rmanohar@qca.qualcomm.com>
Acked-by: Johannes Berg <johannes.berg@intel.com>
Tested-by: Mohammed Shafi Shajakhan <mohammed@qca.qualcomm.com>
Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ar0n pushed a commit that referenced this pull request Nov 5, 2012
commit 4041071 upstream.

When a PMIC is not found, this driver is unable to obtain its
'vdds_dsi_reg' regulator.  Even through its initialization function
fails, other code still calls its enable function, which fails to
check whether it has this regulator before asking for it to be enabled.

This fixes the oops, however a better fix would be to sort out the
upper layers to prevent them calling into a module which failed to
initialize.

Unable to handle kernel NULL pointer dereference at virtual address 00000038
pgd = c0004000
[00000038] *pgd=00000000
Internal error: Oops: 5 [#1] PREEMPT
Modules linked in:
CPU: 0    Not tainted  (3.3.0-rc2+ raspberrypi#228)
PC is at regulator_enable+0x10/0x70
LR is at omapdss_dpi_display_enable+0x54/0x15c
pc : [<c01b9a08>]    lr : [<c01af994>]    psr: 60000013
sp : c181fd90  ip : c181fdb0  fp : c181fdac
r10: c042eff0  r9 : 00000060  r8 : c044a164
r7 : c042c0e4  r6 : c042bd60  r5 : 00000000  r4 : c042bd60
r3 : c084de48  r2 : c181e000  r1 : c042bd60  r0 : 00000000
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 10c5387d  Table: 80004019  DAC: 00000015
Process swapper (pid: 1, stack limit = 0xc181e2e8)
Stack: (0xc181fd90 to 0xc1820000)
fd80:                                     c001754c c042bd60 00000000 c042bd60
fda0: c181fdcc c181fdb0 c01af994 c01b9a04 c0016104 c042bd60 c042bd60 c044a338
fdc0: c181fdec c181fdd0 c01b5ed0 c01af94c c042bd60 c042bd60 c1aa8000 c1aa8a0c
fde0: c181fe04 c181fdf0 c01b5f54 c01b5ea8 c02fc18c c042bd60 c181fe3c c181fe08
fe00: c01b2a18 c01b5f48 c01aed14 c02fc160 c01df8ec 00000002 c042bd60 00000003
fe20: c042bd60 c1aa8000 c1aa8a0c c042eff8 c181fe84 c181fe40 c01b3874 c01b29fc
fe40: c042eff8 00000000 c042f000 c0449db8 c044ed78 00000000 c181fe74 c042eff8
fe60: c042eff8 c0449db8 c0449db8 c044ed78 00000000 00000000 c181fe94 c181fe88
fe80: c01e452c c01b35e8 c181feb4 c181fe98 c01e2fdc c01e4518 c042eff8 c0449db8
fea0: c0449db8 c181fef0 c181fecc c181feb8 c01e3104 c01e2f48 c042eff8 c042f02c
fec0: c181feec c181fed0 c01e3190 c01e30c0 c01e311c 00000000 c01e311c c0449db8
fee0: c181ff14 c181fef0 c01e1998 c01e3128 c18330a8 c1892290 c04165e8 c0449db8
ff00: c0449db8 c1ab60c0 c181ff24 c181ff18 c01e2e28 c01e194c c181ff54 c181ff28
ff20: c01e2218 c01e2e14 c039afed c181ff38 c04165e8 c041660c c0449db8 00000013
ff40: 00000000 c03ffdb8 c181ff7c c181ff58 c01e384c c01e217c c181ff7c c04165e8
ff60: c041660c c003a37c 00000013 00000000 c181ff8c c181ff80 c01e488c c01e3790
ff80: c181ff9c c181ff90 c03ffdcc c01e484c c181ffdc c181ffa0 c0008798 c03ffdc4
ffa0: c181ffc4 c181ffb0 c0056440 c0187810 c003a37c c04165e8 c041660c c003a37c
ffc0: 00000013 00000000 00000000 00000000 c181fff4 c181ffe0 c03ea284 c0008708
ffe0: 00000000 c03ea208 00000000 c181fff8 c003a37c c03ea214 1073cec0 01f7ee08
Backtrace:
[<c01b99f8>] (regulator_enable+0x0/0x70) from [<c01af994>] (omapdss_dpi_display_enable+0x54/0x15c)
 r6:c042bd60 r5:00000000 r4:c042bd60
[<c01af940>] (omapdss_dpi_display_enable+0x0/0x15c) from [<c01b5ed0>] (generic_dpi_panel_power_on+0x34/0x78)
 r6:c044a338 r5:c042bd60 r4:c042bd60
[<c01b5e9c>] (generic_dpi_panel_power_on+0x0/0x78) from [<c01b5f54>] (generic_dpi_panel_enable+0x18/0x28)
 r7:c1aa8a0c r6:c1aa8000 r5:c042bd60 r4:c042bd60
[<c01b5f3c>] (generic_dpi_panel_enable+0x0/0x28) from [<c01b2a18>] (omapfb_init_display+0x28/0x150)
 r4:c042bd60
[<c01b29f0>] (omapfb_init_display+0x0/0x150) from [<c01b3874>] (omapfb_probe+0x298/0x318)
 r8:c042eff8 r7:c1aa8a0c r6:c1aa8000 r5:c042bd60 r4:00000003
[<c01b35dc>] (omapfb_probe+0x0/0x318) from [<c01e452c>] (platform_drv_probe+0x20/0x24)
[<c01e450c>] (platform_drv_probe+0x0/0x24) from [<c01e2fdc>] (really_probe+0xa0/0x178)
[<c01e2f3c>] (really_probe+0x0/0x178) from [<c01e3104>] (driver_probe_device+0x50/0x68)
 r7:c181fef0 r6:c0449db8 r5:c0449db8 r4:c042eff8
[<c01e30b4>] (driver_probe_device+0x0/0x68) from [<c01e3190>] (__driver_attach+0x74/0x98)
 r5:c042f02c r4:c042eff8
[<c01e311c>] (__driver_attach+0x0/0x98) from [<c01e1998>] (bus_for_each_dev+0x58/0x98)
 r6:c0449db8 r5:c01e311c r4:00000000
[<c01e1940>] (bus_for_each_dev+0x0/0x98) from [<c01e2e28>] (driver_attach+0x20/0x28)
 r7:c1ab60c0 r6:c0449db8 r5:c0449db8 r4:c04165e8
[<c01e2e08>] (driver_attach+0x0/0x28) from [<c01e2218>] (bus_add_driver+0xa8/0x22c)
[<c01e2170>] (bus_add_driver+0x0/0x22c) from [<c01e384c>] (driver_register+0xc8/0x154)
[<c01e3784>] (driver_register+0x0/0x154) from [<c01e488c>] (platform_driver_register+0x4c/0x60)
 r8:00000000 r7:00000013 r6:c003a37c r5:c041660c r4:c04165e8
[<c01e4840>] (platform_driver_register+0x0/0x60) from [<c03ffdcc>] (omapfb_init+0x14/0x34)
[<c03ffdb8>] (omapfb_init+0x0/0x34) from [<c0008798>] (do_one_initcall+0x9c/0x164)
[<c00086fc>] (do_one_initcall+0x0/0x164) from [<c03ea284>] (kernel_init+0x7c/0x120)
[<c03ea208>] (kernel_init+0x0/0x120) from [<c003a37c>] (do_exit+0x0/0x2d8)
 r5:c03ea208 r4:00000000
Code: e1a0c00d e92dd870 e24cb004 e24dd004 (e5906038)
---[ end trace 9e2474c2e193b223 ]---

Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Igor Grinberg <grinberg@compulab.co.il>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ar0n pushed a commit that referenced this pull request Nov 5, 2012
commit d980e0f upstream.

When the PMIC is not found, voltdm->pmic will be NULL.  vp.c's
initialization function tries to dereferences this, which causes an
oops:

Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = c0004000
[00000000] *pgd=00000000
Internal error: Oops: 5 [#1] PREEMPT
Modules linked in:
CPU: 0    Not tainted  (3.3.0-rc2+ raspberrypi#204)
PC is at omap_vp_init+0x5c/0x15c
LR is at omap_vp_init+0x58/0x15c
pc : [<c03db880>]    lr : [<c03db87c>]    psr: 60000013
sp : c181ff30  ip : c181ff68  fp : c181ff64
r10: c0407808  r9 : c040786c  r8 : c0407814
r7 : c0026868  r6 : c00264fc  r5 : c040ad6c  r4 : 00000000
r3 : 00000040  r2 : 000032c8  r1 : 0000fa00  r0 : 000032c8
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 10c5387d  Table: 80004019  DAC: 00000015
Process swapper (pid: 1, stack limit = 0xc181e2e8)
Stack: (0xc181ff30 to 0xc1820000)
ff20:                                     c0381d00 c02e9c6d c0383582 c040786c
ff40: c040ad6c c00264fc c0026868 c0407814 00000000 c03d9de4 c181ff8c c181ff68
ff60: c03db448 c03db830 c02e982c c03fdfb8 c03fe004 c0039988 00000013 00000000
ff80: c181ff9c c181ff90 c03d9df8 c03db390 c181ffdc c181ffa0 c0008798 c03d9df0
ffa0: c181ffc4 c181ffb0 c0055a44 c0187050 c0039988 c03fdfb8 c03fe004 c0039988
ffc0: 00000013 00000000 00000000 00000000 c181fff4 c181ffe0 c03d1284 c0008708
ffe0: 00000000 c03d1208 00000000 c181fff8 c0039988 c03d1214 1077ce40 01f7ee08
Backtrace:
[<c03db824>] (omap_vp_init+0x0/0x15c) from [<c03db448>] (omap_voltage_late_init+0xc4/0xfc)
[<c03db384>] (omap_voltage_late_init+0x0/0xfc) from [<c03d9df8>] (omap2_common_pm_late_init+0x14/0x54)
 r8:00000000 r7:00000013 r6:c0039988 r5:c03fe004 r4:c03fdfb8
[<c03d9de4>] (omap2_common_pm_late_init+0x0/0x54) from [<c0008798>] (do_one_initcall+0x9c/0x164)
[<c00086fc>] (do_one_initcall+0x0/0x164) from [<c03d1284>] (kernel_init+0x7c/0x120)
[<c03d1208>] (kernel_init+0x0/0x120) from [<c0039988>] (do_exit+0x0/0x2cc)
 r5:c03d1208 r4:00000000
Code: e5ca300b e5900034 ebf69027 e5994024 (e5941000)
---[ end trace aed617dddaf32c3d ]---
Kernel panic - not syncing: Attempted to kill init!

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Igor Grinberg <grinberg@compulab.co.il>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ar0n pushed a commit that referenced this pull request Nov 5, 2012
commit d7178c7 upstream.

According to the report from Slicky Devil, nilfs caused kernel oops at
nilfs_load_super_block function during mount after he shrank the
partition without resizing the filesystem:

 BUG: unable to handle kernel NULL pointer dereference at 00000048
 IP: [<d0d7a08e>] nilfs_load_super_block+0x17e/0x280 [nilfs2]
 *pde = 00000000
 Oops: 0000 [#1] PREEMPT SMP
 ...
 Call Trace:
  [<d0d7a87b>] init_nilfs+0x4b/0x2e0 [nilfs2]
  [<d0d6f707>] nilfs_mount+0x447/0x5b0 [nilfs2]
  [<c0226636>] mount_fs+0x36/0x180
  [<c023d961>] vfs_kern_mount+0x51/0xa0
  [<c023ddae>] do_kern_mount+0x3e/0xe0
  [<c023f189>] do_mount+0x169/0x700
  [<c023fa9b>] sys_mount+0x6b/0xa0
  [<c04abd1f>] sysenter_do_call+0x12/0x28
 Code: 53 18 8b 43 20 89 4b 18 8b 4b 24 89 53 1c 89 43 24 89 4b 20 8b 43
 20 c7 43 2c 00 00 00 00 23 75 e8 8b 50 68 89 53 28 8b 54 b3 20 <8b> 72
 48 8b 7a 4c 8b 55 08 89 b3 84 00 00 00 89 bb 88 00 00 00
 EIP: [<d0d7a08e>] nilfs_load_super_block+0x17e/0x280 [nilfs2] SS:ESP 0068:ca9bbdcc
 CR2: 0000000000000048

This turned out due to a defect in an error path which runs if the
calculated location of the secondary super block was invalid.

This patch fixes it and eliminates the reported oops.

Reported-by: Slicky Devil <slicky.dvl@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Slicky Devil <slicky.dvl@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ar0n pushed a commit that referenced this pull request Nov 5, 2012
commit ab2dde9 upstream.

Unbanked GPIO irq setup code was overwriting chip_data leading
to the following oops on request_irq()

Unable to handle kernel paging request at virtual address febfffff
pgd = c22dc000
[febfffff] *pgd=00000000
Internal error: Oops: 801 [#1] PREEMPT
Modules linked in: mcu(+) edmak irqk cmemk
CPU: 0    Not tainted  (3.0.0-rc7+ raspberrypi#93)
PC is at irq_gc_mask_set_bit+0x68/0x7c
LR is at vprintk+0x22c/0x484
pc : [<c0080c0c>]    lr : [<c00457e0>]    psr: 60000093
sp : c33e3ba0  ip : c33e3af0  fp : c33e3bc4
r10: c04555bc  r9 : c33d4340  r8 : 60000013
r7 : 0000002d  r6 : c04555bc  r5 : fec67010  r4 : 00000000
r3 : c04734c8  r2 : fec00000  r1 : ffffffff  r0 : 00000026
Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 0005317f  Table: 822dc000  DAC: 00000015
Process modprobe (pid: 526, stack limit = 0xc33e2270)
Stack: (0xc33e3ba0 to 0xc33e4000)
3ba0: 00000000 c007d3d4 c33e3bcc c04555bc c04555bc c33d4340 c33e3bdc c33e3bc8
3bc0: c007f5f8 c0080bb4 00000000 c04555bc c33e3bf4 c33e3be0 c007f654 c007f5c0
3be0: 00000000 c04555bc c33e3c24 c33e3bf8 c007e6e8 c007f618 c01f2284 c0350af8
3c00: c0405214 bf016c98 00000001 00000000 c33dc008 0000002d c33e3c54 c33e3c28
3c20: c007e888 c007e408 00000001 c23ef880 c33dc000 00000000 c33dc080 c25caa00
3c40: c0487498 bf017078 c33e3c94 c33e3c58 bf016b44 c007e7d4 bf017078 c33dc008
3c60: c25caa08 c33dc008 c33e3c84 bf017484 c25caa00 c25caa00 c01f5f48 c25caa08
3c80: c0496d60 bf017484 c33e3ca4 c33e3c98 c022a698 bf01692c c33e3cd4 c33e3ca8
3ca0: c01f5d88 c022a688 00000000 bf017484 c25caa00 c25caa00 c01f5f48 c25caa08
3cc0: c0496d60 00000000 c33e3cec c33e3cd8 c01f5f8c c01f5d10 00000000 c33e3cf0
3ce0: c33e3d14 c33e3cf0 c01f5210 c01f5f58 c303cb48 c25ecf94 c25caa00 c25caa00
3d00: c25caa34 c33e3dd8 c33e3d34 c33e3d18 c01f6044 c01f51b8 c0496d3c c25caa00
3d20: c044e918 c33e3dd8 c33e3d44 c33e3d38 c01f4ff4 c01f5fcc c33e3d94 c33e3d48
3d40: c01f3d10 c01f4fd8 00000000 c044e918 00000000 00000000 c01f52c0 c034d570
3d60: c33e3d84 c33e3d70 c022bf84 c25caa00 00000000 c044e918 c33e3dd8 c25c2e00
3d80: c0496d60 bf01763c c33e3db4 c33e3d98 c022b1a0 c01f384c c25caa00 c33e3dd8
3da0: 00000000 c33e3dd8 c33e3dd4 c33e3db8 c022b27c c022b0e8 00000000 bf01763c
3dc0: c0451c80 c33e3dd8 c33e3e34 c33e3dd8 bf016f60 c022b210 5f75636d 746e6f63
3de0: 006c6f72 00000000 00000000 00000000 00000000 00000000 00000000 bf0174bc
3e00: 00000000 00989680 00000000 00000020 c0451c80 c0451c80 bf0174dc c01f5eb0
3e20: c33f0f00 bf0174dc c33e3e44 c33e3e38 c01f72f4 bf016e2c c33e3e74 c33e3e48
3e40: c01f5d88 c01f72e4 00000000 c0451c80 c0451cb4 bf0174dc c01f5eb0 c33f0f00
3e60: c0473100 00000000 c33e3e94 c33e3e78 c01f5f44 c01f5d10 00000000 c33e3e98
3e80: bf0174dc c01f5eb0 c33e3ebc c33e3e98 c01f5534 c01f5ec0 c303c038 c3061c30
3ea0: 00003cd8 00098258 bf0174dc c0462ac8 c33e3ecc c33e3ec0 c01f5bec c01f54dc
3ec0: c33e3efc c33e3ed0 c01f4d30 c01f5bdc bf0173a0 c33e2000 00003cd8 00098258
3ee0: bf0174dc c33e2000 c00301a4 bf019000 c33e3f1c c33e3f00 c01f6588 c01f4c8c
3f00: 00003cd8 00098258 00000000 c33e2000 c33e3f2c c33e3f20 c01f777c c01f6524
3f20: c33e3f3c c33e3f30 bf019014 c01f7740 c33e3f7c c33e3f40 c002f3ec bf019010
3f40: 00000000 00003cd8 00098258 bf017518 00000000 00003cd8 00098258 bf017518
3f60: 00000000 c00301a4 c33e2000 00000000 c33e3fa4 c33e3f80 c007b934 c002f3c4
3f80: c00b307c c00b2f48 00003cd8 00000000 00000003 00000080 00000000 c33e3fa8
3fa0: c0030020 c007b8b8 00003cd8 00000000 00098288 00003cd8 00098258 00098240
3fc0: 00003cd8 00000000 00000003 00000080 00098008 00098028 00098288 00000001
3fe0: be892998 be892988 00013d7c 40178740 60000010 00098288 09089041 00200845
Backtrace:
[<c0080ba4>] (irq_gc_mask_set_bit+0x0/0x7c) from [<c007f5f8>] (irq_enable+0x48/0x58)
 r6:c33d4340 r5:c04555bc r4:c04555bc
[<c007f5b0>] (irq_enable+0x0/0x58) from [<c007f654>] (irq_startup+0x4c/0x54)
 r5:c04555bc r4:00000000
[<c007f608>] (irq_startup+0x0/0x54) from [<c007e6e8>] (__setup_irq+0x2f0/0x3cc)
 r5:c04555bc r4:00000000
[<c007e3f8>] (__setup_irq+0x0/0x3cc) from [<c007e888>] (request_threaded_irq+0xc4/0x110)
 r8:0000002d r7:c33dc008 r6:00000000 r5:00000001 r4:bf016c98
[<c007e7c4>] (request_threaded_irq+0x0/0x110) from [<bf016b44>] (mcu_spi_probe+0x228/0x37c [mcu])
[<bf01691c>] (mcu_spi_probe+0x0/0x37c [mcu]) from [<c022a698>] (spi_drv_probe+0x20/0x24)
[<c022a678>] (spi_drv_probe+0x0/0x24) from [<c01f5d88>] (driver_probe_device+0x88/0x1b0)
[<c01f5d00>] (driver_probe_device+0x0/0x1b0) from [<c01f5f8c>] (__device_attach+0x44/0x48)
[<c01f5f48>] (__device_attach+0x0/0x48) from [<c01f5210>] (bus_for_each_drv+0x68/0x94)
 r5:c33e3cf0 r4:00000000
[<c01f51a8>] (bus_for_each_drv+0x0/0x94) from [<c01f6044>] (device_attach+0x88/0xa0)
 r7:c33e3dd8 r6:c25caa34 r5:c25caa00 r4:c25caa00
[<c01f5fbc>] (device_attach+0x0/0xa0) from [<c01f4ff4>] (bus_probe_device+0x2c/0x4c)
 r7:c33e3dd8 r6:c044e918 r5:c25caa00 r4:c0496d3c
[<c01f4fc8>] (bus_probe_device+0x0/0x4c) from [<c01f3d10>] (device_add+0x4d4/0x648)
[<c01f383c>] (device_add+0x0/0x648) from [<c022b1a0>] (spi_add_device+0xc8/0x128)
[<c022b0d8>] (spi_add_device+0x0/0x128) from [<c022b27c>] (spi_new_device+0x7c/0xb4)
 r7:c33e3dd8 r6:00000000 r5:c33e3dd8 r4:c25caa00
[<c022b200>] (spi_new_device+0x0/0xb4) from [<bf016f60>] (mcu_probe+0x144/0x224 [mcu])
 r7:c33e3dd8 r6:c0451c80 r5:bf01763c r4:00000000
[<bf016e1c>] (mcu_probe+0x0/0x224 [mcu]) from [<c01f72f4>] (platform_drv_probe+0x20/0x24)
[<c01f72d4>] (platform_drv_probe+0x0/0x24) from [<c01f5d88>] (driver_probe_device+0x88/0x1b0)
[<c01f5d00>] (driver_probe_device+0x0/0x1b0) from [<c01f5f44>] (__driver_attach+0x94/0x98)
[<c01f5eb0>] (__driver_attach+0x0/0x98) from [<c01f5534>] (bus_for_each_dev+0x68/0x94)
 r7:c01f5eb0 r6:bf0174dc r5:c33e3e98 r4:00000000
[<c01f54cc>] (bus_for_each_dev+0x0/0x94) from [<c01f5bec>] (driver_attach+0x20/0x28)
 r7:c0462ac8 r6:bf0174dc r5:00098258 r4:00003cd8
[<c01f5bcc>] (driver_attach+0x0/0x28) from [<c01f4d30>] (bus_add_driver+0xb4/0x258)
[<c01f4c7c>] (bus_add_driver+0x0/0x258) from [<c01f6588>] (driver_register+0x74/0x158)
[<c01f6514>] (driver_register+0x0/0x158) from [<c01f777c>] (platform_driver_register+0x4c/0x60)
 r7:c33e2000 r6:00000000 r5:00098258 r4:00003cd8
[<c01f7730>] (platform_driver_register+0x0/0x60) from [<bf019014>] (mcu_init+0x14/0x20 [mcu])
[<bf019000>] (mcu_init+0x0/0x20 [mcu]) from [<c002f3ec>] (do_one_initcall+0x38/0x170)
[<c002f3b4>] (do_one_initcall+0x0/0x170) from [<c007b934>] (sys_init_module+0x8c/0x1a4)
[<c007b8a8>] (sys_init_module+0x0/0x1a4) from [<c0030020>] (ret_fast_syscall+0x0/0x2c)
 r7:00000080 r6:00000003 r5:00000000 r4:00003cd8
Code: e1844003 e585400c e596300c e5932064 (e7814002)

Fix the issue.

Reported-by: Jon Povey <Jon.Povey@racelogic.co.uk>
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ar0n pushed a commit that referenced this pull request Nov 5, 2012
commit 33b69bf upstream.

Do not close protocol driver until device has been unregistered.

This fixes a race between tty_close and hci_dev_open which can result in
a NULL-pointer dereference.

The line discipline closes the protocol driver while we may still have
hci_dev_open sleeping on the req_lock mutex resulting in a NULL-pointer
dereference when lock is acquired and hci_init_req called.

Bug is 100% reproducible using hciattach and a disconnected serial port:

0. # hciattach -n ttyO1 any noflow

1. hci_dev_open called from hci_power_on grabs req lock
2. hci_init_req executes but device fails to initialise (times out
   eventually)
3. hci_dev_open is called from hci_sock_ioctl and sleeps on req lock
4. hci_uart_tty_close detaches protocol driver and cancels init req
5. hci_dev_open (1) releases req lock
6. hci_dev_open (3) grabs req lock, calls hci_init_req, which triggers oops
   when request is prepared in hci_uart_send_frame

[  137.201263] Unable to handle kernel NULL pointer dereference at virtual address 00000028
[  137.209838] pgd = c0004000
[  137.212677] [00000028] *pgd=00000000
[  137.216430] Internal error: Oops: 17 [#1]
[  137.220642] Modules linked in:
[  137.223846] CPU: 0    Tainted: G        W     (3.3.0-rc6-dirty raspberrypi#406)
[  137.230529] PC is at __lock_acquire+0x5c/0x1ab0
[  137.235290] LR is at lock_acquire+0x9c/0x128
[  137.239776] pc : [<c0071490>]    lr : [<c00733f8>]    psr: 20000093
[  137.239776] sp : cf869dd8  ip : c0529554  fp : c051c730
[  137.251800] r10: 00000000  r9 : cf8673c0  r8 : 00000080
[  137.257293] r7 : 00000028  r6 : 00000002  r5 : 00000000  r4 : c053fd70
[  137.264129] r3 : 00000000  r2 : 00000000  r1 : 00000000  r0 : 00000001
[  137.270965] Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
[  137.278717] Control: 10c5387d  Table: 8f0f4019  DAC: 00000015
[  137.284729] Process kworker/u:1 (pid: 7, stack limit = 0xcf8682e8)
[  137.291229] Stack: (0xcf869dd8 to 0xcf86a000)
[  137.295776] 9dc0:                                                       c0529554 00000000
[  137.304351] 9de0: cf8673c0 cf868000 d03ea1ef cf868000 000001ef 00000470 00000000 00000002
[  137.312927] 9e00: cf8673c0 00000001 c051c730 c00716ec 0000000c 00000440 c0529554 00000001
[  137.321533] 9e20: c051c730 cf868000 d03ea1f3 00000000 c053b978 00000000 00000028 cf868000
[  137.330078] 9e40: 00000000 00000000 00000002 00000000 00000000 c00733f8 00000002 00000080
[  137.338684] 9e60: 00000000 c02a1d50 00000000 00000001 60000013 c0969a1c 60000093 c053b96c
[  137.347259] 9e80: 00000002 00000018 20000013 c02a1d50 cf0ac000 00000000 00000002 cf868000
[  137.355834] 9ea0: 00000089 c0374130 00000002 00000000 c02a1d50 cf0ac000 0000000c cf0fc540
[  137.364410] 9ec0: 00000018 c02a1d50 cf0fc540 00000000 cf0fc540 c0282238 c028220c cf178d80
[  137.372985] 9ee0: 127525d8 c02821cc 9a1fa451 c032727c 9a1fa451 127525d8 cf0fc540 cf0ac4ec
[  137.381561] 9f00: cf0ac000 cf0fc540 cf0ac584 c03285f4 c0328580 cf0ac4ec cf85c740 c05510cc
[  137.390136] 9f20: ce825400 c004c914 00000002 00000000 c004c884 ce8254f5 cf869f48 00000000
[  137.398712] 9f40: c0328580 ce825415 c0a7f914 c061af64 00000000 c048cf3c cf8673c0 cf85c740
[  137.407287] 9f60: c05510cc c051a66c c05510ec c05510c4 cf85c750 cf868000 00000089 c004d6ac
[  137.415863] 9f80: 00000000 c0073d14 00000001 cf853ed8 cf85c740 c004d558 00000013 00000000
[  137.424438] 9fa0: 00000000 00000000 00000000 c00516b0 00000000 00000000 cf85c740 00000000
[  137.433013] 9fc0: 00000001 dead4ead ffffffff ffffffff c0551674 00000000 00000000 c0450aa4
[  137.441589] 9fe0: cf869fe0 cf869fe0 cf853ed8 c005162c c0013b30 c0013b30 00ffff00 00ffff00
[  137.450164] [<c0071490>] (__lock_acquire+0x5c/0x1ab0) from [<c00733f8>] (lock_acquire+0x9c/0x128)
[  137.459503] [<c00733f8>] (lock_acquire+0x9c/0x128) from [<c0374130>] (_raw_spin_lock_irqsave+0x44/0x58)
[  137.469360] [<c0374130>] (_raw_spin_lock_irqsave+0x44/0x58) from [<c02a1d50>] (skb_queue_tail+0x18/0x48)
[  137.479339] [<c02a1d50>] (skb_queue_tail+0x18/0x48) from [<c0282238>] (h4_enqueue+0x2c/0x34)
[  137.488189] [<c0282238>] (h4_enqueue+0x2c/0x34) from [<c02821cc>] (hci_uart_send_frame+0x34/0x68)
[  137.497497] [<c02821cc>] (hci_uart_send_frame+0x34/0x68) from [<c032727c>] (hci_send_frame+0x50/0x88)
[  137.507171] [<c032727c>] (hci_send_frame+0x50/0x88) from [<c03285f4>] (hci_cmd_work+0x74/0xd4)
[  137.516204] [<c03285f4>] (hci_cmd_work+0x74/0xd4) from [<c004c914>] (process_one_work+0x1a0/0x4ec)
[  137.525604] [<c004c914>] (process_one_work+0x1a0/0x4ec) from [<c004d6ac>] (worker_thread+0x154/0x344)
[  137.535278] [<c004d6ac>] (worker_thread+0x154/0x344) from [<c00516b0>] (kthread+0x84/0x90)
[  137.543975] [<c00516b0>] (kthread+0x84/0x90) from [<c0013b30>] (kernel_thread_exit+0x0/0x8)
[  137.552734] Code: e59f4e5c e5941000 e3510000 0a000031 (e5971000)
[  137.559234] ---[ end trace 1b75b31a2719ed1e ]---

Signed-off-by: Johan Hovold <jhovold@gmail.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ar0n pushed a commit that referenced this pull request Nov 5, 2012
commit 62d2feb upstream.

Fix crash after issuing:
	echo hmc5843 0x1e > /sys/class/i2c-dev/i2c-2/device/new_device

	[   37.180999] device: '2-001e': device_add
	[   37.188293] bus: 'i2c': add device 2-001e
	[   37.194549] PM: Adding info for i2c:2-001e
	[   37.200958] bus: 'i2c': driver_probe_device: matched device 2-001e with driver hmc5843
	[   37.210815] bus: 'i2c': really_probe: probing driver hmc5843 with device 2-001e
	[   37.224884] HMC5843 initialized
	[   37.228759] ------------[ cut here ]------------
	[   37.233612] kernel BUG at mm/slab.c:505!
	[   37.237701] Internal error: Oops - BUG: 0 [#1] PREEMPT
	[   37.243103] Modules linked in:
	[   37.246337] CPU: 0    Not tainted  (3.3.1-gta04+ raspberrypi#28)
	[   37.251647] PC is at kfree+0x84/0x144
	[   37.255493] LR is at kfree+0x20/0x144
	[   37.259338] pc : [<c00b408c>]    lr : [<c00b4028>]    psr: 40000093
	[   37.259368] sp : de249cd8  ip : 0000000c  fp : 00000090
	[   37.271362] r10: 0000000a  r9 : de229eac  r8 : c0236274
	[   37.276855] r7 : c09d6490  r6 : a0000013  r5 : de229c00  r4 : de229c10
	[   37.283691] r3 : c0f00218  r2 : 00000400  r1 : c0eea000  r0 : c00b4028
	[   37.290527] Flags: nZcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
	[   37.298095] Control: 10c5387d  Table: 9e1d0019  DAC: 00000015
	[   37.304107] Process sh (pid: 91, stack limit = 0xde2482f0)
	[   37.309844] Stack: (0xde249cd8 to 0xde24a000)
	[   37.314422] 9cc0:                                                       de229c10 de229c00
	[   37.322998] 9ce0: de229c10 ffffffea 00000005 c0236274 de140a80 c00b4798 dec00080 de140a80
	[   37.331573] 9d00: c032f37c dec00080 000080d0 00000001 de229c00 de229c10 c048d578 00000005
	[   37.340148] 9d20: de229eac 0000000a 00000090 c032fa40 00000001 00000000 00000001 de229c10
	[   37.348724] 9d40: de229eac 00000029 c075b558 00000001 00000003 00000004 de229c10 c048d594
	[   37.357299] 9d60: 00000000 60000013 00000018 205b0007 37332020 3432322e 5d343838 c0060020
	[   37.365905] 9d80: de251600 00000001 00000000 de251600 00000001 c0065a84 de229c00 de229c48
	[   37.374481] 9da0: 00000006 0048d62c de229c38 de229c00 de229c00 de1f6c00 de1f6c20 00000001
	[   37.383056] 9dc0: 00000000 c048d62c 00000000 de229c00 de229c00 de1f6c00 de1f6c20 00000001
	[   37.391632] 9de0: 00000000 c048d62c 00000000 c0330164 00000000 de1f6c20 c048d62c de1f6c00
	[   37.400207] 9e00: c0330078 de1f6c04 c078d714 de189b58 00000000 c02ccfd8 de1f6c20 c0795f40
	[   37.408782] 9e20: c0238330 00000000 00000000 c02381a8 de1b9fc0 de1f6c20 de1f6c20 de249e48
	[   37.417358] 9e40: c0238330 c0236bb0 decdbed8 de7d0f14 de1f6c20 de1f6c20 de1f6c54 de1f6c20
	[   37.425933] 9e60: 00000000 c0238030 de1f6c20 c078d7bc de1f6c20 c02377ec de1f6c20 de1f6c28
	[   37.434509] 9e80: dee64cb0 c0236138 c047c554 de189b58 00000000 c004b45c de1f6c20 de1f6cd8
	[   37.443084] 9ea0: c0edfa6c de1f6c00 dee64c68 de1f6c04 de1f6c20 dee64cb8 c047c554 de189b58
	[   37.451690] 9ec0: 00000000 c02cd634 dee64c68 de249ef4 de23b008 dee64cb0 0000000d de23b000
	[   37.460266] 9ee0: de23b007 c02cd78c 00000002 00000000 00000000 35636d68 00333438 00000000
	[   37.468841] 9f00: 00000000 00000000 001e0000 00000000 00000000 00000000 00000000 0a10cec0
	[   37.477416] 9f20: 00000002 de249f80 0000000d dee62990 de189b40 c0234d88 0000000d c010c354
	[   37.485992] 9f40: 0000000d de210f28 000acc88 de249f80 0000000d de248000 00000000 c00b7bf8
	[   37.494567] 9f60: de210f28 000acc88 de210f28 000acc88 00000000 00000000 0000000d c00b7ed8
	[   37.503143] 9f80: 00000000 00000000 0000000d 00000000 0007fa28 0000000d 000acc88 00000004
	[   37.511718] 9fa0: c000e544 c000e380 0007fa28 0000000d 00000001 000acc88 0000000d 00000000
	[   37.520294] 9fc0: 0007fa28 0000000d 000acc88 00000004 00000001 00000020 00000002 00000000
	[   37.528869] 9fe0: 00000000 beab8624 0000ea05 b6eaebac 600d0010 00000001 00000000 00000000
	[   37.537475] [<c00b408c>] (kfree+0x84/0x144) from [<c0236274>] (device_add+0x530/0x57c)
	[   37.545806] [<c0236274>] (device_add+0x530/0x57c) from [<c032fa40>] (iio_device_register+0x8c8/0x990)
	[   37.555480] [<c032fa40>] (iio_device_register+0x8c8/0x990) from [<c0330164>] (hmc5843_probe+0xec/0x114)
	[   37.565338] [<c0330164>] (hmc5843_probe+0xec/0x114) from [<c02ccfd8>] (i2c_device_probe+0xc4/0xf8)
	[   37.574737] [<c02ccfd8>] (i2c_device_probe+0xc4/0xf8) from [<c02381a8>] (driver_probe_device+0x118/0x218)
	[   37.584777] [<c02381a8>] (driver_probe_device+0x118/0x218) from [<c0236bb0>] (bus_for_each_drv+0x4c/0x84)
	[   37.594818] [<c0236bb0>] (bus_for_each_drv+0x4c/0x84) from [<c0238030>] (device_attach+0x78/0xa4)
	[   37.604125] [<c0238030>] (device_attach+0x78/0xa4) from [<c02377ec>] (bus_probe_device+0x28/0x9c)
	[   37.613433] [<c02377ec>] (bus_probe_device+0x28/0x9c) from [<c0236138>] (device_add+0x3f4/0x57c)
	[   37.622650] [<c0236138>] (device_add+0x3f4/0x57c) from [<c02cd634>] (i2c_new_device+0xf8/0x19c)
	[   37.631805] [<c02cd634>] (i2c_new_device+0xf8/0x19c) from [<c02cd78c>] (i2c_sysfs_new_device+0xb4/0x130)
	[   37.641754] [<c02cd78c>] (i2c_sysfs_new_device+0xb4/0x130) from [<c0234d88>] (dev_attr_store+0x18/0x24)
	[   37.651611] [<c0234d88>] (dev_attr_store+0x18/0x24) from [<c010c354>] (sysfs_write_file+0x10c/0x140)
	[   37.661193] [<c010c354>] (sysfs_write_file+0x10c/0x140) from [<c00b7bf8>] (vfs_write+0xb0/0x178)
	[   37.670410] [<c00b7bf8>] (vfs_write+0xb0/0x178) from [<c00b7ed8>] (sys_write+0x3c/0x68)
	[   37.678833] [<c00b7ed8>] (sys_write+0x3c/0x68) from [<c000e380>] (ret_fast_syscall+0x0/0x3c)
	[   37.687683] Code: 1593301c e5932000 e3120080 1a000000 (e7f001f2)
	[   37.700775] ---[ end trace aaf805debdb69390 ]---

Client data was assigned to iio_dev structure in probe but in
hmc5843_init_client function casted to private driver data structure which
is wrong. Possibly calling mutex_init(&data->lock); corrupt data
which the lead to above crash.

Signed-off-by: Marek Belisko <marek.belisko@open-nandra.com>
Acked-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ar0n pushed a commit that referenced this pull request Nov 5, 2012
commit b78f29c upstream.

This patch fix the oops below that catched in my machine

[   81.560602] uvesafb: NVIDIA Corporation, GT216 Board - 0696a290, Chip Rev   , OEM: NVIDIA, VBE v3.0
[   81.609384] uvesafb: protected mode interface info at c000:d350
[   81.609388] uvesafb: pmi: set display start = c00cd3b3, set palette = c00cd40e
[   81.609390] uvesafb: pmi: ports = 3b4 3b5 3ba 3c0 3c1 3c4 3c5 3c6 3c7 3c8 3c9 3cc 3ce 3cf 3d0 3d1 3d2 3d3 3d4 3d5 3da
[   81.614558] uvesafb: VBIOS/hardware doesn't support DDC transfers
[   81.614562] uvesafb: no monitor limits have been set, default refresh rate will be used
[   81.614994] uvesafb: scrolling: ypan using protected mode interface, yres_virtual=4915
[   81.744147] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
[   81.744153] BUG: unable to handle kernel paging request at c00cd3b3
[   81.744159] IP: [<c00cd3b3>] 0xc00cd3b2
[   81.744167] *pdpt = 00000000016d6001 *pde = 0000000001c7b067 *pte = 80000000000cd163
[   81.744171] Oops: 0011 [#1] SMP
[   81.744174] Modules linked in: uvesafb(+) cfbcopyarea cfbimgblt cfbfillrect
[   81.744178]
[   81.744181] Pid: 3497, comm: modprobe Not tainted 3.3.0-rc4NX+ raspberrypi#71 Acer            Aspire 4741                    /Aspire 4741
[   81.744185] EIP: 0060:[<c00cd3b3>] EFLAGS: 00010246 CPU: 0
[   81.744187] EIP is at 0xc00cd3b3
[   81.744189] EAX: 00004f07 EBX: 00000000 ECX: 00000000 EDX: 00000000
[   81.744191] ESI: f763f000 EDI: f763f6e8 EBP: f57f3a0c ESP: f57f3a00
[   81.744192]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[   81.744195] Process modprobe (pid: 3497, ti=f57f2000 task=f748c600 task.ti=f57f2000)
[   81.744196] Stack:
[   81.744197]  f82512c5 f759341c 00000000 f57f3a30 c124a9bc 00000001 00000001 000001e0
[   81.744202]  f8251280 f763f000 f7593400 00000000 f57f3a40 c12598dd f5c0c000 00000000
[   81.744206]  f57f3b10 c1255efe c125a21a 00000006 f763f09c 00000000 c1c6cb60 f7593400
[   81.744210] Call Trace:
[   81.744215]  [<f82512c5>] ? uvesafb_pan_display+0x45/0x60 [uvesafb]
[   81.744222]  [<c124a9bc>] fb_pan_display+0x10c/0x160
[   81.744226]  [<f8251280>] ? uvesafb_vbe_find_mode+0x180/0x180 [uvesafb]
[   81.744230]  [<c12598dd>] bit_update_start+0x1d/0x50
[   81.744232]  [<c1255efe>] fbcon_switch+0x39e/0x550
[   81.744235]  [<c125a21a>] ? bit_cursor+0x4ea/0x560
[   81.744240]  [<c129b6cb>] redraw_screen+0x12b/0x220
[   81.744245]  [<c128843b>] ? tty_do_resize+0x3b/0xc0
[   81.744247]  [<c129ef42>] vc_do_resize+0x3d2/0x3e0
[   81.744250]  [<c129efb4>] vc_resize+0x14/0x20
[   81.744253]  [<c12586bd>] fbcon_init+0x29d/0x500
[   81.744255]  [<c12984c4>] ? set_inverse_trans_unicode+0xe4/0x110
[   81.744258]  [<c129b378>] visual_init+0xb8/0x150
[   81.744261]  [<c129c16c>] bind_con_driver+0x16c/0x360
[   81.744264]  [<c129b47e>] ? register_con_driver+0x6e/0x190
[   81.744267]  [<c129c3a1>] take_over_console+0x41/0x50
[   81.744269]  [<c1257b7a>] fbcon_takeover+0x6a/0xd0
[   81.744272]  [<c12594b8>] fbcon_event_notify+0x758/0x790
[   81.744277]  [<c10929e2>] notifier_call_chain+0x42/0xb0
[   81.744280]  [<c1092d30>] __blocking_notifier_call_chain+0x60/0x90
[   81.744283]  [<c1092d7a>] blocking_notifier_call_chain+0x1a/0x20
[   81.744285]  [<c124a5a1>] fb_notifier_call_chain+0x11/0x20
[   81.744288]  [<c124b759>] register_framebuffer+0x1d9/0x2b0
[   81.744293]  [<c1061c73>] ? ioremap_wc+0x33/0x40
[   81.744298]  [<f82537c6>] uvesafb_probe+0xaba/0xc40 [uvesafb]
[   81.744302]  [<c12bb81f>] platform_drv_probe+0xf/0x20
[   81.744306]  [<c12ba558>] driver_probe_device+0x68/0x170
[   81.744309]  [<c12ba731>] __device_attach+0x41/0x50
[   81.744313]  [<c12b9088>] bus_for_each_drv+0x48/0x70
[   81.744316]  [<c12ba7f3>] device_attach+0x83/0xa0
[   81.744319]  [<c12ba6f0>] ? __driver_attach+0x90/0x90
[   81.744321]  [<c12b991f>] bus_probe_device+0x6f/0x90
[   81.744324]  [<c12b8a45>] device_add+0x5e5/0x680
[   81.744329]  [<c122a1a3>] ? kvasprintf+0x43/0x60
[   81.744332]  [<c121e6e4>] ? kobject_set_name_vargs+0x64/0x70
[   81.744335]  [<c121e6e4>] ? kobject_set_name_vargs+0x64/0x70
[   81.744339]  [<c12bbe9f>] platform_device_add+0xff/0x1b0
[   81.744343]  [<f8252906>] uvesafb_init+0x50/0x9b [uvesafb]
[   81.744346]  [<c100111f>] do_one_initcall+0x2f/0x170
[   81.744350]  [<f82528b6>] ? uvesafb_is_valid_mode+0x66/0x66 [uvesafb]
[   81.744355]  [<c10c6994>] sys_init_module+0xf4/0x1410
[   81.744359]  [<c1157fc0>] ? vfsmount_lock_local_unlock_cpu+0x30/0x30
[   81.744363]  [<c144cb10>] sysenter_do_call+0x12/0x36
[   81.744365] Code: f5 00 00 00 32 f6 66 8b da 66 d1 e3 66 ba d4 03 8a e3 b0 1c 66 ef b0 1e 66 ef 8a e7 b0 1d 66 ef b0 1f 66 ef e8 fa 00 00 00 61 c3 <60> e8 c8 00 00 00 66 8b f3 66 8b da 66 ba d4 03 b0 0c 8a e5 66
[   81.744388] EIP: [<c00cd3b3>] 0xc00cd3b3 SS:ESP 0068:f57f3a00
[   81.744391] CR2: 00000000c00cd3b3
[   81.744393] ---[ end trace 18b2c87c925b54d6 ]---

Signed-off-by: Wang YanQing <udknight@gmail.com>
Cc: Michal Januszewski <spock@gentoo.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ar0n pushed a commit that referenced this pull request Nov 5, 2012
commit a65a6f1 upstream.

Fix race between probe and open by making sure that the disconnected
flag is not cleared until all ports have been registered.

A call to tty_open while probe is running may get a reference to the
serial structure in serial_install before its ports have been
registered. This may lead to usb_serial_core calling driver open before
port is fully initialised.

With ftdi_sio this result in the following NULL-pointer dereference as
the private data has not been initialised at open:

[  199.698286] IP: [<f811a089>] ftdi_open+0x59/0xe0 [ftdi_sio]
[  199.698297] *pde = 00000000
[  199.698303] Oops: 0000 [#1] PREEMPT SMP
[  199.698313] Modules linked in: ftdi_sio usbserial
[  199.698323]
[  199.698327] Pid: 1146, comm: ftdi_open Not tainted 3.2.11 raspberrypi#70 Dell Inc. Vostro 1520/0T816J
[  199.698339] EIP: 0060:[<f811a089>] EFLAGS: 00010286 CPU: 0
[  199.698344] EIP is at ftdi_open+0x59/0xe0 [ftdi_sio]
[  199.698348] EAX: 0000003e EBX: f5067000 ECX: 00000000 EDX: 80000600
[  199.698352] ESI: f48d8800 EDI: 00000001 EBP: f515dd54 ESP: f515dcfc
[  199.698356]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[  199.698361] Process ftdi_open (pid: 1146, ti=f515c000 task=f481e040 task.ti=f515c000)
[  199.698364] Stack:
[  199.698368]  f811a9fe f811a9e0 f811b3ef 00000000 00000000 00001388 00000000 f4a86800
[  199.698387]  00000002 00000000 f806e68e 00000000 f532765c f481e040 00000246 22222222
[  199.698479]  22222222 22222222 22222222 f5067004 f5327600 f5327638 f515dd74 f806e6ab
[  199.698496] Call Trace:
[  199.698504]  [<f806e68e>] ? serial_activate+0x2e/0x70 [usbserial]
[  199.698511]  [<f806e6ab>] serial_activate+0x4b/0x70 [usbserial]
[  199.698521]  [<c126380c>] tty_port_open+0x7c/0xd0
[  199.698527]  [<f806e660>] ? serial_set_termios+0xa0/0xa0 [usbserial]
[  199.698534]  [<f806e76f>] serial_open+0x2f/0x70 [usbserial]
[  199.698540]  [<c125d07c>] tty_open+0x20c/0x510
[  199.698546]  [<c10e9eb7>] chrdev_open+0xe7/0x230
[  199.698553]  [<c10e48f2>] __dentry_open+0x1f2/0x390
[  199.698559]  [<c144bfec>] ? _raw_spin_unlock+0x2c/0x50
[  199.698565]  [<c10e4b76>] nameidata_to_filp+0x66/0x80
[  199.698570]  [<c10e9dd0>] ? cdev_put+0x20/0x20
[  199.698576]  [<c10f3e08>] do_last+0x198/0x730
[  199.698581]  [<c10f4440>] path_openat+0xa0/0x350
[  199.698587]  [<c10f47d5>] do_filp_open+0x35/0x80
[  199.698593]  [<c144bfec>] ? _raw_spin_unlock+0x2c/0x50
[  199.698599]  [<c10ff110>] ? alloc_fd+0xc0/0x100
[  199.698605]  [<c10f0b72>] ? getname_flags+0x72/0x120
[  199.698611]  [<c10e4450>] do_sys_open+0xf0/0x1c0
[  199.698617]  [<c11fcc08>] ? trace_hardirqs_on_thunk+0xc/0x10
[  199.698623]  [<c10e458e>] sys_open+0x2e/0x40
[  199.698628]  [<c144c990>] sysenter_do_call+0x12/0x36
[  199.698632] Code: 85 89 00 00 00 8b 16 8b 4d c0 c1 e2 08 c7 44 24 14 88 13 00 00 81 ca 00 00 00 80 c7 44 24 10 00 00 00 00 c7 44 24 0c 00 00 00 00 <0f> b7 41 78 31 c9 89 44 24 08 c7 44 24 04 00 00 00 00 c7 04 24
[  199.698884] EIP: [<f811a089>] ftdi_open+0x59/0xe0 [ftdi_sio] SS:ESP 0068:f515dcfc
[  199.698893] CR2: 0000000000000078
[  199.698925] ---[ end trace 77c43ec023940cff ]---

Reported-and-tested-by: Ken Huang <csuhgw@gmail.com>
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ar0n pushed a commit that referenced this pull request Nov 5, 2012
commit 9432496 upstream.

Make sure hci_dev_open returns immediately if hci_dev_unregister has
been called.

This fixes a race between hci_dev_open and hci_dev_unregister which can
lead to a NULL-pointer dereference.

Bug is 100% reproducible using hciattach and a disconnected serial port:

0. # hciattach -n /dev/ttyO1 any noflow

1. hci_dev_open called from hci_power_on grabs req lock
2. hci_init_req executes but device fails to initialise (times out
   eventually)
3. hci_dev_open is called from hci_sock_ioctl and sleeps on req lock
4. hci_uart_tty_close calls hci_dev_unregister and sleeps on req lock in
   hci_dev_do_close
5. hci_dev_open (1) releases req lock
6. hci_dev_do_close grabs req lock and returns as device is not up
7. hci_dev_unregister sleeps in destroy_workqueue
8. hci_dev_open (3) grabs req lock, calls hci_init_req and eventually sleeps
9. hci_dev_unregister finishes, while hci_dev_open is still running...

[   79.627136] INFO: trying to register non-static key.
[   79.632354] the code is fine but needs lockdep annotation.
[   79.638122] turning off the locking correctness validator.
[   79.643920] [<c00188bc>] (unwind_backtrace+0x0/0xf8) from [<c00729c4>] (__lock_acquire+0x1590/0x1ab0)
[   79.653594] [<c00729c4>] (__lock_acquire+0x1590/0x1ab0) from [<c00733f8>] (lock_acquire+0x9c/0x128)
[   79.663085] [<c00733f8>] (lock_acquire+0x9c/0x128) from [<c0040a88>] (run_timer_softirq+0x150/0x3ac)
[   79.672668] [<c0040a88>] (run_timer_softirq+0x150/0x3ac) from [<c003a3b8>] (__do_softirq+0xd4/0x22c)
[   79.682281] [<c003a3b8>] (__do_softirq+0xd4/0x22c) from [<c003a924>] (irq_exit+0x8c/0x94)
[   79.690856] [<c003a924>] (irq_exit+0x8c/0x94) from [<c0013a50>] (handle_IRQ+0x34/0x84)
[   79.699157] [<c0013a50>] (handle_IRQ+0x34/0x84) from [<c0008530>] (omap3_intc_handle_irq+0x48/0x4c)
[   79.708648] [<c0008530>] (omap3_intc_handle_irq+0x48/0x4c) from [<c037499c>] (__irq_usr+0x3c/0x60)
[   79.718048] Exception stack(0xcf281fb0 to 0xcf281ff8)
[   79.723358] 1fa0:                                     0001e6a0 be8dab00 0001e698 00036698
[   79.731933] 1fc0: 0002df98 0002df38 0000001f 00000000 b6f234d0 00000000 00000004 00000000
[   79.740509] 1fe0: 0001e6f8 be8d6aa0 be8dac50 0000aab8 80000010 ffffffff
[   79.747497] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[   79.756011] pgd = cf3b4000
[   79.758850] [00000000] *pgd=8f0c7831, *pte=00000000, *ppte=00000000
[   79.765502] Internal error: Oops: 80000007 [#1]
[   79.770294] Modules linked in:
[   79.773529] CPU: 0    Tainted: G        W     (3.3.0-rc6-00002-gb5d5c87 raspberrypi#421)
[   79.781066] PC is at 0x0
[   79.783721] LR is at run_timer_softirq+0x16c/0x3ac
[   79.788787] pc : [<00000000>]    lr : [<c0040aa4>]    psr: 60000113
[   79.788787] sp : cf281ee0  ip : 00000000  fp : cf280000
[   79.800903] r10: 00000004  r9 : 00000100  r8 : b6f234d0
[   79.806427] r7 : c0519c28  r6 : cf093488  r5 : c0561a00  r4 : 00000000
[   79.813323] r3 : 00000000  r2 : c054eee0  r1 : 00000001  r0 : 00000000
[   79.820190] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[   79.827728] Control: 10c5387d  Table: 8f3b4019  DAC: 00000015
[   79.833801] Process gpsd (pid: 1265, stack limit = 0xcf2802e8)
[   79.839965] Stack: (0xcf281ee0 to 0xcf282000)
[   79.844573] 1ee0: 00000002 00000000 c0040a24 00000000 00000002 cf281f08 00200200 00000000
[   79.853210] 1f00: 00000000 cf281f18 cf281f08 00000000 00000000 00000000 cf281f18 cf281f18
[   79.861816] 1f20: 00000000 00000001 c056184c 00000000 00000001 b6f234d0 c0561848 00000004
[   79.870452] 1f40: cf280000 c003a3b8 c051e79c 00000001 00000000 00000100 3fa9e7b8 0000000a
[   79.879089] 1f60: 00000025 cf280000 00000025 00000000 00000000 b6f234d0 00000000 00000004
[   79.887756] 1f80: 00000000 c003a924 c053ad38 c0013a50 fa200000 cf281fb0 ffffffff c0008530
[   79.896362] 1fa0: 0001e6a0 0000aab8 80000010 c037499c 0001e6a0 be8dab00 0001e698 00036698
[   79.904998] 1fc0: 0002df98 0002df38 0000001f 00000000 b6f234d0 00000000 00000004 00000000
[   79.913665] 1fe0: 0001e6f8 be8d6aa0 be8dac50 0000aab8 80000010 ffffffff 00fbf700 04ffff00
[   79.922302] [<c0040aa4>] (run_timer_softirq+0x16c/0x3ac) from [<c003a3b8>] (__do_softirq+0xd4/0x22c)
[   79.931945] [<c003a3b8>] (__do_softirq+0xd4/0x22c) from [<c003a924>] (irq_exit+0x8c/0x94)
[   79.940582] [<c003a924>] (irq_exit+0x8c/0x94) from [<c0013a50>] (handle_IRQ+0x34/0x84)
[   79.948913] [<c0013a50>] (handle_IRQ+0x34/0x84) from [<c0008530>] (omap3_intc_handle_irq+0x48/0x4c)
[   79.958404] [<c0008530>] (omap3_intc_handle_irq+0x48/0x4c) from [<c037499c>] (__irq_usr+0x3c/0x60)
[   79.967773] Exception stack(0xcf281fb0 to 0xcf281ff8)
[   79.973083] 1fa0:                                     0001e6a0 be8dab00 0001e698 00036698
[   79.981658] 1fc0: 0002df98 0002df38 0000001f 00000000 b6f234d0 00000000 00000004 00000000
[   79.990234] 1fe0: 0001e6f8 be8d6aa0 be8dac50 0000aab8 80000010 ffffffff
[   79.997161] Code: bad PC value
[   80.000396] ---[ end trace 6f6739840475f9ee ]---
[   80.005279] Kernel panic - not syncing: Fatal exception in interrupt

Signed-off-by: Johan Hovold <jhovold@gmail.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ar0n pushed a commit that referenced this pull request Nov 5, 2012
commit cf405ae upstream.

When we boot on a machine that can hotplug CPUs and we
are using 'dom0_max_vcpus=X' on the Xen hypervisor line
to clip the amount of CPUs available to the initial domain,
we get this:

(XEN) Command line: com1=115200,8n1 dom0_mem=8G noreboot dom0_max_vcpus=8 sync_console mce_verbosity=verbose console=com1,vga loglvl=all guest_loglvl=all
.. snip..
DMI: Intel Corporation S2600CP/S2600CP, BIOS SE5C600.86B.99.99.x032.072520111118 07/25/2011
.. snip.
SMP: Allowing 64 CPUs, 32 hotplug CPUs
installing Xen timer for CPU 7
cpu 7 spinlock event irq 361
NMI watchdog: disabled (cpu7): hardware events not enabled
Brought up 8 CPUs
.. snip..
	[acpi processor finds the CPUs are not initialized and starts calling
	arch_register_cpu, which creates /sys/devices/system/cpu/cpu8/online]
CPU 8 got hotplugged
CPU 9 got hotplugged
CPU 10 got hotplugged
.. snip..
initcall 1_acpi_battery_init_async+0x0/0x1b returned 0 after 406 usecs
calling  erst_init+0x0/0x2bb @ 1

	[and the scheduler sticks newly started tasks on the new CPUs, but
	said CPUs cannot be initialized b/c the hypervisor has limited the
	amount of vCPUS to 8 - as per the dom0_max_vcpus=8 flag.
	The spinlock tries to kick the other CPU, but the structure for that
	is not initialized and we crash.]
BUG: unable to handle kernel paging request at fffffffffffffed8
IP: [<ffffffff81035289>] xen_spin_lock+0x29/0x60
PGD 180d067 PUD 180e067 PMD 0
Oops: 0002 [#1] SMP
CPU 7
Modules linked in:

Pid: 1, comm: swapper/0 Not tainted 3.4.0-rc2upstream-00001-gf5154e8 #1 Intel Corporation S2600CP/S2600CP
RIP: e030:[<ffffffff81035289>]  [<ffffffff81035289>] xen_spin_lock+0x29/0x60
RSP: e02b:ffff8801fb9b3a70  EFLAGS: 00010282

With this patch, we cap the amount of vCPUS that the initial domain
can run, to exactly what dom0_max_vcpus=X has specified.

In the future, if there is a hypercall that will allow a running
domain to expand past its initial set of vCPUS, this patch should
be re-evaluated.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
ar0n pushed a commit that referenced this pull request Nov 5, 2012
commit fb2cf2c upstream.

Under extreme memory used up situations, percpu allocation
might fail. We hit it when system goes to suspend-to-ram,
causing a kworker panic:

 EIP: [<c124411a>] build_sched_domains+0x23a/0xad0
 Kernel panic - not syncing: Fatal exception
 Pid: 3026, comm: kworker/u:3
 3.0.8-137473-gf42fbef #1

 Call Trace:
  [<c18cc4f2>] panic+0x66/0x16c
  [...]
  [<c1244c37>] partition_sched_domains+0x287/0x4b0
  [<c12a77be>] cpuset_update_active_cpus+0x1fe/0x210
  [<c123712d>] cpuset_cpu_inactive+0x1d/0x30
  [...]

With this fix applied build_sched_domains() will return -ENOMEM and
the suspend attempt fails.

Signed-off-by: he, bo <bo.he@intel.com>
Reviewed-by: Zhang, Yanmin <yanmin.zhang@intel.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1335355161.5892.17.camel@hebo
[ So, we fail to deallocate a CPU because we cannot allocate RAM :-/
  I don't like that kind of sad behavior but nevertheless it should
  not crash under high memory load. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2: change filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
ar0n pushed a commit that referenced this pull request Nov 5, 2012
commit b704871 upstream.

coretemp tries to access core_data array beyond bounds on cpu unplug if
core id of the cpu if more than NUM_REAL_CORES-1.

BUG: unable to handle kernel NULL pointer dereference at 000000000000013c
IP: [<ffffffffa00159af>] coretemp_cpu_callback+0x93/0x1ba [coretemp]
PGD 673e5a067 PUD 66e9b3067 PMD 0
Oops: 0000 [#1] SMP
CPU 79
Modules linked in: sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf bnep bluetooth rfkill ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter nf_conntrack_ipv4 nf_defrag_ipv4 ip6_tables xt_state nf_conntrack coretemp crc32c_intel asix tpm_tis pcspkr usbnet iTCO_wdt i2c_i801 microcode mii joydev tpm i2c_core iTCO_vendor_support tpm_bios i7core_edac igb ioatdma edac_core dca megaraid_sas [last unloaded: oprofile]

Pid: 3315, comm: set-cpus Tainted: G        W    3.4.0-rc5+ #2 QCI QSSC-S4R/QSSC-S4R
RIP: 0010:[<ffffffffa00159af>]  [<ffffffffa00159af>] coretemp_cpu_callback+0x93/0x1ba [coretemp]
RSP: 0018:ffff880472fb3d48  EFLAGS: 00010246
RAX: 0000000000000124 RBX: 0000000000000034 RCX: 00000000ffffffff
RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000246
RBP: ffff880472fb3d88 R08: ffff88077fcd36c0 R09: 0000000000000001
R10: ffffffff8184bc48 R11: 0000000000000000 R12: ffff880273095800
R13: 0000000000000013 R14: ffff8802730a1810 R15: 0000000000000000
FS:  00007f694a20f720(0000) GS:ffff88077fcc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000000013c CR3: 000000067209b000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process set-cpus (pid: 3315, threadinfo ffff880472fb2000, task ffff880471fa0000)
Stack:
 ffff880277b4c308 0000000000000003 ffff880472fb3d88 0000000000000005
 0000000000000034 00000000ffffffd1 ffffffff81cadc70 ffff880472fb3e14
 ffff880472fb3dc8 ffffffff8161f48d ffff880471fa0000 0000000000000034
Call Trace:
 [<ffffffff8161f48d>] notifier_call_chain+0x4d/0x70
 [<ffffffff8107f1be>] __raw_notifier_call_chain+0xe/0x10
 [<ffffffff81059d30>] __cpu_notify+0x20/0x40
 [<ffffffff815fa251>] _cpu_down+0x81/0x270
 [<ffffffff815fa477>] cpu_down+0x37/0x50
 [<ffffffff815fd6a3>] store_online+0x63/0xc0
 [<ffffffff813c7078>] dev_attr_store+0x18/0x30
 [<ffffffff811f02cf>] sysfs_write_file+0xef/0x170
 [<ffffffff81180443>] vfs_write+0xb3/0x180
 [<ffffffff8118076a>] sys_write+0x4a/0x90
 [<ffffffff816236a9>] system_call_fastpath+0x16/0x1b
Code: 48 c7 c7 94 60 01 a0 44 0f b7 ac 10 ac 00 00 00 31 c0 e8 41 b7 5f e1 41 83 c5 02 49 63 c5 49 8b 44 c4 10 48 85 c0 74 56 45 31 ff <39> 58 18 75 4e eb 1f 49 63 d7 4c 89 f7 48 89 45 c8 48 6b d2 28
RIP  [<ffffffffa00159af>] coretemp_cpu_callback+0x93/0x1ba [coretemp]
 RSP <ffff880472fb3d48>
CR2: 000000000000013c

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
ar0n pushed a commit that referenced this pull request Nov 5, 2012
commit 44eb65c upstream.

Under some circumstances, a PCI-based driver reports the following OOPs:

Mar 19 08:14:35 kvothe kernel: [ 6584.626011] Oops: 0000 [#1] SMP
--snip--
Mar 19 08:14:35 kvothe kernel: [ 6584.626011] Pid: 19627, comm: rmmod
Not tainted 3.2.9-2.fc16.x86_64 #1 LENOVO 05962RU/05962RU
Mar 19 08:14:35 kvothe kernel: [ 6584.626011] RIP:
0010:[<ffffffffa0418d39>]  [<ffffffffa0418d39>]
rtl92ce_get_desc+0x19/0xd0 [rtl8192ce]
--snip--
Mar 19 08:14:35 kvothe kernel: [ 6584.626011] Process rmmod (pid:
19627, threadinfo ffff880050262000, task ffff8801156d5cc0)
Mar 19 08:14:35 kvothe kernel: [ 6584.626011] Stack:
Mar 19 08:14:35 kvothe kernel: [ 6584.626011]  0000000000000002
ffff8801176c2540 ffff880050263ca8 ffffffffa03348e7
Mar 19 08:14:35 kvothe kernel: [ 6584.626011]  0000000000000282
0000000180150014 ffff880050263fd8 ffff8801176c2810
Mar 19 08:14:35 kvothe kernel: [ 6584.626011]  ffff880050263bc8
ffffffff810550e2 00000000000002c0 ffff8801176c0d40
Mar 19 08:14:35 kvothe kernel: [ 6584.626011] Call Trace:
Mar 19 08:14:35 kvothe kernel: [ 6584.626011]  [<ffffffffa03348e7>]
_rtl_pci_rx_interrupt+0x187/0x650 [rtlwifi]
--snip--
Mar 19 08:14:35 kvothe kernel: [ 6584.626011] Code: ff 09 d0 89 07 48
83 c4 08 5b 5d c3 66 0f 1f 44 00 00 55 48 89 e5 53 48 83 ec 08 66 66
66 66 90 40 84 f6 89 d3 74 13 84 d2 75 57 <8b> 07 48 83 c4 08 5b 5d c1
e8 1f c3 0f 1f 00 84 d2 74 ed 80 fa
Mar 19 08:14:35 kvothe kernel: [ 6584.626011] RIP
[<ffffffffa0418d39>] rtl92ce_get_desc+0x19/0xd0 [rtl8192ce]
Mar 19 08:14:35 kvothe kernel: [ 6584.626011]  RSP <ffff880050263b58>
Mar 19 08:14:35 kvothe kernel: [ 6584.626011] CR2: 00000000000006e0
Mar 19 08:14:35 kvothe kernel: [ 6584.646491] ---[ end trace
8636c766dcfbe0e6 ]---

This oops is due to interrupts not being disabled in this particular path.

Reported-by: Dave Airlie <airlied@gmail.com>
Tested-by: Dave Airlie <airlied@gmail.com>
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
ar0n pushed a commit that referenced this pull request Nov 5, 2012
commit b7e5ffe upstream.

If I try to do "cat /sys/kernel/debug/kernel_page_tables"
I end up with:

BUG: unable to handle kernel paging request at ffffc7fffffff000
IP: [<ffffffff8106aa51>] ptdump_show+0x221/0x480
PGD 0
Oops: 0000 [#1] SMP
CPU 0
.. snip..
RAX: 0000000000000000 RBX: ffffc00000000fff RCX: 0000000000000000
RDX: 0000800000000000 RSI: 0000000000000000 RDI: ffffc7fffffff000

which is due to the fact we are trying to access a PFN that is not
accessible to us. The reason (at least in this case) was that
PGD[256] is set to __HYPERVISOR_VIRT_START which was setup (by the
hypervisor) to point to a read-only linear map of the MFN->PFN array.
During our parsing we would get the MFN (a valid one), try to look
it up in the MFN->PFN tree and find it invalid and return ~0 as PFN.
Then pte_mfn_to_pfn would happilly feed that in, attach the flags
and return it back to the caller. 'ptdump_show' bitshifts it and
gets and invalid value that it tries to dereference.

Instead of doing all of that, we detect the ~0 case and just
return !_PAGE_PRESENT.

This bug has been in existence .. at least until 2.6.37 (yikes!)

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
ar0n pushed a commit that referenced this pull request Nov 5, 2012
commit 226bb7d upstream.

The locking policy is such that the erase_complete_block spinlock is
nested within the alloc_sem mutex.  This fixes a case in which the
acquisition order was erroneously reversed.  This issue was caught by
the following lockdep splat:

   =======================================================
   [ INFO: possible circular locking dependency detected ]
   3.0.5 #1
   -------------------------------------------------------
   jffs2_gcd_mtd6/299 is trying to acquire lock:
    (&c->alloc_sem){+.+.+.}, at: [<c01f7714>] jffs2_garbage_collect_pass+0x314/0x890

   but task is already holding lock:
    (&(&c->erase_completion_lock)->rlock){+.+...}, at: [<c01f7708>] jffs2_garbage_collect_pass+0x308/0x890

   which lock already depends on the new lock.

   the existing dependency chain (in reverse order) is:

   -> #1 (&(&c->erase_completion_lock)->rlock){+.+...}:
          [<c008bec4>] validate_chain+0xe6c/0x10bc
          [<c008c660>] __lock_acquire+0x54c/0xba4
          [<c008d240>] lock_acquire+0xa4/0x114
          [<c046780c>] _raw_spin_lock+0x3c/0x4c
          [<c01f744c>] jffs2_garbage_collect_pass+0x4c/0x890
          [<c01f937c>] jffs2_garbage_collect_thread+0x1b4/0x1cc
          [<c0071a68>] kthread+0x98/0xa0
          [<c000f264>] kernel_thread_exit+0x0/0x8

   -> #0 (&c->alloc_sem){+.+.+.}:
          [<c008ad2c>] print_circular_bug+0x70/0x2c4
          [<c008c08c>] validate_chain+0x1034/0x10bc
          [<c008c660>] __lock_acquire+0x54c/0xba4
          [<c008d240>] lock_acquire+0xa4/0x114
          [<c0466628>] mutex_lock_nested+0x74/0x33c
          [<c01f7714>] jffs2_garbage_collect_pass+0x314/0x890
          [<c01f937c>] jffs2_garbage_collect_thread+0x1b4/0x1cc
          [<c0071a68>] kthread+0x98/0xa0
          [<c000f264>] kernel_thread_exit+0x0/0x8

   other info that might help us debug this:

    Possible unsafe locking scenario:

          CPU0                    CPU1
          ----                    ----
     lock(&(&c->erase_completion_lock)->rlock);
                                  lock(&c->alloc_sem);
                                  lock(&(&c->erase_completion_lock)->rlock);
     lock(&c->alloc_sem);

    *** DEADLOCK ***

   1 lock held by jffs2_gcd_mtd6/299:
    #0:  (&(&c->erase_completion_lock)->rlock){+.+...}, at: [<c01f7708>] jffs2_garbage_collect_pass+0x308/0x890

   stack backtrace:
   [<c00155dc>] (unwind_backtrace+0x0/0x100) from [<c0463dc0>] (dump_stack+0x20/0x24)
   [<c0463dc0>] (dump_stack+0x20/0x24) from [<c008ae84>] (print_circular_bug+0x1c8/0x2c4)
   [<c008ae84>] (print_circular_bug+0x1c8/0x2c4) from [<c008c08c>] (validate_chain+0x1034/0x10bc)
   [<c008c08c>] (validate_chain+0x1034/0x10bc) from [<c008c660>] (__lock_acquire+0x54c/0xba4)
   [<c008c660>] (__lock_acquire+0x54c/0xba4) from [<c008d240>] (lock_acquire+0xa4/0x114)
   [<c008d240>] (lock_acquire+0xa4/0x114) from [<c0466628>] (mutex_lock_nested+0x74/0x33c)
   [<c0466628>] (mutex_lock_nested+0x74/0x33c) from [<c01f7714>] (jffs2_garbage_collect_pass+0x314/0x890)
   [<c01f7714>] (jffs2_garbage_collect_pass+0x314/0x890) from [<c01f937c>] (jffs2_garbage_collect_thread+0x1b4/0x1cc)
   [<c01f937c>] (jffs2_garbage_collect_thread+0x1b4/0x1cc) from [<c0071a68>] (kthread+0x98/0xa0)
   [<c0071a68>] (kthread+0x98/0xa0) from [<c000f264>] (kernel_thread_exit+0x0/0x8)

This was introduce in '81cfc9f jffs2: Fix serious write stall due to erase'.

Signed-off-by: Josh Cartwright <joshc@linux.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
ar0n pushed a commit that referenced this pull request Nov 5, 2012
commit 4523e14 upstream.

hugetlb_reserve_pages() can be used for either normal file-backed
hugetlbfs mappings, or MAP_HUGETLB.  In the MAP_HUGETLB, semi-anonymous
mode, there is not a VMA around.  The new call to resv_map_put() assumed
that there was, and resulted in a NULL pointer dereference:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
  IP: vma_resv_map+0x9/0x30
  PGD 141453067 PUD 1421e1067 PMD 0
  Oops: 0000 [#1] PREEMPT SMP
  ...
  Pid: 14006, comm: trinity-child6 Not tainted 3.4.0+ raspberrypi#36
  RIP: vma_resv_map+0x9/0x30
  ...
  Process trinity-child6 (pid: 14006, threadinfo ffff8801414e0000, task ffff8801414f26b0)
  Call Trace:
    resv_map_put+0xe/0x40
    hugetlb_reserve_pages+0xa6/0x1d0
    hugetlb_file_setup+0x102/0x2c0
    newseg+0x115/0x360
    ipcget+0x1ce/0x310
    sys_shmget+0x5a/0x60
    system_call_fastpath+0x16/0x1b

This was reported by Dave Jones, but was reproducible with the
libhugetlbfs test cases, so shame on me for not running them in the
first place.

With this, the oops is gone, and the output of libhugetlbfs's
run_tests.py is identical to plain 3.4 again.

[ Marked for stable, since this was introduced by commit c50ac05
  ("hugetlb: fix resv_map leak in error path") which was also marked for
  stable ]

Reported-by: Dave Jones <davej@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
ar0n pushed a commit that referenced this pull request Nov 5, 2012
[ Upstream commit e49cc0d ]

We hit a kernel OOPS.

<3>[23898.789643] BUG: sleeping function called from invalid context at
/data/buildbot/workdir/ics/hardware/intel/linux-2.6/arch/x86/mm/fault.c:1103
<3>[23898.862215] in_atomic(): 0, irqs_disabled(): 0, pid: 10526, name:
Thread-6683
<4>[23898.967805] HSU serial 0000:00:05.1: 0000:00:05.2:HSU serial prevented me
to suspend...
<4>[23899.258526] Pid: 10526, comm: Thread-6683 Tainted: G        W
3.0.8-137685-ge7742f9 #1
<4>[23899.357404] HSU serial 0000:00:05.1: 0000:00:05.2:HSU serial prevented me
to suspend...
<4>[23899.904225] Call Trace:
<4>[23899.989209]  [<c1227f50>] ? pgtable_bad+0x130/0x130
<4>[23900.000416]  [<c1238c2a>] __might_sleep+0x10a/0x110
<4>[23900.007357]  [<c1228021>] do_page_fault+0xd1/0x3c0
<4>[23900.013764]  [<c18e9ba9>] ? restore_all+0xf/0xf
<4>[23900.024024]  [<c17c007b>] ? napi_complete+0x8b/0x690
<4>[23900.029297]  [<c1227f50>] ? pgtable_bad+0x130/0x130
<4>[23900.123739]  [<c1227f50>] ? pgtable_bad+0x130/0x130
<4>[23900.128955]  [<c18ea0c3>] error_code+0x5f/0x64
<4>[23900.133466]  [<c1227f50>] ? pgtable_bad+0x130/0x130
<4>[23900.138450]  [<c17f6298>] ? __ip_route_output_key+0x698/0x7c0
<4>[23900.144312]  [<c17f5f8d>] ? __ip_route_output_key+0x38d/0x7c0
<4>[23900.150730]  [<c17f63df>] ip_route_output_flow+0x1f/0x60
<4>[23900.156261]  [<c181de58>] ip4_datagram_connect+0x188/0x2b0
<4>[23900.161960]  [<c18e981f>] ? _raw_spin_unlock_bh+0x1f/0x30
<4>[23900.167834]  [<c18298d6>] inet_dgram_connect+0x36/0x80
<4>[23900.173224]  [<c14f9e88>] ? _copy_from_user+0x48/0x140
<4>[23900.178817]  [<c17ab9da>] sys_connect+0x9a/0xd0
<4>[23900.183538]  [<c132e93c>] ? alloc_file+0xdc/0x240
<4>[23900.189111]  [<c123925d>] ? sub_preempt_count+0x3d/0x50

Function free_fib_info resets nexthop_nh->nh_dev to NULL before releasing
fi. Other cpu might be accessing fi. Fixing it by delaying the releasing.

With the patch, we ran MTBF testing on Android mobile for 12 hours
and didn't trigger the issue.

Thank Eric for very detailed review/checking the issue.

Signed-off-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Signed-off-by: Kun Jiang <kunx.jiang@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
ar0n pushed a commit that referenced this pull request Nov 5, 2012
commit 08f75bf upstream.

usb_ep_ops.disable must clear external copy of the endpoint descriptor,
otherwise musb crashes after loading/unloading several gadget modules
in a row:

Unable to handle kernel paging request at virtual address bf013730
pgd = c0004000
[bf013730] *pgd=8f26d811, *pte=00000000, *ppte=00000000
Internal error: Oops: 7 [#1]
Modules linked in: g_cdc [last unloaded: g_file_storage]
CPU: 0    Not tainted  (3.2.17 raspberrypi#647)
PC is at musb_gadget_enable+0x4c/0x24c
LR is at _raw_spin_lock_irqsave+0x4c/0x58
[<c027c030>] (musb_gadget_enable+0x4c/0x24c) from [<bf01b760>] (gether_connect+0x3c/0x19c [g_cdc])
[<bf01b760>] (gether_connect+0x3c/0x19c [g_cdc]) from [<bf01ba1c>] (ecm_set_alt+0x15c/0x180 [g_cdc])
[<bf01ba1c>] (ecm_set_alt+0x15c/0x180 [g_cdc]) from [<bf01ecd4>] (composite_setup+0x85c/0xac4 [g_cdc])
[<bf01ecd4>] (composite_setup+0x85c/0xac4 [g_cdc]) from [<c027b744>] (musb_g_ep0_irq+0x844/0x924)
[<c027b744>] (musb_g_ep0_irq+0x844/0x924) from [<c027a97c>] (musb_interrupt+0x79c/0x864)
[<c027a97c>] (musb_interrupt+0x79c/0x864) from [<c027aaa8>] (generic_interrupt+0x64/0x7c)
[<c027aaa8>] (generic_interrupt+0x64/0x7c) from [<c00797cc>] (handle_irq_event_percpu+0x28/0x178)
...

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
ar0n pushed a commit that referenced this pull request Nov 5, 2012
commit e5851da upstream.

Remove spinlock as atomic_t can be used instead. Note we use only 16
lower bits, upper bits are changed but we impilcilty cast to u16.

This fix possible deadlock on IBSS mode reproted by lockdep:

=================================
[ INFO: inconsistent lock state ]
3.4.0-wl+ raspberrypi#4 Not tainted
---------------------------------
inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
kworker/u:2/30374 [HC0[0]:SC0[0]:HE1:SE1] takes:
 (&(&intf->seqlock)->rlock){+.?...}, at: [<f9979a20>] rt2x00queue_create_tx_descriptor+0x380/0x490 [rt2x00lib]
{IN-SOFTIRQ-W} state was registered at:
  [<c04978ab>] __lock_acquire+0x47b/0x1050
  [<c0498504>] lock_acquire+0x84/0xf0
  [<c0835733>] _raw_spin_lock+0x33/0x40
  [<f9979a20>] rt2x00queue_create_tx_descriptor+0x380/0x490 [rt2x00lib]
  [<f9979f2a>] rt2x00queue_write_tx_frame+0x1a/0x300 [rt2x00lib]
  [<f997834f>] rt2x00mac_tx+0x7f/0x380 [rt2x00lib]
  [<f98fe363>] __ieee80211_tx+0x1b3/0x300 [mac80211]
  [<f98ffdf5>] ieee80211_tx+0x105/0x130 [mac80211]
  [<f99000dd>] ieee80211_xmit+0xad/0x100 [mac80211]
  [<f9900519>] ieee80211_subif_start_xmit+0x2d9/0x930 [mac80211]
  [<c0782e87>] dev_hard_start_xmit+0x307/0x660
  [<c079bb71>] sch_direct_xmit+0xa1/0x1e0
  [<c0784bb3>] dev_queue_xmit+0x183/0x730
  [<c078c27a>] neigh_resolve_output+0xfa/0x1e0
  [<c07b436a>] ip_finish_output+0x24a/0x460
  [<c07b4897>] ip_output+0xb7/0x100
  [<c07b2d60>] ip_local_out+0x20/0x60
  [<c07e01ff>] igmpv3_sendpack+0x4f/0x60
  [<c07e108f>] igmp_ifc_timer_expire+0x29f/0x330
  [<c04520fc>] run_timer_softirq+0x15c/0x2f0
  [<c0449e3e>] __do_softirq+0xae/0x1e0
irq event stamp: 18380437
hardirqs last  enabled at (18380437): [<c0526027>] __slab_alloc.clone.3+0x67/0x5f0
hardirqs last disabled at (18380436): [<c0525ff3>] __slab_alloc.clone.3+0x33/0x5f0
softirqs last  enabled at (18377616): [<c0449eb3>] __do_softirq+0x123/0x1e0
softirqs last disabled at (18377611): [<c041278d>] do_softirq+0x9d/0xe0

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&intf->seqlock)->rlock);
  <Interrupt>
    lock(&(&intf->seqlock)->rlock);

 *** DEADLOCK ***

4 locks held by kworker/u:2/30374:
 #0:  (wiphy_name(local->hw.wiphy)){++++.+}, at: [<c045cf99>] process_one_work+0x109/0x3f0
 #1:  ((&sdata->work)){+.+.+.}, at: [<c045cf99>] process_one_work+0x109/0x3f0
 #2:  (&ifibss->mtx){+.+.+.}, at: [<f98f005b>] ieee80211_ibss_work+0x1b/0x470 [mac80211]
 #3:  (&intf->beacon_skb_mutex){+.+...}, at: [<f997a644>] rt2x00queue_update_beacon+0x24/0x50 [rt2x00lib]

stack backtrace:
Pid: 30374, comm: kworker/u:2 Not tainted 3.4.0-wl+ raspberrypi#4
Call Trace:
 [<c04962a6>] print_usage_bug+0x1f6/0x220
 [<c0496a12>] mark_lock+0x2c2/0x300
 [<c0495ff0>] ? check_usage_forwards+0xc0/0xc0
 [<c04978ec>] __lock_acquire+0x4bc/0x1050
 [<c0527890>] ? __kmalloc_track_caller+0x1c0/0x1d0
 [<c0777fb6>] ? copy_skb_header+0x26/0x90
 [<c0498504>] lock_acquire+0x84/0xf0
 [<f9979a20>] ? rt2x00queue_create_tx_descriptor+0x380/0x490 [rt2x00lib]
 [<c0835733>] _raw_spin_lock+0x33/0x40
 [<f9979a20>] ? rt2x00queue_create_tx_descriptor+0x380/0x490 [rt2x00lib]
 [<f9979a20>] rt2x00queue_create_tx_descriptor+0x380/0x490 [rt2x00lib]
 [<f997a5cf>] rt2x00queue_update_beacon_locked+0x5f/0xb0 [rt2x00lib]
 [<f997a64d>] rt2x00queue_update_beacon+0x2d/0x50 [rt2x00lib]
 [<f9977e3a>] rt2x00mac_bss_info_changed+0x1ca/0x200 [rt2x00lib]
 [<f9977c70>] ? rt2x00mac_remove_interface+0x70/0x70 [rt2x00lib]
 [<f98e4dd0>] ieee80211_bss_info_change_notify+0xe0/0x1d0 [mac80211]
 [<f98ef7b8>] __ieee80211_sta_join_ibss+0x3b8/0x610 [mac80211]
 [<c0496ab4>] ? mark_held_locks+0x64/0xc0
 [<c0440012>] ? virt_efi_query_capsule_caps+0x12/0x50
 [<f98efb09>] ieee80211_sta_join_ibss+0xf9/0x140 [mac80211]
 [<f98f0456>] ieee80211_ibss_work+0x416/0x470 [mac80211]
 [<c0496d8b>] ? trace_hardirqs_on+0xb/0x10
 [<c077683b>] ? skb_dequeue+0x4b/0x70
 [<f98f207f>] ieee80211_iface_work+0x13f/0x230 [mac80211]
 [<c045cf99>] ? process_one_work+0x109/0x3f0
 [<c045d015>] process_one_work+0x185/0x3f0
 [<c045cf99>] ? process_one_work+0x109/0x3f0
 [<f98f1f40>] ? ieee80211_teardown_sdata+0xa0/0xa0 [mac80211]
 [<c045ed86>] worker_thread+0x116/0x270
 [<c045ec70>] ? manage_workers+0x1e0/0x1e0
 [<c0462f64>] kthread+0x84/0x90
 [<c0462ee0>] ? __init_kthread_worker+0x60/0x60
 [<c083d382>] kernel_thread_helper+0x6/0x10

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Helmut Schaa <helmut.schaa@googlemail.com>
Acked-by: Gertjan van Wingerde <gwingerde@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
ar0n pushed a commit that referenced this pull request Nov 5, 2012
commit 954c3f8 upstream.

We need to make sure that the USB serial driver we find
matches the USB driver whose probe we are currently
executing. Otherwise we will end up with USB serial
devices bound to the correct serial driver but wrong
USB driver.

An example of such cross-probing, where the usbserial_generic
USB driver has found the sierra serial driver:

May 29 18:26:15 nemi kernel: [ 4442.559246] usbserial_generic 4-4:1.0: Sierra USB modem converter detected
May 29 18:26:20 nemi kernel: [ 4447.556747] usbserial_generic 4-4:1.2: Sierra USB modem converter detected
May 29 18:26:25 nemi kernel: [ 4452.557288] usbserial_generic 4-4:1.3: Sierra USB modem converter detected

sysfs view of the same problem:

bjorn@nemi:~$ ls -l /sys/bus/usb/drivers/sierra/
total 0
--w------- 1 root root 4096 May 29 18:23 bind
lrwxrwxrwx 1 root root    0 May 29 18:23 module -> ../../../../module/usbserial
--w------- 1 root root 4096 May 29 18:23 uevent
--w------- 1 root root 4096 May 29 18:23 unbind
bjorn@nemi:~$ ls -l /sys/bus/usb-serial/drivers/sierra/
total 0
--w------- 1 root root 4096 May 29 18:23 bind
lrwxrwxrwx 1 root root    0 May 29 18:23 module -> ../../../../module/sierra
-rw-r--r-- 1 root root 4096 May 29 18:23 new_id
lrwxrwxrwx 1 root root    0 May 29 18:32 ttyUSB0 -> ../../../../devices/pci0000:00/0000:00:1d.7/usb4/4-4/4-4:1.0/ttyUSB0
lrwxrwxrwx 1 root root    0 May 29 18:32 ttyUSB1 -> ../../../../devices/pci0000:00/0000:00:1d.7/usb4/4-4/4-4:1.2/ttyUSB1
lrwxrwxrwx 1 root root    0 May 29 18:32 ttyUSB2 -> ../../../../devices/pci0000:00/0000:00:1d.7/usb4/4-4/4-4:1.3/ttyUSB2
--w------- 1 root root 4096 May 29 18:23 uevent
--w------- 1 root root 4096 May 29 18:23 unbind

bjorn@nemi:~$ ls -l /sys/bus/usb/drivers/usbserial_generic/
total 0
lrwxrwxrwx 1 root root    0 May 29 18:33 4-4:1.0 -> ../../../../devices/pci0000:00/0000:00:1d.7/usb4/4-4/4-4:1.0
lrwxrwxrwx 1 root root    0 May 29 18:33 4-4:1.2 -> ../../../../devices/pci0000:00/0000:00:1d.7/usb4/4-4/4-4:1.2
lrwxrwxrwx 1 root root    0 May 29 18:33 4-4:1.3 -> ../../../../devices/pci0000:00/0000:00:1d.7/usb4/4-4/4-4:1.3
--w------- 1 root root 4096 May 29 18:33 bind
lrwxrwxrwx 1 root root    0 May 29 18:33 module -> ../../../../module/usbserial
--w------- 1 root root 4096 May 29 18:22 uevent
--w------- 1 root root 4096 May 29 18:33 unbind
bjorn@nemi:~$ ls -l /sys/bus/usb-serial/drivers/generic/
total 0
--w------- 1 root root 4096 May 29 18:33 bind
lrwxrwxrwx 1 root root    0 May 29 18:33 module -> ../../../../module/usbserial
-rw-r--r-- 1 root root 4096 May 29 18:33 new_id
--w------- 1 root root 4096 May 29 18:22 uevent
--w------- 1 root root 4096 May 29 18:33 unbind

So we end up with a mismatch between the USB driver and the
USB serial driver.  The reason for the above is simple: The
USB driver probe will succeed if *any* registered serial
driver matches, and will use that serial driver for all
serial driver functions.

This makes ref counting go wrong. We count the USB driver
as used, but not the USB serial driver.  This may result
in Oops'es as demonstrated by Johan Hovold <jhovold@gmail.com>:

[11811.646396] drivers/usb/serial/usb-serial.c: get_free_serial 1
[11811.646443] drivers/usb/serial/usb-serial.c: get_free_serial - minor base = 0
[11811.646460] drivers/usb/serial/usb-serial.c: usb_serial_probe - registering ttyUSB0
[11811.646766] usb 6-1: pl2303 converter now attached to ttyUSB0
[11812.264197] USB Serial deregistering driver FTDI USB Serial Device
[11812.264865] usbcore: deregistering interface driver ftdi_sio
[11812.282180] USB Serial deregistering driver pl2303
[11812.283141] pl2303 ttyUSB0: pl2303 converter now disconnected from ttyUSB0
[11812.283272] usbcore: deregistering interface driver pl2303
[11812.301056] USB Serial deregistering driver generic
[11812.301186] usbcore: deregistering interface driver usbserial_generic
[11812.301259] drivers/usb/serial/usb-serial.c: usb_serial_disconnect
[11812.301823] BUG: unable to handle kernel paging request at f8e7438c
[11812.301845] IP: [<f8e38445>] usb_serial_disconnect+0xb5/0x100 [usbserial]
[11812.301871] *pde = 357ef067 *pte = 00000000
[11812.301957] Oops: 0000 [#1] PREEMPT SMP
[11812.301983] Modules linked in: usbserial(-) [last unloaded: pl2303]
[11812.302008]
[11812.302019] Pid: 1323, comm: modprobe Tainted: G        W    3.4.0-rc7+ raspberrypi#101 Dell Inc. Vostro 1520/0T816J
[11812.302115] EIP: 0060:[<f8e38445>] EFLAGS: 00010246 CPU: 1
[11812.302130] EIP is at usb_serial_disconnect+0xb5/0x100 [usbserial]
[11812.302141] EAX: f508a180 EBX: f508a180 ECX: 00000000 EDX: f8e74300
[11812.302151] ESI: f5050800 EDI: 00000001 EBP: f5141e78 ESP: f5141e58
[11812.302160]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[11812.302170] CR0: 8005003b CR2: f8e7438c CR3: 34848000 CR4: 000007d0
[11812.302180] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[11812.302189] DR6: ffff0ff0 DR7: 00000400
[11812.302199] Process modprobe (pid: 1323, ti=f5140000 task=f61e2bc0 task.ti=f5140000)
[11812.302209] Stack:
[11812.302216]  f8e3be0f f8e3b29c f8e3ae00 00000000 f513641c f5136400 f513641c f507a540
[11812.302325]  f5141e98 c133d2c1 00000000 00000000 f509c400 f513641c f507a590 f5136450
[11812.302372]  f5141ea8 c12f0344 f513641c f507a590 f5141ebc c12f0c67 00000000 f507a590
[11812.302419] Call Trace:
[11812.302439]  [<c133d2c1>] usb_unbind_interface+0x51/0x190
[11812.302456]  [<c12f0344>] __device_release_driver+0x64/0xb0
[11812.302469]  [<c12f0c67>] driver_detach+0x97/0xa0
[11812.302483]  [<c12f001c>] bus_remove_driver+0x6c/0xe0
[11812.302500]  [<c145938d>] ? __mutex_unlock_slowpath+0xcd/0x140
[11812.302514]  [<c12f0ff9>] driver_unregister+0x49/0x80
[11812.302528]  [<c1457df6>] ? printk+0x1d/0x1f
[11812.302540]  [<c133c50d>] usb_deregister+0x5d/0xb0
[11812.302557]  [<f8e37c55>] ? usb_serial_deregister+0x45/0x50 [usbserial]
[11812.302575]  [<f8e37c8d>] usb_serial_deregister_drivers+0x2d/0x40 [usbserial]
[11812.302593]  [<f8e3a6e2>] usb_serial_generic_deregister+0x12/0x20 [usbserial]
[11812.302611]  [<f8e3acf0>] usb_serial_exit+0x8/0x32 [usbserial]
[11812.302716]  [<c1080b48>] sys_delete_module+0x158/0x260
[11812.302730]  [<c110594e>] ? mntput+0x1e/0x30
[11812.302746]  [<c145c3c3>] ? sysenter_exit+0xf/0x18
[11812.302746]  [<c107777c>] ? trace_hardirqs_on_caller+0xec/0x170
[11812.302746]  [<c145c390>] sysenter_do_call+0x12/0x36
[11812.302746] Code: 24 02 00 00 e8 dd f3 20 c8 f6 86 74 02 00 00 02 74 b4 8d 86 4c 02 00 00 47 e8 78 55 4b c8 0f b6 43 0e 39 f8 7f a9 8b 53 04 89 d8 <ff> 92 8c 00 00 00 89 d8 e8 0e ff ff ff 8b 45 f0 c7 44 24 04 2f
[11812.302746] EIP: [<f8e38445>] usb_serial_disconnect+0xb5/0x100 [usbserial] SS:ESP 0068:f5141e58
[11812.302746] CR2: 00000000f8e7438c

Fix by only evaluating serial drivers pointing back to the
USB driver we are currently probing.  This still allows two
or more drivers to match the same device, running their
serial driver probes to sort out which one to use.

Signed-off-by: Bjørn Mork <bjorn@mork.no>
Reviewed-by: Felipe Balbi <balbi@ti.com>
Tested-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
ar0n pushed a commit that referenced this pull request Nov 5, 2012
…condition

commit 26c1917 upstream.

When holding the mmap_sem for reading, pmd_offset_map_lock should only
run on a pmd_t that has been read atomically from the pmdp pointer,
otherwise we may read only half of it leading to this crash.

PID: 11679  TASK: f06e8000  CPU: 3   COMMAND: "do_race_2_panic"
 #0 [f06a9dd8] crash_kexec at c049b5ec
 #1 [f06a9e2c] oops_end at c083d1c2
 #2 [f06a9e40] no_context at c0433ded
 #3 [f06a9e64] bad_area_nosemaphore at c043401a
 raspberrypi#4 [f06a9e6c] __do_page_fault at c0434493
 raspberrypi#5 [f06a9eec] do_page_fault at c083eb45
 raspberrypi#6 [f06a9f04] error_code (via page_fault) at c083c5d5
    EAX: 01fb470c EBX: fff35000 ECX: 00000003 EDX: 00000100 EBP:
    00000000
    DS:  007b     ESI: 9e201000 ES:  007b     EDI: 01fb4700 GS:  00e0
    CS:  0060     EIP: c083bc14 ERR: ffffffff EFLAGS: 00010246
 raspberrypi#7 [f06a9f38] _spin_lock at c083bc14
 raspberrypi#8 [f06a9f44] sys_mincore at c0507b7d
 raspberrypi#9 [f06a9fb0] system_call at c083becd
                         start           len
    EAX: ffffffda  EBX: 9e200000  ECX: 00001000  EDX: 6228537f
    DS:  007b      ESI: 00000000  ES:  007b      EDI: 003d0f00
    SS:  007b      ESP: 62285354  EBP: 62285388  GS:  0033
    CS:  0073      EIP: 00291416  ERR: 000000da  EFLAGS: 00000286

This should be a longstanding bug affecting x86 32bit PAE without THP.
Only archs with 64bit large pmd_t and 32bit unsigned long should be
affected.

With THP enabled the barrier() in pmd_none_or_trans_huge_or_clear_bad()
would partly hide the bug when the pmd transition from none to stable,
by forcing a re-read of the *pmd in pmd_offset_map_lock, but when THP is
enabled a new set of problem arises by the fact could then transition
freely in any of the none, pmd_trans_huge or pmd_trans_stable states.
So making the barrier in pmd_none_or_trans_huge_or_clear_bad()
unconditional isn't good idea and it would be a flakey solution.

This should be fully fixed by introducing a pmd_read_atomic that reads
the pmd in order with THP disabled, or by reading the pmd atomically
with cmpxchg8b with THP enabled.

Luckily this new race condition only triggers in the places that must
already be covered by pmd_none_or_trans_huge_or_clear_bad() so the fix
is localized there but this bug is not related to THP.

NOTE: this can trigger on x86 32bit systems with PAE enabled with more
than 4G of ram, otherwise the high part of the pmd will never risk to be
truncated because it would be zero at all times, in turn so hiding the
SMP race.

This bug was discovered and fully debugged by Ulrich, quote:

----
[..]
pmd_none_or_trans_huge_or_clear_bad() loads the content of edx and
eax.

    496 static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t
    *pmd)
    497 {
    498         /* depend on compiler for an atomic pmd read */
    499         pmd_t pmdval = *pmd;

                                // edi = pmd pointer
0xc0507a74 <sys_mincore+548>:   mov    0x8(%esp),%edi
...
                                // edx = PTE page table high address
0xc0507a84 <sys_mincore+564>:   mov    0x4(%edi),%edx
...
                                // eax = PTE page table low address
0xc0507a8e <sys_mincore+574>:   mov    (%edi),%eax

[..]

Please note that the PMD is not read atomically. These are two "mov"
instructions where the high order bits of the PMD entry are fetched
first. Hence, the above machine code is prone to the following race.

-  The PMD entry {high|low} is 0x0000000000000000.
   The "mov" at 0xc0507a84 loads 0x00000000 into edx.

-  A page fault (on another CPU) sneaks in between the two "mov"
   instructions and instantiates the PMD.

-  The PMD entry {high|low} is now 0x00000003fda38067.
   The "mov" at 0xc0507a8e loads 0xfda38067 into eax.
----

Reported-by: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Petr Matousek <pmatouse@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
ar0n pushed a commit that referenced this pull request Nov 5, 2012
commit fe20b39 upstream.

reg_timeout_work() calls restore_regulatory_settings() which
takes cfg80211_mutex.

reg_set_request_processed() already holds cfg80211_mutex
before calling cancel_delayed_work_sync(reg_timeout),
so it might deadlock.

Call the async cancel_delayed_work instead, in order
to avoid the potential deadlock.

This is the relevant lockdep warning:

cfg80211: Calling CRDA for country: XX

======================================================
[ INFO: possible circular locking dependency detected ]
3.4.0-rc5-wl+ raspberrypi#26 Not tainted
-------------------------------------------------------
kworker/0:2/1391 is trying to acquire lock:
 (cfg80211_mutex){+.+.+.}, at: [<bf28ae00>] restore_regulatory_settings+0x34/0x418 [cfg80211]

but task is already holding lock:
 ((reg_timeout).work){+.+...}, at: [<c0059e94>] process_one_work+0x1f0/0x480

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 ((reg_timeout).work){+.+...}:
       [<c008fd44>] validate_chain+0xb94/0x10f0
       [<c0090b68>] __lock_acquire+0x8c8/0x9b0
       [<c0090d40>] lock_acquire+0xf0/0x114
       [<c005b600>] wait_on_work+0x4c/0x154
       [<c005c000>] __cancel_work_timer+0xd4/0x11c
       [<c005c064>] cancel_delayed_work_sync+0x1c/0x20
       [<bf28b274>] reg_set_request_processed+0x50/0x78 [cfg80211]
       [<bf28bd84>] set_regdom+0x550/0x600 [cfg80211]
       [<bf294cd8>] nl80211_set_reg+0x218/0x258 [cfg80211]
       [<c03c7738>] genl_rcv_msg+0x1a8/0x1e8
       [<c03c6a00>] netlink_rcv_skb+0x5c/0xc0
       [<c03c7584>] genl_rcv+0x28/0x34
       [<c03c6720>] netlink_unicast+0x15c/0x228
       [<c03c6c7c>] netlink_sendmsg+0x218/0x298
       [<c03933c8>] sock_sendmsg+0xa4/0xc0
       [<c039406c>] __sys_sendmsg+0x1e4/0x268
       [<c0394228>] sys_sendmsg+0x4c/0x70
       [<c0013840>] ret_fast_syscall+0x0/0x3c

-> #1 (reg_mutex){+.+.+.}:
       [<c008fd44>] validate_chain+0xb94/0x10f0
       [<c0090b68>] __lock_acquire+0x8c8/0x9b0
       [<c0090d40>] lock_acquire+0xf0/0x114
       [<c04734dc>] mutex_lock_nested+0x48/0x320
       [<bf28b2cc>] reg_todo+0x30/0x538 [cfg80211]
       [<c0059f44>] process_one_work+0x2a0/0x480
       [<c005a4b4>] worker_thread+0x1bc/0x2bc
       [<c0061148>] kthread+0x98/0xa4
       [<c0014af4>] kernel_thread_exit+0x0/0x8

-> #0 (cfg80211_mutex){+.+.+.}:
       [<c008ed58>] print_circular_bug+0x68/0x2cc
       [<c008fb28>] validate_chain+0x978/0x10f0
       [<c0090b68>] __lock_acquire+0x8c8/0x9b0
       [<c0090d40>] lock_acquire+0xf0/0x114
       [<c04734dc>] mutex_lock_nested+0x48/0x320
       [<bf28ae00>] restore_regulatory_settings+0x34/0x418 [cfg80211]
       [<bf28b200>] reg_timeout_work+0x1c/0x20 [cfg80211]
       [<c0059f44>] process_one_work+0x2a0/0x480
       [<c005a4b4>] worker_thread+0x1bc/0x2bc
       [<c0061148>] kthread+0x98/0xa4
       [<c0014af4>] kernel_thread_exit+0x0/0x8

other info that might help us debug this:

Chain exists of:
  cfg80211_mutex --> reg_mutex --> (reg_timeout).work

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock((reg_timeout).work);
                               lock(reg_mutex);
                               lock((reg_timeout).work);
  lock(cfg80211_mutex);

 *** DEADLOCK ***

2 locks held by kworker/0:2/1391:
 #0:  (events){.+.+.+}, at: [<c0059e94>] process_one_work+0x1f0/0x480
 #1:  ((reg_timeout).work){+.+...}, at: [<c0059e94>] process_one_work+0x1f0/0x480

stack backtrace:
[<c001b928>] (unwind_backtrace+0x0/0x12c) from [<c0471d3c>] (dump_stack+0x20/0x24)
[<c0471d3c>] (dump_stack+0x20/0x24) from [<c008ef70>] (print_circular_bug+0x280/0x2cc)
[<c008ef70>] (print_circular_bug+0x280/0x2cc) from [<c008fb28>] (validate_chain+0x978/0x10f0)
[<c008fb28>] (validate_chain+0x978/0x10f0) from [<c0090b68>] (__lock_acquire+0x8c8/0x9b0)
[<c0090b68>] (__lock_acquire+0x8c8/0x9b0) from [<c0090d40>] (lock_acquire+0xf0/0x114)
[<c0090d40>] (lock_acquire+0xf0/0x114) from [<c04734dc>] (mutex_lock_nested+0x48/0x320)
[<c04734dc>] (mutex_lock_nested+0x48/0x320) from [<bf28ae00>] (restore_regulatory_settings+0x34/0x418 [cfg80211])
[<bf28ae00>] (restore_regulatory_settings+0x34/0x418 [cfg80211]) from [<bf28b200>] (reg_timeout_work+0x1c/0x20 [cfg80211])
[<bf28b200>] (reg_timeout_work+0x1c/0x20 [cfg80211]) from [<c0059f44>] (process_one_work+0x2a0/0x480)
[<c0059f44>] (process_one_work+0x2a0/0x480) from [<c005a4b4>] (worker_thread+0x1bc/0x2bc)
[<c005a4b4>] (worker_thread+0x1bc/0x2bc) from [<c0061148>] (kthread+0x98/0xa4)
[<c0061148>] (kthread+0x98/0xa4) from [<c0014af4>] (kernel_thread_exit+0x0/0x8)
cfg80211: Calling CRDA to update world regulatory domain
cfg80211: World regulatory domain updated:
cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
cfg80211:   (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
cfg80211:   (2457000 KHz - 2482000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
cfg80211:   (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
cfg80211:   (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
cfg80211:   (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)

Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
ar0n pushed a commit that referenced this pull request Nov 5, 2012
commit 6bc96d0 upstream.

Fixes:
[   15.470311] WARNING: at /local/scratch/ianc/devel/kernels/linux/fs/sysfs/file.c:498 sysfs_attr_ns+0x95/0xa0()
[   15.470326] sysfs: kobject eth0 without dirent
[   15.470333] Modules linked in:
[   15.470342] Pid: 12, comm: xenwatch Not tainted 3.4.0-x86_32p-xenU raspberrypi#93
and
[    9.150554] BUG: unable to handle kernel paging request at 2b359000
[    9.150577] IP: [<c1279561>] linkwatch_do_dev+0x81/0xc0
[    9.150592] *pdpt = 000000002c3c9027 *pde = 0000000000000000
[    9.150604] Oops: 0002 [#1] SMP
[    9.150613] Modules linked in:

This is http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=675190

Reported-by: George Shuklin <george.shuklin@gmail.com>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Tested-by: William Dauchy <wdauchy@gmail.com>
Cc: 675190@bugs.debian.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
ar0n pushed a commit that referenced this pull request Nov 5, 2012
commit 03e934f upstream.

Sasha Levin reported following panic :

[ 2136.383310] BUG: unable to handle kernel NULL pointer dereference at
00000000000003b0
[ 2136.384022] IP: [<ffffffff8114e400>] __lock_acquire+0xc0/0x4b0
[ 2136.384022] PGD 131c4067 PUD 11c0c067 PMD 0
[ 2136.388106] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 2136.388106] CPU 1
[ 2136.388106] Pid: 24855, comm: trinity-child1 Tainted: G        W
3.5.0-rc2-sasha-00015-g7b268f7 raspberrypi#374
[ 2136.388106] RIP: 0010:[<ffffffff8114e400>]  [<ffffffff8114e400>]
__lock_acquire+0xc0/0x4b0
[ 2136.388106] RSP: 0018:ffff8800130b3ca8  EFLAGS: 00010046
[ 2136.388106] RAX: 0000000000000086 RBX: ffff88001186b000 RCX:
0000000000000000
[ 2136.388106] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
0000000000000000
[ 2136.388106] RBP: ffff8800130b3d08 R08: 0000000000000001 R09:
0000000000000000
[ 2136.388106] R10: 0000000000000000 R11: 0000000000000001 R12:
0000000000000002
[ 2136.388106] R13: 00000000000003b0 R14: 0000000000000000 R15:
0000000000000000
[ 2136.388106] FS:  00007fa5b1bd4700(0000) GS:ffff88001b800000(0000)
knlGS:0000000000000000
[ 2136.388106] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2136.388106] CR2: 00000000000003b0 CR3: 0000000011d1f000 CR4:
00000000000406e0
[ 2136.388106] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 2136.388106] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[ 2136.388106] Process trinity-child1 (pid: 24855, threadinfo
ffff8800130b2000, task ffff88001186b000)
[ 2136.388106] Stack:
[ 2136.388106]  ffff8800130b3cd8 ffffffff81121785 ffffffff81236774
000080d000000001
[ 2136.388106]  ffff88001b9d6c00 00000000001d6c00 ffffffff130b3d08
ffff88001186b000
[ 2136.388106]  0000000000000000 0000000000000002 0000000000000000
0000000000000000
[ 2136.388106] Call Trace:
[ 2136.388106]  [<ffffffff81121785>] ? sched_clock_local+0x25/0x90
[ 2136.388106]  [<ffffffff81236774>] ? get_empty_filp+0x74/0x220
[ 2136.388106]  [<ffffffff8114e97a>] lock_acquire+0x18a/0x1e0
[ 2136.388106]  [<ffffffff836b37df>] ? rawsock_release+0x4f/0xa0
[ 2136.388106]  [<ffffffff837c0ef0>] _raw_write_lock_bh+0x40/0x80
[ 2136.388106]  [<ffffffff836b37df>] ? rawsock_release+0x4f/0xa0
[ 2136.388106]  [<ffffffff836b37df>] rawsock_release+0x4f/0xa0
[ 2136.388106]  [<ffffffff8321cfe8>] sock_release+0x18/0x70
[ 2136.388106]  [<ffffffff8321d069>] sock_close+0x29/0x30
[ 2136.388106]  [<ffffffff81236bca>] __fput+0x11a/0x2c0
[ 2136.388106]  [<ffffffff81236d85>] fput+0x15/0x20
[ 2136.388106]  [<ffffffff8321de34>] sys_accept4+0x1b4/0x200
[ 2136.388106]  [<ffffffff837c165c>] ? _raw_spin_unlock_irq+0x4c/0x80
[ 2136.388106]  [<ffffffff837c1669>] ? _raw_spin_unlock_irq+0x59/0x80
[ 2136.388106]  [<ffffffff837c2565>] ? sysret_check+0x22/0x5d
[ 2136.388106]  [<ffffffff8321de8b>] sys_accept+0xb/0x10
[ 2136.388106]  [<ffffffff837c2539>] system_call_fastpath+0x16/0x1b
[ 2136.388106] Code: ec 04 00 0f 85 ea 03 00 00 be d5 0b 00 00 48 c7 c7
8a c1 40 84 e8 b1 a5 f8 ff 31 c0 e9 d4 03 00 00 66 2e 0f 1f 84 00 00 00
00 00 <49> 81 7d 00 60 73 5e 85 b8 01 00 00 00 44 0f 44 e0 83 fe 01 77
[ 2136.388106] RIP  [<ffffffff8114e400>] __lock_acquire+0xc0/0x4b0
[ 2136.388106]  RSP <ffff8800130b3ca8>
[ 2136.388106] CR2: 00000000000003b0
[ 2136.388106] ---[ end trace 6d450e935ee18982 ]---
[ 2136.388106] Kernel panic - not syncing: Fatal exception in interrupt

rawsock_release() should test if sock->sk is NULL before calling
sock_orphan()/sock_put()

Reported-by: Sasha Levin <levinsasha928@gmail.com>
Tested-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
[bwh: Backported to 3.2: keep using nfc_dbg(), not pr_debug()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
ar0n pushed a commit that referenced this pull request Nov 5, 2012
commit 6266230 upstream.

If CONFIG_DM_DEBUG_SPACE_MAPS is enabled and dm_sm_checker_create()
fails, dm_tm_create_internal() would still return success even though it
cleaned up all resources it was supposed to have created.  This will
lead to a kernel crash:

general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
...
RIP: 0010:[<ffffffff81593659>]  [<ffffffff81593659>] dm_bufio_get_block_size+0x9/0x20
Call Trace:
  [<ffffffff81599bae>] dm_bm_block_size+0xe/0x10
  [<ffffffff8159b8b8>] sm_ll_init+0x78/0xd0
  [<ffffffff8159c1a6>] sm_ll_new_disk+0x16/0xa0
  [<ffffffff8159c98e>] dm_sm_disk_create+0xfe/0x160
  [<ffffffff815abf6e>] dm_pool_metadata_open+0x16e/0x6a0
  [<ffffffff815aa010>] pool_ctr+0x3f0/0x900
  [<ffffffff8158d565>] dm_table_add_target+0x195/0x450
  [<ffffffff815904c4>] table_load+0xe4/0x330
  [<ffffffff815917ea>] ctl_ioctl+0x15a/0x2c0
  [<ffffffff81591963>] dm_ctl_ioctl+0x13/0x20
  [<ffffffff8116a4f8>] do_vfs_ioctl+0x98/0x560
  [<ffffffff8116aa51>] sys_ioctl+0x91/0xa0
  [<ffffffff81869f52>] system_call_fastpath+0x16/0x1b

Fix the space map checker code to return an appropriate ERR_PTR and have
dm_sm_disk_create() and dm_tm_create_internal() check for it with
IS_ERR.

Reported-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
ar0n pushed a commit that referenced this pull request Nov 5, 2012
commit 0fde0a8 upstream.

Fix:

BUG: sleeping function called from invalid context at kernel/workqueue.c:2547
in_atomic(): 1, irqs_disabled(): 0, pid: 629, name: wpa_supplicant
2 locks held by wpa_supplicant/629:
 #0:  (rtnl_mutex){+.+.+.}, at: [<c08b2b84>] rtnl_lock+0x14/0x20
 #1:  (&trigger->leddev_list_lock){.+.?..}, at: [<c0867f41>] led_trigger_event+0x21/0x80
Pid: 629, comm: wpa_supplicant Not tainted 3.3.0-0.rc3.git5.1.fc17.i686
Call Trace:
 [<c046a9f6>] __might_sleep+0x126/0x1d0
 [<c0457d6c>] wait_on_work+0x2c/0x1d0
 [<c045a09a>] __cancel_work_timer+0x6a/0x120
 [<c045a160>] cancel_delayed_work_sync+0x10/0x20
 [<f7dd3c22>] rtl8187_led_brightness_set+0x82/0xf0 [rtl8187]
 [<c0867f7c>] led_trigger_event+0x5c/0x80
 [<f7ff5e6d>] ieee80211_led_radio+0x1d/0x40 [mac80211]
 [<f7ff3583>] ieee80211_stop_device+0x13/0x230 [mac80211]

Removing _sync is ok, because if led_on work is currently running
it will be finished before led_off work start to perform, since
they are always queued on the same mac80211 local->workqueue.

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=795176

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Larry Finger <Larry.Finger@lwfinger.net>
Acked-by: Hin-Tak Leung <htl10@users.sourceforge.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
ar0n pushed a commit that referenced this pull request Nov 5, 2012
commit d8adde1 upstream.

kswapd_stop() is called to destroy the kswapd work thread when all memory
of a NUMA node has been offlined.  But kswapd_stop() only terminates the
work thread without resetting NODE_DATA(nid)->kswapd to NULL.  The stale
pointer will prevent kswapd_run() from creating a new work thread when
adding memory to the memory-less NUMA node again.  Eventually the stale
pointer may cause invalid memory access.

An example stack dump as below. It's reproduced with 2.6.32, but latest
kernel has the same issue.

  BUG: unable to handle kernel NULL pointer dereference at (null)
  IP: [<ffffffff81051a94>] exit_creds+0x12/0x78
  PGD 0
  Oops: 0000 [#1] SMP
  last sysfs file: /sys/devices/system/memory/memory391/state
  CPU 11
  Modules linked in: cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq microcode fuse loop dm_mod tpm_tis rtc_cmos i2c_i801 rtc_core tpm serio_raw pcspkr sg tpm_bios igb i2c_core iTCO_wdt rtc_lib mptctl iTCO_vendor_support button dca bnx2 usbhid hid uhci_hcd ehci_hcd usbcore sd_mod crc_t10dif edd ext3 mbcache jbd fan ide_pci_generic ide_core ata_generic ata_piix libata thermal processor thermal_sys hwmon mptsas mptscsih mptbase scsi_transport_sas scsi_mod
  Pid: 7949, comm: sh Not tainted 2.6.32.12-qiuxishi-5-default raspberrypi#92 Tecal RH2285
  RIP: 0010:exit_creds+0x12/0x78
  RSP: 0018:ffff8806044f1d78  EFLAGS: 00010202
  RAX: 0000000000000000 RBX: ffff880604f22140 RCX: 0000000000019502
  RDX: 0000000000000000 RSI: 0000000000000202 RDI: 0000000000000000
  RBP: ffff880604f22150 R08: 0000000000000000 R09: ffffffff81a4dc10
  R10: 00000000000032a0 R11: ffff880006202500 R12: 0000000000000000
  R13: 0000000000c40000 R14: 0000000000008000 R15: 0000000000000001
  FS:  00007fbc03d066f0(0000) GS:ffff8800282e0000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 0000000000000000 CR3: 000000060f029000 CR4: 00000000000006e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process sh (pid: 7949, threadinfo ffff8806044f0000, task ffff880603d7c600)
  Stack:
   ffff880604f22140 ffffffff8103aac5 ffff880604f22140 ffffffff8104d21e
   ffff880006202500 0000000000008000 0000000000c38000 ffffffff810bd5b1
   0000000000000000 ffff880603d7c600 00000000ffffdd29 0000000000000003
  Call Trace:
    __put_task_struct+0x5d/0x97
    kthread_stop+0x50/0x58
    offline_pages+0x324/0x3da
    memory_block_change_state+0x179/0x1db
    store_mem_state+0x9e/0xbb
    sysfs_write_file+0xd0/0x107
    vfs_write+0xad/0x169
    sys_write+0x45/0x6e
    system_call_fastpath+0x16/0x1b
  Code: ff 4d 00 0f 94 c0 84 c0 74 08 48 89 ef e8 1f fd ff ff 5b 5d 31 c0 41 5c c3 53 48 8b 87 20 06 00 00 48 89 fb 48 8b bf 18 06 00 00 <8b> 00 48 c7 83 18 06 00 00 00 00 00 00 f0 ff 0f 0f 94 c0 84 c0
  RIP  exit_creds+0x12/0x78
   RSP <ffff8806044f1d78>
  CR2: 0000000000000000

[akpm@linux-foundation.org: add pglist_data.kswapd locking comments]
Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
ar0n pushed a commit that referenced this pull request Nov 5, 2012
commit 8d657eb upstream.

This can be trivially triggered from userspace by passing in something unexpected.

    kernel BUG at fs/locks.c:1468!
    invalid opcode: 0000 [#1] SMP
    RIP: 0010:generic_setlease+0xc2/0x100
    Call Trace:
      __vfs_setlease+0x35/0x40
      fcntl_setlease+0x76/0x150
      sys_fcntl+0x1c6/0x810
      system_call_fastpath+0x1a/0x1f

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
ar0n pushed a commit that referenced this pull request Nov 5, 2012
commit 60d65f1 upstream.

Don't grab the daemon mutex while holding the message context mutex.
Addresses this lockdep warning:

 ecryptfsd/2141 is trying to acquire lock:
  (&ecryptfs_msg_ctx_arr[i].mux){+.+.+.}, at: [<ffffffffa029c213>] ecryptfs_miscdev_read+0x143/0x470 [ecryptfs]

 but task is already holding lock:
  (&(*daemon)->mux){+.+...}, at: [<ffffffffa029c2ec>] ecryptfs_miscdev_read+0x21c/0x470 [ecryptfs]

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (&(*daemon)->mux){+.+...}:
        [<ffffffff810a3b8d>] lock_acquire+0x9d/0x220
        [<ffffffff8151c6da>] __mutex_lock_common+0x5a/0x4b0
        [<ffffffff8151cc64>] mutex_lock_nested+0x44/0x50
        [<ffffffffa029c5d7>] ecryptfs_send_miscdev+0x97/0x120 [ecryptfs]
        [<ffffffffa029b744>] ecryptfs_send_message+0x134/0x1e0 [ecryptfs]
        [<ffffffffa029a24e>] ecryptfs_generate_key_packet_set+0x2fe/0xa80 [ecryptfs]
        [<ffffffffa02960f8>] ecryptfs_write_metadata+0x108/0x250 [ecryptfs]
        [<ffffffffa0290f80>] ecryptfs_create+0x130/0x250 [ecryptfs]
        [<ffffffff811963a4>] vfs_create+0xb4/0x120
        [<ffffffff81197865>] do_last+0x8c5/0xa10
        [<ffffffff811998f9>] path_openat+0xd9/0x460
        [<ffffffff81199da2>] do_filp_open+0x42/0xa0
        [<ffffffff81187998>] do_sys_open+0xf8/0x1d0
        [<ffffffff81187a91>] sys_open+0x21/0x30
        [<ffffffff81527d69>] system_call_fastpath+0x16/0x1b

 -> #0 (&ecryptfs_msg_ctx_arr[i].mux){+.+.+.}:
        [<ffffffff810a3418>] __lock_acquire+0x1bf8/0x1c50
        [<ffffffff810a3b8d>] lock_acquire+0x9d/0x220
        [<ffffffff8151c6da>] __mutex_lock_common+0x5a/0x4b0
        [<ffffffff8151cc64>] mutex_lock_nested+0x44/0x50
        [<ffffffffa029c213>] ecryptfs_miscdev_read+0x143/0x470 [ecryptfs]
        [<ffffffff811887d3>] vfs_read+0xb3/0x180
        [<ffffffff811888ed>] sys_read+0x4d/0x90
        [<ffffffff81527d69>] system_call_fastpath+0x16/0x1b

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
ar0n pushed a commit that referenced this pull request Nov 5, 2012
commit 3cf003c upstream.

Jian found that when he ran fsx on a 32 bit arch with a large wsize the
process and one of the bdi writeback kthreads would sometimes deadlock
with a stack trace like this:

crash> bt
PID: 2789   TASK: f02edaa0  CPU: 3   COMMAND: "fsx"
 #0 [eed63cbc] schedule at c083c5b3
 #1 [eed63d80] kmap_high at c0500ec8
 #2 [eed63db0] cifs_async_writev at f7fabcd7 [cifs]
 #3 [eed63df0] cifs_writepages at f7fb7f5c [cifs]
 raspberrypi#4 [eed63e50] do_writepages at c04f3e32
 raspberrypi#5 [eed63e54] __filemap_fdatawrite_range at c04e152a
 raspberrypi#6 [eed63ea4] filemap_fdatawrite at c04e1b3e
 raspberrypi#7 [eed63eb4] cifs_file_aio_write at f7fa111a [cifs]
 raspberrypi#8 [eed63ecc] do_sync_write at c052d202
 raspberrypi#9 [eed63f74] vfs_write at c052d4ee
raspberrypi#10 [eed63f94] sys_write at c052df4c
raspberrypi#11 [eed63fb0] ia32_sysenter_target at c0409a98
    EAX: 00000004  EBX: 00000003  ECX: abd73b73  EDX: 012a65c6
    DS:  007b      ESI: 012a65c6  ES:  007b      EDI: 00000000
    SS:  007b      ESP: bf8db178  EBP: bf8db1f8  GS:  0033
    CS:  0073      EIP: 40000424  ERR: 00000004  EFLAGS: 00000246

Each task would kmap part of its address array before getting stuck, but
not enough to actually issue the write.

This patch fixes this by serializing the marshal_iov operations for
async reads and writes. The idea here is to ensure that cifs
aggressively tries to populate a request before attempting to fulfill
another one. As soon as all of the pages are kmapped for a request, then
we can unlock and allow another one to proceed.

There's no need to do this serialization on non-CONFIG_HIGHMEM arches
however, so optimize all of this out when CONFIG_HIGHMEM isn't set.

Reported-by: Jian Li <jiali@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
ar0n pushed a commit that referenced this pull request Nov 5, 2012
…d reasons

commit 5cf02d0 upstream.

We've had some reports of a deadlock where rpciod ends up with a stack
trace like this:

    PID: 2507   TASK: ffff88103691ab40  CPU: 14  COMMAND: "rpciod/14"
     #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
     #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
     #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
     #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
     raspberrypi#4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
     raspberrypi#5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
     raspberrypi#6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
     raspberrypi#7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
     raspberrypi#8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
     raspberrypi#9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
    raspberrypi#10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
    raspberrypi#11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
    raspberrypi#12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
    raspberrypi#13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
    raspberrypi#14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
    raspberrypi#15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
    raspberrypi#16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
    raspberrypi#17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
    raspberrypi#18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
    raspberrypi#19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
    raspberrypi#20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
    raspberrypi#21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
    raspberrypi#22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
    raspberrypi#23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
    raspberrypi#24 [ffff8810343bfee8] kthread at ffffffff8108dd96
    raspberrypi#25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca

rpciod is trying to allocate memory for a new socket to talk to the
server. The VM ends up calling ->releasepage to get more memory, and it
tries to do a blocking commit. That commit can't succeed however without
a connected socket, so we deadlock.

Fix this by setting PF_FSTRANS on the workqueue task prior to doing the
socket allocation, and having nfs_release_page check for that flag when
deciding whether to do a commit call. Also, set PF_FSTRANS
unconditionally in rpc_async_schedule since that function can also do
allocations sometimes.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
ar0n pushed a commit that referenced this pull request Nov 5, 2012
…bles

commit d833352 upstream.

If a process creates a large hugetlbfs mapping that is eligible for page
table sharing and forks heavily with children some of whom fault and
others which destroy the mapping then it is possible for page tables to
get corrupted.  Some teardowns of the mapping encounter a "bad pmd" and
output a message to the kernel log.  The final teardown will trigger a
BUG_ON in mm/filemap.c.

This was reproduced in 3.4 but is known to have existed for a long time
and goes back at least as far as 2.6.37.  It was probably was introduced
in 2.6.20 by [39dde65: shared page table for hugetlb page].  The messages
look like this;

[  ..........] Lots of bad pmd messages followed by this
[  127.164256] mm/memory.c:391: bad pmd ffff880412e04fe8(80000003de4000e7).
[  127.164257] mm/memory.c:391: bad pmd ffff880412e04ff0(80000003de6000e7).
[  127.164258] mm/memory.c:391: bad pmd ffff880412e04ff8(80000003de0000e7).
[  127.186778] ------------[ cut here ]------------
[  127.186781] kernel BUG at mm/filemap.c:134!
[  127.186782] invalid opcode: 0000 [#1] SMP
[  127.186783] CPU 7
[  127.186784] Modules linked in: af_packet cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf ext3 jbd dm_mod coretemp crc32c_intel usb_storage ghash_clmulni_intel aesni_intel i2c_i801 r8169 mii uas sr_mod cdrom sg iTCO_wdt iTCO_vendor_support shpchp serio_raw cryptd aes_x86_64 e1000e pci_hotplug dcdbas aes_generic container microcode ext4 mbcache jbd2 crc16 sd_mod crc_t10dif i915 drm_kms_helper drm i2c_algo_bit ehci_hcd ahci libahci usbcore rtc_cmos usb_common button i2c_core intel_agp video intel_gtt fan processor thermal thermal_sys hwmon ata_generic pata_atiixp libata scsi_mod
[  127.186801]
[  127.186802] Pid: 9017, comm: hugetlbfs-test Not tainted 3.4.0-autobuild raspberrypi#53 Dell Inc. OptiPlex 990/06D7TR
[  127.186804] RIP: 0010:[<ffffffff810ed6ce>]  [<ffffffff810ed6ce>] __delete_from_page_cache+0x15e/0x160
[  127.186809] RSP: 0000:ffff8804144b5c08  EFLAGS: 00010002
[  127.186810] RAX: 0000000000000001 RBX: ffffea000a5c9000 RCX: 00000000ffffffc0
[  127.186811] RDX: 0000000000000000 RSI: 0000000000000009 RDI: ffff88042dfdad00
[  127.186812] RBP: ffff8804144b5c18 R08: 0000000000000009 R09: 0000000000000003
[  127.186813] R10: 0000000000000000 R11: 000000000000002d R12: ffff880412ff83d8
[  127.186814] R13: ffff880412ff83d8 R14: 0000000000000000 R15: ffff880412ff83d8
[  127.186815] FS:  00007fe18ed2c700(0000) GS:ffff88042dce0000(0000) knlGS:0000000000000000
[  127.186816] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  127.186817] CR2: 00007fe340000503 CR3: 0000000417a14000 CR4: 00000000000407e0
[  127.186818] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  127.186819] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  127.186820] Process hugetlbfs-test (pid: 9017, threadinfo ffff8804144b4000, task ffff880417f803c0)
[  127.186821] Stack:
[  127.186822]  ffffea000a5c9000 0000000000000000 ffff8804144b5c48 ffffffff810ed83b
[  127.186824]  ffff8804144b5c48 000000000000138a 0000000000001387 ffff8804144b5c98
[  127.186825]  ffff8804144b5d48 ffffffff811bc925 ffff8804144b5cb8 0000000000000000
[  127.186827] Call Trace:
[  127.186829]  [<ffffffff810ed83b>] delete_from_page_cache+0x3b/0x80
[  127.186832]  [<ffffffff811bc925>] truncate_hugepages+0x115/0x220
[  127.186834]  [<ffffffff811bca43>] hugetlbfs_evict_inode+0x13/0x30
[  127.186837]  [<ffffffff811655c7>] evict+0xa7/0x1b0
[  127.186839]  [<ffffffff811657a3>] iput_final+0xd3/0x1f0
[  127.186840]  [<ffffffff811658f9>] iput+0x39/0x50
[  127.186842]  [<ffffffff81162708>] d_kill+0xf8/0x130
[  127.186843]  [<ffffffff81162812>] dput+0xd2/0x1a0
[  127.186845]  [<ffffffff8114e2d0>] __fput+0x170/0x230
[  127.186848]  [<ffffffff81236e0e>] ? rb_erase+0xce/0x150
[  127.186849]  [<ffffffff8114e3ad>] fput+0x1d/0x30
[  127.186851]  [<ffffffff81117db7>] remove_vma+0x37/0x80
[  127.186853]  [<ffffffff81119182>] do_munmap+0x2d2/0x360
[  127.186855]  [<ffffffff811cc639>] sys_shmdt+0xc9/0x170
[  127.186857]  [<ffffffff81410a39>] system_call_fastpath+0x16/0x1b
[  127.186858] Code: 0f 1f 44 00 00 48 8b 43 08 48 8b 00 48 8b 40 28 8b b0 40 03 00 00 85 f6 0f 88 df fe ff ff 48 89 df e8 e7 cb 05 00 e9 d2 fe ff ff <0f> 0b 55 83 e2 fd 48 89 e5 48 83 ec 30 48 89 5d d8 4c 89 65 e0
[  127.186868] RIP  [<ffffffff810ed6ce>] __delete_from_page_cache+0x15e/0x160
[  127.186870]  RSP <ffff8804144b5c08>
[  127.186871] ---[ end trace 7cbac5d1db69f426 ]---

The bug is a race and not always easy to reproduce.  To reproduce it I was
doing the following on a single socket I7-based machine with 16G of RAM.

$ hugeadm --pool-pages-max DEFAULT:13G
$ echo $((18*1048576*1024)) > /proc/sys/kernel/shmmax
$ echo $((18*1048576*1024)) > /proc/sys/kernel/shmall
$ for i in `seq 1 9000`; do ./hugetlbfs-test; done

On my particular machine, it usually triggers within 10 minutes but
enabling debug options can change the timing such that it never hits.
Once the bug is triggered, the machine is in trouble and needs to be
rebooted.  The machine will respond but processes accessing proc like "ps
aux" will hang due to the BUG_ON.  shutdown will also hang and needs a
hard reset or a sysrq-b.

The basic problem is a race between page table sharing and teardown.  For
the most part page table sharing depends on i_mmap_mutex.  In some cases,
it is also taking the mm->page_table_lock for the PTE updates but with
shared page tables, it is the i_mmap_mutex that is more important.

Unfortunately it appears to be also insufficient. Consider the following
situation

Process A					Process B
---------					---------
hugetlb_fault					shmdt
  						LockWrite(mmap_sem)
    						  do_munmap
						    unmap_region
						      unmap_vmas
						        unmap_single_vma
						          unmap_hugepage_range
      						            Lock(i_mmap_mutex)
							    Lock(mm->page_table_lock)
							    huge_pmd_unshare/unmap tables <--- (1)
							    Unlock(mm->page_table_lock)
      						            Unlock(i_mmap_mutex)
  huge_pte_alloc				      ...
    Lock(i_mmap_mutex)				      ...
    vma_prio_walk, find svma, spte		      ...
    Lock(mm->page_table_lock)			      ...
    share spte					      ...
    Unlock(mm->page_table_lock)			      ...
    Unlock(i_mmap_mutex)			      ...
  hugetlb_no_page									  <--- (2)
						      free_pgtables
						        unlink_file_vma
							hugetlb_free_pgd_range
						    remove_vma_list

In this scenario, it is possible for Process A to share page tables with
Process B that is trying to tear them down.  The i_mmap_mutex on its own
does not prevent Process A walking Process B's page tables.  At (1) above,
the page tables are not shared yet so it unmaps the PMDs.  Process A sets
up page table sharing and at (2) faults a new entry.  Process B then trips
up on it in free_pgtables.

This patch fixes the problem by adding a new function
__unmap_hugepage_range_final that is only called when the VMA is about to
be destroyed.  This function clears VM_MAYSHARE during
unmap_hugepage_range() under the i_mmap_mutex.  This makes the VMA
ineligible for sharing and avoids the race.  Superficially this looks like
it would then be vunerable to truncate and madvise issues but hugetlbfs
has its own truncate handlers so does not use unmap_mapping_range() and
does not support madvise(DONTNEED).

This should be treated as a -stable candidate if it is merged.

Test program is as follows. The test case was mostly written by Michal
Hocko with a few minor changes to reproduce this bug.

==== CUT HERE ====

static size_t huge_page_size = (2UL << 20);
static size_t nr_huge_page_A = 512;
static size_t nr_huge_page_B = 5632;

unsigned int get_random(unsigned int max)
{
	struct timeval tv;

	gettimeofday(&tv, NULL);
	srandom(tv.tv_usec);
	return random() % max;
}

static void play(void *addr, size_t size)
{
	unsigned char *start = addr,
		      *end = start + size,
		      *a;
	start += get_random(size/2);

	/* we could itterate on huge pages but let's give it more time. */
	for (a = start; a < end; a += 4096)
		*a = 0;
}

int main(int argc, char **argv)
{
	key_t key = IPC_PRIVATE;
	size_t sizeA = nr_huge_page_A * huge_page_size;
	size_t sizeB = nr_huge_page_B * huge_page_size;
	int shmidA, shmidB;
	void *addrA = NULL, *addrB = NULL;
	int nr_children = 300, n = 0;

	if ((shmidA = shmget(key, sizeA, IPC_CREAT|SHM_HUGETLB|0660)) == -1) {
		perror("shmget:");
		return 1;
	}

	if ((addrA = shmat(shmidA, addrA, SHM_R|SHM_W)) == (void *)-1UL) {
		perror("shmat");
		return 1;
	}
	if ((shmidB = shmget(key, sizeB, IPC_CREAT|SHM_HUGETLB|0660)) == -1) {
		perror("shmget:");
		return 1;
	}

	if ((addrB = shmat(shmidB, addrB, SHM_R|SHM_W)) == (void *)-1UL) {
		perror("shmat");
		return 1;
	}

fork_child:
	switch(fork()) {
		case 0:
			switch (n%3) {
			case 0:
				play(addrA, sizeA);
				break;
			case 1:
				play(addrB, sizeB);
				break;
			case 2:
				break;
			}
			break;
		case -1:
			perror("fork:");
			break;
		default:
			if (++n < nr_children)
				goto fork_child;
			play(addrA, sizeA);
			break;
	}
	shmdt(addrA);
	shmdt(addrB);
	do {
		wait(NULL);
	} while (--n > 0);
	shmctl(shmidA, IPC_RMID, NULL);
	shmctl(shmidB, IPC_RMID, NULL);
	return 0;
}

[akpm@linux-foundation.org: name the declaration's args, fix CONFIG_HUGETLBFS=n build]
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2:
 - Adjust context
 - Drop the mmu_gather * parameters]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants