dwc_otg: more robust interrupt handling #330
Conversation
The dwc_otg driver will unmask certain interrupts on a transaction that previously halted in the error state, in order to reset the QTD error count. The various fine-grained interrupt handlers do not consider that other interrupts besides their own were unmasked. By disabling the two other interrupts that are only ever enabled in DMA mode for this purpose, we can avoid unnecessary function calls in the IRQ handler. This also prevents an unnecessary FIQ interrupt from being generated if the FIQ is enabled.
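For illustration, a minimal C sketch of that masking idea follows. The register layout, bit names, and helper are hypothetical (they do not reproduce the driver's real structures); the point is simply that the error-recovery path unmasks only the handshake interrupts it needs to reset the error count, while the bits that would only cost an extra IRQ/FIQ entry stay masked.

```c
/*
 * Hedged sketch of the masking change described above, not the driver's
 * real code: names, the bit layout, and the chosen bits are illustrative.
 */
#include <stdint.h>

union hcintmsk {                 /* hypothetical per-bit view of HCINTMSK */
	uint32_t d32;
	struct {
		uint32_t chhltd   : 1;
		uint32_t ack      : 1;
		uint32_t nak      : 1;
		uint32_t nyet     : 1; /* example of a bit left masked */
		uint32_t frmovrun : 1; /* example of a bit left masked */
		uint32_t reserved : 27;
	} b;
};

/* Build the mask used when re-enabling a channel that halted in error. */
static uint32_t hcintmsk_for_error_recovery(void)
{
	union hcintmsk msk = { .d32 = 0 };

	msk.b.chhltd = 1;  /* channel-halted is always needed */
	msk.b.ack = 1;     /* a handshake from the device resets the error count */
	msk.b.nak = 1;
	/* nyet/frmovrun (stand-ins for the DMA-mode-only bits) stay masked */
	return msk.d32;
}
```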
In the case of a transaction to a device that had previously aborted due to an error, several interrupts are enabled to reset the error count when the device responds. This has the side effect of making the FIQ thrash, because the hardware will generate multiple instances of NAK on an IN bulk/interrupt endpoint and multiple instances of ACK on an OUT bulk/interrupt endpoint. Make the FIQ mask and clear the associated interrupts. Additionally, on non-split transactions, make sure that only unmasked interrupts are cleared. The old behaviour caused a hard-to-trigger but serious race condition when an endpoint was awaiting error recovery while a transaction completed on an endpoint - due to the sequencing and timing of interrupts generated by the dwc_otg core, it was possible to confuse the IRQ handler.
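The "clear only unmasked interrupts" part can be sketched as a single masking step. Again, this is an illustration with made-up accessors rather than the driver's actual code: the handler acknowledges only the status bits it was woken for, so a pending-but-masked bit belonging to a different event is left latched instead of being lost.

```c
/*
 * Hedged sketch: when servicing a non-split channel halt, acknowledge
 * only the interrupt bits that are both pending and currently unmasked.
 * The pointer-based register access stands in for the driver's real
 * accessors.
 */
#include <stdint.h>

static void clear_unmasked_hc_interrupts(volatile uint32_t *hcint,
					 volatile uint32_t *hcintmsk)
{
	uint32_t pending = *hcint;
	uint32_t enabled = *hcintmsk;

	/* HCINT is write-1-to-clear: clear only what this handler owns. */
	*hcint = pending & enabled;
}
```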
I can say that this patch definitely squashes the issues that I've seen, at least. I can artificially create endpoints that are very lossy (via the world's worst 10-port hub and several pendrives), which completely broke USB with plenty of the above error message; after the patch the devices continuously reset/probe/error/reset (as they should do for dodgy connections).
I'm testing now, and no problems so far. I'll keep you updated.
I just got this:
I'd booted from nfs and had xbmc running in the background, but idle. I also get these failures without this PR.
The pastebin chopped off the registers in that error, which are the important bit. Can you still get at the message log? How frequent are these errors now vs before?
No sign of "handle_hc_chhltd_intr_dma" in /var/log/*. I would say it used to occur every couple of hours (with my work setup). No bluetooth - this is what I had (and still have) connected:
The only alternative could be something similar happening with a split transaction. I did specifically test this before and after with a FS device known to produce 2 data toggle errors on the first transaction after a reset; the result was negative for this error both before and after, which led me to believe that split transactions weren't affected. Data toggle errors halt the channel in split transactions and they are now handled exactly like HS transaction errors, so I used this as an example repeatable test.
Three positives from forum users: http://www.raspberrypi.org/phpBB3/viewtopic.php?f=28&t=49223&start=25 I have now tested split transactions using the same tactic I used to generate multiple errors with USB 2.0 devices - I have not seen any FS/LS devices cause the error condition to appear.
I've not had the handle_hc_chhltd_intr_dma message repeat. I'm happy to accept this PR, and produce updated firmware for wider testing.
Yep, it's clearly an improvement. Gordon
dwc_otg: more robust interrupt handling
I caught it again:
Running on latest "next" kernel:
Hm, I suspect that what's happening in this case is that the delay between ack/xfercomp and chhltd is becoming large. I suspect the dwc DMA controller has a lot of work to do because of the NFS root, which delays the halt - provoking the race between the FIQ and the IRQ. Given that the bogus channel halt isn't really a disaster, I should probably add a retry mechanism similar to what I did for split data toggle errors...
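A rough sketch of what such a retry mechanism could look like, purely as an illustration of the idea floated above (all names, bit positions, and limits are hypothetical, and this is not the fix that was merged): if a halt arrives with no usable cause bits, retry the transfer a bounded number of times before escalating it as a real error.

```c
/*
 * Hypothetical sketch of the retry idea mentioned above; not actual
 * driver code. If the channel-halted interrupt fires with no cause bit
 * set (the suspected FIQ/IRQ race), retry a bounded number of times
 * before treating the halt as a genuine failure.
 */
#include <stdbool.h>
#include <stdint.h>

#define BOGUS_HALT_RETRY_LIMIT 3
#define HCINT_CHHLTD (1u << 1)   /* placeholder bit for "channel halted" */

struct channel_state {
	unsigned int bogus_halt_count;  /* per-channel, reset on any real event */
};

/* Returns true if the halt looks bogus and the transfer should be retried. */
static bool should_retry_bogus_halt(struct channel_state *ch, uint32_t hcint)
{
	bool no_cause = (hcint & ~HCINT_CHHLTD) == 0;

	if (no_cause && ch->bogus_halt_count < BOGUS_HALT_RETRY_LIMIT) {
		ch->bogus_halt_count++;
		return true;            /* re-enable the channel and wait again */
	}

	ch->bogus_halt_count = 0;       /* genuine halt: handle normally */
	return false;
}
```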
This PR fixes issues such as the error message popping up in the log since db4fad7.
I would prefer a round of testing by people who have seen this issue before merging for general consumption; I will advertise as necessary in the relevant places.