Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

raspberrypi-bootloader (1.20120810-1) update deletes files in /boot #100

Closed
xnosek00 opened this issue Aug 29, 2012 · 8 comments
Closed

Comments

@xnosek00
Copy link

I did
aptitude update; aptitude safe upgrade and after that, my RASPI freezes

after reboot I did: dpkg --configure -a

root@raspberrypi:/home/pi# dpkg --configure -a
Setting up raspberrypi-bootloader (1.20120810-1) …
Your current start.elf is the same as arm240_start.elf
Was not able to detect current memory split
Diversion „diversion /boot/arm128_start.elf by package rpikernelhack“ doesn't exist, not deleted
Diversion „diversion /boot/arm192_start.elf by package rpikernelhack“ doesn't exist, not deleted
Diversion „diversion /boot/arm224_start.elf by package rpikernelhack“ doesn't exist, not deleted
Diversion „diversion /boot/arm240_start.elf by package rpikernelhack“ doesn't exist, not deleted

and so on

All files in my /boot are deleted and RASPI can't boot.

Deleted files: (bootcode.bin, kernel.img, kernel_cutdown.img, kernel_emergency.img, arm*_start.elf)

What is wrong?

I have to insert SD card into other PC and copy these files into /boot. But after I do dpkg --configure -a -> the same problem.

@popcornmix
Copy link
Collaborator

There was a problem with FAT corruption with kernels before 2nd August. You may be hitting that.
Can you start with the "2012-08-16-wheezy-raspbian.zip" image from download page?

Aleternatively download latest kernel.img from https://github.com/Hexxeh/rpi-firmware and boot from that before trying to upgrade firmware.

@xnosek00
Copy link
Author

I tried download latest kernel.img from Hexxeh and boot from it. I have the same problem. System is working before I do: dpgk --configure -a

Is there any other way? I don't want to start with the: "2012-08-16-wheezy-raspbian.zip". I want to repair my system.

Maybe this could help. I put my SD card into another computer and ran fsck.

gentoovbi root # fsck.vfat /dev/mmcblk0p1
dosfsck 3.0.9, 31 Jan 2010, FAT32, LFN
/kernel_emergency.img.dpkg-divert.tmp
File size is 10158080 bytes, cluster chain length is > 10158080 bytes.
Truncating file to 10158080 bytes.
Reclaimed 146 unused clusters (1196032 bytes).
Leaving file system unchanged.
/dev/mmcblk0p1: 7 files, 1356/7161 clusters

@xnosek00
Copy link
Author

I deleted kernel_emergency.img.dpkg-divert.tmp

and after that I ran fsck:

gentoovbi root # fsck /dev/mmcblk0p1
fsck na util-linux 2.20.1
dosfsck 3.0.9, 31 Jan 2010, FAT32, LFN
Reclaimed 233 unused clusters (1908736 bytes).
Leaving file system unchanged.
/dev/mmcblk0p1: 17 files, 4495/7161 clusters

gentoovbi nosek # fsck /dev/mmcblk0p2
fsck na util-linux 2.20.1
e2fsck 1.42 (29-Nov-2011)
/dev/mmcblk0p2: clean, 165204/491520 files, 1226921/1942272 blocks

then: dpgk --configure -a (with a kernel from Hexxeh) and my problem remains.

Btw.: before I put aptitude safe-upgrade, I did: rpi-update.

Latest firmware was in my Raspi before i ran aptitude safe-upgrade.

@asb
Copy link

asb commented Aug 29, 2012

Just to clarify the initial message - your problem is that the Raspberry Pi froze during the upgrade procedure? Could you look in /var/log/apt/term.log.* to see if anything at all was logged for this initial upgrade. I'll get back to you with some advice on fixing your dpkg database.

@xnosek00
Copy link
Author

Yes, you're right. My problem is, that the RasPI froze during the upgrade procedure.

I was looking in /va/log/apt/term.log

Preparing substitution raspberrypi-bootloader 1.20120714-1 ( …/raspberrypi-bootloader_1.20120810-1_armhf.deb) …
Adding diversion „diversion /boot/arm128_start.elf do /usr/share/rpikernelhack/arm128_start.elf by package rpikernelhack“
Adding diversion „diversion /boot/arm192_start.elf do /usr/share/rpikernelhack/arm192_start.elf by package rpikernelhack“
Adding diversion „diversion /boot/kernel.img do /usr/share/rpikernelhack/kernel.img by package rpikernelhack“
.
.
.
Unpacking raspberrypi-bootloader ...

.
.
.

Setting up package raspberrypi-bootloader (1.20120810-1) ...
Your current start.elf is the same as arm128_start.elf
Deleting „diversion /boot/arm128_start.elf into /usr/share/rpikernelhack/arm128_start.elf by package rpikernelhack“
Deleting „diversion /boot/arm192_start.elf into /usr/share/rpikernelhack/arm192_start.elf by package rpikernelhack“
.
.
.

@xnosek00
Copy link
Author

The problem is, that upgrade procedure deletes kernel and files in /boot.

After that RasPi freeze and can't boot after restart, because kernel and files in /boot are missing.

@xnosek00
Copy link
Author

xnosek00 commented Sep 2, 2012

Problem is solved. My SD card was bad and it broke my update. It is not a problem in aptitude update.

@xnosek00 xnosek00 closed this as completed Sep 2, 2012
popcornmix pushed a commit that referenced this issue Mar 28, 2013
commit 66efdc7 upstream.

snd_seq_timer_open() didn't catch the whole error path but let through
if the timer id is a slave.  This may lead to Oops by accessing the
uninitialized pointer.

 BUG: unable to handle kernel NULL pointer dereference at 00000000000002ae
 IP: [<ffffffff819b3477>] snd_seq_timer_open+0xe7/0x130
 PGD 785cd067 PUD 76964067 PMD 0
 Oops: 0002 [#4] SMP
 CPU 0
 Pid: 4288, comm: trinity-child7 Tainted: G      D W 3.9.0-rc1+ #100 Bochs Bochs
 RIP: 0010:[<ffffffff819b3477>]  [<ffffffff819b3477>] snd_seq_timer_open+0xe7/0x130
 RSP: 0018:ffff88006ece7d38  EFLAGS: 00010246
 RAX: 0000000000000286 RBX: ffff88007851b400 RCX: 0000000000000000
 RDX: 000000000000ffff RSI: ffff88006ece7d58 RDI: ffff88006ece7d38
 RBP: ffff88006ece7d98 R08: 000000000000000a R09: 000000000000fffe
 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
 R13: ffff8800792c5400 R14: 0000000000e8f000 R15: 0000000000000007
 FS:  00007f7aaa650700(0000) GS:ffff88007f800000(0000) GS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00000000000002ae CR3: 000000006efec000 CR4: 00000000000006f0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
 Process trinity-child7 (pid: 4288, threadinfo ffff88006ece6000, task ffff880076a8a290)
 Stack:
  0000000000000286 ffffffff828f2be0 ffff88006ece7d58 ffffffff810f354d
  65636e6575716573 2065756575712072 ffff8800792c0030 0000000000000000
  ffff88006ece7d98 ffff8800792c5400 ffff88007851b400 ffff8800792c5520
 Call Trace:
  [<ffffffff810f354d>] ? trace_hardirqs_on+0xd/0x10
  [<ffffffff819b17e9>] snd_seq_queue_timer_open+0x29/0x70
  [<ffffffff819ae01a>] snd_seq_ioctl_set_queue_timer+0xda/0x120
  [<ffffffff819acb9b>] snd_seq_do_ioctl+0x9b/0xd0
  [<ffffffff819acbe0>] snd_seq_ioctl+0x10/0x20
  [<ffffffff811b9542>] do_vfs_ioctl+0x522/0x570
  [<ffffffff8130a4b3>] ? file_has_perm+0x83/0xa0
  [<ffffffff810f354d>] ? trace_hardirqs_on+0xd/0x10
  [<ffffffff811b95ed>] sys_ioctl+0x5d/0xa0
  [<ffffffff813663fe>] ? trace_hardirqs_on_thunk+0x3a/0x3f
  [<ffffffff81faed69>] system_call_fastpath+0x16/0x1b

Reported-and-tested-by: Tommi Rantala <tt.rantala@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
popcornmix pushed a commit that referenced this issue Nov 4, 2013
enable_irq_wake() might fail, if so, we will see kernel warning in resume
entries due to it always calls disable_irq_wake().

  WARNING: at kernel/irq/manage.c:529 irq_set_irq_wake+0xc4/0xf0()
  Unbalanced IRQ 52 wake disable
  Modules linked in: ipv6 libcomposite configfs
  CPU: 0 PID: 1591 Comm: ash Tainted: G        W    3.10.0-00854-gdbd86d4-dirty #100
    (unwind_backtrace+0x0/0xf8) from (show_stack+0x10/0x14)
    (show_stack+0x10/0x14) from (warn_slowpath_common+0x54/0x68)
    (warn_slowpath_common+0x54/0x68) from (warn_slowpath_fmt+0x30/0x40)
    (warn_slowpath_fmt+0x30/0x40) from (irq_set_irq_wake+0xc4/0xf0)
    (irq_set_irq_wake+0xc4/0xf0) from (sirfsoc_rtc_restore+0x30/0x38)
    (sirfsoc_rtc_restore+0x30/0x38) from (platform_pm_restore+0x2c/0x50)
    (platform_pm_restore+0x2c/0x50) from (dpm_run_callback.clone.6+0x30/0xb0)
    (dpm_run_callback.clone.6+0x30/0xb0) from (device_resume+0x88/0x134)
    (device_resume+0x88/0x134) from (dpm_resume+0x114/0x230)
    (dpm_resume+0x114/0x230) from (hibernation_snapshot+0x178/0x1d0)
    (hibernation_snapshot+0x178/0x1d0) from (hibernate+0x130/0x1dc)
    (hibernate+0x130/0x1dc) from (state_store+0xb4/0xc0)
    (state_store+0xb4/0xc0) from (kobj_attr_store+0x14/0x20)
    (kobj_attr_store+0x14/0x20) from (sysfs_write_file+0xfc/0x17c)
    (sysfs_write_file+0xfc/0x17c) from (vfs_write+0xc8/0x194)
    (vfs_write+0xc8/0x194) from (SyS_write+0x40/0x6c)
    (SyS_write+0x40/0x6c) from (ret_fast_syscall+0x0/0x30)

To avoid unbalanced "IRQ wake disable", ensure that disable_irq_wake() is
called only when enable_irq_wake() have been successfully enabled.

Signed-off-by: Xianglong Du <Xianglong.Du@csr.com>
Signed-off-by: Barry Song <Baohua.Song@csr.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
popcornmix pushed a commit that referenced this issue Jun 15, 2017
commit a008c31 upstream.

On a ppc64 machine executing overlayfs/019 with xfs as the lower and
upper filesystem causes the following call trace,

WARNING: CPU: 2 PID: 8034 at /root/repos/linux/fs/iomap.c:765 .iomap_dio_actor+0xcc/0x420
Modules linked in:
CPU: 2 PID: 8034 Comm: fsstress Tainted: G             L  4.11.0-rc5-next-20170405 #100
task: c000000631314880 task.stack: c0000003915d4000
NIP: c00000000035a72c LR: c00000000035a6f4 CTR: c00000000035a660
REGS: c0000003915d7570 TRAP: 0700   Tainted: G             L   (4.11.0-rc5-next-20170405)
MSR: 800000000282b032 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI>
  CR: 24004284  XER: 00000000
CFAR: c0000000006f7190 SOFTE: 1
GPR00: c00000000035a6f4 c0000003915d77f0 c0000000015a3f00 000000007c22f600
GPR04: 000000000022d000 0000000000002600 c0000003b2d56360 c0000003915d7960
GPR08: c0000003915d7cd0 0000000000000002 0000000000002600 c000000000521cc0
GPR12: 0000000024004284 c00000000fd80a00 000000004b04ae64 ffffffffffffffff
GPR16: 000000001000ca70 0000000000000000 c0000003b2d56380 c00000000153d2b8
GPR20: 0000000000000010 c0000003bc87bac8 0000000000223000 000000000022f5ff
GPR24: c0000003b2d56360 000000000000000c 0000000000002600 000000000022d000
GPR28: 0000000000000000 c0000003915d7960 c0000003b2d56360 00000000000001ff
NIP [c00000000035a72c] .iomap_dio_actor+0xcc/0x420
LR [c00000000035a6f4] .iomap_dio_actor+0x94/0x420
Call Trace:
[c0000003915d77f0] [c00000000035a6f4] .iomap_dio_actor+0x94/0x420 (unreliable)
[c0000003915d78f0] [c00000000035b9f4] .iomap_apply+0xf4/0x1f0
[c0000003915d79d0] [c00000000035c320] .iomap_dio_rw+0x230/0x420
[c0000003915d7ae0] [c000000000512a14] .xfs_file_dio_aio_read+0x84/0x160
[c0000003915d7b80] [c000000000512d24] .xfs_file_read_iter+0x104/0x130
[c0000003915d7c10] [c0000000002d6234] .__vfs_read+0x114/0x1a0
[c0000003915d7cf0] [c0000000002d7a8c] .vfs_read+0xac/0x1a0
[c0000003915d7d90] [c0000000002d96b8] .SyS_read+0x58/0x100
[c0000003915d7e30] [c00000000000b8e0] system_call+0x38/0xfc
Instruction dump:
78630020 7f831b78 7ffc07b4 7c7ce039 40820360 a13d0018 2f890003 419e0288
2f890004 419e00a0 2f890001 419e02a8 <0fe00000> 3b80fffb 38210100 7f83e378

The above problem can also be recreated on a regular xfs filesystem
using the command,

$ fsstress -d /mnt -l 1000 -n 1000 -p 1000

The reason for the call trace is,
1. When 'reserving' blocks for delayed allocation , XFS reserves more
   blocks (i.e. past file's current EOF) than required. This is done
   because XFS assumes that userspace might write more data and hence
   'reserving' more blocks might lead to the file's new data being
   stored contiguously on disk.
2. The in-memory 'struct xfs_bmbt_irec' mapping the file's last extent would
   then cover the prealloc-ed EOF blocks in addition to the regular blocks.
3. When flushing the dirty blocks to disk, we only flush data till the
   file's EOF. But before writing out the dirty data, we allocate blocks
   on the disk for holding the file's new data. This allocation includes
   the blocks that are part of the 'prealloc EOF blocks'.
4. Later, when the last reference to the inode is being closed, XFS frees the
   unused 'prealloc EOF blocks' in xfs_inactive().

In step 3 above, When allocating space on disk for the delayed allocation
range, the space allocator might sometimes allocate less blocks than
required. If such an allocation ends right at the current EOF of the
file, We will not be able to clear the "delayed allocation" flag for the
'prealloc EOF blocks', since we won't have dirty buffer heads associated
with that range of the file.

In such a situation if a Direct I/O read operation is performed on file
range [X, Y] (where X < EOF and Y > EOF), we flush dirty data in the
range [X, Y] and invalidate page cache for that range (Refer to
iomap_dio_rw()). Later for performing the Direct I/O read, XFS obtains
the extent items (which are still cached in memory) for the file
range. When doing so we are not supposed to get an extent item with
IOMAP_DELALLOC flag set, since the previous "flush" operation should
have converted any delayed allocation data in the range [X, Y]. Hence we
end up hitting a WARN_ON_ONCE(1) statement in iomap_dio_actor().

This commit fixes the bug by preventing the read operation from going
beyond iomap_dio->i_size.

Reported-by: Santhosh G <santhog4@linux.vnet.ibm.com>
Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
@singpolyma
Copy link

I just had this same problem. I think the script to add the new diversions didn't run, but then finishing the upgrade removes the old ones... Hopefully I can just manually add them to my /boot again to get my system back up...

popcornmix pushed a commit that referenced this issue Oct 4, 2019
…vices fails

commit 8ccff28 upstream.

Before this commit dj_probe would exit with an error if the initial
logi_dj_recv_query_paired_devices fails. The initial call may fail
when the receiver is connected through a kvm and the focus is away.

When the call fails this causes 2 problems:

1) dj_probe calls logi_dj_recv_query_paired_devices after calling
hid_device_io_start() so a HID report may have been received in between
and our delayedwork_callback may be running. It seems that the initial
logi_dj_recv_query_paired_devices failure happening with some KVMs triggers
this exact scenario, causing the work-queue to run on free-ed memory,
leading to:

 BUG: unable to handle page fault for address: 0000000000001e88
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [#1] SMP PTI
 CPU: 3 PID: 257 Comm: kworker/3:3 Tainted: G           OE     5.3.0-rc5+ #100
 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B150M Pro4S/D3, BIOS P7.10 12/06/2016
 Workqueue: events 0xffffffffc02ba200
 RIP: 0010:0xffffffffc02ba1bd
 Code: e8 e8 13 00 d8 48 89 c5 48 85 c0 74 4c 48 8b 7b 10 48 89 ea b9 07 00 00 00 41 b9 09 00 00 00 41 b8 01 00 00 00 be 10 00 00 00 <48> 8b 87 88 1e 00 00 48 8b 40 40 e8 b3 6b b4 d8 48 89 ef 41 89 c4
 RSP: 0018:ffffb760c046bdb8 EFLAGS: 00010286
 RAX: ffff935038ea4550 RBX: ffff935046778000 RCX: 0000000000000007
 RDX: ffff935038ea4550 RSI: 0000000000000010 RDI: 0000000000000000
 RBP: ffff935038ea4550 R08: 0000000000000001 R09: 0000000000000009
 R10: 000000000000e011 R11: 0000000000000001 R12: ffff9350467780e8
 R13: ffff935046778000 R14: 0000000000000000 R15: ffff935046778070
 FS:  0000000000000000(0000) GS:ffff935054e00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000001e88 CR3: 000000075a612002 CR4: 00000000003606e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  0xffffffffc02ba2f7
  ? process_one_work+0x1b1/0x560
  process_one_work+0x234/0x560
  worker_thread+0x50/0x3b0
  kthread+0x10a/0x140
  ? process_one_work+0x560/0x560
  ? kthread_park+0x80/0x80
  ret_from_fork+0x3a/0x50
 Modules linked in: vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) bnep vfat fat btusb btrtl btbcm btintel bluetooth intel_rapl_msr ecdh_generic rfkill ecc snd_usb_audio snd_usbmidi_lib intel_rapl_common snd_rawmidi mc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support mei_wdt mei_hdcp ppdev kvm_intel kvm irqbypass crct10dif_pclmul crc32_generic crc32_pclmul snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio ghash_clmulni_intel intel_cstate snd_hda_intel snd_hda_codec intel_uncore snd_hda_core snd_hwdep intel_rapl_perf snd_seq snd_seq_device snd_pcm snd_timer intel_wmi_thunderbolt snd e1000e soundcore mxm_wmi i2c_i801 bfq mei_me mei intel_pch_thermal parport_pc parport acpi_pad binfmt_misc hid_lg_g15(E) hid_logitech_dj(E) i915 crc32c_intel i2c_algo_bit drm_kms_helper nvme nvme_core drm wmi video uas usb_storage i2c_dev
 CR2: 0000000000001e88
 ---[ end trace 1d3f8afdcfcbd842 ]---

2) Even if we were to fix 1. by making sure the work is stopped before
failing probe, failing probe is the wrong thing to do, we have
logi_dj_recv_queue_unknown_work to deal with the initial
logi_dj_recv_query_paired_devices failure.

Rather then error-ing out of the probe, causing the receiver to not work at
all we should rely on this, so that the attached devices will get properly
enumerated once the KVM focus is switched back.

Cc: stable@vger.kernel.org
Fixes: 74808f9 ("HID: logitech-dj: add support for non unifying receivers")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
popcornmix pushed a commit that referenced this issue Oct 4, 2019
…vices fails

commit 8ccff28 upstream.

Before this commit dj_probe would exit with an error if the initial
logi_dj_recv_query_paired_devices fails. The initial call may fail
when the receiver is connected through a kvm and the focus is away.

When the call fails this causes 2 problems:

1) dj_probe calls logi_dj_recv_query_paired_devices after calling
hid_device_io_start() so a HID report may have been received in between
and our delayedwork_callback may be running. It seems that the initial
logi_dj_recv_query_paired_devices failure happening with some KVMs triggers
this exact scenario, causing the work-queue to run on free-ed memory,
leading to:

 BUG: unable to handle page fault for address: 0000000000001e88
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [#1] SMP PTI
 CPU: 3 PID: 257 Comm: kworker/3:3 Tainted: G           OE     5.3.0-rc5+ #100
 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B150M Pro4S/D3, BIOS P7.10 12/06/2016
 Workqueue: events 0xffffffffc02ba200
 RIP: 0010:0xffffffffc02ba1bd
 Code: e8 e8 13 00 d8 48 89 c5 48 85 c0 74 4c 48 8b 7b 10 48 89 ea b9 07 00 00 00 41 b9 09 00 00 00 41 b8 01 00 00 00 be 10 00 00 00 <48> 8b 87 88 1e 00 00 48 8b 40 40 e8 b3 6b b4 d8 48 89 ef 41 89 c4
 RSP: 0018:ffffb760c046bdb8 EFLAGS: 00010286
 RAX: ffff935038ea4550 RBX: ffff935046778000 RCX: 0000000000000007
 RDX: ffff935038ea4550 RSI: 0000000000000010 RDI: 0000000000000000
 RBP: ffff935038ea4550 R08: 0000000000000001 R09: 0000000000000009
 R10: 000000000000e011 R11: 0000000000000001 R12: ffff9350467780e8
 R13: ffff935046778000 R14: 0000000000000000 R15: ffff935046778070
 FS:  0000000000000000(0000) GS:ffff935054e00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000001e88 CR3: 000000075a612002 CR4: 00000000003606e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  0xffffffffc02ba2f7
  ? process_one_work+0x1b1/0x560
  process_one_work+0x234/0x560
  worker_thread+0x50/0x3b0
  kthread+0x10a/0x140
  ? process_one_work+0x560/0x560
  ? kthread_park+0x80/0x80
  ret_from_fork+0x3a/0x50
 Modules linked in: vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) bnep vfat fat btusb btrtl btbcm btintel bluetooth intel_rapl_msr ecdh_generic rfkill ecc snd_usb_audio snd_usbmidi_lib intel_rapl_common snd_rawmidi mc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support mei_wdt mei_hdcp ppdev kvm_intel kvm irqbypass crct10dif_pclmul crc32_generic crc32_pclmul snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio ghash_clmulni_intel intel_cstate snd_hda_intel snd_hda_codec intel_uncore snd_hda_core snd_hwdep intel_rapl_perf snd_seq snd_seq_device snd_pcm snd_timer intel_wmi_thunderbolt snd e1000e soundcore mxm_wmi i2c_i801 bfq mei_me mei intel_pch_thermal parport_pc parport acpi_pad binfmt_misc hid_lg_g15(E) hid_logitech_dj(E) i915 crc32c_intel i2c_algo_bit drm_kms_helper nvme nvme_core drm wmi video uas usb_storage i2c_dev
 CR2: 0000000000001e88
 ---[ end trace 1d3f8afdcfcbd842 ]---

2) Even if we were to fix 1. by making sure the work is stopped before
failing probe, failing probe is the wrong thing to do, we have
logi_dj_recv_queue_unknown_work to deal with the initial
logi_dj_recv_query_paired_devices failure.

Rather then error-ing out of the probe, causing the receiver to not work at
all we should rely on this, so that the attached devices will get properly
enumerated once the KVM focus is switched back.

Cc: stable@vger.kernel.org
Fixes: 74808f9 ("HID: logitech-dj: add support for non unifying receivers")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
popcornmix pushed a commit that referenced this issue May 19, 2021
[ Upstream commit cc7130b ]

The IOMMU table is divided into pools for concurrent mappings and each
pool has a separate spinlock. When taking the ownership of an IOMMU group
to pass through a device to a VM, we lock these spinlocks which triggers
a false negative warning in lockdep (below).

This fixes it by annotating the large pool's spinlock as a nest lock
which makes lockdep not complaining when locking nested locks if
the nest lock is locked already.

===
WARNING: possible recursive locking detected
5.11.0-le_syzkaller_a+fstn1 #100 Not tainted
--------------------------------------------
qemu-system-ppc/4129 is trying to acquire lock:
c0000000119bddb0 (&(p->lock)/1){....}-{2:2}, at: iommu_take_ownership+0xac/0x1e0

but task is already holding lock:
c0000000119bdd30 (&(p->lock)/1){....}-{2:2}, at: iommu_take_ownership+0xac/0x1e0

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(p->lock)/1);
  lock(&(p->lock)/1);
===

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210301063653.51003-1-aik@ozlabs.ru
Signed-off-by: Sasha Levin <sashal@kernel.org>
popcornmix pushed a commit that referenced this issue May 19, 2021
[ Upstream commit cc7130b ]

The IOMMU table is divided into pools for concurrent mappings and each
pool has a separate spinlock. When taking the ownership of an IOMMU group
to pass through a device to a VM, we lock these spinlocks which triggers
a false negative warning in lockdep (below).

This fixes it by annotating the large pool's spinlock as a nest lock
which makes lockdep not complaining when locking nested locks if
the nest lock is locked already.

===
WARNING: possible recursive locking detected
5.11.0-le_syzkaller_a+fstn1 #100 Not tainted
--------------------------------------------
qemu-system-ppc/4129 is trying to acquire lock:
c0000000119bddb0 (&(p->lock)/1){....}-{2:2}, at: iommu_take_ownership+0xac/0x1e0

but task is already holding lock:
c0000000119bdd30 (&(p->lock)/1){....}-{2:2}, at: iommu_take_ownership+0xac/0x1e0

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(p->lock)/1);
  lock(&(p->lock)/1);
===

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210301063653.51003-1-aik@ozlabs.ru
Signed-off-by: Sasha Levin <sashal@kernel.org>
popcornmix pushed a commit that referenced this issue May 19, 2021
[ Upstream commit cc7130b ]

The IOMMU table is divided into pools for concurrent mappings and each
pool has a separate spinlock. When taking the ownership of an IOMMU group
to pass through a device to a VM, we lock these spinlocks which triggers
a false negative warning in lockdep (below).

This fixes it by annotating the large pool's spinlock as a nest lock
which makes lockdep not complaining when locking nested locks if
the nest lock is locked already.

===
WARNING: possible recursive locking detected
5.11.0-le_syzkaller_a+fstn1 #100 Not tainted
--------------------------------------------
qemu-system-ppc/4129 is trying to acquire lock:
c0000000119bddb0 (&(p->lock)/1){....}-{2:2}, at: iommu_take_ownership+0xac/0x1e0

but task is already holding lock:
c0000000119bdd30 (&(p->lock)/1){....}-{2:2}, at: iommu_take_ownership+0xac/0x1e0

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(p->lock)/1);
  lock(&(p->lock)/1);
===

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210301063653.51003-1-aik@ozlabs.ru
Signed-off-by: Sasha Levin <sashal@kernel.org>
mripard pushed a commit to mripard/rpi-linux that referenced this issue Oct 27, 2021
With PREEMPT_COUNT=y, when a CPU is offlined and then onlined again, we
get:

BUG: scheduling while atomic: swapper/1/0/0x00000000
no locks held by swapper/1/0.
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.15.0-rc2+ raspberrypi#100
Call Trace:
 dump_stack_lvl+0xac/0x108
 __schedule_bug+0xac/0xe0
 __schedule+0xcf8/0x10d0
 schedule_idle+0x3c/0x70
 do_idle+0x2d8/0x4a0
 cpu_startup_entry+0x38/0x40
 start_secondary+0x2ec/0x3a0
 start_secondary_prolog+0x10/0x14

This is because powerpc's arch_cpu_idle_dead() decrements the idle task's
preempt count, for reasons explained in commit a7c2bb8 ("powerpc:
Re-enable preemption before cpu_die()"), specifically "start_secondary()
expects a preempt_count() of 0."

However, since commit 2c669ef ("powerpc/preempt: Don't touch the idle
task's preempt_count during hotplug") and commit f1a0a37 ("sched/core:
Initialize the idle task with preemption disabled"), that justification no
longer holds.

The idle task isn't supposed to re-enable preemption, so remove the
vestigial preempt_enable() from the CPU offline path.

Tested with pseries and powernv in qemu, and pseries on PowerVM.

Fixes: 2c669ef ("powerpc/preempt: Don't touch the idle task's preempt_count during hotplug")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211015173902.2278118-1-nathanl@linux.ibm.com
popcornmix pushed a commit that referenced this issue Oct 28, 2021
[ Upstream commit 787252a ]

With PREEMPT_COUNT=y, when a CPU is offlined and then onlined again, we
get:

BUG: scheduling while atomic: swapper/1/0/0x00000000
no locks held by swapper/1/0.
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.15.0-rc2+ #100
Call Trace:
 dump_stack_lvl+0xac/0x108
 __schedule_bug+0xac/0xe0
 __schedule+0xcf8/0x10d0
 schedule_idle+0x3c/0x70
 do_idle+0x2d8/0x4a0
 cpu_startup_entry+0x38/0x40
 start_secondary+0x2ec/0x3a0
 start_secondary_prolog+0x10/0x14

This is because powerpc's arch_cpu_idle_dead() decrements the idle task's
preempt count, for reasons explained in commit a7c2bb8 ("powerpc:
Re-enable preemption before cpu_die()"), specifically "start_secondary()
expects a preempt_count() of 0."

However, since commit 2c669ef ("powerpc/preempt: Don't touch the idle
task's preempt_count during hotplug") and commit f1a0a37 ("sched/core:
Initialize the idle task with preemption disabled"), that justification no
longer holds.

The idle task isn't supposed to re-enable preemption, so remove the
vestigial preempt_enable() from the CPU offline path.

Tested with pseries and powernv in qemu, and pseries on PowerVM.

Fixes: 2c669ef ("powerpc/preempt: Don't touch the idle task's preempt_count during hotplug")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211015173902.2278118-1-nathanl@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
abihf pushed a commit to abihf/linux that referenced this issue Oct 29, 2021
[ Upstream commit 787252a ]

With PREEMPT_COUNT=y, when a CPU is offlined and then onlined again, we
get:

BUG: scheduling while atomic: swapper/1/0/0x00000000
no locks held by swapper/1/0.
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.15.0-rc2+ raspberrypi#100
Call Trace:
 dump_stack_lvl+0xac/0x108
 __schedule_bug+0xac/0xe0
 __schedule+0xcf8/0x10d0
 schedule_idle+0x3c/0x70
 do_idle+0x2d8/0x4a0
 cpu_startup_entry+0x38/0x40
 start_secondary+0x2ec/0x3a0
 start_secondary_prolog+0x10/0x14

This is because powerpc's arch_cpu_idle_dead() decrements the idle task's
preempt count, for reasons explained in commit a7c2bb8 ("powerpc:
Re-enable preemption before cpu_die()"), specifically "start_secondary()
expects a preempt_count() of 0."

However, since commit 2c669ef ("powerpc/preempt: Don't touch the idle
task's preempt_count during hotplug") and commit f1a0a37 ("sched/core:
Initialize the idle task with preemption disabled"), that justification no
longer holds.

The idle task isn't supposed to re-enable preemption, so remove the
vestigial preempt_enable() from the CPU offline path.

Tested with pseries and powernv in qemu, and pseries on PowerVM.

Fixes: 2c669ef ("powerpc/preempt: Don't touch the idle task's preempt_count during hotplug")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211015173902.2278118-1-nathanl@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
0lxb pushed a commit to 0lxb/rpi_linux that referenced this issue Jan 30, 2024
Fix and update semantics for ops.enable() and ops.disable()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants