Skip to content

RPi B+ No HDMI Output #30

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
173210 opened this issue May 21, 2016 · 5 comments
Closed

RPi B+ No HDMI Output #30

173210 opened this issue May 21, 2016 · 5 comments

Comments

@173210
Copy link

173210 commented May 21, 2016

I'm using 5bc0a41 with a DVI monitor (K212HQL)
Here is dmesg.
https://gist.github.com/173210/d6e287bd3c45307a37c081218e6a6c09

(snip)
[ 0.000000] Kernel command line: cma=128M@128M dma.dmachans=0x7f35 bcm2708_fb.fbwidth=1920 bcm2708_fb.fbheight=1080 bcm2708.boardrev=0x10 bcm2708.serial=0xa2ce3e8c st...
(snip)
[    0.879274] [drm] Initialized drm 1.1.0 20060810
[    0.886191] [drm:vc4_hdmi_bind] *ERROR* Failed to get pixel clock
[    0.892419] vc4-drm soc:gpu: failed to bind 20902000.hdmi (ops vc4_hdmi_ops): -517
[    0.900297] vc4-drm soc:gpu: master bind failed: -517
(snip)
[    1.549134] bcm2708_i2c 20805000.i2c: BSC2 Controller at 0x20805000 (irq 77) (baudrate 100000)
(snip)
[    1.765351] vc4-drm soc:gpu: bound 20902000.hdmi (ops vc4_hdmi_ops)
[    1.771939] vc4-drm soc:gpu: bound 20206000.pixelvalve (ops vc4_crtc_ops)
(snip)
[    1.794617] vc4-drm soc:gpu: bound 20207000.pixelvalve (ops vc4_crtc_ops)
[    1.801694] vc4-drm soc:gpu: bound 20807000.pixelvalve (ops vc4_crtc_ops)
[    1.823146] vc4-drm soc:gpu: bound 20400000.hvs (ops vc4_hvs_ops)
[    1.843225] vc4-drm soc:gpu: bound 20c00000.v3d (ops vc4_v3d_ops)
[    1.850835] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    1.873077] [drm] No driver support for vblank timestamp query.
[    1.879278] vc4-drm soc:gpu: No connectors reported connected with modes
[    1.888631] [drm] Cannot find any crtc or sizes - going 1024x768
[    1.925583] Console: switching to colour frame buffer device 128x48
[    1.955738] vc4-drm soc:gpu: fb0:  frame buffer device

So it's failing to detect the monitor?

@173210
Copy link
Author

173210 commented May 21, 2016

Here is vcdbg log msg.

001039.767: HDMI:EDID version 1.3, 0 extensions, screen size 46x26 cm
001039.831: HDMI:EDID features - videodef 0x80 !standby !suspend active off; colour encoding:RGB444|YCbCr422; sRGB is not default colourspace; preferred format is nativF
001039.893: HDMI:EDID found preferred CEA detail timing format: 1920x1080p @ 60 Hz (16)
001039.932: HDMI:EDID found DMT format: code 4, 640x480p @ 60 Hz in established timing I/II
001039.972: HDMI:EDID found DMT format: code 8, 800x600p @ 56 Hz in established timing I/II
001040.009: HDMI:EDID found DMT format: code 9, 800x600p @ 60 Hz in established timing I/II
001040.049: HDMI:EDID found DMT format: code 16, 1024x768p @ 60 Hz in established timing I/II
001040.086: HDMI:EDID found DMT format: code 17, 1024x768p @ 70 Hz in established timing I/II
001040.179: HDMI:EDID standard timings block x 8: 0xD1C0 B300 9500 8180 8100 81C0 714F 0101 
001040.243: HDMI:EDID found DMT format: code 82, 1920x1080p @ 60 Hz (16:9) in standard timing 0
001040.295: HDMI:EDID found DMT format: code 58, 1680x1050p @ 60 Hz (16:10) in standard timing 1
001040.346: HDMI:EDID found DMT format: code 47, 1440x900p @ 60 Hz (16:10) in standard timing 2
001040.398: HDMI:EDID found DMT format: code 35, 1280x1024p @ 60 Hz (5:4) in standard timing 3
001040.446: HDMI:EDID found DMT format: code 28, 1280x800p @ 60 Hz (16:10) in standard timing 4
001040.499: HDMI:EDID found DMT format: code 85, 1280x720p @ 60 Hz (16:9) in standard timing 5
001040.548: HDMI:EDID found DMT format: code 21, 1152x864p @ 75 Hz (4:3) in standard timing 6
001040.582: HDMI:EDID moving support for CEA mode 16 to DMT mode 82 because sink does not support HDMI
001040.615: HDMI:EDID filtering formats with pixel clock > 162 MHz or h. blanking > 1023
001040.751: HDMI:EDID preferred mode is updated to DMT (82) 1920x1080p @ 60 Hz with pixel clock 148500000 Hz
001052.117: HDMI:Setting property pixel encoding to Default
001052.135: HDMI:Setting property pixel clock type to PAL
001052.156: HDMI:Setting property content type flag to No data
001052.175: HDMI:Setting property fuzzy format match to enabled
001232.327: hdmi: HDMI:>>>>>>>>>>>>>Rx sensed, reading EDID<<<<<<<<<<<<<
001244.463: hdmi: HDMI:EDID version 1.3, 0 extensions, screen size 46x26 cm
001244.533: hdmi: HDMI:EDID features - videodef 0x80 !standby !suspend active off; colour encoding:RGB444|YCbCr422; sRGB is not default colourspace; preferred format isF
001244.597: hdmi: HDMI:EDID found preferred CEA detail timing format: 1920x1080p @ 60 Hz (16)
001244.643: hdmi: HDMI:EDID found DMT format: code 4, 640x480p @ 60 Hz in established timing I/II
001244.683: hdmi: HDMI:EDID found DMT format: code 8, 800x600p @ 56 Hz in established timing I/II
001244.725: hdmi: HDMI:EDID found DMT format: code 9, 800x600p @ 60 Hz in established timing I/II
001244.768: hdmi: HDMI:EDID found DMT format: code 16, 1024x768p @ 60 Hz in established timing I/II
001244.811: hdmi: HDMI:EDID found DMT format: code 17, 1024x768p @ 70 Hz in established timing I/II
001244.904: hdmi: HDMI:EDID standard timings block x 8: 0xD1C0 B300 9500 8180 8100 81C0 714F 0101 
001244.975: hdmi: HDMI:EDID found DMT format: code 82, 1920x1080p @ 60 Hz (16:9) in standard timing 0
001245.031: hdmi: HDMI:EDID found DMT format: code 58, 1680x1050p @ 60 Hz (16:10) in standard timing 1
001245.084: hdmi: HDMI:EDID found DMT format: code 47, 1440x900p @ 60 Hz (16:10) in standard timing 2
001245.138: hdmi: HDMI:EDID found DMT format: code 35, 1280x1024p @ 60 Hz (5:4) in standard timing 3
001245.207: hdmi: HDMI:EDID found DMT format: code 28, 1280x800p @ 60 Hz (16:10) in standard timing 4
001245.264: hdmi: HDMI:EDID found DMT format: code 85, 1280x720p @ 60 Hz (16:9) in standard timing 5
001245.315: hdmi: HDMI:EDID found DMT format: code 21, 1152x864p @ 75 Hz (4:3) in standard timing 6
001245.350: hdmi: HDMI:EDID moving support for CEA mode 16 to DMT mode 82 because sink does not support HDMI
001245.389: hdmi: HDMI:EDID filtering formats with pixel clock > 162 MHz or h. blanking > 1023
001245.530: hdmi: HDMI:EDID preferred mode is updated to DMT (82) 1920x1080p @ 60 Hz with pixel clock 148500000 Hz
001245.549: hdmi: HDMI: hotplug attached with DVI support
001245.616: hdmi: HDMI:hdmi_get_state is deprecated, use hdmi_get_display_state instead
001248.078: hdmi: HDMI: power_on to DMT mode 1080p60
001251.253: hdmi: HDMI: Action callback added to queue to happen at frame 0
001268.200: *** Restart logging
001270.492: Read command line from file 'cmdline.txt'
earlyprintk console=tty0 console=ttyAMA0 root=/dev/mmcblk0p2 rootwait
001542.022: Loading 'kernel.img' to 0x8000 size 0x3e1a74
001545.664: Kernel trailer DTOK property says yes
001545.677: Kernel trailer DDTK property says no
001551.468: Loading 'bcm2835-rpi-b-plus.dtb' to 0x3e9a74 size 0x3143
001595.187: dtparam: pwr_led_gpio=35
001601.218: dtparam: uart0_clkrate=3000000
001636.581: Loaded overlay 'vc4-kms-v3d'
001636.598: dtparam: cma-128=true
001713.462: dtparam: arm_freq=700000000
001715.529: Unknown dtparam 'arm_freq' - ignored
001715.565: dtparam: core_freq=250000000
001722.785: dtparam: cache_line_size=32
001737.980: Device tree loaded to 0x1bfec900 (size 0x36c7)
001739.236: gpioman: gpioman_get_pin_num: pin SDCARD_CONTROL_POWER not defined
003550.232: vchiq_core: vchiq_init_state: slot_zero = 0x48080000, is_master = 1
003554.292: hdmi: HDMI:hdmi_get_state is deprecated, use hdmi_get_display_state instead
003559.378: TV service:host side not connected, dropping notification 0x00000002, 0x00000002, 0x00000052

@173210
Copy link
Author

173210 commented May 22, 2016

Added hdmi_ignore_edid=0xa5000080 to config.txt and that made some progress.
https://gist.github.com/173210/d6e287bd3c45307a37c081218e6a6c09

[    1.849413] [drm:drm_minor_register] 
[    1.850035] [drm:drm_minor_register] new minor registered 64
[    1.850063] [drm:drm_minor_register] 
[    1.850076] [drm:drm_minor_register] 
[    1.850536] [drm:drm_minor_register] new minor registered 0
[    1.850739] [drm:drm_sysfs_connector_add] adding "HDMI-A-1" to sysfs
[    1.850762] [drm:drm_sysfs_hotplug_event] generating hotplug event
[    1.850855] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    1.863089] [drm] No driver support for vblank timestamp query.
[    1.869203] [drm:drm_helper_probe_single_connector_modes_merge_bits] [CONNECTOR:19:HDMI-A-1]
[    1.869238] [drm:drm_helper_probe_single_connector_modes_merge_bits] [CONNECTOR:19:HDMI-A-1] status updated from 3 to 2
[    1.869258] [drm:drm_helper_probe_single_connector_modes_merge_bits] [CONNECTOR:19:HDMI-A-1] disconnected
[    1.869287] vc4-drm soc:gpu: No connectors reported connected with modes

@173210
Copy link
Author

173210 commented May 22, 2016

The log had appeared only once. It's failing again. :/

@173210
Copy link
Author

173210 commented May 26, 2016

I got it work by following the Wiki, so please close this issue if you wontfix or you think it's invalid.

@anholt
Copy link
Owner

anholt commented May 31, 2016

OK, that log just looks like a custom kernel where you missed the power domain driver. Closing.

@anholt anholt closed this as completed May 31, 2016
anholt pushed a commit that referenced this issue Feb 28, 2017
Since commit 5d47ec0 ("firmware: Correct handling of
fw_state_wait() return value") fw_load_abort() could be called twice and
lead us to a kernel crash. This happens only when the firmware fallback
mechanism (regular or custom) is used. The fallback mechanism exposes a
sysfs interface for userspace to upload a file and notify the kernel
when the file is loaded and ready, or to cancel an upload by echo'ing -1
into on the loading file:

echo -n "-1" > /sys/$DEVPATH/loading

This will call fw_load_abort(). Some distributions actually have a udev
rule in place to *always* immediately cancel all firmware fallback
mechanism requests (Debian), they have:

  $ cat /lib/udev/rules.d/50-firmware.rules
  # stub for immediately telling the kernel that userspace firmware loading
  # failed; necessary to avoid long timeouts with CONFIG_FW_LOADER_USER_HELPER=y
  SUBSYSTEM=="firmware", ACTION=="add", ATTR{loading}="-1

Distributions with this udev rule would run into this crash only if the
fallback mechanism is used. Since most distributions disable by default
using the fallback mechanism (CONFIG_FW_LOADER_USER_HELPER_FALLBACK),
this would typicaly mean only 2 drivers which *require* the fallback
mechanism could typically incur a crash: drivers/firmware/dell_rbu.c and
the drivers/leds/leds-lp55xx-common.c driver. Distributions enabling
CONFIG_FW_LOADER_USER_HELPER_FALLBACK by default are obviously more
exposed to this crash.

The crash happens because after commit 5b02962 ("firmware: do not
use fw_lock for fw_state protection") and subsequent fix commit
5d47ec0 ("firmware: Correct handling of fw_state_wait() return
value") a race can happen between this cancelation and the firmware
fw_state_wait_timeout() being woken up after a state change with which
fw_load_abort() as that calls swake_up(). Upon error
fw_state_wait_timeout() will also again call fw_load_abort() and trigger
a null reference.

At first glance we could just fix this with a !buf check on
fw_load_abort() before accessing buf->fw_st, however there is a logical
issue in having a state machine used for the fallback mechanism and
preventing access from it once we abort as its inside the buf
(buf->fw_st).

The firmware_class.c code is setting the buf to NULL to annotate an
abort has occurred. Replace this mechanism by simply using the state
check instead. All the other code in place already uses similar checks
for aborting as well so no further changes are needed.

An oops can be reproduced with the new fw_fallback.sh fallback mechanism
cancellation test. Either cancelling the fallback mechanism or the
custom fallback mechanism triggers a crash.

mcgrof@piggy ~/linux-next/tools/testing/selftests/firmware
(git::20170111-fw-fixes)$ sudo ./fw_fallback.sh

./fw_fallback.sh: timeout works
./fw_fallback.sh: firmware comparison works
./fw_fallback.sh: fallback mechanism works

[ this then sits here when it is trying the cancellation test ]

Kernel log:

test_firmware: loading 'nope-test-firmware.bin'
misc test_firmware: Direct firmware load for nope-test-firmware.bin failed with error -2
misc test_firmware: Falling back to user helper
BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
IP: _request_firmware+0xa27/0xad0
PGD 0

Oops: 0000 [#1] SMP
Modules linked in: test_firmware(E) ... etc ...
CPU: 1 PID: 1396 Comm: fw_fallback.sh Tainted: G        W E   4.10.0-rc3-next-20170111+ #30
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014
task: ffff9740b27f4340 task.stack: ffffbb15c0bc8000
RIP: 0010:_request_firmware+0xa27/0xad0
RSP: 0018:ffffbb15c0bcbd10 EFLAGS: 00010246
RAX: 00000000fffffffe RBX: ffff9740afe5aa80 RCX: 0000000000000000
RDX: ffff9740b27f4340 RSI: 0000000000000283 RDI: 0000000000000000
RBP: ffffbb15c0bcbd90 R08: ffffbb15c0bcbcd8 R09: 0000000000000000
R10: 0000000894a0d4b1 R11: 000000000000008c R12: ffffffffc0312480
R13: 0000000000000005 R14: ffff9740b1c32400 R15: 00000000000003e8
FS:  00007f8604422700(0000) GS:ffff9740bfc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000038 CR3: 000000012164c000 CR4: 00000000000006e0
Call Trace:
 request_firmware+0x37/0x50
 trigger_request_store+0x79/0xd0 [test_firmware]
 dev_attr_store+0x18/0x30
 sysfs_kf_write+0x37/0x40
 kernfs_fop_write+0x110/0x1a0
 __vfs_write+0x37/0x160
 ? _cond_resched+0x1a/0x50
 vfs_write+0xb5/0x1a0
 SyS_write+0x55/0xc0
 ? trace_do_page_fault+0x37/0xd0
 entry_SYSCALL_64_fastpath+0x1e/0xad
RIP: 0033:0x7f8603f49620
RSP: 002b:00007fff6287b788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 000055c307b110a0 RCX: 00007f8603f49620
RDX: 0000000000000016 RSI: 000055c3084d8a90 RDI: 0000000000000001
RBP: 0000000000000016 R08: 000000000000c0ff R09: 000055c3084d6336
R10: 000055c307b108b0 R11: 0000000000000246 R12: 000055c307b13c80
R13: 000055c3084d6320 R14: 0000000000000000 R15: 00007fff6287b950
Code: 9f 64 84 e8 9c 61 fe ff b8 f4 ff ff ff e9 6b f9 ff
ff 48 c7 c7 40 6b 8d 84 89 45 a8 e8 43 84 18 00 49 8b be 00 03 00 00 8b
45 a8 <83> 7f 38 02 74 08 e8 6e ec ff ff 8b 45 a8 49 c7 86 00 03 00 00
RIP: _request_firmware+0xa27/0xad0 RSP: ffffbb15c0bcbd10
CR2: 0000000000000038
---[ end trace 6d94ac339c133e6f ]---

Fixes: 5d47ec0 ("firmware: Correct handling of fw_state_wait() return value")
Reported-and-Tested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reported-and-Tested-by: Patrick Bruenn <p.bruenn@beckhoff.com>
Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
CC: <stable@vger.kernel.org>    [3.10+]
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
anholt pushed a commit that referenced this issue May 16, 2017
smatch says:

  WARNING: please, no spaces at the start of a line
  #30: FILE: lib/zlib_inflate/inftrees.c:112:
  +    for (min = 1; min < MAXBITS; min++)$

  total: 0 errors, 1 warnings, 8 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

./patches/zlib-inflate-fix-potential-buffer-overflow.patch has style problems, please review.

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
anholt pushed a commit that referenced this issue Jun 21, 2017
[ Upstream commit 15a77c6 ]

With >=32 CPUs the userfaultfd selftest triggered a graceful but
unexpected SIGBUS because VM_FAULT_RETRY was returned by
handle_userfault() despite the UFFDIO_COPY wasn't completed.

This seems caused by rwsem waking the thread blocked in
handle_userfault() and we can't run up_read() before the wait_event
sequence is complete.

Keeping the wait_even sequence identical to the first one, would require
running userfaultfd_must_wait() again to know if the loop should be
repeated, and it would also require retaking the rwsem and revalidating
the whole vma status.

It seems simpler to wait the targeted wakeup so that if false wakeups
materialize we still wait for our specific wakeup event, unless of
course there are signals or the uffd was released.

Debug code collecting the stack trace of the wakeup showed this:

  $ ./userfaultfd 100 99999
  nr_pages: 25600, nr_pages_per_cpu: 800
  bounces: 99998, mode: racing ver poll, userfaults: 32 35 90 232 30 138 69 82 34 30 139 40 40 31 20 19 43 13 15 28 27 38 21 43 56 22 1 17 31 8 4 2
  bounces: 99997, mode: rnd ver poll, Bus error (core dumped)

    save_stack_trace+0x2b/0x50
    try_to_wake_up+0x2a6/0x580
    wake_up_q+0x32/0x70
    rwsem_wake+0xe0/0x120
    call_rwsem_wake+0x1b/0x30
    up_write+0x3b/0x40
    vm_mmap_pgoff+0x9c/0xc0
    SyS_mmap_pgoff+0x1a9/0x240
    SyS_mmap+0x22/0x30
    entry_SYSCALL_64_fastpath+0x1f/0xbd
    0xffffffffffffffff
    FAULT_FLAG_ALLOW_RETRY missing 70
  CPU: 24 PID: 1054 Comm: userfaultfd Tainted: G        W       4.8.0+ #30
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
  Call Trace:
    dump_stack+0xb8/0x112
    handle_userfault+0x572/0x650
    handle_mm_fault+0x12cb/0x1520
    __do_page_fault+0x175/0x500
    trace_do_page_fault+0x61/0x270
    do_async_page_fault+0x19/0x90
    async_page_fault+0x25/0x30

This always happens when the main userfault selftest thread is running
clone() while glibc runs either mprotect or mmap (both taking mmap_sem
down_write()) to allocate the thread stack of the background threads,
while locking/userfault threads already run at full throttle and are
susceptible to false wakeups that may cause handle_userfault() to return
before than expected (which results in graceful SIGBUS at the next
attempt).

This was reproduced only with >=32 CPUs because the loop to start the
thread where clone() is too quick with fewer CPUs, while with 32 CPUs
there's already significant activity on ~32 locking and userfault
threads when the last background threads are started with clone().

This >=32 CPUs SMP race condition is likely reproducible only with the
selftest because of the much heavier userfault load it generates if
compared to real apps.

We'll have to allow "one more" VM_FAULT_RETRY for the WP support and a
patch floating around that provides it also hidden this problem but in
reality only is successfully at hiding the problem.

False wakeups could still happen again the second time
handle_userfault() is invoked, even if it's a so rare race condition
that getting false wakeups twice in a row is impossible to reproduce.
This full fix is needed for correctness, the only alternative would be
to allow VM_FAULT_RETRY to be returned infinitely.  With this fix the WP
support can stick to a strict "one more" VM_FAULT_RETRY logic (no need
of returning it infinite times to avoid the SIGBUS).

Link: http://lkml.kernel.org/r/20170111005535.13832-2-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Shubham Kumar Sharma <shubham.kumar.sharma@oracle.com>
Tested-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Michael Rapoport <RAPOPORT@il.ibm.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
anholt pushed a commit that referenced this issue Jul 19, 2017
generic/361 reports below warning, this is because: once, there is
someone entering into critical region of sbi.cp_lock, if write_end_io.
f2fs_stop_checkpoint is invoked from an triggered IRQ, we will encounter
deadlock.

So this patch changes to use spin_{,un}lock_irq{save,restore} to create
critical region without IRQ enabled to avoid potential deadlock.

 irq event stamp: 83391573
 loop: Write error at byte offset 438729728, length 1024.
 hardirqs last  enabled at (83391573): [<c1809752>] restore_all+0xf/0x65
 hardirqs last disabled at (83391572): [<c1809eac>] reschedule_interrupt+0x30/0x3c
 loop: Write error at byte offset 438860288, length 1536.
 softirqs last  enabled at (83389244): [<c180cc4e>] __do_softirq+0x1ae/0x476
 softirqs last disabled at (83389237): [<c101ca7c>] do_softirq_own_stack+0x2c/0x40
 loop: Write error at byte offset 438990848, length 2048.
 ================================
 WARNING: inconsistent lock state
 4.12.0-rc2+ #30 Tainted: G           O
 --------------------------------
 inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
 xfs_io/7959 [HC1[1]:SC0[0]:HE0:SE1] takes:
  (&(&sbi->cp_lock)->rlock){?.+...}, at: [<f96f96cc>] f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
 {HARDIRQ-ON-W} state was registered at:
   __lock_acquire+0x527/0x7b0
   lock_acquire+0xae/0x220
   _raw_spin_lock+0x42/0x50
   do_checkpoint+0x165/0x9e0 [f2fs]
   write_checkpoint+0x33f/0x740 [f2fs]
   __f2fs_sync_fs+0x92/0x1f0 [f2fs]
   f2fs_sync_fs+0x12/0x20 [f2fs]
   sync_filesystem+0x67/0x80
   generic_shutdown_super+0x27/0x100
   kill_block_super+0x22/0x50
   kill_f2fs_super+0x3a/0x40 [f2fs]
   deactivate_locked_super+0x3d/0x70
   deactivate_super+0x40/0x60
   cleanup_mnt+0x39/0x70
   __cleanup_mnt+0x10/0x20
   task_work_run+0x69/0x80
   exit_to_usermode_loop+0x57/0x85
   do_fast_syscall_32+0x18c/0x1b0
   entry_SYSENTER_32+0x4c/0x7b
 irq event stamp: 1957420
 hardirqs last  enabled at (1957419): [<c1808f37>] _raw_spin_unlock_irq+0x27/0x50
 hardirqs last disabled at (1957420): [<c1809f9c>] call_function_single_interrupt+0x30/0x3c
 softirqs last  enabled at (1953784): [<c180cc4e>] __do_softirq+0x1ae/0x476
 softirqs last disabled at (1953773): [<c101ca7c>] do_softirq_own_stack+0x2c/0x40

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&(&sbi->cp_lock)->rlock);
   <Interrupt>
     lock(&(&sbi->cp_lock)->rlock);

  *** DEADLOCK ***

 2 locks held by xfs_io/7959:
  #0:  (sb_writers#13){.+.+.+}, at: [<c11fd7ca>] vfs_write+0x16a/0x190
  #1:  (&sb->s_type->i_mutex_key#16){+.+.+.}, at: [<f96e33f5>] f2fs_file_write_iter+0x25/0x140 [f2fs]

 stack backtrace:
 CPU: 2 PID: 7959 Comm: xfs_io Tainted: G           O    4.12.0-rc2+ #30
 Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
 Call Trace:
  dump_stack+0x5f/0x92
  print_usage_bug+0x1d3/0x1dd
  ? check_usage_backwards+0xe0/0xe0
  mark_lock+0x23d/0x280
  __lock_acquire+0x699/0x7b0
  ? __this_cpu_preempt_check+0xf/0x20
  ? trace_hardirqs_off_caller+0x91/0xe0
  lock_acquire+0xae/0x220
  ? f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
  _raw_spin_lock+0x42/0x50
  ? f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
  f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
  f2fs_write_end_io+0x147/0x150 [f2fs]
  bio_endio+0x7a/0x1e0
  blk_update_request+0xad/0x410
  blk_mq_end_request+0x16/0x60
  lo_complete_rq+0x3c/0x70
  __blk_mq_complete_request_remote+0x11/0x20
  flush_smp_call_function_queue+0x6d/0x120
  ? debug_smp_processor_id+0x12/0x20
  generic_smp_call_function_single_interrupt+0x12/0x30
  smp_call_function_single_interrupt+0x25/0x40
  call_function_single_interrupt+0x37/0x3c
 EIP: _raw_spin_unlock_irq+0x2d/0x50
 EFLAGS: 00000296 CPU: 2
 EAX: 00000001 EBX: d2ccc51c ECX: 00000001 EDX: c1aacebd
 ESI: 00000000 EDI: 00000000 EBP: c96c9d1c ESP: c96c9d18
  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
  ? inherit_task_group.isra.98.part.99+0x6b/0xb0
  __add_to_page_cache_locked+0x1d4/0x290
  add_to_page_cache_lru+0x38/0xb0
  pagecache_get_page+0x8e/0x200
  f2fs_write_begin+0x96/0xf00 [f2fs]
  ? trace_hardirqs_on_caller+0xdd/0x1c0
  ? current_time+0x17/0x50
  ? trace_hardirqs_on+0xb/0x10
  generic_perform_write+0xa9/0x170
  __generic_file_write_iter+0x1a2/0x1f0
  ? f2fs_preallocate_blocks+0x137/0x160 [f2fs]
  f2fs_file_write_iter+0x6e/0x140 [f2fs]
  ? __lock_acquire+0x429/0x7b0
  __vfs_write+0xc1/0x140
  vfs_write+0x9b/0x190
  SyS_pwrite64+0x63/0xa0
  do_fast_syscall_32+0xa1/0x1b0
  entry_SYSENTER_32+0x4c/0x7b
 EIP: 0xb7786c61
 EFLAGS: 00000293 CPU: 2
 EAX: ffffffda EBX: 00000003 ECX: 08416000 EDX: 00001000
 ESI: 18b24000 EDI: 00000000 EBP: 00000003 ESP: bf9b36b0
  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b

Fixes: aaec2b1 ("f2fs: introduce cp_lock to protect updating of ckpt_flags")
Cc: stable@vger.kernel.org
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
lategoodbye pushed a commit that referenced this issue Oct 10, 2019
…robe

Steven reported that a test triggered:

==================================================================
 BUG: KASAN: slab-out-of-bounds in trace_kprobe_create+0xa9e/0xe40
 Read of size 8 at addr ffff8880c4f25a48 by task ftracetest/4798

 CPU: 2 PID: 4798 Comm: ftracetest Not tainted 5.3.0-rc6-test+ #30
 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
 Call Trace:
  dump_stack+0x7c/0xc0
  ? trace_kprobe_create+0xa9e/0xe40
  print_address_description+0x6c/0x332
  ? trace_kprobe_create+0xa9e/0xe40
  ? trace_kprobe_create+0xa9e/0xe40
  __kasan_report.cold.6+0x1a/0x3b
  ? trace_kprobe_create+0xa9e/0xe40
  kasan_report+0xe/0x12
  trace_kprobe_create+0xa9e/0xe40
  ? print_kprobe_event+0x280/0x280
  ? match_held_lock+0x1b/0x240
  ? find_held_lock+0xac/0xd0
  ? fs_reclaim_release.part.112+0x5/0x20
  ? lock_downgrade+0x350/0x350
  ? kasan_unpoison_shadow+0x30/0x40
  ? __kasan_kmalloc.constprop.6+0xc1/0xd0
  ? trace_kprobe_create+0xe40/0xe40
  ? trace_kprobe_create+0xe40/0xe40
  create_or_delete_trace_kprobe+0x2e/0x60
  trace_run_command+0xc3/0xe0
  ? trace_panic_handler+0x20/0x20
  ? kasan_unpoison_shadow+0x30/0x40
  trace_parse_run_command+0xdc/0x163
  vfs_write+0xe1/0x240
  ksys_write+0xba/0x150
  ? __ia32_sys_read+0x50/0x50
  ? tracer_hardirqs_on+0x61/0x180
  ? trace_hardirqs_off_caller+0x43/0x110
  ? mark_held_locks+0x29/0xa0
  ? do_syscall_64+0x14/0x260
  do_syscall_64+0x68/0x260

Fix to check the difference of nr_args before adding probe
on existing probes. This also may set the error log index
bigger than the number of command parameters. In that case
it sets the error position is next to the last parameter.

Link: http://lkml.kernel.org/r/156966474783.3478.13217501608215769150.stgit@devnote2

Fixes: ca89bc0 ("tracing/kprobe: Add multi-probe per event support")
Reported-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants